Recently, I switched from learning Racket to learning Chez Scheme. I wanted to repeat some of my previous Racket exercises in Chez Scheme, but quickly hit a barrier when my first choice required drawing random variates from a normal distribution. I looked for existing Chez Scheme libraries but came up empty. I considered SRFI 27: Sources of Random Bits, which includes example code for generating random numbers from a normal distribution, and reached out for guidance. Ultimately, I decided that it would be a good exercise to write a library for generating random variates from different distributions. As I started to write the random variate procedures, I realized that, at a minimum, I needed procedures for calculating mean and variance to test their output. Thus, the scope of the library started to expand and the chez-stats library was born.
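To make the problem concrete, here is a minimal sketch of drawing normal variates and checking them with a mean and variance. It uses the Box-Muller transform on top of Chez Scheme's built-in random procedure, which is an assumption chosen for illustration and not necessarily the method chez-stats uses. The names normal-variate, normal-variates, sample-mean, and sample-variance are hypothetical.

```scheme
(import (chezscheme))

;; Sketch only: Box-Muller transform, not necessarily what chez-stats does.
;; Draw one variate from a normal distribution with mean mu and
;; standard deviation sd.
(define (normal-variate mu sd)
  (let* ([u1 (- 1.0 (random 1.0))]   ; in (0, 1], so (log u1) is finite
         [u2 (random 1.0)]           ; in [0, 1)
         [z (* (sqrt (* -2.0 (log u1)))
               (cos (* 2.0 3.141592653589793 u2)))])
    (+ mu (* sd z))))

;; Draw n variates and return them as a list.
(define (normal-variates n mu sd)
  (let loop ([i 0] [acc '()])
    (if (= i n)
        acc
        (loop (+ i 1) (cons (normal-variate mu sd) acc)))))

;; Simple two-pass helpers for sanity-checking the draws.
(define (sample-mean ls)
  (/ (fold-left + 0 ls) (length ls)))

(define (sample-variance ls)
  (let ([m (sample-mean ls)])
    (/ (fold-left (lambda (acc x) (+ acc (expt (- x m) 2))) 0 ls)
       (- (length ls) 1))))
```

With a large sample, (sample-mean (normal-variates 10000 10 2)) should land close to 10 and the sample variance close to 4, which is the kind of check that made mean and variance procedures necessary in the first place.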
chez-stats has been a great learning experience. Even though I have lots of applied statistics experience, I had no idea that accurately calculating sample variance is challenging or that there are nine algorithms to choose from when using quantile in R1 (and now also in chez-stats).
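As an illustration of why variance is harder than it looks: the textbook shortcut based on the sum of squares can lose most of its precision when the values are large relative to their spread. One well-known remedy is Welford's one-pass algorithm, sketched below. This is a sketch under my own assumptions, with the hypothetical name welford-variance, and it is not meant to describe the implementation in chez-stats.

```scheme
(import (chezscheme))

;; Sketch of Welford's one-pass algorithm for the sample variance.
;; n  = number of values seen so far
;; m  = running mean
;; m2 = running sum of squared deviations from the mean
(define (welford-variance ls)
  (let loop ([ls ls] [n 0] [m 0.0] [m2 0.0])
    (cond
      [(null? ls)
       (if (< n 2)
           (error 'welford-variance "need at least two values")
           (/ m2 (- n 1)))]
      [else
       (let* ([x (car ls)]
              [n (+ n 1)]
              [delta (- x m)]            ; deviation from the old mean
              [m (+ m (/ delta n))])     ; updated mean
         ;; accumulate using the old and the new mean
         (loop (cdr ls) n m (+ m2 (* delta (- x m)))))])))
```

For example, (welford-variance '(1 2 3 4 5)) evaluates to 2.5, and the procedure stays accurate for inputs such as '(1000000001.0 1000000002.0 1000000003.0), where a naive sum-of-squares formula tends to break down.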
When I was choosing a new programming language to learn, I did not make the size of the package ecosystem a key consideration, but, in retrospect, I think it was clearly a factor in choosing Racket. And, as I started to switch my attention to Chez Scheme, I had several moments where I almost went running back to Racket when faced with the lack of third-party libraries for Chez Scheme. I’m not keen on reinventing the wheel, but I previously underestimated how valuable that experience could be.2 As I gain experience with Chez Scheme, it will be interesting to see if I continue to value writing my own libraries3 or start to lament the lack of libraries.
When I started working on chez-stats, I made a couple of decisions to simplify my efforts. First, I made it specific to Chez Scheme rather than trying to write portable Scheme code. Second, I stuck to the list as the primary data structure. It would be nice to have the flexibility to also work with vectors, but I am kicking that can down the road.
I’m still not sure if I chose a sensible structure for the files and procedures. I separated statistics and random-variates into two libraries4 that need to be imported separately.
(import (chez-stats statistics) (chez-stats random-variates))
I also wrote a bunch of assertion procedures for checking inputs and put them in another library, (chez-stats assertions). It feels weird to expose the assertion procedures as a library, and, honestly, I can’t remember why I chose to do that over just loading them from a file. If I continue to write libraries for Chez Scheme, though, it will probably make sense to pull assertions out of chez-stats into a standalone library.
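To give a sense of what an input-checking procedure of this kind might look like, here is a small sketch. The names check-real-list and checked-mean are hypothetical and are not taken from (chez-stats assertions); the only assumption is the standard R6RS assertion-violation procedure that Chez Scheme provides.

```scheme
(import (chezscheme))

;; Hypothetical input check: raise an assertion violation unless ls is
;; a non-empty list of real numbers. who names the calling procedure.
(define (check-real-list ls who)
  (unless (and (list? ls) (pair? ls) (for-all real? ls))
    (assertion-violation who "expected a non-empty list of real numbers" ls)))

;; Example caller that validates its input before computing anything.
(define (checked-mean ls)
  (check-real-list ls 'checked-mean)
  (/ (fold-left + 0 ls) (length ls)))
```

Keeping the check in its own procedure means every statistics procedure can reuse it, which is the main argument for eventually moving such checks into a standalone library.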
When writing the procedures in chez-stats, I primarily consulted R source code for statistics and slides by Raj Jain for random-variates. I used SRFI 64: A Scheme API for Test Suites to write a test suite for chez-stats. In over ten years of writing R code, I have never written any tests. Writing tests is definitely more tedious and less fun than writing the core chez-stats procedures, but I find it very satisfying to see the test suite run without any failures. I used Markdown to write some basic documentation in the README hosted on the GitHub repo. By far, writing documentation is my least favorite part of this whole process.
As of right now, I have no plans to expand the functionality of chez-stats. My next steps will be to try to put chez-stats to work for me and, in the process, identify friction points and missing features.