Exploratory data analysis in Scheme

When I started learning Scheme (R6RS), I took a common approach to learning a new language: implementing features from a familiar one (namely R). That approach led me to write the chez-stats and dataframe libraries and to port gnuplot-pipe from Chicken to Chez Scheme. Together, those three libraries allow me to conduct simple exploratory data analysis (EDA) in Scheme in a way that should feel relatively familiar to R programmers. In this post, I will work through a simple example, which mostly serves to reinforce how much better suited R is for these types of tasks.

Getting started with Akku package manager for Scheme

Akku is a package manager for Scheme that currently supports numerous R6RS and R7RS Scheme implementations [1]. I was slow to embrace Akku because I encountered some initial friction with installation and setup. Moreover, coming from R, I was more familiar with a global package management model than Akku's project-based workflow. In the meantime, I was content to manually manage the few libraries that I had downloaded from different repos and placed in a directory found by Chez's (library-directories).

ASCII progress bar in Chez Scheme

As an impatient person, I typically use progress bars for any code that takes more than a few minutes to run. In a previous post, I wrote about creating ASCII progress bars in R and Racket. The Racket version depended on the raart module, which "provides an algebraic model of ASCII that can be used for art, user interfaces, and diagrams." Because I'm not aware of any such library for Chez Scheme [1], I was left feeling stuck.

A dataframe record type for Scheme

As an exercise in my Scheme (R6RS) learning journey, I have implemented a dataframe record type and procedures to work with it. Dataframes are column-oriented, tabular data structures that are useful for data analysis and are found in several languages, including R, Python, Julia, and Go. In this post, I will introduce the dataframe record type and basic procedures for working with dataframes. In subsequent posts, I will describe other dataframe procedures, e.g., filter, sort, aggregate, etc.
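To make "column-oriented" concrete, here is a minimal sketch of the idea in plain R6RS Scheme. It is not the dataframe library's actual API: the names toy-df, build-toy-df, toy-df-names, toy-df-values, and toy-df-dim are hypothetical and exist only for illustration. The sketch assumes a table can be represented as an association list that pairs each column name with a list of values.

```scheme
(import (rnrs))

;; Hypothetical sketch; NOT the dataframe library's actual API.
;; A column-oriented table stored as an association list:
;; each entry pairs a column name (a symbol) with a list of values.
(define-record-type toy-df
  (fields columns))   ; alist of (name . values)

;; Build a toy-df after checking that all columns have the same length.
(define (build-toy-df columns)
  (let ([lengths (map (lambda (col) (length (cdr col))) columns)])
    (unless (or (null? lengths)
                (for-all (lambda (n) (= n (car lengths))) lengths))
      (assertion-violation 'build-toy-df "columns differ in length" lengths))
    (make-toy-df columns)))

;; Column names, in the order they were supplied.
(define (toy-df-names df)
  (map car (toy-df-columns df)))

;; Values of one column, looked up by name.
(define (toy-df-values df name)
  (let ([col (assq name (toy-df-columns df))])
    (unless col
      (assertion-violation 'toy-df-values "no such column" name))
    (cdr col)))

;; Dimensions as a pair: (rows . columns).
(define (toy-df-dim df)
  (let ([cols (toy-df-columns df)])
    (cons (if (null? cols) 0 (length (cdr (car cols))))
          (length cols))))

;; Example table with two columns and three rows.
(define df
  (build-toy-df
   (list (cons 'name '("a" "b" "c"))
         (cons 'count '(1 2 3)))))
```

With this layout, (toy-df-values df 'count) retrieves a whole column with a single lookup, which is the property that makes column orientation convenient for column-wise summaries. A real implementation would likely need more validation, e.g., checking for duplicate column names.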