Select, drop, and rename dataframe columns in Chez Scheme

This post is the second in a series on the dataframe library for Chez Scheme. In this post, I will contrast the dataframe library with functions from the dplyr R package for selecting, dropping, and renaming columns.

Set up

First, let’s create a very simple dataframe in both languages.

df <- data.frame("a" = 1:3, "b" = 4:6, "c" = 7:9)

(define df (make-dataframe '((a 1 2 3) (b 4 5 6) (c 7 8 9))))

Select

With dplyr::select, we can select and re-order columns in a single statement using bare column names.

> dplyr::select(df, c, a)
  c a
1 7 1
2 8 2
3 9 3

With dataframe-select, we can also select and re-order columns in a single statement using symbols for column names.

> (dataframe-display (dataframe-select df 'c 'a))
         c         a
         7         1
         8         2
         9         3
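
As a rough illustration of why selecting and re-ordering can happen in one step, here is a sketch on a plain association list. It is not the dataframe library's implementation; df-alist and alist-select are hypothetical names used only for this example, and the sketch assumes a dataframe can be modeled as a list of (column-name value ...) entries.

```scheme
;; Illustrative only: a "dataframe" modeled as a plain association list,
;; where each entry is (column-name value ...). Not the library's code.
(define df-alist '((a 1 2 3) (b 4 5 6) (c 7 8 9)))

;; alist-select is a hypothetical helper. It looks up each requested name
;; in turn, so the result follows the order of the names that were passed,
;; not the order of the columns in the original list.
(define (alist-select alist . names)
  (map (lambda (name)
         (or (assq name alist)
             (error 'alist-select "column not found" name)))
       names))

(alist-select df-alist 'c 'a) ; => ((c 7 8 9) (a 1 2 3))
```

Because the lookup is driven by the list of names, re-ordering needs no extra code in this sketch.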

Drop

dplyr::select also allows for dropping columns by prefixing column names with -.

> dplyr::select(df, -b)
  a c
1 1 7
2 2 8
3 3 9

In dataframe, dropping columns requires a separate procedure, dataframe-drop.

> (dataframe-display (dataframe-drop df 'b))
         a         c
         1         7
         2         8
         3         9
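
Dropping can be pictured the same way. The following is a minimal sketch on a plain association list, not the library's actual code; df-alist and alist-drop are hypothetical names introduced for illustration.

```scheme
;; Illustrative only: each entry is (column-name value ...).
(define df-alist '((a 1 2 3) (b 4 5 6) (c 7 8 9)))

;; alist-drop is a hypothetical helper. It keeps every column whose name
;; is not among the names to drop, preserving the original column order.
(define (alist-drop alist . names)
  (filter (lambda (column)
            (not (memq (car column) names)))
          alist))

(alist-drop df-alist 'b) ; => ((a 1 2 3) (c 7 8 9))
```

Unlike selection, this sketch walks the columns rather than the names, so the remaining columns keep their original order.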

Rename

With dplyr::select, columns can be renamed during selection, whereas dplyr::rename renames columns while keeping all of the others.

> dplyr::select(df, Bee = b, c)
  Bee c
1   4 7
2   5 8
3   6 9

> dplyr::rename(df, Bee = b, Sea = c)
  a Bee Sea
1 1   4   7
2 2   5   8
3 3   6   9

dataframe-select does not allow for renaming during selection, but dataframe-rename works similarly to dplyr::rename. However, in the absence of the = syntax (where I think it is intuitive for the new name to be on the left), I decided that it was more natural to write '((old-name new-name)).

> (dataframe-display (dataframe-rename df '((b Bee) (c Sea))))
         a       Bee       Sea
         1         4         7
         2         5         8
         3         6         9
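
To make the (old-name new-name) convention concrete, here is a small sketch on a plain association list. It is not how dataframe-rename is implemented; df-alist and alist-rename are hypothetical names, and leaving unmatched columns unchanged is an assumption of this sketch rather than documented library behavior.

```scheme
;; Illustrative only: each entry is (column-name value ...).
(define df-alist '((a 1 2 3) (b 4 5 6) (c 7 8 9)))

;; alist-rename is a hypothetical helper. name-pairs is a list of
;; (old-name new-name) lists. A column whose name matches an old-name
;; gets the new name; every other column is returned unchanged.
(define (alist-rename alist name-pairs)
  (map (lambda (column)
         (let ([pair (assq (car column) name-pairs)])
           (if pair
               (cons (cadr pair) (cdr column))
               column)))
       alist))

(alist-rename df-alist '((b Bee) (c Sea)))
; => ((a 1 2 3) (Bee 4 5 6) (Sea 7 8 9))
```

Putting the old name first means the list of pairs can be searched directly with assq, keyed on the current column name.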

When renaming all of the columns, dataframe-rename-all allows for specifying all new names as a list rather than (old-name new-name) pairs.

> (dataframe-display (dataframe-rename-all df '(A B C)))
         A         B         C
         1         4         7
         2         5         8
         3         6         9

Final thoughts

I haven’t included any code showing how the procedures from the dataframe library are implemented because they are so simple. They are simple because they don’t do much. For example, the functionality of dplyr::select is spread across three dataframe procedures: dataframe-select, dataframe-drop, and dataframe-rename. However, with simple Scheme code, I was able to implement procedures that cover the cases that account for 90% of my usage of dplyr::select and dplyr::rename.

In the next post, I will show how to split, bind, and append dataframes.