This post is the second in a series on the dataframe library for Scheme (R6RS). In this post, I will contrast the dataframe library with functions from the dplyr R package for selecting, dropping, and renaming columns.
Set up
First, let’s create a very simple dataframe in both languages.
df <- data.frame("a" = 1:3, "b" = 4:6, "c" = 7:9)
(define df (make-df* (a 1 2 3) (b 4 5 6) (c 7 8 9)))
Select
With dplyr::select, we can select and re-order columns in a single statement using bare column names.
> dplyr::select(df, c, a)
c a
1 7 1
2 8 2
3 9 3
With dataframe-select*, we can also select and re-order columns in a single statement using bare column names.
> (dataframe-display (dataframe-select* df c a))
dim: 3 rows x 2 cols
c a
<num> <num>
7. 1.
8. 2.
9. 3.
Drop
dplyr::select also allows for dropping columns by prefixing column names with -.
> dplyr::select(df, -b)
a c
1 1 7
2 2 8
3 3 9
In dataframe, dropping columns requires a separate procedure, dataframe-drop*.
> (dataframe-display (dataframe-drop* df b))
dim: 3 rows x 2 cols
a c
<num> <num>
1. 7.
2. 8.
3. 9.
Rename
With dplyr::select, columns can be renamed during selection, but dplyr::rename allows for renaming without selection.
> dplyr::select(df, Bee = b, c)
Bee c
1 4 7
2 5 8
3 6 9
> dplyr::rename(df, Bee = b, Sea = c)
a Bee Sea
1 1 4 7
2 2 5 8
3 3 6 9
dataframe-select* does not allow for renaming during selection, but dataframe-rename* works similarly to dplyr::rename. However, Scheme has no = syntax (where I think it is intuitive for the new name to be on the left), so I decided that it was more natural to write each pair as (old-name new-name).
> (dataframe-display (dataframe-rename* df (b Bee) (c Sea)))
dim: 3 rows x 3 cols
a Bee Sea
<num> <num> <num>
1. 4. 7.
2. 5. 8.
3. 6. 9.
When renaming all of the columns, dataframe-rename-all allows for specifying all new names as a list rather than (old-name new-name) pairs.
> (dataframe-display (dataframe-rename-all df '(A B C)))
dim: 3 rows x 3 cols
A B C
<num> <num> <num>
1. 4. 7.
2. 5. 8.
3. 6. 9.
Final thoughts
I haven’t included any code showing how the procedures from the dataframe library are implemented because they are so simple. They are simple because they don’t do much. For example, dplyr::select alone covers functionality that the dataframe library splits across three procedures: dataframe-select, dataframe-drop, and dataframe-rename. However, with simple Scheme code, I was able to implement procedures that cover cases representing 90% of my usage of dplyr::select and dplyr::rename.
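To give a sense of how little code this kind of column handling needs, here is a minimal sketch. It is not the dataframe library's actual implementation. It assumes a hypothetical representation in which a dataframe is just an association list mapping column names to lists of values, and the names alist-select, alist-drop, and alist-rename are made up for illustration.

```scheme
(import (rnrs))

;; Assumption: a "dataframe" here is a plain association list of
;; (column-name value ...) entries, not the library's own record type.
(define df '((a 1 2 3) (b 4 5 6) (c 7 8 9)))

;; Keep the named columns, in the order requested (hypothetical helper).
(define (alist-select df names)
  (map (lambda (name)
         (or (assq name df)
             (error 'alist-select "column not found" name)))
       names))

;; Remove the named columns, preserving the order of the rest.
(define (alist-drop df names)
  (filter (lambda (col) (not (memq (car col) names))) df))

;; Rename columns from a list of (old-name new-name) pairs;
;; columns that are not mentioned are returned unchanged.
(define (alist-rename df name-pairs)
  (map (lambda (col)
         (let ([pair (assq (car col) name-pairs)])
           (if pair
               (cons (cadr pair) (cdr col))
               col)))
       df))

(alist-select df '(c a))              ; => ((c 7 8 9) (a 1 2 3))
(alist-drop df '(b))                  ; => ((a 1 2 3) (c 7 8 9))
(alist-rename df '((b Bee) (c Sea)))  ; => ((a 1 2 3) (Bee 4 5 6) (Sea 7 8 9))
```

Unlike the starred procedures shown above, these helpers are ordinary procedures, so column names are passed as quoted symbols rather than bare names. Accepting bare names would take a thin macro layer on top.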