In learning about reading CSV files in Racket, I have started to reconsider whether storing small(ish) datasets in CSV files is the best default behavior [1]. My default was set by primarily working in R, where reading and writing CSV files plays a central role in data analysis. When working solely in R, I expect that my old habits will die hard and CSV files will continue to play a prominent role. However, when passing small(ish) data between R and Racket, I think JSON might be a better alternative [2].

Before we jump into a bunch of code examples, let's define some wrapper functions to make the subsequent code a bit more concise. In R, we are using the jsonlite package and a little function to write the JSON object to file.

library(jsonlite)

write_json <- function(object, filename){
  writeLines(toJSON(object), filename)
}

In Racket, we are using the json module and have simple wrapper functions for both reading and writing JSON. A jsexpr is a subset of Racket values (e.g., list, hash table, etc.) that can be represented as JSON strings.

#lang racket

(require json)

(define (read-json-wrapper filename)
  (call-with-input-file filename read-json))

(define (write-json-wrapper jsexpr filename)
  (call-with-output-file filename
    (λ (x) (write-json jsexpr x))
    #:exists 'replace))

R Data Structures

Data Frame

The data frame is the central data structure in R, which corresponds well to flat, delimited text files (in contrast to JSON). In this code chunk, we use the same example CSV file as in the CSV reading post.

> write_json(head(read.csv("example.csv")), "R_dataframe.json")
> fromJSON("R_dataframe.json")
        date integer         float  bool char      word yn
1 12/25/2060 -958838  590131109036  true    u      zosu  Y
2 02/18/2023 -165090  528052918839  true    1        em  Y
3 11/27/2064 -207296   99397538939 false    w    doptah  Y
4 06/28/2044  316236  590216172038 false    x gutijenel  N
5 08/12/2045  722897 -309360363517  true    7      hueh  N
6 07/23/1904 -903116 -509332808532  true    0   fufcora  Y

Racket reads the data frame as a list of hash tables. Note that keys (e.g., bool, char, date, etc.) in the hash table are listed in alphabetical order. Also, the redundancy of the JSON format is evident for this tabular data via the repetition of the column names.

> (read-json-wrapper "R_dataframe.json")
'(#hasheq((bool . "true")
          (char . "u")
          (date . "12/25/2060")
          (float . 590131109036.032)
          (integer . -958838)
          (word . "zosu")
          (yn . "Y"))
  #hasheq((bool . "true")
          (char . "1")
          (date . "02/18/2023")
          (float . 528052918838.886)
          (integer . -165090)
          (word . "em")
          (yn . "Y"))
  #hasheq((bool . "false")
          (char . "w")
          (date . "11/27/2064")
          (float . 99397538938.88)
          (integer . -207296)
          (word . "doptah")
          (yn . "Y"))
  #hasheq((bool . "false")
          (char . "x")
          (date . "06/28/2044")
          (float . 590216172037.734)
          (integer . 316236)
          (word . "gutijenel")
          (yn . "N"))
  #hasheq((bool . "true")
          (char . "7")
          (date . "08/12/2045")
          (float . -309360363516.723)
          (integer . 722897)
          (word . "hueh")
          (yn . "N"))
  #hasheq((bool . "true")
          (char . "0")
          (date . "07/23/1904")
          (float . -509332808531.968)
          (integer . -903116)
          (word . "fufcora")
          (yn . "Y")))

Vector

It is unlikely that I would want to write a vector to file, but it is straightforward with jsonlite.

> write_json(vector(mode = "numeric", length = 10), "R_vector.json")
> fromJSON("R_vector.json")
 [1] 0 0 0 0 0 0 0 0 0 0

Racket reads the vector as a list.

> (read-json-wrapper "R_vector.json")
'(0 0 0 0 0 0 0 0 0 0)

Matrix/Array

The jsonlite package handles both matrices and arrays.

> write_json(matrix(data = runif(25), nrow = 5, ncol = 5), "R_matrix.json")
> fromJSON("R_matrix.json")
       [,1]   [,2]   [,3]   [,4]   [,5]
[1,] 0.1451 0.3178 0.3240 0.9761 0.2404
[2,] 0.9971 0.1716 0.3767 0.1735 0.7416
[3,] 0.5081 0.7600 0.6993 0.2717 0.1228
[4,] 0.2520 0.5952 0.1561 0.4297 0.4912
[5,] 0.6209 0.8862 0.4749 0.4307 0.2534

Racket reads a matrix as a list of lists. It can also read multidimensional arrays by further nesting of lists.

> (read-json-wrapper "R_matrix.json")
'((0.1451 0.3178 0.324 0.9761 0.2404)
  (0.9971 0.1716 0.3767 0.1735 0.7416)
  (0.5081 0.76 0.6993 0.2717 0.1228)
  (0.252 0.5952 0.1561 0.4297 0.4912)
  (0.6209 0.8862 0.4749 0.4307 0.2534))

List

JSON is a hierarchical data format that is well suited for nested lists.

> write_json(list("A" = list("One" = 1, 
+                            "Two" = 2,
+                            "Three" = 3),
+                 "B" = list("Four" = 4, 
+                            "Five" = 5,
+                            "Six" = 6)),
+            "R_list.json")
> fromJSON("R_list.json")
$A
$A$One
[1] 1

$A$Two
[1] 2

$A$Three
[1] 3


$B
$B$Four
[1] 4

$B$Five
[1] 5

$B$Six
[1] 6

Racket reads the nested list as a nested hash table [3] rather than a list of hash tables as with the data frame. As with reading the data frame above, the keys are listed in alphabetical order.

> (read-json-wrapper "R_list.json")
'#hasheq((A . #hasheq((One . (1)) (Three . (3)) (Two . (2))))
         (B . #hasheq((Five . (5)) (Four . (4)) (Six . (6)))))

Racket Data Structures

List

In this first example, I've just thrown together a bunch of the different Racket values that are valid as jsexpr.

> (write-json-wrapper (list 1 2.1 #t "B" "word" (list 3 4) (hash 'C 5 'D '(6 -7))) "Racket_list.json")
> (read-json-wrapper "Racket_list.json")
'(1 2.1 #t "B" "word" (3 4) #hasheq((C . 5) (D . (6 -7))))

The jsonlite package reads the Racket list as an R list. Note that the boolean value is automatically converted to a logical value in R (TRUE). Also, the keys (C, D) in the hash table are used as names in the nested list.

> fromJSON("Racket_list.json")
[[1]]
[1] 1

[[2]]
[1] 2.1

[[3]]
[1] TRUE

[[4]]
[1] "B"

[[5]]
[1] "word"

[[6]]
[1] 3 4

[[7]]
[[7]]$D
[1]  6 -7

[[7]]$C
[1] 5

Hash Table

If you've read this far, there should be no surprise about how suitable JSON is for a nested hash table.

> (write-json-wrapper (hasheq 'A (hasheq 'C 2 'E 6) 'B (hasheq 'D 4 'F 8)) "Racket_hash.json")
> (read-json-wrapper "Racket_hash.json")
'#hasheq((A . #hasheq((C . 2) (E . 6))) (B . #hasheq((D . 4) (F . 8))))

The nested hash table becomes a nested (and named) list in R.

> fromJSON("Racket_hash.json")
$A
$A$E
[1] 6

$A$C
[1] 2


$B
$B$D
[1] 4

$B$F
[1] 8

Table

This table is actually a list of lists where the first row of the list contains the headers for the columns.

> (write-json-wrapper (list '("A" "B" "C") '(1 2 3) '(4 5 6)) "Racket_table.json")
> (read-json-wrapper "Racket_table.json")
'(("A" "B" "C") (1 2 3) (4 5 6))

jsonlite reads this "table" as a matrix and converts all elements to characters because all elements of an R matrix need to be the same type.

> fromJSON("Racket_table.json")
     [,1] [,2] [,3]
[1,] "A"  "B"  "C" 
[2,] "1"  "2"  "3" 
[3,] "4"  "5"  "6" 

List of Hash Tables

A list of hash tables was how the json module read the JSON file created by jsonlite for a data frame.

> (write-json-wrapper (read-json-wrapper "R_dataframe.json") "Racket_list_hash.json")
> (read-json-wrapper "Racket_list_hash.json")
'(#hasheq((bool . "true")
          (char . "u")
          (date . "12/25/2060")
          (float . 590131109036.032)
          (integer . -958838)
          (word . "zosu")
          (yn . "Y"))
  #hasheq((bool . "true")
          (char . "1")
          (date . "02/18/2023")
          (float . 528052918838.886)
          (integer . -165090)
          (word . "em")
          (yn . "Y"))
  #hasheq((bool . "false")
          (char . "w")
          (date . "11/27/2064")
          (float . 99397538938.88)
          (integer . -207296)
          (word . "doptah")
          (yn . "Y"))
  #hasheq((bool . "false")
          (char . "x")
          (date . "06/28/2044")
          (float . 590216172037.734)
          (integer . 316236)
          (word . "gutijenel")
          (yn . "N"))
  #hasheq((bool . "true")
          (char . "7")
          (date . "08/12/2045")
          (float . -309360363516.723)
          (integer . 722897)
          (word . "hueh")
          (yn . "N"))
  #hasheq((bool . "true")
          (char . "0")
          (date . "07/23/1904")
          (float . -509332808531.968)
          (integer . -903116)
          (word . "fufcora")
          (yn . "Y")))

Apparently, the json module keeps track of the original order of the keys (even though they are sorted in the hash table) because the order of the columns in the data frame is preserved.

> fromJSON("Racket_list_hash.json")
        date  bool integer      word yn char         float
1 12/25/2060  true -958838      zosu  Y    u  590131109036
2 02/18/2023  true -165090        em  Y    1  528052918839
3 11/27/2064 false -207296    doptah  Y    w   99397538939
4 06/28/2044 false  316236 gutijenel  N    x  590216172038
5 08/12/2045  true  722897      hueh  N    7 -309360363517
6 07/23/1904  true -903116   fufcora  Y    0 -509332808532

Conclusion

My primary use case for JSON files would be to create lookup tables in R for input values that are used in simulation models built in Racket. In R, I have tended to use data frames for lookup tables because a data frame is the typical output of a data analysis pipeline. I will need to learn more about working with lists of hash tables and nested hash tables in Racket to decide if it makes sense to reformat a data frame into a nested list before writing the object to JSON.


[1] I even used CSV files as a database replacement in my Shiny Scorekeeper app.

[2] See also this post for more on reading and writing JSON files.

[3] See also previous post on the similarity of nested lists in R and nested hash tables in Racket.