I have recently started learning Racket. For a first task, I tried to build a simple age-structured population model. I hit a stumbling block and reached out to the helpful folks on the Racket mailing list. In this post, I recap the mailing list exchange with a target audience of R programmers that are interested in learning more about Racket.

The simple model used for this exercise is a deterministic multistage Beverton-Holt model. In this example, we consider a hypothetical population with five annual age classes.

R Version

The R code follows a conventional imperative approach where an empty matrix is initialized and nested for loops are used to move through the elements of the matrix and propagate the population forward through time. In the results matrix, each row is one year and each column is an age class.

years = 30
prop_female = 0.5
egg_surv = 0.6
fecundity = c(0, 0, 200, 400, 800)
survival = c(0.2, 0.4, 0.6, 0.8, 0)
capacity = c(1e6, 1e5, 1e4, 1e3, 1e2)

beverton_holt <- function(N, p, c){
  N / ((1/p) + (N/c))
} 

results = matrix(data = 0, nrow = years, ncol = length(fecundity))
results[1,] = 10
for (i in 1:(nrow(results) - 1)){
  for (j in 1:(ncol(results))){ 
    N = results[i,j]
    fec_age_j = fecundity[j]
    surv_age_j = survival[j]
    # spawning
    if (fec_age_j > 0){
      spawn_age_j = N * prop_female
      results[i+1, 1] = results[i+1, 1] + beverton_holt(spawn_age_j, fec_age_j * egg_surv, capacity[1] - results[i+1, 1])
    }
    # survival
    if (surv_age_j > 0){
      results[i+1, j+1] = results[i+1, j+1] + beverton_holt(N, surv_age_j, capacity[j+1] - results[i+1, j+1])
    }
  }
}

Racket Translation

Now, we will slowly work through my translation of this R code into Racket code. Racket comes with an IDE called DrRacket. In DrRacket, the definitions and interactions panes correspond to the source and console panes in RStudio, respectively. At the top of the definitions pane, you need to specify the language because Racket is a programming language laboratory.

#lang racket/base

In this exercise, I chose to work with racket/base rather than the batteries-included racket. At the time, I was thinking that was a useful simplification. In retrospect, it was a strange choice because I liberally use R packages to avoid reinventing the wheel. I later discovered that the Racket documentation provides guidance on how to choose between racket and racket/base.

In the next few lines, we define the parameters used in the model. The define function assigns, for example, the value of 30 to the variable years. The # creates a vector; it is shorthand for the vector function. #(1 2 3) is equivalent to (vector 1 2 3).

;; scalar constants
(define years 30)
(define prop-female 0.5)
(define egg-surv 0.6)

;; age-specific fecundity and survival
(define fecundity #(0 0 200 400 800))
(define survival #(0.2 0.4 0.6 0.8 0)) 
(define capacity #(1e6 1e5 1e4 1e3 1e2))

A Racket vector is more like a list in R because it can contain heterogenous data types, e.g., #(1 "cat" #(2 "dog")). But Racket also has a list data structure. Needless to say, I still don't have a good grasp of the similarities and differences between data structures in R and Racket.

The Beverton-Holt function definition illustrates Racket's use of prefix operators. Operations are read from inside out [1].

;; multistage Beverton-Holt model 
(define (beverton-holt N p c)
  (/ N (+ (/ 1 p) (/ N c))))

Lisp-family languages have the reputation of being hard to read because of all the parentheses. I don't really mind the parentheses. In fact, I tend to overuse them in my R code because I find it more readable. My R version of the Beverton-Holt function includes 8 parentheses [2] and 2 curly braces. The Racket version has 12 parantheses.

If you read through the mailing list thread, you would see that this next piece of code was the source of my problem. I was trying to make a results matrix out of a vector of vectors.

(define results (make-vector years (make-vector (vector-length fecundity) 0)))

The make-vector function seemed similar to vector in R and printing the initialized results "matrix" produced something that looked like the expected result. However, Alex Harsanyi explained...

The (make-vector (vector-length fecundity) 0) expression will create a single vector, then it creates the outer vector with all elements pointing to it. It is not a matrix, but a "column" vector where each element is referencing the same row vector. This means that if you update an element in one of the rows, the same value will "appear" in all other rows.

I can mostly understand the how of that explanation but why that behavior is desirable is currently beyond my level of understanding. A couple of folks chimed in on the mailing list with examples contrasting make-vector, build-vector, and for/vector to try to help me understand. Those examples went over my head, but I admittedly have not yet tried very hard to understand them.

Alex provided the following code for creating a vector of vectors.

;; initialize empty results "matrix"
(define results
  (for/vector ([i (in-range years)])
    (make-vector (vector-length fecundity) 0)))

The in-range function creates a sequence; comparable to seq in R. Iterating through a sequence with for/vector [3] produces a vector [4]. For example, (for/vector ([i (in-range 5)]) i) produces '#(0 1 2 3 4). By replacing i in that simple example with (make-vector (vector-length fecundity) 0), we get a "matrix."

'#(#(0 0 0 0 0)
   #(0 0 0 0 0)
   #(0 0 0 0 0)
   #(0 0 0 0 0)
   #(0 0 0 0 0))

Next, we use vector-set! to set the first "row" to a vector with abundances for each age class set to 10.

;; initialize abundances in first year to arbitrary non-zero value
(vector-set! results 0 (make-vector (vector-length fecundity) 10))

In Racket, nested for loops are specified with for*. My Racket code using nested for loops is a bit more verbose than the R version because it wasn't obvious to me how to write the equivalent of results[i,j] with a vector of vectors [5]. Instead, on every iteration, I created a vector Nt to hold the next year abundances, updated the Nt vector with vector-set!, and replaced that row in results with Nt (using vector-set! again).

;; iterate over results to fill "matrix"
(for* ([i (in-range (sub1 years))]
       [j (in-range (vector-length fecundity))])
  ;; current abundance vector
  (define N (vector-ref results i))
  ;; next year abundance vector
  (define Nt (vector-ref results (add1 i)))
  ;; reproduction
  (define fecundity-age-j (vector-ref fecundity j))
  (when (> fecundity-age-j 0)  ;; not all age classes reproduce
    (define N-female (* (vector-ref N j) prop-female))
    ;; next year age-0
    (define Nt-age-0 (vector-ref Nt 0)) 
  
    (define new-age-0 (beverton-holt
                       N-female
                       (* fecundity-age-j egg-surv)
                       (- (vector-ref capacity 0) Nt-age-0)))
    (vector-set! Nt 0 (+ Nt-age-0 new-age-0))
    )
  ;; survival
  (define survival-age-j (vector-ref survival j))
  (when (> survival-age-j 0)
    (define Nt-age-j (vector-ref Nt (add1 j))) 
    (define new-age-j (beverton-holt
                       (vector-ref N j)
                       survival-age-j
                       (- (vector-ref capacity (add1 j)) Nt-age-j)))
    (vector-set! Nt (add1 j) (+ Nt-age-j new-age-j))

    )
  (vector-set! results (add1 i) Nt)
  )

A couple of other differences from the R code involve if and indexing. when replaces if because Racket's if is similar to ifelse in R [6]. A vector is indexed with vector-ref and Racket uses 0-based indexing. For example, if we have (define x #(1 3 9)) and x = c(1, 3, 9) in Racket and R, respectively, then (vector-ref x 1) and x[2] both return 3.

Idiomatic Racket Version

In my initial post to the Racket mailing list, I asked both for help fixing my broken Racket code and suggestions for more idiomatic Racket alternatives. Daniel Prager offered the following alternative based on recursion.

#lang racket

(define years 30)
(define prop-female 0.5)
(define egg-surv 0.6)

(define fecundity '(0 0 200 400 800))
(define survival '(0.2 0.4 0.6 0.8 0))
(define capacity '(1e6 1e5 1e4 1e3 1e2 -9999))
(define cap0 (first capacity))

(define (beverton-holt N p c) (/ N (+ (/ 1 p) (/ N c))))

(define (evolve N [f fecundity] [s survival] [cap (rest capacity)] [Nt0 0] [Nt null])
    (if (null? f)
        (cons Nt0 (reverse Nt))
        (evolve (rest N) (rest f) (rest s) (rest cap)
                ;; reproduction
                (+ Nt0 (if (= (first f) 0)
                           0
                           (beverton-holt (* (first N) prop-female)
                                          (* (first f) egg-surv)
                                          (- cap0 Nt0))))
                ;; survival
                (if (= (first s) 0)
                    Nt
                    (cons (beverton-holt (first N) (first s) (first cap)) Nt)))))

(define (iterate N n [i 1])
  (displayln (list i N))
  (unless (= i n) (iterate (evolve N) n (+ i 1))))

(iterate (make-list (length fecundity) 10) years)

The first thing to notice is that Daniel's approach uses lists, not vectors. Like vectors, lists can be created with shorthand notation, e.g., '(1 2 3) versus (list 1 2 3). The capacity list includes an extra element (-999) to avoid passing an empty list in evolve but that element is never used.

The evolve function uses default values for every argument except N. evolve is a "list-eater" function that represents the inner loop from the nested for loop structure. When the fecundity list is empty, evolve returns the abundances of each age class in the next time step. When the second argument to cons is a list, cons return a list with the first argument appended to the beginning of the list. The last argument to the recursive call to evolve (i.e., code after survival) builds up Nt in reverse (i.e., left to right is oldest to youngest age class), which is why Nt needs to be reversed before appending Nt0 to the front of the list.

evolve is the workhouse in this solution. It is a relatively simple matter to write another recursive function (iterate) to repeat evolve for the specified number of years. In Daniel's solution, iterate displays the output without storing it. I've modified iterate below to return a list of lists.

(define (iterate N n [i 1])
  (if (= i n)
      (list N)
      (cons N (iterate (evolve N) n (+ i 1)))))
      
(define results (iterate (make-list (length fecundity) 10) years))

Unfortunately, I can't really explain why this works. I was simply following an example provided in response to my follow-up question about how to build up a data structure in recursive solution (where I provided a different little example).

Another response in that thread used an accumulator to build up the result, which is easier (for me) to understand. In the next code block, I apply that solution to the iterate function. The list of age-specific abundances (N) is accumulated via cons in every time step and finally the outer list is reversed

(define (iterate N iter [result '()])
  (if (zero? iter)
      (reverse (cons N result))
      (iterate (evolve N) (- iter 1) (cons N result))))

(define results (iterate (make-list (length fecundity) 10) years))

Conclusions

With a more complicated model, where a population is tracked in many dimensions (e.g., age class, habitat, etc.), I would still reach for looping through a multidimensional array rather than try to work out a recursive solution. But that is mostly because I still have a very poor understanding of recursion. I had a great experience seeking feedback from the Racket mailing list and learned a lot even if I didn't fully understand all of the shared wisdom.


[1] The threading module allows for writing Racket code in "pipelines" to reduce deeply nested code.

[2] Admittedly, four of those parantheses are superfluous but I find it more readable to make the order of operations explicit.

[3] Square brackets can be used in place of parantheses to improve readability. Iterating through a sequence is one of the contexts where the convention is to use square brackets.

[4] Just as for/list produces a list.

[5] I subsequently wrote a version using math/array that is a more direct translation of the R code.

[6] Well, not exactly. See this post for clarification.