Generating random numbers in R and Racket
R makes it easy to generate random numbers from a wide variety of distributions with a consistent interface. For example,
rpois generate random numbers from uniform, normal, and Poisson distributions, respectively.
> x = runif(n = 1000, min = 4.6, max = 9.3) > min(x)  4.60374 > max(x)  9.288063 > > y = rnorm(n = 1000, mean = 81, sd = 9) > mean(y)  81.2548 > sd(y)  9.179427 > > z = rpois(n = 1000, lambda = 7.11) > mean(z)  7.187 > var(z)  6.954986
math library provides much of the same functionality as the base R functions, but I first want to work through a couple of examples without using the
math library because I think it is useful for better understanding Racket.1
Let’s create a Racket function that is similar to
runif in R.
#lang racket/base (define (runif n [min 0] [max 1]) (for/list ([i (in-range n)]) (+ (* (- max min) (random)) min)))
random function draws a random number from a uniform distribution between 0 and 1. A list of
n random samples is drawn by looping through a sequence with
for/list. To draw a random number within a specified interval, we multiply the random number by the specified range
(- max min) and shift by the
> (define unif (runif 1000 -7 33)) > (apply min unif) ; check that min is about -7 -6.987705647350004 > (apply max unif) ; check that max is about 33 32.97250528872971
The need to use
apply to calculate the
min of a list was a surprise to me. The
min function takes any number of real numbers as arguments but not a list.
apply uses the contents of the list as the arguments for
> (min 3 1 2) 1 > (min '(3 1 2)) min: contract violation expected: real? given: '(3 1 2)
Another wrinkle is introduced if you want to generate a vector (with
for/vector) of random numbers rather than a list.
(define (runif-vec n [min 0] [max 1]) (for/vector ([i (in-range n)]) (+ (* (- max min) (random)) min)))
We can’t use
apply with a vector.
> (define unif-vec (runif-vec 1000 0.5 1.5)) > (apply min unif-vec) apply: contract violation expected: list? given: '#(1.3465423772765357 1.0623864994795045 1.3437265664094888 1.1048441535810904 0.7861090566755002 0.6470981723620602 0.5546597487687198 0.552946824350613 0.5251621516034304 1.3568612624022975 1.1217164560959263 1.3334770245391927 0.890117942389224... argument position: 2nd other arguments...:
StackOverflow provides a solution based on
> (for/fold ([m +inf.0]) ([x (in-vector unif-vec)]) (min m x)) 0.5014616191163698 > (for/fold ([m -inf.0]) ([x (in-vector unif-vec)]) (max m x)) 1.4994848021056595
m is an accumulator that is initialized as positive (
+inf.0) or negative infinity (
-inf.0) for min and max, respectively. When iterating through
m is compared to
x and the value of
m is updated with the output of
Random values can also be drawn from a normal distribution using draws from the standard uniform distribution with
random and a formula from Rosetta Code
> (define (rnorm n [mean 0] [sd 1]) (for/list ([i (in-range n)]) (+ mean (* (sqrt (* -2 (log (random)))) (cos (* 2 pi (random))) sd)))) > (define norm (rnorm 1000 2 0.5)) > (require math) ; need math library for mean and standard deviation > (mean norm) ; check that mean is about 2 1.9797112583083123 > (stddev norm) ; check that sd is about 0.5 0.5022756727111091
Now, let’s simplify our life and use the functions provided by the
(require math) (define (rnorm-2 n [mean 0] [sd 1]) (if (and (real? mean) (real? sd)) (flnormal-sample (real->double-flonum mean) (real->double-flonum sd) n) (error "mean and sd arguments must be real numbers")))
flnormal-sample, requires that the mean and standard deviation are passed as floating-point numbers (and returns
rnorm-2, we check if the
sd arguments are real numbers and, if so, convert them to floating-point numbers (
real->double-flonum) before calling
flnormal-sample, which is otherwise similar to R’s
> (define norm-2 (rnorm-2 1000 2 0.5)) > (mean norm-2) 2.024086073688603 > (stddev norm-2) 0.5039448247564015
math library contains similar sample functions for numerous types of distributions, including uniform, Poisson, beta, binomial, gamma, and more.
Finally, let’s take a look at the
statistics object provided by the
> (define s (update-statistics* empty-statistics norm-2)) > (statistics-mean s) 1.9947322187693184 > (statistics-stddev s) 0.4974799838849746
empty-statistics is an empty statistics object.
update-statistics* populates the statistics object by iterating through
norm-2 to compute summary statistics, which are extracted with accessor functions, e.g.,
The question is why would you use the more verbose statistics object. One reason is that
statistics-max are actually less verbose than using
max. Also, if you are computing several summary statistics on a large sequence, then
update-statistics* is faster than the other assorted functions.
> (define norm-3 (rnorm-2 10000000 1000 100)) > (time (mean norm-3) (stddev norm-3) (variance norm-3) (for/fold ([m +inf.0]) ([x (in-flvector norm-3)]) (min m x)) (for/fold ([m -inf.0]) ([x (in-flvector norm-3)]) (max m x))) cpu time: 72618 real time: 108130 gc time: 34828 > (time (define s2 (update-statistics* empty-statistics norm-3)) (statistics-mean s2) (statistics-stddev s2) (statistics-variance s2) (statistics-min s2) (statistics-max s2)) cpu time: 12478 real time: 22668 gc time: 5512
Welcome back, Future Me.↩