In three previous posts, I wrote about different programming languages that I have considered learning. I mentioned about 15 different languages in those posts. F# was not on the list. Because my background is in R, I thought I was better off sticking to learning dynamically typed languages at this point. Moreover, I hold a longstanding bias against Microsoft and Windows and that bias was easy to transfer to F#.
A few months ago, I started a new job. My work primarily involves R, but there is a software developer at the new company who primarily uses C#. Because I'm interested in eventually shifting my role to involve more software development, I thought it might be worthwhile to invest some time in learning C#. But, after a little more reading about C# and .NET, I found myself more drawn to F# as a functional-first language with strengths in data science (relative to C#).
Around the time that I started following the F# community, I saw a call for applications to the F# Mentorship Program. I decided to apply and was assigned a mentor. I've thoroughly enjoyed the mentorship program and, although I still know very little F#, I can see my appreciation of F# growing to the level of R and Scheme.
I had previously written about implementing a stochastic population model in Racket based on an R example found in this blog post. As a first exercise in F#, I've translated the stochastic population model into a few F# versions.
I'm working with F# in Visual Studio Code with Ionide on Linux (Ubuntu), but I think that the code shared in this post will work on other platforms. We can initialize a new console app with the following line.
dotnet new console -lang "F#" -o StochasticLogistic
That line creates a folder
StochasticLogistic containing the files
StochasticLogistic.fsproj and a folder
Program.fs file for the code described in this blog post is available in this gist.
We will use the
MathNet.Numerics package for drawing random numbers. One way to install the package is with
dotnet add package MathNet.Numerics --version 4.12.0
The first lines of
Program.fs open the
Distributions namespace of
MathNet.Numerics and set the values of the parameters used in the model.
open MathNet.Numerics.Distributions [<EntryPoint>] let main argv = let yinit = 1.0 // initial population size let r = 1.4 // maximum population growth rate let k = 20.0 // carrying capacity let thetasd = 0.1 // standard deviation for adding noise to the population let t = 4 // number of years of growth to simulate
The first version of the function,
logmodFor, follows the R example by using a
for loop to fill an array. In F#, we draw random numbers by initializing a distribution and then sampling from the distribution. The
theta variable is the stochastic part of the model. In this version, we initialize a 1D array with random draws from a normal distribution (following the R example). Alternatively, we could have drawn one value every time through the
Array.init illustrates the syntax for anonymous functions in F#, e.g., (fun x -> x + 1). F# infers types from context. In the case of
Array.zeroCreate, though, we need to specify the type (
float) of the zeros in the array. We initialize the 0th element of the output array
ys to the initial population value
y and fill the rest of the array each time through the loop.
let logmodFor t y r k thetasd = let normalDist = Normal(0.0, thetasd) let theta = Array.init (t - 1) (fun _ -> normalDist.Sample()) let ys: float array = Array.zeroCreate t Array.set ys 0 y for i in 1 .. (t - 1) do Array.set ys i (ys.[i-1] * (r - r * (ys.[i-1] / k)) * exp(theta.[i-1])) ys
The next version of the function,
logmodRecIf, uses recursion and is written in a style similar to how I tend to write recursive functions in Scheme. I moved the main calculation into a separate function
calc to improve readability. Note the use of
rec to indicate the recursive function
:: is the cons operator.
calc acc.Head could also have been written as
calc (List.head acc), but dot notation is one of the recommended object-oriented features to include in your F# code.
let logmodRecIf t y r k thetasd = let normalDist = Normal(0.0, thetasd) let calc y = y * (r - r * (y / k)) * exp(normalDist.Sample()) let rec loop acc t = if t = 1 then List.rev acc else loop ((calc acc.Head) :: acc) (t - 1) loop [y] t
The next version, logmodRecMatch, is only slightly different than the last. In this version, we are using pattern matching instead of
if then else. Even in this simple example, which doesn't require the power of
match, I find the pattern matching version more readable. With
match, we are first checking if the value of
1 and, if not, recursing through the rest of the list.
let logmodRecMatch t y r k thetasd = let normalDist = Normal(0.0, thetasd) let calc y = y * (r - r * (y / k)) * exp(normalDist.Sample()) let rec loop acc t = match t with | 1 -> List.rev acc | _ -> loop ((calc acc.Head) :: acc) (t - 1) loop [y] t
The next version,
logmodUnfold, was provided by my F# mentor. I don't fully understand it, yet. For example, I'm not sure (and don't remember what I was told) if using
Some was a convenient way to make the function type check or if it represents best practice for this use case. This approach uses sequences, which are lazy. F# also provides
List.take, but they are not drop-in replacements because lists are not lazy and
calc as written here never terminates.
let logmodUnfold t y r k thetasd = let normalDist = Normal(0.0, thetasd) let calc y = let result = y * (r - r * (y / k)) * exp(normalDist.Sample()) Some (y, result) Seq.unfold calc y |> Seq.take t
Now that we have our functions we need call them in our program and display the output in the console. Because the model has a stochastic component, it is not obvious if all four versions of the function are performing the same calculation. I wanted to be able to run deterministic or stochastic versions of the functions when calling the program from the command line. I also decided to provide the number of years
t as a command line argument to the program.
The arguments provided to the program are collected in a string list
argv. We will pass
t as the first argument and convert the string to an integer with
int. If the
t argument can't be converted to an integer, then the program will produce a runtime error.
[<EntryPoint>] let main argv = let yinit = 1.0 // initial population size let r = 1.4 // maximum population growth rate let k = 20.0 // carrying capacity let thetasd = 0.1 // standard deviation for adding noise to the population let t = int argv. // number of years of growth to simulate let seedType = argv. // "fixed" or "random" for setting deterministic behavior
argv allows for specifying fixed or random behavior. To use
seedType, I added two new lines to each function and changed the
Normal function to use the
let random = System.Random() let seed = if seedType = "fixed" then 2005 else random.Next() let normalDist = Normal(0.0, thetasd, RandomSource = System.Random(seed))
The following block of code runs our four function versions and pipes
|> the output into a
printfn. Note, the
|> operator pipes the previous output into the last argument of the next function.
"%A" is used to print any F# object.
printfn "for loop" logmodFor t yinit r k thetasd seedType |> printfn "%A" printfn "---" printfn "recursion (if)" logmodRecIf t yinit r k thetasd seedType |> printfn "%A" printfn "---" printfn "recursion (match)" logmodRecMatch t yinit r k thetasd seedType |> printfn "%A" printfn "---" printfn "unfold" logmodUnfold t yinit r k thetasd seedType |> printfn "%A"
We can run our program with
dotnet run and the required arguments.
$ dotnet run 4 fixed for loop [|1.0; 1.305926731; 1.454837341; 1.798817442|] --- recursion (if) [1.0; 1.305926731; 1.454837341; 1.798817442] --- recursion (match) [1.0; 1.305926731; 1.454837341; 1.798817442] --- unfold seq [1.0; 1.305926731; 1.454837341; 1.798817442]
dotnet run compiles the program before running it. If you want to re-run the program with new arguments, but no changes to the program, you can use
dotnet run 4 fixed --no-build. Alternatively, you can call the previously built
dll directly with
dotnet bin/Debug/netcoreapp3.1/StochasticLogistic.dll. The last approach seemed to run the program with the least lag time. It's also easy to create cross-platform executables of your program (described here).
Next, we want to run each
logmod function many times. This
repeat function repeats an anonymous function,
n times. The anonymous function is called with
f() and the output is cons'd onto the accumulator.
let repeat n f = let rec loop n acc = match n with | 0 -> acc | _ -> loop (n - 1) (f() :: acc) loop n 
Because I wanted to conduct some informal performance comparisons among the functions, I added a couple of more arguments to the top of the program.
let funType = argv. // "for", "recif", "recmatch", "unfold" let reps = int argv. // number of replications
And added this code to the bottom of the program. Because we are not interested in the actual output (only the execution time), the output is piped to
if funType = "for" then repeat reps (fun () -> logmodFor t yinit r k thetasd seedType) |> ignore elif funType = "recif" then repeat reps (fun () -> logmodRecIf t yinit r k thetasd seedType) |> ignore elif funType = "recmatch" then repeat reps (fun () -> logmodRecMatch t yinit r k thetasd seedType) |> ignore elif funType = "unfold" then repeat reps (fun () -> logmodUnfold t yinit r k thetasd seedType) |> ignore else invalidArg "funType" "must be one of for, recif, recmatch, unfold"
First, we will run the program with a small number of years and reps to compile it.
dotnet run 4 random for 2
And then we can do some informal timing tests.
$ time dotnet bin/Debug/netcoreapp3.1/StochasticLogistic.dll 100 random for 50000 real 0m1.355s user 0m1.280s sys 0m0.073s $ time dotnet bin/Debug/netcoreapp3.1/StochasticLogistic.dll 100 random recif 50000 real 0m2.244s user 0m2.211s sys 0m0.111s $ time dotnet bin/Debug/netcoreapp3.1/StochasticLogistic.dll 100 random recmatch 50000 real 0m2.237s user 0m2.231s sys 0m0.108s $ time dotnet bin/Debug/netcoreapp3.1/StochasticLogistic.dll 100 random unfold 50000 real 0m0.516s user 0m0.492s sys 0m0.028s
I was not surprised to see that using arrays in a
for loop was nearly 2x as fast as recursion. But I was surprised that using sequences with
unfold was over 2x as fast as the
for. I'm wondering if sequences are so lazy that this code is only timing the creation of a lists of sequences to be evaluated later. When I explicitly convert the sequence to a list in
Seq.unfold calc y |> Seq.take t |> List.ofSeq
the function is 2x as slow as recursion.
$ time dotnet bin/Debug/netcoreapp3.1/StochasticLogistic.dll 100 random unfold 50000 real 0m4.049s user 0m4.055s sys 0m0.185s
I don't know how much of the extra time is related to list conversion versus executing the lazy sequences. When I learn more about sequences, I will update this post with an answer.