Haskell in R? An experiment with the R package lambda.r

Clemens Schmid
9 min readMay 6, 2021

--

TL;DR: Feel free to directly jump to The lambda.r implementation if you only want to see that. The full code is linked as a Github Gist at the end of the article.

Haskell and R are quite different programming languages. One is purely functional, statically typed and prominently features some of the most obscure abstractions in Computer Science. The other one lives at a particularly weird spot at the crossroad of the object-oriented, imperative and functional paradigms, has a ductile and dynamic type system and is optimized for the pragmatic needs of data analysis.

But still these two languages share some interesting features. For example both can be run interactively in an interpreter environment. And both consider functions first-class citizens — thus offering higher-order functions — and allow the definition of custom infix operators. And that’s why something like lambda.r is possible in the first place.

lambda.r (here v.1.2.4) is an R package that provides syntax extensions to write functional, Haskell-like code in R. It implements an astonishing number of features including type and function definition, pattern matching, guard statements and even monads! True functional programming available at your fingertips in R. All while maintaining a surprisingly Haskell-like syntax and incorporating powerful bonus features from R. Even a custom debugging system is part of the package.

The author Brian Lee Yung Rowe did an incredible job and also maintained the package over a commendable time span — the first commit on Github is from 2012 and the last change was pushed 2019.

Of course the package has some known limitations and rough edges. In my opinion it’s an extremely clever proof of concept and I enjoyed very much playing with it, but I’m not sure if I would recommend it for use in production. I’ll leave that to you and instead show you what I managed to build with it.

The experiment

Recently I wanted to implement a simple but specific logic in a bioinformatics context — so this is a real world example. But it would be tedious to explain the background, so I’ll instead replace the entities with something more digestible: Apples.

pixabay: Apples and Pears

Let’s say we have two sets of apple varieties and then a number of other fruit variety sets (varieties of pears, plums, strawberries, …). The first apple collection is large and covers all sorts of types: Ambrosia, Granny Smith, Red Delicious, Jonagold, Rome, Honeycrisp and many more. The second apple collection is much smaller, but a strict subset of the first one. It only includes the three varieties Granny Smith, Red Delicious and Honeycrisp. We don’t really care about the other fruits, honestly.

Merging fruit variety sets in Haskell

How could we model these sets in Haskell? We don’t need to consider the individual varieties here. Only the variety collections. So we could create the type FruitSet with three data constructors for the three different relevant sets. For the sake of simplicity let’s shorten their names to

  • LAS = Large Apple Set
  • SAS = Small Apple Subset
  • OFS = Other Fruit Set
data FruitSet =
LAS
| SAS
| OFS
deriving (Eq, Show)

Now about the issue we have to solve for these sets: We need a function that merges a list of fruit sets according to a very specific logic into only one output fruit set. This has to adhere to the following pair-wise (and undirected) merging rules:

  • If we merge two identical sets then the output should just be that set. That makes sense: Consider for example two Large Apple Sets. All the Ambrosia, Rome, Red Delicious and so forth apple varieties are present in both of the input sets in a pair-wise comparison.
  • If we merge any set with one of the Other Fruit Sets then the output should always be an Other Fruit Set. Of course: we have a weird mixture of species and fruit varieties afterwards.

For the final two rules I have to add now, that we also have to consider two different kind of merges: A union merge and an intersect merge.

  • If we merge a Large Apple Set and a Small Apple Subset with a union merge, then a Large Apple Set should be returned. That makes sense: The varieties in the small subset — Granny Smith, Red Delicious and Honeycrisp — are already part the large superset.
  • If we merge a Large Apple Set and a Small Apple Subset with an intersect merge, then we should get a Small Apple Subset. That just follows the same logic as in the previous rule.

I think these rules are an excellent application for pattern matching in Haskell. We could implement them in a function like this:

fSMerge :: FruitSet -> FruitSet -> Bool -> FruitSet
fSMerge LAS LAS _ = LAS
fSMerge SAS SAS _ = SAS
fSMerge OFS _ _ = OFS
fSMerge _ OFS _ = OFS
fSMerge LAS SAS True = SAS
fSMerge SAS LAS True = SAS
fSMerge LAS SAS False = LAS
fSMerge SAS LAS False = LAS

Even if you’re not familiar with Haskell you may appreciate how the different pair-wise comparison cases are expressed here. The function takes two FruitSets and a logical to distinguish union (False) and intersect (True) merges. For many of these rules it does not even matter which kind of merge is applied. Here we can replace the pattern with the wildcard symbol _.

Now that we have these rules, we can also implement the function that applies them to an arbitrary list of FruitSets to determine the appropriate superset.

fSMergeList :: [FruitSet] -> Bool -> FruitSet
fSMergeList (x:xs) intersect =
foldr (\a b -> fSMerge a b intersect) x xs

It uses a fold to combine the list elements into one. Folds are operations that look at two elements of a list, apply some binary function to them, take the result and apply the same function again to that and a new list element. Just until only one result remains and the list is gone. Folds usually need a starting value that serves also as an “accumulator” to track the list-condensing result along the fold’s way through the list.

Here I used Haskell’s clever pattern matching on lists (x:xs) to separate the input list’s head and tail. That makes it straight forward to set the head element as the starting value for the fold. We will see below that lambda.r is less elegant here.

Finally we can test our code:

> fSMergeList [LAS] True
LAS
> fSMergeList [LAS, LAS] True
LAS
> fSMergeList [LAS, LAS, SAS] True
SAS
> fSMergeList [LAS, LAS, SAS] False
LAS
> fSMergeList [LAS, LAS, OFS] False
OFS

Works like a charm! Let’s compare that with lamda.r now.

The lambda.r implementation

lambda.r provides some functions, mostly clever infix operators, to enable a Haskell-like logic and syntax in R. To access them we have to install and load the package first.

install.packages(“lambda.r”)
library(lambda.r)

Just as in the Haskell code above we have to find a way to represent fruit sets. With lambda.r, types are defined by their constructor functions. Each function has a name and input arguments separated from a return value or operation with the %as% infix operator.

FruitSet("LAS") %as% "LAS"
FruitSet("SAS") %as% "SAS"
FruitSet("OFS") %as% "OFS"

A distinction of type and data constructor as in Haskell does not exist to my knowledge. Also no nullary data constructor (“constants”). So I decided to be creative and use pattern matching on strings to simulate a data type for different fruit sets. lambda.r understands this syntax perfectly fine and prints the resulting type as follows:

<type constructor>
[[1]]
FruitSet("LAS") %:=% ...
[[2]]
FruitSet("SAS") %:=% ...
[[3]]
FruitSet("OFS") %:=% ...

With that data type we can define the pair-wise merging logic as laid out above.

fsMerge(a,b,intersect) %::% FruitSet : FruitSet : logical : FruitSet
fsMerge("LAS", "LAS", intersect) %as% FruitSet("LAS")
fsMerge("SAS", "SAS", intersect) %as% FruitSet("SAS")
fsMerge("OFS", b, intersect) %as% FruitSet("OFS")
fsMerge(a, "OFS", intersect) %as% FruitSet("OFS")
fsMerge("LAS", "SAS", TRUE ) %as% FruitSet("SAS")
fsMerge("SAS", "LAS", TRUE ) %as% FruitSet("SAS")
fsMerge("LAS", "SAS", FALSE ) %as% FruitSet("LAS")
fsMerge("SAS", "LAS", FALSE ) %as% FruitSet("LAS")

Note how extremely similar this syntax is to Haskell. The type interface definition follows exactly the same principle, short of some minor deviations when :: became %::% in R and -> is replaced by :. R has some limitations regarding infix operators.

One key take-away is, that this function will not run with input that is not exactly as specified. lambda.r thus introduces a static type system into R.

The pattern matching in the function definition is just as in Haskell, except of course for a number of syntactic details like the parentheses, commas, string-based values and lack of explicit wildcards. It’s another language after all!

With this function implemented, we only lack the last component: The function to apply the pair-wise comparisons with a fold on a list of FruitSets. And here things start to become a bit more tricky, unfortunately. Let’s start with the result:

fsMergeList(xs, intersect) %::% FruitSetList : logical : FruitSet
fsMergeList(xs, intersect) %as%
Reduce(
function(a, b) { fsMerge(a, b, intersect) },
xs[tail(seq_along(xs), n = -1)],
init = xs[[1]]
)

The general structure is again very Haskell-like. For the folding we use the Reduce function from the R base package (which is something like the Prelude in Haskell). One major difference between lambda.r and Haskell is though, that lambda.r lacks a good default way to handle lists. Maybe I just missed the relevant documentation or overlooked something else, but I struggled a bit with that.

In the end I decided to come up with my own list type.

FruitSetList(…) %::% FruitSet… : FruitSetList
FruitSetList(…) %as% asFruitSetList(list(…))
asFruitSetList(xs) %::% list : FruitSetList
asFruitSetList(xs) %as% {
class(xs) <- c(“FruitSetList”)
xs
}

This constructor makes use of the Ellipsis type ..., a weird feature of R well integrated into lambda.r. It’s a single input argument that can represent a set of multiple arguments. In lambda.r it can be combined with a type constraint to make sure that the function takes an arbitrary amount of arguments, but only of this type. So here of type FruitSet.

That allows for a pretty cool constructor syntax:

> FruitSetList(FruitSet(“LAS”), FruitSet(“SAS”), FruitSet(“OFS”))[[1]]
[1] "LAS"
attr(,"class")
[1] "FruitSet" "character"
[[2]]
[1] "SAS"
attr(,"class")
[1] "FruitSet" "character"
[[3]]
[1] "OFS"
attr(,"class")
[1] "FruitSet" "character"
attr(,"class")
[1] "FruitSetList"

Unforturnately I found no direct way to catch the ellipsis and make it a FruitSetList. With list(...) I could indeed transform it to a list, but that’s only half the job. I resorted to the rather ugly asFruitSetList that “manually” adds the “FruitSetList” label to the class attribute of the output object. That works because lambda.r utilizes R S3 classes for its magic.

With that out of the way there was still one issue to address. I could not use Haskell’s pattern matching on lists to separate the head and tail elements for the Reduce input. It’s easy to get the first element of a list in R, but the tail requires some more advanced indexing:

xs[tail(seq_along(xs), n = -1)]

All issues should be solved now. It’s time for a final test run of our code:

> fsMergeList(FruitSetList(FruitSet("LAS")), TRUE)
[1] "LAS"
> fsMergeList(FruitSetList(FruitSet("LAS"), FruitSet("LAS")), TRUE)
[1] "LAS"
> fsMergeList(FruitSetList(FruitSet("LAS"), FruitSet("LAS"), FruitSet("SAS")), TRUE)
[1] "SAS"
> fsMergeList(FruitSetList(FruitSet("LAS"), FruitSet("LAS"), FruitSet("SAS")), FALSE)
[1] "LAS"
> fsMergeList(FruitSetList(FruitSet("LAS"), FruitSet("LAS"), FruitSet("OFS")), FALSE)
[1] "OFS"

Excellent! The Syntax is more verbose as the one in Haskell, but the results are the same.

Recap

  • Haskell and R are both versatile languages with large communities that regularly suggest and discuss new abstractions. Haskell is a real innovation machine and carries many functional programming concepts into other languages.
  • lambda.r is a syntax extension to make some of the power of Haskell (or similar functional programming languages) available in R.
  • lambda.r works and is extremely fun to play with, but it’s pretty verbose and lacks (at least to my understanding) a good list implementation. I also suspect it not to be optimized for performance — probably quite the opposite.

I personally would love to see some of the concepts demonstrated with lambda.r to find their way into regular, base R. Especially a way to switch on static typing! That could avoid a lot of unexpected behavior. R interfaces often feel flimsy and not as rock solid as comparable code in Haskell. The approach lambda.r took here — e.g. with the Don’t-Care Type ., which I did not introduce — could be a way to combine dynamic and static typing. Ideally we want more sturdy interfaces without sacrificing R’s great flexibility for rapid prototyping.

Acknowledgements: I got some valuable feedback by my colleague James Fellows Yates (@jfy133) for this post.

--

--

Clemens Schmid
Clemens Schmid

Written by Clemens Schmid

Computational archaeologist and PhD student at the MPI SHH/MPI EVA

No responses yet