Hey folks :wave:
I was wondering if Roc had tail recursion, and if so to what extent? Does it work the same way as in Elm, or are other patterns supported (like tail recursion modulo cons, etc.)?
(I'm currently supporting more cases for tail recursion for Elm with elm-optimize-level-2, so I was hoping we could exchange pointers :) )
I think @Folkert de Vries would know best!
we currently do a pretty naive sort of tail recursion; calls to self in tail position are turned into a loop/jumps
we use join points to represent these (see https://www.microsoft.com/en-us/research/wp-content/uploads/2016/11/join-points-pldi17.pdf)
then, llvm has more general support for tail calls. These need not be recursive, but recursive ones also work. llvm inserts these as it sees fit
I don't think I ever properly understood tail calls modulo cons. Always thought I was missing something about ocaml's implementation details to understand how that works. But you do it in elm's generated code too?
the idea as I remember it was to transform this into something that takes one stack frame, right?
map f xs =
case xs of
Nil -> Nil
Cons x xx -> Cons (f x) (map f xx)
I don't think modulo cons is implemented in OCaml, I've seen proposals though. The way I see TRMC is basically that you can compute the data wrapper with a "hole", and the hole will be filled by a future iteration.
So if you have:
map : (a -> b) -> List a -> List b
map fn list =
case list of
[] -> []
x :: xs -> fn x :: map fn xs
then you can (and I now do :sunglasses: ) transform this to:
var map = F2(
function (fn, list) {
var $start = _List_Cons(undefined, _List_Nil);
var $end = $start;
map:
while (true) {
if (!list.b) {
return $start.b;
} else {
var x = list.a;
var xs = list.b;
$end.b = _List_Cons(fn(x), _List_Nil);
$end = $end.b;
list = xs;
continue map;
}
}
});
So we keep references to the start (start.b, start is just a wrapper to keep a reference) and the end of the list. At every iteration, we create a new list item that points to nothing, and replace the "next item" of the current end.
Your example would be compiled to
var $something$Nil = {$: 0};
var $something$Cons = F2(
function (a, b) {
return {$: 1, a: a, b: b};
});
var $something$map = function (pairs) {
var $start = { b: null };
var $end = $start;
map:
while (true) {
if (!pairs.$) {
$end.b = $something$Nil;
return $start.b;
} else {
var x = pairs.a;
var ps = pairs.b;
$end.b = A2($something$Cons, f(y), null);
$end = $end.b;
pairs = ps;
continue map;
}
}
};
So here our hole is currently filled with null, and filled at the next iteration.
well what we currently do is really this
map f xs =
if xs == NULL then
NULL
else
let ( x, xx ) = *xs;
a1 = f x
a2 = map f xx
cell = malloc(..)
cell.a1 = a1;
cell.a2 = a2;
cell
so we represent a linked list as a nullable pointer, NULL is the empty list, any non-null pointer is a Cons cell. I don't think LLVM is smart enough to make this into a loop today
what this does do actually is re-use the memory if we determine that is safe, so this linked list map could run in constant space
well, for the list, clearly not constant stack space, at least today
Gotcha, that's pretty nice already, but yeah, this could probably be improved to be made into a loop I'm guessing.
If I understand right we could do something like this (where &foo.1 in this case gets a pointer to the second field of the Cons cell)
map f list is
when list is
Nil -> Nil
Cons p ps ->
result = Cons (f p) NULL
join go = \xs, destination ->
when xs is
Nil ->
return result
Cons y yy ->
*destination = Cons (f y) NULL
jump go yy (&destination.1)
in
jump go ps (&result.1)
so here I give the next iteration of the loop an adress to store its result in (if it has any)
the question is then how this would interact with refcounting and our alias analysis
and the reset-reuse mechanism that allows us to reuse stack space
combining all of those sounds like an excellent master thesis project for someone
Yes, I think that would work. You could maybe reduce some duplication, but that's the idea at least.
well in the elm case you always return a heap-allocated object, even in the empty list case
we don't
so that's why the outer case is needed
but, I'd imagine, the hard part is figuring out when you can perform this trick
a tail call is relatively easy to recognize, but this seems quite a bit harder
It's not that bad, but yes, it's a bit harder :grinning_face_with_smiling_eyes:
So this recursion seems to work, but this is Roc code right? So you're doing manual tail recursion here. So my original question was, could the equivalent of
map : (a -> b) -> List a -> List b
map fn list =
case list of
[] -> []
x :: xs -> fn x :: map fn xs
be transformed and compiled to tailrecursive call? Because this would super simple for users of the language (I mean, it's textbook recursion)
well it could given the transformation we just discussed by sort of "forward-passing" the destination to write the result into. But we don't make any attempt to do that today
we currently transform this into
map : (a -> b) -> List a -> List b
map fn list =
case list of
[] -> []
x :: xs ->
a1 = fn x
a2 = map fn xs
a1 :: a2
at which point there is no call in tail position, so we just don't touch the function
so I guess you can see that the thing in tail position here is a cons, know that the function is recursive, and then do some further analysis to see if you can do the TC modulo cons transform?
Yes exactly. In my case (but I'm analyzing the compiled JS code), I check whether the return value is a cons, and then if the second argument is a recursive call, then I apply the transformation (I don't need to check anything more).
ah, so your transform only works for the builtin elm list?
It has a special case, but I think that when I will have a more general implementation, it won't be special. For now I special case custom type constructors and lists, which are like most of what I need to cover. But if I want to handle nested recursion, the best way will be to analyze all the functions and see if they respond to a few rules that enable this optimisation, which will then enable a lot more cases.
For now, I support:
isOkay x || recursive call x y+ recursion (imagine a naive List.sum implementation)* recursion (imagine a naive List.product or factorial implementation)++ recursion (working on it), to do things like "prefix " ++ recursiveCall x ++ " suffix"what makes + or * different from an arbitrary function call?
They're operators/binary operations (at least in the compiled JS), so they need to be handled slightly differently. Also, they're associative and have a neutral value (so kind of similar to a "hole"), so we can compute their sum/product before continuing with the loop, and thee add/multiply them to the accumulated value once we reach a non-recursive return statement.
ah I forgot you're dealing with the compiled output
I read the ocaml paper and got a better understanding. "destination-passing style" made it click for me. That transform might be beneficial in non-recursive contexts too. Tail calls don't have the overhead of a function call (like making a new stack frame), at least when the system stack is used (this will probably be very different for JS or with WASM). With that, I think we could represent TRMC roughly as follows in our IR
map : LinkedList a, (a -> b) -> LinkedList b
map = \list, f ->
when list is
Nil ->
return Nil;
Cons x xs ->
# basically, an uninitialized linkedlist cell
let head = Alloc;
join go = \first, rest, destination ->
when rest is
Nil ->
destination @ Cons(f first, Nil);
return head;
Cons y ys ->
let next = Alloc;
destination @ Cons(f first, next);
jump go y ys next;
in
jump go x xs head;
We assume some novel Alloc construct that allocates space for a value, but leaves the memory uninitialized. We already have location @ Constructor syntax to put a constructor value at a particular memory location; we can reuse that here.
cc @Jeroen Engels if you're interested!
@Folkert de Vries I made a PR for elm-optimize-level-2: https://github.com/mdgriffith/elm-optimize-level-2/pull/82
I didn't do the DPS style because that's less performant, but I wouldn't know what's best or most applicable in your use-case. Happy to discuss it though!
with DPS style you mean introducing an extra function?
5000 line PR, nice!
Lots of tests and docs though! If you have any remarks on either, let me know. I want it all to be good to make it easier to add to the Elm compiler
If I remember right the elm compiler already has a "looplike" construct (similar to join points?) to express tail recursion
so you'd need to formulate the transform in terms of that
aside: what do you do for trees? just pick the first opportunity for TRMC?
The OCaml implementation chose to only support data construction, whereas we also support arithmetic operators,
string concatenation and list concatenation.
I'm assuming this is related to ocaml implementation details. We use LLVM which already gives us the benefits of tail calls in simple scenarios like arithmetic operations. I wouldn't be surprised if ocaml has something similar on a lower level.
I don't think it has, I think developers have to write their recursive functions in a normal tail safe manner. I am not sure why they ended up only doing it for data constructors, and I don't know if they want to do more later.
For trees, yeah, just take a single one. But surprisingly, this is one of the only cases where performance seems to get worse with TRMC. I still need to benchmark this properly but I don't know enough.
For the Elm compiler and the loop like construct, do you mean in the output code, or in the compiler implementation?
yeah, there is (in AST/Optimized.hs
| Call Expr [Expr]
| TailCall Name [(Name, Expr)]
The Elm compiler has a TailCall variant in the "Optimized" AST which is the last intermediate representation before code gen. Then in code gen that gets transformed into a JavaScript while loop. That AST node is specifically used for tail calls and can't really be repurposed for other forms of recursion, at least not easily.
right
join points are more powerful than that
It's quite different from Roc in that Roc has an imperative IR which you can do more stuff with
yeah exactly
you could add join points to that data structure, the original paper adds them to one of GHC's intermediate languages
which is still pure functional
and order is given by data dependency
Yeah I only meant that Roc's IR can do more types of control flow without specific changes to the IR whereas Elm would need bigger changes
what you'd still need is to reason about the destination somehow, within that AST. So I wonder what the minimal primitives are for that
we already have the idea of "store a tag at a particular memory location" in our IR
but not yet the "allocate space for a tag" idea
Yeah Elm doesn't have anything explicitly about allocation
and I don't think you'd want to make it too explicit. What doesn't help is that in elm at this point you have lost type info
I think it would need some way of doing mutation. You'd need to make a Cons cell and then later change what its tail points to. So if you could do some kind of MutateField then you could initialise it to Unit or Nil and then replace it later.
but what is a "cons cell" here? you don't know the type info, so you can't just make something out of thin air ?!
Yeah but you'd be inserting this node before the type info is dropped
The tail calls are detected at an earlier stage
is the optimized ast generated before type checking ?
no, after type checking
I thought there was no relation between the variables and the AST in the elm compiler?
Well you can tell that it's a Union type, you'd have to work in those terms rather than specifically for List
ok, I just remember that it turned out to be fast to have the same representation for e.g. Cons and Nil nodes
in JS with the jit and such
so you'd want to "allocate" a value of the most optimal shape
Yeah. You'd have to detect the modulo cons pattern and replace it with some special node. Maybe detecting that pattern would be too hard to do without full type info. Because as you mentioned, Elm only checks that the types are valid and says "OK" or "error", it doesn't attach type info to any AST nodes.
Folkert de Vries said:
ok, I just remember that it turned out to be fast to have the same representation for e.g. Cons and Nil nodes
Yeah and that would also apply to any other Union. You want to find the maximum number of fields in any variant and then always use that number of fields for every variant, even if you're setting some of them to null or undefined.
Then the JS engine sees them as all being the same shape
So yeah if you were doing this it might be part of a whole set of changes like this... which is how Evan likes to work generally too!
oh elm itself does not yet do that "object with max number of fields" optimization?
Not as far as I know. It certainly didn't in 0.19.1
And no releases since then, so no
right
That was some analysis that Robin did a while back but it wasn't implemented
...yet?
(suspenseful music)
right but that felt like a long time ago
yup
yeah surely that must make it in at some point
Yeah if there is a round of JS optimisations at some point then this would be one of the obvious ones to do.
The currying helpers are a major factor as well.
A2, F2, etc.
Those two things would probably be simpler than modulo cons transformations and would probably affect more code.
I am guessing...
on the other hand stack overflows actually crash your program, slow currying does not
anyhow, I want Roc to have this; it's a fun bit of CS research with some nice benefits
even if we won't be using linked lists in roc a lot
Yeah absolutely! I think it's a good time for it in Roc.
In Elm there are other things that might be higher priority, because it only has one or two optimizations currently. Most of the 0.19 optimizations were in compiler speed rather than program speed.
@Brian Carroll yeah the optimization about always the same amount of fields is not in eol2 nor the compiler
A bit hard to implement in eol2 since we don't know what the original types are though...
but then neither does the current elm compiler
That's true, but it could change that. since eol2 can be run with only the JS, it can't rely on looking at the Elm code
I am super happy and excited about TRMC in Elm because it makes a lot of recursive functions simpler to write. Here are a few rewrites I did:
https://github.com/elm-community/list-extra/compare/master...jfmengels:new-tail-recursion?expand=1
But I don't know hoW applicable it would be to Roc if you don't use lists all that much.
well for us I think it clearly would help for trees
also putting functions in tail position in general (even without recursion) can have benefits
and I suspect that thinking a bit about memory locations in our IR can expose other optimizations down the line
Last updated: Jun 16 2026 at 16:19 UTC