Stream: ideas

Topic: Recursion


view this post on Zulip Jeroen Engels (Jan 08 2022 at 13:35):

Hey folks :wave:
I was wondering if Roc had tail recursion, and if so to what extent? Does it work the same way as in Elm, or are other patterns supported (like tail recursion modulo cons, etc.)?
(I'm currently supporting more cases for tail recursion for Elm with elm-optimize-level-2, so I was hoping we could exchange pointers :) )

view this post on Zulip Richard Feldman (Jan 08 2022 at 14:07):

I think @Folkert de Vries would know best!

view this post on Zulip Folkert de Vries (Jan 08 2022 at 14:09):

we currently do a pretty naive sort of tail recursion; calls to self in tail position are turned into a loop/jumps

view this post on Zulip Folkert de Vries (Jan 08 2022 at 14:09):

we use join points to represent these (see https://www.microsoft.com/en-us/research/wp-content/uploads/2016/11/join-points-pldi17.pdf)

view this post on Zulip Folkert de Vries (Jan 08 2022 at 14:10):

then, llvm has more general support for tail calls. These need not be recursive, but recursive ones also work. llvm inserts these as it sees fit

view this post on Zulip Folkert de Vries (Jan 08 2022 at 14:11):

I don't think I ever properly understood tail calls modulo cons. Always thought I was missing something about ocaml's implementation details to understand how that works. But you do it in elm's generated code too?

view this post on Zulip Folkert de Vries (Jan 08 2022 at 14:13):

the idea as I remember it was to transform this into something that takes one stack frame, right?

map f xs =
    case xs of
        Nil -> Nil
        Cons x xx -> Cons (f x) (map f xx)

view this post on Zulip Jeroen Engels (Jan 08 2022 at 14:36):

I don't think modulo cons is implemented in OCaml, I've seen proposals though. The way I see TRMC is basically that you can compute the data wrapper with a "hole", and the hole will be filled by a future iteration.

So if you have:

map : (a -> b) -> List a -> List b
map fn list =
  case list of
    [] -> []
    x :: xs -> fn x :: map fn xs

then you can (and I now do :sunglasses: ) transform this to:

var map = F2(
    function (fn, list) {
        var $start = _List_Cons(undefined, _List_Nil);
        var $end = $start;
        map:
        while (true) {
            if (!list.b) {
                return $start.b;
            } else {
                var x = list.a;
                var xs = list.b;
                $end.b = _List_Cons(fn(x), _List_Nil);
                $end = $end.b;
                list = xs;
                continue map;
            }
        }
    });

So we keep references to the start (start.b, start is just a wrapper to keep a reference) and the end of the list. At every iteration, we create a new list item that points to nothing, and replace the "next item" of the current end.

view this post on Zulip Jeroen Engels (Jan 08 2022 at 14:39):

Your example would be compiled to

var $something$Nil = {$: 0};
var $something$Cons = F2(
  function (a, b) {
    return {$: 1, a: a, b: b};
  });

var $something$map = function (pairs) {
  var $start = { b: null };
  var $end = $start;
  map:
  while (true) {
    if (!pairs.$) {
      $end.b = $something$Nil;
      return $start.b;
    } else {
      var x = pairs.a;
      var ps = pairs.b;
      $end.b = A2($something$Cons, f(y), null);
      $end = $end.b;
      pairs = ps;
      continue map;
    }
  }
};

view this post on Zulip Jeroen Engels (Jan 08 2022 at 14:40):

So here our hole is currently filled with null, and filled at the next iteration.

view this post on Zulip Folkert de Vries (Jan 08 2022 at 14:50):

well what we currently do is really this

map f xs =
    if xs == NULL then
        NULL

    else
        let ( x, xx ) = *xs;

        a1 = f x
        a2 = map f xx

        cell = malloc(..)

        cell.a1 = a1;
        cell.a2 = a2;

        cell

so we represent a linked list as a nullable pointer, NULL is the empty list, any non-null pointer is a Cons cell. I don't think LLVM is smart enough to make this into a loop today

view this post on Zulip Folkert de Vries (Jan 08 2022 at 14:51):

what this does do actually is re-use the memory if we determine that is safe, so this linked list map could run in constant space

view this post on Zulip Folkert de Vries (Jan 08 2022 at 14:51):

well, for the list, clearly not constant stack space, at least today

view this post on Zulip Jeroen Engels (Jan 08 2022 at 15:08):

Gotcha, that's pretty nice already, but yeah, this could probably be improved to be made into a loop I'm guessing.

view this post on Zulip Folkert de Vries (Jan 08 2022 at 15:09):

If I understand right we could do something like this (where &foo.1 in this case gets a pointer to the second field of the Cons cell)

map f list is
        when list is 
            Nil -> Nil
            Cons p ps ->
                result = Cons (f p) NULL

                join go = \xs, destination -> 
                    when xs is
                        Nil -> 
                            return result

                        Cons y yy ->
                            *destination = Cons (f y) NULL

                            jump go yy (&destination.1)

                in 
                    jump go ps (&result.1)

view this post on Zulip Folkert de Vries (Jan 08 2022 at 15:09):

so here I give the next iteration of the loop an adress to store its result in (if it has any)

view this post on Zulip Folkert de Vries (Jan 08 2022 at 15:12):

the question is then how this would interact with refcounting and our alias analysis

view this post on Zulip Folkert de Vries (Jan 08 2022 at 15:12):

and the reset-reuse mechanism that allows us to reuse stack space

view this post on Zulip Folkert de Vries (Jan 08 2022 at 15:13):

combining all of those sounds like an excellent master thesis project for someone

view this post on Zulip Jeroen Engels (Jan 08 2022 at 15:15):

Yes, I think that would work. You could maybe reduce some duplication, but that's the idea at least.

view this post on Zulip Folkert de Vries (Jan 08 2022 at 15:16):

well in the elm case you always return a heap-allocated object, even in the empty list case

view this post on Zulip Folkert de Vries (Jan 08 2022 at 15:16):

we don't

view this post on Zulip Folkert de Vries (Jan 08 2022 at 15:16):

so that's why the outer case is needed

view this post on Zulip Folkert de Vries (Jan 08 2022 at 15:17):

but, I'd imagine, the hard part is figuring out when you can perform this trick

view this post on Zulip Folkert de Vries (Jan 08 2022 at 15:17):

a tail call is relatively easy to recognize, but this seems quite a bit harder

view this post on Zulip Jeroen Engels (Jan 08 2022 at 15:17):

It's not that bad, but yes, it's a bit harder :grinning_face_with_smiling_eyes:

view this post on Zulip Jeroen Engels (Jan 08 2022 at 15:20):

So this recursion seems to work, but this is Roc code right? So you're doing manual tail recursion here. So my original question was, could the equivalent of

map : (a -> b) -> List a -> List b
map fn list =
  case list of
    [] -> []
    x :: xs -> fn x :: map fn xs

be transformed and compiled to tailrecursive call? Because this would super simple for users of the language (I mean, it's textbook recursion)

view this post on Zulip Folkert de Vries (Jan 08 2022 at 15:23):

well it could given the transformation we just discussed by sort of "forward-passing" the destination to write the result into. But we don't make any attempt to do that today

view this post on Zulip Folkert de Vries (Jan 08 2022 at 15:24):

we currently transform this into

map : (a -> b) -> List a -> List b
map fn list =
  case list of
    [] -> []
    x :: xs ->
        a1 = fn x
        a2 = map fn xs

        a1 :: a2

at which point there is no call in tail position, so we just don't touch the function

view this post on Zulip Folkert de Vries (Jan 08 2022 at 15:28):

so I guess you can see that the thing in tail position here is a cons, know that the function is recursive, and then do some further analysis to see if you can do the TC modulo cons transform?

view this post on Zulip Jeroen Engels (Jan 08 2022 at 15:30):

Yes exactly. In my case (but I'm analyzing the compiled JS code), I check whether the return value is a cons, and then if the second argument is a recursive call, then I apply the transformation (I don't need to check anything more).

view this post on Zulip Folkert de Vries (Jan 08 2022 at 16:12):

ah, so your transform only works for the builtin elm list?

view this post on Zulip Jeroen Engels (Jan 08 2022 at 19:09):

It has a special case, but I think that when I will have a more general implementation, it won't be special. For now I special case custom type constructors and lists, which are like most of what I need to cover. But if I want to handle nested recursion, the best way will be to analyze all the functions and see if they respond to a few rules that enable this optimisation, which will then enable a lot more cases.

view this post on Zulip Jeroen Engels (Jan 08 2022 at 19:13):

For now, I support:

view this post on Zulip Folkert de Vries (Jan 08 2022 at 19:34):

what makes + or * different from an arbitrary function call?

view this post on Zulip Jeroen Engels (Jan 08 2022 at 19:43):

They're operators/binary operations (at least in the compiled JS), so they need to be handled slightly differently. Also, they're associative and have a neutral value (so kind of similar to a "hole"), so we can compute their sum/product before continuing with the loop, and thee add/multiply them to the accumulated value once we reach a non-recursive return statement.

view this post on Zulip Folkert de Vries (Jan 08 2022 at 19:49):

ah I forgot you're dealing with the compiled output

view this post on Zulip Folkert de Vries (Jan 30 2022 at 13:26):

I read the ocaml paper and got a better understanding. "destination-passing style" made it click for me. That transform might be beneficial in non-recursive contexts too. Tail calls don't have the overhead of a function call (like making a new stack frame), at least when the system stack is used (this will probably be very different for JS or with WASM). With that, I think we could represent TRMC roughly as follows in our IR

map : LinkedList a, (a -> b) -> LinkedList b
map = \list, f ->
    when list is
        Nil ->
            return Nil;

        Cons x xs ->
            # basically, an uninitialized linkedlist cell
            let head = Alloc;

            join go = \first, rest, destination ->
                when rest is
                    Nil ->
                        destination @ Cons(f first, Nil);
                        return head;

                    Cons y ys ->
                        let next = Alloc;
                        destination @ Cons(f first, next);
                        jump go y ys next;
            in
                jump go x xs head;

We assume some novel Alloc construct that allocates space for a value, but leaves the memory uninitialized. We already have location @ Constructor syntax to put a constructor value at a particular memory location; we can reuse that here.

view this post on Zulip Richard Feldman (Jan 30 2022 at 13:41):

cc @Jeroen Engels if you're interested!

view this post on Zulip Jeroen Engels (Jan 30 2022 at 19:29):

@Folkert de Vries I made a PR for elm-optimize-level-2: https://github.com/mdgriffith/elm-optimize-level-2/pull/82

I didn't do the DPS style because that's less performant, but I wouldn't know what's best or most applicable in your use-case. Happy to discuss it though!

view this post on Zulip Folkert de Vries (Jan 30 2022 at 19:35):

with DPS style you mean introducing an extra function?

view this post on Zulip Folkert de Vries (Jan 30 2022 at 19:36):

5000 line PR, nice!

view this post on Zulip Jeroen Engels (Jan 30 2022 at 19:44):

Lots of tests and docs though! If you have any remarks on either, let me know. I want it all to be good to make it easier to add to the Elm compiler

view this post on Zulip Folkert de Vries (Jan 30 2022 at 19:48):

If I remember right the elm compiler already has a "looplike" construct (similar to join points?) to express tail recursion

view this post on Zulip Folkert de Vries (Jan 30 2022 at 19:48):

so you'd need to formulate the transform in terms of that

view this post on Zulip Folkert de Vries (Jan 30 2022 at 19:49):

aside: what do you do for trees? just pick the first opportunity for TRMC?

view this post on Zulip Folkert de Vries (Jan 30 2022 at 19:53):

The OCaml implementation chose to only support data construction, whereas we also support arithmetic operators,
string concatenation and list concatenation.

I'm assuming this is related to ocaml implementation details. We use LLVM which already gives us the benefits of tail calls in simple scenarios like arithmetic operations. I wouldn't be surprised if ocaml has something similar on a lower level.

view this post on Zulip Jeroen Engels (Jan 30 2022 at 20:44):

I don't think it has, I think developers have to write their recursive functions in a normal tail safe manner. I am not sure why they ended up only doing it for data constructors, and I don't know if they want to do more later.

view this post on Zulip Jeroen Engels (Jan 30 2022 at 20:45):

For trees, yeah, just take a single one. But surprisingly, this is one of the only cases where performance seems to get worse with TRMC. I still need to benchmark this properly but I don't know enough.

view this post on Zulip Jeroen Engels (Jan 30 2022 at 20:47):

For the Elm compiler and the loop like construct, do you mean in the output code, or in the compiler implementation?

view this post on Zulip Folkert de Vries (Jan 30 2022 at 21:23):

yeah, there is (in AST/Optimized.hs

  | Call Expr [Expr]
  | TailCall Name [(Name, Expr)]

view this post on Zulip Brian Carroll (Jan 30 2022 at 21:23):

The Elm compiler has a TailCall variant in the "Optimized" AST which is the last intermediate representation before code gen. Then in code gen that gets transformed into a JavaScript while loop. That AST node is specifically used for tail calls and can't really be repurposed for other forms of recursion, at least not easily.

view this post on Zulip Folkert de Vries (Jan 30 2022 at 21:24):

right

view this post on Zulip Folkert de Vries (Jan 30 2022 at 21:24):

join points are more powerful than that

view this post on Zulip Brian Carroll (Jan 30 2022 at 21:25):

It's quite different from Roc in that Roc has an imperative IR which you can do more stuff with

view this post on Zulip Brian Carroll (Jan 30 2022 at 21:25):

yeah exactly

view this post on Zulip Folkert de Vries (Jan 30 2022 at 21:25):

you could add join points to that data structure, the original paper adds them to one of GHC's intermediate languages

view this post on Zulip Folkert de Vries (Jan 30 2022 at 21:26):

which is still pure functional

view this post on Zulip Folkert de Vries (Jan 30 2022 at 21:26):

and order is given by data dependency

view this post on Zulip Brian Carroll (Jan 30 2022 at 21:28):

Yeah I only meant that Roc's IR can do more types of control flow without specific changes to the IR whereas Elm would need bigger changes

view this post on Zulip Folkert de Vries (Jan 30 2022 at 21:28):

what you'd still need is to reason about the destination somehow, within that AST. So I wonder what the minimal primitives are for that

view this post on Zulip Folkert de Vries (Jan 30 2022 at 21:29):

we already have the idea of "store a tag at a particular memory location" in our IR

view this post on Zulip Folkert de Vries (Jan 30 2022 at 21:29):

but not yet the "allocate space for a tag" idea

view this post on Zulip Brian Carroll (Jan 30 2022 at 21:29):

Yeah Elm doesn't have anything explicitly about allocation

view this post on Zulip Folkert de Vries (Jan 30 2022 at 21:31):

and I don't think you'd want to make it too explicit. What doesn't help is that in elm at this point you have lost type info

view this post on Zulip Brian Carroll (Jan 30 2022 at 21:32):

I think it would need some way of doing mutation. You'd need to make a Cons cell and then later change what its tail points to. So if you could do some kind of MutateField then you could initialise it to Unit or Nil and then replace it later.

view this post on Zulip Folkert de Vries (Jan 30 2022 at 21:32):

but what is a "cons cell" here? you don't know the type info, so you can't just make something out of thin air ?!

view this post on Zulip Brian Carroll (Jan 30 2022 at 21:32):

Yeah but you'd be inserting this node before the type info is dropped

view this post on Zulip Brian Carroll (Jan 30 2022 at 21:32):

The tail calls are detected at an earlier stage

view this post on Zulip Folkert de Vries (Jan 30 2022 at 21:33):

is the optimized ast generated before type checking ?

view this post on Zulip Brian Carroll (Jan 30 2022 at 21:33):

no, after type checking

view this post on Zulip Folkert de Vries (Jan 30 2022 at 21:33):

I thought there was no relation between the variables and the AST in the elm compiler?

view this post on Zulip Brian Carroll (Jan 30 2022 at 21:35):

Well you can tell that it's a Union type, you'd have to work in those terms rather than specifically for List

view this post on Zulip Folkert de Vries (Jan 30 2022 at 21:37):

ok, I just remember that it turned out to be fast to have the same representation for e.g. Cons and Nil nodes

view this post on Zulip Folkert de Vries (Jan 30 2022 at 21:37):

in JS with the jit and such

view this post on Zulip Folkert de Vries (Jan 30 2022 at 21:37):

so you'd want to "allocate" a value of the most optimal shape

view this post on Zulip Brian Carroll (Jan 30 2022 at 21:38):

Yeah. You'd have to detect the modulo cons pattern and replace it with some special node. Maybe detecting that pattern would be too hard to do without full type info. Because as you mentioned, Elm only checks that the types are valid and says "OK" or "error", it doesn't attach type info to any AST nodes.

view this post on Zulip Brian Carroll (Jan 30 2022 at 21:40):

Folkert de Vries said:

ok, I just remember that it turned out to be fast to have the same representation for e.g. Cons and Nil nodes

Yeah and that would also apply to any other Union. You want to find the maximum number of fields in any variant and then always use that number of fields for every variant, even if you're setting some of them to null or undefined.

view this post on Zulip Brian Carroll (Jan 30 2022 at 21:40):

Then the JS engine sees them as all being the same shape

view this post on Zulip Brian Carroll (Jan 30 2022 at 21:41):

So yeah if you were doing this it might be part of a whole set of changes like this... which is how Evan likes to work generally too!

view this post on Zulip Folkert de Vries (Jan 30 2022 at 21:42):

oh elm itself does not yet do that "object with max number of fields" optimization?

view this post on Zulip Brian Carroll (Jan 30 2022 at 21:42):

Not as far as I know. It certainly didn't in 0.19.1

view this post on Zulip Brian Carroll (Jan 30 2022 at 21:43):

And no releases since then, so no

view this post on Zulip Folkert de Vries (Jan 30 2022 at 21:43):

right

view this post on Zulip Brian Carroll (Jan 30 2022 at 21:43):

That was some analysis that Robin did a while back but it wasn't implemented

view this post on Zulip Brian Carroll (Jan 30 2022 at 21:44):

...yet?

view this post on Zulip Brian Carroll (Jan 30 2022 at 21:44):

(suspenseful music)

view this post on Zulip Folkert de Vries (Jan 30 2022 at 21:44):

right but that felt like a long time ago

view this post on Zulip Brian Carroll (Jan 30 2022 at 21:44):

yup

view this post on Zulip Folkert de Vries (Jan 30 2022 at 21:44):

yeah surely that must make it in at some point

view this post on Zulip Brian Carroll (Jan 30 2022 at 21:46):

Yeah if there is a round of JS optimisations at some point then this would be one of the obvious ones to do.

view this post on Zulip Brian Carroll (Jan 30 2022 at 21:46):

The currying helpers are a major factor as well.

view this post on Zulip Brian Carroll (Jan 30 2022 at 21:46):

A2, F2, etc.

view this post on Zulip Brian Carroll (Jan 30 2022 at 21:48):

Those two things would probably be simpler than modulo cons transformations and would probably affect more code.

view this post on Zulip Brian Carroll (Jan 30 2022 at 21:48):

I am guessing...

view this post on Zulip Folkert de Vries (Jan 30 2022 at 21:55):

on the other hand stack overflows actually crash your program, slow currying does not

view this post on Zulip Folkert de Vries (Jan 30 2022 at 21:56):

anyhow, I want Roc to have this; it's a fun bit of CS research with some nice benefits

view this post on Zulip Folkert de Vries (Jan 30 2022 at 21:56):

even if we won't be using linked lists in roc a lot

view this post on Zulip Brian Carroll (Jan 30 2022 at 22:17):

Yeah absolutely! I think it's a good time for it in Roc.
In Elm there are other things that might be higher priority, because it only has one or two optimizations currently. Most of the 0.19 optimizations were in compiler speed rather than program speed.

view this post on Zulip Jeroen Engels (Jan 30 2022 at 22:45):

@Brian Carroll yeah the optimization about always the same amount of fields is not in eol2 nor the compiler

view this post on Zulip Jeroen Engels (Jan 30 2022 at 22:46):

A bit hard to implement in eol2 since we don't know what the original types are though...

view this post on Zulip Folkert de Vries (Jan 30 2022 at 22:47):

but then neither does the current elm compiler

view this post on Zulip Jeroen Engels (Jan 30 2022 at 22:48):

That's true, but it could change that. since eol2 can be run with only the JS, it can't rely on looking at the Elm code

view this post on Zulip Jeroen Engels (Jan 30 2022 at 22:49):

I am super happy and excited about TRMC in Elm because it makes a lot of recursive functions simpler to write. Here are a few rewrites I did:

https://github.com/jfmengels/core/compare/2fa34772a2575d036c0871b4390379741e6f5f91...new-tail-recursion

https://github.com/elm-community/list-extra/compare/master...jfmengels:new-tail-recursion?expand=1

view this post on Zulip Jeroen Engels (Jan 30 2022 at 22:49):

But I don't know hoW applicable it would be to Roc if you don't use lists all that much.

view this post on Zulip Folkert de Vries (Jan 30 2022 at 22:50):

well for us I think it clearly would help for trees

view this post on Zulip Folkert de Vries (Jan 30 2022 at 22:51):

also putting functions in tail position in general (even without recursion) can have benefits

view this post on Zulip Folkert de Vries (Jan 30 2022 at 22:52):

and I suspect that thinking a bit about memory locations in our IR can expose other optimizations down the line


Last updated: Jun 16 2026 at 16:19 UTC