At some point we need to really investigate closure data copying in Roc, especially to compare it to either a manual implementation in C/Zig or to Rust async (or similar).
I am still kinda stunned that in an example like rocci-bird, essentially all of the code size seems to come from simply copying data from one closure to the next. Which probably also means that a significant portion of the runtime is just copying data between closures.
It fundamentally feels off to me. Maybe we are simply generating code in a way that LLVM can't optimize. Maybe we need to unify our closure captures differently to reduce data movement. I'm really not sure, but this really feels off to me.
I wonder how hard it would be to start passing them around by reference if they're over a certain size
I don't think that's the issue.
I think the issue is that essentially every new await captures slightly different data (because of new local variables). Instead of having space and just adding the single new variable to the closure capture, we create a totally new closure capture with a slightly different layout.
So like:
```roc
state = SomeBigRecord
# captures {state}
clickedMouse = getMouse!
# captures {state, clickedMouse}
pressedW = getKey! 'w'
activateAbility = calc state pressedW
if clickedMouse && activateAbility then
    # captures {state, activateAbility}
    changeColorPalette!
nextState =
    if activateAbility then
        # something that uses state ...
    else
        # something that uses state ...
# captures {nextState}
displayState! nextState
Task.ok nextState
```
Optimally, this would be one shared closure capture. It would have enough space for everything and state would never move around.
Instead we make 4 unique closure captures: tons of copying and data movement.
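To make the copying concrete, here is a rough Rust sketch of the situation described above (all struct and field names are made up for illustration; this is not Roc's actual codegen): four slightly different capture layouts, one per await point, so the big record gets copied at every step.

```rust
// 128-byte record standing in for SomeBigRecord.
#[derive(Clone, Copy, PartialEq, Debug)]
struct BigState {
    data: [u64; 16],
}

// One capture struct per await point, each with a slightly different layout.
struct Cap1 { state: BigState }
struct Cap2 { state: BigState, clicked_mouse: bool }
struct Cap3 { state: BigState, activate_ability: bool }
struct Cap4 { next_state: BigState }

// Threading `state` through the chain copies the full record every time.
fn run(cap1: Cap1) -> Cap4 {
    let cap2 = Cap2 { state: cap1.state, clicked_mouse: true }; // copy #1
    let cap3 = Cap3 { state: cap2.state, activate_ability: cap2.clicked_mouse }; // copy #2
    Cap4 { next_state: cap3.state } // copy #3
}

fn main() {
    let s = BigState { data: [7; 16] };
    let out = run(Cap1 { state: s });
    // Three full copies of a 128-byte record just to carry it forward.
    assert_eq!(out.next_state, s);
}
```

If LLVM fails to collapse these moves, both code size and runtime go into `memcpy`-like work that a single shared layout would avoid entirely.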
ahhhh interesting
not saying we should do this, but if we had nested closures capture strictly more as they got more and more nested, then:
and that would mainly affect tasks, parsers, etc.
Due to each closure capture being its own alloca and everything being nested, I think we also have all 4 closure captures alive for a really long time (so much more memory use than sharing, even in the naive way).
sure
but some amount of this is inherent to async I/O
like the state in between operations has to be saved somewhere, and that either results in a big chunk of memory that lives a long time, or else copying between minimally-sized chunks of memory
I guess there could be some cleverness to try to reduce that, e.g. choosing layouts where values that refer to the same thing will be in the same place from one capture to the next, so if we're able to reuse that memory then we don't need to copy those particular fields because they're already in the right place
I don't know how hard of a problem that is to automate (seems like the type of thing where there's an off-the-shelf algorithm somewhere which optimizes it) but it could be a helpful exercise to try to find a manual organization of the rocci bird captures that would minimize copying and memory usage
like if you were doing all of it by hand in Zig or something, what's the most efficient layout and copying strategy you could come up with
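As a sketch of what that hand-written version might look like (in Rust rather than Zig, with hypothetical names): one long-lived frame whose field offsets never change across await points, so each step writes only its new fields and the big record never moves.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
struct BigState {
    data: [u64; 16], // 128 bytes standing in for SomeBigRecord
}

// A single frame with room for everything the whole chain will ever capture.
struct Frame {
    state: BigState,        // written once, stays at a fixed offset
    clicked_mouse: bool,    // filled in after the first await
    activate_ability: bool, // filled in after the second await
}

fn after_get_mouse(f: &mut Frame, clicked: bool) {
    f.clicked_mouse = clicked; // 1-byte write; `state` is untouched
}

fn after_get_key(f: &mut Frame, pressed_w: bool) {
    // Reads `state` in place, writes only the small new field.
    f.activate_ability = f.state.data[0] > 0 && pressed_w;
}

fn main() {
    let mut f = Frame {
        state: BigState { data: [3; 16] },
        clicked_mouse: false,
        activate_ability: false,
    };
    let before = f.state;
    after_get_mouse(&mut f, true);
    after_get_key(&mut f, true);
    assert_eq!(f.state, before); // the big record never moved or copied
    assert!(f.clicked_mouse && f.activate_ability);
}
```

The trade-off is exactly the one mentioned above: the frame is as large as the union of all captures and lives for the whole chain, but in exchange there is essentially zero copying.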
For sure, some of this is inherent. That said, I think we currently use more memory overall and do a bunch of copying.
But yeah, let me try to port it to Zig while still using an async IO style with capturing lambdas. Not sure when I will have time, but it should be a really good exercise in what we could theoretically generate.
As a note, I think the "Layout variants" section of this post on Rust async/await is the optimization we are fundamentally missing:
https://docs.roblab.la/asyncawait/posts/optimizing-await-1/
ahh yes!
that's exactly the optimization I was talking about
with this part:
Richard Feldman said:
> I guess there could be some cleverness to try to reduce that, e.g. choosing layouts where values that refer to the same thing will be in the same place from one capture to the next, so if we're able to reuse that memory then we don't need to copy those particular fields because they're already in the right place
so I guess not only is it optimizable, but rustc already implemented it! :laughing:
Yeah, in the Roc context it would mean collapsing the state of nested closures that tail-call each other, which is most often seen in Task.await, but also Result.try and similar.
In current Roc, LLVM often manages to inline these chains, but it still creates a state for every single closure and copies around tons of data.
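Loosely, the "variant" layout idea from that post looks like this in Rust (names invented for illustration): capture states that are never live at the same time overlap in memory, like enum variants, instead of each getting its own alloca.

```rust
#[derive(Clone, Copy)]
struct BigState {
    data: [u64; 16], // 128 bytes
}

// One overlapped state machine instead of three separate capture structs.
// Only one variant is live at a time, so they can share storage.
enum StepState {
    AwaitingMouse { state: BigState },
    AwaitingKey { state: BigState, clicked_mouse: bool },
    Done { next_state: BigState },
}

fn main() {
    // The enum is roughly the size of its largest variant (plus a tag),
    // not the sum of all three variants' payloads.
    assert!(std::mem::size_of::<StepState>() < 3 * std::mem::size_of::<BigState>());
    let _start = StepState::AwaitingMouse { state: BigState { data: [0; 16] } };
}
```

If the compiler additionally places `state` at the same offset in every variant, transitioning between steps only writes the tag and the new small fields, which is the "same value, same place" property discussed above.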
I learned some new things about this. First of all, the original post is https://tmandry.gitlab.io/blog/posts/optimizing-await-1/ and it has some follow-up posts.
Also, this problem is still not fully solved in the rust compiler. The latest attempt is https://github.com/rust-lang/rust/pull/120168 which seems like it's another step in the right direction, but notes that there are still cases that are not covered.
what's upvar?
not much how bout u
Upvar is a variable captured by a closure
There are a couple ideas to resolve this that would likely work well.
One is to more aggressively specialize unique lambdas to the same type. Part of the issue is that lambda sets monomorphize very aggressively, so two closures may have similar (but not the same) layouts and be forced into different types. This can be seen, for example, with nested Task.await calls, where each nested Task.await is a separate closure. It may be better to force all nested calls to be the same closure type.
Another is to store closures that are additive as a linked list. If you indicate in the closure type the name of a symbol that the closure references (which is already done, at least at the type-checking level), you can avoid unpacking and repacking closure sets by instead representing each closure set as a linked list of records. When you need to add new data, append to the linked list. This used to be an approach for closure compilation, though I don't know how popular it is today.
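A minimal sketch of that linked-list idea (in Rust, with invented names): each new capture is a small record pointing at the previous one, so adding a variable appends a node instead of unpacking and repacking the whole capture.

```rust
use std::rc::Rc;

// Each capture extension is a cons cell; earlier captures are shared
// by reference rather than copied into a new flat layout.
enum Capture {
    Nil,
    Cons {
        name: &'static str,
        value: u64,
        rest: Rc<Capture>,
    },
}

// Adding a variable is O(1): allocate one node, point at the old list.
fn extend(rest: Rc<Capture>, name: &'static str, value: u64) -> Rc<Capture> {
    Rc::new(Capture::Cons { name, value, rest })
}

// Lookup walks the list from newest to oldest capture.
fn lookup(cap: &Capture, want: &str) -> Option<u64> {
    match cap {
        Capture::Nil => None,
        Capture::Cons { name, value, rest } => {
            if *name == want { Some(*value) } else { lookup(rest, want) }
        }
    }
}

fn main() {
    let c0 = Rc::new(Capture::Nil);
    let c1 = extend(c0, "state", 42);
    let c2 = extend(c1.clone(), "clickedMouse", 1); // c1 is shared, not copied
    assert_eq!(lookup(&c2, "state"), Some(42));
    assert_eq!(lookup(&c1, "clickedMouse"), None);
}
```

The trade-off is indirection on every capture access instead of copying on every capture creation, which is why flat layouts won out in most modern compilers.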
Yeah, I think the nested closures being the same type would be a huge win and fix this issue for the most part.
Last updated: Jul 06 2025 at 12:14 UTC