Stream: ideas

Topic: allocator as state machine entry


view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 02:40):

As a spin-off of the discussion in #ideas > Purity inference proposal v3 and #ideas>`crash` as state machine entry, I think that instead of having the platform pass in explicit allocators, we should consider just making allocation part of the effect state machine.

The platform still has the same amount of control. We no longer have to deal with passing around an extra data structure to all functions that might allocate.

Benefits: Simplicity and consistency.
Cost: Suspension of intermediate state at each allocation.


This feels like something we should consider in general. It may be too heavy of a perf cost to be worthwhile. It really depends on how fast suspension can be (assuming we reuse the same buffer and minimize data movement, it may not be a big deal). If allocations are in the hot loop, perf is probably already rough, and this could push it over the edge. So it may not be worth it.

I think this would mainly be a huge boon in terms of simplicity and maintainability. But it may not be the right tradeoff for perf.

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 02:43):

To clarify, the platform has the same amount of control because each time it calls into roc, it is starting a state machine execution. For each of those calls it can use a different state machine with its own arena allocator or whatever other allocator design.

view this post on Zulip Sam Mohr (Sep 18 2024 at 02:48):

This would be great! We eventually want to reduce the platform authoring complexity to the point that developers feel they can implement their own platforms. I think if this was added, plus more glue being added, we could end up doing 90% of the scaffolding work for users.

view this post on Zulip Luke Boswell (Sep 18 2024 at 03:04):

we should consider just having the allocation as part of the effect state machine

Can someone explain what this means? Do we still have a roc_alloc that is called by roc?

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 03:07):

Eventual future with Effect interpreters.

The platform roughly sees a single tag union:

Op: [
    StdoutLine Str \{} -> Op,
    StdinLine \Str -> Op,
]

I am suggesting that we automatically add:

    RocAlloc U64 U32 \Ptr -> Op
    RocDealloc Ptr \{} -> Op
    RocEtc ...

The effect interpreter implemented on the host would just handle this as part of a big switch statement.
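
A rough sketch of what that host-side dispatch could look like, modelled in Python for brevity (a real host would more likely be written in Rust, Zig, or C). The tag names follow the message above. The `Done` terminal, the `Arena` bump allocator, the `program` example, and representing `Ptr` as an offset into a byte buffer are all assumptions made for illustration, not part of any Roc API.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical model of the Op tag union. Each tag carries its payload
# plus a continuation that yields the next Op.
@dataclass
class StdoutLine:
    line: str
    cont: Callable


@dataclass
class StdinLine:
    cont: Callable


@dataclass
class RocAlloc:
    size: int        # U64 in the message above
    alignment: int   # U32 in the message above
    cont: Callable   # receives the Ptr


@dataclass
class RocDealloc:
    ptr: int
    cont: Callable


@dataclass
class Done:  # assumed terminal state, not in the original union
    pass


class Arena:
    """Bump allocator over a byte buffer; a 'pointer' is just an offset."""

    def __init__(self, capacity):
        self.buf = bytearray(capacity)
        self.top = 0

    def alloc(self, size, alignment):
        start = (self.top + alignment - 1) // alignment * alignment
        if start + size > len(self.buf):
            raise MemoryError("arena exhausted")
        self.top = start + size
        return start

    def dealloc(self, ptr):
        pass  # an arena frees everything at once when it is dropped


def run(op, arena, stdin_lines, stdout):
    """The 'big switch': handle one Op, then resume the state machine."""
    lines = iter(stdin_lines)
    while not isinstance(op, Done):
        if isinstance(op, StdoutLine):
            stdout.append(op.line)
            op = op.cont()
        elif isinstance(op, StdinLine):
            op = op.cont(next(lines))
        elif isinstance(op, RocAlloc):
            op = op.cont(arena.alloc(op.size, op.alignment))
        elif isinstance(op, RocDealloc):
            arena.dealloc(op.ptr)
            op = op.cont()
        else:
            raise TypeError(f"unknown op: {op!r}")


def program():
    """Stand-in for compiled Roc code: allocate, read, print, free."""
    return RocAlloc(16, 8, lambda ptr:
        StdinLine(lambda name:
            StdoutLine(f"hello {name} @ {ptr}", lambda:
                RocDealloc(ptr, lambda: Done()))))
```

Since `run` takes the arena as an argument, the host in this sketch can pick a different allocator for each call into roc, which is the kind of control described earlier in the thread.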

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 03:08):

The alternative (and current plan) is explicit allocators. Where the platform creates an allocator struct and passes that into every single call to roc. Roc would implicitly thread that behind the scenes to all functions that need to allocate. The struct would be a function pointer to roc_alloc, roc_dealloc, etc. It would also be a pointer to arbitrary data (void*) that contains any allocator state.
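
For comparison, a minimal sketch of the explicit-allocator shape described above, again modelled in Python. The `RocAllocator` class, the bump-allocator helpers, and the `make_list` / `roc_main` functions are hypothetical stand-ins: the callables play the role of the function pointers and `state` plays the role of the `void*`.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class RocAllocator:
    roc_alloc: Callable[[Any, int, int], int]   # (state, size, alignment) -> ptr
    roc_dealloc: Callable[[Any, int], None]     # (state, ptr) -> None
    state: Any                                  # arbitrary allocator state


def bump_alloc(state, size, alignment):
    # Round the current top up to the requested alignment, then bump.
    start = (state["top"] + alignment - 1) // alignment * alignment
    state["top"] = start + size
    state["live"] += 1
    return start


def bump_dealloc(state, ptr):
    state["live"] -= 1


# Stand-ins for compiled Roc functions. Every function that might
# allocate receives the allocator, which is the threading cost the
# state machine approach would avoid.
def make_list(allocator, length):
    return allocator.roc_alloc(allocator.state, length * 8, 8)


def roc_main(allocator, n):
    ptrs = [make_list(allocator, n) for _ in range(2)]
    for p in ptrs:
        allocator.roc_dealloc(allocator.state, p)
    return ptrs
```

In Roc the threading would be implicit, but the extra parameter would still exist in the generated code for every function that can allocate.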

view this post on Zulip Luke Boswell (Sep 18 2024 at 03:16):

This is a really neat idea!

view this post on Zulip Luke Boswell (Sep 18 2024 at 03:17):

Yeah, definitely want good glue for tag unions with payloads, but this would simplify platform development a bit I think.

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 03:18):

definitely want good glue for tag unions with payload

Yeah, this definitely will be required for effect interpreter in general.

view this post on Zulip Sam Mohr (Sep 18 2024 at 03:19):

Looking through the basic-cli host lib, we still have a few functions like roc_dbg and roc_panic that aren't yet proposed for addition to the state machine. Would we want to add them as well, or is there anything that shouldn't/couldn't be supported?

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 03:21):

Debug (and expect_failed when we add it) do printing and are slower anyway, so they should be fine to be part of the state machine.

Allocations are probably fine.

Panic (and crash) are probably fine if we implement exception handling for pure function hot loops with crash in them.

view this post on Zulip Luke Boswell (Sep 18 2024 at 03:32):

@Brendan Hansknecht fine here means we can support them right?

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 04:00):

We could make them all use the state machine.

There are zero concerns with making debug and expect_failed use the state machine.

There are minor concerns around perf for the allocation functions, but my guess is they probably would not be an issue if using the state machine.

With crash, there are extra complexities to handle around hot loops in pure functions that call crash (those absolutely 100% need to stay fast and have essentially zero cost except if the crash is hit). That is the discussion happening in #ideas>`crash` as state machine entry and I think we have a path forward to making it work fine.

view this post on Zulip Richard Feldman (Sep 18 2024 at 04:05):

I think a pretty interesting argument for this direction is that there's both a perf benefit and an implementation simplification benefit to not having to thread through any kind of struct to say what all the allocators are etc

view this post on Zulip Richard Feldman (Sep 18 2024 at 04:06):

like if we can get rid of all of them and the state machine is all there is

view this post on Zulip Richard Feldman (Sep 18 2024 at 04:07):

so a potential thing we could try is switching over to just state machine on current platforms, and just see how the perf is in practice

view this post on Zulip Richard Feldman (Sep 18 2024 at 04:07):

compared to status quo

view this post on Zulip Richard Feldman (Sep 18 2024 at 04:09):

and if we're unhappy with that perf in practice, then we'd have a clear motivation to implement the other design

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 04:11):

Yeah. I think that will be worth testing. My guess is that it will be noticeably worse unless we do some smart optimizations around layout and minimizing copying of data captured in the state machine. Like the tag layout algorithm today focuses on minimizing memory usage. For the state machine, we really want to minimize data movement more than memory use (so a lifetime-based API of some sort). Like if I see:

someTask! {}
x = 3
someTask! {}

I would hope that 3 is just stuck somewhere in the state machine without any other data movement. In current roc, placing the 3 in the state machine capture could lead to a shift of all data in the capture.

Aside: obviously in a perfect world, we would actually never capture the 3 in this case. We would just move the definition to where it needs to be used.

view this post on Zulip Richard Feldman (Sep 18 2024 at 04:13):

yeah I also would anticipate that it would be noticeably worse

view this post on Zulip Richard Feldman (Sep 18 2024 at 04:14):

but I do wonder if there are any interesting platform designs that would otherwise be impossible

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 04:14):

Yeah, I think this can be pretty easily seen by the amount of data movement generated by the captures in wasm4. They actually still run pretty darn fast, but they add up to a crazy amount of code bloat. Most code in a wasm4 app is spent moving data from one closure capture to the next.

view this post on Zulip Richard Feldman (Sep 18 2024 at 04:15):

hm yeah

view this post on Zulip Richard Feldman (Sep 18 2024 at 04:15):

oh yeah replacing all of those with direct ffi-style calls would be epic :big_smile:

view this post on Zulip Richard Feldman (Sep 18 2024 at 04:16):

bc they don't want to be async anyway, right?

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 04:16):

Yeah, though in most cases the capture is just slowly growing, so preallocating one giant capture would also fix the issue pretty well (obviously not as well as just making everything sync).
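
A toy model of that difference, in Python. It only counts bytes copied and is not how Roc lays out captures: it assumes every captured value is 8 bytes and that a fresh capture copies all earlier fields before appending the new one. Both function names are made up for this sketch.

```python
import struct


def rebuild_each_step(values):
    """Build a new capture at every step: copy the old fields, then append."""
    capture = b""
    moved = 0
    for v in values:
        new = bytearray(capture)        # copies every previously captured byte
        moved += len(capture)
        new += struct.pack("<q", v)     # store the newly captured value
        moved += 8
        capture = bytes(new)
    return capture, moved


def preallocated_slots(values):
    """Allocate one capture up front; each value gets a fixed slot."""
    capture = bytearray(8 * len(values))
    moved = 0
    for i, v in enumerate(values):
        struct.pack_into("<q", capture, 8 * i, v)  # single store, no shifting
        moved += 8
    return bytes(capture), moved
```

With three captured values, the rebuilding version moves 48 bytes and the preallocated version moves 24, and the gap grows quadratically as the capture grows.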

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 04:18):

So with reusing the same capture allocation and laying out to minimize data movement, I think it would be very very close to the sync code. But definitely still have extra overhead.

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 04:18):

Not sure what the percentages would actually be. Very very small would be my guess.

view this post on Zulip Richard Feldman (Sep 18 2024 at 04:19):

:thinking: is there some way we could facilitate reusing the same capture allocation for async effects?

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 04:20):

I don't see why not. I guess we would need to refcount the capture allocation as a whole to make sure the host isn't holding on to an extra copy.

view this post on Zulip Richard Feldman (Sep 18 2024 at 04:23):

eh we could just tell the host not to

view this post on Zulip Richard Feldman (Sep 18 2024 at 04:23):

there are plenty of host rules around "don't do this or else UB"

view this post on Zulip Oskar Hahn (Sep 20 2024 at 07:26):

What about platforms that do not want to use tasks?

Currently, it is relatively easy to write a platform where the roc function takes some argument and returns something without doing effects, like main : Str -> {part1: Str, part2: Str} for Advent of Code. At the moment, calling roc from the host is like calling any other function; you just have to transform the memory.

Would this proposal (or the effect interpreter proposal in general) mean that this is no longer possible and you always have to handle the state machine?

In any case, I like this proposal. I think it will make interesting stuff easy, like arena allocators or zero-allocation calls. In Go, I currently have to use C.malloc to make the garbage collector happy. With this proposal, it should be possible to use a native Go []byte for the roc memory.

view this post on Zulip Brendan Hansknecht (Sep 20 2024 at 12:28):

Oskar Hahn said:

Would this proposal (or the effect interpreter proposal in general) mean that this is no longer possible and you always have to handle the state machine?

Yes, but I'm pretty sure that Str -> { part1: Str, part2: Str } would need to use the effect interpreter anyway, because it will be needed for allocations and such.

view this post on Zulip Brendan Hansknecht (Sep 20 2024 at 12:28):

Though hopefully it has more consistent glue generation and other simplifications


Last updated: Jun 16 2026 at 16:19 UTC