allocator as state machine entry · ideas

Brendan Hansknecht (Sep 18 2024 at 02:40):

The platform still has the same amount of control. We no longer have to deal with passing around an extra data structure to all functions that might allocate.

Benefits: Simplicity and consistency.
Cost: Suspension of intermediate state at each allocation.

This feels like something we should consider in general. The perf cost may be too heavy to be worthwhile. It really depends on how fast suspension can be (if we reuse the same buffer and minimize data movement, it may not be a big deal). If allocations are in the hot loop, perf is probably already rough, and this could push it over the edge. So it may not be worth it.

I think this would mainly be a huge boon in terms of simplicity and maintainability. But it may not be the right tradeoff for perf.

Brendan Hansknecht (Sep 18 2024 at 02:43):

To clarify, the platform has the same amount of control because each time it calls into roc, it is starting a state machine execution. For each of those calls it can use a different state machine with its own arena allocator or whatever other allocator design.

Sam Mohr (Sep 18 2024 at 02:48):

This would be great! We eventually want to reduce the platform authoring complexity to the point that developers feel they can implement their own platforms. I think if this was added, plus more glue being added, we could end up doing 90% of the scaffolding work for users.

Luke Boswell (Sep 18 2024 at 03:04):

Can someone explain what this means? Do we still have a roc_alloc that is called by roc?

Brendan Hansknecht (Sep 18 2024 at 03:07):

Op : [
    StdoutLine Str ({} -> Op),
    StdinLine (Str -> Op),
]

    RocAlloc U64 U32 (Ptr -> Op),
    RocDealloc Ptr ({} -> Op),
    RocEtc ...

The effect interpreter implemented on the host would just handle this as part of a big switch statement.
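A minimal sketch of what that switch could look like on a Rust host, under loud assumptions: the `Op` enum, the boxed-closure continuations, the `Arena` type, and the use of buffer offsets instead of raw pointers are all hypothetical stand-ins, not Roc's actual glue or ABI. It only illustrates the shape of the idea: allocation requests arrive as ordinary ops, and the host answers them from whatever allocator it picked for this run.

```rust
// Hypothetical host-side mirror of the Op union. Continuations are modeled as
// boxed closures; real generated glue would look different.
enum Op {
    StdoutLine(String, Box<dyn FnOnce() -> Op>),
    RocAlloc { size: usize, align: usize, resume: Box<dyn FnOnce(usize) -> Op> },
    RocDealloc { offset: usize, resume: Box<dyn FnOnce() -> Op> },
    Done,
}

// A bump arena over a single buffer. "Pointers" are offsets into the buffer,
// which keeps the sketch free of unsafe code.
struct Arena {
    buf: Vec<u8>,
    next: usize,
    live: usize,
}

impl Arena {
    fn with_capacity(cap: usize) -> Self {
        Arena { buf: vec![0; cap], next: 0, live: 0 }
    }

    fn alloc(&mut self, size: usize, align: usize) -> usize {
        assert!(align.is_power_of_two());
        // Round the bump pointer up to the requested alignment.
        let start = (self.next + align - 1) & !(align - 1);
        assert!(start + size <= self.buf.len(), "arena exhausted");
        self.next = start + size;
        self.live += 1;
        start
    }

    fn dealloc(&mut self, _offset: usize) {
        // Individual frees are no-ops; rewind once nothing is live.
        self.live -= 1;
        if self.live == 0 {
            self.next = 0;
        }
    }
}

// The "big switch": one loop that services every op, allocation included.
fn run(mut op: Op, arena: &mut Arena, out: &mut Vec<String>) {
    loop {
        op = match op {
            Op::StdoutLine(line, resume) => {
                out.push(line);
                resume()
            }
            Op::RocAlloc { size, align, resume } => {
                let offset = arena.alloc(size, align);
                resume(offset)
            }
            Op::RocDealloc { offset, resume } => {
                arena.dealloc(offset);
                resume()
            }
            Op::Done => return,
        };
    }
}

// Stand-in for a Roc program: allocate, print, free, finish.
fn demo() -> (Vec<String>, usize) {
    let program = Op::RocAlloc {
        size: 16,
        align: 8,
        resume: Box::new(|offset: usize| {
            Op::StdoutLine(
                format!("allocated at offset {}", offset),
                Box::new(move || Op::RocDealloc { offset, resume: Box::new(|| Op::Done) }),
            )
        }),
    };
    // Each call into the program can get its own arena.
    let mut arena = Arena::with_capacity(64);
    let mut out = Vec::new();
    run(program, &mut arena, &mut out);
    (out, arena.live)
}
```

Because the arena is just a local passed to `run`, a host could hand each state machine execution a different allocator without the program being aware of it.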

Brendan Hansknecht (Sep 18 2024 at 03:08):

The alternative (and current plan) is explicit allocators, where the platform creates an allocator struct and passes it into every single call to roc. Roc would implicitly thread that behind the scenes to all functions that need to allocate. The struct would hold function pointers to roc_alloc, roc_dealloc, etc., plus a pointer to arbitrary data (void*) that contains any allocator state.
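For comparison, a rough sketch of the explicit-allocator shape described here, written as Rust host code. The struct name `RocAllocator`, its field names, and the exact function signatures (including passing `size` to the dealloc function) are assumptions made for illustration; the thread only says the struct carries function pointers plus a `void*` of allocator state.

```rust
use std::alloc::Layout;
use std::ffi::c_void;

// Hypothetical layout of the allocator struct passed into every call.
#[repr(C)]
struct RocAllocator {
    alloc: unsafe extern "C" fn(ctx: *mut c_void, size: usize, align: u32) -> *mut c_void,
    dealloc: unsafe extern "C" fn(ctx: *mut c_void, ptr: *mut c_void, size: usize, align: u32),
    // Arbitrary allocator state owned by the host (the void* from the description).
    ctx: *mut c_void,
}

// Example allocator state: counts live and total allocations.
struct Stats {
    live: usize,
    total: usize,
}

unsafe extern "C" fn counting_alloc(ctx: *mut c_void, size: usize, align: u32) -> *mut c_void {
    // Zero-sized requests are not handled in this sketch.
    assert!(size > 0);
    let layout = Layout::from_size_align(size, align as usize).expect("invalid layout");
    unsafe {
        let stats = &mut *(ctx as *mut Stats);
        stats.live += 1;
        stats.total += 1;
        std::alloc::alloc(layout) as *mut c_void
    }
}

unsafe extern "C" fn counting_dealloc(ctx: *mut c_void, ptr: *mut c_void, size: usize, align: u32) {
    let layout = Layout::from_size_align(size, align as usize).expect("invalid layout");
    unsafe {
        (*(ctx as *mut Stats)).live -= 1;
        std::alloc::dealloc(ptr as *mut u8, layout);
    }
}

// Stand-in for compiled Roc code: anything that may allocate receives the struct.
fn roc_entry(allocator: &RocAllocator) -> bool {
    unsafe {
        let p = (allocator.alloc)(allocator.ctx, 32, 8);
        let ok = !p.is_null();
        if ok {
            (allocator.dealloc)(allocator.ctx, p, 32, 8);
        }
        ok
    }
}

fn demo() -> (bool, usize, usize) {
    let mut stats = Stats { live: 0, total: 0 };
    let allocator = RocAllocator {
        alloc: counting_alloc,
        dealloc: counting_dealloc,
        ctx: &mut stats as *mut Stats as *mut c_void,
    };
    let ok = roc_entry(&allocator);
    (ok, stats.live, stats.total)
}
```

The cost this design pays is visible in `roc_entry`: the struct has to reach every function that might allocate, which is the threading the state machine approach would avoid.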

Luke Boswell (Sep 18 2024 at 03:16):

Luke Boswell (Sep 18 2024 at 03:17):

Yeah, definitely want good glue for tag unions with payloads, but this would simplify platform development a bit I think.

Brendan Hansknecht (Sep 18 2024 at 03:18):

Sam Mohr (Sep 18 2024 at 03:19):

Looking through the basic-cli host lib, we still have a few functions like roc_dbg and roc_panic that aren't yet proposed for addition to the state machine. Would we want to add them as well, or is there anything that shouldn't/couldn't be supported?

Brendan Hansknecht (Sep 18 2024 at 03:21):

Debug (and expect_failed when we add it) do printing and are slower, so they should be fine as part of the state machine.

Panic (and crash) are probably fine if we implement exception handling for pure function hot loops with crash in them.

Luke Boswell (Sep 18 2024 at 03:32):

Brendan Hansknecht (Sep 18 2024 at 04:00):

There are zero concerns with making debug and expect_failed use the state machine.

There are minor concerns around perf for the allocation functions, but my guess is they probably would not be an issue if using the state machine.

With crash, there are extra complexities to handle around hot loops in pure functions that call crash (those absolutely 100% need to stay fast and have essentially zero cost unless the crash is hit). That is the discussion happening in #ideas>`crash` as state machine entry and I think we have a path forward to making it work fine.

Richard Feldman (Sep 18 2024 at 04:05):

I think a pretty interesting argument for this direction is that there's both a perf benefit and an implementation simplification benefit to not having to thread through any kind of struct to say what all the allocators are etc

Richard Feldman (Sep 18 2024 at 04:06):

Richard Feldman (Sep 18 2024 at 04:07):

so a potential thing we could try is switching over to just state machine on current platforms, and just see how the perf is in practice

Richard Feldman (Sep 18 2024 at 04:07):

Richard Feldman (Sep 18 2024 at 04:09):

and if we're unhappy with that perf in practice, then we'd have a clear motivation to implement the other design

Brendan Hansknecht (Sep 18 2024 at 04:11):

Yeah. I think that will be worth testing. My guess is that it will be noticeably worse unless we do some smart optimizations around layout and minimizing copying of data captured in the state machine. Like the tag layout algorithm today focuses on minimizing memory usage. For the state machine, we really want to minimize data movement more than memory use (so a lifetime-based API of some sort). Like if I see:

someTask! {}
x = 3
someTask! {}

I would hope that 3 is just stuck somewhere in the state machine without any other data movement. In current roc, placing the 3 in the state machine capture could lead to a shift of all data in the capture.

Aside: obviously in a perfect world, we would actually never capture the 3 in this case. We would just move the definition to where it needs to be used.
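One way to picture the "stuck somewhere in the state machine" layout is a capture with a fixed slot per value that lives across a suspension, so resuming writes one field and nothing else moves. The sketch below is a hand-written Rust model of the three-line example above; `Machine`, `State`, and `step` are hypothetical names, and this is not how the Roc compiler lays out captures today.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Start,
    AwaitingFirst,
    AwaitingSecond,
    Finished,
}

// One preallocated capture for the whole run. `x` has a fixed slot from the
// start, so storing it never shifts any other captured data.
struct Machine {
    state: State,
    x: i64,
}

// Advances to the next suspension point. Returns true while an effect
// (the someTask! call) is pending for the host to perform.
fn step(m: &mut Machine) -> bool {
    match m.state {
        State::Start => {
            // First someTask! {}
            m.state = State::AwaitingFirst;
            true
        }
        State::AwaitingFirst => {
            // x = 3: a single in-place store into the existing slot.
            m.x = 3;
            // Second someTask! {}
            m.state = State::AwaitingSecond;
            true
        }
        State::AwaitingSecond => {
            m.state = State::Finished;
            false
        }
        State::Finished => false,
    }
}

fn demo() -> (usize, i64, State) {
    let mut m = Machine { state: State::Start, x: 0 };
    let mut effects = 0;
    while step(&mut m) {
        effects += 1; // the host would run the effect here
    }
    (effects, m.x, m.state)
}
```

The tradeoff matches the one described in the message: the capture is sized for every value up front, spending memory to avoid data movement between suspensions.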

Richard Feldman (Sep 18 2024 at 04:13):

Richard Feldman (Sep 18 2024 at 04:14):

but I do wonder if there are any interesting platform designs that would otherwise be impossible

Brendan Hansknecht (Sep 18 2024 at 04:14):

Yeah, I think this can be pretty easily seen by the amount of data movement generated by the captures in wasm4. They actually still run pretty darn fast, but they add up to a crazy amount of code bloat. Most code in a wasm4 app is spent moving data from one closure capture to the next.

Richard Feldman (Sep 18 2024 at 04:15):

oh yeah replacing all of those with direct ffi-style calls would be epic :big_smile:

Richard Feldman (Sep 18 2024 at 04:16):

Brendan Hansknecht (Sep 18 2024 at 04:16):

Yeah, though in most cases the capture is just slowly growing, so preallocating one giant capture would also fix the issue pretty well (obviously not as well as just making everything sync).

Brendan Hansknecht (Sep 18 2024 at 04:18):

So with reusing the same capture allocation and laying out to minimize data movement, I think it would be very very close to the sync code. But definitely still have extra overhead.

Brendan Hansknecht (Sep 18 2024 at 04:18):

Not sure what the percentages would actually be. Very very small would be my guess.

Richard Feldman (Sep 18 2024 at 04:19):

:thinking: is there some way we could facilitate reusing the same capture allocation for async effects?

Brendan Hansknecht (Sep 18 2024 at 04:20):

I don't see why not. I guess we would need to refcount the capture allocation as a whole to make sure the host isn't holding on to an extra copy.
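The refcount check can be sketched with Rust's `Rc`, used here purely as an analogy for a refcounted capture allocation: reuse the allocation in place when the count is one, and fall back to a fresh copy when the host still holds a reference. `Capture` and `write_slot` are hypothetical names; `Rc::get_mut` and `Rc::make_mut` are the standard library calls doing the uniqueness check and the copy-on-write.

```rust
use std::rc::Rc;

// Hypothetical capture with a single slot.
#[derive(Clone)]
struct Capture {
    x: i64,
}

// Writes into the capture, reusing the allocation only if nobody else holds it.
// Returns the capture to continue with and whether the allocation was reused.
fn write_slot(mut current: Rc<Capture>, value: i64) -> (Rc<Capture>, bool) {
    // get_mut succeeds only when this is the sole reference.
    let reused = Rc::get_mut(&mut current).is_some();
    // make_mut mutates in place when unique, otherwise clones into a new allocation.
    Rc::make_mut(&mut current).x = value;
    (current, reused)
}

fn demo() -> (bool, bool, i64, i64) {
    let first = Rc::new(Capture { x: 0 });

    // Sole owner: the write happens in place.
    let (first, reused_when_unique) = write_slot(first, 3);

    // The host keeps an extra copy, e.g. for an in-flight async effect.
    let host_copy = Rc::clone(&first);

    // Shared: the write has to go to a fresh allocation.
    let (second, reused_when_shared) = write_slot(first, 4);

    (reused_when_unique, reused_when_shared, host_copy.x, second.x)
}
```

In `demo`, the host's copy still reads 3 after the second write, which is the situation the refcount is meant to protect.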

Richard Feldman (Sep 18 2024 at 04:23):

Oskar Hahn (Sep 20 2024 at 07:26):

Currently, it is relatively easy to write a platform where the roc function takes some argument and returns something without doing effects, like main : Str -> { part1: Str, part2: Str } for Advent of Code. At the moment, calling roc from the host is like calling any other function; you just have to transform the memory.

Would this proposal (or the effect interpreter proposal in general) mean that this is no longer possible and you always have to handle the state machine?

In any case, I like this proposal. I think it will make interesting stuff easy, like arena allocators or zero-allocation calls. In Go, I currently have to use C.malloc to make the garbage collector happy. With this proposal, it should be possible to use a native Go []byte for the roc memory.

Brendan Hansknecht (Sep 20 2024 at 12:28):

Yes, but I'm pretty sure that Str -> { part1: Str, part2: Str } would need to use the effect interpreter, because it will be needed for allocations and such.

Brendan Hansknecht (Sep 20 2024 at 12:28):

Though hopefully it has more consistent glue generation and other simplifications
