Stream: ideas

Topic: `crash` as state machine entry


view this post on Zulip Richard Feldman (Sep 18 2024 at 00:50):

splitting this off from:

Richard Feldman said:

so in this design, I think we have a type variable behind the scenes which tracks which of these 4 function types a given function has (only 2 of the types are visible, namely pure vs effectful - but we need to track all 4 as distinct from one another behind the scenes, in order to compile the way we want):

view this post on Zulip Richard Feldman (Sep 18 2024 at 00:50):

so today, when a crash happens, we immediately call the host and say "whatever is going on in the call stack right now, you need to stop what you're doing and deal with this crash"

view this post on Zulip Richard Feldman (Sep 18 2024 at 00:51):

another design we could go with is to treat crash as an async state machine entry as described above :point_up:

view this post on Zulip Richard Feldman (Sep 18 2024 at 00:51):

essentially putting it in the "pure and concurrent" bucket

view this post on Zulip Richard Feldman (Sep 18 2024 at 00:52):

such that when a crash occurs, the host just sees a normal-looking return with a state machine entry that has no continuation in it
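To make the idea above concrete, here is a minimal sketch in Rust of a state machine entry that carries a crash. Everything in it is hypothetical and for illustration only: the `Step` enum, its variants, `roc_divide`, and `host_run` are not Roc's actual ABI. The point is just that a crash becomes one more variant behind the discriminant the host already inspects, and that this variant has no continuation to resume.

```rust
/// Hypothetical result of running one step of a Roc state machine.
/// (Illustrative names only; this is not Roc's real host interface.)
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum Step {
    /// The computation finished with a value.
    Done(i64),
    /// Roc asks the host to perform an effect, then resume at `resume_at`.
    Effect { request: String, resume_at: u32 },
    /// A `crash` happened: only a message, with no continuation to resume.
    Crashed(String),
}

/// Hypothetical Roc entrypoint: a zero denominator stands in for `crash`.
fn roc_divide(numerator: i64, denominator: i64) -> Step {
    if denominator == 0 {
        return Step::Crashed("division by zero".to_string());
    }
    Step::Done(numerator / denominator)
}

/// The host handles a crash in the same `match` it uses for every other entry.
fn host_run(step: Step) -> Result<i64, String> {
    match step {
        Step::Done(value) => Ok(value),
        Step::Effect { request, .. } => Err(format!("unhandled effect: {request}")),
        Step::Crashed(message) => Err(format!("roc crashed: {message}")),
    }
}
```

With this shape, `host_run(roc_divide(1, 0))` is an ordinary return to the host rather than a callback that interrupts the call stack.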

view this post on Zulip Richard Feldman (Sep 18 2024 at 00:54):

off the top of my head, some of the tradeoffs involved here:

view this post on Zulip Richard Feldman (Sep 18 2024 at 00:55):

(this is essentially the "RocResult" design from a while back, except rolling it into the state machine instead of wrapping the state machine, since the state machine already has a discriminant, so why introduce a second one?)

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 00:55):

Oh, and then roc would clean everything up before the return. That would actually be really awesome (though wasteful in some platforms with arenas)

view this post on Zulip Richard Feldman (Sep 18 2024 at 00:56):

yeah, there are also implications for stack traces

view this post on Zulip Sam Mohr (Sep 18 2024 at 00:56):

Though crash should be sparing enough that we don't really care about said waste

view this post on Zulip Richard Feldman (Sep 18 2024 at 00:56):

e.g. right now hosts can grab a backtrace right inside the crash handler, and the stack still exists

view this post on Zulip Richard Feldman (Sep 18 2024 at 00:56):

whereas if we wanted to get a trace to them, we'd need to capture it before returning the state machine entry etc.

view this post on Zulip Richard Feldman (Sep 18 2024 at 00:57):

but I think that's something we'll want to figure out anyway for async stack traces, so I'm not considering it a tradeoff really

view this post on Zulip Richard Feldman (Sep 18 2024 at 00:57):

Sam Mohr said:

Though crash should be sparing enough that we don't really care about said waste

I haven't thought it through all the way, but I think there could be a perf impact even if the crash doesn't occur

view this post on Zulip Richard Feldman (Sep 18 2024 at 00:58):

although actually that might not be true anymore if it's just one more discriminant in the state machine :thinking:

view this post on Zulip Richard Feldman (Sep 18 2024 at 00:58):

certainly it was true in the RocResult design

view this post on Zulip Richard Feldman (Sep 18 2024 at 00:58):

but maybe doesn't apply anymore

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 00:58):

I feel like in my mind, the perfect implementation would:

  1. let the host pick whether cleanup is run or not
  2. be implemented using an exception handling mechanism that has basically zero runtime cost in the good case
  3. also let the host choose whether roc will generate a nice stack trace for the crash (would just be an optional part of the crash tag)

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 00:59):

although actually that might not be true anymore if it's just one more discriminant in the state machine

In an already effectful function, it is essentially no extra cost.
In a pure function, it is extra cost.

view this post on Zulip Richard Feldman (Sep 18 2024 at 00:59):

yeah @Folkert de Vries and I had the exception handling thing implemented a long time ago...it didn't go great :sweat_smile:

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 01:01):

Yeah, for proper exceptions we need full debug info with exception frames that track everything that must be refcounted if an exception is thrown. It also has to walk the stack a frame at a time as it unwinds

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 01:01):

or, I guess we just need exception frames and not full debug info

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 01:02):

Some of this stuff is built at least partially into llvm, but I don't think it is simple to implement

view this post on Zulip Richard Feldman (Sep 18 2024 at 01:03):

yeah based on our experience last time I don't think we want to go down that road again :laughing:

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 01:04):

Honestly I think that is a mistake. Essentially all programming languages have exceptions. Most llvm supported languages have them. So they can't be that hard to implement even if it is opaque and a general pain to do so.

view this post on Zulip Richard Feldman (Sep 18 2024 at 01:05):

what about dev backend?

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 01:07):

In the worst case, dev backend could go the result under the hood route. But I assume once we figure things out in llvm, it will be easier to figure things out in the dev backend.

view this post on Zulip Richard Feldman (Sep 18 2024 at 01:08):

that's interesting, I hadn't thought about that!

view this post on Zulip Richard Feldman (Sep 18 2024 at 01:09):

one of the problems as I recall was that LLVM basically requires you to link libcpp for the exceptions to work

view this post on Zulip Richard Feldman (Sep 18 2024 at 01:09):

and trying to remove that dependency was...not straightforward haha

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 01:10):

I'm imagining a hot loop that is using a dictionary. It has a crash for the impossible case of loading an out of bounds element. If that function and everything that calls it has to be turned into a result under the hood to deal with crash, it will lead to major perf regressions. Any hot loop with a crash in it would hit this.
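As a rough illustration of the concern above, the Rust sketch below writes the same loop two ways. It is only an analogy with hypothetical function names, and it uses a slice lookup in place of a dictionary. In the first version the impossible case panics, so the happy path returns a plain value. In the second, the lookup and every caller return a `Result`, so each iteration has to check and propagate a possible error. No measurements are claimed here; the sketch only shows where the extra checks would appear.

```rust
/// Version A: the "impossible" case panics, so callers get a plain `u64`.
fn sum_with_panic(values: &[u64], indices: &[usize]) -> u64 {
    let mut total: u64 = 0;
    for &index in indices {
        match values.get(index) {
            Some(&value) => total += value,
            None => panic!("index {index} out of bounds"),
        }
    }
    total
}

/// Version B ("result under the hood"): the lookup reports failure as a value.
fn lookup_checked(values: &[u64], index: usize) -> Result<u64, String> {
    values
        .get(index)
        .copied()
        .ok_or_else(|| format!("index {index} out of bounds"))
}

/// Every caller of `lookup_checked` must now check and forward the error,
/// which is what the `?` operator does on each loop iteration.
fn sum_with_result(values: &[u64], indices: &[usize]) -> Result<u64, String> {
    let mut total: u64 = 0;
    for &index in indices {
        total += lookup_checked(values, index)?;
    }
    Ok(total)
}
```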

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 01:10):

one of the problems as I recall was that LLVM basically requires you to link libcpp for the exceptions to work

... That would really suck

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 01:10):

I wonder what rust does

view this post on Zulip Richard Feldman (Sep 18 2024 at 01:11):

do they use llvm exceptions? :thinking:

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 01:12):

I'm not sure, but they definitely have catchable unwinds, an llvm backend, and I didn't think they always linked libc++.

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 01:14):

As a note, we technically could add crash to the task state machine, but still use setjmp and longjmp for pure functions. Just jump back to where we generate the crash for the state machine. Of course, that wouldn't deal with cleanup, but if we don't have a good way to deal with cleanup, that still could be a nicer interface for platforms without harming perf.

view this post on Zulip Richard Feldman (Sep 18 2024 at 01:15):

true!

view this post on Zulip Richard Feldman (Sep 18 2024 at 01:17):

as I recall, the basic way that they deal with cleanup is that each function gets a little header in the machine code that runs to perform cleanup if it's unwinding

view this post on Zulip Richard Feldman (Sep 18 2024 at 01:17):

so you specify that in llvm and it puts it in the machine code

view this post on Zulip Richard Feldman (Sep 18 2024 at 01:18):

also, there's a "personality function" that is also a little header, and it's for catch - basically a way to say "here's what my class is" or something like that, so your code can detect whether it's time to stop unwinding and run the catch code

view this post on Zulip Richard Feldman (Sep 18 2024 at 01:18):

but we wouldn't need that aspect

view this post on Zulip Richard Feldman (Sep 18 2024 at 01:18):

or rather we'd only need it at the entrypoint from the host
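Rust's own panic machinery gives a small, runnable analogy for the shape described above: per-frame cleanup runs while unwinding, and the unwind is caught in exactly one place, the entrypoint. This is a sketch of the general technique using `std::panic::catch_unwind`, not a description of how Roc would implement it. `Guard`, `inner`, `outer`, and `host_entrypoint` are hypothetical names, and the counter stands in for refcount decrements.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts how many cleanups ran, standing in for refcount decrements.
static CLEANUPS: AtomicUsize = AtomicUsize::new(0);

/// A value whose destructor must run whether the frame returns or unwinds.
struct Guard;

impl Drop for Guard {
    fn drop(&mut self) {
        CLEANUPS.fetch_add(1, Ordering::SeqCst);
    }
}

fn inner(should_crash: bool) -> u32 {
    let _guard = Guard;
    if should_crash {
        panic!("crash from inner");
    }
    7
}

fn outer(should_crash: bool) -> u32 {
    let _guard = Guard;
    inner(should_crash) + 1
}

/// The only place that catches the unwind: the boundary the host calls into.
fn host_entrypoint(should_crash: bool) -> Result<u32, String> {
    panic::catch_unwind(AssertUnwindSafe(|| outer(should_crash))).map_err(|payload| {
        payload
            .downcast_ref::<&str>()
            .map(|message| message.to_string())
            .unwrap_or_else(|| "unknown crash".to_string())
    })
}
```

Both `Guard` values are dropped on the crashing path as well as the normal one, and neither `inner` nor `outer` changes its return type to make that happen.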

view this post on Zulip Richard Feldman (Sep 18 2024 at 01:19):

anyway, I agree that this would be the best for both perf and host ergonomics if we could make it work

view this post on Zulip Richard Feldman (Sep 18 2024 at 01:19):

one important prerequisite would be figuring out how to do it without libcpp :big_smile:

view this post on Zulip Richard Feldman (Sep 18 2024 at 01:19):

I think that was where we got stuck last time

view this post on Zulip Richard Feldman (Sep 18 2024 at 01:19):

because I think we had the other stuff working

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 01:20):

Good to know

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 01:24):

I wonder if we'll have to do something like statically link libunwind or something.

view this post on Zulip Richard Feldman (Sep 18 2024 at 01:26):

I think libunwind is only part of it

view this post on Zulip Richard Feldman (Sep 18 2024 at 01:26):

but we could prob just get the sources from that and import them into our zig builtin code, because zig is awesome like that :grinning_face_with_smiling_eyes:

view this post on Zulip Richard Feldman (Sep 18 2024 at 01:26):

we didn't have zig back when we tried this last time haha

view this post on Zulip Richard Feldman (Sep 18 2024 at 01:26):

we may want libunwind regardless for async backtraces

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 01:36):

does zig have exceptions? Can they just tell us how to do everything?

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 01:37):

:tears: They tend to be super helpful and have low dependency ways of doing things.

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 01:45):

Oh, it looks like they just have printing an error, dumping a stack trace, and then hanging.

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 01:45):

So no unwind and what not

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 02:16):

Not that I understand the pieces yet, but rust's implementation seems to exist in these locations and only depends on libunwind (or libgcc), not libc++ or libstdc++ (at least from what I can tell).

https://github.com/rust-lang/rust/blob/master/library/std/src/panicking.rs
https://github.com/rust-lang/rust/tree/master/library/panic_unwind/src
https://github.com/rust-lang/rust/tree/master/library/unwind/src

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 02:22):

Then this just walks the landing pads and what not created by the llvm ir

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 02:47):

And an example using only c and llvm. No linking to anything c++:
https://youtu.be/gH5-lITYrMg?si=nf7DFINdmhxDBQRl&t=1110

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 02:48):

Source before they switch to using c++ (so just c and llvm): https://github.com/AlexDenisov/llvm-social-exception-handling/tree/main/05

view this post on Zulip Richard Feldman (Sep 18 2024 at 02:49):

Brendan Hansknecht said:

And an example using only c and llvm. No linking to anything c++:
https://youtu.be/gH5-lITYrMg?si=nf7DFINdmhxDBQRl&t=1110

whoa! :open_mouth:

view this post on Zulip Richard Feldman (Sep 18 2024 at 02:50):

:thinking: so if desired, we could theoretically switch to that already, if we wanted to switch from roc_panic to RocResult?

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 02:50):

yes

view this post on Zulip Richard Feldman (Sep 18 2024 at 02:50):

oh I guess dev backend wouldn't love that though

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 02:51):

also yes. Need to figure out generating these landing pads and eh headers from the dev backend as well.

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 02:52):

Also, I'm guessing the issue was needing to implement your own personality functions and what not instead of depending on the c++ ones.

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 02:52):

Also, no idea how this all works in wasm

view this post on Zulip Richard Feldman (Sep 18 2024 at 02:53):

how do we do crashes in wasm today?

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 02:55):

we call roc_panic and then let the host language figure it out. So we let zig deal with generating it.

view this post on Zulip Brendan Hansknecht (Sep 18 2024 at 02:56):

And I think it calls some sort of wasm halt instruction

view this post on Zulip Richard Feldman (May 18 2025 at 15:11):

I realized something about the whole "automatic unwinding such that host calls to Roc functions return a Result" idea: there's basically no way to have Roc handle stack overflows automatically in this way

view this post on Zulip Richard Feldman (May 18 2025 at 15:14):

that is, the way stack overflow handling works (and has to work) is:

view this post on Zulip Richard Feldman (May 18 2025 at 15:14):

in other words, if hosts want to gracefully handle stack overflows in Roc programs (which they should!) then they already need to deal with the circumstances of today's roc_panic

view this post on Zulip Richard Feldman (May 18 2025 at 15:15):

so it's actually better to not do the whole "Roc functions return a Result to the host" because the host needs to deal with the "gracefully clean up a Roc program, including unwinding the stack and dealing with heap resources/file handles/etc. in the middle of the Roc program's execution" thing no matter what because of stack overflows

view this post on Zulip Richard Feldman (May 18 2025 at 15:16):

so the roc_panic design lets the host reuse code between the stack overflow handling logic and the "Roc executed a crash" handling logic
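A very rough sketch of that reuse, in Rust and with entirely hypothetical names (`Abort`, `Arena`, `abandon_call`): the host has one recovery routine, and both the crash callback and the stack overflow handler end up calling it. How the host actually gets out of the Roc call stack is not modeled here. The sketch only shows that the cleanup logic can be shared, under the assumption that the host tracks per-call allocations in something like an arena.

```rust
/// Why the host is abandoning an in-progress call into Roc.
#[derive(Debug, Clone, PartialEq)]
enum Abort {
    /// Roc executed `crash` and passed the host a message.
    Crash(String),
    /// The host's own overflow detection fired while Roc code was running.
    StackOverflow,
}

/// Hypothetical per-call arena: sizes of everything allocated during the call.
struct Arena {
    live_allocations: Vec<usize>,
}

/// Single recovery routine shared by both kinds of abort.
/// It releases whatever the call had allocated and reports what happened.
fn abandon_call(arena: &mut Arena, reason: Abort) -> String {
    let freed = arena.live_allocations.len();
    arena.live_allocations.clear();
    match reason {
        Abort::Crash(message) => format!("crash ({message}); freed {freed} allocations"),
        Abort::StackOverflow => format!("stack overflow; freed {freed} allocations"),
    }
}
```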

view this post on Zulip Brendan Hansknecht (May 18 2025 at 15:51):

I kinda agree, kinda don't. I think in practice, most programs accept that crash on stack overflow is fine behaviour.

view this post on Zulip Brendan Hansknecht (May 18 2025 at 15:51):

But I do agree that it is important to be able to handle it.

view this post on Zulip Richard Feldman (May 18 2025 at 18:01):

Brendan Hansknecht said:

I think in practice, most programs accept that crash on stack overflow is fine behaviour.

sure, but for those programs it's presumably fine to crash on crash too :smile:

view this post on Zulip Sky Rose (May 18 2025 at 18:02):

Richard Feldman said:

These don't seem unsolvable.

To the first point: Can Roc provide the host an init function it has to call before it can call any other roc entry points? Or alternatively, can Roc maintain some state (at the top of the stack?) about whether init has been called?

To the second: can the host provide a function to Roc for a custom stack overflow handler? Roc gets the stack overflow first, wraps it in a result, and then passes the result to the host's callback.

These would make the interface between host and Roc more complex, so there's a tradeoff. But if that can provide a better abstraction boundary over Roc crashes, it could be worth considering.

view this post on Zulip Richard Feldman (May 18 2025 at 18:02):

Sky Rose said:

Richard Feldman said:

These don't seem unsolvable.

:thinking: can you give an example of how that could be done?

view this post on Zulip Richard Feldman (May 18 2025 at 18:05):

like for example, let's say the host wants to do its own custom stack overflow handling via a segfault handler (for stack overflows in the host itself), and wants to incorporate into that handler the logic for handling a stack overflow in a call to a roc function

view this post on Zulip Richard Feldman (May 18 2025 at 18:08):

also, in wasm there is no way to do this in wasm itself; the best you can (apparently) do is to have a try/catch in the JavaScript code that invokes the wasm, and then it can inspect the error message string to try to guess whether it was a stack overflow

view this post on Zulip Richard Feldman (May 18 2025 at 18:09):

anyway, the reason I ask is because I started from the premise that these seemed solvable and then (after a lot of investigation) concluded that this was the best way to go...it's very possible that I missed something, but if so, I need to know the specific design that I missed! :smile:

view this post on Zulip Sky Rose (May 18 2025 at 18:10):

Okay, I don't have a solution in mind. I was just unconvinced by your short summary. If there's a bigger proof or a previous attempt backing up that argument, then I certainly don't have anything better.

view this post on Zulip Sky Rose (May 18 2025 at 18:12):

"The host needs its own stack overflow handler for the host stack, and so roc can't have a stack overflow handler for the Roc stack" is a more convincing reason than the bullet point I quoted.

view this post on Zulip Brendan Hansknecht (May 18 2025 at 18:35):

Richard Feldman said:

Brendan Hansknecht said:

I think in practice, most programs accept that crash on stack overflow is fine behaviour.

sure, but for those programs it's presumably fine to crash on crash too :smile:

I think there is overlap, but I wouldn't call this necessarily correct. Different classes of errors with different expectations. Like taking down a server due to an int overflow is very different than taking it down from a stack overflow in my opinion....but I see your point.

view this post on Zulip Brendan Hansknecht (May 18 2025 at 18:37):

All this to say, I think it would be reasonable to turn crashes into results, but still have stack overflows.

view this post on Zulip Brendan Hansknecht (May 18 2025 at 18:39):

That being said, I don't feel strongly either way at this point....but the concept of a simple recovery from a crash is important....currently in roc, that is not easy. And forcing arenas is not necessarily the solution....so we may want to think deeper about that.

view this post on Zulip Anthony Bullard (May 18 2025 at 19:54):

i think a platform that supports concurrency through co-routines wants a way to have a stack overflow in one coroutine not crash the entire system

view this post on Zulip Brendan Hansknecht (May 18 2025 at 19:55):

Oh sure, but a platform can always do that no matter how we design Roc. Really the question is if after a stack overflow they can clean up the garbage left behind.


Last updated: Jun 16 2026 at 16:19 UTC