splitting this off from:
Richard Feldman said:
so in this design, I think we have a type variable behind the scenes which tracks which of these 4 function types a given function has (only 2 of the types are visible, namely pure vs effectful - but we need to track all 4 as distinct from one another behind the scenes, in order to compile the way we want):
- effectful and synchronous - compiles to what we have today, where at some point the application just straight-up calls a function in the host and then continues. Examples of where we'd want this: Time.now! and just about everything in the wasm4 platform :big_smile:
- effectful and async - compiles to state machine. This is the really involved one because we need to convert everything that's happened up to that point into a Task equivalent so that the function actually returns at that point, and one of the things it returns is "everything that happens next" inside a closure. (Today we call that Task.await.) Examples: Http.get!, Result.parallel!, etc.
- pure and sequential - the normal thing we do today, no special compilation necessary
- pure and concurrent - for List.mapParallel and such - from a type-checking perspective, this counts as pure, but from a compilation perspective, it's exactly the same as "effectful and async" in that it compiles down to a state machine entry. (The host doesn't care about the distinction between pure and effectful, it just cares about the distinction between sync and async.)
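A minimal sketch of the four-way split described above, written in Rust purely for illustration. The names FnKind, Visible, and Lowering are made up here and are not Roc compiler types; the only point is that the type checker sees two categories while code generation sees a different two.

```rust
/// What the programmer sees in the type system (illustrative names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Visible {
    Pure,
    Effectful,
}

/// How the function would be compiled (illustrative names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Lowering {
    /// Ordinary code; may call straight into the host and continue.
    DirectCall,
    /// Returns a state machine entry to the host.
    StateMachine,
}

/// The four kinds tracked behind the scenes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FnKind {
    EffectfulSync,
    EffectfulAsync,
    PureSequential,
    PureConcurrent,
}

impl FnKind {
    /// Pure vs effectful: the only distinction the type checker exposes.
    fn visible(self) -> Visible {
        match self {
            FnKind::EffectfulSync | FnKind::EffectfulAsync => Visible::Effectful,
            FnKind::PureSequential | FnKind::PureConcurrent => Visible::Pure,
        }
    }

    /// Sync vs async: the only distinction the host cares about.
    fn lowering(self) -> Lowering {
        match self {
            FnKind::EffectfulSync | FnKind::PureSequential => Lowering::DirectCall,
            FnKind::EffectfulAsync | FnKind::PureConcurrent => Lowering::StateMachine,
        }
    }
}
```

Under this sketch, FnKind::PureConcurrent type-checks as pure but lowers exactly like FnKind::EffectfulAsync.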
so today, when a crash happens, we immediately call the host and say "whatever is going on in the call stack right now, you need to stop what you're doing and deal with this crash"
another design we could go with is to treat crash as an async state machine entry as described above :point_up:
essentially putting it in the "pure and concurrent" bucket
such that when a crash occurs, the host just sees a normal-looking return with a state machine entry that has no continuation in it
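To make the "state machine entry that has no continuation" idea concrete, here is a rough Rust sketch. Entry, Effect, run, and sample_program are hypothetical names invented for this example, and the i64 payloads are a simplification; nothing here reflects the actual layout Roc would hand to a host.

```rust
/// A hypothetical effect request the host knows how to perform.
#[derive(Debug, Clone, PartialEq)]
enum Effect {
    TimeNow,
}

/// One state machine entry returned to the host (illustrative only).
enum Entry {
    /// The computation finished with a value.
    Done(i64),
    /// The host should perform `effect`, then pass its result to `next`.
    Await {
        effect: Effect,
        next: Box<dyn FnOnce(i64) -> Entry>,
    },
    /// A crash: one more discriminant, with no continuation to resume.
    Crashed(String),
}

/// Host-side driver loop: keeps resuming until the program finishes or crashes.
fn run(mut entry: Entry, perform: &mut dyn FnMut(&Effect) -> i64) -> Result<i64, String> {
    loop {
        match entry {
            Entry::Done(value) => return Ok(value),
            Entry::Crashed(message) => return Err(message),
            Entry::Await { effect, next } => {
                let answer = perform(&effect);
                entry = next(answer);
            }
        }
    }
}

/// A toy program: asks the host for the time, then divides it.
fn sample_program(divisor: i64) -> Entry {
    Entry::Await {
        effect: Effect::TimeNow,
        next: Box::new(move |now| {
            if divisor == 0 {
                Entry::Crashed("division by zero".to_string())
            } else {
                Entry::Done(now / divisor)
            }
        }),
    }
}
```

From the host's side the crash arrives as a normal return value, so the driver loop needs no setjmp/longjmp to regain control.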
off the top of my head, some of the tradeoffs involved here:
crash if that's a thing they want to do, e.g. no setjmp/longjmp
(this is essentially the "RocResult" design from a while back, except rolling it into the state machine instead of wrapping the state machine, since the state machine already has a discriminant, so why introduce a second one?)
Oh, and then roc would clean everything up before the return. That would actually be really awesome (though wasteful in some platforms with arenas)
yeah, there are also implications for stack traces
Though crash should be sparing enough that we don't really care about said waste
e.g. right now hosts can grab a backtrace right inside the crash handler, and the stack still exists
whereas if we wanted to get a trace to them, we'd need to capture it before returning the state machine entry etc.
but I think that's something we'll want to figure out anyway for async stack traces, so I'm not considering it a tradeoff really
Sam Mohr said:
Though crash should be sparing enough that we don't really care about said waste
I haven't thought it through all the way, but I think there could be a perf impact even if the crash doesn't occur
although actually that might not be true anymore if it's just one more discriminant in the state machine :thinking:
certainly it was true in the RocResult design
but maybe doesn't apply anymore
I feel like in my mind, the perfect implementation would:
although actually that might not be true anymore if it's just one more discriminant in the state machine
In an already effectful function, it is essentially no extra cost.
In a pure function, it is extra cost.
yeah @Folkert de Vries and I had the exception handling thing implemented a long time ago...it didn't go great :sweat_smile:
Yeah, for proper exceptions we need full debug info with exception frames that track everything that must be refcounted if an exception is thrown. It also has to walk the stack a frame at a time as it unwinds
or, I guess we just need exception frames and not full debug info
Some of this stuff is built at least partially into llvm, but I don't think it is simple to implement
yeah based on our experience last time I don't think we want to go down that road again :laughing:
Honestly I think that is a mistake. Essentially all programming languages have exceptions. Most LLVM-supported languages have them. So they can't be that hard to implement even if it is opaque and a general pain to do so.
what about dev backend?
In the worst case, dev backend could go the result under the hood route. But I assume once we figure things out in llvm, it will be easier to figure things out in the dev backend.
that's interesting, I hadn't thought about that!
one of the problems as I recall was that LLVM basically requires you to link libcpp for the exceptions to work
and trying to remove that dependency was...not straightforward haha
I'm imagining a hot loop that is using a dictionary. It has a crash for the impossible case of loading an out of bounds element. If that function and everything that calls it has to be turned into a result under the hood to deal with crash, it will lead to major perf regressions. Any hot loop with a crash in it would hit this.
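A small Rust sketch of the concern, under the assumption that "result under the hood" means every function that might crash returns a tagged result and every caller checks it. The function names and the Crash type are invented for the example. It shows only the shape of the two approaches and makes no measured performance claim.

```rust
/// Illustrative crash payload.
#[derive(Debug, PartialEq)]
struct Crash(&'static str);

/// "Result under the hood": the impossible case becomes an Err value.
fn get_or_crash(xs: &[u64], i: usize) -> Result<u64, Crash> {
    match xs.get(i) {
        Some(x) => Ok(*x),
        None => Err(Crash("index out of bounds")),
    }
}

/// Every caller up the chain has to return a Result and check the
/// discriminant after each call, even though the Err case never happens here.
fn sum_threaded(xs: &[u64]) -> Result<u64, Crash> {
    let mut total = 0;
    for i in 0..xs.len() {
        total += get_or_crash(xs, i)?;
    }
    Ok(total)
}

/// Unwinding style: the bounds check still exists, but the return type stays
/// a plain u64 and callers do no extra checking on the happy path.
fn sum_unwinding(xs: &[u64]) -> u64 {
    let mut total = 0;
    for i in 0..xs.len() {
        total += xs[i]; // panics (unwinds) if out of bounds
    }
    total
}
```

Both functions compute the same sum; the difference is that the first changes the signature and control flow of everything between the crash site and the host.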
one of the problems as I recall was that LLVM basically requires you to link libcpp for the exceptions to work
... That would really suck
I wonder what rust does
do they use llvm exceptions? :thinking:
I'm not sure, but they definitely have catchable unwinds, an llvm backend, and I didn't think they always linked libc++.
As a note, we technically could add crash to the task state machine, but still use setjmp and longjmp for pure functions. Just jump back to where we generate the crash for the state machine. Of course, that wouldn't deal with cleanup, but if we don't have a good way to deal with cleanup, that still could be a nicer interface for platforms without harming perf.
true!
as I recall, the basic way that they deal with cleanup is that each function gets a little header in the machine code that runs to perform cleanup if it's unwinding
so you specify that in llvm and it puts it in the machine code
also, there's a "personality function" that is also a little header, and it's for catch - basically a way to say "here's what my class is" or something like that, so your code can detect whether it's time to stop unwinding and run the catch code
but we wouldn't need that aspect
or rather we'd only need it at the entrypoint from the host
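As a stand-in for what that would look like, here is a Rust sketch using Rust's own panic unwinding (not anything Roc generates). Refcounted, inner, outer, and host_entrypoint are hypothetical names. Each frame's Drop plays the role of the per-function cleanup code, and the single catch_unwind plays the role of the one catch at the entrypoint from the host.

```rust
use std::panic;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts how many cleanups have run (for demonstration only).
static CLEANUPS: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for a refcounted value that must be released on unwind.
struct Refcounted;

impl Drop for Refcounted {
    fn drop(&mut self) {
        CLEANUPS.fetch_add(1, Ordering::SeqCst);
    }
}

fn inner(should_crash: bool) -> u32 {
    let _held = Refcounted;
    if should_crash {
        panic!("crash in inner");
    }
    7
}

fn outer(should_crash: bool) -> u32 {
    let _held = Refcounted;
    inner(should_crash) + 1
}

/// The only place that catches: intermediate frames just run their cleanup.
fn host_entrypoint(should_crash: bool) -> Result<u32, String> {
    panic::catch_unwind(move || outer(should_crash)).map_err(|payload| {
        payload
            .downcast_ref::<&str>()
            .map(|s| s.to_string())
            .unwrap_or_else(|| "unknown crash".to_string())
    })
}
```

Neither inner nor outer mentions the crash in its return type, yet both of their Refcounted values are dropped before host_entrypoint sees the error.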
anyway, I agree that this would be the best for both perf and host ergonomics if we could make it work
one important prerequisite would be figuring out how to do it without libcpp :big_smile:
I think that was where we got stuck last time
because I think we had the other stuff working
Good to know
I wonder if we'll have to do something like statically link libunwind or something.
I think libunwind is only part of it
but we could prob just get the sources from that and import them into our zig builtin code, because zig is awesome like that :grinning_face_with_smiling_eyes:
we didn't have zig back when we tried this last time haha
we may want libunwind regardless for async backtraces
does zig have exceptions? Can they just tell us how to do everything?
:tears: They tend to be super helpful and have low dependency ways of doing things.
Oh, it looks like they just have printing an error, dumping a stack trace, and then hanging.
So no unwind and what not
Not that I understand the pieces yet, but rust's implementation seems to exist in these locations and only depends on libunwind (or libgcc), not libc++ or libstdc++ (at least from what I can tell).
https://github.com/rust-lang/rust/blob/master/library/std/src/panicking.rs
https://github.com/rust-lang/rust/tree/master/library/panic_unwind/src
https://github.com/rust-lang/rust/tree/master/library/unwind/src
Then this just walks the landing pads and what not created by the LLVM IR
And an example using only c and llvm. No linking to anything c++:
https://youtu.be/gH5-lITYrMg?si=nf7DFINdmhxDBQRl&t=1110
Source before they switch to using c++ (so just c and llvm): https://github.com/AlexDenisov/llvm-social-exception-handling/tree/main/05
whoa!
Brendan Hansknecht said:
And an example using only c and llvm. No linking to anything c++:
https://youtu.be/gH5-lITYrMg?si=nf7DFINdmhxDBQRl&t=1110
whoa! :open_mouth:
:thinking: so if desired, we could theoretically switch to that already, if we wanted to switch from roc_panic to RocResult?
yes
oh I guess dev backend wouldn't love that though
also yes. Need to figure out generating these landing pads and eh headers from the dev backend as well.
Also, I'm guessing the issue was needing to implement your own personality functions and what not instead of depending on the c++ ones.
Also, no idea how this all works in wasm
how do we do crashes in wasm today?
we call roc_panic and then let the host language figure it out. So we let zig deal with generating it.
And I think it calls some sort of wasm halt instruction
I realized something about the whole "automatic unwinding such that host calls to Roc functions return a Result" idea: there's basically no way to have Roc handle stack overflows automatically in this way
that is, the way stack overflow handling works (and has to work) is:
- mprotect on a stack guard page and a signal handler for SIGSEGV which occurs when something writes to the readonly guard page
- SIGSEGV handler runs in the middle of a Roc program's execution, and needs to handle cleanup right away - so there's no opportunity for Roc to convert things into Result
in other words, if hosts want to gracefully handle stack overflows in Roc programs (which they should!) then they already need to deal with the circumstances of today's roc_panic
so it's actually better to not do the whole "Roc functions return a Result to the host" because the host needs to deal with the "gracefully clean up a Roc program, including unwinding the stack and dealing with heap resources/file handles/etc. in the middle of the Roc program's execution" thing no matter what because of stack overflows
so the roc_panic design lets the host reuse code between the stack overflow handling logic and the "Roc executed a crash" handling logic
I kinda agree, kinda don't. I think in practice, most programs accept that crash on stack overflow is fine behaviour.
But I do agree that it is important to be able to handle it.
Brendan Hansknecht said:
I think in practice, most programs accept that crash on stack overflow is fine behaviour.
sure, but for those programs it's presumably fine to crash on crash too :smile:
Richard Feldman said:
- Roc cannot reasonably set this up automatically, partly because it needs to happen exactly once (and not just once per Roc call), but also because the host might want to have host-specific logic in there which Roc doesn't know about
These don't seem unsolvable.
To the first point: Can Roc provide the host an init function it has to call before it can call any other roc entry points? Or alternatively, can Roc maintain some state (at the top of the stack?) about whether init has been called?
To the second: can the host provide a function to Roc for a custom stack overflow handler? Roc gets the stack overflow first, wraps it in a result, and then passes the result to the host's callback.
These would make the interface between host and Roc more complex, so there's a tradeoff. But if that can provide a better abstraction boundary over Roc crashes, it could be worth considering.
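A rough Rust sketch of the interface shape being proposed here, not of a working stack overflow handler. Every name (roc_init, roc_entrypoint, OverflowHandler, simulate_overflow) is hypothetical, no signal handler is installed, and the overflow is only simulated. It shows the two pieces from the questions above: a once-only init that entry points check for, and a host-supplied callback.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Host-provided callback type (hypothetical).
type OverflowHandler = fn(&str);

/// Set exactly once by the init function.
static HANDLER: OnceLock<OverflowHandler> = OnceLock::new();

/// Demo counter so the example host handler has an observable effect.
static OVERFLOWS_SEEN: AtomicUsize = AtomicUsize::new(0);

/// Must be called once before any other entry point.
fn roc_init(handler: OverflowHandler) -> Result<(), &'static str> {
    HANDLER
        .set(handler)
        .map_err(|_| "roc_init was already called")
}

/// An entry point that refuses to run before init.
fn roc_entrypoint(input: i64) -> Result<i64, &'static str> {
    if HANDLER.get().is_none() {
        return Err("roc_init must be called first");
    }
    Ok(input * 2)
}

/// Stand-in for "the runtime noticed a stack overflow": forwards to the host.
fn simulate_overflow() {
    if let Some(&handler) = HANDLER.get() {
        handler("stack overflow");
    }
}

/// Example host-side handler with host-specific logic.
fn host_overflow_handler(_message: &str) {
    OVERFLOWS_SEEN.fetch_add(1, Ordering::SeqCst);
}
```

This leaves open the hard part raised in the thread: a real handler runs inside a signal handler in the middle of execution, which this sketch does not attempt to model.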
Sky Rose said:
Richard Feldman said:
- Roc cannot reasonably set this up automatically, partly because it needs to happen exactly once (and not just once per Roc call), but also because the host might want to have host-specific logic in there which Roc doesn't know about
These don't seem unsolvable.
:thinking: can you give an example of how that could be done?
like for example, let's say the host wants to do its own custom stack overflow handling via a segfault handler (for stack overflows in the host itself), and wants to incorporate into that handler the logic for handling a stack overflow in a call to a roc function
also, in wasm there is no way to do this in wasm itself; the best you can (apparently) do is to have a try/catch in the JavaScript code that invokes the wasm, and then it can inspect the error message string to try to guess whether it was a stack overflow
anyway, the reason I ask is because I started from the premise that these seemed solvable and then (after a lot of investigation) concluded that this was the best way to go...it's very possible that I missed something, but if so, I need to know the specific design that I missed! :smile:
Okay, I don't have a solution in mind. I was just unconvinced by your short summary. If there's a bigger proof or a previous attempt backing up that argument, then I certainly don't have anything better.
"The host needs its own stack overflow handler for the host stack, and so roc can't have a stack overflow handler for the Roc stack" is a more convincing reason than the bullet point I quoted.
Richard Feldman said:
Brendan Hansknecht said:
I think in practice, most programs accept that crash on stack overflow is fine behaviour.
sure, but for those programs it's presumably fine to crash on crash too :smile:
I think there is overlap, but I wouldn't call this necessarily correct. Different classes of errors with different expectations. Like taking down a server due to an int overflow is very different than taking it down from a stack overflow in my opinion....but I see your point.
All this to say, I think it would be reasonable to turn crashes into results, but still have stack overflows.
That being said, I don't feel strongly either way at this point....but the concept of a simple recovery from a crash is important....currently in roc, that is not easy. And forcing arenas is not necessarily the solution....so we may want to think deeper about that.
i think a platform that supports concurrency through coroutines wants a way to have a stack overflow in one coroutine not crash the entire system
Oh sure, but a platform can always do that no matter how we design Roc. Really the question is whether, after a stack overflow, they can clean up the garbage left behind.
Last updated: Jun 16 2026 at 16:19 UTC