Stream: compiler development

Topic: host crash and stack overflow recovery


view this post on Zulip Richard Feldman (Aug 01 2025 at 14:54):

I want to write down some notes on how hosts should handle the roc program either performing a crash or stack overflowing. this is going to be kinda unstructured; I just want to get it down in one place and we can talk about whatever aspects of it :smile:

view this post on Zulip Richard Feldman (Aug 01 2025 at 14:54):

so wasm and non-wasm targets need totally different things, so let's start with Windows because it's the most straightforward

view this post on Zulip Richard Feldman (Aug 01 2025 at 14:55):

outside wasm, the way stack overflows get detected is that the host decides how much stack space to reserve, and then "reserves" it by telling the OS to mark a page of memory just past that point as readonly

view this post on Zulip Richard Feldman (Aug 01 2025 at 14:55):

e.g. if you want 2MB of stack space, you set the memory page right after the 2MB mark to be readonly

view this post on Zulip Richard Feldman (Aug 01 2025 at 14:56):

the CPU is really fast and efficient at bumping the stack pointer, but it's also really dumb. it doesn't do any checking whatsoever about whether it has run out of stack space

view this post on Zulip Richard Feldman (Aug 01 2025 at 14:56):

so what happens is that eventually it bumps its way into the readonly memory, causing a segfault
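
A minimal POSIX sketch of the guard-page idea described above. The `GuardedStack` type and the `guarded_stack_*` names are hypothetical, and the sketch marks the guard page `PROT_NONE` (no access at all), a stricter variant of the read-only page described in the thread. It only reserves and protects the memory; it does not switch a thread onto the new stack.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on strict glibc builds */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical host-side record of one reserved stack region. */
typedef struct {
    void  *base;        /* start of the mapping; this is the guard page */
    void  *usable_low;  /* lowest usable address, directly above the guard */
    size_t usable_size; /* bytes available to the stack */
    size_t page_size;
} GuardedStack;

/* Reserve `usable_size` bytes (assumed to be a multiple of the page size)
 * plus one guard page. Stacks grow downward on mainstream targets, so the
 * guard sits at the low end: it is the first page an overflow touches. */
static int guarded_stack_init(GuardedStack *gs, size_t usable_size) {
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return -1;

    size_t total = usable_size + (size_t)page;
    void *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return -1;

    /* Any access to the guard page now faults instead of corrupting memory. */
    if (mprotect(base, (size_t)page, PROT_NONE) != 0) {
        munmap(base, total);
        return -1;
    }

    gs->base = base;
    gs->usable_low = (char *)base + page;
    gs->usable_size = usable_size;
    gs->page_size = (size_t)page;
    return 0;
}

static void guarded_stack_free(GuardedStack *gs) {
    int rc = munmap(gs->base, gs->usable_size + gs->page_size);
    assert(rc == 0);
    (void)rc;
}
```

Calling `guarded_stack_init(&gs, 2u * 1024 * 1024)` corresponds to the 2MB example: the usable region is writable, and the page below it faults on access.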

view this post on Zulip Richard Feldman (Aug 01 2025 at 14:57):

(in contrast, heap allocation - which makes sure not to allocate into the 2MB of reserved space, or the readonly guard page, and which can explicitly do conditionals and give an OOM error if it has run out - does not need to cause a segfault on purpose like this)

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:00):

so, to recover from the stack overflow, there are a few things that need to happen:

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:04):

at this point, when the "handle stack overflow" logic is running, the system is in a weird state because:

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:06):

also, once we're done handling everything, we need to clean up all the resources and get back to a normal host state

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:08):

I mention this stack overflow case first, because:

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:09):

ok so let's start with recovering and cleaning up, and get to showing a trace later

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:11):

the simplest way to recover from a stack overflow (and this works for crash too) is:

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:13):

so we can use that to get execution back where we want it, but there's still the matter that the Roc program may have allocated heap memory or opened file handles, and this doesn't clean any of that up

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:13):

I think the best solution there is that the host should not share a single global allocator between itself and Roc, but rather provide a dedicated allocator just for the Roc program

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:14):

and if there are multiple Roc entrypoints running concurrently, they should each have their own allocators

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:15):

this way, if one of them crashes or stack overflows, the host can just reset that allocator and cleanup has been achieved
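
To make "just reset that allocator" concrete, here is a minimal sketch of a per-entrypoint bump arena. The `RocArena` type and `roc_arena_*` names are hypothetical, and a real host would more likely wrap an existing allocator; the point is only that cleanup after a crash becomes a single reset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-entrypoint arena: one buffer, one bump offset. */
typedef struct {
    unsigned char *buf;
    size_t cap;
    size_t used;
} RocArena;

static int roc_arena_init(RocArena *a, size_t cap) {
    a->buf = malloc(cap);
    a->cap = cap;
    a->used = 0;
    return a->buf ? 0 : -1;
}

/* Bump-allocate `size` bytes at a 16-byte-aligned offset into the buffer.
 * Returns NULL when the arena is exhausted (the explicit OOM path). */
static void *roc_arena_alloc(RocArena *a, size_t size) {
    size_t start = (a->used + 15u) & ~(size_t)15u;
    if (start > a->cap || size > a->cap - start) return NULL;
    a->used = start + size;
    return a->buf + start;
}

/* After a crash or stack overflow: everything the Roc call allocated is
 * discarded at once, without visiting individual allocations. */
static void roc_arena_reset(RocArena *a) {
    assert(a->used <= a->cap);
    a->used = 0;
}

static void roc_arena_destroy(RocArena *a) {
    free(a->buf);
    a->buf = NULL;
    a->cap = 0;
    a->used = 0;
}
```

A host running several entrypoints concurrently would keep one `RocArena` per entrypoint and reset only the one whose call crashed.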

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:15):

separately, if it's doing file handle stuff, that should be tracked somewhere the host can access it, so that it can iterate through the open file handles and close them
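
One way this tracking could look on a POSIX host, as a rough sketch: every file the Roc call opens goes through the host, which records the descriptor so it can close all of them after a crash. `FdTracker`, `tracked_open`, and `tracked_close_all` are hypothetical names, and the fixed capacity is an arbitrary choice for the example.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_TRACKED_FDS 64

/* Hypothetical per-entrypoint list of descriptors opened on Roc's behalf. */
typedef struct {
    int fds[MAX_TRACKED_FDS];
    size_t count;
} FdTracker;

/* Open a file and remember the descriptor. Returns the fd, or -1 on failure
 * (including when the tracker is full). */
static int tracked_open(FdTracker *t, const char *path, int flags) {
    assert(t != NULL);
    if (t->count == MAX_TRACKED_FDS) return -1;
    int fd = open(path, flags);
    if (fd >= 0) t->fds[t->count++] = fd;
    return fd;
}

/* Called by the host after a crash or stack overflow: close everything the
 * failed call still had open. Returns how many descriptors were closed. */
static size_t tracked_close_all(FdTracker *t) {
    size_t closed = 0;
    while (t->count > 0) {
        if (close(t->fds[--t->count]) == 0) closed++;
    }
    return closed;
}
```

This sketch ignores descriptors that are closed normally or shared between entrypoints; a real host would also need to remove those from the list.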

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:16):

we can model all of this in basic-cli and basic-webserver so host authors can have a good example to follow

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:21):

ok, so at this point we have a way for the host to:

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:24):

of note, an alternative to setjmp/longjmp is to do stack unwinding - e.g. libunwind (~100KB dependency) lets you avoid the setjmp up front (maybe like a dozen CPU instructions), but that seems not worth it to me considering setjmp and longjmp are much simpler
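
For reference, the setjmp/longjmp approach can be sketched roughly as follows for the `crash` case. All names here (`host_call_roc`, `host_on_crash`, `fake_roc_entrypoint`) are hypothetical, the Roc entrypoint is simulated by a plain C function, and a single static `jmp_buf` is used for brevity where a real host would likely keep one per thread.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <string.h>

static jmp_buf host_recovery_point;        /* per-thread in a real host */
static const char *last_crash_msg = NULL;

/* Hypothetical crash callback the host hands to the Roc program. It never
 * returns to Roc: it jumps straight back to the host's recovery point. */
static void host_on_crash(const char *msg) {
    last_crash_msg = msg;
    longjmp(host_recovery_point, 1);
}

/* Stand-in for a compiled Roc entrypoint. */
static void fake_roc_entrypoint(int should_crash) {
    if (should_crash) host_on_crash("integer overflow");
}

/* Returns 0 if the Roc call finished normally, 1 if it crashed. */
static int host_call_roc(int should_crash) {
    if (setjmp(host_recovery_point) != 0) {
        /* We arrive here via longjmp. The Roc frames are simply abandoned;
         * heap memory and file handles still need separate cleanup. */
        assert(last_crash_msg != NULL);
        return 1;
    }
    fake_roc_entrypoint(should_crash);
    return 0;
}
```

The `setjmp` cost is paid on every call into Roc, which is the "dozen CPU instructions" trade-off mentioned above.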

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:24):

ok so now the backtrace

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:25):

on Windows, there's a built-in API for getting backtraces - https://learn.microsoft.com/en-us/windows/win32/debug/capturestackbacktrace

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:26):

however, it isn't aware of debuginfo, so it wouldn't give things like line numbers etc.

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:27):

however, there's a more advanced one that does get you debug info - https://learn.microsoft.com/en-us/windows/win32/api/dbghelp/nf-dbghelp-stackwalk64 - but to get that, the host has to dynamically link dbghelp.dll (which does ship with Windows, fortunately)

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:33):

unfortunately, this is a pretty complex function that does use stack memory itself, and if we're in the middle of a stack overflow, it's possible that trying to walk the stack could result in another stack overflow, at which point we're toast

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:41):

so we need to be really really conservative in what we do

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:43):

one thing that helps is that Windows automatically converts the guard page to be writable again (so, no longer a guard page) at which point we can grow our stack into it

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:44):

so we get one more page (at least 4096B) of memory to work with, for just the stack walking and recording it somewhere else (most likely a threadlocal that we pre-reserved and can access after the longjmp to display the recorded stack trace to the end user in whatever way we like)
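
A sketch of what the pre-reserved threadlocal might look like: a fixed-capacity buffer of return addresses that can be filled without allocating or using much stack, and read back after the longjmp. The `RecordedTrace` type, the `trace_*` names, and the capacity of 128 are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RECORDED_FRAMES 128

/* Hypothetical fixed-size trace record. It lives in thread-local storage,
 * not on the (overflowed) stack, so it survives the longjmp. */
typedef struct {
    uintptr_t frames[MAX_RECORDED_FRAMES];
    size_t len;
    int truncated; /* set when the stack was deeper than the buffer */
} RecordedTrace;

static _Thread_local RecordedTrace recorded_trace;

static void trace_begin(void) {
    recorded_trace.len = 0;
    recorded_trace.truncated = 0;
}

/* Called once per frame by whatever walks the stack. Does no allocation and
 * needs only a few bytes of stack itself. */
static void trace_push(uintptr_t return_address) {
    if (recorded_trace.len < MAX_RECORDED_FRAMES) {
        recorded_trace.frames[recorded_trace.len++] = return_address;
    } else {
        recorded_trace.truncated = 1;
    }
}

/* Called by the host after the longjmp, when it is safe to format output. */
static const RecordedTrace *trace_get(void) {
    assert(recorded_trace.len <= MAX_RECORDED_FRAMES);
    return &recorded_trace;
}
```

Stack overflows usually come from deep recursion, so the `truncated` flag matters: the buffer will often fill up before the walk reaches the bottom of the stack.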

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:44):

however, this does mean that if we don't want to "leak" guard pages, we need to manually change that page back to a guard page after the longjmp - which is doable, it's just something the host needs to do

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:46):

one more note: the interpreter just does all this stuff itself and reports a stack overflow as a normal crash

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:46):

because it's managing its own "stack"

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:46):

ok so that's Windows - we can recover from stack overflows and crashes, including showing the end user a backtrace

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:46):

which includes debuginfo, so function names and line numbers etc.

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:47):

on UNIX, we can follow the same basic strategy but with less built-in behavior
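
As a rough illustration of the UNIX side, the sketch below installs a fault handler that runs on an alternate signal stack (the overflowed stack has no room left for a handler) and jumps back to a recovery point. To stay deterministic it simulates the overflow by writing to a `PROT_NONE` page rather than recursing until the real stack runs out. All function names are hypothetical, and jumping out of a signal handler with `siglongjmp` is a widely used but delicate technique whose safety is questioned later in this thread.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on strict glibc builds */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf overflow_recovery_point;
static unsigned char handler_stack[64 * 1024]; /* stack for the handler itself */

static void on_fault(int signo) {
    (void)signo;
    /* Leave the handler immediately; all real work happens after the jump. */
    siglongjmp(overflow_recovery_point, 1);
}

/* Host-installed handler for SIGSEGV (and SIGBUS, which some systems raise
 * for guard-page hits). SA_ONSTACK makes it run on handler_stack. */
static int install_overflow_handler(void) {
    stack_t alt;
    alt.ss_sp = handler_stack;
    alt.ss_size = sizeof handler_stack;
    alt.ss_flags = 0;
    if (sigaltstack(&alt, NULL) != 0) return -1;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_ONSTACK;
    if (sigaction(SIGSEGV, &sa, NULL) != 0) return -1;
    if (sigaction(SIGBUS, &sa, NULL) != 0) return -1;
    return 0;
}

/* Returns 1 if a fault was caught and recovered from, 0 if nothing faulted,
 * -1 on setup failure. */
static int touch_guard_page_and_recover(void) {
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return -1;
    void *mapping = mmap(NULL, (size_t)page, PROT_NONE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mapping == MAP_FAILED) return -1;
    volatile unsigned char *guard = mapping;

    int recovered;
    /* savemask = 1 so the signal mask is restored and later faults still
     * reach the handler. */
    if (sigsetjmp(overflow_recovery_point, 1) == 0) {
        guard[0] = 1; /* stands in for the stack bumping into the guard page */
        recovered = 0;
    } else {
        recovered = 1;
    }
    munmap(mapping, (size_t)page);
    return recovered;
}
```

This sketch overwrites any handlers already installed; a host with its own SIGSEGV handling would need to save and chain to the previous `struct sigaction`.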

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:49):

specifically:

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:50):

since the backtrace-obtaining functionality is completely different for interpreter vs optimized build, I think the compiled roc application should expose a backtrace() symbol the host can link and call, and it will have the same API regardless of whether we're interpreting or running an optimized build (the optimized build will use libbacktrace, which we'll need to bundle along with our builtins, and then the interpreter will do its own thing with its own stack memory)

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:51):

ok so that's Windows and UNIX...now for wasm!

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:51):

wasm is totally different from either of those

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:51):

the wasm binary is not allowed to look at its own bytes, so it can't do its own stack unwinding or generate its own backtrace

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:52):

however, the things we need to happen can be done outside the wasm vm (so, in the browser that means JS can do it)

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:52):

for simplicity I'll talk about the browser, but hopefully other wasm vms will have equivalent ways of doing these things

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:53):

when a stack overflow occurs in wasm, execution immediately halts and JS gets a runtime exception, which it can catch with try/catch

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:54):

that exception includes the backtrace info, and as long as the binary was built with source maps (which can be either included in the wasm binary itself or can be hosted separate from the binary and the browser can download it), JS can use the source maps to get line numbers etc.

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:55):

for crash, wasm can call out to an external JS function, and JS can inspect the current wasm stack while it's paused - and again can use source maps to get debuginfo

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:55):

JS can also just cancel the entire wasm process, which takes care of stack memory cleanup, execution, etc.

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:56):

for the interpreter inside the browser, it gets slightly trickier because we can't just be like "here is my backtrace() function and I'll give you what you want regardless of whether I'm an interpreter or an optimized build"

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:56):

because although the interpreter can offer that, the optimized build can't (because it can't see its own executable bytes like it can outside of wasm)

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:57):

so instead, basically I think what we need to do in wasm is make it so that when we call the "I crashed" host handler (in JS), if we're the interpreter, we automatically create a backtrace and pass it to JS, so JS can just use that

view this post on Zulip Richard Feldman (Aug 01 2025 at 15:58):

but if we're an optimized build, we just pass null, and then JS in the host can check for that and see "oh, there's a null backtrace here, so that must mean I need to go inspect the currently-running wasm stack myself and build my own trace"

view this post on Zulip Richard Feldman (Aug 01 2025 at 16:04):

for the stack overflow scenario it's actually more straightforward, because that's naturally a thrown exception with backtrace info in the exception, so in theory we can have the interpreter just handcraft one of those exceptions, except including all the backtrace info from the interpreter

view this post on Zulip Richard Feldman (Aug 01 2025 at 16:04):

ok I think that's everything!

view this post on Zulip Richard Feldman (Aug 01 2025 at 16:06):

I think the key here is having good examples of how to do all of that in basic-cli, basic-webserver, and wasm hosts too, so that other platform authors can treat everything I just described as just some general boilerplate stuff they set up and then backtraces for stack overflows and crash Just Work as if this were an interpreted language or a language with a VM, except without the VM overhead :smile:

view this post on Zulip Joshua Warner (Aug 01 2025 at 17:38):

Couple thoughts:

  1. We may want to consider abstracting stack overflow a _bit_, such that the host doesn't have to know about both the interpreter and the compiler.
  2. We should use a frame pointer, which (as long as all code on the stack is Roc code) makes backtraces relatively trivial to compute.

view this post on Zulip Richard Feldman (Aug 01 2025 at 17:51):

Joshua Warner said:

  1. We should use a frame pointer, which (as long as all code on the stack is Roc code) makes backtraces relatively trivial to compute.

the problem with that is that then inlined functions disappear completely, whereas e.g. DWARF has metadata about "here's the function that was inlined here" so you can get a more useful stack trace even if things were inlined. also we want the DWARF metadata anyway for line numbers etc.

I haven't actually implemented a stack walker with or without libbacktrace, but from what I've read, if you want to get the DWARF metadata anyway, that's the whole hard part - and at that point you might as well rely on that as the source of truth instead of the frame pointer, even if you have one

view this post on Zulip Richard Feldman (Aug 01 2025 at 17:52):

Joshua Warner said:

  1. We may want to consider abstracting stack overflow a _bit_, such that the host doesn't have to know about both the interpreter and the compiler.

in theory I'd like to do this, but it doesn't seem possible for the crash scenario in wasm (at least based on my research) to avoid the host having to do something different in one scenario vs the other one

view this post on Zulip Richard Feldman (Aug 01 2025 at 17:53):

and outside of wasm, I think the stack overflow handler has to be installed by the host, because Roc functions are (and ought to be) stateless, plus the host might have other signal handlers that could conflict with ours if we installed it ourselves

view this post on Zulip Richard Feldman (Aug 01 2025 at 17:54):

and the interpreter can just do a normal crash for signalling that a stack overflow has happened

view this post on Zulip Richard Feldman (Aug 01 2025 at 17:54):

so from that perspective, the only thing I think we could really abstract is backtrace() - but honestly I actually think I'd rather do the same thing we do with wasm there, where we say "here's a backtrace if I'm an interpreter, but otherwise I just give you null and you need to go walk the stack yourself"

view this post on Zulip Richard Feldman (Aug 01 2025 at 17:55):

because that way we don't have to staple a 50KB libbacktrace static dependency to every optimized Roc app, when the host might already have that dependency themselves and just be able to use the one they already have
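
The "backtrace if I'm an interpreter, otherwise null" convention could look roughly like this from the host's side. `RocBacktrace`, `host_on_crash`, and `host_walk_own_stack` are hypothetical names, and the host's own stack walker is stubbed out with a placeholder line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical shape of the backtrace the interpreter would hand over. */
typedef struct {
    const char *const *frames; /* one printable line per frame */
    size_t len;
} RocBacktrace;

typedef enum { TRACE_FROM_ROC, TRACE_FROM_HOST } TraceSource;

/* Stub: a real host would walk the native stack here, e.g. with whatever
 * backtrace library it already links. */
static size_t host_walk_own_stack(char *out, size_t cap) {
    int n = snprintf(out, cap, "  <frames from host stack walker>\n");
    return n < 0 ? 0 : (size_t)n;
}

/* Crash handler: `bt` is non-NULL when running under the interpreter and
 * NULL for an optimized build. Writes a report into `out`. */
static TraceSource host_on_crash(const char *msg, const RocBacktrace *bt,
                                 char *out, size_t cap) {
    assert(cap > 0);
    int n = snprintf(out, cap, "crash: %s\n", msg);
    size_t used = n < 0 ? 0 : (size_t)n;
    if (used >= cap) used = cap - 1;

    if (bt == NULL) {
        /* Optimized build: no trace supplied, so build one ourselves. */
        host_walk_own_stack(out + used, cap - used);
        return TRACE_FROM_HOST;
    }
    for (size_t i = 0; i < bt->len && used < cap - 1; i++) {
        n = snprintf(out + used, cap - used, "  at %s\n", bt->frames[i]);
        if (n < 0) break;
        used += (size_t)n;
        if (used >= cap) used = cap - 1;
    }
    return TRACE_FROM_ROC;
}
```

The host code has one branch on `bt == NULL`, and the optimized Roc binary carries no backtrace library of its own.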

view this post on Zulip Joshua Warner (Aug 01 2025 at 19:34):

Hmm, tbh I wouldn't optimize for making backtraces perfect in the presence of inlining and optimization.

view this post on Zulip Joshua Warner (Aug 01 2025 at 19:35):

They're never going to be perfect anyway

view this post on Zulip Joshua Warner (Aug 01 2025 at 19:35):

I'd just keep track of which functions have other functions inlined into them, and make a note of that in the backtrace

view this post on Zulip Joshua Warner (Aug 01 2025 at 19:36):

e.g.

foo
bar (*contains inlined functions)
baz

view this post on Zulip Joshua Warner (Aug 01 2025 at 19:39):

In other words, what I'd do is:

  1. guarantee you can always get some kind of backtrace via frame pointers. The runtime code is pretty minimal. You may or may not have symbols, and that's fine. Image base offset, image UUID, and return addresses are sufficient to fully reconstruct the stack more properly later (e.g. in a tool we ship with the roc binary, or cobbling together other open source tools)
  2. allow the host to link libbacktrace if it wants more full functionality

view this post on Zulip Richard Feldman (Aug 01 2025 at 19:50):

ah that's interesting

view this post on Zulip Richard Feldman (Aug 01 2025 at 19:51):

so basically give up a slight amount of perf (reserving 1 register for the duration of the roc call) for the benefit of hosts having the option to do simple backtrace without line numbers on optimized builds

view this post on Zulip Richard Feldman (Aug 01 2025 at 19:52):

I could also see that being a platform module knob maybe

view this post on Zulip Richard Feldman (Aug 01 2025 at 19:52):

like they can request a frame pointer or not (if they're not gonna use it anyway, might as well take the extra perf improvement)

view this post on Zulip Joshua Warner (Aug 01 2025 at 20:15):

https://www.brendangregg.com/blog/2024-03-17/the-return-of-the-frame-pointers.html?utm_source=chatgpt.com

view this post on Zulip Joshua Warner (Aug 01 2025 at 20:15):

lol, utm outing me :sweat_smile:

view this post on Zulip Joshua Warner (Aug 01 2025 at 20:16):

I would actually say that unless we find a strong case that really really needs that register, we should not even make it an option

view this post on Zulip Joshua Warner (Aug 01 2025 at 20:18):

Note that line numbers can work just fine with frame pointer backtraces

view this post on Zulip Joshua Warner (Aug 01 2025 at 20:18):

All you need is addr2line at that point

view this post on Zulip Joshua Warner (Aug 01 2025 at 20:18):

(either the tool itself, or functionality thereof)
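
To illustrate why return addresses plus the image base are enough, this sketch turns absolute return addresses into image-relative offsets formatted as hex, which is the form a symbolizer such as addr2line takes as input. The function name is hypothetical, and it assumes the image's first load segment starts at virtual address 0 (typical for position-independent executables), so that runtime address minus load base equals the address recorded in the binary.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Write space-separated image-relative offsets ("0x1a2b 0x20f0") into `out`.
 * Returns 0 on success, -1 if an address lies below the image base or the
 * buffer is too small. */
static int format_image_offsets(uintptr_t image_base,
                                const uintptr_t *return_addrs, size_t n,
                                char *out, size_t cap) {
    size_t used = 0;
    if (cap == 0) return -1;
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if (return_addrs[i] < image_base) return -1;
        int w = snprintf(out + used, cap - used, "%s0x%" PRIxPTR,
                         i ? " " : "", return_addrs[i] - image_base);
        if (w < 0 || (size_t)w >= cap - used) return -1;
        used += (size_t)w;
    }
    return 0;
}
```

The resulting string can be logged at crash time and symbolized later, on another machine if needed, with something like `addr2line -e <binary-with-debuginfo> -f <offsets>`.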

view this post on Zulip Richard Feldman (Aug 01 2025 at 20:23):

TIL about addr2line - but apparently it's basically the same thing as libbacktrace except it's designed to be invoked in a separate process on a file that's a memory dump of the frame?

view this post on Zulip Joshua Warner (Aug 01 2025 at 20:42):

Yep!

view this post on Zulip Joshua Warner (Aug 01 2025 at 20:42):

And, most importantly IMO, it doesn't require linking, and can even be done after the fact

view this post on Zulip Joshua Warner (Aug 01 2025 at 20:42):

(doesn't require symbols being embedded in the binary, or even being on the machine where the crash happened)

view this post on Zulip Richard Feldman (Aug 01 2025 at 21:27):

hm, but wouldn't the best UX normally be getting the stack trace immediately?

view this post on Zulip Richard Feldman (Aug 01 2025 at 21:28):

I mean if all we have to do is enable frame pointers to get it, then sure, might as well, but wouldn't host authors pretty much always want to use libbacktrace anyway for runtime stack traces? :thinking:

view this post on Zulip Brendan Hansknecht (Aug 01 2025 at 21:41):

I don't think it is safe to longjmp from a segfault handler, but you can modify the return address to go to a temp function and then longjmp.

Also, of note, you want to leave segfault handlers as fast as possible. So all of the stuff discussed here should be deferred and outside of the segfault handler.

and if there are multiple Roc entrypoints running concurrently, they should each have their own allocators

I wonder how expensive that would be... A lot of perf is gained from things like tcmalloc (thread cache malloc) with its smart reuse of memory.

if it's doing file handle stuff, that should be tracked somewhere the host can access it, so that it can iterate through the open file handles and close them

Or we implement standard exceptions in roc. Use debug info to walk the stack and run destructors on the roc values. Stack walking is slower when there is an exception but greatly reduces the cost when no exceptions happen.

of note, an alternative to setjmp/longjmp is to do stack unwinding - e.g. libunwind (~100KB dependency) lets you avoid the setjmp up front (maybe like a dozen CPU instructions), but that seems not worth it to me considering setjmp and longjmp are much simpler

I'm not sold due to all the extra cost of the host needing to manually track every resource that roc uses....

one more note: the interpreter just does all this stuff itself and reports a stack overflow as a normal crash

Yeah, interpreter should be super simple here. And should be able to give nicer errors.

since the backtrace-obtaining functionality is completely different for interpreter vs optimized build, I think the compiled roc application should expose a backtrace() symbol the host can link and call, and it will have the same API regardless of whether we're interpreting or running an optimized build (the optimized build will use libbacktrace, which we'll need to bundle along with our builtins, and then the interpreter will do its own thing with its own stack memory)

If compiled binaries are just for optimized builds, why not just not have backtraces? It is really common to leave out debug info in release builds anyway. Leave that for the interpreter.

view this post on Zulip Brendan Hansknecht (Aug 01 2025 at 21:43):

guarantee you can always get some kind of backtrace via frame pointers

Frame pointer is a waste in my opinion (hurts perf for little gain most of the time, should be optional and not default). Just don't have debug info in release builds and use the interpreter which is better suited for this anyway. Give a super crude trace for anything optimized

view this post on Zulip Richard Feldman (Aug 01 2025 at 22:33):

Brendan Hansknecht said:

I don't think it is safe to longjmp from a segfault handler, but you can modify the return address to go to a temp function and then longjmp.

Also, of note, you want to leave segfault handlers as fast as possible. So all of the stuff discussed here should be deferred and outside of the segfault handler.

hm, how would this work for purposes of getting a backtrace? :thinking:

modify return address, go to temp function, then do backtrace, then longjmp?

I assume by default libbacktrace would not work post-longjmp, since the backtrace code itself would be potentially stomping over the previous stack memory :sweat_smile:

view this post on Zulip Richard Feldman (Aug 01 2025 at 22:35):

Brendan Hansknecht said:

guarantee you can always get some kind of backtrace via frame pointers

Frame pointer is a waste in my opinion (hurts perf for little gain most of the time, should be optional and not default). Just don't have debug info in release builds and use the interpreter which is better suited for this anyway. Give a super crude trace for anything optimized

I think that would be fine for some use cases but pretty bad for others - e.g. if my web server crashes, I really want as real a backtrace in my logs as I can get without sacrificing optimizations :smile:

view this post on Zulip Richard Feldman (Aug 01 2025 at 22:39):

Brendan Hansknecht said:

if it's doing file handle stuff, that should be tracked somewhere the host can access it, so that it can iterate through the open file handles and close them

Or we implement standard exceptions in roc. Use debug info to walk the stack and run destructors on the roc values. Stack walking is slower when there is an exception but greatly reduces the cost when no exceptions happen.

I dunno, I know we've talked about this a bunch over the years, but I just keep coming back to:

it just feels like exceptions would be a way to make host authors do a bit less work for the tradeoff of having worse perf when there's an exception and a ton of added complexity

view this post on Zulip Richard Feldman (Aug 01 2025 at 22:40):

btw I don't actually think cleaning up file handles would be much work for the host

view this post on Zulip Richard Feldman (Aug 01 2025 at 22:42):

they already need to be tracked somewhere for the whole "refcounted fd via roc_alloc that cleans itself up" design, so even in the worst case scenario where you have a bunch of roc functions running at once and they're passing the handles between one another, even then all you have to do is just make sure to keep a running list of which fds a given roc function is holding, and you just go through and decref those if it crashes.

view this post on Zulip Richard Feldman (Aug 01 2025 at 22:43):

feels like stack overflow handling is the big burden here, and I don't know of a way to make that burden easier without sacrificing flexibility (e.g. host can have their own signal handlers without worrying about roc installing its own that conflict with the host's) and/or some huge perf penalty like introducing a vm or something

view this post on Zulip Brendan Hansknecht (Aug 02 2025 at 00:09):

Richard Feldman said:

hm, how would this work for purposes of getting a backtrace? :thinking:

modify return address, go to temp function, then do backtrace, then longjmp?

I assume by default libbacktrace would not work post-longjmp, since the backtrace code itself would be potentially stomping over the previous stack memory :sweat_smile:

Yeah, I think something like that. Saving the old stack address when you do so

Maybe all the ceremony isn't required but I remember seeing something like this in go when dealing with stack swapping (which happens on a signal from a timer if a goroutine runs for too long)

view this post on Zulip Brendan Hansknecht (Aug 02 2025 at 00:11):

Richard Feldman said:

btw I don't actually think cleaning up file handles would be much work for the host

No, I assume the memory allocator cleanup would be the bigger hassle (and having state per thread). Specifically if you just want a gpa and not arenas

view this post on Zulip Brendan Hansknecht (Aug 02 2025 at 00:13):

it just feels like exceptions would be a way to make host authors do a bit less work for the tradeoff of having worse perf when there's an exception and a ton of added complexity

I mean I would hope exceptions are truly exceptional and roc rarely crashes in production, but I guess that may vary a lot by use case.

view this post on Zulip Brendan Hansknecht (Aug 02 2025 at 00:15):

And for backtraces in release builds, it definitely is true that having full DWARF info and backtraces from debug info is the most friendly.

view this post on Zulip Richard Feldman (Aug 02 2025 at 00:33):

Brendan Hansknecht said:

Richard Feldman said:

btw I don't actually think cleaning up file handles would be much work for the host

No, I assume the memory allocator cleanup would be the bigger hassle (and having state per thread). Specifically if you just want a gpa and not arenas

hm, why would that be hard? seems like e.g. mimalloc, jemalloc, snmalloc all have a concept of making an allocator instance

view this post on Zulip Brendan Hansknecht (Aug 02 2025 at 00:58):

I don't think you actually want a new allocator per coroutine

view this post on Zulip Brendan Hansknecht (Aug 02 2025 at 00:59):

That could be hundreds of thousands of allocator instances in a webserver.

view this post on Zulip Richard Feldman (Aug 02 2025 at 01:00):

not per coroutine, per roc entrypoint

view this post on Zulip Richard Feldman (Aug 02 2025 at 01:00):

so per request handler

view this post on Zulip Brendan Hansknecht (Aug 02 2025 at 01:00):

Yeah, that is per coroutine. Each coroutine would be running a roc request

view this post on Zulip Richard Feldman (Aug 02 2025 at 01:02):

hm, why would that be bad? :thinking:

view this post on Zulip Richard Feldman (Aug 02 2025 at 01:02):

having a separate heap for each of them

view this post on Zulip Richard Feldman (Aug 02 2025 at 01:02):

e.g. seems to work great for Erlang :smile:

view this post on Zulip Brendan Hansknecht (Aug 02 2025 at 01:03):

Maybe it wouldn't be, but it would be a lot more overhead than what is normal.

view this post on Zulip Brendan Hansknecht (Aug 02 2025 at 01:03):

Cause they won't share pages for example. So each will end up claiming some number of pages even if very little memory is used by roc.

view this post on Zulip Richard Feldman (Aug 02 2025 at 01:06):

yeah that's fair

view this post on Zulip Richard Feldman (Aug 02 2025 at 01:06):

I mean we can try it out and see how it does

view this post on Zulip Richard Feldman (Aug 02 2025 at 01:07):

can also try that out alongside arenas per request

view this post on Zulip Richard Feldman (Aug 02 2025 at 01:09):

I mean, to be clear, Folkert and I spent a ton of hours trying to get C++ style exceptions working without a libc++ dependency, everyone I asked about it said what we were trying to do was a huge mistake, and scrapping it really made things a lot simpler - so I'm not saying we could never do it, just that I think the downside is very serious, so the upside really needs to justify it :sweat_smile:

view this post on Zulip Brendan Hansknecht (Aug 02 2025 at 01:11):

Yeah, totally fair around exceptions.

view this post on Zulip Brendan Hansknecht (Aug 02 2025 at 01:18):

Also, to be fair, most languages just don't recover from stack overflows. Like no attempt is made at all

view this post on Zulip Brendan Hansknecht (Aug 02 2025 at 01:18):

Same as OOMs

view this post on Zulip Brendan Hansknecht (Aug 02 2025 at 01:18):

Just classes of bugs that are accepted as fatal generally

view this post on Zulip Brendan Hansknecht (Aug 02 2025 at 01:19):

But I guess you still need some solution to calls to crash....so :shrug:, may need this kind of heavy handed solution.

view this post on Zulip Richard Feldman (Aug 02 2025 at 01:22):

Brendan Hansknecht said:

Also, to be fair, most languages just don't recover from stack overflows. Like no attempt is made at all

that's true of like C and C++ and Rust, sure, but not the garbage collected languages we're competing directly with

view this post on Zulip Richard Feldman (Aug 02 2025 at 01:23):

like if you hit a stack overflow in Java, C#, JavaScript, Ruby, Python, they all definitely let you gracefully recover - and normally they translate it to a normal exception

view this post on Zulip Richard Feldman (Aug 02 2025 at 01:23):

I don't know about Go but given its approach to stacks I assume so too

view this post on Zulip Richard Feldman (Aug 02 2025 at 01:23):

like a whole webserver going down because one request handler stack overflowed would be a deal-breaker I think

view this post on Zulip Richard Feldman (Aug 02 2025 at 01:24):

and if we solve it for hosts in that use case, then we have a solution for all hosts :smile:

view this post on Zulip Brendan Hansknecht (Aug 02 2025 at 01:30):

I don't know about Go but given its approach to stacks I assume so too

Go doesn't have stack overflows. Just OOMs. Stack will keep growing until OOM. At least that is my recollection

view this post on Zulip Richard Feldman (Aug 02 2025 at 01:31):

ahh right

view this post on Zulip Brendan Hansknecht (Aug 02 2025 at 01:31):

Also, it might just be the case that we have to hack something like mimalloc or tcmalloc to be coroutine local instead of thread local. That may work to avoid the overhead, but still give good cleanup.

view this post on Zulip Richard Feldman (Aug 02 2025 at 01:31):

ooh interesting!


Last updated: Aug 17 2025 at 12:14 UTC