Stream: contributing

Topic: Interpreter recursion crash


view this post on Zulip Devin Prothero (Nov 27 2025 at 02:47):

So I have been doing a little more investigation into the crash that I encountered yesterday. I had to build the platform shim with debug symbols in order to get a backtrace which required some modifications in cli/main.zig and cli/linker.zig. It looks like the exception actually occurs in translateTypeVar when we try to access a key in the translate_cache.

It was a bit difficult getting to this point because I had to manually modify the calls to llvm and lld-link to get a debuggable platform shim. Does it make sense to make this an option in the generatePlatformHostShim function for debugging? I could even see this being valuable for people developing their own platforms for debugging purposes.

Also I was wondering if this recursive fib test should be put into the snapshot tests folder or be added to one of the test platforms like int or str etc.

image.png
image.png

view this post on Zulip Luke Boswell (Nov 27 2025 at 03:38):

Sounds like something that would be very useful in future to have

view this post on Zulip Devin Prothero (Nov 27 2025 at 03:38):

Another update. It does actually look like its a stack depth bug. I updated to the latest main branch and the crash location changed but the call stack looks very similar.

view this post on Zulip Devin Prothero (Nov 27 2025 at 03:47):

Does this mean that evalExprMinimal and dispatchBinaryOp should be implemented without recursion so that they do not trigger a stack overflow, or is it better to ignore the problem for now and implement tail call recursion? I think that tail calls would only work in the trivial case like for Fibonacci where there is no more work after the recursive call correct?

view this post on Zulip Luke Boswell (Nov 27 2025 at 03:48):

I'm probably not the best person to comment on that question... but I know that trmc is the plan

view this post on Zulip Luke Boswell (Nov 27 2025 at 03:49):

I thought our interpreter was implemented without recursion though

view this post on Zulip Richard Feldman (Nov 27 2025 at 03:49):

I'm thinking of tail call optimization as something to do in 2026

view this post on Zulip Richard Feldman (Nov 27 2025 at 03:49):

right now I'm trying to focus on what's needed to unblock people doing Advent of Code in Roc if they want to

view this post on Zulip Richard Feldman (Nov 27 2025 at 03:50):

and since we already have for and while, stack-safe recursion isn't a blocker :smile:

view this post on Zulip Richard Feldman (Nov 27 2025 at 03:50):

Devin Prothero said:

Does this mean that evalExprMinimal and dispatchBinaryOp should be implemented without recursion so that they do not trigger a stack overflow

oh you mean recursion in the Zig code!

view this post on Zulip Richard Feldman (Nov 27 2025 at 03:51):

yeah it would be great if they could be stack-safe :smile:

view this post on Zulip Devin Prothero (Nov 27 2025 at 03:55):

Yeah the zig code for the interpreter is all recursive which causes the stack overflow. It might be a bit difficult to rewrite in a stack safe way since there are multiple mutually recursive functions all interacting. Some kind of state machine with ArrayLists to hold stack frames might make the most sense, but I would have to try it first and see.

view this post on Zulip Devin Prothero (Nov 27 2025 at 03:56):

I have written some toy languages before but usually I would compile to a bytecode and interpret that with a VM rather than directly interpreting expressions so iv'e just been trying to familiarize myself with the code.

view this post on Zulip Devin Prothero (Nov 27 2025 at 03:57):

Richard Feldman said:

and since we already have for and while, stack-safe recursion isn't a blocker :smile:

That makes a lot of sense, I think that is smart since AOC is just around the corner

view this post on Zulip Devin Prothero (Nov 27 2025 at 04:01):

Luke Boswell said:

I'm probably not the best person to comment on that question... but I know that trmc is the plan

Sorry I'm not familiar with trmc what is that?

view this post on Zulip Luke Boswell (Nov 27 2025 at 04:06):

https://jfmengels.net/modulo-cons/#tail-recursion-but-modulo-cons

view this post on Zulip Richard Feldman (Nov 27 2025 at 04:07):

relevant Roc-specific context: #contributing > Tail Recursion Modulo Cons and later https://github.com/roc-lang/roc/pull/5569

view this post on Zulip Richard Feldman (Nov 27 2025 at 04:07):

(and other follow-up PRs)

view this post on Zulip Devin Prothero (Nov 27 2025 at 04:07):

Thanks

view this post on Zulip Luke Boswell (Nov 27 2025 at 04:08):

Richard that zulip chat is a private group DM I think... I can't see it at least

view this post on Zulip Richard Feldman (Nov 27 2025 at 04:10):

oops sorry, fixed the link!

view this post on Zulip Brendan Hansknecht (Nov 27 2025 at 04:36):

Hmm. I wouldn't expect the interpreter to need this at all. I thought the interpreter was a big while loop that runs instructions and then pushes function calls onto the stack. If the interpreter itself is recursive for code execution it may be a general problem in large roc programs even with trmc.

view this post on Zulip Richard Feldman (Nov 27 2025 at 04:38):

yeah I was confused earlier - this is about our Zig code being recursive

view this post on Zulip Richard Feldman (Nov 27 2025 at 04:38):

instead of stack-safe

view this post on Zulip Richard Feldman (Nov 27 2025 at 04:38):

as I recall I went down a rabbit hole trying to do that at some point and backed off

view this post on Zulip Devin Prothero (Nov 27 2025 at 05:11):

From what I have seen so far it looks like CIR is modeled as an S-Expression so the most natural way to implement the interpreter is recursively. Here is a random example of the simple add test after the canonicalize pass.

(can-ir
    (d-let
        (p-assign (ident "addU8"))
        (e-lambda
            (args
                (p-assign (ident "a"))
                (p-assign (ident "b")))
            (e-binop (op "add")
                (e-lookup-local
                    (p-assign (ident "a")))
                (e-lookup-local
                    (p-assign (ident "b")))))
        (annotation
            (ty-fn (effectful false)
                (ty-lookup (name "U8") (builtin))
                (ty-lookup (name "U8") (builtin))
                (ty-lookup (name "U8") (builtin)))))
    (s-expect
        (e-binop (op "eq")
            (e-call
                (e-lookup-local
                    (p-assign (ident "addU8")))
                (e-num (value "1"))
                (e-num (value "2")))
            (e-num (value "3"))))
    (s-expect
        (e-binop (op "eq")
            (e-call
                (e-lookup-local
                    (p-assign (ident "addU8")))
                (e-num (value "0"))
                (e-num (value "10")))
            (e-num (value "10")))))

It seems like the problem arises once the call depth of the roc program reaches some arbitrary number because instructions like e-call and e-binopare intepreted by recursively calling evalExprMinimal and the like. I think that this means @Brendan Hansknecht is correct and that even non-recursive code will trigger this crash if you have deep enough function calls.

view this post on Zulip Richard Feldman (Nov 27 2025 at 05:12):

for sure! that's why it should be stack-safe :smile:

view this post on Zulip Richard Feldman (Nov 27 2025 at 05:12):

the only reason it isn't right now is just the amount of work it would take - although maybe it will turn out we need it for advent of code :sweat_smile:

view this post on Zulip Brendan Hansknecht (Nov 27 2025 at 06:37):

Yeah, I guess the current recursive nodes are not exactly interpreter friendly passed just being a tree walking interpreter

view this post on Zulip Brendan Hansknecht (Nov 27 2025 at 06:53):

I wonder if we will need to make a flat bytecode for it to work well.

view this post on Zulip Brendan Hansknecht (Nov 27 2025 at 06:55):

With anything tree, I think it is either expensive (push and popping state for every node), or recursive.

That said, if we just push state at function call boundaries (and loops), that may be enough

view this post on Zulip isaactfa (Nov 27 2025 at 12:47):

Devin Prothero said:

It was a bit difficult getting to this point because I had to manually modify the calls to llvm and lld-link to get a debuggable platform shim. Does it make sense to make this an option in the generatePlatformHostShim function for debugging? I could even see this being valuable for people developing their own platforms for debugging purposes.

Could you share how you got a debuggable build? I'm struggling to find the right build/link options to adjust.

view this post on Zulip Devin Prothero (Nov 27 2025 at 19:18):

isaactfa said:

Devin Prothero said:

It was a bit difficult getting to this point because I had to manually modify the calls to llvm and lld-link to get a debuggable platform shim. Does it make sense to make this an option in the generatePlatformHostShim function for debugging? I could even see this being valuable for people developing their own platforms for debugging purposes.

Could you share how you got a debuggable build? I'm struggling to find the right build/link options to adjust.

I had to make quite a few changes to get debugging to work I made a patch file that shows the important stuff tho.

debug_platform_builds.patch

Also I don't know if you are on windows but if you are then I would recommend the RadDebugger because it makes it easy to debug subprocesses. You can just turn on a setting and it will automatically hook into child processes which is important because the interpreter runs as a subprocess.

view this post on Zulip isaactfa (Nov 28 2025 at 07:07):

Thank you!


Last updated: Nov 28 2025 at 12:16 UTC