So I have been doing a little more investigation into the crash that I encountered yesterday. I had to build the platform shim with debug symbols in order to get a backtrace, which required some modifications in cli/main.zig and cli/linker.zig. It looks like the exception actually occurs in translateTypeVar when we try to access a key in the translate_cache.
It was a bit difficult getting to this point because I had to manually modify the calls to llvm and lld-link to get a debuggable platform shim. Does it make sense to make this an option in the generatePlatformHostShim function for debugging? I could even see this being valuable for people developing their own platforms for debugging purposes.
Also I was wondering if this recursive fib test should be put into the snapshot tests folder or be added to one of the test platforms like int or str etc.
Sounds like something that would be very useful in future to have
Another update. It does actually look like it's a stack depth bug. I updated to the latest main branch and the crash location changed, but the call stack looks very similar.
Does this mean that evalExprMinimal and dispatchBinaryOp should be implemented without recursion so that they do not trigger a stack overflow, or is it better to ignore the problem for now and implement tail call recursion? I think that tail calls would only work in the trivial case, like for Fibonacci, where there is no more work after the recursive call, correct?
I'm probably not the best person to comment on that question... but I know that trmc is the plan
I thought our interpreter was implemented without recursion though
I'm thinking of tail call optimization as something to do in 2026
right now I'm trying to focus on what's needed to unblock people doing Advent of Code in Roc if they want to
and since we already have for and while, stack-safe recursion isn't a blocker :smile:
Devin Prothero said:
Does this mean that evalExprMinimal and dispatchBinaryOp should be implemented without recursion so that they do not trigger a stack overflow
oh you mean recursion in the Zig code!
yeah it would be great if they could be stack-safe :smile:
Yeah, the Zig code for the interpreter is all recursive, which causes the stack overflow. It might be a bit difficult to rewrite in a stack-safe way since there are multiple mutually recursive functions all interacting. Some kind of state machine with ArrayLists to hold stack frames might make the most sense, but I would have to try it first and see.
I have written some toy languages before, but usually I would compile to a bytecode and interpret that with a VM rather than directly interpreting expressions, so I've just been trying to familiarize myself with the code.
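The "state machine with ArrayLists to hold stack frames" idea can be sketched roughly as follows. This is a minimal illustration in Python rather than Zig, and the `Num` and `BinOp` node types and the `eval_iterative` function are hypothetical stand-ins, not the real CIR or interpreter types. It only shows the general shape: pending work lives in a heap-allocated list, so nesting depth no longer consumes host stack.

```python
from dataclasses import dataclass

# Hypothetical expression nodes; these are NOT the real CIR types.
@dataclass
class Num:
    value: int

@dataclass
class BinOp:
    op: str          # "add" or "sub"
    left: object
    right: object

def eval_iterative(root):
    """Evaluate an expression tree with an explicit work stack instead of host recursion."""
    work = [("eval", root)]   # pending work items, standing in for stack frames
    values = []               # results of subexpressions evaluated so far
    while work:
        kind, payload = work.pop()
        if kind == "eval":
            if isinstance(payload, Num):
                values.append(payload.value)
            else:
                # Pushed in reverse so the order is: left, right, then apply.
                work.append(("apply", payload.op))
                work.append(("eval", payload.right))
                work.append(("eval", payload.left))
        else:  # kind == "apply"
            rhs = values.pop()
            lhs = values.pop()
            values.append(lhs + rhs if payload == "add" else lhs - rhs)
    return values.pop()

# Build a left-nested chain 50,000 levels deep without recursion.
expr = Num(0)
for _ in range(50_000):
    expr = BinOp("add", expr, Num(1))

result = eval_iterative(expr)
small = eval_iterative(BinOp("sub", Num(10), BinOp("add", Num(1), Num(2))))
```

With several mutually recursive evaluation functions, each one would presumably become its own kind of work item, which is where most of the rewrite effort would go.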
Richard Feldman said:
and since we already have for and while, stack-safe recursion isn't a blocker :smile:
That makes a lot of sense. I think that is smart since AOC is just around the corner.
Luke Boswell said:
I'm probably not the best person to comment on that question... but I know that trmc is the plan
Sorry, I'm not familiar with trmc. What is that?
https://jfmengels.net/modulo-cons/#tail-recursion-but-modulo-cons
relevant Roc-specific context: #contributing > Tail Recursion Modulo Cons and later https://github.com/roc-lang/roc/pull/5569
(and other follow-up PRs)
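For readers who have not met the term, here is a small hand-written sketch of what "tail recursion modulo cons" means. It is written in Python with a hypothetical `Cons` cell and is not how the Roc compiler implements the optimization. In `map_recursive` the recursive call is not in tail position because a cell is still constructed after it returns. The rewritten `map_trmc` allocates the cell first, leaves a hole in its tail, and fills that hole on the next loop iteration.

```python
class Cons:
    """A mutable cons cell; None stands for the empty list."""
    def __init__(self, head, tail=None):
        self.head = head
        self.tail = tail

def map_recursive(f, lst):
    # Not a tail call: the Cons is constructed AFTER the recursive call returns.
    if lst is None:
        return None
    return Cons(f(lst.head), map_recursive(f, lst.tail))

def map_trmc(f, lst):
    # Hand-applied "modulo cons" rewrite: allocate the cell first with a hole
    # in its tail, then loop and fill the hole on the next iteration.
    if lst is None:
        return None
    first = Cons(f(lst.head))
    hole = first
    lst = lst.tail
    while lst is not None:
        hole.tail = Cons(f(lst.head))
        hole = hole.tail
        lst = lst.tail
    return first

def from_list(xs):
    out = None
    for x in reversed(xs):
        out = Cons(x, out)
    return out

def to_list(lst):
    out = []
    while lst is not None:
        out.append(lst.head)
        lst = lst.tail
    return out

big = from_list(list(range(50_000)))
doubled = to_list(map_trmc(lambda x: x * 2, big))
small = to_list(map_recursive(lambda x: x + 1, from_list([1, 2, 3])))
```

The loop version uses constant stack space regardless of list length, while the recursive version needs one frame per element.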
Thanks
Richard that zulip chat is a private group DM I think... I can't see it at least
oops sorry, fixed the link!
Hmm. I wouldn't expect the interpreter to need this at all. I thought the interpreter was a big while loop that runs instructions and then pushes function calls onto the stack. If the interpreter itself is recursive for code execution, it may be a general problem in large Roc programs even with trmc.
yeah I was confused earlier - this is about our Zig code being recursive
instead of stack-safe
as I recall I went down a rabbit hole trying to do that at some point and backed off
From what I have seen so far it looks like CIR is modeled as an S-Expression so the most natural way to implement the interpreter is recursively. Here is a random example of the simple add test after the canonicalize pass.
(can-ir
  (d-let
    (p-assign (ident "addU8"))
    (e-lambda
      (args
        (p-assign (ident "a"))
        (p-assign (ident "b")))
      (e-binop (op "add")
        (e-lookup-local
          (p-assign (ident "a")))
        (e-lookup-local
          (p-assign (ident "b")))))
    (annotation
      (ty-fn (effectful false)
        (ty-lookup (name "U8") (builtin))
        (ty-lookup (name "U8") (builtin))
        (ty-lookup (name "U8") (builtin)))))
  (s-expect
    (e-binop (op "eq")
      (e-call
        (e-lookup-local
          (p-assign (ident "addU8")))
        (e-num (value "1"))
        (e-num (value "2")))
      (e-num (value "3"))))
  (s-expect
    (e-binop (op "eq")
      (e-call
        (e-lookup-local
          (p-assign (ident "addU8")))
        (e-num (value "0"))
        (e-num (value "10")))
      (e-num (value "10")))))
It seems like the problem arises once the call depth of the Roc program reaches some arbitrary number, because instructions like e-call and e-binop are interpreted by recursively calling evalExprMinimal and the like. I think that this means @Brendan Hansknecht is correct and that even non-recursive code will trigger this crash if you have deep enough function calls.
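The failure mode described here can be reproduced in miniature. The sketch below is a Python analogy, not the Zig interpreter: `eval_recursive` is a hypothetical tree-walking evaluator, and Python's `RecursionError` (raised at the default recursion limit) stands in for the native stack overflow. The evaluated expression contains no recursion at all, only deep nesting, yet the evaluator still runs out of host stack.

```python
def eval_recursive(expr):
    """Tree-walking evaluator: one host stack frame per nested expression."""
    if isinstance(expr, int):
        return expr
    op, left, right = expr
    assert op == "add"
    return eval_recursive(left) + eval_recursive(right)

def nested_add(depth):
    # Build ("add", ("add", ..., 1), 1) iteratively, so building never recurses.
    expr = 0
    for _ in range(depth):
        expr = ("add", expr, 1)
    return expr

# Shallow nesting fits comfortably in the host stack.
shallow = eval_recursive(nested_add(100))

# Deep nesting exhausts it, even though the program being evaluated
# is a plain chain of additions with no recursion of its own.
try:
    eval_recursive(nested_add(100_000))
    overflowed = False
except RecursionError:
    overflowed = True
```

The depth at which a native interpreter fails depends on the frame size of each evaluation function and the size of the thread's stack, which would explain why the limit looks arbitrary.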
for sure! that's why it should be stack-safe :smile:
the only reason it isn't right now is just the amount of work it would take - although maybe it will turn out we need it for advent of code :sweat_smile:
Yeah, I guess the current recursive nodes are not exactly interpreter friendly past just being a tree-walking interpreter
I wonder if we will need to make a flat bytecode for it to work well.
With anything tree-shaped, I think it is either expensive (pushing and popping state for every node) or recursive.
That said, if we just push state at function call boundaries (and loops), that may be enough
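A rough sketch of the flat-bytecode option, with state pushed only at function call boundaries, might look like the following. The instruction set (`push`, `load`, `add`, `call`, `ret`), the `run` function, and the bytecode for `addU8` are all invented for illustration and do not correspond to anything in the Roc codebase. The point is only that a single loop plus a heap-allocated frame list handles arbitrarily deep calls.

```python
def run(functions, entry):
    """Run flat bytecode in a single loop; calls push frames onto a heap list."""
    frames = [(entry, 0, [])]   # (function name, program counter, arguments)
    stack = []                  # operand stack shared by all frames
    while True:
        name, pc, args = frames.pop()
        op, *operands = functions[name][pc]
        if op == "push":
            stack.append(operands[0])
        elif op == "load":
            stack.append(args[operands[0]])
        elif op == "add":
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(lhs + rhs)
        elif op == "call":
            callee, argc = operands
            split = len(stack) - argc
            call_args = stack[split:]
            del stack[split:]
            frames.append((name, pc + 1, args))    # where the caller resumes
            frames.append((callee, 0, call_args))  # start of the callee
            continue
        elif op == "ret":
            if not frames:
                return stack.pop()
            continue  # resume the caller frame saved by "call"
        frames.append((name, pc + 1, args))

# Hypothetical bytecode for the addU8 example: addU8(1, 2).
functions = {
    "addU8": [("load", 0), ("load", 1), ("add",), ("ret",)],
    "main": [("push", 1), ("push", 2), ("call", "addU8", 2), ("ret",)],
}
result = run(functions, "main")

# A chain of 50,000 distinct functions, each calling the next and adding 1.
depth = 50_000
chain = {f"f{i}": [("call", f"f{i + 1}", 0), ("push", 1), ("add",), ("ret",)]
         for i in range(depth)}
chain[f"f{depth}"] = [("push", 0), ("ret",)]
deep_result = run(chain, "f0")
```

Here expression evaluation inside a function body never touches the frame list; only `call` and `ret` do, which matches the idea of paying the bookkeeping cost per call rather than per node.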
Devin Prothero said:
It was a bit difficult getting to this point because I had to manually modify the calls to llvm and lld-link to get a debuggable platform shim. Does it make sense to make this an option in the generatePlatformHostShim function for debugging? I could even see this being valuable for people developing their own platforms for debugging purposes.
Could you share how you got a debuggable build? I'm struggling to find the right build/link options to adjust.
isaactfa said:
Devin Prothero said:
It was a bit difficult getting to this point because I had to manually modify the calls to llvm and lld-link to get a debuggable platform shim. Does it make sense to make this an option in the generatePlatformHostShim function for debugging? I could even see this being valuable for people developing their own platforms for debugging purposes.
Could you share how you got a debuggable build? I'm struggling to find the right build/link options to adjust.
I had to make quite a few changes to get debugging to work. I made a patch file that shows the important stuff, though.
Also, I don't know if you are on Windows, but if you are, then I would recommend the RadDebugger because it makes it easy to debug subprocesses. You can just turn on a setting and it will automatically hook into child processes, which is important because the interpreter runs as a subprocess.
Thank you!
Last updated: Nov 28 2025 at 12:16 UTC