Hi, sorry to write such a message, but it feels like I'm stuck. I was attempting to write a simple compiler, and Roc seemed like it would be fun to try it in. At some point I started running into compiler bugs that, after many attempts, don't seem to be circumventable. Here's my repo if anyone wishes to reproduce this: https://ass.si/git/Adrian/CompilerV3/src/commit/6cd097a3d58c94197b2a76e1d291545655e69043/main.roc
Upon running roc build, the output is:
thread '<unknown>' has overflowed its stack
fatal runtime error: stack overflow
Aborted (core dumped)
Thanks in advance to everyone who looks into this :)
No need to apologize when asking for help @Adrian :)
Is there a specific piece of code that was added and produced the issue?
Anton said:
No need to apologize when asking for help @Adrian :)
Is there a specific piece of code that was added and produced the issue?
Not really, I had been performing a ton of removals and additions, and at some point I couldn't get the errors to go away
Does roc check result in the same issue?
The apology is for writing such a demoralizing error report. This isn't really something I'd want to receive, and something I'd want to debug even less. Though this might be on me for not being familiar with the compiler and for getting Roc from the AUR instead of compiling it myself.
Anton said:
Does roc check result in the same issue?
Nope, it works
When minimizing the code, I found that changing oneOf [fnP] to fnP results in a drop of build time from 11918 ms to 232 ms; there is also a ~7 GiB difference in RAM consumption between the two. Do you happen to have seen behavior like this before @Brendan Hansknecht?
oh wow, definitely need to get a flamegraph of that diff
Yeah, good idea
Basically all time is spent in recursive calls of reify_layout_slice and reify_recursive_layout
Screenshot_20240706_203114.png.
I can also reproduce the overflow without fnP (just for reference). I would guess that the same problem is at play
@Ayaz Hafiz that function would be trying to unify lambdaset layouts? Right?
it builds the runtime representation of the lambda set, so no unification
Ah ok
Adrian said:
I can also reproduce the overflow without fnP (just for reference). I would guess that the same problem is at play
Thanks @Adrian, that allowed me to do a lot of minimization:
app [main] {
    pf: platform "https://github.com/roc-lang/basic-cli/releases/download/0.10.0/vNe6s9hWzoTZtFmNkvEICPErI9ptji_ySjicO6CkucY.tar.br",
    parser: "https://github.com/lukewilliamboswell/roc-parser/releases/download/0.7.1/MvLlME9RxOBjl0QCxyn3LIaoG9pSlaNxCa-t3BfbPNc.tar.br",
}

import pf.Task
import parser.Core exposing [
    Parser,
    const,
    keep,
    oneOf,
    skip,
]
import parser.String exposing [Utf8, parseUtf8, string]

main =
    input : List U8
    input = []
    parsed = parseUtf8 parser input
    Task.ok {}

parser =
    oneOf [includeP, whiteP]

includeP =
    const \x -> Include x
    |> keep (string "blabla")
    |> skip whiteP # comment this line out to prevent the stack overflow

whitesp = \x -> "\n \r\t" |> Str.toUtf8 |> List.contains x

whiteP : Parser Utf8 [Wsp]_
whiteP = const Wsp |> skip (Core.chompWhile whitesp)
When I look at the call stack with gdb I see this endlessly repeated:
roc_mono::layout::layout_from_flat_type
roc_mono::layout::Layout::from_var
roc_mono::layout::Layout::from_var
roc_mono::layout::LambdaSet::from_var
roc_mono::layout::layout_from_flat_type
roc_mono::layout::Layout::from_var
roc_mono::layout::Layout::from_var
roc_mono::layout::LambdaSet::from_var
roc_mono::layout::layout_from_flat_type
Also found related (closed) issues: #3444 and #3449
I made #6890 for this issue.
Adrian has marked this topic as resolved.
I also made #6891 for the other issue I encountered with slow compilation and high RAM usage.
Anton said:
When minimizing the code, I found that changing oneOf [fnP] to fnP results in a drop of build time from 11918 ms to 232 ms; there is also a ~7 GiB difference in RAM consumption between the two. Do you happen to have seen behavior like this before Brendan Hansknecht?
In case it is helpful, on my M1 mac RTL takes about 30s to build with --optimize and about 900ms without. I'd be happy to generate a flamegraph but not sure how.
cargo install flamegraph
flamegraph --root -- ....
On mac
roc_build_flamegraph.svg
roc_build_optimize_flamegraph.svg
Do you build roc from source? Can you build it with --profile release-with-debug? That will make the flamegraph a lot more meaningful
Also, just ran it locally:
73% of time spent in morphic. Most of rest spent in llvm.
Very very deep stack trace:
analyze_func -> analyze_graph -> analyze_block_scc -> analyze_block -> analyze_value -> get_analysis -> analyze_func
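That mutual recursion (analyze_func eventually re-entering analyze_func) is the classic shape of an unguarded cycle in a call-graph analysis. As a toy sketch only (function names invented, not morphic's real code), a visited set is the usual way such a traversal stays bounded on cyclic graphs:

```rust
use std::collections::HashSet;

// Toy sketch: a recursive analysis over a call graph, where `calls[f]`
// lists the functions that `f` calls. Without the `in_progress` guard,
// a cycle like 0 -> 1 -> 0 would recurse until the stack overflows;
// with it, each function is entered at most once per pass.
fn analyze(func: usize, calls: &[Vec<usize>], in_progress: &mut HashSet<usize>, entered: &mut usize) {
    if !in_progress.insert(func) {
        return; // already visited: break the cycle instead of recursing
    }
    *entered += 1;
    for &callee in &calls[func] {
        analyze(callee, calls, in_progress, entered);
    }
}

fn main() {
    // Function 0 calls 1, and 1 calls 0: a two-function cycle.
    let calls = vec![vec![1], vec![0]];
    let mut seen = HashSet::new();
    let mut entered = 0;
    analyze(0, &calls, &mut seen, &mut entered);
    println!("{entered}"); // prints 2: each function analyzed exactly once
}
```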
cc @Folkert de Vries just in case he has any thoughts. I have never really dug into morphic.
Would be good to file a bug.
Brendan Hansknecht said:
Do you build roc from source? Can you build it with --profile release-with-debug? That will make the flamegraph a lot more meaningful
Good to know, I can build it from source if the need arises again
Though masked in practically infinite recursion (which might be the bug, not sure), it looks like the execution time actually breaks down to a lot of time spent in primitives (all the hash set operations).
about 7/8ths of the time spent in morphic is spent in hashbrown. About half the time spent in morphic is specifically spent in HashSet::insert
Not sure if any of this is actionable or just a fact of how morphic works though. Just giving the breakdown I see.
@Isaac Van Doren RTL is a super recursive parser, right?
Yes it is very recursive
As for the morphic functions that are calling into hashbrown and eating up all the time:
ForwardState::merge_slots -> ForwardState::copy_data -> ForwardState::copy_aliases -> ForwardState::add_alias -> HashSet::insert
ForwardState::scc_summarize -> Iterator::collect
ForwardState::merge_slots -> HashSet::insert / HashSet::remove directly

Not that I have any concrete idea for sizes, but I think a solid bit of time could be recovered if we start optimizing the starting size of hash sets and hash maps. Resizing them is extra expensive because everything has to be rehashed (if they are small enough, it might be worth trying normal sets/maps without hashing, or btrees).
So we probably need to profile how large these containers get in morphic and pick better default sizes.
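As a rough illustration of the pre-sizing idea (not morphic's actual code; the function and sizes here are invented): constructing a hash set with a capacity hint avoids the repeated grow-and-rehash cycles that a default-sized set goes through as it fills up.

```rust
use std::collections::HashSet;

// Hypothetical example: if profiling shows a set typically holds ~n
// entries, reserving that capacity up front means inserts never trigger
// a reallocation-plus-rehash of every existing element.
fn collect_aliases(n: usize) -> HashSet<u64> {
    // Default: HashSet::new() starts tiny and rehashes on every growth.
    // Pre-sized: room for at least `n` elements from the start.
    let mut aliases: HashSet<u64> = HashSet::with_capacity(n);
    for i in 0..n as u64 {
        aliases.insert(i);
    }
    aliases
}

fn main() {
    let aliases = collect_aliases(1_000);
    // Capacity was reserved up front, so no resize happened mid-loop.
    assert!(aliases.capacity() >= 1_000);
    println!("{}", aliases.len()); // prints 1000
}
```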
Oh also, I wonder if we can parallelize some of this morphic work for perf gains? I think it basically spends the entire time maxing out a single core.
After tons of stack collapsing, this is where the time is actually spent:
Screenshot-2024-07-14-at-1.00.19AM.png
Last updated: Jul 06 2025 at 12:14 UTC