Stream: beginners

Topic: ✔ Help with my program


view this post on Zulip Adrian (Jul 06 2024 at 12:43):

Hi, sorry to write such a message, but it feels like I'm stuck. I was attempting to write a simple compiler and roc seemed like it would be fun to try in. At some point i started running into compiler bugs, that after many attempts don't seem to be circumventable. Here's my repo if anyone wishes to reproduce this https://ass.si/git/Adrian/CompilerV3/src/commit/6cd097a3d58c94197b2a76e1d291545655e69043/main.roc .

view this post on Zulip Adrian (Jul 06 2024 at 12:43):

Upon running roc build the output is

thread '<unknown>' has overflowed its stack
fatal runtime error: stack overflow
Aborted (core dumped)

view this post on Zulip Adrian (Jul 06 2024 at 12:44):

Thanks to everyone who'd have looked into this in advance :)

view this post on Zulip Anton (Jul 06 2024 at 12:53):

No need to apologize when asking for help @Adrian :)

Is there a specific piece of code that was added and produced the issue?

view this post on Zulip Adrian (Jul 06 2024 at 14:09):

Anton said:

No need to apologize when asking for help @Adrian :)

Is there a specific piece of code that was added and produced the issue?

Not really, I had been preforming a ton of removals and additons and at some point i couldn't get the errors to go away

view this post on Zulip Anton (Jul 06 2024 at 14:12):

Does roc check result in the same issue?

view this post on Zulip Adrian (Jul 06 2024 at 14:13):

The apology is for writing such a demoralizing error report. This isn't really something I'd want to receive and something I'd even less want to debug. Tho this might be on me for not being familiar with the compiler and not compiling the roc myself but getting it from AUR.

view this post on Zulip Adrian (Jul 06 2024 at 14:13):

Anton said:

Does roc check result in the same issue?

Nope, it works

view this post on Zulip Adrian (Jul 06 2024 at 14:14):

view this post on Zulip Anton (Jul 06 2024 at 16:28):

When minimizing the code, I found that changing oneOf [fnP]to fnP results in a drop of build time from 11918 ms to 232 ms, there is also a ~7 GiB difference in RAM consumption between the two. Do you happen to have seen behavior like this before @Brendan Hansknecht?

view this post on Zulip Brendan Hansknecht (Jul 06 2024 at 16:31):

oh wow, definitely need to get a flamegraph of that diff

view this post on Zulip Anton (Jul 06 2024 at 16:31):

Yeah, good idea

view this post on Zulip Anton (Jul 06 2024 at 18:36):

Basically all time is spent in recursive calls of reify_layout_slice and reify_recursive_layout
Screenshot_20240706_203114.png.

view this post on Zulip Adrian (Jul 06 2024 at 19:59):

image.png

view this post on Zulip Adrian (Jul 06 2024 at 19:59):

I can also reproduce the overflow without fnP (Just for reference). I would guess that the same problem is at play

view this post on Zulip Brendan Hansknecht (Jul 06 2024 at 20:01):

@Ayaz Hafiz that function would be trying to unify lambdaset layouts? Right?

view this post on Zulip Ayaz Hafiz (Jul 06 2024 at 23:14):

it builds the runtime representation of the lambda set, so no unification

view this post on Zulip Brendan Hansknecht (Jul 07 2024 at 00:06):

Ah ok

view this post on Zulip Anton (Jul 08 2024 at 14:39):

I can also reproduce the overflow without fnP (Just for reference). I would guess that the same problem is at play

Thanks @Adrian, that allowed me to do a lot of minimization:

app [main] {
    pf: platform "https://github.com/roc-lang/basic-cli/releases/download/0.10.0/vNe6s9hWzoTZtFmNkvEICPErI9ptji_ySjicO6CkucY.tar.br",
    parser: "https://github.com/lukewilliamboswell/roc-parser/releases/download/0.7.1/MvLlME9RxOBjl0QCxyn3LIaoG9pSlaNxCa-t3BfbPNc.tar.br",
}

import pf.Task

import parser.Core exposing [
    Parser,
    const,
    keep,
    oneOf,
    skip,
]
import parser.String exposing [Utf8, parseUtf8, string]

main =
    input : List U8
    input = []

    parsed = parseUtf8 parser input

    Task.ok {}

parser =
    oneOf [includeP, whiteP]

includeP =
    const \x -> Include x
    |> keep (string "blabla")
    |> skip whiteP # comment this line out to prevent the stack overflow

whitesp = \x -> "\n \r\t" |> Str.toUtf8 |> List.contains x
whiteP : Parser Utf8 [Wsp]_
whiteP = const Wsp |> skip (Core.chompWhile whitesp)

view this post on Zulip Anton (Jul 08 2024 at 14:42):

When I look at the call stack with gdb I see this endlessly repeated:

roc_mono::layout::layout_from_flat_type
roc_mono::layout::Layout::from_var
roc_mono::layout::Layout::from_var
roc_mono::layout::LambdaSet::from_var
roc_mono::layout::layout_from_flat_type
roc_mono::layout::Layout::from_var
roc_mono::layout::Layout::from_var
roc_mono::layout::LambdaSet::from_var
roc_mono::layout::layout_from_flat_type

view this post on Zulip Anton (Jul 08 2024 at 14:46):

Also found related (closed) issues: #3444 and #3449

view this post on Zulip Anton (Jul 08 2024 at 16:08):

I made #6890 for this issue.

view this post on Zulip Notification Bot (Jul 08 2024 at 16:10):

Adrian has marked this topic as resolved.

view this post on Zulip Anton (Jul 08 2024 at 16:20):

I also made #6891 for the other issue I encountered with slow compilation and high RAM usage.

view this post on Zulip Isaac Van Doren (Jul 14 2024 at 02:26):

Anton said:

When minimizing the code, I found that changing oneOf [fnP]to fnP results in a drop of build time from 11918 ms to 232 ms, there is also a ~7 GiB difference in RAM consumption between the two. Do you happen to have seen behavior like this before Brendan Hansknecht?

In case it is helpful, on my M1 mac RTL takes about 30s to build with --optimize and about 900ms without. I'd be happy to generate a flamegraph but not sure how.

view this post on Zulip Brendan Hansknecht (Jul 14 2024 at 02:28):

cargo install flamegraph
flamegraph --root -- ....

view this post on Zulip Brendan Hansknecht (Jul 14 2024 at 02:29):

On mac

view this post on Zulip Isaac Van Doren (Jul 14 2024 at 02:37):

roc_build_flamegraph.svg
roc_build_optimize_flamegraph.svg

view this post on Zulip Brendan Hansknecht (Jul 14 2024 at 02:48):

Do you build roc from source? Can you build it with --profile release-with-debug? That will make the flamegraph a lot more meaningful

view this post on Zulip Brendan Hansknecht (Jul 14 2024 at 02:55):

Also, just ran it locally:

73% of time spent in morphic. Most of rest spent in llvm.

Very very deep stack trace:
analyze_func -> analyze_graph -> analyze_block_scc -> analyze_block -> analyze_value -> get_analysis -> analyze_func

view this post on Zulip Brendan Hansknecht (Jul 14 2024 at 02:56):

cc @Folkert de Vries just in case he has any thoughts. I have never really dug into morphic.

Would be good to file a bug.

view this post on Zulip Isaac Van Doren (Jul 14 2024 at 02:59):

Brendan Hansknecht said:

Do you build roc from source? Can you build it with --profile release-with-debug? That will make the flamegraph a lot more meaningful

Good to know, I can build it from source if the need arises again

view this post on Zulip Brendan Hansknecht (Jul 14 2024 at 03:10):

Though masked in practically infinite recursion (which might be the bug, not sure), it looks like the execution time actually breaks down to a lot of time spent in primitives (all the hash set operations).

view this post on Zulip Brendan Hansknecht (Jul 14 2024 at 03:12):

about 7/8ths of the time spent in morphic is spent in hashbrown. About half the time spent in morphic is specifically spent in HashSet::insert

view this post on Zulip Brendan Hansknecht (Jul 14 2024 at 03:13):

Not sure if any of this is actionable or just a fact of how morphic works though. Just giving the breakdown I see.

view this post on Zulip Brendan Hansknecht (Jul 14 2024 at 03:16):

@Isaac Van Doren RTL is a super recursive parser, right?

view this post on Zulip Isaac Van Doren (Jul 14 2024 at 03:17):

Yes it is very recursive

view this post on Zulip Brendan Hansknecht (Jul 14 2024 at 03:36):

As for morphic functions that are calling into hashbrown that are eating up all the time:

  1. ForwardState::merge_slots -> ForwardState::copy_data -> ForwardState::copy_aliases -> ForwardState::add_alias -> HashSet::insert
  2. ForwardState::scc_summarize -> Iterator::collect
  3. ForwardState::merge_slots -> HashSet::insert/HashSet::remove directly

view this post on Zulip Brendan Hansknecht (Jul 14 2024 at 04:07):

Not that I have any concrete idea for size, but I think that a solid bit of time could be recovered if we start optimizing the starting size of hashsets and hashmaps. Resizing them is extra expensive cause everything has to be rehashed (if they are small enough, might be worth trying normal sets/maps without hashing or btrees).

So probably need to profile how large contains get in morphic and get better default sizes.

view this post on Zulip Brendan Hansknecht (Jul 14 2024 at 04:29):

Oh also, I wonder if we can parallelize some of this morphic work for perf gains? I think it basically spends the entire time maxing a single core.

view this post on Zulip Brendan Hansknecht (Jul 14 2024 at 08:01):

After tons of stack collapsing, this is where the time is actually spent:

Screenshot-2024-07-14-at-1.00.19AM.png


Last updated: Jul 06 2025 at 12:14 UTC