Stream: compiler development

Topic: zig compiler perf tracking


Anton (Jul 09 2025 at 10:35):

If I run roc check New-List.roc it kills my terminal after a while :p

Turns out this is because it's trying to eat all my RAM (64 GB)

Luke Boswell (Jul 09 2025 at 10:36):

Yeah, we need to implement some kind of cut-off or streaming for the problem reports

Anton (Jul 09 2025 at 10:37):

It's strange that Brendan didn't experience any issues; he's probably on macOS

Luke Boswell (Jul 09 2025 at 10:37):

He mentioned that he did

Luke Boswell (Jul 09 2025 at 10:38):

He said there were thousands of errors, I recall

Anton (Jul 09 2025 at 10:38):

#compiler development > casual conversation @ 💬

Anton (Jul 09 2025 at 10:38):

I meant memory issues; lots of Roc errors are OK for testing our perf

Anton (Jul 09 2025 at 10:40):

60s -> generating diagnostic from can (this is after my fix that made this part way way faster)

Perhaps this perf fix is not yet merged in @Brendan Hansknecht?

Anton (Jul 09 2025 at 10:41):

Ok no, looks like it's merged in https://github.com/roc-lang/roc/pull/7938

Luke Boswell (Jul 09 2025 at 10:43):

Screenshot 2025-07-09 at 20.42.21.png

Luke Boswell (Jul 09 2025 at 10:43):

I killed it at 8GB

Luke Boswell (Jul 09 2025 at 10:43):

./zig-out/bin/roc check ~/Documents/New-List.roc

Anton (Jul 09 2025 at 10:44):

Hmm, could this difference in behavior be due to a recent commit?

Luke Boswell (Jul 09 2025 at 10:44):

Definitely could be

Anton (Jul 09 2025 at 10:44):

I'll try with Brendan's perf fix PR

Luke Boswell (Jul 09 2025 at 10:45):

I'm running on my PR branch, which includes the module caching stuff, but other than that I can't think of any significant changes since Brendan's analysis

Anton (Jul 09 2025 at 10:45):

Could be due to zig 0.14.1 instead of 0.14.0

Luke Boswell (Jul 09 2025 at 10:49):

Running with a smaller file https://gist.github.com/lukewilliamboswell/532f48c70cfc3bca866c239cad291378

Luke Boswell (Jul 09 2025 at 10:49):

Looks like maybe PackedDataSpan is an issue, or at least we exceeded the assumptions we made using that.

Anton (Jul 09 2025 at 10:51):

Anton said:

I'll try with Brendan's perf fix PR

That gives me a panic:

thread 12630 panic: reached unreachable code
/home/username/Downloads/zig-linux-x86_64-0.14.0/lib/std/debug.zig:522:14: 0x109d64d in assert (roc)
    if (!ok) unreachable; // assertion failure
             ^
/home/username/gitrepos/roc/src/check/canonicalize/NodeStore.zig:1274:33: 0x120e3fa in addExpr (roc)
                std.debug.assert(PackedDataSpan.FunctionArgs.canFit(args.span));

Anton (Jul 09 2025 at 10:51):

Luke Boswell said:

Running with a smaller file https://gist.github.com/lukewilliamboswell/532f48c70cfc3bca866c239cad291378

Oh yeah, that's my panic too :p

Luke Boswell (Jul 09 2025 at 10:53):

I added that PackedDataSpan ... but definitely wasn't thinking about mammoth files like this when I did that

Luke Boswell (Jul 09 2025 at 10:56):

That's 47,028 dot access expressions (List.something)! Each one creates an e_dot_access node in the CIR, and if each has arguments, it needs to store a span. This is exactly what's causing the memory explosion.

Summary

Root Cause

  1. File Structure: The 10MB file contains ~47,000 dot access expressions (like List.len, List.get_unsafe, etc.)
  2. Memory Explosion: Each dot access expression creates an e_dot_access node that requires storing argument spans
  3. Limit Exceeded: The PackedDataSpan.FunctionArgs configuration (20 bits start, 12 bits length) can only handle start positions up to ~1M, but the large number of expressions causes the data structure indices to exceed this limit
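A minimal sketch of the 20/12-bit split described above, written as a Zig packed struct. `PackedSpan`, `canFit`, and `pack` are hypothetical names for illustration, not Roc's actual `PackedDataSpan` API. 20 bits allow start offsets up to 2^20 - 1 = 1,048,575 and 12 bits allow lengths up to 4,095, so once the underlying array grows past roughly a million entries, every new span fails the fit check.

```zig
const std = @import("std");

/// Hypothetical stand-in for a packed span: 20 bits of start offset and
/// 12 bits of length in one u32. Not Roc's actual PackedDataSpan type.
pub const PackedSpan = packed struct(u32) {
    start: u20,
    len: u12,

    /// Largest start offset that fits: 2^20 - 1 = 1_048_575.
    pub const max_start: u32 = std.math.maxInt(u20);
    /// Largest length that fits: 2^12 - 1 = 4_095.
    pub const max_len: u32 = std.math.maxInt(u12);

    /// True if (start, len) can be stored without truncation.
    pub fn canFit(start: u32, len: u32) bool {
        return start <= max_start and len <= max_len;
    }

    /// Packs the span, or returns null when it does not fit, so the caller can
    /// fall back to a wider representation instead of hitting an assertion.
    pub fn pack(start: u32, len: u32) ?PackedSpan {
        if (!canFit(start, len)) return null;
        return PackedSpan{ .start = @intCast(start), .len = @intCast(len) };
    }
};
```

Returning an optional instead of asserting is just one possible design choice; it lets a caller fall back to an unpacked span rather than panicking on very large files.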

Luke Boswell (Jul 09 2025 at 10:58):

I've got an idea

Luke Boswell (Jul 09 2025 at 11:24):

https://github.com/roc-lang/roc/pull/7980

Luke Boswell (Jul 09 2025 at 11:27):

My idea was to stop creating new malformed nodes after a certain threshold and just re-use one node.

Luke Boswell (Jul 09 2025 at 11:28):

That doesn't seem to have completely solved our problem

Luke Boswell (Jul 09 2025 at 11:54):

I think I found another problem causing exponential growth in CIR nodes

Luke Boswell (Jul 09 2025 at 12:08):

Something strange is definitely happening somewhere. I'm using ~3GB for around 1_000_000 nodes.

Luke Boswell (Jul 09 2025 at 12:24):

Some stats https://gist.github.com/lukewilliamboswell/cc52a944807ef16cd357356eda438ecb

Richard Feldman (Jul 09 2025 at 13:17):

Total CIR nodes created: 1023040

that's fascinating - so this is about 1M CIR nodes for about 1M LoC of non-comments

Richard Feldman (Jul 09 2025 at 13:18):

I would have assumed a much higher average of CIR nodes per line, even with a lot of lines being just closing delimiters

Richard Feldman (Jul 09 2025 at 13:18):

or blank lines I guess

Richard Feldman (Jul 09 2025 at 13:22):

that's wild bc it suggests 16-bit indices would work for modules up to like 30-60K LoC depending on how much type instantiation was happening
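For reference, the arithmetic behind that estimate, taking the roughly 1 CIR node per line measured above (1,023,040 nodes for about 1M LoC) and assuming, purely for illustration, that type instantiation could up to double the node count:

$$
\frac{2^{16}\ \text{indices}}{1\ \text{node/LoC}} = 65{,}536\ \text{LoC}, \qquad \frac{2^{16}\ \text{indices}}{2\ \text{nodes/LoC}} = 32{,}768\ \text{LoC}
$$

which lands roughly in the 30-60K LoC range.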

Luke Boswell (Jul 09 2025 at 20:19):

After sleeping on this issue, I think I know the problem. It may be easy to fix our memory problem.

Brendan Hansknecht (Jul 09 2025 at 20:32):

When I measured, I think I saw 3GB of RAM, which is still quite high, but nothing like 64

Luke Boswell (Jul 09 2025 at 22:32):

To fix this memory thing properly, this is what I think we need to do:

  1. Have a counter, and increment it every time we push a diagnostic.
  2. Once that counter hits a threshold (~10_000 errors or something), we no longer allocate any memory for new diagnostics.
  3. Instead, we use a placeholder malformed node (the same one can be re-used) that just says "TOO MANY ERRORS".

It's a bit of a mechanical change, but I think it's necessary so we don't keep allocating strings and random things that we'll never use or need later.
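One way the counter-and-placeholder idea could look, as a hedged sketch: `DiagnosticStore`, its `Diagnostic` union, and the `region_start`/`region_end` fields are hypothetical, and Roc's real `pushDiagnostic` works on its own CIR types. Once `limit` diagnostics are stored, the first overflow appends a single `too_many_errors` entry and every later push returns that same index without allocating.

```zig
const std = @import("std");

/// Hypothetical diagnostic payload; Roc's real CIR.Diagnostic has many more variants.
pub const Diagnostic = union(enum) {
    invalid_top_level_statement: struct { region_start: u32, region_end: u32 },
    too_many_errors,
};

/// Sketch of a diagnostic sink that stops growing once `limit` diagnostics exist.
pub const DiagnosticStore = struct {
    items: std.ArrayList(Diagnostic),
    limit: usize,
    /// Index of the shared "TOO MANY ERRORS" entry, created on first overflow.
    overflow_idx: ?usize = null,
    /// Total pushes requested, including those folded into the placeholder.
    requested: usize = 0,

    pub fn init(gpa: std.mem.Allocator, limit: usize) DiagnosticStore {
        return .{ .items = std.ArrayList(Diagnostic).init(gpa), .limit = limit };
    }

    pub fn deinit(self: *DiagnosticStore) void {
        self.items.deinit();
    }

    /// Lets call sites skip interning strings or building payloads once full.
    pub fn isSaturated(self: *const DiagnosticStore) bool {
        return self.items.items.len >= self.limit;
    }

    /// Returns the index of the stored diagnostic, or of the shared placeholder.
    pub fn push(self: *DiagnosticStore, diag: Diagnostic) !usize {
        self.requested += 1;
        if (self.items.items.len < self.limit) {
            try self.items.append(diag);
            return self.items.items.len - 1;
        }
        if (self.overflow_idx) |idx| return idx; // re-use: no new allocation
        try self.items.append(.too_many_errors);
        self.overflow_idx = self.items.items.len - 1;
        return self.overflow_idx.?;
    }
};
```

`isSaturated` is there so call sites can avoid doing any work (string interning, region conversion) before pushing once the limit is reached, which is where most of the wasted allocation would otherwise happen.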

Luke Boswell (Jul 09 2025 at 22:37):

Here is an example of the culprit, which is common across Can. We're allocating a new string and allocating another Node in the store. Neither of these will ever be needed if we already have thousands of errors.

.crash => |crash_stmt| {
    // Not valid at top-level
    const string_idx = self.can_ir.env.strings.insert(self.can_ir.env.gpa, "crash");
    const region = self.parse_ir.tokenizedRegionToRegion(crash_stmt.region);
    self.can_ir.pushDiagnostic(CIR.Diagnostic{ .invalid_top_level_statement = .{
        .stmt = string_idx,
        .region = region,
    } });
    last_type_anno = null; // Clear on non-annotation statement
},

Luke Boswell (Jul 09 2025 at 22:40):

I'd like some thoughts on this before I try to implement it. It's a pretty mechanical change; I guess I could start with just the counter and a few high-priority areas, then gradually implement the rest in later PRs.

Richard Feldman (Jul 09 2025 at 22:47):

I think that's fine; lots of compilers do it, and it's not like anyone can usefully process 400K errors at once anyway :stuck_out_tongue:

Brendan Hansknecht (Jul 09 2025 at 23:14):

Hmm....is that the root cause of the memory issue?

Brendan Hansknecht (Jul 09 2025 at 23:16):

Also, aren't strings deduplicated? On top of that, static strings like "crash" should never be allocated at all
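Following the point about static strings, a hedged sketch of recording the statement kind as an enum tag rather than interning "crash" into the string store. The name comes from `@tagName` at render time, so pushing the diagnostic allocates nothing. `StmtKind`, `InvalidTopLevelStatement`, and `render` are illustrative names, not Roc's actual types.

```zig
const std = @import("std");

/// Hypothetical statement kinds; the real parser AST has its own tags.
pub const StmtKind = enum {
    crash,
    expect,
    dbg,

    /// Static name for rendering: @tagName yields a compile-time string, no allocation.
    pub fn name(self: StmtKind) []const u8 {
        return @tagName(self);
    }
};

/// Records *which* statement was invalid by tag instead of a copied string.
pub const InvalidTopLevelStatement = struct {
    stmt: StmtKind,
    region_start: u32,
    region_end: u32,
};

/// Renders the diagnostic straight into any writer when the report is produced.
pub fn render(diag: InvalidTopLevelStatement, writer: anytype) !void {
    try writer.print("INVALID STATEMENT: `{s}` is not allowed at the top level ({d}..{d})\n", .{
        diag.stmt.name(), diag.region_start, diag.region_end,
    });
}
```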

Brendan Hansknecht (Jul 09 2025 at 23:19):

Like, we still should limit diagnostics, but I don't think it is the root cause of the 64 GB OOM

Luke Boswell (Jul 10 2025 at 06:09):

Found another related issue... we are getting Regions that start at 0 and end somewhere in the middle of the file. When we slice those for our reports, we UTF-8 validate the whole thing, over and over again, thousands of times.

$ wasmtime --dir=. --profile=guest zig-out/bin/roc.wasm check New-List-Cutoff_10MB.roc
WARNING: Large region detected: start=0 end=43906 size=43906 - may cause slow UTF-8 validation
WARNING: Large region detected: start=0 end=44523 size=44523 - may cause slow UTF-8 validation
WARNING: Large region detected: start=0 end=50011 size=50011 - may cause slow UTF-8 validation
WARNING: Large region detected: start=0 end=99849 size=99849 - may cause slow UTF-8 validation
WARNING: Large region detected: start=0 end=100466 size=100466 - may cause slow UTF-8 validation
WARNING: Large region detected: start=0 end=105954 size=105954 - may cause slow UTF-8 validation
WARNING: Large region detected: start=0 end=155792 size=155792 - may cause slow UTF-8 validation
WARNING: Large region detected: start=0 end=156409 size=156409 - may cause slow UTF-8 validation
WARNING: Large region detected: start=0 end=161897 size=161897 - may cause slow UTF-8 validation
WARNING: Large region detected: start=0 end=211735 size=211735 - may cause slow UTF-8 validation

This goes on for thousands of lines.
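Independent of fixing the regions themselves, the repeated validation cost can be avoided by validating the source once. This is a minimal sketch assuming a hypothetical `Source` wrapper (not Roc's actual type). After a single `std.unicode.utf8ValidateSlice` pass over the file, any sub-slice whose start and end fall on code point boundaries is also valid UTF-8, so slicing a region for a report costs O(1) instead of O(region size).

```zig
const std = @import("std");

/// Hypothetical source wrapper: validates the whole file exactly once.
pub const Source = struct {
    bytes: []const u8,

    pub fn init(bytes: []const u8) error{InvalidUtf8}!Source {
        if (!std.unicode.utf8ValidateSlice(bytes)) return error.InvalidUtf8;
        return .{ .bytes = bytes };
    }

    /// Slices a region without re-validating. Because the whole buffer is valid,
    /// the slice is valid UTF-8 as long as both ends sit on code point boundaries.
    pub fn slice(self: Source, start: u32, end: u32) []const u8 {
        std.debug.assert(start <= end and end <= self.bytes.len);
        std.debug.assert(isBoundary(self.bytes, start) and isBoundary(self.bytes, end));
        return self.bytes[start..end];
    }

    /// A boundary is the end of input or any byte that is not a
    /// continuation byte (continuation bytes look like 0b10xxxxxx).
    fn isBoundary(bytes: []const u8, i: usize) bool {
        return i == bytes.len or (bytes[i] & 0xC0) != 0x80;
    }
};
```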

Luke Boswell (Jul 10 2025 at 06:50):

Here is a fix/workaround https://github.com/roc-lang/roc/pull/7995

Luke Boswell (Jul 10 2025 at 06:52):

This fixes the memory issues entirely for me. The profile for New-List-Cutoff_10MB.roc looks completely normal and is no longer dominated by UTF-8 validation. It runs to completion, though it prints out all 1,000 errors to the terminal.

Found 35197 error(s) and 17939 warning(s) in 21785.9 ms for New-List-Cutoff_10MB.roc.

Luke Boswell (Jul 10 2025 at 06:54):

Correction: I hit a debug assertion and it crashes. But in ReleaseSmall it runs fine. Time to track down the next issue, I guess :sweat_smile:

Brendan Hansknecht (Jul 10 2025 at 06:56):

I'm not really in favor of that workaround. I think we should just correct newlines to have proper region info. That, or we should correct parsing to not include newlines as the starting and ending tokens for nodes that we will report diagnostics on.

Brendan Hansknecht (Jul 10 2025 at 06:56):

This solution feels more bug prone and like it might just randomly bite us later

Brendan Hansknecht (Jul 10 2025 at 06:58):

I'm not sure if they should be... it may have been a performance optimization

Correct functionality before random performance optimizations that may or may not work.

Luke Boswell (Jul 10 2025 at 07:10):

I don't love this solution, but it's only temporary I think. It resolves the immediate memory issue, and is noisy (but not blocking) to help us track down any issues.

I've also resolved the debug assertion, and added a limit on the number of warnings we print.

Luke Boswell (Jul 10 2025 at 07:10):

I'm avoiding significant changes in the Parser while @Anthony Bullard works on his refactor.

Luke Boswell (Jul 10 2025 at 07:11):

I'm also totally up for giving them proper regions.

Luke Boswell (Jul 10 2025 at 07:14):

At least with this workaround, we can see the next perf issue which is our CIR diagnostics... it looks like they need the same treatment we just gave AST diagnostics. We should limit the amount we create and re-use a common malformed node after a certain number.

Joshua Warner (Jul 10 2025 at 19:14):

I would actually like to get rid of newline tokens altogether

Richard Feldman (Jul 10 2025 at 19:43):

yeah what do we still use them for?

Richard Feldman (Jul 10 2025 at 19:43):

formatter heuristics?

Joshua Warner (Jul 10 2025 at 19:45):

That's all I think

Joshua Warner (Jul 10 2025 at 19:46):

The parser completely skips them, and even so sometimes they cause weird bugs (or at least the potential for bugs) in the parser

Richard Feldman (Jul 10 2025 at 19:54):

fair, although @Anton made the point that not having any control over newlines (e.g. wanting to put blank lines between some assignments and not others) would be a pain.

is there some other way we could do that? e.g. use Region info in the formatter to scan for newlines between assignments and if there's more than 1 that means you want a blank line?

Kiryl Dziamura (Jul 10 2025 at 19:59):

Are newlines significant only in some particular places? It makes sense to keep them only where the user can control them to create gaps.

Also, could things like if/else have an is_multiline parameter? It would save tokens in such places. Maybe it's an obvious idea.

Kiryl Dziamura (Jul 10 2025 at 21:51):

Or, if the parse AST and CIR have the same indexes, newlines aren't needed at all (because they're no-ops). It's possible to have a parallel vector for newline gaps that contains only the indexes of the tokens after which the user wants a gap (on the other hand, is a u32 per gap not that great? But this collection would have no region info, so memory consumption would be 50% less)
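A rough sketch of that side vector, assuming u32 token indices and that tokens are visited in source order, so appends keep the list sorted. `BlankLineGaps` and its methods are hypothetical names, not existing Roc code.

```zig
const std = @import("std");

/// Hypothetical side table: sorted indices of tokens followed by a user-written
/// blank line. Only one u32 per gap is stored, with no region info.
pub const BlankLineGaps = struct {
    token_indices: std.ArrayList(u32),

    pub fn init(gpa: std.mem.Allocator) BlankLineGaps {
        return .{ .token_indices = std.ArrayList(u32).init(gpa) };
    }

    pub fn deinit(self: *BlankLineGaps) void {
        self.token_indices.deinit();
    }

    /// Called while tokenizing; indices must arrive in increasing order.
    pub fn recordGapAfter(self: *BlankLineGaps, token_idx: u32) !void {
        const items = self.token_indices.items;
        std.debug.assert(items.len == 0 or items[items.len - 1] < token_idx);
        try self.token_indices.append(token_idx);
    }

    /// Formatter query: lower-bound binary search over the sorted indices.
    pub fn hasGapAfter(self: *const BlankLineGaps, token_idx: u32) bool {
        const items = self.token_indices.items;
        var lo: usize = 0;
        var hi: usize = items.len;
        while (lo < hi) {
            const mid = lo + (hi - lo) / 2;
            if (items[mid] < token_idx) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo < items.len and items[lo] == token_idx;
    }
};
```

Since a formatter also walks tokens in order, a moving cursor into `token_indices` would make each lookup O(1); the binary search just keeps the sketch independent of traversal order.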

Luke Boswell (Jul 10 2025 at 22:24):

So I'm wondering if we should merge this workaround PR https://github.com/roc-lang/roc/pull/7995
I don't feel strongly about it; it was helpful for understanding why we were chewing up all that memory.

If we've got a solid plan to move forward with removing newlines or fixing regions, then our perf/memory issue won't be a problem for long, and would only delay our ability to use the profiler effectively until we resolve the underlying root cause.

Is anyone interested in fixing these?

Joshua Warner (Jul 10 2025 at 22:43):

is there some other way we could do that? e.g. use Region info in the formatter to scan for newlines between assignments and if there's more than 1 that means you want a blank line?

The formatter already needs to look at the original source to pull out comments, so of course it can still look there to check whether there's a double (or multiple) newline and preserve that if we want.
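A minimal sketch of that check, assuming tokens carry byte-offset regions into the original source (the `Region` type here is hypothetical). Counting raw newlines is not quite enough, because a comment on its own line also puts two newlines in the gap; this version only reports a blank line when a line fully inside the gap is empty or whitespace-only.

```zig
const std = @import("std");

/// Hypothetical token region: byte offsets into the original source.
pub const Region = struct { start: u32, end: u32 };

/// Returns true if the gap between two adjacent tokens contains a blank line.
/// Only lines fully enclosed by the gap count: the first segment is the rest of
/// `prev`'s line (it may hold a trailing comment) and the last segment is the
/// indentation before `next`.
pub fn blankLineBetween(source: []const u8, prev: Region, next: Region) bool {
    const gap = source[prev.end..next.start];
    var segments = std.mem.splitScalar(u8, gap, '\n');
    _ = segments.next(); // rest of the previous token's line
    var candidate: ?[]const u8 = null;
    while (segments.next()) |segment| {
        // `candidate` is followed by another newline, so it is a whole line.
        if (candidate) |line| {
            if (std.mem.trim(u8, line, " \t\r").len == 0) return true;
        }
        candidate = segment;
    }
    return false;
}
```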

Richard Feldman (Jul 10 2025 at 22:48):

yikes, that's how we're doing comments? :grimacing:

Joshua Warner (Jul 10 2025 at 22:50):

Tell me more about that 'yikes'

Joshua Warner (Jul 10 2025 at 22:51):

It avoids a whole bunch of issues with comment tracking and placement that the parser and AST are now freed from

Joshua Warner (Jul 10 2025 at 22:51):

All that complexity is concentrated in the formatter, which is the only place that cares about this

Joshua Warner (Jul 10 2025 at 22:52):

Consistent comment tracking was probably the #1 hardest thing to get right in the old parser

Richard Feldman (Jul 10 2025 at 23:03):

maybe I'm misremembering, but I thought the plan was going to be:

Richard Feldman (Jul 10 2025 at 23:03):

that sounded pretty simple to me, but maybe I'm misremembering or misunderstanding something :sweat_smile:

Richard Feldman (Jul 10 2025 at 23:04):

the yikes is just about re-tokenizing in the middle of formatting, sounds like a lot of extra processing to do

Richard Feldman (Jul 10 2025 at 23:04):

I totally agree that the way we did comments in the previous compiler should not be repeated

Joshua Warner (Jul 10 2025 at 23:05):

There's no retokenizing necessary; we're just looking at the space between tokens

Joshua Warner (Jul 10 2025 at 23:05):

(based on token regions)

Joshua Warner (Jul 10 2025 at 23:06):

Also no side-table necessary either!

Richard Feldman (Jul 10 2025 at 23:07):

I see, so I guess we're kind of assuming that the space between the tokens is all comments and/or whitespace, since those are the things we discarded

Richard Feldman (Jul 10 2025 at 23:07):

and everything else would have gotten an actual token
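Under that same assumption, comments can be recovered from a gap directly. A hedged sketch, assuming Roc's `#` line comments and a hypothetical `commentsInGap` helper: non-whitespace text that is not a comment is returned as an error rather than silently dropped, which is where error cases like a stray brace that never became a token would surface.

```zig
const std = @import("std");

/// Hypothetical helper: returns the comments in the source text between two
/// tokens, assuming that text holds only whitespace and `#` comments that run
/// to the end of their line. Returned slices point into `gap`.
pub fn commentsInGap(gpa: std.mem.Allocator, gap: []const u8) ![]const []const u8 {
    var comments = std.ArrayList([]const u8).init(gpa);
    errdefer comments.deinit();
    var lines = std.mem.splitScalar(u8, gap, '\n');
    while (lines.next()) |line| {
        const trimmed = std.mem.trim(u8, line, " \t\r");
        if (trimmed.len == 0) continue; // blank or indentation-only line
        // Any other text means something never made it into the token stream.
        if (trimmed[0] != '#') return error.UnexpectedTextInGap;
        try comments.append(trimmed);
    }
    return try comments.toOwnedSlice();
}
```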

Joshua Warner (Jul 10 2025 at 23:09):

It's possible there are some sneaky edge-cases there with errors :thinking:

Joshua Warner (Jul 10 2025 at 23:09):

e.g. I think mismatched braces currently don't make it into the token stream

Joshua Warner (Jul 10 2025 at 23:09):

(but they should)

Richard Feldman (Jul 10 2025 at 23:11):

yeah, makes sense!

Joshua Warner (Jul 10 2025 at 23:16):

@Luke Boswell

If we've got a solid plan to move forward with removing newlines

Let me take a swing at this to judge how hard this will actually be...

Luke Boswell (Jul 10 2025 at 23:20):

Sounds good :+1: thank you

Joshua Warner (Jul 11 2025 at 01:32):

Still a few bugs to fix up, but getting pretty close: https://github.com/roc-lang/roc/pull/8000

Joshua Warner (Jul 11 2025 at 01:32):

(nice round number there!)

Richard Feldman (Jul 11 2025 at 01:37):

over-8.jpg

Brendan Hansknecht (Jul 11 2025 at 02:47):

I like this way of going much more!!!

Brendan Hansknecht (Jul 12 2025 at 19:15):

This is a crazy histogram. What an absolutely crazy long tail

Screenshot 2025-07-12 at 12.15.13 PM.png

Brendan Hansknecht (Jul 12 2025 at 19:16):

This is the time per call to diagnosticToReport for one of the mega files that takes like 2 minutes to run.

Brendan Hansknecht (Jul 12 2025 at 19:22):

General question. Why do we make reports at all? Why don't we just stream the output diagnostics and print them right away? Why waste any allocations or memory at all building report objects?

Brendan Hansknecht (Jul 12 2025 at 19:23):

Seems that each report is ~1KB of memory use and requires like 10 allocations and 5 frees to make.

Joshua Warner (Jul 12 2025 at 19:27):

What is that a histogram of?

Brendan Hansknecht (Jul 12 2025 at 19:29):

Histogram of execution times of a single call to diagnosticToReport

Brendan Hansknecht (Jul 12 2025 at 19:30):

When checking one of my gigantic 1 million line of code files that makes a metric ton of errors

Richard Feldman (Jul 12 2025 at 19:42):

Brendan Hansknecht said:

General question. Why do we make reports at all? Why don't we just stream the output diagnostics and print them right away? Why waste any allocations or memory at all building report objects?

I think what we ideally want is:

Richard Feldman (Jul 12 2025 at 19:44):

so for example when writing reports to stderr I think it would be good to have an array of string buffers, one per module, and then when we decide to flush, we can do one pwritev to send them all to stderr in one syscall

Richard Feldman (Jul 12 2025 at 19:44):

but yeah I don't think there's any reason to make actual heap allocations for the reports, just for the string buffer so we're not making tons of tiny syscalls
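A minimal sketch of rendering diagnostics straight into one string buffer per module with no intermediate report objects, assuming a hypothetical `Diagnostic` that holds only a line number and a borrowed message. Flushing all module buffers in one syscall could then use a vectored write; plain `writev` is the usual choice for stderr, since `pwritev` takes an explicit file offset and fails with `ESPIPE` on non-seekable files such as pipes.

```zig
const std = @import("std");

/// Hypothetical diagnostic: just enough data to render, no owned strings.
pub const Diagnostic = struct {
    line: u32,
    message: []const u8, // a static string or a slice into the module's source
};

/// Renders each diagnostic directly into the module's output buffer, with no
/// intermediate report allocation. One buffer per module; flushing is left to
/// the caller (for example, a single vectored write over all module buffers).
pub fn renderInto(buffer: *std.ArrayList(u8), module_name: []const u8, diags: []const Diagnostic) !void {
    const writer = buffer.writer();
    for (diags) |d| {
        try writer.print("{s}:{d}: {s}\n", .{ module_name, d.line, d.message });
    }
}
```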

Brendan Hansknecht (Jul 12 2025 at 19:50):

I think currently we still have full source in memory, so reports don't necessarily need any allocations or strings, more just need instructions on how to render

Brendan Hansknecht (Jul 12 2025 at 19:51):

Also, I guess I considered diagnostics the list we would sort and pass to tools and what not

Brendan Hansknecht (Jul 12 2025 at 19:51):

So why also make a list of reports?

Brendan Hansknecht (Jul 12 2025 at 19:51):

But I guess they have richer info to pass to an lsp or something


Last updated: Jul 26 2025 at 12:14 UTC