zig compiler perf tracking · compiler development

I'm running on my PR branch which includes the module caching stuff, but other than that I can't think of any significant changes since Brendan's analysis

Anton (Jul 09 2025 at 10:45):

Could be due to zig 0.14.1 instead of 0.14.0

Luke Boswell (Jul 09 2025 at 10:49):

Running with a smaller file https://gist.github.com/lukewilliamboswell/532f48c70cfc3bca866c239cad291378

Luke Boswell (Jul 09 2025 at 10:49):

Looks like maybe PackedDataSpan is an issue, or at least we exceeded the assumptions we made using that.

Anton (Jul 09 2025 at 10:51):

Anton said:

I'll try with Brendan's perf fix PR

That gives me a panic:

thread 12630 panic: reached unreachable code
/home/username/Downloads/zig-linux-x86_64-0.14.0/lib/std/debug.zig:522:14: 0x109d64d in assert (roc)
    if (!ok) unreachable; // assertion failure
             ^
/home/username/gitrepos/roc/src/check/canonicalize/NodeStore.zig:1274:33: 0x120e3fa in addExpr (roc)
                std.debug.assert(PackedDataSpan.FunctionArgs.canFit(args.span));

Anton (Jul 09 2025 at 10:51):

Luke Boswell said:

Running with a smaller file https://gist.github.com/lukewilliamboswell/532f48c70cfc3bca866c239cad291378

Oh yeah, that's my panic too :p

Luke Boswell (Jul 09 2025 at 10:53):

I added that PackedDataSpan ... but definitely wasn't thinking about mammoth files like this when I did that

Luke Boswell (Jul 09 2025 at 10:56):

That's 47,028 dot access expressions (List.something)! Each one creates an e_dot_access node in the CIR, and if each has arguments, it needs to store a span. This is exactly what's causing the memory explosion.

Summary

Root Cause

File Structure: The 10MB file contains ~47,000 dot access expressions (like List.len, List.get_unsafe, etc.)
Memory Explosion: Each dot access expression creates an e_dot_access node that requires storing argument spans
Limit Exceeded: The PackedDataSpan.FunctionArgs configuration (20 bits start, 12 bits length) can only handle start positions up to ~1M, but the large number of expressions causes the data structure indices to exceed this limit
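
To make the 20-bit/12-bit limit concrete, here is a minimal Zig sketch of a span packed into a single u32. The names PackedSpan and tryPack are hypothetical and are not the actual PackedDataSpan code in the Roc repository; only the bit widths come from the summary above. It returns null when a span does not fit instead of asserting, so a caller could fall back to a wider representation.

```zig
const std = @import("std");

/// Hypothetical stand-in for a span packed into one u32:
/// 20 bits of start index plus 12 bits of length.
const PackedSpan = packed struct(u32) {
    start: u20,
    len: u12,

    const max_start: u32 = std.math.maxInt(u20); // 1_048_575, i.e. ~1M
    const max_len: u32 = std.math.maxInt(u12); // 4_095

    /// True when both values fit in their bit budgets.
    fn canFit(start: u32, len: u32) bool {
        return start <= max_start and len <= max_len;
    }

    /// Returns null instead of asserting, so the caller can choose a
    /// fallback when the store has grown past ~1M entries.
    fn tryPack(start: u32, len: u32) ?PackedSpan {
        if (!canFit(start, len)) return null;
        return PackedSpan{ .start = @intCast(start), .len = @intCast(len) };
    }
};
```

With roughly a million nodes in the store, a start index of 1_048_576 or more no longer fits, which matches the failed canFit assertion in the panic above.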

Luke Boswell (Jul 09 2025 at 10:58):

I've got an idea

Luke Boswell (Jul 09 2025 at 11:24):

https://github.com/roc-lang/roc/pull/7980

Luke Boswell (Jul 09 2025 at 11:27):

My idea was to stop creating new malformed nodes after a certain threshold and just re-use one node.

Luke Boswell (Jul 09 2025 at 11:28):

That doesn't seem to have completely solved our problem

Luke Boswell (Jul 09 2025 at 11:54):

I think I found another problem causing exponential growth in CIR nodes

Luke Boswell (Jul 09 2025 at 12:08):

Something strange is definitely happening somewhere. I'm using ~3GB for around 1_000_000 nodes.

Luke Boswell (Jul 09 2025 at 12:24):

Some stats https://gist.github.com/lukewilliamboswell/cc52a944807ef16cd357356eda438ecb

Richard Feldman (Jul 09 2025 at 13:17):

Total CIR nodes created: 1023040

that's fascinating - so this is about 1M CIR nodes for about 1M LoC of non-comments

Richard Feldman (Jul 09 2025 at 13:18):

I would have assumed a much higher average of CIR nodes per line, even with a lot of lines being just closing delimiters

Richard Feldman (Jul 09 2025 at 13:18):

or blank lines I guess

Richard Feldman (Jul 09 2025 at 13:22):

that's wild bc it suggests 16-bit indices would work for modules up to like 30-60K LoC depending on how much type instantiation was happening

Luke Boswell (Jul 09 2025 at 20:19):

After sleeping on this issue, I think I know the problem. It may be easy to fix our memory problem.

Brendan Hansknecht (Jul 09 2025 at 20:32):

When I measured, I think I saw 3GB of RAM, which is still quite high, but nothing like 64

Luke Boswell (Jul 09 2025 at 22:32):

To fix this memory thing properly. This is what I think we need to do.

Have a counter, and increment every time we push a diagnostic

Once that counter hits a threshold ~10_000 errors or something

We then no longer allocate any memory for new diagnostics. We use a placeholder malformed node (the same one can be re-used) that just says "TOO MANY ERRORS".

It's a bit of a mechanical change, but I think it's necessary so we don't keep allocating strings and random things that we'll never use or need later.
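
The counter-and-placeholder idea above could look roughly like the following Zig sketch. DiagnosticBudget, NodeIdx, and pushDiagnostic are hypothetical names for illustration, not the compiler's real API; the next_idx parameter stands in for whatever actually allocates a node in the store.

```zig
const std = @import("std");

/// Hypothetical index type for nodes in a store.
const NodeIdx = u32;

/// Sketch of a diagnostic budget: once `limit` diagnostics have been
/// recorded, callers get one shared placeholder node and allocate nothing.
const DiagnosticBudget = struct {
    limit: u32,
    count: u32 = 0,
    /// Index of the single pre-allocated "TOO MANY ERRORS" node.
    placeholder: NodeIdx,

    /// Returns true while the caller should still build a real diagnostic.
    fn shouldRecord(self: *DiagnosticBudget) bool {
        if (self.count >= self.limit) return false;
        self.count += 1;
        return true;
    }
};

/// Example call site: check the budget before allocating strings or nodes.
fn pushDiagnostic(budget: *DiagnosticBudget, next_idx: *NodeIdx) NodeIdx {
    if (!budget.shouldRecord()) return budget.placeholder;
    const idx = next_idx.*; // stand-in for a real node allocation
    next_idx.* += 1;
    return idx;
}
```

The point of checking the budget first is that the string insert and the node allocation are both skipped once the threshold is reached, rather than being created and then discarded.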

Luke Boswell (Jul 09 2025 at 22:37):

Here's an example of the culprit, which is common across Can. We're allocating a new string and allocating another Node in the store. Neither of these will ever be needed if we have thousands of errors already.

.crash => |crash_stmt| {
    // Not valid at top-level
    const string_idx = self.can_ir.env.strings.insert(self.can_ir.env.gpa, "crash");
    const region = self.parse_ir.tokenizedRegionToRegion(crash_stmt.region);
    self.can_ir.pushDiagnostic(CIR.Diagnostic{ .invalid_top_level_statement = .{
        .stmt = string_idx,
        .region = region,
    } });
    last_type_anno = null; // Clear on non-annotation statement
},

Luke Boswell (Jul 09 2025 at 22:40):

I'd like some thoughts on this before I try to implement it. It's a pretty mechanical change; I guess I could start with just the counter and just a few high-priority areas, then gradually implement the rest in later PRs.

Richard Feldman (Jul 09 2025 at 22:47):

I think that's fine; lots of compilers do it, and it's not like anyone can usefully process 400K errors at once anyway :stuck_out_tongue:

Brendan Hansknecht (Jul 09 2025 at 23:14):

Hmm....is that the root cause of the memory issue?

Brendan Hansknecht (Jul 09 2025 at 23:16):

Also, aren't strings deduplicated? On top of that, static strings like "crash" we should just never allocate ever

Brendan Hansknecht (Jul 09 2025 at 23:19):

Like we still should limit diagnostics, but I don't think it is the root cause of the 64 gb oom

Luke Boswell (Jul 10 2025 at 06:09):

Found another related issue... we are getting Regions that start at 0 and end somewhere in the middle of the file. When we slice that for our reports, we UTF-8 validate the whole thing, over and over again, thousands of times.

$ wasmtime --dir=. --profile=guest zig-out/bin/roc.wasm check New-List-Cutoff_10MB.roc
WARNING: Large region detected: start=0 end=43906 size=43906 - may cause slow UTF-8 validation
WARNING: Large region detected: start=0 end=44523 size=44523 - may cause slow UTF-8 validation
WARNING: Large region detected: start=0 end=50011 size=50011 - may cause slow UTF-8 validation
WARNING: Large region detected: start=0 end=99849 size=99849 - may cause slow UTF-8 validation
WARNING: Large region detected: start=0 end=100466 size=100466 - may cause slow UTF-8 validation
WARNING: Large region detected: start=0 end=105954 size=105954 - may cause slow UTF-8 validation
WARNING: Large region detected: start=0 end=155792 size=155792 - may cause slow UTF-8 validation
WARNING: Large region detected: start=0 end=156409 size=156409 - may cause slow UTF-8 validation
WARNING: Large region detected: start=0 end=161897 size=161897 - may cause slow UTF-8 validation
WARNING: Large region detected: start=0 end=211735 size=211735 - may cause slow UTF-8 validation

This goes on for thousands of lines.
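
One way to avoid repeated validation, shown here only as a sketch and not as what the linked PR does, is to validate the source once at load time and slice it afterwards without re-checking. ValidatedSource and Region are hypothetical types; std.unicode.utf8ValidateSlice is the standard library check. The sketch assumes region boundaries fall on code point boundaries, which should hold when they come from token boundaries.

```zig
const std = @import("std");

/// Hypothetical region: byte offsets into the source text.
const Region = struct { start: usize, end: usize };

/// Source text that was validated as UTF-8 exactly once, at load time.
const ValidatedSource = struct {
    bytes: []const u8,

    fn init(bytes: []const u8) error{InvalidUtf8}!ValidatedSource {
        if (!std.unicode.utf8ValidateSlice(bytes)) return error.InvalidUtf8;
        return ValidatedSource{ .bytes = bytes };
    }

    /// Slicing is O(1): no per-report re-validation of the region.
    fn slice(self: ValidatedSource, region: Region) []const u8 {
        std.debug.assert(region.start <= region.end);
        std.debug.assert(region.end <= self.bytes.len);
        return self.bytes[region.start..region.end];
    }
};
```

This only removes the repeated validation cost. It does not address the regions themselves being wrong (starting at 0), which is a separate problem.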

Luke Boswell (Jul 10 2025 at 06:50):

Here is a fix/workaround https://github.com/roc-lang/roc/pull/7995

Luke Boswell (Jul 10 2025 at 06:52):

This fixes the memory issues entirely for me. The profiling for New-List-Cutoff_10MB.roc looks completely normal and is no longer dominated by UTF-8 validation. It runs to completion, though it prints out all 1,000 errors to the terminal.

Found 35197 error(s) and 17939 warning(s) in 21785.9 ms for New-List-Cutoff_10MB.roc.

Luke Boswell (Jul 10 2025 at 06:54):

Correction .. I hit a debug assertion and it crashes. But in ReleaseSmall it runs fine. Time to track down the next issue I guess :sweat_smile:

Brendan Hansknecht (Jul 10 2025 at 06:56):

I'm not really for that workaround. I think we should just correct newlines to have proper region info. That, or we should correct parsing to not include newlines as the starting and ending tokens for nodes that we will report diagnostics on.

Brendan Hansknecht (Jul 10 2025 at 06:56):

This solution feels more bug prone and like it might just randomly bite us later

Brendan Hansknecht (Jul 10 2025 at 06:58):

I'm not sure if they should be... it may have been a performance optimization

Correct functionality before random performance optimizations that may or may not work.

Luke Boswell (Jul 10 2025 at 07:10):

I don't love this solution, but it's only temporary I think. It resolves the immediate memory issue, and is noisy (but not blocking) to help us track down any issues.

I've also resolved the debug assertion, and added a limit on the number of warnings we print.

Luke Boswell (Jul 10 2025 at 07:10):

I'm avoiding significant changes in the Parser while @Anthony Bullard works on his refactor.

Luke Boswell (Jul 10 2025 at 07:11):

I'm also totally up for giving them proper regions.

Luke Boswell (Jul 10 2025 at 07:14):

At least with this workaround, we can see the next perf issue which is our CIR diagnostics... it looks like they need the same treatment we just gave AST diagnostics. We should limit the amount we create and re-use a common malformed node after a certain number.

Joshua Warner (Jul 10 2025 at 19:14):

I would actually like to get rid of newline tokens altogether

Richard Feldman (Jul 10 2025 at 19:43):

yeah what do we still use them for?

Richard Feldman (Jul 10 2025 at 19:43):

formatter heuristics?

Joshua Warner (Jul 10 2025 at 19:45):

That's all I think

Joshua Warner (Jul 10 2025 at 19:46):

The parser completely skips them, and even so sometimes they cause weird bugs (or at least the potential for bugs) in the parser

Richard Feldman (Jul 10 2025 at 19:54):

fair, although @Anton made the point that not having any control over newlines (e.g. wanting to put blank lines between some assignments and not others) would be a pain.

is there some other way we could do that? e.g. use Region info in the formatter to scan for newlines between assignments and if there's more than 1 that means you want a blank line?

Kiryl Dziamura (Jul 10 2025 at 19:59):

Are newlines significant only in some particular places? It makes sense to keep them only where the user can control them to create gaps.

Also, can things like if/else have a parameter is_multiline? It would save tokens in such places. Maybe it's an obvious idea

Kiryl Dziamura (Jul 10 2025 at 21:51):

Or, if the parse AST and CIR have the same indexes, newlines aren't needed at all (because they're no-ops). It's possible to have a parallel vector for newline gaps that contains only the indexes of the tokens after which the user wants a gap (on the other hand, u32 per gap is not that great? But this collection would have no region info, so memory consumption would be 50% less)

Luke Boswell (Jul 10 2025 at 22:24):

So I'm wondering if we should merge this workaround PR https://github.com/roc-lang/roc/pull/7995 I don't feel strongly about it, it was helpful to understand why we were chewing up all that memory.

If we've got a solid plan to move forward with removing newlines or fixing regions, then our perf/memory issue won't be a problem for long, and it would only delay our ability to use the profiler effectively until we resolve the underlying root cause.

Is anyone interested in fixing these?

Joshua Warner (Jul 10 2025 at 22:43):

is there some other way we could do that? e.g. use Region info in the formatter to scan for newlines between assignments and if there's more than 1 that means you want a blank line?

The formatter already needs to look at the original source to pull out comments, so of course it can still look there to check whether there's a double (or multiple) newline and preserve that if we want.
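
A minimal Zig sketch of that check, assuming the formatter has the original source and the byte offsets of two adjacent tokens. The function name and parameters are hypothetical. It looks for a whitespace-only line in the gap rather than just counting newlines, so a comment line sitting between two statements is not mistaken for a blank line.

```zig
const std = @import("std");

/// Returns true when the gap between two adjacent tokens contains a
/// blank (whitespace-only) line. `prev_end` is the exclusive end offset
/// of the earlier token; `next_start` is the start of the later one.
fn hasBlankLineBetween(source: []const u8, prev_end: usize, next_start: usize) bool {
    var seen_newline = false; // a '\n' already ended an earlier line in the gap
    var line_is_blank = true; // only whitespace seen since the last '\n'
    for (source[prev_end..next_start]) |byte| {
        switch (byte) {
            '\n' => {
                if (seen_newline and line_is_blank) return true;
                seen_newline = true;
                line_is_blank = true;
            },
            ' ', '\t', '\r' => {},
            else => line_is_blank = false, // comment text or other content
        }
    }
    return false;
}
```

Because the gap between tokens holds only whitespace and comments, this needs no newline tokens and no re-tokenizing, only the token regions.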

Richard Feldman (Jul 10 2025 at 22:48):

yikes, that's how we're doing comments? :grimacing:

Joshua Warner (Jul 10 2025 at 22:50):

Tell me more about that 'yikes'

Joshua Warner (Jul 10 2025 at 22:51):

It avoids a whole bunch of issues with comment tracking and placement that the parser and AST are now freed from

Joshua Warner (Jul 10 2025 at 22:51):

All that complexity is concentrated in the formatter, which is the only place that cares about this

Joshua Warner (Jul 10 2025 at 22:52):

Consistent comment tracking was probably the #1 hardest thing to get right in the old parser

Richard Feldman (Jul 10 2025 at 23:03):

maybe I'm misremembering, but I thought the plan was going to be:

give the tokenizer a flag where it tokenizes comments into a side table by just storing their Regions (instead of discarding them, which it would do if the flag is false)
then when the formatter is going through emitting things, it can compare the current regions it's working with to those regions in the side table to figure out where the comments are and interleave them back in

Richard Feldman (Jul 10 2025 at 23:03):

that sounded pretty simple to me, but maybe I'm misremembering or misunderstanding something :sweat_smile:

Richard Feldman (Jul 10 2025 at 23:04):

the yikes is just about re-tokenizing in the middle of formatting, sounds like a lot of extra processing to do

Richard Feldman (Jul 10 2025 at 23:04):

I totally agree that the way we did comments in the previous compiler should not be repeated

Joshua Warner (Jul 10 2025 at 23:05):

There's no retokenizing necessary; we're just looking at the space between tokens

Joshua Warner (Jul 10 2025 at 23:05):

(based on token regions)

Joshua Warner (Jul 10 2025 at 23:06):

Also no side-table necessary either!

Richard Feldman (Jul 10 2025 at 23:07):

I see, so I guess we're kind of assuming that the space between the tokens is all comments and/or whitespace, since those are the things we discarded

Richard Feldman (Jul 10 2025 at 23:07):

and everything else would have gotten an actual token

Joshua Warner (Jul 10 2025 at 23:09):

It's possible there are some sneaky edge-cases there with errors :thinking:

Joshua Warner (Jul 10 2025 at 23:09):

e.g. I think mismatched braces currently don't make it into the token stream

Joshua Warner (Jul 10 2025 at 23:09):

(but they should)

Richard Feldman (Jul 10 2025 at 23:11):

yeah, makes sense!

Joshua Warner (Jul 10 2025 at 23:16):

@Luke Boswell

If we've got a solid plan to move forward with removing newlines

Let me take a swing at this to judge how hard this will actually be...

Luke Boswell (Jul 10 2025 at 23:20):

Sounds good :+1: thank you

Joshua Warner (Jul 11 2025 at 01:32):

Still a few bugs to fix up, but getting pretty close: https://github.com/roc-lang/roc/pull/8000

Joshua Warner (Jul 11 2025 at 01:32):

(nice round number there!)

Richard Feldman (Jul 11 2025 at 01:37):

over-8.jpg

Brendan Hansknecht (Jul 11 2025 at 02:47):

I like this way of going much more!!!

Brendan Hansknecht (Jul 12 2025 at 19:15):

This is a crazy histogram. What an absolutely crazy long tail

Screenshot 2025-07-12 at 12.15.13 PM.png

Brendan Hansknecht (Jul 12 2025 at 19:16):

This is the time per call to diagnosticToReport for one of the mega files that takes like 2 minutes to run.

Brendan Hansknecht (Jul 12 2025 at 19:22):

General question. Why do we make reports at all? Why don't we just stream the output diagnostics and print them right away? Why waste any allocations or memory at all building report objects?

Brendan Hansknecht (Jul 12 2025 at 19:23):

Seems that each report is ~1KB of memory use and requires like 10 allocations and 5 frees to make.

Joshua Warner (Jul 12 2025 at 19:27):

What is that a histogram of?

Brendan Hansknecht (Jul 12 2025 at 19:29):

Histogram of execution times of a single call to diagnosticToReport

Brendan Hansknecht (Jul 12 2025 at 19:30):

When checking one of my gigantic 1 million line of code files that makes a metric ton of errors

Richard Feldman (Jul 12 2025 at 19:42):

Brendan Hansknecht said:

General question. Why do we make reports at all? Why don't we just stream the output diagnostics and print them right away? Why waste any allocations or memory at all building report objects?

I think what we ideally want is:

some abstraction for how we format reports and where we output them, so we can e.g. stream to stderr or send to a language server or send to a UI in a browser or GUI app using Roc for plugins, and also we can use different formats (e.g. ANSI escape codes for terminal, html for browser, something else for editors)
buffering so that we aren't doing a gazillion syscalls
sorting based on module graph so that when you're rebuilding the same project over and over you don't see the same errors jumping around in terms of ordering; rather, they're stable in terms of ordering
streaming, so we flush the buffers as soon as we can (and decide to)

Richard Feldman (Jul 12 2025 at 19:44):

so for example when writing reports to stderr I think it would be good to have an array of string buffers, one per module, and then when we decide to flush, we can do one pwritev to send them all to stderr in one syscall

Richard Feldman (Jul 12 2025 at 19:44):

but yeah I don't think there's any reason to make actual heap allocations for the reports, just for the string buffer so we're not making tons of tiny syscalls
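
The per-module buffering and stable ordering described above might be sketched like this in Zig. ModuleReports and flushReports are hypothetical names, and the order field is an assumed stand-in for a position derived from the module graph. The sketch writes each buffer with one writeAll call instead of a single vectored pwritev, so it illustrates the ordering and buffering, not the syscall batching.

```zig
const std = @import("std");

/// Hypothetical per-module buffer of already-rendered report text.
const ModuleReports = struct {
    /// Position of the module in a stable ordering (assumed to come
    /// from the module graph).
    order: u32,
    text: []const u8,
};

fn lessThan(_: void, a: ModuleReports, b: ModuleReports) bool {
    return a.order < b.order;
}

/// Sorts the buffers by module order, then writes each one whole.
/// `writer` is anything that provides `writeAll([]const u8) !void`.
fn flushReports(modules: []ModuleReports, writer: anytype) !void {
    std.mem.sort(ModuleReports, modules, {}, lessThan);
    for (modules) |m| try writer.writeAll(m.text);
}
```

Sorting on a key that does not depend on which module finished checking first is what keeps the error order stable across rebuilds.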

Brendan Hansknecht (Jul 12 2025 at 19:50):

I think currently we still have full source in memory, so reports don't necessarily need any allocations or strings, more just need instructions on how to render

Brendan Hansknecht (Jul 12 2025 at 19:51):

Also, I guess I considered diagnostics the list we would sort and pass to tools and what not

Brendan Hansknecht (Jul 12 2025 at 19:51):

So why also make a list of reports

Brendan Hansknecht (Jul 12 2025 at 19:51):

But I guess they have richer info to pass to an lsp or something

Last updated: Jul 26 2025 at 12:14 UTC