If I run
roc check New-List.roc
it kills my terminal after a while :p
Turns out this is because it's trying to eat all my RAM (64 GB)
Yeah, we need to implement some kind of cut-off or streaming for the problem reports
It's strange that Brendan didn't experience any issues, he's probably on macos
He mentioned that he did
He said there were thousands of errors, I recall
I meant memory issues, lots of Roc errors is ok to test our perf
60s -> generating diagnostics from Can (this is after my fix that made this part way, way faster)
Perhaps this perf fix is not yet merged in @Brendan Hansknecht?
Ok no, looks like it's merged in https://github.com/roc-lang/roc/pull/7938
Screenshot 2025-07-09 at 20.42.21.png
I killed it at 8GB
./zig-out/bin/roc check ~/Documents/New-List.roc
Hmm, could this difference in behavior be due to a recent commit?
Definitely could be
I'll try with Brendan's perf fix PR
I'm running on my PR branch which includes the module caching stuff, but other than that I can't think of any significant changes since Brendan's analysis
Could be due to zig 0.14.1 instead of 0.14.0
Running with a smaller file https://gist.github.com/lukewilliamboswell/532f48c70cfc3bca866c239cad291378
Looks like maybe PackedDataSpan is an issue, or at least we exceeded the assumptions we made using that.
Anton said:
I'll try with Brendan's perf fix PR
That gives me a panic:
thread 12630 panic: reached unreachable code
/home/username/Downloads/zig-linux-x86_64-0.14.0/lib/std/debug.zig:522:14: 0x109d64d in assert (roc)
if (!ok) unreachable; // assertion failure
^
/home/username/gitrepos/roc/src/check/canonicalize/NodeStore.zig:1274:33: 0x120e3fa in addExpr (roc)
std.debug.assert(PackedDataSpan.FunctionArgs.canFit(args.span));
Luke Boswell said:
Running with a smaller file https://gist.github.com/lukewilliamboswell/532f48c70cfc3bca866c239cad291378
Oh yeah, that's my panic too :p
I added that PackedDataSpan ... but definitely wasn't thinking about mammoth files like this when I did that
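For context, the whole point of PackedDataSpan is to squeeze a span into 32 bits. It's roughly this shape (a sketch from memory, not the exact definition in NodeStore.zig):

const std = @import("std");

// A plain span as stored elsewhere: a start index plus a length.
const DataSpan = struct { start: u32, len: u32 };

// PackedDataSpan-style packing: start and length squeezed into one u32,
// 20 bits for start (~1M max) and 12 bits for length (4095 max).
const FunctionArgsSpan = packed struct(u32) {
    start: u20,
    len: u12,

    // False when a span won't fit in the packed fields; the NodeStore
    // asserts on this, which is the panic in addExpr above.
    fn canFit(span: DataSpan) bool {
        return span.start <= std.math.maxInt(u20) and span.len <= std.math.maxInt(u12);
    }

    fn fromDataSpan(span: DataSpan) FunctionArgsSpan {
        std.debug.assert(canFit(span));
        return .{ .start = @intCast(span.start), .len = @intCast(span.len) };
    }
};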
That's 47,028 dot access expressions (List.len, List.get_unsafe, etc.)! Each one creates an e_dot_access node in the CIR, and if it has arguments, it needs to store a span. That's exactly what's causing the memory explosion: every one of those call sites needs an e_dot_access node that stores argument spans, and the PackedDataSpan.FunctionArgs configuration (20 bits start, 12 bits length) can only handle start positions up to ~1M, so the sheer number of expressions pushes the data-structure indices past that limit.
I've got an idea
https://github.com/roc-lang/roc/pull/7980
My idea was to stop creating new malformed nodes after a certain threshold and just re-use one node.
That doesn't seem to have completely solved our problem
I think I found another problem causing exponential growth in CIR nodes
Something strange is definitely happening somewhere. I'm using ~3GB for around 1_000_000 nodes.
Some stats https://gist.github.com/lukewilliamboswell/cc52a944807ef16cd357356eda438ecb
Total CIR nodes created: 1023040
that's fascinating - so this is about 1M CIR nodes for about 1M LoC of non-comments
I would have assumed a much higher average of CIR nodes per line, even with a lot of lines being just closing delimiters
or blank lines I guess
that's wild bc it suggests 16-bit indices would work for modules up to like 30-60K LoC depending on how much type instantiation was happening
After sleeping on this issue, I think I know the problem. It may be easy to fix our memory problem.
When I measured, I think I saw 3GB of RAM, which is still quite high, but nothing like 64
To fix this memory thing properly, here's what I think we need to do.
Have a counter, and increment every time we push a diagnostic
Once that counter hits a threshold ~10_000 errors or something
We then no longer allocate any memory for new diagnostics. We use a placeholder malformed node (the same one can be re-used) that just says "TOO MANY ERRORS".
It's a bit of a mechanical change, but I think it's necessary so we don't keep allocating strings and random things that we'll never use or need later.
Here's an example of the culprit, which is common across Can. We're allocating a new string and allocating another Node in the store. Both of these will never be needed if we already have thousands of errors.
.crash => |crash_stmt| {
    // Not valid at top-level
    const string_idx = self.can_ir.env.strings.insert(self.can_ir.env.gpa, "crash");
    const region = self.parse_ir.tokenizedRegionToRegion(crash_stmt.region);
    self.can_ir.pushDiagnostic(CIR.Diagnostic{ .invalid_top_level_statement = .{
        .stmt = string_idx,
        .region = region,
    } });
    last_type_anno = null; // Clear on non-annotation statement
},
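Roughly what I have in mind, as a sketch (the names and threshold here are made up, not the real Can/CIR API):

const std = @import("std");

// Sketch only: hypothetical names, not the actual API.
const Diagnostic = union(enum) { invalid_top_level_statement: struct { region: u32 } };

const DiagnosticSink = struct {
    gpa: std.mem.Allocator,
    diagnostics: std.ArrayListUnmanaged(Diagnostic) = .{},
    dropped: u32 = 0,

    const max_diagnostics: u32 = 10_000;

    // Past the threshold, stop allocating and just count what was dropped,
    // so Can never grows memory for strings/nodes nobody will ever read.
    fn push(self: *DiagnosticSink, diagnostic: Diagnostic) !void {
        if (self.diagnostics.items.len >= max_diagnostics) {
            self.dropped += 1;
            return;
        }
        try self.diagnostics.append(self.gpa, diagnostic);
    }
};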
I'd like some thoughts on this before I try to implement it. It's a pretty mechanical change; I guess I could start with just the counter and a few high-priority areas, then gradually roll it out in later PRs.
I think that's fine; lots of compilers do it, and it's not like anyone can usefully process 400K errors at once anyway :stuck_out_tongue:
Hmm....is that the root cause of the memory issue?
Also, aren't strings deduplicated? On top of that, static strings like "crash" we should just never allocate, ever
Like we still should limit diagnostics, but I don't think it is the root cause of the 64 GB OOM
Found another related issue... we are getting Regions that start at 0 and end somewhere in the middle of the file. When we slice that for our reports, we UTF-8 validate the whole thing, over and over again, thousands of times.
$ wasmtime --dir=. --profile=guest zig-out/bin/roc.wasm check New-List-Cutoff_10MB.roc
WARNING: Large region detected: start=0 end=43906 size=43906 - may cause slow UTF-8 validation
WARNING: Large region detected: start=0 end=44523 size=44523 - may cause slow UTF-8 validation
WARNING: Large region detected: start=0 end=50011 size=50011 - may cause slow UTF-8 validation
WARNING: Large region detected: start=0 end=99849 size=99849 - may cause slow UTF-8 validation
WARNING: Large region detected: start=0 end=100466 size=100466 - may cause slow UTF-8 validation
WARNING: Large region detected: start=0 end=105954 size=105954 - may cause slow UTF-8 validation
WARNING: Large region detected: start=0 end=155792 size=155792 - may cause slow UTF-8 validation
WARNING: Large region detected: start=0 end=156409 size=156409 - may cause slow UTF-8 validation
WARNING: Large region detected: start=0 end=161897 size=161897 - may cause slow UTF-8 validation
WARNING: Large region detected: start=0 end=211735 size=211735 - may cause slow UTF-8 validation
This goes on for thousands of lines.
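For anyone following along, the hot path is basically this shape (a simplified sketch, not the actual report code): every report slices the source from the diagnostic's region and validates that slice, so a region that starts at 0 re-validates everything up to its end, once per report.

const std = @import("std");

// Simplified: each diagnostic's region becomes a source slice, and that slice
// is UTF-8 validated per report. Regions that start at 0 mean thousands of
// reports each re-validate up to the whole file.
fn regionSlice(source: []const u8, start: usize, end: usize) ![]const u8 {
    const slice = source[start..end];
    if (!std.unicode.utf8ValidateSlice(slice)) return error.InvalidUtf8;
    return slice;
}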
Here is a fix/workaround https://github.com/roc-lang/roc/pull/7995
This fixes the memory issues entirely for me. The profiling for New-List-Cutoff_10MB.roc looks completely normal and is no longer dominated by UTF-8 validation. It runs to completion, though it prints all 1,000 errors to the terminal.
Found 35197 error(s) and 17939 warning(s) in 21785.9 ms for New-List-Cutoff_10MB.roc.
Correction .. I hit a debug assertion and it crashes. But in ReleaseSmall it runs fine. Time to track down the next issue I guess :sweat_smile:
I'm not really for that workaround. I think we should just correct newlines to have proper region info. That, or we should correct parsing to not include newlines as the starting and ending tokens for nodes that we will report diagnostics on.
This solution feels more bug prone and like it might just randomly bite us later
I'm not sure if they should be... it may have been a performance optimization
Correct functionality before random performance optimizations that may or may not work.
I don't love this solution, but it's only temporary I think. It resolves the immediate memory issue, and is noisy (but not blocking) to help us track down any issues.
I've also resolved the debug assertion, and added a limit on the number of warnings we print.
I'm avoiding significant changes in the Parser while @Anthony Bullard works on his refactor.
I'm also totally up for giving them proper regions.
At least with this workaround, we can see the next perf issue which is our CIR diagnostics... it looks like they need the same treatment we just gave AST diagnostics. We should limit the amount we create and re-use a common malformed node after a certain number.
I would actually like to get rid of newline tokens all together
yeah what do we still use them for?
formatter heuristics?
That's all I think
The parser completely skips them, and even so sometimes they cause weird bugs (or at least the potential for bugs) in the parser
fair, although @Anton made the point that not having any control over newlines (e.g. wanting to put blank lines between some assignments and not others) would be a pain.
is there some other way we could do that? e.g. use Region info in the formatter to scan for newlines between assignments and if there's more than 1 that means you want a blank line?
Could newlines be significant only in particular places? It makes sense to keep them only where the user can control them to create gaps.
Also, could things like if/else have an is_multiline parameter? It would save tokens in such places. Maybe it's an obvious idea
Or, if the parse AST and CIR have the same indexes, newlines aren't needed at all (because they're a no-op). It's possible to have a parallel vector for newline gaps that contains only the indexes of the tokens after which the user wants a gap (on the other hand, a u32 per gap is not that great? But this collection would have no region info, so memory consumption would be ~50% less)
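Something like this is what I mean (just a sketch, names made up):

const std = @import("std");

// Sketch of the parallel-vector idea: no newline tokens at all, just a list of
// token indices after which the user had a blank line, so the formatter can
// re-insert one blank line after those tokens.
const BlankLineGaps = struct {
    after_token: std.ArrayListUnmanaged(u32) = .{}, // 4 bytes per gap, no region info

    fn record(self: *BlankLineGaps, gpa: std.mem.Allocator, token_idx: u32) !void {
        try self.after_token.append(gpa, token_idx);
    }

    fn wantsBlankLineAfter(self: *const BlankLineGaps, token_idx: u32) bool {
        return std.mem.indexOfScalar(u32, self.after_token.items, token_idx) != null;
    }
};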
So I'm wondering if we should merge this workaround PR https://github.com/roc-lang/roc/pull/7995 I don't feel strongly about it, it was helpful to understand why we were chewing up all that memory.
If we've got a solid plan to move forward with removing newlines or fixing regions then our perf/memory issue won't be a problem for long, and would only delay our ability to use the profiler effectively until we resolve the underlying root cause.
Is anyone interested in fixing these?
is there some other way we could do that? e.g. use Region info in the formatter to scan for newlines between assignments and if there's more than 1 that means you want a blank line?
The formatter already needs to look at the original source to pull out comments, so of course it can still look there to check whether there's a double (or multiple) newline and preserve that if we want.
yikes, that's how we're doing comments? :grimacing:
Tell me more about that 'yikes'
It avoids a whole bunch of issues with comment tracking and placement that the parser and AST are now freed from
All that complexity is concentrated in the formatter, which is the only place that cares about this
Consistent comment tracking was probably the #1 hardest thing to get right in the old parser
maybe I'm misremembering, but I thought the plan was going to be:
that sounded pretty simple to me, but maybe I'm misremembering or misunderstanding something :sweat_smile:
the yikes is just about re-tokenizing in the middle of formatting, sounds like a lot of extra processing to do
I totally agree that the way we did comments in the previous compiler should not be repeated
There's no retokenizing necessary; we're just looking at the space between tokens
(based on token regions)
Also no side-table necessary either!
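Roughly the idea, as a sketch (not the actual formatter code; Roc line comments start with #):

const std = @import("std");

// The gap between two token regions can only contain whitespace and comments,
// so the formatter can recover both from the source without a side table.
fn gapBetween(source: []const u8, prev_token_end: usize, next_token_start: usize) []const u8 {
    return source[prev_token_end..next_token_start];
}

// Two or more newlines in the gap means the user wanted a blank line.
fn gapHasBlankLine(gap: []const u8) bool {
    return std.mem.count(u8, gap, "\n") >= 2;
}

// Any '#' in the gap is the start of a comment to pull out and re-emit.
fn gapHasComment(gap: []const u8) bool {
    return std.mem.indexOf(u8, gap, "#") != null;
}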
I see, so I guess we're kind of assuming that the space between the tokens is all comments and/or whitespace, since those are the things we discarded
and everything else would have gotten an actual token
It's possible there are some sneaky edge-cases there with errors :thinking:
e.g. I think mismatched braces currently don't make it into the token stream
(but they should)
yeah, makes sense!
@Luke Boswell
If we've got a solid plan to move forward with removing newlines
Let me take a swing at this to judge how hard this will actually be...
Sounds good :+1: thank you
Still a few bugs to fix up, but getting pretty close: https://github.com/roc-lang/roc/pull/8000
(nice round number there!)
I like this approach much more!!!
This is a crazy histogram. What an absolutely crazy long tail
Screenshot 2025-07-12 at 12.15.13 PM.png
This is the time per call to diagnosticToReport for one of the mega files that takes like 2 minutes to run.
General question. Why do we make reports at all? Why don't we just stream the output diagnostics and print them right away? Why waste any allocations or memory at all building report objects?
Seems that each report is ~1KB of memory use and requires like 10 allocations and 5 frees to make.
What is that a histogram of?
Histogram of execution times of a single call to diagnosticToReport
When checking one of my gigantic 1 million line of code files that makes a metric ton of errors
Brendan Hansknecht said:
General question. Why do we make reports at all? Why don't we just stream the output diagnostics and print them right away? Why waste any allocations or memory at all building report objects?
I think what we ideally want is:
so for example when writing reports to stderr I think it would be good to have an array of string buffers, one per module, and then when we decide to flush, we can do one pwritev to send them all to stderr in one syscall
but yeah I don't think there's any reason to make actual heap allocations for the reports, just for the string buffer so we're not making tons of tiny syscalls
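e.g. something like this sketch (using a plain writev here for simplicity; the names and buffer handling are made up):

const std = @import("std");

// Sketch: one string buffer per module, rendered reports appended as plain
// bytes, then flushed to stderr with a single writev call instead of many
// small writes. Illustrative only, not the real reporting code.
fn flushReports(gpa: std.mem.Allocator, module_buffers: []const std.ArrayListUnmanaged(u8)) !void {
    const iovecs = try gpa.alloc(std.posix.iovec_const, module_buffers.len);
    defer gpa.free(iovecs);

    for (module_buffers, iovecs) |buf, *iov| {
        iov.* = .{ .base = buf.items.ptr, .len = buf.items.len };
    }

    // A real implementation would loop, since writev may do a partial write.
    _ = try std.posix.writev(std.io.getStdErr().handle, iovecs);
}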
I think currently we still have full source in memory, so reports don't necessarily need any allocations or strings, more just need instructions on how to render
Also, I guess I considered diagnostics the list we would sort and pass to tools and what not
So why also make a list of reports
But I guess they have richer info to pass to an lsp or something