Can we add a -O1 equivalent to our compiler? I think it would be really useful for debugging perf issues. Though maybe just adding good debug info could also fix this issue. (anyone know how easy/hard that would be?)
Anyway, a lot of the time, I need --optimize to get reasonable profiles, but the flamegraphs are useless due to too much inlining. On the other hand, normal builds don't have enough optimizations and the profiles are useless due to wasting so much time.
I'm open to something like this, but it's always annoyed me that it's unclear what the levels mean
I wonder if there's some way we could make them more descriptive than that
I can almost never get a flamegraph at all
I think the answer is to support debuginfo first
It's not hard, just time consuming (and few of us have time, I suppose)
To get a flamegraph currently, I always compile the platforms from source and use the legacy linker. That can do an ok job sometimes, because it will use the function symbols for the flamegraph, but yeah, inlining from optimizations often ruins it.
Also, I know nothing about how llvm adds debug info, but if we can even just add all the function names, that would be amazing. So not a full source map, just function blocks essentially.
By the time we reach the mono ir do we have any mapping back to the original source?
Cause I think to add full debug info, we would need file path and source line/col to be possible to get while generating llvm ir from mono.
We do not. But we should be able to store that. Folkert and I have talked about it. Not super hard, just tedious because we'd need to build the lookaside table
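A rough sketch of what such a lookaside table could look like, in case it helps the discussion. Everything here is hypothetical (`ProcId`, `SourceLoc`, `DebugLookaside` are made-up names, not existing compiler types), and it assumes one source location per mono IR procedure, which would be enough for function-level debug info but not full line info.

```rust
use std::collections::HashMap;

/// Hypothetical identifier for a procedure in the mono IR.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct ProcId(u32);

/// Where a procedure came from in the original source.
#[derive(Clone, Debug, PartialEq, Eq)]
struct SourceLoc {
    file: String,
    line: u32,
    column: u32,
}

/// Side table kept next to the IR, so the IR itself does not need to change.
#[derive(Default)]
struct DebugLookaside {
    locations: HashMap<ProcId, SourceLoc>,
}

impl DebugLookaside {
    /// Record the source location while lowering to the mono IR.
    fn record(&mut self, id: ProcId, loc: SourceLoc) {
        self.locations.insert(id, loc);
    }

    /// Look the location up again while generating LLVM IR.
    fn lookup(&self, id: ProcId) -> Option<&SourceLoc> {
        self.locations.get(&id)
    }
}

fn main() {
    let mut table = DebugLookaside::default();
    table.record(
        ProcId(0),
        SourceLoc { file: "main.roc".to_string(), line: 12, column: 5 },
    );

    match table.lookup(ProcId(0)) {
        Some(loc) => println!("{}:{}:{}", loc.file, loc.line, loc.column),
        None => println!("no source location recorded"),
    }
}
```

Keeping it as a side table means procedures without an entry simply get no location, which fits the "function blocks only, not a full source map" idea.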
So I noticed that we do, at least in theory, generate function and lexical scope debug info, which would be super interesting. That said, I guess it is generated wrong/bugged, so we strip it and never emit it.
Simply enabling emitting that sounds like it could be a huge win
Definitely +1
Not perfect, but a lot better for an optimized call graph. Not really sure what all the [aoc] functions are in the flamegraph.
So it turns out fixing debug info is super duper simple, and then we can get nice flamegraphs by using perf record --call-graph dwarf. Will probably try to make a PR tomorrow (general question, when do we want to emit debug info vs strip it? should we change the --debug flag to decide that? maybe have a separate flag for dumping llvm ir?)
This is an optimized build of that set app with poor performance from #contributing > Set perf.
Apparently sets are actually no longer directly the performance problem: 80% of the time is spent in refcount increment and decrement functions (though maybe it is still some major perf issue related to set refcounts, idk).
optimized set good debug info flamegraph
I love these flame graphs... I want to see these for all my programs. Are they easy to make on macos?
@Luke Boswell not easy but doable, some useful links:
flamegraph with xctrace on macos
using xctrace with roc executables
Alternative approach to the first link
when do we want to emit debug info vs strip it?
How about this?
--optimize also emits debug info.
--optimize strips debug info unless you deliberately enable it with a --profiling flag (since that's the main use case)
ooh I like the --profiling flag idea! :smiley:
That works for me
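For concreteness, here is a minimal sketch of the --profiling option as a decision function. The names (`OptLevel`, `BuildFlags`, `emit_debug_info`) are hypothetical and not taken from the compiler, and the behavior for non-optimized builds is an assumption, since the proposal above only covers --optimize.

```rust
/// Hypothetical optimization levels, just for this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum OptLevel {
    Development,
    Optimize,
}

/// Hypothetical subset of the CLI flags that matter here.
#[derive(Clone, Copy, Debug)]
struct BuildFlags {
    opt_level: OptLevel,
    profiling: bool,
}

/// Decide whether debug info should be kept in the output.
fn emit_debug_info(flags: BuildFlags) -> bool {
    match flags.opt_level {
        // Optimized builds strip debug info unless --profiling is passed.
        OptLevel::Optimize => flags.profiling,
        // Assumption: non-optimized builds always keep debug info.
        OptLevel::Development => true,
    }
}

fn main() {
    let cases = [
        BuildFlags { opt_level: OptLevel::Development, profiling: false },
        BuildFlags { opt_level: OptLevel::Optimize, profiling: false },
        BuildFlags { opt_level: OptLevel::Optimize, profiling: true },
    ];

    for flags in cases {
        println!("{:?} -> emit debug info: {}", flags, emit_debug_info(flags));
    }
}
```

Putting the rule in one function would make it easy to change later, for example if a separate flag for dumping llvm ir ends up interacting with it.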
Should we include debug info from builtins or just strip that by default? Even if we strip it, we will always see the wrapping roc function, but it obviously won't give as much detail into random internals.
I guess the only real concern I have with keeping it is potential build time performance, but maybe that isn't much of an issue. Just thought I would ask.
eh I think don't worry about builtins - we can just be careful to not include any dbgs in there haha
This isn't dbgs, this is debug info.
Like dwarf debug executable info
Also, I realized I should correct some wording. I meant debug info from zig bitcode when I said builtins above
ohhh gotcha
yeah I'm not sure :thinking:
I guess I should test adding it and just measure the cost. If in the future we deem it too slow, it is easy to remove. Just a one line change.
Ok, measured the cost. Keeping around zig bitcode debug info costs about 250ms.
On my m1 mac. (and this is just the cost for generating the object file. I am ignoring any extra linking costs)
I get the feeling that is too expensive to be on by default.
So I'll keep stripping builtins of debug info.
#6184 to get basic debug info working such that we can make nice flamegraphs at least.
Last updated: Jun 16 2026 at 16:19 UTC