Optimize, but only a little · ideas

Stream: ideas

Topic: Optimize, but only a little

Brendan Hansknecht (Dec 04 2023 at 02:04):

Can we add a -O1 equivalent to our compiler? I think it would be really useful for debugging perf issues. Though maybe just adding good debug info would also fix this issue. (Anyone know how easy/hard that would be?)

Anyway, a lot of the time I need --optimize to get reasonable profiles, but the flamegraphs are useless due to too much inlining. On the other hand, normal builds don't have enough optimizations, and the profiles are useless because so much time is wasted.

Richard Feldman (Dec 04 2023 at 02:08):

I'm open to something like this, but it's always annoyed me that it's unclear what the levels mean

Richard Feldman (Dec 04 2023 at 02:08):

I wonder if there's some way we could make them more descriptive than that

Ayaz Hafiz (Dec 04 2023 at 03:28):

I can almost never get a flamegraph at all

Ayaz Hafiz (Dec 04 2023 at 03:28):

I think the answer is to support debuginfo first

Ayaz Hafiz (Dec 04 2023 at 03:29):

It's not hard, just time consuming (and few of us have time, I suppose)

Brendan Hansknecht (Dec 04 2023 at 03:34):

To get a flamegraph currently, I always compile the platforms from source and use the legacy linker. That can do an OK job sometimes, because it will use the function symbols for the flamegraph, but yeah, inlining from optimizations often ruins it.

Brendan Hansknecht (Dec 04 2023 at 03:35):

Also, I know nothing about how llvm adds debug info, but if we can even just add all the function names, that would be amazing. So not a full source map, just function blocks essentially.

Brendan Hansknecht (Dec 04 2023 at 04:30):

By the time we reach the mono ir do we have any mapping back to the original source?

Brendan Hansknecht (Dec 04 2023 at 04:31):

Because I think that to add full debug info, we would need the file path and source line/col to be available while generating LLVM IR from mono.

Ayaz Hafiz (Dec 04 2023 at 04:45):

We do not. But we should be able to store that. Folkert and I have talked about it. Not super hard, just tedious because we'd need to build the lookaside table
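A minimal sketch of what such a lookaside table could look like, assuming each mono IR node has some integer ID. The names `SourceLoc`, `SourceLookaside`, `record`, and `lookup` are hypothetical and are not taken from the Roc compiler; the point is only that source positions live in a side table keyed by node ID rather than inside the IR itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceLoc:
    """File path plus line/column of the original source (hypothetical shape)."""
    path: str
    line: int
    col: int


class SourceLookaside:
    """Side table from an IR node ID to the source location it came from."""

    def __init__(self) -> None:
        self._locs: dict[int, SourceLoc] = {}

    def record(self, node_id: int, loc: SourceLoc) -> None:
        # Called while lowering to the IR, when the source position is still known.
        self._locs[node_id] = loc

    def lookup(self, node_id: int) -> Optional[SourceLoc]:
        # Called during code generation; returns None for nodes with no known
        # source position (for example, compiler-generated ones).
        return self._locs.get(node_id)


table = SourceLookaside()
table.record(7, SourceLoc(path="main.roc", line=12, col=5))
```

Keeping the table separate means the IR itself does not change shape; code generation simply skips debug locations for any node the table does not know about.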

Brendan Hansknecht (Dec 04 2023 at 04:56):

So I noticed that we, theoretically at least, already generate function and lexical scope debug info, which would be super interesting. That said, I guess it is generated wrong/bugged, so we strip it and never emit it.

Brendan Hansknecht (Dec 04 2023 at 04:56):

Simply enabling emitting that sounds like it could be a huge win

Ayaz Hafiz (Dec 04 2023 at 05:03):

Definitely +1

Brendan Hansknecht (Dec 04 2023 at 05:58):

Not perfect, but a lot better for an optimized call graph. Not really sure what all the [aoc] functions are in the flamegraph.

flamegraph.svg

Brendan Hansknecht (Dec 04 2023 at 06:20):

So it turns out that fixing debug info is super duper simple, and then we can get nice flamegraphs by using perf record --call-graph dwarf. Will probably try to make a PR tomorrow. (General question: when do we want to emit debug info vs strip it? Should we change the --debug flag to decide that? Maybe have a separate flag for dumping LLVM IR?)

This is an optimized build of that set app with poor performance from #contributing > Set perf.

Apparently sets are no longer directly the performance problem: 80% of the time is spent in refcount increment and decrement functions (though maybe it is still some major perf issue related to set refcounts, idk).

optimized set good debug info flamegraph
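For readers who want to pull a figure like "X% of the time is in refcounting" out of a profile, here is a rough sketch. It assumes the profile has been exported in the folded-stack text format that flamegraph tools consume, where each line is semicolon-separated frames followed by a sample count. The function name `leaf_share`, the sample lines, and the frame names below are all invented for illustration; they are not the data behind the flamegraph above.

```python
from typing import Callable, Iterable


def leaf_share(folded_lines: Iterable[str], is_match: Callable[[str], bool]) -> float:
    """Fraction of samples whose innermost (leaf) frame satisfies is_match."""
    total = 0
    matched = 0
    for line in folded_lines:
        # The sample count is the text after the last space on the line.
        stack, _, count_text = line.rpartition(" ")
        count = int(count_text)
        total += count
        leaf = stack.split(";")[-1]
        if is_match(leaf):
            matched += count
    return matched / total if total else 0.0


# Made-up samples, purely illustrative.
samples = [
    "main;solve;Set.insert;incref 3",
    "main;solve;Set.insert;decref 1",
    "main;solve;parse 4",
]

refcount_share = leaf_share(samples, lambda frame: frame in ("incref", "decref"))
```

With these invented samples, 4 of the 8 samples end in a refcount frame, so `refcount_share` is 0.5.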

Luke Boswell (Dec 04 2023 at 08:25):

I love these flame graphs... I want to see these for all my programs. Are they easy to make on macos?

Anton (Dec 04 2023 at 10:51):

@Luke Boswell not easy but doable, some useful links:
flamegraph with xctrace on macos
using xctrace with roc executables

Alternative approach to the first link

Brian Carroll (Dec 04 2023 at 14:54):

when do we want to emit debug info vs strip it?

How about this?

Dev backends always emit debug info. They are for development after all. Emitting function names was easy for the Wasm dev backend, I assume it shouldn't be hard for the others either.
LLVM without --optimize also emits debug info.
LLVM with --optimize strips debug info unless you deliberately enable it with a --profiling flag (since that's the main use case)
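The proposal above boils down to a small decision rule, sketched here. The `Backend` enum and the `emits_debug_info` function are hypothetical names used only for illustration and do not come from the Roc codebase; `profiling` stands for the proposed --profiling flag.

```python
from enum import Enum


class Backend(Enum):
    DEV = "dev"
    LLVM = "llvm"


def emits_debug_info(backend: Backend, optimize: bool, profiling: bool) -> bool:
    """Return True if the build keeps debug info under the proposed policy."""
    if backend is Backend.DEV:
        # Dev backends always emit debug info.
        return True
    if not optimize:
        # LLVM without --optimize also emits debug info.
        return True
    # LLVM with --optimize strips debug info unless profiling is requested.
    return profiling
```

Under this rule the only build that loses debug info is an optimized LLVM build with profiling off.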

Richard Feldman (Dec 04 2023 at 16:05):

ooh I like the --profiling flag idea! :smiley:

Brendan Hansknecht (Dec 04 2023 at 16:06):

That works for me

Brendan Hansknecht (Dec 04 2023 at 18:40):

Should we include debug info from builtins or just strip that by default? Even if we strip it, we will always see the wrapping roc function, but it obviously won't give as much detail about random internals.

I guess the only real concern I have with keeping it is potential build time performance, but maybe that isn't much of an issue. Just thought I would ask.

Richard Feldman (Dec 04 2023 at 18:58):

eh I think don't worry about builtins - we can just be careful to not include any dbgs in there haha

Brendan Hansknecht (Dec 04 2023 at 18:59):

This isn't dbgs, this is debug info.

Brendan Hansknecht (Dec 04 2023 at 18:59):

Like dwarf debug executable info

Brendan Hansknecht (Dec 04 2023 at 19:01):

Also, I realized I should correct some wording. I meant debug info from zig bitcode when I said builtins above

Richard Feldman (Dec 04 2023 at 19:06):

ohhh gotcha

Richard Feldman (Dec 04 2023 at 19:07):

yeah I'm not sure :thinking:

Brendan Hansknecht (Dec 04 2023 at 19:17):

I guess I should test adding it and just measure the cost. If in the future we deem it too slow, it is easy to remove. Just a one line change.

Brendan Hansknecht (Dec 04 2023 at 21:36):

Ok, measured the cost. Keeping around zig bitcode debug info costs about 250ms.

Brendan Hansknecht (Dec 04 2023 at 21:36):

On my M1 Mac. (And this is just the cost of generating the object file. I am ignoring any extra linking costs.)

Brendan Hansknecht (Dec 04 2023 at 21:37):

I get the feeling that is too expensive to be on by default.

Brendan Hansknecht (Dec 04 2023 at 21:37):

So I'll keep stripping builtins of debug info.

Brendan Hansknecht (Dec 04 2023 at 21:58):

#6184 to get basic debug info working such that we can make nice flamegraphs at least.

Last updated: Jul 23 2026 at 13:15 UTC