Stream: compiler development

Topic: zig compiler - profiling and optimization


view this post on Zulip Brendan Hansknecht (Mar 16 2025 at 01:35):

So, I have been messing a lot with profiling. It's fun to tinker with the new compiler. Interesting to see some of the tradeoffs.

Thought I should make a standalone thread cause I assume there will be many findings over time and discussion.


A few random findings.

1 million line challenge

Parsing and formatting 1 million lines of syntax grab bag.
The zig compiler is ~5x faster and uses 4x less memory.
In real terms, the zig compiler took ~300ms to parse and format the million lines.
It used ~300MB to do so.

The input file is 21MB.

c allocator vs the new zig smp allocator.

When dealing with the 100 files of 1000 lines:
c allocator uses way less memory than the zig smp allocator (~4x less memory)
That said, it also takes significantly longer runtime to do so (~1.4x slower)

Definitely something to consider switching to. Though need to test on more cases and such.

For 1 file of 1 million lines:
both allocators are essentially equivalent.

view this post on Zulip Brendan Hansknecht (Mar 16 2025 at 01:36):

Note, these numbers are with #7704 which is very important for large file perf.

view this post on Zulip Richard Feldman (Mar 16 2025 at 01:38):

@Andrew Kelley might be interested in those findings! :smiley:

view this post on Zulip Brendan Hansknecht (Mar 16 2025 at 01:45):

Also, I am still very new to the tracy profiler (demo), but it is an awesome tool for diving into performance. I think it will be extra useful once we start doing multi-threaded work. It has too many features for me to describe here, but I definitely should give a demo of using it with roc at some point.

I graciously borrowed how the zig compiler integrates tracy and have it on a branch. At some point soon, I want to make a PR for it. It is relatively non-invasive. Just a sprinkling of trace points, some build config, and an optional allocation tracker.

view this post on Zulip Brendan Hansknecht (Mar 16 2025 at 18:51):

One thing seen clearly from profiling is that container default capacities can save a lot of time by avoiding many reallocations on copies.

I was thinking of adding a bunch of initCapacity functions to our various datastructures, but realized that in many cases, the capacity wanted is not really known by the caller. I'm thinking of flipping the script and giving the data structures control of their default size. So calling init will simply allocate the default capacity we think is reasonable for a datastructure.

As an example, instead of adding initCapacity to the small string interner, we would just update the small string interner init function to always allocate enough space for (x strings of a specific size), maybe 1000 strings of 4 characters.

Thoughts?

I'm not totally sold on this idea, but it feels like it might be easier to tune on a per data structure level than at a per instance level.
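
A minimal sketch of the "data structure owns its default size" idea, written in Rust purely for illustration. The type, the constants, and the method names are hypothetical and are not taken from the roc codebase; the sizes only mirror the "1000 strings of 4 characters" example above.

```rust
use std::collections::HashMap;

/// Hypothetical defaults, mirroring "1000 strings of 4 characters".
const DEFAULT_STRING_COUNT: usize = 1000;
const DEFAULT_AVG_STRING_LEN: usize = 4;

/// Toy string interner: all text lives in one buffer, ids index into `spans`.
pub struct SmallStringInterner {
    bytes: String,
    spans: Vec<(usize, usize)>, // (start, len) into `bytes`
    lookup: HashMap<String, u32>,
}

impl SmallStringInterner {
    /// `init` chooses the capacity itself, so callers need no size estimate.
    pub fn init() -> Self {
        Self {
            bytes: String::with_capacity(DEFAULT_STRING_COUNT * DEFAULT_AVG_STRING_LEN),
            spans: Vec::with_capacity(DEFAULT_STRING_COUNT),
            lookup: HashMap::with_capacity(DEFAULT_STRING_COUNT),
        }
    }

    /// Returns the existing id for `s`, or stores it and returns a new id.
    pub fn insert(&mut self, s: &str) -> u32 {
        if let Some(&id) = self.lookup.get(s) {
            return id;
        }
        let id = self.spans.len() as u32;
        self.spans.push((self.bytes.len(), s.len()));
        self.bytes.push_str(s);
        self.lookup.insert(s.to_owned(), id);
        id
    }

    /// Looks up the text for a previously returned id.
    pub fn get(&self, id: u32) -> &str {
        let (start, len) = self.spans[id as usize];
        &self.bytes[start..start + len]
    }

    /// Exposed only so the preallocation can be observed.
    pub fn byte_capacity(&self) -> usize {
        self.bytes.capacity()
    }
}
```

With this shape, tuning happens in one place per data structure: changing the two constants changes every instance, and no call site has to guess a size.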

view this post on Zulip Richard Feldman (Mar 16 2025 at 19:00):

I think what they did in the Zig compiler was to do some benchmarks on heuristics and go by that

view this post on Zulip Richard Feldman (Mar 16 2025 at 19:00):

like for example "here's how much to allocate for tokens as a multiple of size of source bytes"

view this post on Zulip Richard Feldman (Mar 16 2025 at 19:01):

not an exact science obviously, but can do heuristics based on measurements in the wild

view this post on Zulip Brendan Hansknecht (Mar 16 2025 at 19:01):

Yeah, that's a good point. A lot of this likely can have simple heuristics that go beyond datastructure specific and into input specific
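
An input-specific heuristic like the one described above could be sketched as follows (Rust, for illustration only). The ratio of 8 source bytes per token is a made-up placeholder, not a measured value, and the whitespace "tokenizer" is a stand-in for a real one.

```rust
/// Hypothetical ratio: assume roughly one token per 8 source bytes.
/// A real value would come from measuring a corpus.
const SOURCE_BYTES_PER_TOKEN: usize = 8;

/// Estimates how many token slots to reserve for a source of `source_len` bytes.
pub fn estimated_token_capacity(source_len: usize) -> usize {
    // Round up, and reserve at least one slot even for empty input.
    let estimate = (source_len + SOURCE_BYTES_PER_TOKEN - 1) / SOURCE_BYTES_PER_TOKEN;
    estimate.max(1)
}

/// Stand-in tokenizer: splits on whitespace into a buffer sized from the input.
pub fn tokenize(source: &str) -> Vec<&str> {
    let mut tokens = Vec::with_capacity(estimated_token_capacity(source.len()));
    tokens.extend(source.split_whitespace());
    tokens
}
```

If the estimate is too low the vector still grows as usual, so a wrong ratio costs some reallocations or some wasted memory rather than correctness.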

view this post on Zulip Joshua Warner (Mar 17 2025 at 01:25):

@Brendan Hansknecht how are you generating the input file?

view this post on Zulip Brendan Hansknecht (Mar 17 2025 at 01:26):

Input is currently the syntax grab bag repeated a ton

view this post on Zulip Brendan Hansknecht (Mar 17 2025 at 01:26):

Also, a roughly equivalent version modified for the old compiler syntax.

view this post on Zulip Brendan Hansknecht (Mar 17 2025 at 01:27):

Repetition will definitely benefit the interner and lead to less allocating and regrowth though. So it is biased for sure.

view this post on Zulip Brendan Hansknecht (Mar 17 2025 at 01:27):

I really should update the builtins and/or basic CLI to the new syntax to get a more realistic feel.

view this post on Zulip Joshua Warner (Mar 17 2025 at 01:29):

I have a large corpus of all public roc code, but written in the old syntax of course

view this post on Zulip Joshua Warner (Mar 17 2025 at 01:29):

("large" is a few tens of mb)

view this post on Zulip Joshua Warner (Mar 17 2025 at 01:30):

Been thinking about running that thru the migration formatter in the old compiler (that'll need a bit of work!) and then using that as a somewhat more realistic corpus

view this post on Zulip Brendan Hansknecht (Mar 17 2025 at 01:30):

Oh, that would be awesome to work to update and do some benchmarks on. I think we are still a bit away from supporting everything to use that corpus, but would be great

view this post on Zulip Brendan Hansknecht (Mar 17 2025 at 01:31):

Can we make that corpus a GitHub repo? And make two branches, one for old and one for new syntax?

view this post on Zulip Brendan Hansknecht (Mar 17 2025 at 01:31):

Or otherwise share it?

view this post on Zulip Joshua Warner (Mar 17 2025 at 01:32):

http://osprey.biercewarner.com/tarball

view this post on Zulip Joshua Warner (Mar 17 2025 at 01:32):

Could definitely make it a git repo

view this post on Zulip Brendan Hansknecht (Mar 17 2025 at 01:35):

How hard would it be to make the old compiler able to migrate the syntax?

view this post on Zulip Joshua Warner (Mar 17 2025 at 01:35):

In theory that's like 90% done, just not hooked up to the command line yet

view this post on Zulip Joshua Warner (Mar 17 2025 at 01:36):

https://github.com/roc-lang/roc/blob/main/crates/compiler/fmt/src/migrate.rs

view this post on Zulip Joshua Warner (Mar 17 2025 at 01:36):

There are a few missing translations there that I know of, and likely some bugs. Basically completely untested.

view this post on Zulip Joshua Warner (Mar 17 2025 at 01:45):

Of course this is somewhat complicated by the old compiler still depending on zig 13 which breaks the build there, since I've upgraded to 14 for the new compiler :grimacing:

view this post on Zulip Brendan Hansknecht (Mar 17 2025 at 01:49):

Yeah, I just use nix for old compiler work

view this post on Zulip Joshua Warner (Mar 17 2025 at 02:37):

How hard would it be to just upgrade the old compiler to zig 14?

view this post on Zulip Brendan Hansknecht (Mar 17 2025 at 02:38):

Depends on how hard it ends up being to upgrade inkwell and llvm. Occasionally that is trivial. A lot of the time that is a huge hassle.

view this post on Zulip Joshua Warner (Mar 17 2025 at 02:38):

Oh oof; those are all locked together :grimacing:

view this post on Zulip Brendan Hansknecht (Mar 17 2025 at 02:39):

Yeah....one of the other huge gains of the new compiler is that we will generate llvm bitcode directly, which gives us much more flexibility to decouple that

view this post on Zulip Joshua Warner (Mar 17 2025 at 02:40):

I guess keeping separate envs for the old and new compiler it is

view this post on Zulip Brendan Hansknecht (Mar 17 2025 at 02:46):

I guess you could always just alias/swap out only zig

view this post on Zulip Brendan Hansknecht (Mar 17 2025 at 04:23):

Some of these optimizations are bespoke tuning that probably won't be kept or will need proper heuristics, but the rest are just simple cleanups to have fewer allocations overall.

optimization results (-38% execution time)

view this post on Zulip Luke Boswell (Mar 19 2025 at 05:17):

@Brendan Hansknecht -- we're adding a lot of knobs and dials for tuning the compiler. I appreciate these are all things that we can tune later.

I'm wondering if we should pull all the constants out into a single file.

view this post on Zulip Luke Boswell (Mar 19 2025 at 05:19):

Maybe one day we have some automated thing that can help us tune these based on real code (i.e. using something like Osprey)... but even manually it would be easier to surface all of these decisions if they are in one place.

view this post on Zulip Brendan Hansknecht (Mar 19 2025 at 05:52):

Yeah, definitely lots of knobs. I just have been learning tracy and thus tuning a bunch of random ones

view this post on Zulip Brendan Hansknecht (Mar 19 2025 at 05:53):

Apart from initial capacities, I don't think we'll have too many bespoke constants

view this post on Zulip Brendan Hansknecht (Mar 19 2025 at 05:53):

And capacities are likely something that should be tuned with context

view this post on Zulip Brendan Hansknecht (Mar 19 2025 at 05:54):

That said, setting a constant somewhere for the default capacity if people don't know what to pick sounds like potentially a good idea.

view this post on Zulip Luke Boswell (Mar 19 2025 at 05:56):

What I like about putting the constants in a single file is that it's easier to track the history of any changes. If we change constants in future based on some profiling... we will include the analysis/evaluation in the PR and so we always have a good point of reference that is easy to find.

view this post on Zulip Luke Boswell (Mar 19 2025 at 05:58):

I could also imagine a future where different users might want different parameters. Like maybe if I'm using roc in some special way I might want to change things to suit me.

view this post on Zulip Brendan Hansknecht (Mar 19 2025 at 06:03):

Yeah, makes some sense. I'm not fully sure there are good names for these various constants cause many of them will just be the starting size of arbitrary containers or maybe a ratio from the input source to the size. That is where local reasoning makes a lot of sense. But I totally understand the want to have all knobs in one place.
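
One possible shape for the single constants file discussed above, sketched in Rust for illustration. The module name, constant names, and values are all hypothetical; none of them come from the roc codebase.

```rust
/// Hypothetical central module holding every tunable constant in one place.
pub mod tuning {
    /// Fallback initial capacity for containers with no better estimate.
    pub const DEFAULT_CAPACITY: usize = 64;
    /// Initial number of strings the interner reserves space for.
    pub const INTERNER_INITIAL_STRINGS: usize = 1000;
    /// Assumed source bytes per token, used to size the token buffer.
    pub const SOURCE_BYTES_PER_TOKEN: usize = 8;
}

/// Example call site: a container with no context falls back to the default.
pub fn new_scratch_buffer() -> Vec<u8> {
    Vec::with_capacity(tuning::DEFAULT_CAPACITY)
}
```

Doc comments on each constant partly address the naming concern: the name can stay generic while the comment records what the value sizes and where it is used.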


Last updated: Jul 06 2025 at 12:14 UTC