So, I have been messing a lot with profiling. It's fun to tinker with the new compiler, and interesting to see some of the tradeoffs.
Thought I should make a standalone thread since I assume there will be many findings and plenty of discussion over time.
Parsing and formatting 1 million lines of syntax grab bag.
The zig compiler is ~5x faster and uses ~4x less memory.
In real terms, the zig compiler took ~300ms to parse and format the million lines.
It used ~300MB to do so.
The input file is 21MB.
When dealing with the 100 files of 1000 lines:
the c allocator uses way less memory than the zig smp allocator (~4x less memory)
That said, it is also significantly slower at runtime (~1.4x)
Definitely something to consider switching to. Though need to test on more cases and such.
For 1 file of 1 million lines:
both allocators are essentially equivalent.
Note, these numbers are with #7704, which is very important for large-file perf.
@Andrew Kelley might be interested in those findings! :smiley:
Also, I am still very new to the tracy profiler (demo), but it is an awesome tool for diving into performance. I think it will be extra useful once we start doing multi-threaded work. It has too many features for me to describe here, but I definitely should give a demo of using it with roc at some point.
I graciously borrowed how the zig compiler integrates tracy and have it on a branch. At some point soon, I want to make a PR for it. It is relatively non-invasive. Just a sprinkling of trace points, some build config, and an optional allocation tracker.
One thing seen clearly from profiling is that container default capacities can save a lot of time by avoiding many reallocations on copies.
I was thinking of adding a bunch of initCapacity functions to our various data structures, but realized that in many cases, the desired capacity is not really known by the caller. I'm thinking of flipping the script and giving the data structures control of their default size. So calling init will simply allocate the default capacity we think is reasonable for a given data structure.
As an example, instead of adding initCapacity to the small string interner, we would just update the small string interner init function to always allocate enough space for (x strings of a specific size), maybe 1000 strings of 4 characters.
Thoughts?
I'm not totally sold on this idea, but it feels like it might be easier to tune on a per data structure level than at a per instance level.
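As a rough sketch of that idea (written in Rust purely for illustration; the struct layout and the 1000×4 numbers are hypothetical placeholders, not the actual interner), baking the default capacity into init instead of taking it from the caller could look like:

```rust
// Hypothetical sketch: instead of every caller passing a capacity to an
// initCapacity-style constructor, the data structure owns a tuned default.
struct SmallStringInterner {
    bytes: Vec<u8>,    // all interned string bytes, stored back to back
    offsets: Vec<u32>, // start offset into `bytes` for each interned string
}

impl SmallStringInterner {
    // Tuned defaults live with the data structure, not the caller:
    // e.g. room for ~1000 strings averaging 4 bytes each.
    const DEFAULT_STRINGS: usize = 1000;
    const DEFAULT_BYTES_PER_STRING: usize = 4;

    fn init() -> Self {
        Self {
            bytes: Vec::with_capacity(Self::DEFAULT_STRINGS * Self::DEFAULT_BYTES_PER_STRING),
            offsets: Vec::with_capacity(Self::DEFAULT_STRINGS),
        }
    }

    fn insert(&mut self, s: &str) -> u32 {
        // Pre-allocated capacity means small workloads never reallocate here.
        let idx = self.offsets.len() as u32;
        self.offsets.push(self.bytes.len() as u32);
        self.bytes.extend_from_slice(s.as_bytes());
        idx
    }
}
```

The tradeoff is that callers with better knowledge of their input can't override the default, which is exactly the per-instance vs per-data-structure tension discussed above.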
I think what they did in the Zig compiler was to do some benchmarks on heuristics and go by that
like for example "here's how much to allocate for tokens as a multiple of size of source bytes"
not an exact science obviously, but can do heuristics based on measurements in the wild
Yeah, that's a good point. A lot of this can likely use simple heuristics that go beyond being data-structure specific to being input specific.
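That kind of input-driven heuristic could be sketched like this (Rust for illustration; the bytes-per-token ratio here is made up and would need to come from measuring a real corpus):

```rust
// Hypothetical input-driven heuristic: pre-size the token buffer as a
// fraction of the source length instead of growing it from empty.
// The ratio (one token per ~8 source bytes) is a placeholder; a real
// value would come from benchmarking code in the wild.
const EST_BYTES_PER_TOKEN: usize = 8;

fn estimated_token_capacity(source_len: usize) -> usize {
    // Floor at 16 so tiny files still get a small buffer.
    (source_len / EST_BYTES_PER_TOKEN).max(16)
}

fn tokenize_sketch(source: &str) -> Vec<&str> {
    let mut tokens = Vec::with_capacity(estimated_token_capacity(source.len()));
    // Stand-in for a real tokenizer: just split on whitespace.
    tokens.extend(source.split_whitespace());
    tokens
}
```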
@Brendan Hansknecht how are you generating the input file?
Input is currently the syntax grab bag repeated a ton
Also, a roughly equivalent version modified for the old compiler syntax.
Repetition will definitely benefit the interner and lead to less allocation and regrowth though. So it is biased for sure.
I really should update the builtins and/or basic CLI to the new syntax to get a more realistic feel.
I have a large corpus of all public roc code, but written in the old syntax of course
("large" is a few tens of mb)
Been thinking about running that thru the migration formatter in the old compiler (that'll need a bit of work!) and then using that as a somewhat more realistic corpus
Oh, it would be awesome to update that and run some benchmarks on it. I think we are still a bit away from supporting everything needed to use that corpus, but it would be great
Can we make that corpus a GitHub repo? And make two branches, one for old and one for new syntax?
Or otherwise share it?
http://osprey.biercewarner.com/tarball
Could definitely make it a git repo
How hard would it be to make the old compiler able to migrate the syntax?
In theory that's like 90% done, just not hooked up to the command line yet
https://github.com/roc-lang/roc/blob/main/crates/compiler/fmt/src/migrate.rs
There are a few missing translations there that I know of, and likely some bugs. Basically completely untested.
Of course this is somewhat complicated by the old compiler still depending on zig 13 which breaks the build there, since I've upgraded to 14 for the new compiler :grimacing:
Yeah, I just use nix for old compiler work
How hard would it be to just upgrade the old compiler to zig 14?
Depends on how hard it ends up being to upgrade inkwell and llvm. Occasionally that is trivial. A lot of the time that is a huge hassle.
Oh oof; those are all locked together :grimacing:
Yeah... one of the other huge gains of the new compiler is that we will generate llvm bitcode directly, which gives us much more flexibility to decouple that
I guess keeping separate envs for the old and new compiler it is
I guess you could always just alias/swap out only zig
Some of these optimizations are bespoke tuning that probably won't be kept or will need proper heuristics, but the rest are just simple cleanups that reduce allocations overall.
optimization results (-38% execution time)
@Brendan Hansknecht -- we're adding a lot of knobs and dials for tuning the compiler. I appreciate these are all things that we can tune later.
I'm wondering if we should pull all the constants out into a single file.
Maybe one day we have some automated thing that can help us tune these based on real code (i.e. using something like Osprey)... but even manually it would be easier to surface all of these decisions if they are in one place.
Yeah, definitely lots of knobs. I have just been learning tracy and thus tuning a bunch of random ones
Apart from initial capacities, I don't think we'll have too many bespoke constants
And capacities are likely something that should be tuned with context
That said, setting a constant somewhere for the default capacity, for when people don't know what to pick, sounds like potentially a good idea.
What I like about putting the constants in a single file is that it's easier to track the history of any changes. If we change constants in the future based on some profiling, we will include the analysis/evaluation in the PR, and so we'll always have a good point of reference that is easy to find.
I could also imagine a future where different users might want different parameters. Like maybe if I'm using roc in some special way I might want to change things to suit me.
Yeah, that makes some sense. I'm not fully sure there are good names for these various constants, since many of them will just be the starting size of arbitrary containers, or maybe a ratio from the input source size. That is where local reasoning makes a lot of sense. But I totally understand the desire to have all the knobs in one place.
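For what it's worth, the "all knobs in one file" idea could be as simple as a single tuning module (Rust for illustration; every name and value below is hypothetical):

```rust
// Hypothetical single "tuning knobs" module: every capacity default and
// sizing ratio lives here, so any change (and the profiling analysis that
// justified it) shows up in one file's history.
pub mod tuning {
    /// Default number of strings the small string interner pre-allocates.
    pub const INTERNER_DEFAULT_STRINGS: usize = 1000;
    /// Average interned string length assumed when pre-allocating bytes.
    pub const INTERNER_AVG_STRING_BYTES: usize = 4;
    /// Estimated source bytes per token, for pre-sizing token buffers.
    pub const EST_BYTES_PER_TOKEN: usize = 8;
}

// Data structures then derive their defaults from the shared constants.
fn interner_byte_capacity() -> usize {
    tuning::INTERNER_DEFAULT_STRINGS * tuning::INTERNER_AVG_STRING_BYTES
}
```

This keeps the tunable surface discoverable in one place while the data structures themselves still decide how to combine the constants, which is roughly a middle ground between the two positions above.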
Last updated: Jul 06 2025 at 12:14 UTC