Stream: compiler development

Topic: test_gen performance


view this post on Zulip Folkert de Vries (Oct 15 2023 at 13:31):

so, I'm looking at our test_gen crate with some of the rust internal tooling and, wth?

time:   1.480; rss:  881MB -> 1178MB ( +297MB)  codegen_to_LLVM_IR
time:   2.771; rss:  786MB -> 1178MB ( +392MB)  codegen_crate
time:   0.163; rss: 1178MB -> 1204MB (  +26MB)  encode_query_results
time:   0.206; rss: 1178MB -> 1222MB (  +44MB)  incr_comp_serialize_result_cache
time:   0.206; rss: 1178MB -> 1222MB (  +44MB)  incr_comp_persist_result_cache
time:   0.206; rss: 1178MB -> 1222MB (  +44MB)  serialize_dep_graph
time:   0.054; rss: 1222MB ->  976MB ( -246MB)  free_global_ctxt
time:   2.366; rss:  909MB ->  879MB (  -30MB)  LLVM_passes
time:   0.000; rss:  869MB ->  857MB (  -13MB)  join_worker_thread
time:   0.001; rss:  857MB ->  834MB (  -23MB)  copy_all_cgu_workproducts_to_incr_comp_cache_dir
time:   0.209; rss:  976MB ->  834MB ( -142MB)  finish_ongoing_codegen
time:   0.000; rss:  834MB ->  831MB (   -3MB)  serialize_work_products
time:   0.000; rss:  774MB ->  770MB (   -4MB)  link_binary_check_files_are_writeable
time:  10.027; rss:  770MB ->  771MB (   +0MB)  run_linker
time:  10.032; rss:  774MB ->  771MB (   -3MB)  link_binary
time:  10.032; rss:  774MB ->  771MB (   -3MB)  link_crate
time:  10.242; rss:  976MB ->  771MB ( -205MB)  link
time:  20.540; rss:   26MB ->   92MB (  +66MB)  total

hard to interpret this, but those linking times are just crazy. also, why is resident memory almost 1GB here? what is happening?

view this post on Zulip Ayaz Hafiz (Oct 15 2023 at 14:02):

is that trace different than what it is when you try to build the cli binary?

view this post on Zulip Folkert de Vries (Oct 15 2023 at 14:32):

well, yes? different crates get built if you do that?

view this post on Zulip Folkert de Vries (Oct 15 2023 at 14:32):

I also compared this with the test suite of ntpd-rs and our numbers are just absolutely crazy here

view this post on Zulip Folkert de Vries (Oct 15 2023 at 14:33):

for linking, it might be

view this post on Zulip Folkert de Vries (Oct 15 2023 at 14:34):

interesting re our earlier discussion: macro expansion does not seem to be a big problem (though it's not great)

time:   0.164; rss:   26MB ->   64MB (  +37MB)  total
time:   0.000; rss:   33MB ->   35MB (   +2MB)  parse_crate
time:   0.708; rss:   39MB ->  313MB ( +274MB)  expand_crate
time:   0.708; rss:   39MB ->  313MB ( +274MB)  macro_expand_crate
time:   0.024; rss:  313MB ->  313MB (   +0MB)  maybe_building_test_harness
time:   0.014; rss:  313MB ->  313MB (   +0MB)  AST_validation
time:   0.007; rss:  313MB ->  313MB (   +0MB)  finalize_imports
time:   0.017; rss:  313MB ->  313MB (   +1MB)  finalize_macro_resolutions
time:   0.127; rss:  313MB ->  352MB (  +38MB)  late_resolve_crate

view this post on Zulip Brendan Hansknecht (Oct 15 2023 at 14:35):

Is the memory usage just cumulative because we have so much running in parallel, or is it the linker loading so many crates at once?

view this post on Zulip Folkert de Vries (Oct 15 2023 at 14:36):

I don't know and I cannot find the docs on it. I've asked on the Rust Zulip what these numbers really mean

view this post on Zulip Folkert de Vries (Oct 15 2023 at 15:18):

well, so it turns out that a big part of the problem here is debug info. the binary is ~450MB, but after stripping only 50MB remain

view this post on Zulip Brendan Hansknecht (Oct 15 2023 at 15:35):

yay for debug info

view this post on Zulip Richard Feldman (Oct 15 2023 at 15:45):

this is something we should think about when we start wanting to generate debug info in Roc dev builds

view this post on Zulip Richard Feldman (Oct 15 2023 at 15:46):

I wonder if there's some way to like cache it in a way where the surgical linker can staple it in without it having to be regenerated from scratch every time

view this post on Zulip Richard Feldman (Oct 15 2023 at 15:47):

like cache it on a per module basis or something, give or take specializations maybe needing special treatment somehow

view this post on Zulip Brendan Hansknecht (Oct 15 2023 at 15:50):

Yeah, I have not even thought about linking debug info at all... not sure how stapling DWARF together works

view this post on Zulip Ayaz Hafiz (Oct 15 2023 at 15:54):

Re different traces, Folkert: what I mean is, are the times for specialization/linking significantly different for the test binary vs the cli binary?

view this post on Zulip Ayaz Hafiz (Oct 15 2023 at 15:55):

I remember looking into this about a year ago, and the large number of tests plays a big part in it. if you comment out a bunch of them, or break them up into separate test crates, each crate is much faster to compile and link. Not sure why that was, though

view this post on Zulip Ayaz Hafiz (Oct 15 2023 at 15:55):

did Jack provide any insights?

view this post on Zulip Folkert de Vries (Oct 15 2023 at 16:00):

no, so this is what I want to figure out. also, those numbers are wrong because mold does not end up getting used...

view this post on Zulip Folkert de Vries (Oct 15 2023 at 16:00):

because the rustflags are overwritten when you enable -Ztime-passes. these numbers are more realistic:

time:   0.344; rss:  664MB ->  677MB (  +13MB)  serialize_dep_graph
time:   0.038; rss:  677MB ->  449MB ( -227MB)  free_global_ctxt
time:   0.010; rss:  411MB ->  411MB (   +0MB)  incr_comp_finalize_session_directory
time:   0.466; rss:  411MB ->  411MB (   +0MB)  run_linker
time:   0.472; rss:  411MB ->  409MB (   -2MB)  link_binary
time:   0.472; rss:  411MB ->  409MB (   -2MB)  link_crate
time:   0.488; rss:  449MB ->  409MB (  -40MB)  link
time:   2.927; rss:   26MB ->   75MB (  +49MB)  total

at which point the macro expansion from before actually becomes relevant again

view this post on Zulip Folkert de Vries (Oct 15 2023 at 16:01):

total time is off here because cargo clean behaves oddly in this scenario, but the link time should be the same either way. So here it generates simpler debug info (line tables only) and uses mold

view this post on Zulip Folkert de Vries (Oct 20 2023 at 19:16):

I did some more tests and chatted with Jack. I think we should remove our use of indoc!. it is a proc macro and therefore its expansion cannot be cached. We can use a thread_local! String to strip the whitespace at runtime instead. We only run each test once, so whether this work happens at compile time or at runtime is not really relevant for test_gen
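A minimal sketch of that runtime alternative, assuming only the Rust standard library. The names `DEDENT_BUF`, `dedent_into`, and `with_dedented` are hypothetical, and the rules only approximate indoc! (drop one leading newline, strip the common run of leading spaces/tabs) rather than reproducing it exactly. The thread-local String lets every test on a thread reuse one buffer:

```rust
use std::cell::RefCell;

thread_local! {
    // Hypothetical per-thread scratch buffer, reused so each dedent call
    // does not allocate a fresh String.
    static DEDENT_BUF: RefCell<String> = RefCell::new(String::new());
}

/// Write `src` into `out` with its common leading indentation removed.
/// Approximates `indoc!`: drops one leading newline, then strips the
/// smallest run of leading spaces/tabs found on any non-blank line.
fn dedent_into(src: &str, out: &mut String) {
    out.clear();
    let src = src.strip_prefix('\n').unwrap_or(src);
    let is_indent = |c: char| c == ' ' || c == '\t';
    // Smallest indentation in bytes (spaces and tabs are one byte each).
    let indent = src
        .lines()
        .filter(|line| !line.trim().is_empty())
        .map(|line| line.len() - line.trim_start_matches(is_indent).len())
        .min()
        .unwrap_or(0);
    for (i, line) in src.lines().enumerate() {
        if i > 0 {
            out.push('\n');
        }
        // Whitespace-only lines may be shorter than `indent`; emit them empty.
        if !line.trim().is_empty() {
            out.push_str(&line[indent..]);
        }
    }
    // `lines()` drops the final newline, so restore it if the input had one.
    if src.ends_with('\n') {
        out.push('\n');
    }
}

/// Run `f` on the dedented text, reusing the thread-local buffer.
/// Calling `with_dedented` again from inside `f` would panic, because the
/// buffer is already mutably borrowed at that point.
fn with_dedented<R>(src: &str, f: impl FnOnce(&str) -> R) -> R {
    DEDENT_BUF.with(|buf| {
        let mut buf = buf.borrow_mut();
        dedent_into(src, &mut *buf);
        f(buf.as_str())
    })
}
```

A test would call something like `with_dedented(src, |code| ...)` and hand `code` to whatever helper compiles the Roc program. The stripping now costs a little time on each test run instead of proc-macro expansion time on every build.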

view this post on Zulip Ayaz Hafiz (Oct 20 2023 at 19:22):

you know what would make it really fast? if we discovered and loaded the tests from disk like we do for uitest. Then we would only need to compile one thing, the test runner
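A rough sketch of the discovery half of that idea, assuming each test lives in its own `.roc` file inside one directory. That layout and the name `discover_tests` are hypothetical; the real uitest setup may work differently. Only the runner gets compiled, and the tests become data read at runtime:

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Collect every `.roc` file directly inside `dir`, sorted so that the
/// test order is stable between runs. Subdirectories are not searched.
fn discover_tests(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut tests = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.extension().map_or(false, |ext| ext == "roc") {
            tests.push(path);
        }
    }
    tests.sort();
    Ok(tests)
}
```

A runner would then loop over the returned paths, read each one with `fs::read_to_string`, and compile and check the program it contains.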

view this post on Zulip Folkert de Vries (Oct 20 2023 at 19:23):

but it's annoying with all the small tests we have

view this post on Zulip Folkert de Vries (Oct 20 2023 at 19:24):

and also, the output here generally can't be captured nicely in a string

view this post on Zulip Brian Carroll (Oct 20 2023 at 19:55):

I think the Zig compiler tests use a big .zig file with a bunch of test functions in it. The test runner just calls them all. I wonder if we could do something similar.

view this post on Zulip Folkert de Vries (Oct 20 2023 at 19:56):

well we'd need to be able to have multiple mains

view this post on Zulip Folkert de Vries (Oct 20 2023 at 19:56):

but that is an interesting direction to explore

view this post on Zulip Brian Carroll (Oct 20 2023 at 19:59):

One main could call them all.
They could take an empty record and return a Boolean?
Maybe we make a list of Booleans and check they're all true?
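The single-main idea, sketched in Rust as an analogy rather than as the Roc code the test suite would actually generate: each test is a zero-argument function returning `bool` (standing in for a Roc function from `{}` to `Bool`), and one entry point runs them all. `TestFn` and `run_all` are hypothetical names. Instead of only checking that every Boolean is true, this version reports which tests returned false:

```rust
/// Hypothetical shape of one generated test: no arguments, returns "passed?".
type TestFn = fn() -> bool;

/// Run every test once and return the names of the ones that returned false.
/// An empty result means every test passed.
fn run_all<'a>(tests: &[(&'a str, TestFn)]) -> Vec<&'a str> {
    tests
        .iter()
        .filter(|(_, test)| !test())
        .map(|(name, _)| *name)
        .collect()
}
```

Returning the failing names rather than one combined Boolean keeps the single-binary build while still pointing at the test that broke.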


Last updated: Jul 06 2025 at 12:14 UTC