so, I'm looking at our test_gen crate with some of the rust internal tooling and, wth?
time: 1.480; rss: 881MB -> 1178MB ( +297MB) codegen_to_LLVM_IR
time: 2.771; rss: 786MB -> 1178MB ( +392MB) codegen_crate
time: 0.163; rss: 1178MB -> 1204MB ( +26MB) encode_query_results
time: 0.206; rss: 1178MB -> 1222MB ( +44MB) incr_comp_serialize_result_cache
time: 0.206; rss: 1178MB -> 1222MB ( +44MB) incr_comp_persist_result_cache
time: 0.206; rss: 1178MB -> 1222MB ( +44MB) serialize_dep_graph
time: 0.054; rss: 1222MB -> 976MB ( -246MB) free_global_ctxt
time: 2.366; rss: 909MB -> 879MB ( -30MB) LLVM_passes
time: 0.000; rss: 869MB -> 857MB ( -13MB) join_worker_thread
time: 0.001; rss: 857MB -> 834MB ( -23MB) copy_all_cgu_workproducts_to_incr_comp_cache_dir
time: 0.209; rss: 976MB -> 834MB ( -142MB) finish_ongoing_codegen
time: 0.000; rss: 834MB -> 831MB ( -3MB) serialize_work_products
time: 0.000; rss: 774MB -> 770MB ( -4MB) link_binary_check_files_are_writeable
time: 10.027; rss: 770MB -> 771MB ( +0MB) run_linker
time: 10.032; rss: 774MB -> 771MB ( -3MB) link_binary
time: 10.032; rss: 774MB -> 771MB ( -3MB) link_crate
time: 10.242; rss: 976MB -> 771MB ( -205MB) link
time: 20.540; rss: 26MB -> 92MB ( +66MB) total
hard to interpret this, but those linking times are just crazy. also why is the memory residency almost 1GB here? what is happening?
is that trace different from what you get when you build the cli binary?
well, yes? different crates get built if you do that?
I also compared this with the test suite of ntpd-rs and our numbers are just absolutely crazy here
for linking, it might be
interesting re our earlier discussion: macro expansion does not seem to be a big problem (though it's not great)
time: 0.164; rss: 26MB -> 64MB ( +37MB) total
time: 0.000; rss: 33MB -> 35MB ( +2MB) parse_crate
time: 0.708; rss: 39MB -> 313MB ( +274MB) expand_crate
time: 0.708; rss: 39MB -> 313MB ( +274MB) macro_expand_crate
time: 0.024; rss: 313MB -> 313MB ( +0MB) maybe_building_test_harness
time: 0.014; rss: 313MB -> 313MB ( +0MB) AST_validation
time: 0.007; rss: 313MB -> 313MB ( +0MB) finalize_imports
time: 0.017; rss: 313MB -> 313MB ( +1MB) finalize_macro_resolutions
time: 0.127; rss: 313MB -> 352MB ( +38MB) late_resolve_crate
Is the memory just cumulative because we have so much running in parallel, or because the linker is loading so many crates at once?
I don't know, and I cannot find the docs on it. I've asked on the rust Zulip what these numbers really mean
well, so it turns out that a big part of the problem here is debug info. the binary is ~450MB, but after stripping only 50MB remains
yay for debug info
this is something we should think about when we start wanting to generate debug info in Roc dev builds
I wonder if there's some way to like cache it in a way where the surgical linker can staple it in without it having to be regenerated from scratch every time
like cache it on a per module basis or something, give or take specializations maybe needing special treatment somehow
Yeah, I have not even thought about linking debug info at all... not sure how stapling DWARF together works
Re the different traces, Folkert: what I mean is, are the times for specialization/linking significantly different for the test binary vs the cli binary?
I remember looking into this about a year ago, and the large number of tests plays a big part in it. If you comment out a bunch of them, or break them up into separate test crates, each crate is much faster to compile and link. Not sure why that was, though.
did Jack provide any insights?
no, so this is what I want to figure out, right. also, those numbers are wrong because mold does not end up getting used... because the rustflags are overwritten when you enable -Ztime-passes. these are more realistic numbers:
time: 0.344; rss: 664MB -> 677MB ( +13MB) serialize_dep_graph
time: 0.038; rss: 677MB -> 449MB ( -227MB) free_global_ctxt
time: 0.010; rss: 411MB -> 411MB ( +0MB) incr_comp_finalize_session_directory
time: 0.466; rss: 411MB -> 411MB ( +0MB) run_linker
time: 0.472; rss: 411MB -> 409MB ( -2MB) link_binary
time: 0.472; rss: 411MB -> 409MB ( -2MB) link_crate
time: 0.488; rss: 449MB -> 409MB ( -40MB) link
time: 2.927; rss: 26MB -> 75MB ( +49MB) total
at which point the macro expansion from before actually becomes relevant again
total time is off here because cargo clean behaves oddly in this scenario, but link time should always be constant. So here it generates simpler debug info (line tables only) and uses mold
I did some more tests and chatted with Jack. I think we should remove our use of indoc!. It is a proc macro and therefore cannot be cached. We can use a thread_local! String to perform the stripping of the whitespace at runtime. We only run each test once, so whether this work happens at compile time or runtime for test_gen is not really relevant
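roughly what I have in mind, just as a sketch (the trim_indent name and the exact ownership details are made up here, and it assumes ASCII indentation): dedent at runtime with a thread_local! String as a reusable scratch buffer, instead of expanding indoc! at compile time

// sketch of a runtime replacement for indoc!; `trim_indent` is hypothetical.
// it strips the common leading whitespace from every line, reusing a
// thread-local String so repeated calls don't reallocate.
use std::cell::RefCell;

thread_local! {
    static DEDENT_BUF: RefCell<String> = RefCell::new(String::new());
}

fn trim_indent(src: &str) -> String {
    // smallest indentation over all non-blank lines (assumes ASCII whitespace)
    let min_indent = src
        .lines()
        .filter(|line| !line.trim().is_empty())
        .map(|line| line.len() - line.trim_start().len())
        .min()
        .unwrap_or(0);

    DEDENT_BUF.with(|buf| {
        let mut buf = buf.borrow_mut();
        buf.clear();
        for line in src.lines() {
            if line.trim().is_empty() {
                buf.push('\n');
            } else {
                buf.push_str(&line[min_indent..]);
                buf.push('\n');
            }
        }
        buf.clone()
    })
}

whether the result is cloned out or borrowed from the buffer is a detail; the point is the work moves to runtime, so the crate no longer pulls in the indoc proc macro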
you know what would make it really fast? if we discovered and loaded the tests from disk like we do for UItest. Then we would only need to compile one thing, the test runner
but it's annoying with all the small tests we have
and also the output here can't be captured nicely in a string generally
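still, for a sense of the shape, a minimal sketch (the tests/gen directory, the .roc filter, and run_single_test are all hypothetical placeholders): only this runner crate gets compiled, and the test cases are just files it walks at runtime

// sketch of disk-based test discovery, in the spirit of uitest
use std::fs;
use std::path::Path;

// hypothetical: compile/run one snippet and check its expectations
fn run_single_test(_source: &str) -> Result<(), String> {
    Ok(())
}

fn main() -> std::io::Result<()> {
    let mut failures = 0;
    for entry in fs::read_dir(Path::new("tests/gen"))? {
        let path = entry?.path();
        if path.extension().and_then(|e| e.to_str()) != Some("roc") {
            continue;
        }
        let source = fs::read_to_string(&path)?;
        if let Err(msg) = run_single_test(&source) {
            failures += 1;
            eprintln!("FAILED {}: {}", path.display(), msg);
        }
    }
    if failures > 0 {
        std::process::exit(1);
    }
    Ok(())
}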
I think the Zig compiler tests use a big .zig file with a bunch of test functions in it. The test runner just calls them all. I wonder if we could do something similar.
well we'd need to be able to have multiple mains
but that is an interesting direction to explore
One main could call them all.
They could take an empty record and return a Boolean?
Maybe we make a list of Booleans and check they're all true?