so, I'm looking at our test_gen crate with some of the rust internal tooling and, wth?
time: 1.480; rss: 881MB -> 1178MB ( +297MB) codegen_to_LLVM_IR
time: 2.771; rss: 786MB -> 1178MB ( +392MB) codegen_crate
time: 0.163; rss: 1178MB -> 1204MB ( +26MB) encode_query_results
time: 0.206; rss: 1178MB -> 1222MB ( +44MB) incr_comp_serialize_result_cache
time: 0.206; rss: 1178MB -> 1222MB ( +44MB) incr_comp_persist_result_cache
time: 0.206; rss: 1178MB -> 1222MB ( +44MB) serialize_dep_graph
time: 0.054; rss: 1222MB -> 976MB ( -246MB) free_global_ctxt
time: 2.366; rss: 909MB -> 879MB ( -30MB) LLVM_passes
time: 0.000; rss: 869MB -> 857MB ( -13MB) join_worker_thread
time: 0.001; rss: 857MB -> 834MB ( -23MB) copy_all_cgu_workproducts_to_incr_comp_cache_dir
time: 0.209; rss: 976MB -> 834MB ( -142MB) finish_ongoing_codegen
time: 0.000; rss: 834MB -> 831MB ( -3MB) serialize_work_products
time: 0.000; rss: 774MB -> 770MB ( -4MB) link_binary_check_files_are_writeable
time: 10.027; rss: 770MB -> 771MB ( +0MB) run_linker
time: 10.032; rss: 774MB -> 771MB ( -3MB) link_binary
time: 10.032; rss: 774MB -> 771MB ( -3MB) link_crate
time: 10.242; rss: 976MB -> 771MB ( -205MB) link
time: 20.540; rss: 26MB -> 92MB ( +66MB) total
hard to interpret this, but those linking times are just crazy. also why is the memory residency almost 1GB here? what is happening?
is that trace different from what you get when you build the cli binary?
well, yes? different crates get built if you do that?
I also compared this with the test suite of ntpd-rs and our numbers are just absolutely crazy here
for linking, it might be
interesting re our earlier discussion: macro expansion does not seem to be a big problem (though it's not great)
time: 0.164; rss: 26MB -> 64MB ( +37MB) total
time: 0.000; rss: 33MB -> 35MB ( +2MB) parse_crate
time: 0.708; rss: 39MB -> 313MB ( +274MB) expand_crate
time: 0.708; rss: 39MB -> 313MB ( +274MB) macro_expand_crate
time: 0.024; rss: 313MB -> 313MB ( +0MB) maybe_building_test_harness
time: 0.014; rss: 313MB -> 313MB ( +0MB) AST_validation
time: 0.007; rss: 313MB -> 313MB ( +0MB) finalize_imports
time: 0.017; rss: 313MB -> 313MB ( +1MB) finalize_macro_resolutions
time: 0.127; rss: 313MB -> 352MB ( +38MB) late_resolve_crate
Is the memory just cumulative because we have so much running in parallel, or because the linker is loading so many crates at once?
I don't know, and I cannot find the docs on it. I've asked on the rust Zulip what these numbers really mean
well, so it turns out that a big part of the problem here is debug info. the binary is ~450MB, but after stripping only 50MB remains
yay for debug info
this is something we should think about when we start wanting to generate debug info in Roc dev builds
I wonder if there's some way to like cache it in a way where the surgical linker can staple it in without it having to be regenerated from scratch every time
like cache it on a per module basis or something, give or take specializations maybe needing special treatment somehow
Yeah, I have not even thought about linking debug info at all... not sure how stapling DWARF together works
Re the different traces, Folkert: what I mean is, are the times for specialization/linking significantly different for the test binary vs the cli binary?
I remember looking into this about a year ago, and the large number of tests plays a big part in it. If you comment out a bunch of them, or break them up into separate test crates, each crate is much faster to compile and link. Not sure why that was, though.
did Jack provide any insights?
no, so this is what I want to figure out, right. also, those numbers are wrong because mold does not end up getting used... because the rustflags are overwritten when you enable -Ztime-passes. these are more realistic numbers:
time: 0.344; rss: 664MB -> 677MB ( +13MB) serialize_dep_graph
time: 0.038; rss: 677MB -> 449MB ( -227MB) free_global_ctxt
time: 0.010; rss: 411MB -> 411MB ( +0MB) incr_comp_finalize_session_directory
time: 0.466; rss: 411MB -> 411MB ( +0MB) run_linker
time: 0.472; rss: 411MB -> 409MB ( -2MB) link_binary
time: 0.472; rss: 411MB -> 409MB ( -2MB) link_crate
time: 0.488; rss: 449MB -> 409MB ( -40MB) link
time: 2.927; rss: 26MB -> 75MB ( +49MB) total
at which point the macro expansion from before actually becomes relevant again
total time is off here because cargo clean behaves oddly in this scenario, but link time should always be constant. So here it generates simpler debug info (line tables only) and uses mold
I did some more tests and chatted with Jack. I think we should remove our use of indoc!. It is a proc macro and therefore cannot be cached. We can use a thread_local! String to perform the stripping of the whitespace at runtime. We only run each test once, so whether this work happens at compile time or runtime for test_gen is not really relevant
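roughly what I have in mind, just as a sketch (the trim_indent name and the exact ownership details are made up here, and it assumes ASCII indentation): dedent at runtime with a thread_local! String as a reusable scratch buffer, instead of expanding indoc! at compile time

// sketch of a runtime replacement for indoc!; `trim_indent` is hypothetical.
// it strips the common leading whitespace from every line, reusing a
// thread-local String so repeated calls don't reallocate.
use std::cell::RefCell;

thread_local! {
    static DEDENT_BUF: RefCell<String> = RefCell::new(String::new());
}

fn trim_indent(src: &str) -> String {
    // smallest indentation over all non-blank lines (assumes ASCII whitespace)
    let min_indent = src
        .lines()
        .filter(|line| !line.trim().is_empty())
        .map(|line| line.len() - line.trim_start().len())
        .min()
        .unwrap_or(0);

    DEDENT_BUF.with(|buf| {
        let mut buf = buf.borrow_mut();
        buf.clear();
        for line in src.lines() {
            if line.trim().is_empty() {
                buf.push('\n');
            } else {
                buf.push_str(&line[min_indent..]);
                buf.push('\n');
            }
        }
        buf.clone()
    })
}

whether the result is cloned out or borrowed from the buffer is a detail; the point is the work moves to runtime, so the crate no longer pulls in the indoc proc macro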
you know what would make it really fast? if we discovered and loaded the tests from disk like we do for UItest. Then we would only need to compile one thing, the test runner
but it's annoying with all the small tests we have
and also the output here can't be captured nicely in a string generally
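still, for a sense of the shape, a minimal sketch (the tests/gen directory, the .roc filter, and run_single_test are all hypothetical placeholders): only this runner crate gets compiled, and the test cases are just files it walks at runtime

// sketch of disk-based test discovery, in the spirit of uitest
use std::fs;
use std::path::Path;

// hypothetical: compile/run one snippet and check its expectations
fn run_single_test(_source: &str) -> Result<(), String> {
    Ok(())
}

fn main() -> std::io::Result<()> {
    let mut failures = 0;
    for entry in fs::read_dir(Path::new("tests/gen"))? {
        let path = entry?.path();
        if path.extension().and_then(|e| e.to_str()) != Some("roc") {
            continue;
        }
        let source = fs::read_to_string(&path)?;
        if let Err(msg) = run_single_test(&source) {
            failures += 1;
            eprintln!("FAILED {}: {}", path.display(), msg);
        }
    }
    if failures > 0 {
        std::process::exit(1);
    }
    Ok(())
}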
I think the Zig compiler tests use a big .zig file with a bunch of test functions in it. The test runner just calls them all. I wonder if we could do something similar.
well we'd need to be able to have multiple mains
but that is an interesting direction to explore
One main could call them all.
They could take an empty record and return a Boolean?
Maybe we make a list of Booleans and check they're all true?