Stream: compiler development

Topic: slow build with lots of `?`


view this post on Zulip Anton (Jun 09 2025 at 12:42):

Continuing from #compiler development > casual conversation @ 💬 @Brendan Hansknecht

long_15 is for roc build on this code:

app [main!] { cli: platform "https://github.com/roc-lang/basic-cli/releases/download/0.19.0/Hj-J_zxz7V9YurCSTFcFdu6cQJie4guzsPMUi5kBYUk.tar.br" }

import cli.Stdout
import cli.Arg exposing [Arg]

main! : List Arg => Result {} _
main! = |_args|
    Stdout.line!("1")?
    Stdout.line!("2")?
    Stdout.line!("3")?
    Stdout.line!("4")?
    Stdout.line!("5")?
    Stdout.line!("6")?
    Stdout.line!("7")?
    Stdout.line!("8")?
    Stdout.line!("9")?
    Stdout.line!("10")?
    Stdout.line!("11")?
    Stdout.line!("12")?
    Stdout.line!("13")?
    Stdout.line!("14")?
    Stdout.line!("15")

20 is the same thing with 5 more Stdout.line! calls. The flamegraph for 15 looks normal but for 20 it looks off. It spends a lot of time in handle_mm_fault though so that may be what is going wrong.

flamegraph_long_15.svg
flamegraph_long_20.svg

view this post on Zulip Anton (Jun 10 2025 at 09:49):

The potential call explosion does seem clear from the 15 flamegraph.

view this post on Zulip Anton (Jun 10 2025 at 10:15):

Based on the heaptrack data; deep recursion looks likely
heaptrack_15vs20.png

view this post on Zulip Anton (Jun 10 2025 at 10:15):

I'll see if ftrace can give me some nice stats

view this post on Zulip Anton (Jun 10 2025 at 15:37):

uftrace tells a very different and more plausible story than the flamegraph:

# uftrace diff
#  [0] base: uftrace.data       (from uftrace record --filter roc_* ./target/debug/roc build temp_long_15.roc)
#  [1] diff: uftrace20.data/    (from uftrace record --filter roc_* ./target/debug/roc build temp_long_20.roc)
#
    Total time     Self time         Calls   Function
   ===========   ===========   ===========   ====================
    +13.007  m     +0.084 us            +0   roc_build::program::build_file
    +13.007  m     +0.497 us            +0   roc_cli::build
    +13.007  m    +14.717 us            +0   roc::main
    +13.007  m     +0.385 us            +0   roc_load::load
    +13.007  m     +0.218 us            +0   roc_load::load_and_monomorphize
    +13.007  m    +23.707 us            +0   roc_load_internal::file::load_single_threaded
    +13.007  m     -1.656 us            +0   roc_load_internal::file::load
    +13.007  m    +39.427 us            +0   roc_load_internal::file::state_thread_step
    +13.007  m   +101.288 us            +0   roc_load_internal::file::update
    +13.007  m     +2.436 us            +0   roc_mono::inc_dec::insert_inc_dec_operations
    +13.007  m     +4.556 us            +0   roc_mono::inc_dec::insert_inc_dec_operations_proc
    +13.007  m    +10.585  s      +4571136   roc_mono::inc_dec::insert_refcount_operations_stmt
     +7.007  m     +2.716  s     +11681792   roc_mono::inc_dec::insert_refcount_operations_stmt::_{{closure}}
     +5.042  m    +16.057  s     +65704457   core::hash::BuildHasher::hash_one
     +5.015  m     +4.405  s     +67737523   core::hash::impls::_<impl core::hash::Hash for &T>::hash
     +4.053  m   +190.553 ms      +2031656   _<core::iter::adapters::rev::Rev<I>>::fold
     +4.053  m     +1.089  s      +2031656   core::iter::traits::double_ended::DoubleEndedIterator::rfold
     +4.048  m    +10.363  s     +65700970   _<roc_module::symbol::Symbol>::hash
     +4.027  m    +59.520  s     +135469517   _<wyhash::traits::WyHash>::write
     +4.026  m     +8.766  s     +131405247   core::hash::Hasher::write_u32
     +3.027  m     +9.369  s      +5079040   roc_mono::inc_dec::insert_refcount_operations_binding
     +3.013  m     +4.815  s     +27426891   hashbrown::map::HashMap<K,V,S,A>::contains_key
     +2.048  m     +1.880  s     +11681792   roc_mono::inc_dec::SymbolRcTypes::get
     +2.046  m     +3.340  s     +14222412   hashbrown::map::HashMap<K,V,S,A>::insert
     +2.046  m     +1.857  s     +23363644   std::collections::hash::set::HashSet<T,S>::contains
     +2.022  m     +5.588  s     +65704262   core::hash::impls::_<impl core::hash::Hash for u32>::hash
     +2.015  m     +5.469  s     +65700985   _<core::num::nonzero::NonZero<u32>>::hash
     +2.010  m   +234.683 ms      +1524052   bumpalo::collections::vec::Vec<T>::from_iter_in
     +2.009  m   +831.661 ms      +1524247   _<bumpalo::collections::vec::Vec<T>>::extend
     +2.004  m     +1.001  m     +135469517   wyhash::functions::wyhash_core
     +2.002  m   +714.215 ms      +9651156   std::collections::hash::map::HashMap<K,V,S>::insert
     +1.059  m   +337.363 ms      +2038914   _<core::iter::adapters::map::Map<I,F>>::next
     +1.059  m     +8.820  s     +51200480   _<hashbrown::map::HashMap<K,V,S,A>>::clone
     +1.059  m   +117.193 ms      +1020568   core::ops::function::impls::_<impl core::ops::function::FnOnce<A> for &mut F>::call_once
     +1.057  m   +393.806 ms      +5586944   roc_mono::inc_dec::RefcountEnvironment::add_symbol
     +1.056  m   +607.791 ms      +5586944   roc_mono::inc_dec::RefcountEnvironment::add_symbol_with
     +1.049  m     +8.621  s     +51200480   _<hashbrown::raw::RawTable<T,A>>::clone
     +1.047  m   +102.891 ms      +1015828   bumpalo::collections::collect_in::CollectIn::collect_in
     +1.047  m    +83.509 ms      +1015828   _<bumpalo::collections::vec::Vec<T>>::from_iter_in
     +1.045  m   +547.355 ms      +5588649   core::iter::traits::iterator::Iterator::collect
     +1.044  m     +1.318  s      +5586949   _<std::collections::hash::set::HashSet<T,S>>::from_iter
     +1.043  m     +1.095  s      +7110656   roc_mono::inc_dec::RefcountEnvironment::get_symbol_rc_type
     +1.043  m     +1.366  s      +5586949   _<hashbrown::set::HashSet<T,S,A>>::extend
     +1.041  m     +1.801  s      +5586964   _<hashbrown::map::HashMap<K,V,S,A>>::extend
     +1.039  m   +590.532 ms      +6096312   _<core::iter::adapters::map::Map<I,F>>::fold
     +1.036  m   +532.162 ms      +6603467   core::iter::traits::iterator::Iterator::for_each
     +1.030  m     +2.694  s     +14222412   hashbrown::raw::RawTable<T,A>::find_or_find_insert_slot
     +1.024  m     +4.486  s     +54755838   _<hashbrown::raw::RawTable<T,A>>::drop
     +1.022  m   +412.975 ms      +2539520   _<roc_mono::inc_dec::RefcountEnvironment>::clone
     +1.022  m    +21.982  s     +44698317   hashbrown::raw::RawTable<T,A>::find
     +1.022  m   +401.394 ms      +5079495   _<std::collections::hash::map::HashMap<K,V,S>>::clone
     +1.020  m   +705.180 ms      +9144322   std::collections::hash::map::HashMap<K,V,S>::get
     +1.019  m     +6.564  s     +54755838   hashbrown::raw::RawTableInner::drop_inner_table
     +1.019  m     +1.780  s      +9144437   hashbrown::map::HashMap<K,V,S,A>::get
     +1.015  m     +1.023  s     +28901461   _<hashbrown::raw::RawTable<T,A>>::clone_from_spec
     +1.014  m     +8.608  s      +2539575   hashbrown::raw::RawTable<T,A>::clone_from_impl
     +1.013  m    +25.222  s     +270939034   _<core::slice::iter::Chunks<T>>::next
     +1.010  m   +438.165 ms      +5079040   roc_mono::inc_dec::RefcountEnvironment::consume_rc_symbol
     +1.010  m     +2.412  s     +36570443   hashbrown::raw::RawTable<T,A>::get
     +1.008  m   +443.216 ms      +4063252   _<core::iter::adapters::filter::Filter<I,P>>::fold
     +1.005  m   +943.156 ms      +3047424   roc_mono::inc_dec::RefcountEnvironment::borrowed_usages
     +1.004  m   +475.935 ms      +2539520   core::ptr::drop_in_place<roc_mono::inc_dec::RefcountEnvironment>
     +1.003  m   +374.460 ms      +2031616   roc_mono::inc_dec::RefcountEnvironment::consume_symbol
     +1.002  m     +1.004  s     +21333133   hashbrown::raw::RawTable<T,A>::reserve
     +1.001  m   +467.025 ms      +3555368   core::iter::adapters::filter::filter_fold::_{{closure}}
     +1.001  m     +5.078  s      +8175857   hashbrown::raw::RawTable<T,A>::reserve_rehash
     +1.000  m   +268.364 ms      +2539520   core::ptr::drop_in_place<std::collections::hash::map::HashMap<roc_mono::ir::JoinPointId,std::collections::hash::set::HashSet<roc_module::symbol::Symbol,core::hash::BuildHasherDefault<wyhash::traits::WyHash>>,core::hash::BuildHasherDefault<wyhash::traits::WyHash>>>
     +1.000  m   +234.728 ms      +2539520   core::ptr::drop_in_place<hashbrown::map::HashMap<roc_mono::ir::JoinPointId,std::collections::hash::set::HashSet<roc_module::symbol::Symbol,core::hash::BuildHasherDefault<wyhash::traits::WyHash>>,core::hash::BuildHasherDefault<wyhash::traits::WyHash>>>

view this post on Zulip Anton (Jun 10 2025 at 15:39):

I'm going to see if I can reproduce this with a simple platform, basic-cli may cause a bunch of distractions when debugging.

view this post on Zulip Anton (Jun 10 2025 at 17:52):

Reproduction with simple platform :tada:
The list argument is key, even though it is unused! It does not happen with myargs = 5 or {}, but it does also happen with "".

app [main!] { pf: platform "../test-platform-effects-zig/main.roc" }

import pf.Effect

main! : {} => {}
main! = |{}|
    myargs = []
    when wrapper!(myargs) is
        Ok(he) -> Effect.put_line!(Inspect.to_str(he))
        Err(e) -> Effect.put_line!(Inspect.to_str(e))

wrapper! = |_args|
    Effect.foo_line!("1")?
    Effect.foo_line!("2")?
    Effect.foo_line!("3")?
    Effect.foo_line!("4")?
    Effect.foo_line!("5")?
    Effect.foo_line!("6")?
    Effect.foo_line!("7")?
    Effect.foo_line!("8")?
    Effect.foo_line!("9")?
    Effect.foo_line!("10")?
    Effect.foo_line!("11")?
    Effect.foo_line!("12")?
    Effect.foo_line!("13")?
    Effect.foo_line!("14")?
    Effect.foo_line!("15")?
    Effect.foo_line!("16")?
    Effect.foo_line!("17")

view this post on Zulip Anton (Jun 10 2025 at 18:15):

Using myargs = [] instead of myargs = 0 leads to 73628 more calls of roc_mono::inc_dec::insert_refcount_operations_stmt

view this post on Zulip Brendan Hansknecht (Jun 10 2025 at 18:28):

Wow....crazy :rolling_on_the_floor_laughing:

view this post on Zulip Brendan Hansknecht (Jun 10 2025 at 18:40):

I wonder what we are scanning over that explodes like that

view this post on Zulip Brendan Hansknecht (Jun 10 2025 at 18:40):

Or recursiving over

view this post on Zulip Ayaz Hafiz (Jun 10 2025 at 23:38):

is this still doing the closure thing?

view this post on Zulip Ayaz Hafiz (Jun 10 2025 at 23:38):

where Effect is a continuation monad

view this post on Zulip Ayaz Hafiz (Jun 10 2025 at 23:38):

if so, that's probably why

view this post on Zulip Brendan Hansknecht (Jun 11 2025 at 00:46):

Ah yeah, I thought it didn't have closures, but probably has repeat calls to Result.try

view this post on Zulip Brendan Hansknecht (Jun 11 2025 at 00:46):

Otherwise, it is a gigantic nested when is

view this post on Zulip Anton (Jun 11 2025 at 18:51):

Ayaz Hafiz said:

is this still doing the closure thing?

Yeah it's using closures, but do you know why that would result in a massive difference depending on the type of myargs?

view this post on Zulip Brendan Hansknecht (Jun 11 2025 at 18:53):

Lists are refcount

view this post on Zulip Brendan Hansknecht (Jun 11 2025 at 18:54):

Ints and records are not

view this post on Zulip Brendan Hansknecht (Jun 11 2025 at 18:54):

And all values are being nestedly captured for each closure

view this post on Zulip Brendan Hansknecht (Jun 11 2025 at 18:54):

Still really should not explode like that....feels exponential...probably a bad algorithm.

view this post on Zulip Anton (Jun 11 2025 at 18:55):

Thanks Brendan :)
On a further simplified example I did find it was repeating the same lengthy "recursive walk" multiple times so memoization is possible

view this post on Zulip Brendan Hansknecht (Jun 11 2025 at 18:55):

Ah, makes sense

view this post on Zulip Anton (Jun 13 2025 at 18:24):

I know how I want to solve this, I'm currently entangled in a web of algorithmic and lifetime complexity :web: :smile:

view this post on Zulip Anton (Jun 14 2025 at 18:02):

The optimizations I tried where either not significant or resulted in wrong reference counts when running all tests. Let's make sure to write unit tests for complicated functions in the new compiler :p
I'm going to give it one more day to see if I can skip work for unused variables.

view this post on Zulip Brendan Hansknecht (Jun 14 2025 at 18:32):

Did you try changing the desugaring to use when is instead of result.try (assuming it is currently using result.try)?

view this post on Zulip Anton (Jun 16 2025 at 11:58):

No, not yet, that's a good idea. I'll explore that for a bit.

view this post on Zulip Anton (Jun 16 2025 at 12:27):

Turns out they are actually using when. It goes: TrySuffix > LowLevelTry > Try > When. Earlier I was looking at the IR after specialization and it wasn't clear to me that the problematic IR came from when expressions.

view this post on Zulip Anton (Jun 16 2025 at 12:53):

I like the overview we get of the output all phases with the snapshots in the new compiler, it requires substantial effort to get that output for some phases from the old compiler.

view this post on Zulip Brendan Hansknecht (Jun 16 2025 at 15:20):

Interesting.... So this isn't from nested closures

view this post on Zulip Anton (Jun 16 2025 at 15:43):

it's due to nested joins:

procedure : `#UserApp.wrapper!` [C {}, C {}]
procedure = `#UserApp.wrapper!` (`#UserApp.7`: List []):
    joinpoint `#UserApp.70` `#UserApp.66`:
        joinpoint `#UserApp.58` `#UserApp.54`:
            joinpoint `#UserApp.46` `#UserApp.42`:
                joinpoint `#UserApp.34` `#UserApp.30`:
                    joinpoint `#UserApp.22` `#UserApp.18`:
                        let `#UserApp.17` : Str = "6";
                        let `#UserApp.16` : [C {}, C {}] = CallByName `pf.Effect.foo_line!` `#UserApp.17`;
                        ret `#UserApp.16`;
                    in
                    let `#UserApp.29` : Str = "5";
                    let `#UserApp.21` : [C {}, C {}] = CallByName `pf.Effect.foo_line!` `#UserApp.29`;
                    let `#UserApp.26` : U8 = 1i64;
                    let `#UserApp.27` : U8 = GetTagId `#UserApp.21`;
                    let `#UserApp.28` : Int1 = lowlevel Eq `#UserApp.26` `#UserApp.27`;
                    if `#UserApp.28` then
                        let `#UserApp.19` : {} = UnionAtIndex (Id 1) (Index 0) `#UserApp.21`;
                        jump `#UserApp.22` `#UserApp.19`;
                    else
                        let `#UserApp.20` : {} = UnionAtIndex (Id 0) (Index 0) `#UserApp.21`;
                        let `#UserApp.25` : [C {}, C {}] = TagId(0) `#UserApp.20`;
                        ret `#UserApp.25`;
                in
                let `#UserApp.41` : Str = "4";
                let `#UserApp.33` : [C {}, C {}] = CallByName `pf.Effect.foo_line!` `#UserApp.41`;
                let `#UserApp.38` : U8 = 1i64;
                let `#UserApp.39` : U8 = GetTagId `#UserApp.33`;
                let `#UserApp.40` : Int1 = lowlevel Eq `#UserApp.38` `#UserApp.39`;
                if `#UserApp.40` then
                    let `#UserApp.31` : {} = UnionAtIndex (Id 1) (Index 0) `#UserApp.33`;
                    jump `#UserApp.34` `#UserApp.31`;
                else
                    let `#UserApp.32` : {} = UnionAtIndex (Id 0) (Index 0) `#UserApp.33`;
                    let `#UserApp.37` : [C {}, C {}] = TagId(0) `#UserApp.32`;
                    ret `#UserApp.37`;
            in
            let `#UserApp.53` : Str = "3";
            let `#UserApp.45` : [C {}, C {}] = CallByName `pf.Effect.foo_line!` `#UserApp.53`;
            let `#UserApp.50` : U8 = 1i64;
            let `#UserApp.51` : U8 = GetTagId `#UserApp.45`;
            let `#UserApp.52` : Int1 = lowlevel Eq `#UserApp.50` `#UserApp.51`;
            if `#UserApp.52` then
                let `#UserApp.43` : {} = UnionAtIndex (Id 1) (Index 0) `#UserApp.45`;
                jump `#UserApp.46` `#UserApp.43`;
            else
                let `#UserApp.44` : {} = UnionAtIndex (Id 0) (Index 0) `#UserApp.45`;
                let `#UserApp.49` : [C {}, C {}] = TagId(0) `#UserApp.44`;
                ret `#UserApp.49`;
        in
        let `#UserApp.65` : Str = "2";
        let `#UserApp.57` : [C {}, C {}] = CallByName `pf.Effect.foo_line!` `#UserApp.65`;
        let `#UserApp.62` : U8 = 1i64;
        let `#UserApp.63` : U8 = GetTagId `#UserApp.57`;
        let `#UserApp.64` : Int1 = lowlevel Eq `#UserApp.62` `#UserApp.63`;
        if `#UserApp.64` then
            let `#UserApp.55` : {} = UnionAtIndex (Id 1) (Index 0) `#UserApp.57`;
            jump `#UserApp.58` `#UserApp.55`;
        else
            let `#UserApp.56` : {} = UnionAtIndex (Id 0) (Index 0) `#UserApp.57`;
            let `#UserApp.61` : [C {}, C {}] = TagId(0) `#UserApp.56`;
            ret `#UserApp.61`;
    in
    let `#UserApp.77` : Str = "1";
    let `#UserApp.69` : [C {}, C {}] = CallByName `pf.Effect.foo_line!` `#UserApp.77`;
    let `#UserApp.74` : U8 = 1i64;
    let `#UserApp.75` : U8 = GetTagId `#UserApp.69`;
    let `#UserApp.76` : Int1 = lowlevel Eq `#UserApp.74` `#UserApp.75`;
    if `#UserApp.76` then
        let `#UserApp.67` : {} = UnionAtIndex (Id 1) (Index 0) `#UserApp.69`;
        jump `#UserApp.70` `#UserApp.67`;
    else
        let `#UserApp.68` : {} = UnionAtIndex (Id 0) (Index 0) `#UserApp.69`;
        let `#UserApp.73` : [C {}, C {}] = TagId(0) `#UserApp.68`;
        ret `#UserApp.73`;

view this post on Zulip Anton (Jun 16 2025 at 16:06):

It's this loop specifically.

view this post on Zulip Anton (Jun 17 2025 at 18:14):

Alright, I was able to hack together a fix for my specific problematic file as proof of concept :)
I should be able to turn this proof of concept into a proper fix :fingers_crossed:

view this post on Zulip Brendan Hansknecht (Jun 18 2025 at 02:48):

Congrats @Anton :sparkler:

view this post on Zulip Ayaz Hafiz (Jun 18 2025 at 04:29):

what was the issue?

view this post on Zulip Anton (Jun 18 2025 at 09:16):

This loop does expensive fixed point iteration even when all variables that could potentially be reference counted are unused.

view this post on Zulip Anton (Jun 18 2025 at 15:31):

I made a tool so that I can understand how the recursion iterates over the code :)
debug_tool.mp4

view this post on Zulip Anthony Bullard (Jun 18 2025 at 17:59):

This is awesome. I want to know how you built it!

view this post on Zulip Anton (Jun 20 2025 at 09:02):

Here is the source code index.html

view this post on Zulip Anton (Jun 20 2025 at 09:06):

I used the regular Claude web interface to start, for js you can have a live preview on the side so that is very useful. This was my initial back and forth with claude: https://claude.ai/share/7ff2aacf-b7a5-40ad-839c-46b0e70b1882

view this post on Zulip Anton (Jun 20 2025 at 09:09):

Claude servers were overloaded so then I switched to the latest gemini pro 2.5 (free on https://aistudio.google.com). I don't think it has the same live js preview, so I had to a bunch of manual copy, paste and run. In between I did some good old fashion human brain debugging and that's it :)

view this post on Zulip Anton (Jun 20 2025 at 09:13):

From several past experiments, I've found vanilla js to improve things a lot. Models can have a difficult time fixing React errors, but they often default to React if you don't specify what to use.

view this post on Zulip Luke Boswell (Jun 20 2025 at 09:15):

I've had good experiences using Elm. Though I haven't built anything substantial with it, mostly just smaller demo things.

view this post on Zulip Anton (Jun 20 2025 at 09:17):

Good to know, I should give roc htmx a try as well :)

view this post on Zulip Anton (Jun 23 2025 at 18:28):

status update: I'm trying to achieve a more ambitious improvement then earlier, doing less work for unused variables is pretty easy. I'm now also trying to prevent work for vars after their last use. There is a way to this with a lot of reasonably simple code changes but I'm trying to see if I can make it work with a lot less code changes.

This nested joinpoint case is very common in current Roc and blows up 2^n so it's worth spending some time on in my opinion.

I believe we still plan on doing perceus refcounting in the new compiler @Richard Feldman?

view this post on Zulip Richard Feldman (Jun 23 2025 at 20:11):

for sure!

view this post on Zulip Anton (Jun 27 2025 at 16:42):

It is done :tada: :fireworks: :partying_face: PR#7884
fix_comparison_plot.png

view this post on Zulip Richard Feldman (Jun 27 2025 at 17:05):

wooooooooow, amazing @Anton!!!

view this post on Zulip Anthony Bullard (Jun 27 2025 at 18:06):

great job @Anton '

view this post on Zulip Brendan Hansknecht (Jun 28 2025 at 06:42):

Wait....but I thought the goal of all graphs was to go up and to the right.....

view this post on Zulip Brendan Hansknecht (Jun 28 2025 at 06:42):

:grinning_face_with_smiling_eyes: Congrats on fixing this @Anton

view this post on Zulip Anton (Jun 28 2025 at 09:12):

Brendan Hansknecht said:

Wait....but I thought the goal of all graphs was to go up and to the right.....

I should have plotted the speedup % :p


Last updated: Jul 06 2025 at 12:14 UTC