Continuing from @Brendan Hansknecht
long_15 is for roc build on this code:
app [main!] { cli: platform "https://github.com/roc-lang/basic-cli/releases/download/0.19.0/Hj-J_zxz7V9YurCSTFcFdu6cQJie4guzsPMUi5kBYUk.tar.br" }
import cli.Stdout
import cli.Arg exposing [Arg]
main! : List Arg => Result {} _
main! = |_args|
Stdout.line!("1")?
Stdout.line!("2")?
Stdout.line!("3")?
Stdout.line!("4")?
Stdout.line!("5")?
Stdout.line!("6")?
Stdout.line!("7")?
Stdout.line!("8")?
Stdout.line!("9")?
Stdout.line!("10")?
Stdout.line!("11")?
Stdout.line!("12")?
Stdout.line!("13")?
Stdout.line!("14")?
Stdout.line!("15")
20 is the same thing with 5 more Stdout.line! calls. The flamegraph for 15 looks normal but for 20 it looks off. It spends a lot of time in handle_mm_fault though, so that may be what is going wrong.
flamegraph_long_15.svg
flamegraph_long_20.svg
The potential call explosion does seem clear from the 15 flamegraph.
Based on the heaptrack data, deep recursion looks likely
heaptrack_15vs20.png
I'll see if ftrace can give me some nice stats
uftrace tells a very different and more plausible story than the flamegraph:
# uftrace diff
# [0] base: uftrace.data (from uftrace record --filter roc_* ./target/debug/roc build temp_long_15.roc)
# [1] diff: uftrace20.data/ (from uftrace record --filter roc_* ./target/debug/roc build temp_long_20.roc)
#
Total time Self time Calls Function
=========== =========== =========== ====================
+13.007 m +0.084 us +0 roc_build::program::build_file
+13.007 m +0.497 us +0 roc_cli::build
+13.007 m +14.717 us +0 roc::main
+13.007 m +0.385 us +0 roc_load::load
+13.007 m +0.218 us +0 roc_load::load_and_monomorphize
+13.007 m +23.707 us +0 roc_load_internal::file::load_single_threaded
+13.007 m -1.656 us +0 roc_load_internal::file::load
+13.007 m +39.427 us +0 roc_load_internal::file::state_thread_step
+13.007 m +101.288 us +0 roc_load_internal::file::update
+13.007 m +2.436 us +0 roc_mono::inc_dec::insert_inc_dec_operations
+13.007 m +4.556 us +0 roc_mono::inc_dec::insert_inc_dec_operations_proc
+13.007 m +10.585 s +4571136 roc_mono::inc_dec::insert_refcount_operations_stmt
+7.007 m +2.716 s +11681792 roc_mono::inc_dec::insert_refcount_operations_stmt::_{{closure}}
+5.042 m +16.057 s +65704457 core::hash::BuildHasher::hash_one
+5.015 m +4.405 s +67737523 core::hash::impls::_<impl core::hash::Hash for &T>::hash
+4.053 m +190.553 ms +2031656 _<core::iter::adapters::rev::Rev<I>>::fold
+4.053 m +1.089 s +2031656 core::iter::traits::double_ended::DoubleEndedIterator::rfold
+4.048 m +10.363 s +65700970 _<roc_module::symbol::Symbol>::hash
+4.027 m +59.520 s +135469517 _<wyhash::traits::WyHash>::write
+4.026 m +8.766 s +131405247 core::hash::Hasher::write_u32
+3.027 m +9.369 s +5079040 roc_mono::inc_dec::insert_refcount_operations_binding
+3.013 m +4.815 s +27426891 hashbrown::map::HashMap<K,V,S,A>::contains_key
+2.048 m +1.880 s +11681792 roc_mono::inc_dec::SymbolRcTypes::get
+2.046 m +3.340 s +14222412 hashbrown::map::HashMap<K,V,S,A>::insert
+2.046 m +1.857 s +23363644 std::collections::hash::set::HashSet<T,S>::contains
+2.022 m +5.588 s +65704262 core::hash::impls::_<impl core::hash::Hash for u32>::hash
+2.015 m +5.469 s +65700985 _<core::num::nonzero::NonZero<u32>>::hash
+2.010 m +234.683 ms +1524052 bumpalo::collections::vec::Vec<T>::from_iter_in
+2.009 m +831.661 ms +1524247 _<bumpalo::collections::vec::Vec<T>>::extend
+2.004 m +1.001 m +135469517 wyhash::functions::wyhash_core
+2.002 m +714.215 ms +9651156 std::collections::hash::map::HashMap<K,V,S>::insert
+1.059 m +337.363 ms +2038914 _<core::iter::adapters::map::Map<I,F>>::next
+1.059 m +8.820 s +51200480 _<hashbrown::map::HashMap<K,V,S,A>>::clone
+1.059 m +117.193 ms +1020568 core::ops::function::impls::_<impl core::ops::function::FnOnce<A> for &mut F>::call_once
+1.057 m +393.806 ms +5586944 roc_mono::inc_dec::RefcountEnvironment::add_symbol
+1.056 m +607.791 ms +5586944 roc_mono::inc_dec::RefcountEnvironment::add_symbol_with
+1.049 m +8.621 s +51200480 _<hashbrown::raw::RawTable<T,A>>::clone
+1.047 m +102.891 ms +1015828 bumpalo::collections::collect_in::CollectIn::collect_in
+1.047 m +83.509 ms +1015828 _<bumpalo::collections::vec::Vec<T>>::from_iter_in
+1.045 m +547.355 ms +5588649 core::iter::traits::iterator::Iterator::collect
+1.044 m +1.318 s +5586949 _<std::collections::hash::set::HashSet<T,S>>::from_iter
+1.043 m +1.095 s +7110656 roc_mono::inc_dec::RefcountEnvironment::get_symbol_rc_type
+1.043 m +1.366 s +5586949 _<hashbrown::set::HashSet<T,S,A>>::extend
+1.041 m +1.801 s +5586964 _<hashbrown::map::HashMap<K,V,S,A>>::extend
+1.039 m +590.532 ms +6096312 _<core::iter::adapters::map::Map<I,F>>::fold
+1.036 m +532.162 ms +6603467 core::iter::traits::iterator::Iterator::for_each
+1.030 m +2.694 s +14222412 hashbrown::raw::RawTable<T,A>::find_or_find_insert_slot
+1.024 m +4.486 s +54755838 _<hashbrown::raw::RawTable<T,A>>::drop
+1.022 m +412.975 ms +2539520 _<roc_mono::inc_dec::RefcountEnvironment>::clone
+1.022 m +21.982 s +44698317 hashbrown::raw::RawTable<T,A>::find
+1.022 m +401.394 ms +5079495 _<std::collections::hash::map::HashMap<K,V,S>>::clone
+1.020 m +705.180 ms +9144322 std::collections::hash::map::HashMap<K,V,S>::get
+1.019 m +6.564 s +54755838 hashbrown::raw::RawTableInner::drop_inner_table
+1.019 m +1.780 s +9144437 hashbrown::map::HashMap<K,V,S,A>::get
+1.015 m +1.023 s +28901461 _<hashbrown::raw::RawTable<T,A>>::clone_from_spec
+1.014 m +8.608 s +2539575 hashbrown::raw::RawTable<T,A>::clone_from_impl
+1.013 m +25.222 s +270939034 _<core::slice::iter::Chunks<T>>::next
+1.010 m +438.165 ms +5079040 roc_mono::inc_dec::RefcountEnvironment::consume_rc_symbol
+1.010 m +2.412 s +36570443 hashbrown::raw::RawTable<T,A>::get
+1.008 m +443.216 ms +4063252 _<core::iter::adapters::filter::Filter<I,P>>::fold
+1.005 m +943.156 ms +3047424 roc_mono::inc_dec::RefcountEnvironment::borrowed_usages
+1.004 m +475.935 ms +2539520 core::ptr::drop_in_place<roc_mono::inc_dec::RefcountEnvironment>
+1.003 m +374.460 ms +2031616 roc_mono::inc_dec::RefcountEnvironment::consume_symbol
+1.002 m +1.004 s +21333133 hashbrown::raw::RawTable<T,A>::reserve
+1.001 m +467.025 ms +3555368 core::iter::adapters::filter::filter_fold::_{{closure}}
+1.001 m +5.078 s +8175857 hashbrown::raw::RawTable<T,A>::reserve_rehash
+1.000 m +268.364 ms +2539520 core::ptr::drop_in_place<std::collections::hash::map::HashMap<roc_mono::ir::JoinPointId,std::collections::hash::set::HashSet<roc_module::symbol::Symbol,core::hash::BuildHasherDefault<wyhash::traits::WyHash>>,core::hash::BuildHasherDefault<wyhash::traits::WyHash>>>
+1.000 m +234.728 ms +2539520 core::ptr::drop_in_place<hashbrown::map::HashMap<roc_mono::ir::JoinPointId,std::collections::hash::set::HashSet<roc_module::symbol::Symbol,core::hash::BuildHasherDefault<wyhash::traits::WyHash>>,core::hash::BuildHasherDefault<wyhash::traits::WyHash>>>
I'm going to see if I can reproduce this with a simple platform; basic-cli may cause a bunch of distractions when debugging.
Reproduction with simple platform :tada:
The list argument is key, even though it is unused! It does not happen with myargs = 5 or {}, but it does also happen with "".
app [main!] { pf: platform "../test-platform-effects-zig/main.roc" }
import pf.Effect
main! : {} => {}
main! = |{}|
    myargs = []
    when wrapper!(myargs) is
        Ok(he) -> Effect.put_line!(Inspect.to_str(he))
        Err(e) -> Effect.put_line!(Inspect.to_str(e))

wrapper! = |_args|
    Effect.foo_line!("1")?
    Effect.foo_line!("2")?
    Effect.foo_line!("3")?
    Effect.foo_line!("4")?
    Effect.foo_line!("5")?
    Effect.foo_line!("6")?
    Effect.foo_line!("7")?
    Effect.foo_line!("8")?
    Effect.foo_line!("9")?
    Effect.foo_line!("10")?
    Effect.foo_line!("11")?
    Effect.foo_line!("12")?
    Effect.foo_line!("13")?
    Effect.foo_line!("14")?
    Effect.foo_line!("15")?
    Effect.foo_line!("16")?
    Effect.foo_line!("17")
Using myargs = [] instead of myargs = 0 leads to 73628 more calls of roc_mono::inc_dec::insert_refcount_operations_stmt
Wow....crazy :rolling_on_the_floor_laughing:
I wonder what we are scanning over that explodes like that
Or recursing over
is this still doing the closure thing?
where Effect is a continuation monad
if so, that's probably why
Ah yeah, I thought it didn't have closures, but it probably has repeated calls to Result.try
Otherwise, it is a gigantic nested when is
Ayaz Hafiz said:
is this still doing the closure thing?
Yeah it's using closures, but do you know why that would result in a massive difference depending on the type of myargs?
Lists are refcounted
Ints and records are not
And all values are being captured by each nested closure
Still really should not explode like that....feels exponential...probably a bad algorithm.
Thanks Brendan :)
On a further simplified example I did find it was repeating the same lengthy "recursive walk" multiple times, so memoization is possible
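(As an aside, a minimal Rust sketch of what memoizing such a recursive walk can look like; the node ids, the Analysis type, and the tree shape are invented for illustration, this is not the actual roc_mono code:)
use std::collections::HashMap;

// Hypothetical analysis result; not a real roc_mono data structure.
#[derive(Clone)]
struct Analysis {
    visited: Vec<u32>,
}

// Walk a tree of nodes, caching the result per node so the same subtree
// is never walked twice even if it is reached along many paths.
fn analyze(
    node: u32,
    children: &HashMap<u32, Vec<u32>>,
    cache: &mut HashMap<u32, Analysis>,
) -> Analysis {
    if let Some(cached) = cache.get(&node) {
        return cached.clone();
    }
    let mut visited = vec![node];
    for &child in children.get(&node).into_iter().flatten() {
        visited.extend(analyze(child, children, cache).visited);
    }
    let result = Analysis { visited };
    cache.insert(node, result.clone());
    result
}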
Ah, makes sense
I know how I want to solve this, I'm currently entangled in a web of algorithmic and lifetime complexity :web: :smile:
The optimizations I tried were either not significant or resulted in wrong reference counts when running all tests. Let's make sure to write unit tests for complicated functions in the new compiler :p
I'm going to give it one more day to see if I can skip work for unused variables.
Did you try changing the desugaring to use when is instead of result.try (assuming it is currently using result.try)?
No, not yet, that's a good idea. I'll explore that for a bit.
Turns out they are actually using when. It goes: TrySuffix > LowLevelTry > Try > When. Earlier I was looking at the IR after specialization and it wasn't clear to me that the problematic IR came from when expressions.
I like the overview we get of the output of all phases with the snapshots in the new compiler; it requires substantial effort to get that output for some phases from the old compiler.
Interesting.... So this isn't from nested closures
it's due to nested joins:
procedure : `#UserApp.wrapper!` [C {}, C {}]
procedure = `#UserApp.wrapper!` (`#UserApp.7`: List []):
    joinpoint `#UserApp.70` `#UserApp.66`:
        joinpoint `#UserApp.58` `#UserApp.54`:
            joinpoint `#UserApp.46` `#UserApp.42`:
                joinpoint `#UserApp.34` `#UserApp.30`:
                    joinpoint `#UserApp.22` `#UserApp.18`:
                        let `#UserApp.17` : Str = "6";
                        let `#UserApp.16` : [C {}, C {}] = CallByName `pf.Effect.foo_line!` `#UserApp.17`;
                        ret `#UserApp.16`;
                    in
                    let `#UserApp.29` : Str = "5";
                    let `#UserApp.21` : [C {}, C {}] = CallByName `pf.Effect.foo_line!` `#UserApp.29`;
                    let `#UserApp.26` : U8 = 1i64;
                    let `#UserApp.27` : U8 = GetTagId `#UserApp.21`;
                    let `#UserApp.28` : Int1 = lowlevel Eq `#UserApp.26` `#UserApp.27`;
                    if `#UserApp.28` then
                        let `#UserApp.19` : {} = UnionAtIndex (Id 1) (Index 0) `#UserApp.21`;
                        jump `#UserApp.22` `#UserApp.19`;
                    else
                        let `#UserApp.20` : {} = UnionAtIndex (Id 0) (Index 0) `#UserApp.21`;
                        let `#UserApp.25` : [C {}, C {}] = TagId(0) `#UserApp.20`;
                        ret `#UserApp.25`;
                in
                let `#UserApp.41` : Str = "4";
                let `#UserApp.33` : [C {}, C {}] = CallByName `pf.Effect.foo_line!` `#UserApp.41`;
                let `#UserApp.38` : U8 = 1i64;
                let `#UserApp.39` : U8 = GetTagId `#UserApp.33`;
                let `#UserApp.40` : Int1 = lowlevel Eq `#UserApp.38` `#UserApp.39`;
                if `#UserApp.40` then
                    let `#UserApp.31` : {} = UnionAtIndex (Id 1) (Index 0) `#UserApp.33`;
                    jump `#UserApp.34` `#UserApp.31`;
                else
                    let `#UserApp.32` : {} = UnionAtIndex (Id 0) (Index 0) `#UserApp.33`;
                    let `#UserApp.37` : [C {}, C {}] = TagId(0) `#UserApp.32`;
                    ret `#UserApp.37`;
            in
            let `#UserApp.53` : Str = "3";
            let `#UserApp.45` : [C {}, C {}] = CallByName `pf.Effect.foo_line!` `#UserApp.53`;
            let `#UserApp.50` : U8 = 1i64;
            let `#UserApp.51` : U8 = GetTagId `#UserApp.45`;
            let `#UserApp.52` : Int1 = lowlevel Eq `#UserApp.50` `#UserApp.51`;
            if `#UserApp.52` then
                let `#UserApp.43` : {} = UnionAtIndex (Id 1) (Index 0) `#UserApp.45`;
                jump `#UserApp.46` `#UserApp.43`;
            else
                let `#UserApp.44` : {} = UnionAtIndex (Id 0) (Index 0) `#UserApp.45`;
                let `#UserApp.49` : [C {}, C {}] = TagId(0) `#UserApp.44`;
                ret `#UserApp.49`;
        in
        let `#UserApp.65` : Str = "2";
        let `#UserApp.57` : [C {}, C {}] = CallByName `pf.Effect.foo_line!` `#UserApp.65`;
        let `#UserApp.62` : U8 = 1i64;
        let `#UserApp.63` : U8 = GetTagId `#UserApp.57`;
        let `#UserApp.64` : Int1 = lowlevel Eq `#UserApp.62` `#UserApp.63`;
        if `#UserApp.64` then
            let `#UserApp.55` : {} = UnionAtIndex (Id 1) (Index 0) `#UserApp.57`;
            jump `#UserApp.58` `#UserApp.55`;
        else
            let `#UserApp.56` : {} = UnionAtIndex (Id 0) (Index 0) `#UserApp.57`;
            let `#UserApp.61` : [C {}, C {}] = TagId(0) `#UserApp.56`;
            ret `#UserApp.61`;
    in
    let `#UserApp.77` : Str = "1";
    let `#UserApp.69` : [C {}, C {}] = CallByName `pf.Effect.foo_line!` `#UserApp.77`;
    let `#UserApp.74` : U8 = 1i64;
    let `#UserApp.75` : U8 = GetTagId `#UserApp.69`;
    let `#UserApp.76` : Int1 = lowlevel Eq `#UserApp.74` `#UserApp.75`;
    if `#UserApp.76` then
        let `#UserApp.67` : {} = UnionAtIndex (Id 1) (Index 0) `#UserApp.69`;
        jump `#UserApp.70` `#UserApp.67`;
    else
        let `#UserApp.68` : {} = UnionAtIndex (Id 0) (Index 0) `#UserApp.69`;
        let `#UserApp.73` : [C {}, C {}] = TagId(0) `#UserApp.68`;
        ret `#UserApp.73`;
It's this loop specifically.
Alright, I was able to hack together a fix for my specific problematic file as proof of concept :)
I should be able to turn this proof of concept into a proper fix :fingers_crossed:
Congrats @Anton :sparkler:
what was the issue?
This loop does expensive fixed point iteration even when all variables that could potentially be reference counted are unused.
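(Roughly the shape of the skip, as a hedged Rust sketch; the names and types are invented and this is not the code from the actual fix:)
use std::collections::HashSet;

// Sketch: a join point's parameters only matter to refcounting when at least
// one of them is a refcounted value that is actually used. If none are, the
// expensive fixed-point iteration over the join body can be skipped, since
// repeated passes cannot change the (empty) result.
fn live_join_params(
    params: &[u32],
    refcounted: &HashSet<u32>,
    used: &HashSet<u32>,
    analyze_body: impl Fn(&HashSet<u32>) -> HashSet<u32>,
) -> HashSet<u32> {
    let worth_tracking = params
        .iter()
        .any(|p| refcounted.contains(p) && used.contains(p));
    if !worth_tracking {
        return HashSet::new(); // early out: no fixed point needed
    }

    // Otherwise iterate until the set of live parameters stabilizes.
    let mut live = HashSet::new();
    loop {
        let next = analyze_body(&live);
        if next == live {
            return live;
        }
        live = next;
    }
}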
I made a tool so that I can understand how the recursion iterates over the code :)
debug_tool.mp4
This is awesome. I want to know how you built it!
Here is the source code index.html
I used the regular Claude web interface to start; for js you can have a live preview on the side, so that is very useful. This was my initial back and forth with claude: https://claude.ai/share/7ff2aacf-b7a5-40ad-839c-46b0e70b1882
Claude servers were overloaded, so then I switched to the latest gemini pro 2.5 (free on https://aistudio.google.com). I don't think it has the same live js preview, so I had to do a bunch of manual copy, paste, and run. In between I did some good old-fashioned human brain debugging and that's it :)
From several past experiments, I've found vanilla js to improve things a lot. Models can have a difficult time fixing React errors, but they often default to React if you don't specify what to use.
I've had good experiences using Elm. Though I haven't built anything substantial with it, mostly just smaller demo things.
Good to know, I should give roc htmx a try as well :)
status update: I'm trying to achieve a more ambitious improvement than earlier; doing less work for unused variables is pretty easy. I'm now also trying to prevent work for vars after their last use. There is a way to do this with a lot of reasonably simple code changes, but I'm trying to see if I can make it work with far fewer code changes.
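(A common way to find "after their last use" is a single reverse pass over the statements; a small Rust sketch with invented types, just to illustrate the idea rather than the actual implementation:)
use std::collections::HashMap;

// Sketch: given, per statement, the symbols it uses, walk the statements in
// reverse and record the index of each symbol's last use. Statements after
// that index no longer need to track the symbol.
fn last_uses(stmt_uses: &[Vec<u32>]) -> HashMap<u32, usize> {
    let mut last_use: HashMap<u32, usize> = HashMap::new();
    for (idx, uses) in stmt_uses.iter().enumerate().rev() {
        for &symbol in uses {
            // Walking backwards, the first time we see a symbol is its last
            // use, so don't overwrite it with earlier occurrences.
            last_use.entry(symbol).or_insert(idx);
        }
    }
    last_use
}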
This nested joinpoint case is very common in current Roc and blows up 2^n so it's worth spending some time on in my opinion.
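(To make the 2^n concrete with a toy model: if each level of join nesting caused its nested body to be analyzed twice, say once per fixed-point pass — an assumption for illustration, not the measured constant — the innermost statement would be visited about 2^15 ≈ 33k times for the 15-call program versus 2^20 ≈ 1M times for the 20-call one:)
// Toy model of the blowup: visits to the innermost statement double with
// every extra level of nesting.
fn innermost_visits(nesting_depth: u32) -> u64 {
    2u64.pow(nesting_depth)
}

fn main() {
    for depth in [15u32, 20] {
        println!("{depth} nested joins -> {} visits", innermost_visits(depth));
    }
}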
I believe we still plan on doing Perceus refcounting in the new compiler @Richard Feldman?
for sure!
It is done :tada: :fireworks: :partying_face: PR#7884
fix_comparison_plot.png
wooooooooow, amazing @Anton!!!
great job @Anton
Wait....but I thought the goal of all graphs was to go up and to the right.....
:grinning_face_with_smiling_eyes: Congrats on fixing this @Anton
Brendan Hansknecht said:
Wait....but I thought the goal of all graphs was to go up and to the right.....
I should have plotted the speedup % :p