For the change that puts the size of a list (its element count) on the heap: the size is only stored on the heap for lists that actually have refcounted elements. So if it is a string (List U8), there will be no size on the heap. As such, I was thinking that this would be modeled directly and piped around through the bitcode.
This is a base commit of just changing the zig bitcode to see what that would look like (obviously a lot more pipelining is needed in all of the backends). Does this approach seem reasonable? Is there a better way I should do this?
https://github.com/roc-lang/roc/commit/638cc7780c46b94239af90d431b1e84c806921d6
Note: the zig in this change is definitely not fully converted. I also need to go through and remove a bunch of recursive refcount changes that will no longer happen. Now, recursive increments should only happen when creating a new unique copy of a list, and recursive decrements should only happen on deallocation. All other increments and decrements should only apply to the outer list, non-recursively.
@Folkert de Vries I assume you are most likely to have input here. Though the input may just be that I am gonna have a really fun time updating every single location that can call the bitcode paths.
On the rust side, this variable will be initialized as interner.contains_refcounted(element_layout)
And for types like unions and strings that are allocated with these helpers but don't have an element type, it will just be set to false.
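A hedged sketch of what that Rust-side initialization might look like. `contains_refcounted` is the method name mentioned above, but the `Layout` enum and everything else here is simplified for illustration and is not the actual compiler code:

```rust
// Hypothetical sketch of computing the `elements_refcounted` flag that
// would be passed into the zig bitcode calls. Only the method name
// `contains_refcounted` comes from the discussion; the types are mine.

#[derive(Clone, Copy)]
enum Layout {
    U8,
    Str,
    List(&'static Layout),
    Union,
}

struct LayoutInterner;

impl LayoutInterner {
    // Stand-in: does this layout contain any refcounted values
    // (strings, lists, boxed values, ...)?
    fn contains_refcounted(&self, layout: &Layout) -> bool {
        match layout {
            Layout::U8 => false,
            Layout::Str | Layout::List(_) | Layout::Union => true,
        }
    }
}

fn elements_refcounted(interner: &LayoutInterner, layout: &Layout) -> bool {
    match layout {
        // Lists: ask whether the element layout is refcounted.
        Layout::List(elem) => interner.contains_refcounted(elem),
        // Strings, unions, etc. are allocated with the same helpers
        // but have no element type, so the flag is just false.
        _ => false,
    }
}

fn main() {
    let interner = LayoutInterner;
    assert!(!elements_refcounted(&interner, &Layout::List(&Layout::U8)));
    assert!(elements_refcounted(&interner, &Layout::List(&Layout::Str)));
    assert!(!elements_refcounted(&interner, &Layout::Str));
}
```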
One extra piece I am realizing will be kind of inconvenient to pipe through is specifically list incref. If a list has refcounted elements and is incrementing from a refcount of 1 to 2, we need to save its size to the heap. This means that we either need to pass all of this information to all incref functions, or we need to make a custom incref function for lists and dispatch to it correctly in all cases (like when we have a struct that has a union in it that has a list of refcounted data in it).
Although it may be best to reduce the overall footprint of all of these versions and just make a list-specific version of incref and decref that is exposed to the backends. Then each backend has to codegen based on whether something is a list, and only for lists would they pass in whether or not an element is refcounted. Probably will be easier to follow and a bit more performant than making these variables pervasive to literally all refcounting functions.
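A minimal sketch of what that list-specific incref might do (all names and the in-memory representation here are hypothetical, not the actual bitcode API):

```rust
// Illustrative sketch of the proposed list-specific incref: when a list
// with refcounted elements goes from unique (refcount 1) to shared, the
// element count is written to the heap so a later "last reference"
// decref can still recursively free the elements. Names are mine.

struct ListAlloc {
    refcount: usize,
    // Only meaningful when the elements are refcounted.
    stored_len: Option<usize>,
}

fn list_incref(alloc: &mut ListAlloc, len: usize, elements_refcounted: bool) {
    if elements_refcounted && alloc.refcount == 1 {
        // First time this list becomes shared: save its length so the
        // eventual deallocating decref knows how many elements to visit.
        alloc.stored_len = Some(len);
    }
    alloc.refcount += 1;
}

fn main() {
    let mut a = ListAlloc { refcount: 1, stored_len: None };
    list_incref(&mut a, 5, true);
    assert_eq!(a.refcount, 2);
    assert_eq!(a.stored_len, Some(5));
    // Further increfs leave the stored length alone.
    list_incref(&mut a, 5, true);
    assert_eq!(a.refcount, 3);
    assert_eq!(a.stored_len, Some(5));
}
```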
.... Just trying to brainstorm what will end up being the nicest way to do this.
did I miss something? why do we want this?
This is to fix the reverse list refcounting.
When a list refcount is normally incremented or decremented, it will just be the outer list from now on.
Recursive refcounting only needs to happen when making a new unique copy of a list or when fully deallocating one.
Since a seamless slice does not know the size of the original list, it is unable to free all of the elements if it is the last reference to the list.
So we have to put the size on the heap to enable this feature.
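A rough sketch of the heap layout this implies (purely illustrative; the word sizes and ordering are my assumptions, not the actual bitcode layout):

```rust
// Illustrative layout sketch, NOT the actual Roc allocator code.
// For lists with refcounted elements, the allocation also stores the
// element count next to the refcount, so a seamless slice that turns
// out to be the last reference can still decref every element:
//
//   [ count ][ refcount ][ elem 0 ][ elem 1 ] ...
//
// For lists of non-refcounted elements (e.g. a string's bytes), the
// count slot is omitted: no recursive decrement is ever needed.

fn allocation_size(elem_size: usize, len: usize, elems_refcounted: bool) -> usize {
    // count word (only if elements are refcounted) + refcount word
    let header_words = if elems_refcounted { 2 } else { 1 };
    header_words * std::mem::size_of::<usize>() + elem_size * len
}

fn main() {
    // 8 bytes of string payload: refcount word only (16 bytes on 64-bit).
    assert_eq!(allocation_size(1, 8, false), 16);
    // 8 word-sized refcounted elements: count + refcount + payload.
    assert_eq!(allocation_size(8, 8, true), 80);
}
```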
When chatting with Richard about this, he suggested that, to avoid adding the size before all strings and lists (even ones where it isn't really needed), we can decide whether to add it based on whether the element is refcounted. If the element is not refcounted, then it never has recursive refcounts anyway.
Main goal is to fix perf bugs like this:
Brendan Hansknecht said:
Purple is refcounting. It is 98.6% of the program execution:
Screenshot-2023-12-09-at-10.30.06AM.png
ah, sure
that makes sense
@Folkert de Vries, one other question on this. If I pull all the list refcounting into its own function, so lists always call roc_builtins.list.incref
or something like that, do you see any issue with building the dispatch? I assume that all of the generation for our refcounting functions will have all the information needed to be able to at compile time pick which version to call (generic refcount vs list refcount), correct?
we generate specific functions to inc/dec a list. you'll need to adjust those functions (and only those functions) I think?
Ok
I am starting to look at this a bit more and realizing that tons more list functions now need to take an element incref and an element decref function. Because anything that might free a list (even a potentially empty one) now needs an element decref function, and anything that might make a new allocation of a list now needs an element incref function.
So instead of having one main location for incref and decref, it is much more distributed. There are just so many args to pass around and pipe through all the calling functions. Not hard, just a lot of work and a good chance that I will mess something up and create some memory leaks or other issues.
yes (that is why this never happened). in valgrind we trust I suppose
yeah, if this wasn't so important for performance, I definitely wouldn't want to take on the complexity.
Like I guess we could try the borrowing idea first and see if this is still needed. With the borrowing idea, functions that just read would be saved from the horrid recursive refcounts. Functions that might modify hopefully would take a unique list anyway. So they would take ownership of the list and no refcount changes would be needed.
Borrowing would just be an analysis of the mono IR, starting at the leaf functions and working back to the root. Any function in which a list could be modified or returned gets labeled as owning; all other functions are borrowing. Then for borrowing functions, we skip all the refcount incs and decs. Not sure how hard that analysis would be in practice, but it seems like it would mostly be a matter of walking the IR and tracking variables.
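That bottom-up pass could look something like this minimal sketch (every name here is hypothetical; a real pass would walk the mono IR rather than a toy call graph):

```rust
use std::collections::HashMap;

// Toy version of the analysis described above: process functions from
// the leaves toward the roots. A function is Owning if it might modify
// or return its list argument, or if it forwards the argument to a
// function already marked Owning; otherwise it is Borrowing.

#[derive(Clone, Copy, PartialEq, Debug)]
enum Borrow {
    Borrowing,
    Owning,
}

struct FnInfo {
    modifies_or_returns_arg: bool,
    // Callees the argument is forwarded to (analyzed earlier: leaves first).
    callees: Vec<&'static str>,
}

fn analyze(
    order: &[&'static str],
    fns: &HashMap<&'static str, FnInfo>,
) -> HashMap<&'static str, Borrow> {
    let mut result = HashMap::new();
    for name in order {
        let f = &fns[name];
        let owning = f.modifies_or_returns_arg
            || f.callees.iter().any(|c| result.get(c) == Some(&Borrow::Owning));
        result.insert(*name, if owning { Borrow::Owning } else { Borrow::Borrowing });
    }
    result
}

fn main() {
    let mut fns = HashMap::new();
    fns.insert("list_len", FnInfo { modifies_or_returns_arg: false, callees: vec![] });
    fns.insert("list_push", FnInfo { modifies_or_returns_arg: true, callees: vec![] });
    fns.insert("count_items", FnInfo { modifies_or_returns_arg: false, callees: vec!["list_len"] });
    fns.insert("add_item", FnInfo { modifies_or_returns_arg: false, callees: vec!["list_push"] });

    let order = ["list_len", "list_push", "count_items", "add_item"];
    let result = analyze(&order, &fns);
    // Read-only call chains borrow and skip refcounting entirely.
    assert_eq!(result["count_items"], Borrow::Borrowing);
    // Anything that reaches a mutating function must own.
    assert_eq!(result["add_item"], Borrow::Owning);
}
```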
I think the biggest concern for borrowing will be records, because we would probably check for borrowing of the record as a whole rather than of each field. Thus a function that changes one field in a record, but doesn't touch a refcounted list field, would still need to increment the refcount of that list.
Just debating the effort, gain, and complexity trade-offs. Especially considering this is a big pain point hit by many, many people working on AOC currently, so it would be nice to fix things sooner (even if it's not a complete fix but more of a workaround).
yeah my overall thoughts on this are:
- let inference has theoretically abysmal worst-case performance; in practice, it just turns out people don't write programs in the way that would trigger the pathological case.
- Regarding borrowing of record fields, I assume that could be treated as a separate project, right? Like it wouldn't have to block trying out a borrowing design that didn't treat records differently (at least at first).
100%
Do you think it is likely that borrow inference could remove the need for these other changes that deal with recursive refcounts? Or do you think we would likely need both?
personally I have no idea; others would have a better idea about that than I would :big_smile:
fair enough. I think it would fix all of the cases getting hit by the AOC currently.
So. I decided to do a manual MVP of borrowing. The borrowing is just hardcoded for an example AOC app that hit big perf issues the other day.
Changes here: https://github.com/roc-lang/roc/compare/main...borrowing-hacks
At least for this arbitrary app, this 100% fixes the perf issue. Comparing the borrowing hack branch to a manual workaround of using U8s instead of strings:
Benchmark 1: /tmp/aoc-borrowing
Time (mean ± σ): 259.1 ms ± 1.8 ms [User: 255.9 ms, System: 1.3 ms]
Range (min … max): 256.8 ms … 263.0 ms 11 runs
Benchmark 2: /tmp/aoc-workaround
Time (mean ± σ): 253.7 ms ± 4.5 ms [User: 249.8 ms, System: 1.5 ms]
Range (min … max): 247.7 ms … 262.5 ms 11 runs
Summary
'/tmp/aoc-workaround' ran
1.02 ± 0.02 times faster than '/tmp/aoc-borrowing'
Without the fix, this app takes ~70 seconds on my M1 mac. So borrowing is definitely very promising. Still not sure how often borrowing will fail to help. I guess the cases where we will still need size on heap will be cases where the same value is passed to multiple different functions that each might modify it. So none of those functions can borrow it, and we need to increment and decrement refcounts around each call. Of course, if a function does modify the value, the refcounting was needed anyway and is no extra cost. So it would only matter if the functions are unlikely to modify the value. So long term we may still want both, but maybe there is another better optimization for situations like those.
wow, 70 seconds to 250ms is...quite a difference :laughing:
how does the borrowing inference algorithm work here?
incrementing and decrementing 12100 refcounts in a hot loop is slow... who would have guessed.
In this case there is no inference algorithm, I just manually labeled some functions and joinpoints.
Then updated the refcount insertion algorithm to obey those new annotations.
The manual labeling is at the bottom of the code changes.
gotcha... @Folkert de Vries @Ayaz Hafiz I know you've both talked about thoughts around borrow inference in the past - what do you think of the idea of trying it out and seeing if there's an explosion of specializations in practice vs not? (Maybe there's some way we could answer that question without implementing the whole thing?)
Why would it explode specialization?
In my mind, each function has exactly one borrowing signature based on its implementation.
Don't we already have a borrow inference algorithm?
At least Morphic definitely specializes based on borrows
The specialization is because you can specialize a function to take its arguments as owned, and one to take them as borrowed (so potentially 2^N in the number of arguments)
Sure, but if you can take as borrowed, why would you ever take as owned?
Because it could avoid a refcount if the borrowed version needs a refcount in some branch
Hmm... I guess I need to see an example. Because in my mind, if a function borrows, all functions it calls must also borrow, so it just fully avoids refcounts.
With branches we have:
# this is called from a function that owns a.
# it has to own a cause it calls another owning function
if ... then
owningFn a
else
# references a to generate some new value
borrowingFn a
The first one has to be owning because it mutates. For the second one, we have a choice between borrowing and owning. That said, it is always fine to borrow (sometimes a minor delay in freeing memory), and depending on the exact use, borrowing will be better.
The second branch is one of these two:
# borrow
else
borrowingFn a
dec a
# own
else
# will Dec at some point inside of the function.
borrowingForceOwningFn a
Those two are essentially the same, both have one decref, but owning frees memory potentially earlier. That said, it is the only case where borrowing is a disadvantage. If the borrowing function returns a value and then we use a, it will be strictly better for the function to borrow. Would look something like this:
# borrow
else
someVal = borrowingFn a
# use some val and a
# own
else
inc a
# will Dec at some point inside of the function.
someVal = borrowingForceOwningFn a
# use some val and a
Am I missing something? Besides the chance of a delayed free, I see no disadvantage to always borrowing when possible.
yeah I think with the simple heuristic (any mutation opportunity -> owned, otherwise borrowed) you don't get excess specializations. There is just the risk of keeping allocations around for much longer than needed if something gets borrowed but is not actually used in practice
maybe there should be some mechanism to force ownership regardless, a fake mutation
also, the "can it be mutated" value is not that easy to determine precisely for tag unions: we determine reuse based on drops, but drops are of course inserted differently for owned/borrowed values
you can statically know that there are no reuse opportunities at all in a function, but once there are, you must pass the value as owned even if in practice the value is never updated because the structure of the function does not allow it
Folkert de Vries said:
yeah I think with the simple heuristic (any mutation opportunity -> owned, otherwise borrowed) you don't get excess specializations. There is just the risk of keeping allocations around for much longer than needed if something gets borrowed but is not actually used in practice
I could be wrong, but I suspect this would be fine in Roc's case because things are always given to the host as owned, so if they did go longer than necessary without being deallocated, it would only be until the next io operation
I think the heuristic would be, in practice:
potentially passed to an owning function or returned from the function -> owned
otherwise -> borrow
Also, borrowing is the default in Rust for cases like this, and it tends not to be a memory problem. So I bet it would be fine in Roc.
I know this is jumping back some, but I decided to attempt to update the entire bitcode for size on heap (just to see what it would look like in practice). I'm sure it has bugs, but I think this is the full bitcode updated (wish we had more zig tests cause they can all catch memory issues). I also still need to remove our eager element decrementing in various functions.
https://github.com/roc-lang/roc/compare/main...list-size-on-heap
Changes the api for essentially every list function and the utils refcounting functions. So that would need to be wired through to every calling location in the compiler. Would be a matter of generating and passing more element increment and decrement functions to these various locations. Would also require ensuring all list refcounting goes through the new zig exposed list refcounting functions.
So I think it would be a slog with a lot of bugs to work through once wired up, but definitely doable. Should be a lot of the same at least with a similar pattern of changes for all of the functions.
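As a sketch of the shared pattern those functions would follow (the names and signatures here are mine, not the actual utils API): the deallocating decref is now the only place recursive element decrements happen.

```rust
// Illustrative sketch: a list decref that takes an element-decref
// callback, matching the described API change where list functions
// receive element incref/decref functions. Recursive decrements only
// fire on the final decref that deallocates the list.

fn list_decref<F: FnMut()>(
    refcount: &mut usize,
    len: usize,
    elements_refcounted: bool,
    mut dec_elem: F,
) -> bool {
    *refcount -= 1;
    if *refcount == 0 {
        if elements_refcounted {
            // Last reference: decrement each element before freeing.
            for _ in 0..len {
                dec_elem();
            }
        }
        true // caller deallocates the list
    } else {
        false // list is still shared; elements untouched
    }
}

fn main() {
    let mut rc = 2;
    let mut elem_decs = 0;
    // First decref: list is still shared, so no element decrements.
    assert!(!list_decref(&mut rc, 3, true, || elem_decs += 1));
    assert_eq!(elem_decs, 0);
    // Second decref: last reference, so each element gets decremented.
    assert!(list_decref(&mut rc, 3, true, || elem_decs += 1));
    assert_eq!(elem_decs, 3);
}
```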
I still think the borrowing changes will be a fundamental need, but as I was manually testing borrowing and thinking about edge cases, I get more and more the feeling that both of these changes will be needed.
I buy that! Does anyone think we shouldn't at least try both and see what we run into in practice? (Especially @Folkert de Vries or @Ayaz Hafiz :big_smile:)
yes that sounds like a good idea
(wish we had more zig tests cause they can all catch memory issues)
@Brendan Hansknecht I just remembered you saying this :p could you make an issue for it? This definitely sounds like something we should do.
Last updated: Jul 06 2025 at 12:14 UTC