Stream: compiler development

Topic: Size on heap


view this post on Zulip Brendan Hansknecht (Dec 11 2023 at 05:18):

For the change to put the size of a list (number of elements) on the heap, the size is only put on the heap for lists that actually have refcounted elements. So if it is a string (list u8), there will be no size on the heap. As such, I was thinking that this would be modeled directly and piped around through the bitcode.

This is a base commit of just changing the zig bitcode to see what that would look like (obviously a lot more pipelining is needed in all of the backends). Does this approach seam reasonable? Is there a better way I should do this?

https://github.com/roc-lang/roc/commit/638cc7780c46b94239af90d431b1e84c806921d6

Note: the zig in this change is definitly not fully converted. I also need to go through and remove a bunch of recursive refcount changes that will no longer happen. Now, recursive increments should only happen on creating a new unique copy of a list and recursive decrements should only happen on deallocation. All other increments and decrements should only happen to the outer list non-recursively.

view this post on Zulip Brendan Hansknecht (Dec 11 2023 at 05:20):

@Folkert de Vries I assume you are most likely to have input here. Though the input may just be that I am gonna have a really fun time updating every single location that can call the bitcode paths.

On the rust side, this variable will be initialized as interner.contains_refcounted(element_layout)

view this post on Zulip Brendan Hansknecht (Dec 11 2023 at 05:20):

And for types like unions and strings that are allocated with these helpers but don't have an element type, it will just be set to false.

view this post on Zulip Brendan Hansknecht (Dec 11 2023 at 06:04):

One extra piece that I am realizing that will be kinda inconvenient to pipeline through is specifically for list incref. If a list has refcounted elements and is incrementing from a refcount of 1 to 2, we need to save it's size to the heap. This means that we either need to pass all of this information to all incref functions, or we need to make a custom incref function for list and distinguish this in all cases (so like dispatching correctly when we have a struct that has a union in it that has a list of refcounted data in that).

view this post on Zulip Brendan Hansknecht (Dec 11 2023 at 06:49):

Although may be best to overall reduce the footprint of all of these versions and just make a list specific version of incref and decref that is exposed to the backends. Then each backend has to code gen based on if something is a list or not and only for lists they would pass in whether or not an element is refcount. Probably will be easier to follow and a bit more performant than if these variables are pervasive to literally all recounting functions.

.... Just trying to brainstorm what will end up being the nicest way to do this.

view this post on Zulip Folkert de Vries (Dec 11 2023 at 08:16):

did I miss something? why do we want this?

view this post on Zulip Brendan Hansknecht (Dec 11 2023 at 15:28):

This is to fix the reverse list refcounting.

view this post on Zulip Brendan Hansknecht (Dec 11 2023 at 15:29):

When list recount is normally incremented or decremented it will just be the main list from now on

view this post on Zulip Brendan Hansknecht (Dec 11 2023 at 15:30):

Only on making a new unique copy of a list or fully deallocating a list with the recursive refcount need to happen.

view this post on Zulip Brendan Hansknecht (Dec 11 2023 at 15:31):

Since a seamless slice does not know the size of the original list, it is unable to free all of the elements if it is the last reference to the list.

view this post on Zulip Brendan Hansknecht (Dec 11 2023 at 15:32):

So we have to put the size on the heap to enable this feature.

view this post on Zulip Brendan Hansknecht (Dec 11 2023 at 15:33):

When chatting with Richard about this, he suggested that to avoid adding the size before all strings and lists even ones where it isn't really needed, we can decided to added it or not and do this extra logic based on if the element is refcounted or not. If the element is not refcounted, then it never has recursive refocounts anyway.

view this post on Zulip Brendan Hansknecht (Dec 11 2023 at 15:38):

Main goal is to fix perf bugs like this:

Brendan Hansknecht said:

Purple is refcounting. It is 98.6% of the program execution:
Screenshot-2023-12-09-at-10.30.06AM.png

view this post on Zulip Folkert de Vries (Dec 12 2023 at 16:21):

ah, sure

view this post on Zulip Folkert de Vries (Dec 12 2023 at 16:21):

that makes sense

view this post on Zulip Brendan Hansknecht (Dec 13 2023 at 21:24):

@Folkert de Vries, One other question on this. If I pull all the list refcounting into it's own function. So Lists always call roc_builtins.list.incref or something like that, do you see any issue with building the dispatch? I assume that all of the generation for our refcounting functions will have all the information needed to be able to at compile time pick which version to call (generic refcount vs list refcount), correct?

view this post on Zulip Folkert de Vries (Dec 13 2023 at 21:58):

we generate specific functions to inc/dec a list. you'll need to adjust those functions (and only those functions) I think?

view this post on Zulip Brendan Hansknecht (Dec 13 2023 at 21:59):

Ok

view this post on Zulip Brendan Hansknecht (Dec 19 2023 at 17:16):

I am starting to look at this a bit more and realizing that tons more list functions now need to take an element incref and element decref function. Cause anything that might free a list (even a potentially empty one) now needs an element decref function. And anything that might make a new allocation of a list now needs an element incref function.

So instead of having one main location for incref and decref, it is much more distributed. Just so many args to pass around then pipe through all calling functions. Not hard, just a lot of work and a good chance that I will mess something up and create some mem leaks or other issues.

view this post on Zulip Folkert de Vries (Dec 19 2023 at 17:17):

yes (that is why this never happened). in valgrind we trust I suppose

view this post on Zulip Brendan Hansknecht (Dec 19 2023 at 17:21):

yeah, if this wasn't so important for performance, I definitely wouldn't want to take on the complexity.

Like I guess we could try the borrowing idea first and see if this is still needed. With the borrowing idea, functions that just read would be saved from the horrid recursive refcounts. Functions that might modify hopefully would take a unique list anyway. So they would take ownership of the list and no refcount changes would be needed.

view this post on Zulip Brendan Hansknecht (Dec 19 2023 at 17:25):

Borrowing would just be analysis of the mono ir. Starting at the leaf node functions and working back to the root. For any function that a list could be modified or return, we label it as owning. For all other functions, borrowing. Then for borrowing functions, we skip all the refcount inc and dec. Not sure how hard that analysis would be in practice, but seems like it would mostly be a matter of walking the ir and tracking variables.

view this post on Zulip Brendan Hansknecht (Dec 19 2023 at 17:27):

I think the biggest concern for borrowing will be records. Cause we would probably check for borrowing of the record as a whole rather than for borrowing of each field in a record. Thus a function that changes one field in a record, but doesn't touch a refcounted list field would also need to increment the refcount of that list.

view this post on Zulip Brendan Hansknecht (Dec 19 2023 at 17:39):

Just debating the effort, gain, and complexity trade off. Especially considering this is a big pain point hit by many many people working on AOC currently, so would be nice to fix them sooner (even if not a complete fix but more a workaround).

view this post on Zulip Richard Feldman (Dec 19 2023 at 17:50):

yeah my overall thoughts on this are:

  1. In general I'm very much in favor of trying out more aggressive borrowing strategies. I know there's a concern about potentially blowing up the number of specializations (and therefore binary size and compile times), but that seems like one of those things where we can't really know how big (or little) of a problem it is without actually trying it. It seems very hard to predict how often it would come up in real-world programs. Like maybe in practice it's totally fine because—much like how let inference has theoretically abysmal worst-case performance, in practice it just turns out people don't write programs in the way that would trigger the pathological case.
  2. Regarding borrowing of record fields, I assume that could be treated as a separate project, right? Like it wouldn't have to block trying out a borrowing design that didn't treat records differently (at least at first).
  3. The upside potential on borrowing inference seems clearly higher in the sense that there would presumably be a lot of refcounts eliminated across programs even in other scenarios than this one.

view this post on Zulip Brendan Hansknecht (Dec 19 2023 at 17:51):

  1. Regarding borrowing of record fields, I assume that could be treated as a separate project, right? Like it wouldn't have to block trying out a borrowing design that didn't treat records differently (at least at first).

100%

view this post on Zulip Brendan Hansknecht (Dec 19 2023 at 17:52):

Do you think it is likely that borrow inference could remove the need for these other changes that deal with recursive refcounts? Or do you think we would likely need both?

view this post on Zulip Richard Feldman (Dec 19 2023 at 17:55):

personally I have no idea; others would have a better idea about that than I would :big_smile:

view this post on Zulip Brendan Hansknecht (Dec 19 2023 at 17:55):

fair enough. I think it would fix all of the cases getting hit by the AOC currently.

view this post on Zulip Brendan Hansknecht (Dec 19 2023 at 22:49):

So. I decided to do a manual MVC of borrowing. The borrowing is just hardcoded for an example AOC app that recently hit big perf issues the other day.

Changes here: https://github.com/roc-lang/roc/compare/main...borrowing-hacks

At least for this arbitrary app, this 100% fixes the perf issue. Comparing the borrowing hack branch to a manual workaround of using U8s instead of strings:

Benchmark 1: /tmp/aoc-borrowing
  Time (mean ± σ):     259.1 ms ±   1.8 ms    [User: 255.9 ms, System: 1.3 ms]
  Range (min … max):   256.8 ms … 263.0 ms    11 runs

Benchmark 2: /tmp/aoc-workaround
  Time (mean ± σ):     253.7 ms ±   4.5 ms    [User: 249.8 ms, System: 1.5 ms]
  Range (min … max):   247.7 ms … 262.5 ms    11 runs

Summary
  '/tmp/aoc-workaround' ran
    1.02 ± 0.02 times faster than '/tmp/aoc-borrowing'

Without the fix, this app takes ~70 seconds on my M1 mac. So borrowing is definitely very promising. Still not sure how often borrowing will fail to help. I guess the cases where we will still need size on heap will be cases where the same value is passed to multiple different functions that each might modify it. So none of those function can borrow it and we need to increment and decrements refcounts around each function. Of course, if the function modifies, the refcounting was needed anyway and is not cost. So it would only matter if the functions are unlikely to modify the value. So long term we may still want both, but maybe there is another better optimization for situations like those.

view this post on Zulip Richard Feldman (Dec 19 2023 at 22:52):

wow, 70 seconds to 250ms is...quite a difference :laughing:

view this post on Zulip Richard Feldman (Dec 19 2023 at 22:53):

how does the borrowing inference algorithm work here?

view this post on Zulip Brendan Hansknecht (Dec 19 2023 at 22:54):

incrementing and decrementing 12100 refcounts in a hot loop is slow....who would have guessed.

view this post on Zulip Brendan Hansknecht (Dec 19 2023 at 23:01):

In this case there is no inference algorithm, I just manually labeled some functions and joinpoints.

view this post on Zulip Brendan Hansknecht (Dec 19 2023 at 23:02):

Then updated the refcount insertion algorithm to obey those new annotations.

view this post on Zulip Brendan Hansknecht (Dec 19 2023 at 23:04):

The manual labeling is at the bottom of the code changes.

view this post on Zulip Richard Feldman (Dec 19 2023 at 23:18):

gotcha... @Folkert de Vries @Ayaz Hafiz I know you've both talked about thoughts around borrow inference in the past - what do you think of the idea of trying it out and seeing if there's an explosion of specializations in practice vs not? (Maybe there's some way we could answer that question without implementing the whole thing?)

view this post on Zulip Brendan Hansknecht (Dec 19 2023 at 23:19):

Why would it explode specialization?

view this post on Zulip Brendan Hansknecht (Dec 19 2023 at 23:20):

In my mind, each function has exactly one borrowing signature based on its implementation.

view this post on Zulip Ayaz Hafiz (Dec 20 2023 at 02:20):

Don't we already have a borrow inference algorithm?

view this post on Zulip Ayaz Hafiz (Dec 20 2023 at 02:21):

At least Morphic definitely specializes based on borrows

view this post on Zulip Ayaz Hafiz (Dec 20 2023 at 02:21):

The specialization is because you can you specialize a function to take its arguments as owned, and one to take arguments as borrowed (so potentially 2^N in the number of arguments)

view this post on Zulip Brendan Hansknecht (Dec 20 2023 at 02:48):

Sure, but if you can take as borrowed, why would you ever take as owned?

view this post on Zulip Ayaz Hafiz (Dec 20 2023 at 03:02):

Because it could avoid a ref count if the borrowed version needs a ref
count in some branch

view this post on Zulip Brendan Hansknecht (Dec 20 2023 at 03:41):

Hmm....I guess I need to see an example. Cause in my mind if a function borrows, all functions it calls must also borrow, so it just fully avoids refocounts.

With branches we have:

# this is called from a function that owns a.
# it has to own a cause it calls another owning function
if ... then
   owningFn a
else
    # references a to generate some new value
    borrowingFn a

The first one has to be owning cause it mutates. The second one we have the choice, of borrowing or owning. That said, it is always fine to borrow (sometime minor delay in freeing memory) and depending on the exact use borrowing will be better.

The second branch is one of these two:

# borrow
else
    borrowingFn a
    dec a

# own
else
    # will Dec at some point inside of the function.
    borrowingForceOwningFn a

Those two are essentially the same, both have one decref, but owning frees memory potentially earlier. That said, it is the only case where borrowing is a disadvantage. If the borrowing function returns a value and then we use a, it will be strictly better for the function to borrow. Would look something like this:

# borrow
else
    someVal = borrowingFn a
    # use some val and a

# own
else
    inc a
    # will Dec at some point inside of the function.
    someVal = borrowingForceOwningFn a
    # use some val and a

Am I missing something. Besides the chance of a delayed free, I see no disadvantage of always borrowing when possible.

view this post on Zulip Folkert de Vries (Dec 20 2023 at 10:09):

yeah I think with the simple heuristic (any mutation opportunity -> owned, otherwise borrowed) you don't get excess specializations. There is just the risk of keeping allocations around for much longer than needed if something gets borrowed but is not actually used in practice

view this post on Zulip Folkert de Vries (Dec 20 2023 at 10:11):

maybe there should be some mechanism to force ownership regardless, a fake mutation

view this post on Zulip Folkert de Vries (Dec 20 2023 at 10:12):

also, the "can it be mutated" value is not that easy to determine precisely for tag unions: we determine reuse based on drops, but drops are of course inserted differently for owned/borrowed values

view this post on Zulip Folkert de Vries (Dec 20 2023 at 10:13):

you can statically know that there are no reuse opportunities at all in a function, but once there are, you must pass the value as owned even if in practice the value is never updated because the structure of the function does not allow it

view this post on Zulip Richard Feldman (Dec 20 2023 at 16:34):

Folkert de Vries said:

yeah I think with the simple heuristic (any mutation opportunity -> owned, otherwise borrowed) you don't get excess specializations. There is just the risk of keeping allocations around for much longer than needed if something gets borrowed but is not actually used in practice

I could be wrong, but I suspect this would be fine in Roc's case because things are always given to the host as owned, so if they did go longer than necessary without being deallocated, it would only be until the next io operation

view this post on Zulip Brendan Hansknecht (Dec 20 2023 at 17:02):

I think the heuristic would be in practices:

potentially passed to an owning function or returned from the function -> owned
otherwise -> borrow

Also, borrowing is the default in rust for cases like this, and it tends not to be a memory probably. So I bet it would be fine in Roc.

view this post on Zulip Brendan Hansknecht (Dec 20 2023 at 17:19):

I know this is jumping back some, but I decided to attempt to update the entire bitcode for size on heap (just to see what it would look like in practice). I'm sure it has bugs, but I think this is the full bitcode updated (wish we had more zig tests cause they can all catch memory issues). Well, also need to remove our eager element decrementing in various functions.
https://github.com/roc-lang/roc/compare/main...list-size-on-heap

Changes the api for essentially every list function and the utils refcounting functions. So that would need to be wired through to every calling location in the compiler. Would be a matter of generating and passing more element increment and decrement functions to these various locations. Would also require ensuring all list refcounting goes through the new zig exposed list refcounting functions.

So I think it would be a slog with a lot of bugs to work through once wired up, but definitely doable. Should be a lot of the same at least with a similar pattern of changes for all of the functions.

I still think the borrowing changes will be a fundamental need, but as I was manually testing borrowing and thinking about edge cases, I get more and more the feeling that both of these changes will be needed.

view this post on Zulip Richard Feldman (Dec 20 2023 at 17:44):

I buy that! Does anyone think we shouldn't at least try both and see what we run into in practice? (Especially @Folkert de Vries or @Ayaz Hafiz :big_smile:)

view this post on Zulip Folkert de Vries (Dec 20 2023 at 17:45):

yes that sounds like a good idea

view this post on Zulip Anton (Dec 22 2023 at 15:03):

(wish we had more zig tests cause they can all catch memory issues)

@Brendan Hansknecht I just remembered you saying this :p could you make an issue for it? This definitely sounds like something we should do.

view this post on Zulip Brendan Hansknecht (Dec 22 2023 at 15:30):

#6301


Last updated: Jul 06 2025 at 12:14 UTC