For the change that puts the size of a list (its element count) on the heap: the size is only stored on the heap for lists that actually have refcounted elements. So if it is a string (List U8), there will be no size on the heap. As such, I was thinking that this would be modeled directly and piped around through the bitcode.
This is a base commit of just changing the zig bitcode to see what that would look like (obviously a lot more pipelining is needed in all of the backends). Does this approach seem reasonable? Is there a better way I should do this?
https://github.com/roc-lang/roc/commit/638cc7780c46b94239af90d431b1e84c806921d6
Note: the zig in this change is definitely not fully converted. I also need to go through and remove a bunch of recursive refcount changes that will no longer happen. Now, recursive increments should only happen when creating a new unique copy of a list, and recursive decrements should only happen on deallocation. All other increments and decrements should only apply to the outer list, non-recursively.
@Folkert de Vries I assume you are most likely to have input here. Though the input may just be that I am gonna have a really fun time updating every single location that can call the bitcode paths.
On the rust side, this variable will be initialized as interner.contains_refcounted(element_layout)
And for types like unions and strings that are allocated with these helpers but don't have an element type, it will just be set to false.
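A hedged sketch of what that Rust-side initialization might look like. `contains_refcounted` is the method name mentioned above, but the `Layout` enum and everything else here is simplified for illustration and is not the actual compiler code:

```rust
// Hypothetical sketch of computing the `elements_refcounted` flag that
// would be passed into the zig bitcode calls. Only the method name
// `contains_refcounted` comes from the discussion; the types are mine.

#[derive(Clone, Copy)]
enum Layout {
    U8,
    Str,
    List(&'static Layout),
    Union,
}

struct LayoutInterner;

impl LayoutInterner {
    // Stand-in: does this layout contain any refcounted values
    // (strings, lists, boxed values, ...)?
    fn contains_refcounted(&self, layout: &Layout) -> bool {
        match layout {
            Layout::U8 => false,
            Layout::Str | Layout::List(_) | Layout::Union => true,
        }
    }
}

fn elements_refcounted(interner: &LayoutInterner, layout: &Layout) -> bool {
    match layout {
        // Lists: ask whether the element layout is refcounted.
        Layout::List(elem) => interner.contains_refcounted(elem),
        // Strings, unions, etc. are allocated with the same helpers
        // but have no element type, so the flag is just false.
        _ => false,
    }
}

fn main() {
    let interner = LayoutInterner;
    assert!(!elements_refcounted(&interner, &Layout::List(&Layout::U8)));
    assert!(elements_refcounted(&interner, &Layout::List(&Layout::Str)));
    assert!(!elements_refcounted(&interner, &Layout::Str));
}
```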
One extra piece I am realizing will be kind of inconvenient to pipe through is specifically list incref. If a list has refcounted elements and is incrementing from a refcount of 1 to 2, we need to save its size to the heap. This means that we either need to pass all of this information to all incref functions, or we need to make a custom incref function for lists and dispatch to it correctly in all cases (like when we have a struct that has a union in it that has a list of refcounted data in it).
Although it may be best to reduce the overall footprint of all of these versions and just make a list-specific version of incref and decref that is exposed to the backends. Then each backend has to codegen based on whether something is a list, and only for lists would they pass in whether or not an element is refcounted. Probably will be easier to follow and a bit more performant than making these variables pervasive to literally all refcounting functions.
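A minimal sketch of what that list-specific incref might do (all names and the in-memory representation here are hypothetical, not the actual bitcode API):

```rust
// Illustrative sketch of the proposed list-specific incref: when a list
// with refcounted elements goes from unique (refcount 1) to shared, the
// element count is written to the heap so a later "last reference"
// decref can still recursively free the elements. Names are mine.

struct ListAlloc {
    refcount: usize,
    // Only meaningful when the elements are refcounted.
    stored_len: Option<usize>,
}

fn list_incref(alloc: &mut ListAlloc, len: usize, elements_refcounted: bool) {
    if elements_refcounted && alloc.refcount == 1 {
        // First time this list becomes shared: save its length so the
        // eventual deallocating decref knows how many elements to visit.
        alloc.stored_len = Some(len);
    }
    alloc.refcount += 1;
}

fn main() {
    let mut a = ListAlloc { refcount: 1, stored_len: None };
    list_incref(&mut a, 5, true);
    assert_eq!(a.refcount, 2);
    assert_eq!(a.stored_len, Some(5));
    // Further increfs leave the stored length alone.
    list_incref(&mut a, 5, true);
    assert_eq!(a.refcount, 3);
    assert_eq!(a.stored_len, Some(5));
}
```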
.... Just trying to brainstorm what will end up being the nicest way to do this.
did I miss something? why do we want this?
This is to fix the reverse list refcounting.
When a list refcount is normally incremented or decremented, it will just be the outer list from now on.
Recursive refcounting only needs to happen when making a new unique copy of a list or when fully deallocating one.
Since a seamless slice does not know the size of the original list, it is unable to free all of the elements if it is the last reference to the list.
So we have to put the size on the heap to enable this feature.
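A rough sketch of the heap layout this implies (purely illustrative; the word sizes and ordering are my assumptions, not the actual bitcode layout):

```rust
// Illustrative layout sketch, NOT the actual Roc allocator code.
// For lists with refcounted elements, the allocation also stores the
// element count next to the refcount, so a seamless slice that turns
// out to be the last reference can still decref every element:
//
//   [ count ][ refcount ][ elem 0 ][ elem 1 ] ...
//
// For lists of non-refcounted elements (e.g. a string's bytes), the
// count slot is omitted: no recursive decrement is ever needed.

fn allocation_size(elem_size: usize, len: usize, elems_refcounted: bool) -> usize {
    // count word (only if elements are refcounted) + refcount word
    let header_words = if elems_refcounted { 2 } else { 1 };
    header_words * std::mem::size_of::<usize>() + elem_size * len
}

fn main() {
    // 8 bytes of string payload: refcount word only (16 bytes on 64-bit).
    assert_eq!(allocation_size(1, 8, false), 16);
    // 8 word-sized refcounted elements: count + refcount + payload.
    assert_eq!(allocation_size(8, 8, true), 80);
}
```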
When chatting with Richard about this, he suggested that, to avoid adding the size before all strings and lists (even ones where it isn't really needed), we can decide whether to add it based on whether the element is refcounted. If the element is not refcounted, then it never has recursive refcounts anyway.
Main goal is to fix perf bugs like this:
Brendan Hansknecht said:
Purple is refcounting. It is 98.6% of the program execution:
Screenshot-2023-12-09-at-10.30.06AM.png
ah, sure
that makes sense
@Folkert de Vries, one other question on this. If I pull all the list refcounting into its own function, so lists always call roc_builtins.list.incref
or something like that, do you see any issue with building the dispatch? I assume that all of the generation for our refcounting functions will have all the information needed to be able to at compile time pick which version to call (generic refcount vs list refcount), correct?
we generate specific functions to inc/dec a list. you'll need to adjust those functions (and only those functions) I think?
Ok
I am starting to look at this a bit more and realizing that tons more list functions now need to take an element incref and an element decref function. Because anything that might free a list (even a potentially empty one) now needs an element decref function, and anything that might make a new allocation of a list now needs an element incref function.
So instead of having one main location for incref and decref, it is much more distributed. There are just so many args to pass around and pipe through all the calling functions. Not hard, just a lot of work and a good chance that I will mess something up and create some memory leaks or other issues.
yes (that is why this never happened). in valgrind we trust I suppose
yeah, if this wasn't so important for performance, I definitely wouldn't want to take on the complexity.
Like I guess we could try the borrowing idea first and see if this is still needed. With the borrowing idea, functions that just read would be saved from the horrid recursive refcounts. Functions that might modify hopefully would take a unique list anyway. So they would take ownership of the list and no refcount changes would be needed.
Borrowing would just be an analysis of the mono IR, starting at the leaf functions and working back to the root. Any function in which a list could be modified or returned gets labeled as owning; all other functions are borrowing. Then for borrowing functions, we skip all the refcount incs and decs. Not sure how hard that analysis would be in practice, but it seems like it would mostly be a matter of walking the IR and tracking variables.
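That bottom-up pass could look something like this minimal sketch (every name here is hypothetical; a real pass would walk the mono IR rather than a toy call graph):

```rust
use std::collections::HashMap;

// Toy version of the analysis described above: process functions from
// the leaves toward the roots. A function is Owning if it might modify
// or return its list argument, or if it forwards the argument to a
// function already marked Owning; otherwise it is Borrowing.

#[derive(Clone, Copy, PartialEq, Debug)]
enum Borrow {
    Borrowing,
    Owning,
}

struct FnInfo {
    modifies_or_returns_arg: bool,
    // Callees the argument is forwarded to (analyzed earlier: leaves first).
    callees: Vec<&'static str>,
}

fn analyze(
    order: &[&'static str],
    fns: &HashMap<&'static str, FnInfo>,
) -> HashMap<&'static str, Borrow> {
    let mut result = HashMap::new();
    for name in order {
        let f = &fns[name];
        let owning = f.modifies_or_returns_arg
            || f.callees.iter().any(|c| result.get(c) == Some(&Borrow::Owning));
        result.insert(*name, if owning { Borrow::Owning } else { Borrow::Borrowing });
    }
    result
}

fn main() {
    let mut fns = HashMap::new();
    fns.insert("list_len", FnInfo { modifies_or_returns_arg: false, callees: vec![] });
    fns.insert("list_push", FnInfo { modifies_or_returns_arg: true, callees: vec![] });
    fns.insert("count_items", FnInfo { modifies_or_returns_arg: false, callees: vec!["list_len"] });
    fns.insert("add_item", FnInfo { modifies_or_returns_arg: false, callees: vec!["list_push"] });

    let order = ["list_len", "list_push", "count_items", "add_item"];
    let result = analyze(&order, &fns);
    // Read-only call chains borrow and skip refcounting entirely.
    assert_eq!(result["count_items"], Borrow::Borrowing);
    // Anything that reaches a mutating function must own.
    assert_eq!(result["add_item"], Borrow::Owning);
}
```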
I think the biggest concern for borrowing will be records, because we would probably check for borrowing of the record as a whole rather than of each field. Thus a function that changes one field in a record, but doesn't touch a refcounted list field, would still need to increment the refcount of that list.
Just debating the effort, gain, and complexity trade-offs. Especially considering this is a big pain point hit by many, many people working on AOC currently, so it would be nice to fix things sooner (even if it's not a complete fix but more of a workaround).
yeah my overall thoughts on this are:
- let inference has theoretically abysmal worst-case performance; in practice, it just turns out people don't write programs in the way that would trigger the pathological case.
- Regarding borrowing of record fields, I assume that could be treated as a separate project, right? Like it wouldn't have to block trying out a borrowing design that didn't treat records differently (at least at first).
100%
Do you think it is likely that borrow inference could remove the need for these other changes that deal with recursive refcounts? Or do you think we would likely need both?
personally I have no idea; others would have a better idea about that than I would :big_smile:
fair enough. I think it would fix all of the cases getting hit by the AOC currently.
So. I decided to do a manual MVP of borrowing. The borrowing is just hardcoded for an example AOC app that hit big perf issues the other day.
Changes here: https://github.com/roc-lang/roc/compare/main...borrowing-hacks
At least for this arbitrary app, this 100% fixes the perf issue. Comparing the borrowing hack branch to a manual workaround of using U8s instead of strings:
Benchmark 1: /tmp/aoc-borrowing
Time (mean ± σ): 259.1 ms ± 1.8 ms [User: 255.9 ms, System: 1.3 ms]
Range (min … max): 256.8 ms … 263.0 ms 11 runs
Benchmark 2: /tmp/aoc-workaround
Time (mean ± σ): 253.7 ms ± 4.5 ms [User: 249.8 ms, System: 1.5 ms]
Range (min … max): 247.7 ms … 262.5 ms 11 runs
Summary
'/tmp/aoc-workaround' ran
1.02 ± 0.02 times faster than '/tmp/aoc-borrowing'
Without the fix, this app takes ~70 seconds on my M1 mac. So borrowing is definitely very promising. Still not sure how often borrowing will fail to help. I guess the cases where we will still need size on heap will be cases where the same value is passed to multiple different functions that each might modify it. So none of those functions can borrow it, and we need to increment and decrement refcounts around each call. Of course, if a function does modify the value, the refcounting was needed anyway and is no extra cost. So it would only matter if the functions are unlikely to modify the value. So long term we may still want both, but maybe there is another better optimization for situations like those.
wow, 70 seconds to 250ms is...quite a difference :laughing:
how does the borrowing inference algorithm work here?
incrementing and decrementing 12100 refcounts in a hot loop is slow... who would have guessed.
In this case there is no inference algorithm, I just manually labeled some functions and joinpoints.
Then updated the refcount insertion algorithm to obey those new annotations.
The manual labeling is at the bottom of the code changes.
gotcha... @Folkert de Vries @Ayaz Hafiz I know you've both talked about thoughts around borrow inference in the past - what do you think of the idea of trying it out and seeing if there's an explosion of specializations in practice vs not? (Maybe there's some way we could answer that question without implementing the whole thing?)
Why would it explode specialization?
In my mind, each function has exactly one borrowing signature based on its implementation.
Don't we already have a borrow inference algorithm?
At least Morphic definitely specializes based on borrows
The specialization is because you can specialize a function to take its arguments as owned, and one to take them as borrowed (so potentially 2^N in the number of arguments)
Sure, but if you can take as borrowed, why would you ever take as owned?
Because it could avoid a refcount if the borrowed version needs a refcount in some branch
Hmm... I guess I need to see an example. Because in my mind, if a function borrows, all functions it calls must also borrow, so it just fully avoids refcounts.
With branches we have:
# this is called from a function that owns a.
# it has to own a cause it calls another owning function
if ... then
owningFn a
else
# references a to generate some new value
borrowingFn a
The first one has to be owning because it mutates. For the second one, we have a choice between borrowing and owning. That said, it is always fine to borrow (sometimes a minor delay in freeing memory), and depending on the exact use, borrowing will be better.
The second branch is one of these two:
# borrow
else
borrowingFn a
dec a
# own
else
# will Dec at some point inside of the function.
borrowingForceOwningFn a
Those two are essentially the same, both have one decref, but owning frees memory potentially earlier. That said, it is the only case where borrowing is a disadvantage. If the borrowing function returns a value and then we use a, it will be strictly better for the function to borrow. Would look something like this:
# borrow
else
someVal = borrowingFn a
# use some val and a
# own
else
inc a
# will Dec at some point inside of the function.
someVal = borrowingForceOwningFn a
# use some val and a
Am I missing something? Besides the chance of a delayed free, I see no disadvantage to always borrowing when possible.
yeah I think with the simple heuristic (any mutation opportunity -> owned, otherwise borrowed) you don't get excess specializations. There is just the risk of keeping allocations around for much longer than needed if something gets borrowed but is not actually used in practice
maybe there should be some mechanism to force ownership regardless, a fake mutation
also, the "can it be mutated" value is not that easy to determine precisely for tag unions: we determine reuse based on drops, but drops are of course inserted differently for owned/borrowed values
you can statically know that there are no reuse opportunities at all in a function, but once there are, you must pass the value as owned even if in practice the value is never updated because the structure of the function does not allow it
Folkert de Vries said:
yeah I think with the simple heuristic (any mutation opportunity -> owned, otherwise borrowed) you don't get excess specializations. There is just the risk of keeping allocations around for much longer than needed if something gets borrowed but is not actually used in practice
I could be wrong, but I suspect this would be fine in Roc's case because things are always given to the host as owned, so if they did go longer than necessary without being deallocated, it would only be until the next io operation
I think the heuristic would be, in practice:
potentially passed to an owning function or returned from the function -> owned
otherwise -> borrow
Also, borrowing is the default in Rust for cases like this, and it tends not to be a memory problem. So I bet it would be fine in Roc.
I know this is jumping back some, but I decided to attempt to update the entire bitcode for size on heap (just to see what it would look like in practice). I'm sure it has bugs, but I think this is the full bitcode updated (wish we had more zig tests cause they can all catch memory issues). I also still need to remove our eager element decrementing in various functions.
https://github.com/roc-lang/roc/compare/main...list-size-on-heap
Changes the api for essentially every list function and the utils refcounting functions. So that would need to be wired through to every calling location in the compiler. Would be a matter of generating and passing more element increment and decrement functions to these various locations. Would also require ensuring all list refcounting goes through the new zig exposed list refcounting functions.
So I think it would be a slog with a lot of bugs to work through once wired up, but definitely doable. Should be a lot of the same at least with a similar pattern of changes for all of the functions.
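As a sketch of the shared pattern those functions would follow (the names and signatures here are mine, not the actual utils API): the deallocating decref is now the only place recursive element decrements happen.

```rust
// Illustrative sketch: a list decref that takes an element-decref
// callback, matching the described API change where list functions
// receive element incref/decref functions. Recursive decrements only
// fire on the final decref that deallocates the list.

fn list_decref<F: FnMut()>(
    refcount: &mut usize,
    len: usize,
    elements_refcounted: bool,
    mut dec_elem: F,
) -> bool {
    *refcount -= 1;
    if *refcount == 0 {
        if elements_refcounted {
            // Last reference: decrement each element before freeing.
            for _ in 0..len {
                dec_elem();
            }
        }
        true // caller deallocates the list
    } else {
        false // list is still shared; elements untouched
    }
}

fn main() {
    let mut rc = 2;
    let mut elem_decs = 0;
    // First decref: list is still shared, so no element decrements.
    assert!(!list_decref(&mut rc, 3, true, || elem_decs += 1));
    assert_eq!(elem_decs, 0);
    // Second decref: last reference, so each element gets decremented.
    assert!(list_decref(&mut rc, 3, true, || elem_decs += 1));
    assert_eq!(elem_decs, 3);
}
```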
I still think the borrowing changes will be a fundamental need, but as I was manually testing borrowing and thinking about edge cases, I get more and more the feeling that both of these changes will be needed.
I buy that! Does anyone think we shouldn't at least try both and see what we run into in practice? (Especially @Folkert de Vries or @Ayaz Hafiz :big_smile:)
yes that sounds like a good idea
(wish we had more zig tests cause they can all catch memory issues)
@Brendan Hansknecht I just remembered you saying this :p could you make an issue for it? This definitely sounds like something we should do.
Last updated: Jul 06 2025 at 12:14 UTC