Stream: compiler development

Topic: SortWith Compare Capture Refcounting


view this post on Zulip Brendan Hansknecht (Jul 22 2024 at 04:10):

With List.sortWith, we pass a compare function to zig.

The compare function could be capturing closure data, which I think will actually be quite problematic.
We theoretically need to increment the refcount of that function's captures every time the sort function calls compare. Currently we just increment it once per element in the list.

The only way to instead increment it once per call to the function would be to put the increment into the hot loop for sorting, and I think that is a terrible idea.

I think we should either figure out a way to ban captures for the compare function or find a way to save and then reload the refcounts. Essentially, save the refcount, set the refcount to constant, run sort, set the refcount back to what it was originally.
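
For concreteness, a minimal sketch of that save/set-constant/restore shape on the zig side, assuming we can reach the captures' refcount through a pointer; the sentinel value and names here are hypothetical, not the real builtins:

const REFCOUNT_CONSTANT: usize = 0; // hypothetical sentinel meaning "never free"

fn sortWithSavedRefcount(captures_refcount: *usize) void {
    const saved = captures_refcount.*; // save the live refcount
    captures_refcount.* = REFCOUNT_CONSTANT; // the sort sees the captures as constant
    // ... run the existing sort here; the hot loop never touches the refcount ...
    captures_refcount.* = saved; // restore the original refcount afterwards
}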

Thoughts?

view this post on Zulip Brendan Hansknecht (Jul 22 2024 at 04:10):

I'm assuming this doesn't cause problems today because, in practice, compare functions for lists never capture anything.

view this post on Zulip Brendan Hansknecht (Jul 22 2024 at 04:13):

For example, the worst case of quicksort is n^2 comparisons. Adding n^2 refcount increments to sorting would be brutal.

view this post on Zulip Luke Boswell (Jul 22 2024 at 04:51):

ban captures for the compare function

Would this prevent us from using curried functions?

Maybe this is a silly example...

module []

Color : [Red, Green, Blue]

priority : Color -> U64
priority = \c ->
    when c is
        Red -> 1
        Green -> 2
        Blue -> 3

sortFn = \priorityFn -> \left, right ->
    pl = priorityFn left
    pr = priorityFn right
    if pl < pr then
        LT
    else if pl > pr then
        GT
    else
        EQ

expect
    input = [Red, Blue, Green]
    actual = input |> List.sortWith (sortFn priority)
    expected = [Red, Green, Blue]
    actual == expected # true

view this post on Zulip Brendan Hansknecht (Jul 22 2024 at 05:00):

Yeah, it's probably too restrictive. So we probably need to do the constant thing.

view this post on Zulip Brendan Hansknecht (Jul 22 2024 at 05:01):

If you made priorityFn instead be something refcounted, like a list, it should break in current roc due to a double free.

view this post on Zulip Richard Feldman (Jul 22 2024 at 11:45):

Brendan Hansknecht said:

I think we should either figure out a way to ban captures for the compare function or find a way to save and then reload the refcounts. Essentially, save the refcount, set the refcount to constant, run sort, set the refcount back to what it was originally.

I don't think introducing the concept of banning captures is something we should do from a language complexity perspective, but the save/restore wouldn't be thread safe, right?

view this post on Zulip Richard Feldman (Jul 22 2024 at 11:51):

as in, if the host had that roc value shared across multiple threads, with its refcount potentially being incremented and/or decremented on the other threads, then the save/restore would break them because the refcount would be set to constant

view this post on Zulip Richard Feldman (Jul 22 2024 at 12:00):

We theoretically need to increment the captures to that function for every time the sort function calls compare.

since we know the length of the list before we start sorting, couldn't we increment it once by N up front, where N is the length of the list?

view this post on Zulip Richard Feldman (Jul 22 2024 at 12:00):

(and then decrement by N at the end)

view this post on Zulip Brendan Hansknecht (Jul 22 2024 at 15:14):

This is for sorting, and it's the compare function. It isn't N increments; it's one increment per compare.

view this post on Zulip Brendan Hansknecht (Jul 22 2024 at 15:15):

So n log n on average, but an unknown amount in general.

view this post on Zulip Richard Feldman (Jul 22 2024 at 15:26):

ah right :thumbs_up:

view this post on Zulip Richard Feldman (Jul 22 2024 at 15:31):

so far it seems like doing the refcount increment/decrement each time is the best option to me, although it is definitely unfortunate for performance
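
As a rough sketch of where that per-call increment lands, assuming the compiled Roc compare consumes (decrements) its captures on each call; the types and callRocCompare below are placeholders rather than the real interface:

const Ordering = enum(u8) { lt, eq, gt };
const Captures = struct { refcount: usize, data: [*]u8 };

// Placeholder standing in for the compiled Roc compare closure (hypothetical).
fn callRocCompare(captures: *Captures, lhs: *const anyopaque, rhs: *const anyopaque) Ordering {
    _ = captures;
    _ = lhs;
    _ = rhs;
    return .eq;
}

fn compareWrapper(captures: *Captures, lhs: *const anyopaque, rhs: *const anyopaque) Ordering {
    captures.refcount += 1; // one increment per comparison, right in the hot loop
    return callRocCompare(captures, lhs, rhs);
}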

view this post on Zulip Richard Feldman (Jul 22 2024 at 15:31):

at least in the case where you actually capture something

view this post on Zulip Brendan Hansknecht (Jul 22 2024 at 15:40):

Yeah, I guess this is a case where I should have zig generate two versions of the code to also get the conditional out of the hot loop. Though the conditional would be 100% predictable, so maybe not a big deal.
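
Something like the following comptime-bool shape could get the check resolved at compile time, so each generated variant's comparison loop has no captures branch at all (illustrative only, not the actual builtin code):

fn sortImpl(comptime has_captures: bool, items: []u8) void {
    _ = items;
    // ... sorting logic; at each compare call site:
    if (has_captures) {
        // increment the captures refcount, then call compare
    } else {
        // call compare directly, no refcount work
    }
}

pub fn sort(items: []u8, has_captures: bool) void {
    // one branch out here; the hot loop inside each variant has none
    if (has_captures) sortImpl(true, items) else sortImpl(false, items);
}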

view this post on Zulip Brendan Hansknecht (Jul 22 2024 at 15:56):

One other related topic (that maybe we need to collect more data on before we decide): we could technically have zig generate n different functions at comptime and dispatch across things like element width.

So it's code bloat that llvm will garbage collect (dead-code-eliminate) when unused, in exchange for perf gains.

I think this makes solid sense for sorting, because after a certain element width it becomes more economical to create a list of pointers and sort that instead of sorting the original list. So we just need to support element widths up to that size.

I'm not sure about the perf cost of an indirection to call the copy function (I'm also not sure whether llvm is smart enough to realize the function is constant, propagate it down the call stack, and inline it). But looking at the fluxsort implementation, they mention that using indirection to call the compare function when sorting primitives like integers costs about 2x in perf. Given that copy is called more often than compare, I would assume it has an even higher perf cost. It will definitely depend on whether or not llvm is smart enough to optimize it.
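
A sketch of what dispatching across element widths at comptime could look like, using an inline switch to monomorphize over a few fixed widths; the widths and cutoff are made up for illustration:

fn sortBytes(comptime width: usize, data: [*]u8, len: usize) void {
    // stub: a real version would sort `len` elements of `width` bytes each,
    // with every element copy compiled down to a fixed-size move
    _ = width;
    _ = data;
    _ = len;
}

pub fn dispatchSort(data: [*]u8, len: usize, element_width: usize) void {
    switch (element_width) {
        inline 1, 2, 4, 8, 16, 32 => |w| sortBytes(w, data, len),
        // past the cutoff, sort a list of pointers/indices instead of the elements
        else => {},
    }
}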

view this post on Zulip Brendan Hansknecht (Jul 22 2024 at 16:23):

Also, I am thinking of porting blitsort to zig as our new base sorting algorithm. It is very similar to glidesort overall (not quite in-place, quicksort mixed with other sorting algorithms to deal with bad cases), but it's simpler and written in very clean C. Also, glidesort makes a number of rust-specific tradeoffs to make it easier to support panic safety. I don't think panic safety is worth it (nor does the glidesort author, really; it costs 15 to 20% of perf). If a compare function in roc passed to a sorting algorithm panics, I think it is fine to just accept that the list may be in a broken state.

Otherwise, blitsort and glidesort are both based on fluxsort and seem to have very similar perf and default memory characteristics. glidesort can theoretically reach a higher speed on very new hardware with more execution and memory channels due to some interleaving, but that doesn't seem worth the complexity.

view this post on Zulip Brendan Hansknecht (Jul 22 2024 at 16:25):

I also debated porting good old timsort, but it is significantly slower than glidesort and blitsort in many cases. For example, glidesort can be 2-4x faster than the standard rust sort (which is timsort).

view this post on Zulip Richard Feldman (Jul 22 2024 at 16:32):

I like both of those ideas!

view this post on Zulip Richard Feldman (Jul 22 2024 at 16:33):

I also agree that a totally predictable condition is fine to leave in :+1:

view this post on Zulip Brendan Hansknecht (Jul 22 2024 at 16:37):

Also, I did a quick check of the llvm IR: at least for the simple case of a short app that only sorts, it looks to be inlining the compare and copy functions.

Of course we are at the mercy of llvm's inlining and constant propagation heuristics, but it's a good baseline to note.

view this post on Zulip Brendan Hansknecht (Jul 23 2024 at 02:00):

I just realized that blitsort uses goto. That might make it hard to translate into zig (given zig doesn't support goto).

That said, most of it looks like it should translate OK; I'm just not sure about the perf impact.

view this post on Zulip Brendan Hansknecht (Jul 23 2024 at 02:11):

A handful of it looks like it can be replaced with loops and labelled breaks.

Some of it looks like it will require moving some code into a function.
The function would be inlined in multiple places in the same wrapper function, so it may lead to duplicate code if llvm doesn't realize it can merge the duplicate code blocks and insert jumps. Otherwise, if the functions don't inline, it is the cost of an extra function call in the middle of some code. So worst case, I think I would force inlining and llvm may duplicate some code, but that is probably fine.
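
For the loops-and-labelled-breaks part, a tiny sketch of the shape (not actual blitsort code): a forward goto to a shared exit point becomes a break out of a labelled block:

fn scan(items: []u32) void {
    done: {
        for (items) |x| {
            if (x == 0) break :done; // stands in for a forward `goto done;`
        }
        // runs only when the loop finishes without hitting a zero
    }
    // the old `done:` target; execution continues here either way
}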

