Stream: compiler development

Topic: SortWith Compare Capture Refcounting


view this post on Zulip Brendan Hansknecht (Jul 22 2024 at 04:10):

With List.sortWith, we pass a compare function to zig.

The compare function could be capturing closure data, which I think will actually be quite problematic.
We theoretically need to increment the refcount of that function's captures every time the sort function calls compare. Currently we just increment it once per element in the list.

The only way to instead increment it once per call to the function would be to put the increment into the hot loop for sorting, and I think that is a terrible idea.

I think we should either figure out a way to ban captures for the compare function or find a way to save and then reload the refcounts. Essentially, save the refcount, set the refcount to constant, run sort, set the refcount back to what it was originally.
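
For concreteness, a minimal sketch of that save/set-constant/restore shape on the zig side, assuming we can reach the captures' refcount through a pointer; the sentinel value and names here are hypothetical, not the real builtins:

const REFCOUNT_CONSTANT: usize = 0; // hypothetical sentinel meaning "never free"

fn sortWithSavedRefcount(captures_refcount: *usize) void {
    const saved = captures_refcount.*; // save the live refcount
    captures_refcount.* = REFCOUNT_CONSTANT; // the sort sees the captures as constant
    // ... run the existing sort here; the hot loop never touches the refcount ...
    captures_refcount.* = saved; // restore the original refcount afterwards
}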

Thoughts?

view this post on Zulip Brendan Hansknecht (Jul 22 2024 at 04:10):

I'm assuming this doesn't cause problems today because, in practice, compare functions for lists never capture anything.

view this post on Zulip Brendan Hansknecht (Jul 22 2024 at 04:13):

For example, the worst case of quicksort is n^2 comparisons. Adding n^2 refcount increments to sorting would be brutal.

view this post on Zulip Luke Boswell (Jul 22 2024 at 04:51):

ban captures for the compare function

Would this prevent us from using curried functions?

Maybe this is a silly example...

module []

Color : [Red, Green, Blue]

priority : Color -> U64
priority = \c ->
    when c is
        Red -> 1
        Green -> 2
        Blue -> 3

sortFn = \priorityFn -> \left, right ->
    pl = priorityFn left
    pr = priorityFn right
    if pl < pr then
        LT
    else if pl > pr then
        GT
    else
        EQ

expect
    input = [Red, Blue, Green]
    actual = input |> List.sortWith (sortFn priority)
    expected = [Red, Green, Blue]
    actual == expected # true

view this post on Zulip Brendan Hansknecht (Jul 22 2024 at 05:00):

Yeah, it's probably too restrictive. So we probably need to do the constant thing.

view this post on Zulip Brendan Hansknecht (Jul 22 2024 at 05:01):

If you made priorityFn instead be something refcounted, like a list, it should break in current roc due to a double free.

view this post on Zulip Richard Feldman (Jul 22 2024 at 11:45):

Brendan Hansknecht said:

I think we should either figure out a way to ban captures for the compare function or find a way to save and then reload the refcounts. Essentially, save the refcount, set the refcount to constant, run sort, set the refcount back to what it was originally.

I don't think introducing the concept of banning captures is something we should do from a language complexity perspective, but the save/restore wouldn't be thread safe, right?

view this post on Zulip Richard Feldman (Jul 22 2024 at 11:51):

as in, if the host had that roc value shared across multiple threads, with its refcount potentially being incremented and/or decremented on the other threads, then the save/restore would break them because the refcount would be set to constant

view this post on Zulip Richard Feldman (Jul 22 2024 at 12:00):

We theoretically need to increment the captures to that function for every time the sort function calls compare.

since we know the length of the list before we start sorting, couldn't we increment it once by N up front, where N is the length of the list?

view this post on Zulip Richard Feldman (Jul 22 2024 at 12:00):

(and then decrement by N at the end)

view this post on Zulip Brendan Hansknecht (Jul 22 2024 at 15:14):

This is for sorting, and it's the compare function. It isn't N increments; it's one increment per compare.

view this post on Zulip Brendan Hansknecht (Jul 22 2024 at 15:15):

So n log n on average, but an unknown amount in general.

view this post on Zulip Richard Feldman (Jul 22 2024 at 15:26):

ah right :thumbs_up:

view this post on Zulip Richard Feldman (Jul 22 2024 at 15:31):

so far it seems like doing the refcount increment/decrement each time is the best option to me, although it is definitely unfortunate for performance
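
As a rough sketch of where that per-call increment lands, assuming the compiled Roc compare consumes (decrements) its captures on each call; the types and callRocCompare below are placeholders rather than the real interface:

const Ordering = enum(u8) { lt, eq, gt };
const Captures = struct { refcount: usize, data: [*]u8 };

// Placeholder standing in for the compiled Roc compare closure (hypothetical).
fn callRocCompare(captures: *Captures, lhs: *const anyopaque, rhs: *const anyopaque) Ordering {
    _ = captures;
    _ = lhs;
    _ = rhs;
    return .eq;
}

fn compareWrapper(captures: *Captures, lhs: *const anyopaque, rhs: *const anyopaque) Ordering {
    captures.refcount += 1; // one increment per comparison, right in the hot loop
    return callRocCompare(captures, lhs, rhs);
}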

view this post on Zulip Richard Feldman (Jul 22 2024 at 15:31):

at least in the case where you actually capture something

view this post on Zulip Brendan Hansknecht (Jul 22 2024 at 15:40):

Yeah, I guess this is a case where I should have zig generate two versions of the code to also get the conditional out of the hot loop. Though the conditional would be 100% predictable, so maybe not a big deal.
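
Something like the following comptime-bool shape could get the check resolved at compile time, so each generated variant's comparison loop has no captures branch at all (illustrative only, not the actual builtin code):

fn sortImpl(comptime has_captures: bool, items: []u8) void {
    _ = items;
    // ... sorting logic; at each compare call site:
    if (has_captures) {
        // increment the captures refcount, then call compare
    } else {
        // call compare directly, no refcount work
    }
}

pub fn sort(items: []u8, has_captures: bool) void {
    // one branch out here; the hot loop inside each variant has none
    if (has_captures) sortImpl(true, items) else sortImpl(false, items);
}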

view this post on Zulip Brendan Hansknecht (Jul 22 2024 at 15:56):

One other related topic (that maybe we need to collect more data on before we decide): we could technically have zig generate n different functions at comptime and dispatch across things like element width.

So it's code bloat that llvm will garbage collect (dead-code-eliminate) when unused, in exchange for perf gains.

I think this makes solid sense for sorting, because after a certain element width it becomes more economical to create a list of pointers and sort that instead of sorting the original list. So we just need to support element widths up to that size.

I'm not sure about the perf cost of an indirection to call the copy function (I'm also not sure whether llvm is smart enough to realize the function is constant, propagate it down the call stack, and inline it). But looking at the fluxsort implementation, they mention that using indirection to call the compare function when sorting primitives like integers costs about 2x in perf. Given that copy is called more often than compare, I would assume it has an even higher perf cost. It will definitely depend on whether or not llvm is smart enough to optimize it.
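
A sketch of what dispatching across element widths at comptime could look like, using an inline switch to monomorphize over a few fixed widths; the widths and cutoff are made up for illustration:

fn sortBytes(comptime width: usize, data: [*]u8, len: usize) void {
    // stub: a real version would sort `len` elements of `width` bytes each,
    // with every element copy compiled down to a fixed-size move
    _ = width;
    _ = data;
    _ = len;
}

pub fn dispatchSort(data: [*]u8, len: usize, element_width: usize) void {
    switch (element_width) {
        inline 1, 2, 4, 8, 16, 32 => |w| sortBytes(w, data, len),
        // past the cutoff, sort a list of pointers/indices instead of the elements
        else => {},
    }
}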

view this post on Zulip Brendan Hansknecht (Jul 22 2024 at 16:23):

Also, I am thinking of porting blitsort to zig as our new base sorting algorithm. It is very similar to glidesort overall (not quite in-place, quicksort mixed with other sorting algorithms to deal with bad cases), but it's simpler and written in very clean C. Also, glidesort makes a number of rust-specific tradeoffs to make it easier to support panic safety. I don't think panic safety is worth it (nor does the glidesort author, really; it costs 15 to 20% of perf). If a compare function in roc passed to a sorting algorithm panics, I think it is fine to just accept that the list may be in a broken state.

Otherwise, blitsort and glidesort are both based on fluxsort and seem to have very similar perf and default memory characteristics. glidesort can theoretically reach a higher speed on very new hardware with more execution and memory channels due to some interleaving, but that doesn't seem worth the complexity.

view this post on Zulip Brendan Hansknecht (Jul 22 2024 at 16:25):

I also debated porting good old timsort, but it is significantly slower than glidesort and blitsort in many cases. For example, glidesort can be 2-4x faster than the standard rust sort (which is timsort).

view this post on Zulip Richard Feldman (Jul 22 2024 at 16:32):

I like both of those ideas!

view this post on Zulip Richard Feldman (Jul 22 2024 at 16:33):

I also agree that a totally predictable condition is fine to leave in :+1:

view this post on Zulip Brendan Hansknecht (Jul 22 2024 at 16:37):

Also, I did a quick check of the llvm IR: at least for the simple case of a short app that only sorts, it looks to be inlining the compare and copy functions.

Of course we are at the mercy of llvm's inlining and constant propagation heuristics, but it's a good baseline to note.

view this post on Zulip Brendan Hansknecht (Jul 23 2024 at 02:00):

I just realized that blitsort uses goto. That might make it hard to translate into zig (given zig doesn't support goto).

That said, most of it looks like it should translate OK; I'm just not sure about the perf impact.

view this post on Zulip Brendan Hansknecht (Jul 23 2024 at 02:11):

A handful of it looks like it can be replaced with loops and labelled breaks.

Some of it looks like it will require moving some code into a function.
The function would be inlined in multiple places in the same wrapper function, so it may lead to duplicate code if llvm doesn't realize it can merge the duplicate code blocks and insert jumps. Otherwise, if the functions don't inline, it is the cost of an extra function call in the middle of some code. So worst case, I think I would force inlining and llvm may duplicate some code, but that is probably fine.
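
For the loops-and-labelled-breaks part, a tiny sketch of the shape (not actual blitsort code): a forward goto to a shared exit point becomes a break out of a labelled block:

fn scan(items: []u32) void {
    done: {
        for (items) |x| {
            if (x == 0) break :done; // stands in for a forward `goto done;`
        }
        // runs only when the loop finishes without hitting a zero
    }
    // the old `done:` target; execution continues here either way
}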

