With List.sortWith, we pass a compare function to Zig.
The compare function could be capturing closure data, which I think will actually be quite problematic.
We theoretically need to increment the refcounts of that function's captures every time the sort function calls compare.
Currently we just increment it for every element in the list.
To instead increment it for every call to the function, the only way would be to put the increment into the hot loop for sorting. I think that is a terrible idea.
I think we should either figure out a way to ban captures for the compare function or find a way to save and then reload the refcounts. Essentially, save the refcount, set the refcount to constant, run sort, set the refcount back to what it was originally.
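The save/restore idea can be sketched in C. To be clear, RcBox, REFCOUNT_CONSTANT, and the layout here are hypothetical stand-ins for illustration, not the actual Roc runtime:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical heap cell: a refcount header in front of the payload. */
typedef struct {
    intptr_t refcount;
} RcBox;

/* Sentinel meaning "never free", like a constant refcount. */
#define REFCOUNT_CONSTANT INTPTR_MIN

static int cmp_ints(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Save the captures' refcount, pin it to the constant so the sort's
 * per-compare increments/decrements become no-ops, run the sort, then
 * restore the original count. */
static void sort_with_pinned_captures(RcBox *captures, int *data, size_t n) {
    intptr_t saved = captures->refcount;
    captures->refcount = REFCOUNT_CONSTANT;
    qsort(data, n, sizeof(int), cmp_ints);
    captures->refcount = saved;
}
```

This keeps the refcount traffic entirely out of the sort itself at the cost of three extra writes per sort call.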
Thoughts?
I'm assuming this doesn't cause problems today cause in practice compare functions for lists never capture anything.
For example, the worst case of quicksort is n^2 comparisons. Adding n^2 refcount increments to sorting would be brutal.
ban captures for the compare function
Would this prevent us from using curried functions?
Maybe this is a silly example...
module []

Color : [Red, Green, Blue]

priority : Color -> U64
priority = \c ->
    when c is
        Red -> 1
        Green -> 2
        Blue -> 3

sortFn = \priorityFn -> \left, right ->
    pl = priorityFn left
    pr = priorityFn right
    if pl < pr then
        LT
    else if pl > pr then
        GT
    else
        EQ

expect
    input = [Red, Blue, Green]
    actual = input |> List.sortWith (sortFn priority)
    expected = [Red, Green, Blue]
    actual == expected # true
Yeah, it's probably too restrictive. So we probably need to do the constant thing.
If you made priorityFn instead be something refcounted, like a list, it should break in current Roc due to a double free.
Brendan Hansknecht said:
I think we should either figure out a way to ban captures for the compare function or find a way to save and then reload the refcounts. Essentially, save the refcount, set the refcount to constant, run sort, set the refcount back to what it was originally.
I don't think introducing the concept of banning captures is something we should do from a language complexity perspective, but the save/restore wouldn't be thread safe, right?
as in, if the host had that roc value shared across multiple threads, potentially having its refcount (try to be) incremented and/or decremented on the other threads, then the save/restore would break those because it would be set to constant
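A minimal sequential C sketch of that hazard (the names and layout are made up, and the racy interleaving is played out in order so it stays deterministic):

```c
#include <assert.h>
#include <stdint.h>

#define REFCOUNT_CONSTANT INTPTR_MIN

/* Hypothetical shared refcount. In a real host this would be touched
 * (atomically) from several threads; here the interleaving is replayed
 * sequentially to show the lost update. */
static intptr_t refcount = 1;

static intptr_t demo_lost_update(void) {
    /* Thread A (sorting): save and pin. */
    intptr_t saved = refcount;          /* saved = 1 */
    refcount = REFCOUNT_CONSTANT;

    /* Thread B (host): takes a new reference while the sort runs.
     * Because the count is pinned to the constant sentinel, the
     * increment is skipped, so the new owner is never counted. */
    if (refcount != REFCOUNT_CONSTANT) {
        refcount += 1;                  /* never runs */
    }

    /* Thread A: restore. B's reference has been lost. */
    refcount = saved;
    return refcount;                    /* 1, though two owners exist */
}
```

So the restore clobbers any refcount changes made by other threads during the sort, which later manifests as a use-after-free or double free.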
We theoretically need to increment the refcounts of that function's captures every time the sort function calls compare.
since we know the length of the list before we start sorting, couldn't we increment it once by N up front, where N is the length of the list?
(and then decrement by N at the end)
This is for sorting, and it's the compare function. It isn't N increments; it is one increment per compare.
So n log n on average, but an unknown amount in general.
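One way to see that the count is per-compare rather than per-element is to pass a counting comparator to C's qsort; for random input the comparator runs well over N - 1 times:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static size_t compare_calls = 0;

static int counting_cmp(const void *a, const void *b) {
    compare_calls++;  /* this is where a capture increment would go */
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Sort n pseudo-random ints and return how often the comparator ran. */
static size_t count_compares(size_t n) {
    int *data = malloc(n * sizeof(int));
    for (size_t i = 0; i < n; i++) {
        data[i] = (int)((i * 2654435761u) % 1000);
    }
    compare_calls = 0;
    qsort(data, n, sizeof(int), counting_cmp);
    free(data);
    return compare_calls;
}
```

The exact count depends on the libc's sort, but it sits between n - 1 and n^2, typically around n log n.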
ah right :thumbs_up:
so far it seems like doing the refcount increment/decrement each time is the best option to me, although it is definitely unfortunate for performance
at least in the case where you actually capture something
Yeah, I guess this is a case where I should have zig generate two versions of the code to also get the conditional out of the hot loop. Though the conditional would be 100% predictable, so maybe not a big deal.
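A sketch of the two-versions idea in C (hypothetical names; in Zig this would be comptime specialization): the branch on whether the compare function captures is decided once, outside the hot loop, and constant propagation strips the guard out of each specialized copy:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*CmpFn)(const void *captures, int a, int b);

/* Shared insertion-sort body; `rc` is NULL when the compare fn has no
 * captures, so after inlining the refcount bump compiles away in that
 * specialization. */
static inline void insertion_sort_impl(int *a, size_t n, CmpFn cmp,
                                       const void *captures, intptr_t *rc) {
    for (size_t i = 1; i < n; i++) {
        int key = a[i];
        size_t j = i;
        while (j > 0) {
            if (rc != NULL) (*rc)++;  /* per-compare capture increment */
            if (cmp(captures, a[j - 1], key) <= 0) break;
            a[j] = a[j - 1];
            j--;
        }
        a[j] = key;
    }
}

/* Two specializations: the "has captures" branch happens once, here,
 * not once per comparison. */
static void sort_no_captures(int *a, size_t n, CmpFn cmp) {
    insertion_sort_impl(a, n, cmp, NULL, NULL);
}

static void sort_with_captures(int *a, size_t n, CmpFn cmp,
                               const void *captures, intptr_t *rc) {
    insertion_sort_impl(a, n, cmp, captures, rc);
}
```

Insertion sort is just a stand-in here; the same trick applies to whichever algorithm the real implementation uses.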
One other related topic (though maybe we need to collect more data before we decide): we could technically have zig generate n different functions at comptime to dispatch across things like element width.
So it's code bloat that LLVM will garbage collect (dead-code eliminate), traded for perf gains.
I think this makes solid sense for sorting cause after a certain element width it will become more economical to create a list of pointers and sort that instead of sorting the original list. So we just need to support element width up to that size.
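A C sketch of that dispatch (hypothetical helper names; Zig would generate the specializations at comptime rather than by hand):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Width-specialized copies: a fixed, known size lets the compiler emit
 * plain load/store pairs instead of a generic memcpy loop. */
static void copy4(void *dst, const void *src)  { memcpy(dst, src, 4); }
static void copy8(void *dst, const void *src)  { memcpy(dst, src, 8); }
static void copy16(void *dst, const void *src) { memcpy(dst, src, 16); }

typedef void (*CopyFn)(void *dst, const void *src);

/* Dispatch once, before sorting starts. Elements wider than the largest
 * specialization would instead be sorted as a list of pointers. */
static CopyFn copy_for_width(size_t width) {
    switch (width) {
        case 4:  return copy4;
        case 8:  return copy8;
        case 16: return copy16;
        default: return NULL; /* fall back to pointer sorting */
    }
}
```

The cutoff widths here are arbitrary; the real set would be chosen from the point where pointer sorting becomes cheaper.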
I'm not sure of the perf cost of an indirect call to the copy function (I'm also not sure if llvm may be smart enough to realize the function is constant, propagate it down the call stack, and inline it), but looking at the implementation of fluxsort, they mention that using indirection to call the compare function when sorting primitives like integers costs about 2x in terms of perf. Given copy is called more often than cmp, I would assume it has an even higher perf cost. It will definitely depend on whether or not llvm is smart enough to optimize it.
Also, I am thinking of porting blitsort to zig as our new base sorting algorithm. It is very similar to glidesort overall (not quite in-place, quicksort mixed with other sorting algorithms to deal with bad cases), but simpler and written in very clean C. Also, glidesort makes a number of Rust-specific tradeoffs to make it easier to support panic safety. I don't think panic safety is worth it (nor does the glidesort author, really; it costs 15 to 20% of perf). If a compare function in roc passed to a sorting algorithm panics, I think it is fine to just accept that the list may be in a broken state.
Otherwise, blitsort and glidesort are both based off of fluxsort and seem to have very similar perf and default memory characteristics. glidesort can theoretically reach a higher speed on super new hardware with more execution and memory channels due to some interleaving, but that doesn't seem worth the complexity.
Also debated porting good old timsort, but it is significantly slower than glidesort and blitsort in many cases. For example, glidesort can be 2-4x faster than the standard Rust sort (which is timsort).
I like both of those ideas!
I also agree that a totally predictable condition is fine to leave in :+1:
Also, I did a quick check of the llvm IR; at least for the simple case of a short app that only sorts, it looks to be inlining the compare and copy functions.
Of course we are at the mercy of llvm's inlining and constant-propagation heuristics, but it's a good baseline data point.
I just realized that blitsort uses goto. That might make it hard to translate into zig (given zig doesn't support goto).
That said, most of it looks like it should translate ok, just not sure the perf impacts.
A handful of it looks like it can be replaced with loops and labelled breaks.
Some of it looks like it will require moving some code into a function.
The function would be inlined in multiple places in the same wrapper function. So may lead to duplicate code if llvm doesn't realize it can merge the duplicate code blocks and insert jumps. Otherwise, if the functions don't inline, it is the cost of an extra function call in the middle of some code. So worst case, I think I would force inlining and llvm may duplicate some code, but that is probably fine.
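To illustrate, a hypothetical C fragment showing a two-level goto exit (the kind of pattern blitsort's source uses) next to the "move the code into a function" rewrite described above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* goto version: break out of two nested loops at once. */
static bool contains_pair_goto(const int *a, size_t n, int sum) {
    bool found = false;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (a[i] + a[j] == sum) {
                found = true;
                goto done;
            }
        }
    }
done:
    return found;
}

/* goto-free version: the inner search moves into a helper that is
 * forced inline, so a plain early return replaces the jump. */
static inline bool pair_at(const int *a, size_t n, size_t i, int sum) {
    for (size_t j = i + 1; j < n; j++) {
        if (a[i] + a[j] == sum) return true;
    }
    return false;
}

static bool contains_pair_inline(const int *a, size_t n, int sum) {
    for (size_t i = 0; i < n; i++) {
        if (pair_at(a, n, i, sum)) return true;
    }
    return false;
}
```

In Zig the same shape falls out of labelled breaks for the simple cases, with the extracted-function form reserved for the jumps that loops can't express.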
Last updated: Jul 06 2025 at 12:14 UTC