Sort ability for floats · compiler development

Stream: compiler development

Topic: Sort ability for floats

Ben Plotke (Jun 03 2024 at 17:13):

After speaking with @Jasper Woudenberg , I decided to try and implement the Sort ability for floats. But quickly after starting, I realized there isn't a clear good way to do this. I made a list of strategies below, but was hoping to get some guidance from someone with expertise in the subject.

Strategies I am aware of:

Haskell's, implement compare in terms of < and ==, which always return false if either operand is a NaN. This resulted in compare always returning GT when there is a NaN involved. I am strongly apposed to this approach because, as the Haskell source code notes, this violates reflexivity which can result in data corruption.
Java's, if one or both operands are a NaN, do a raw bit comparison. If I understand correct, this will put all positive NaNs after positive infinity and all negative NaNs between positive and negative zero. I do not favor this approach because it puts the NaNs in two very different spots based on properties the vast majority of users will not care about.
C#'s, consider NaNs to be the smallest values. This strikes me as an acceptable approach.
Rust's, define a partial order type which float is a member. We could define PartialSort implements partialCompare: a, a -> Result [LessThan, Equivalent, GreaterThan] [Unorderable]. I like this approach because, a) it is honest about the fact that not all floats are comparable, and b), a PartialSort ability is likely useful for other types. I do not like that it will likely make working with floats more awkward.
IEEE-754-2008's totalOrder function, this total ordering considers -NaN to be the smallest number and +Nan to be the greatest number. I do not favor this approach because it splits the NaNs based on a property most users do not care about.

My instinct would be to go with the C# strategy and just make NaNs the smallest, though Rust's partial order approach strikes me as more in line with the Roc ethos.

Let me know what y'all think.

Richard Feldman (Jun 03 2024 at 17:47):

great question! We talked about this before:

Richard Feldman said:

so if we have a default List.sort for be floats, and a default compare for floats, and they both sort NaNs to the edge...what would be the situation where someone writes their own sorting algorithm using < instead of compare?

I think what we want here is:

all NaNs end up sorted to either the beginning or end
whatever comparison has the best performance, given that one constraint

Richard Feldman (Jun 03 2024 at 17:48):

so I think C#'s approach makes sense, although it would also be fine to consider NaN to be the largest value (instead of the smallest) if it turns out that's better for performance than treating it as the smallest

Richard Feldman (Jun 03 2024 at 17:49):

I assume it's probably the same perf either way

Ben Plotke (Jun 04 2024 at 12:44):

Thanks! That completely clarifies

Ben Plotke (Jun 07 2024 at 04:29):

I have another implementation question. Which of the two below options would be preferable from a performance perspective? I know a few more instructions can be preferable to branching, but I do not have a feel for how many instructions.
a) A function with if-elses. This has a lot of branches, but the top two cases are the common ones. (I am guessing this does not actually need to be a function, but I am completely new to LLVM and compilers, so I would need some pointers on how to do that.)

float_sort(a, b)
    if a < b
        return 2
    if a > b
        return 1
    if a == b
        return 0
    if b == b  // a is NaN, so smaller
        return 2
    if a == a
        return 1
    return 0

b) This expression (a == a) + 2*(b == b) - (a < b) - 2*(a > b) - 3*(a == b). This has no branches, but it is a lot of instructions. (It would also warrant a comment with a table demonstrating the logic)

Brendan Hansknecht (Jun 07 2024 at 04:31):

Almost certainly branchless will be better for average performance. That said, the top implement may turn into cmovs anyway, though probably not due to returns. But if you assign that if back to a variable, probably would cmov and be quite good...though not fully sure.

Brendan Hansknecht (Jun 07 2024 at 04:31):

Other way, I would bet on the expression being the better option

Last updated: Aug 17 2025 at 12:14 UTC