Stream: ideas

Topic: float ordering algorithms


view this post on Zulip Richard Feldman (Mar 16 2022 at 01:57):

ok this is interesting: https://godbolt.org/z/Y4Gdonf7d

view this post on Zulip Richard Feldman (Mar 16 2022 at 01:58):

it appears that compare x y == Same optimizes to a single-instruction "compare if these two floats are equal, but while treating NaNs as equal" operation 🤨

view this post on Zulip Brendan Hansknecht (Mar 16 2022 at 02:02):

Yeah, allowing NaN == NaN is faster.

view this post on Zulip Brendan Hansknecht (Mar 16 2022 at 02:03):

Did a c++ bench run of the 2 different versions. (with clang and checking it generated the same instructions as rust)

view this post on Zulip Richard Feldman (Mar 16 2022 at 02:03):

nice!

view this post on Zulip Richard Feldman (Mar 16 2022 at 02:03):

wait, so compare x y == Same actually optimizes to code that runs faster than if you did the traditional floating point equality check which returns false for two NaNs?

view this post on Zulip Brendan Hansknecht (Mar 16 2022 at 02:04):

yes

view this post on Zulip Richard Feldman (Mar 16 2022 at 02:04):

wooooow

view this post on Zulip Derek Gustafson (Mar 16 2022 at 02:04):

Your compare function matches the C cmp function for binary search. Probably been hardware support for that for decades.

view this post on Zulip Richard Feldman (Mar 16 2022 at 02:04):

so that means there would be probably never be demand for a builtin which does the traditional "return false for two NaNs" case, yeah?

view this post on Zulip Richard Feldman (Mar 16 2022 at 02:05):

(except for the obligatory "I want to port code that relies on that behavior" use case)

view this post on Zulip Derek Gustafson (Mar 16 2022 at 02:06):

I knew what you were linking to without even needing to look

view this post on Zulip Richard Feldman (Mar 16 2022 at 02:06):

it's a classic! :grinning_face_with_smiling_eyes:

view this post on Zulip Derek Gustafson (Mar 16 2022 at 02:06):

You'll probably get complaints that Roc isn't IEEE compliant

view this post on Zulip Derek Gustafson (Mar 16 2022 at 02:07):

But, I'm not sure that's enough of a downside to warrant breaking this fairly elegant design/performance matchup

view this post on Zulip Richard Feldman (Mar 16 2022 at 02:07):

it actually is though!

view this post on Zulip Richard Feldman (Mar 16 2022 at 02:08):

all the supported operations conform to IEEE-754

view this post on Zulip Richard Feldman (Mar 16 2022 at 02:08):

we never say NaN is equal to another NaN

view this post on Zulip Richard Feldman (Mar 16 2022 at 02:08):

we just sort them to the same places

view this post on Zulip Derek Gustafson (Mar 16 2022 at 02:09):

okay. missed the subtelty in that argument

view this post on Zulip Richard Feldman (Mar 16 2022 at 02:11):

it's very pedantic :laughing:

view this post on Zulip Brendan Hansknecht (Mar 16 2022 at 02:12):

I find this quite reasonable

view this post on Zulip Richard Feldman (Mar 16 2022 at 02:14):

it certainly rules out all the error-prone cases

view this post on Zulip Richard Feldman (Mar 16 2022 at 02:15):

I think the only unknown is how painful it will make testing to have == not work on nested structures that want to contain a float or two

view this post on Zulip Richard Feldman (Mar 16 2022 at 02:17):

I wonder if it makes sense to patch atan2, div, and mod to work the same way for 0 and -0 given that it turns out the existence of -0 means that x == y cannot guarantee f x == f y anyway due to binary serialization :thinking:

view this post on Zulip Derek Gustafson (Mar 16 2022 at 02:17):

Probably less painful than a test failing because the LSB is off by 1

view this post on Zulip Richard Feldman (Mar 16 2022 at 02:19):

fair point!

view this post on Zulip Brendan Hansknecht (Mar 16 2022 at 02:20):

Oh, but we actually kinda still have a problem. NaN will be the same with every number.

view this post on Zulip Richard Feldman (Mar 16 2022 at 02:20):

:joy:

view this post on Zulip Richard Feldman (Mar 16 2022 at 02:26):

ok so then I think it has to be this:

fn compare(x: f64, y: f64) -> i8 {
    let is_gt = (x > y) as i8; // --> 1
    let is_lt = (x < y) as i8; // --> -1
    let is_x_nan = x.is_nan() as i8;
    let is_y_nan = y.is_nan() as i8;

    is_y_nan - is_x_nan + is_gt - is_lt
}

(edit: made it branchless!)

view this post on Zulip Richard Feldman (Mar 16 2022 at 02:26):

so then NaN sorts to lower than anything else

view this post on Zulip Richard Feldman (Mar 16 2022 at 02:27):

that means compare x y == Same no longer gets you the single-instruction version

view this post on Zulip Richard Feldman (Mar 16 2022 at 02:27):

but !(x < y || x > y) still does

view this post on Zulip Richard Feldman (Mar 16 2022 at 02:27):

so it's still achievable if you really need it

view this post on Zulip Richard Feldman (Mar 16 2022 at 02:46):

branchless version of compare that has NaN compare to itself as Same, and sorts earlier than everything else:

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=402d8026688b239b3eb8da2fa471f849

view this post on Zulip Richard Feldman (Mar 16 2022 at 02:46):

assembly:

        ucomisd xmm0, xmm1
        seta    sil
        ucomisd xmm1, xmm0
        seta    dl
        ucomisd xmm0, xmm0
        setp    cl
        ucomisd xmm1, xmm1
        setp    al
        sub     al, cl
        add     al, sil
        sub     al, dl
        ret

view this post on Zulip Brendan Hansknecht (Mar 16 2022 at 02:51):

We probably don't want it to be branchless unless we expect NaN to be common.

view this post on Zulip Brendan Hansknecht (Mar 16 2022 at 02:51):

https://www.quick-bench.com/q/9UEMY26B2QJMC_tPrWhxJx-lIm0

view this post on Zulip Richard Feldman (Mar 16 2022 at 02:52):

hm, true

view this post on Zulip Brendan Hansknecht (Mar 16 2022 at 02:55):

The above link is slightly off and slower than it should be for the branchless vesion. This is correct:
https://www.quick-bench.com/q/sDiORDcEEjBhv3DXYGKE8f9o_nc

view this post on Zulip Brendan Hansknecht (Mar 16 2022 at 02:56):

Still slow but not as slow

view this post on Zulip Richard Feldman (Mar 16 2022 at 02:56):

    if x.is_nan() || y.is_nan() {
        let is_gt = (x > y) as i8; // ---> 1
        let is_lt = (x < y) as i8; // ---> -1
        let is_x_nan = x.is_nan() as i8;
        let is_y_nan = y.is_nan() as i8;

        is_y_nan - is_x_nan + is_gt - is_lt
    } else {
        if x == y {
            0
        } else if x > y {
            1
        } else {
            -1
        }
    }

view this post on Zulip Richard Feldman (Mar 16 2022 at 02:58):

huh, interestingly negating the condition and switching the branches actually optimizes to what appears to be better code in the case where neither is NaN:

    if !(x.is_nan() || y.is_nan()) {
        if x == y {
            0
        } else if x > y {
            1
        } else {
            -1
        }
    } else {
        let is_gt = (x > y) as i8; // ---> 1
        let is_lt = (x < y) as i8; // ---> -1
        let is_x_nan = x.is_nan() as i8;
        let is_y_nan = y.is_nan() as i8;

        is_y_nan - is_x_nan + is_gt - is_lt
    }

view this post on Zulip Richard Feldman (Mar 16 2022 at 02:59):

        ucomisd xmm0, xmm0
        jp      .LBB4_4
        ucomisd xmm1, xmm1
        jp      .LBB4_4
        ucomisd xmm0, xmm1
        jne     .LBB4_6
        jp      .LBB4_6
        xor     eax, eax
        ret
.LBB4_4:
        ucomisd xmm1, xmm1
        setp    al
        ucomisd xmm0, xmm0
        setp    cl
        ucomisd xmm0, xmm1
        seta    dl
        ucomisd xmm1, xmm0
        seta    sil
        sub     al, cl
        add     al, dl
        sub     al, sil
        ret
.LBB4_6:
        seta    al
        add     al, al
        add     al, -1
        ret

vs

        ucomisd xmm0, xmm0
        jp      .LBB3_2
        ucomisd xmm1, xmm1
        jp      .LBB3_2
        mov     al, 1
        ucomisd xmm0, xmm1
        jne     .LBB3_4
        jp      .LBB3_4
        ret
.LBB3_2:
        ucomisd xmm0, xmm0
        setp    sil
        ucomisd xmm0, xmm1
        seta    cl
        ucomisd xmm1, xmm0
        seta    dl
        ucomisd xmm1, xmm1
        setp    al
        sub     al, sil
        add     al, cl
        cmp     al, dl
        sete    al
        ret
.LBB3_4:
        xor     eax, eax
        ret

view this post on Zulip Brendan Hansknecht (Mar 16 2022 at 03:05):

When using those to generate equality, it isn't great:
https://www.quick-bench.com/q/JdET2lFO_EURwMX4EES1RgQmvYE

view this post on Zulip Brendan Hansknecht (Mar 16 2022 at 03:06):

I guess I should probably modify the benchmark to actually generate the full comparison in all case.

view this post on Zulip Brendan Hansknecht (Mar 16 2022 at 03:16):

Ok. Full comparison makes a big difference here. This final version is the closest to regular comparison: https://www.quick-bench.com/q/HoelTlk5T2tAagguaax6TASi4ho

view this post on Zulip Brendan Hansknecht (Mar 16 2022 at 03:16):

but still 2.5x slower

view this post on Zulip Richard Feldman (Mar 16 2022 at 03:20):

that's unfortunate

view this post on Zulip Richard Feldman (Mar 16 2022 at 03:21):

here's an interesting thought: if the design is "NaNs always sort to the front" (or to the back, if that's somehow faster; either way seems fine to me, really) what if we let NaNs have nonsensical comparisons?

view this post on Zulip Richard Feldman (Mar 16 2022 at 03:22):

could that help somehow?

view this post on Zulip Richard Feldman (Mar 16 2022 at 03:22):

e.g. try to make it be that if either value is NaN we return -1 or something like that

(edit: wait, that would incorrectly order a bunch of things that aren't NaNs :laughing: )

view this post on Zulip Richard Feldman (Mar 16 2022 at 03:23):

so the sorting algorithms would slow down in the presence of NaNs because they'd do way more swaps than necessary, but they'd still get a predictable answer

view this post on Zulip Richard Feldman (Mar 16 2022 at 03:23):

but maybe an algorithm like that could have the case where there are no NaNs go much faster?

view this post on Zulip Brendan Hansknecht (Mar 16 2022 at 03:25):

Ok here is the benchmark of full comparison.

For Cmp, NaN is always -1.
For all other cases, NaN is less than everything else but equal to itself.

view this post on Zulip Brendan Hansknecht (Mar 16 2022 at 03:26):

When looking at full comparison, the overhead can be pretty low

view this post on Zulip Brendan Hansknecht (Mar 16 2022 at 03:26):

It is just when use full comparison to derive a specific comparison that it has major problems with performance.

view this post on Zulip Brendan Hansknecht (Mar 16 2022 at 03:28):

so the sorting algorithms would slow down in the presence of NaNs because they'd do way more swaps than necessary, but they'd still get a predictable answer

I think this is what would happen in most languages rn, but maybe I am wrong.

view this post on Zulip Richard Feldman (Mar 16 2022 at 03:29):

I wonder if they would ever get stuck in an infinite loop :thinking:

view this post on Zulip Richard Feldman (Mar 16 2022 at 03:30):

oh that algorithm actually causes a problem though

view this post on Zulip Richard Feldman (Mar 16 2022 at 03:30):

because not only does cmp(NaN, 1.0) return -1, so does cmp(1.0, NaN)

view this post on Zulip Richard Feldman (Mar 16 2022 at 03:31):

so I think the else branch would need to do a nan check

view this post on Zulip Brendan Hansknecht (Mar 16 2022 at 03:35):

because not only does cmp(NaN, 1.0) return -1, so does cmp(1.0, NaN)

intentionally so. I think that is how it would get implemented in c++ or similar.

Though I guess in reality most sorting algorithms would just use the comparison operators directly where they would just get false if they saw a nan.

view this post on Zulip Richard Feldman (Mar 16 2022 at 03:36):

(moved this into its own thread so others can catch up on the design stuff more easily!)

view this post on Zulip Richard Feldman (Mar 16 2022 at 03:37):

would the NaNs actually get sorted to the front then?

view this post on Zulip Richard Feldman (Mar 16 2022 at 03:37):

if they always return -1 regardless of which way the comparison goes

view this post on Zulip Richard Feldman (Mar 16 2022 at 03:38):

I guess the idea is that the non-NaN value would get compared to its other neighbor and move up?

view this post on Zulip Brendan Hansknecht (Mar 16 2022 at 03:39):

It heavily depends on the algorithm used for sorting. If you used quicksort, is a nan was a pivot, nothing would move. If anything else was a pivot, all nans would move.

view this post on Zulip Richard Feldman (Mar 16 2022 at 03:46):

hrm, yeah that's definitely a problem :sweat_smile:

view this post on Zulip Brendan Hansknecht (Mar 16 2022 at 03:46):

Actually that is wrong. You would be checking if cmp pivot num == 1 and moving elements if that is true. So that would just never swap nans. Which I guess would make them sort to the end of the list because they never move.

If you use cmp num pivot == -1 instead. nans will always move. So they will be sorted to the beginning of the list.
-> which is actually really broken if nan ever is the pivot because it will move every single element to the other side of it. So all nans that are pivots will be sorted the end of a list and all nans that aren't pivots will be sorted to the begining. So yeah, broken.

view this post on Zulip Richard Feldman (Mar 16 2022 at 03:46):

:thumbs_up:

view this post on Zulip Brendan Hansknecht (Mar 16 2022 at 03:47):

So I think this generally works if you are specifically checking only < or only checking >, but if your algorithm has more checks, it may break due to nans being inconsistent.

view this post on Zulip Richard Feldman (Mar 16 2022 at 03:52):

maybe a better design would be making the default Ordering be consistent (but slower even if there are no NaNs) and then if you're confident your floats contain no NaNs, you can use sortWith and provide a custom compare that does if x > y { ... } else if x < y { ... } else { ... }

view this post on Zulip Richard Feldman (Mar 16 2022 at 03:52):

so basically assuming the else branch could only possibly be "they are equal" - which will be fine as long as you have no NaNs, and potentially break in exciting ways otherwise :stuck_out_tongue:

view this post on Zulip Brendan Hansknecht (Mar 16 2022 at 03:54):

What if this was the default:

if(x==y){
  cmp = 0;
} else if (x > y || std::isnan(y)) {
  cmp = 1;
} else {
  cmp = 1;
}

view this post on Zulip Richard Feldman (Mar 16 2022 at 03:54):

how's the performance on that?

view this post on Zulip Richard Feldman (Mar 16 2022 at 03:54):

(also, shouldn't that be -1 at the end?)

view this post on Zulip Brendan Hansknecht (Mar 16 2022 at 04:02):

Yep. Hopefully that doesn't ruin the performance...

view this post on Zulip Brendan Hansknecht (Mar 16 2022 at 04:06):

https://www.quick-bench.com/q/SGd6RHzRVTyFY8AlPGzZijI4ILE

view this post on Zulip Brendan Hansknecht (Mar 16 2022 at 04:07):

So it is slower than NaN always resulting in same/equality but marginally better than what we had before with NaN being equal to NaN. About the same speed as NaN always being -1, but not broken liken NaN always resulting in -1.

view this post on Zulip Brendan Hansknecht (Mar 16 2022 at 04:09):

If you can guarantee that you don't have NaN, it is still a comparision cost of 2x. But for most sorting algorithms that probably doesn't matter. The cost to move memory is probably way higher.

view this post on Zulip Brendan Hansknecht (Mar 16 2022 at 04:09):

But in a hot loop, would be expensive.

view this post on Zulip Richard Feldman (Mar 16 2022 at 04:11):

cool! :thumbs_up:

view this post on Zulip Richard Feldman (Mar 16 2022 at 04:13):

seems worth considering either that or the slowest but least error-prone as a default, especially considering there's really no way to prevent someone from using sortWith on a custom "fastest but breaks if there are NaNs" implementation


Last updated: Jun 16 2026 at 16:19 UTC