float ordering algorithms · ideas

it appears that compare x y == Same optimizes to a single-instruction "compare if these two floats are equal, but while treating NaNs as equal" operation 🤨

Brendan Hansknecht (Mar 16 2022 at 02:02):

Brendan Hansknecht (Mar 16 2022 at 02:03):

Did a C++ benchmark run of the 2 different versions (with clang, and checked that it generated the same instructions as Rust).

Richard Feldman (Mar 16 2022 at 02:03):

wait, so compare x y == Same actually optimizes to code that runs faster than if you did the traditional floating point equality check which returns false for two NaNs?

Brendan Hansknecht (Mar 16 2022 at 02:04):

Richard Feldman (Mar 16 2022 at 02:04):

Derek Gustafson (Mar 16 2022 at 02:04):

Your compare function matches the C cmp function for binary search. There's probably been hardware support for that for decades.

Richard Feldman (Mar 16 2022 at 02:04):

so that means there would probably never be demand for a builtin which does the traditional "return false for two NaNs" case, yeah?
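For contrast, here are the two equality semantics under discussion side by side, as a minimal Rust sketch. `ieee_eq` and `nan_eq` are hypothetical names, and nothing here claims either one compiles to a single instruction.

```rust
/// Traditional IEEE 754 equality: NaN is never equal to anything, itself included.
fn ieee_eq(x: f64, y: f64) -> bool {
    x == y
}

/// Equality that treats any two NaNs as equal (the NaNs-are-equal behavior
/// described at the top of the thread). 0.0 and -0.0 still compare equal.
fn nan_eq(x: f64, y: f64) -> bool {
    x == y || (x.is_nan() && y.is_nan())
}
```

The only difference is the NaN/NaN case; every other pair of inputs gives the same answer under both.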

Richard Feldman (Mar 16 2022 at 02:05):

Derek Gustafson (Mar 16 2022 at 02:06):

Richard Feldman (Mar 16 2022 at 02:06):

Derek Gustafson (Mar 16 2022 at 02:06):

Derek Gustafson (Mar 16 2022 at 02:07):

But, I'm not sure that's enough of a downside to warrant breaking this fairly elegant design/performance matchup

Richard Feldman (Mar 16 2022 at 02:07):

Richard Feldman (Mar 16 2022 at 02:08):

Derek Gustafson (Mar 16 2022 at 02:09):

Richard Feldman (Mar 16 2022 at 02:11):

Brendan Hansknecht (Mar 16 2022 at 02:12):

Richard Feldman (Mar 16 2022 at 02:14):

Richard Feldman (Mar 16 2022 at 02:15):

I think the only unknown is how painful it will make testing to have == not work on nested structures that want to contain a float or two
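As a rough Rust analogy (not Roc), this is what the testing pain looks like: derived structural `==` on a nested type is not reflexive once a NaN is inside, and Rust refuses `#[derive(Eq)]` on such types because `f64` only implements `PartialEq`. `Reading`, `Report`, and `report` are hypothetical types made up for illustration.

```rust
// Hypothetical nested types, just to show derived structural equality.
#[derive(Debug, PartialEq)] // adding `Eq` here would not compile: f64 is not Eq
struct Reading {
    label: &'static str,
    value: f64,
}

#[derive(Debug, PartialEq)]
struct Report {
    readings: Vec<Reading>,
}

/// Builds a one-reading report holding `value`.
fn report(value: f64) -> Report {
    Report {
        readings: vec![Reading { label: "sensor", value }],
    }
}
```

With this, `report(1.5) == report(1.5)` holds, but `report(f64::NAN) == report(f64::NAN)` is false, so an `assert_eq!` in a test fails even though both sides were built identically.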

Richard Feldman (Mar 16 2022 at 02:17):

I wonder if it makes sense to patch atan2, div, and mod to work the same way for 0 and -0, given that it turns out the existence of -0 means that x == y cannot guarantee f x == f y anyway, due to binary serialization :thinking:
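A small Rust sketch of the -0 problem: `0.0 == -0.0` is true under IEEE 754, yet atan2, division, and the raw bit pattern (i.e. binary serialization) can all tell the two apart. The helper names are hypothetical.

```rust
/// Returns true when `f` gives bit-for-bit different results for +0.0 and -0.0,
/// even though the two zeros compare equal with `==`.
fn separates_zeros(f: fn(f64) -> f64) -> bool {
    let (pos, neg) = (0.0_f64, -0.0_f64);
    assert!(pos == neg); // IEEE 754: +0.0 == -0.0
    f(pos).to_bits() != f(neg).to_bits()
}

// atan2(+0, -1) is pi, atan2(-0, -1) is -pi.
fn atan2_with_neg_one(y: f64) -> f64 {
    y.atan2(-1.0)
}

// 1 / +0 is +inf, 1 / -0 is -inf.
fn reciprocal(x: f64) -> f64 {
    1.0 / x
}

// The value itself: the sign bit differs, so serialized bytes differ.
fn identity(x: f64) -> f64 {
    x
}

// A function that cannot tell the zeros apart: both give exactly 1.0.
fn plus_one(x: f64) -> f64 {
    x + 1.0
}
```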

Derek Gustafson (Mar 16 2022 at 02:17):

Richard Feldman (Mar 16 2022 at 02:19):

Brendan Hansknecht (Mar 16 2022 at 02:20):

Oh, but we actually kinda still have a problem. NaN will be the same with every number.
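To make the problem concrete: a naive three-way compare that returns `Equal` whenever neither `<` nor `>` holds puts NaN in the `Equal` branch against every number, because all comparisons involving NaN are false. This is a minimal sketch with a hypothetical name, not the exact function benchmarked above.

```rust
use std::cmp::Ordering;

/// Naive three-way compare: anything that is neither `<` nor `>` is `Equal`.
/// Every comparison involving NaN is false, so NaN always lands in the last branch.
fn naive_compare(x: f64, y: f64) -> Ordering {
    if x < y {
        Ordering::Less
    } else if x > y {
        Ordering::Greater
    } else {
        Ordering::Equal
    }
}
```

This also breaks transitivity: 1.0 is "equal" to NaN and NaN is "equal" to 2.0, yet 1.0 is less than 2.0.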

Richard Feldman (Mar 16 2022 at 02:20):

Richard Feldman (Mar 16 2022 at 02:26):

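fn compare(x: f64, y: f64) -> i8 {
    let is_gt = (x > y) as i8; // contributes +1 when x > y
    let is_lt = (x < y) as i8; // contributes -1 when x < y
    let is_x_nan = x.is_nan() as i8; // NaN x sorts first: contributes -1
    let is_y_nan = y.is_nan() as i8; // NaN y: contributes +1

    // Comparisons with NaN are false; two NaNs cancel out to 0 (Same).
    is_y_nan - is_x_nan + is_gt - is_lt
}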

Richard Feldman (Mar 16 2022 at 02:26):

Richard Feldman (Mar 16 2022 at 02:27):

that means compare x y == Same no longer gets you the single-instruction version

Richard Feldman (Mar 16 2022 at 02:27):

Richard Feldman (Mar 16 2022 at 02:46):

branchless version of compare that has NaN compare to itself as Same and sort earlier than everything else:

Richard Feldman (Mar 16 2022 at 02:46):

        ucomisd xmm0, xmm1
        seta    sil
        ucomisd xmm1, xmm0
        seta    dl
        ucomisd xmm0, xmm0
        setp    cl
        ucomisd xmm1, xmm1
        setp    al
        sub     al, cl
        add     al, sil
        sub     al, dl
        ret
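To see the ordering this gives in practice, the branchless `compare` from earlier in the thread can be plugged into Rust's `slice::sort_by` by mapping -1/0/1 onto `Ordering`. A sketch; `sort_floats` is a hypothetical helper name.

```rust
/// Branchless three-way compare: NaN == NaN, and NaN sorts before every
/// other value. Returns -1, 0, or 1.
fn compare(x: f64, y: f64) -> i8 {
    let is_gt = (x > y) as i8; // 1 when x > y (always 0 if either is NaN)
    let is_lt = (x < y) as i8; // 1 when x < y (always 0 if either is NaN)
    let is_x_nan = x.is_nan() as i8;
    let is_y_nan = y.is_nan() as i8;
    is_y_nan - is_x_nan + is_gt - is_lt
}

/// Sorts in place, mapping compare's -1/0/1 onto Less/Equal/Greater.
fn sort_floats(v: &mut [f64]) {
    v.sort_by(|a, b| compare(*a, *b).cmp(&0));
}
```

Because this is a consistent total order (all NaNs equal to each other and below every number), `[2.0, NaN, -1.0, 0.0, NaN]` sorts to `[NaN, NaN, -1.0, 0.0, 2.0]`.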

Brendan Hansknecht (Mar 16 2022 at 02:51):

Richard Feldman (Mar 16 2022 at 02:52):

Brendan Hansknecht (Mar 16 2022 at 02:55):

Brendan Hansknecht (Mar 16 2022 at 02:56):

Richard Feldman (Mar 16 2022 at 02:56):

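fn compare(x: f64, y: f64) -> i8 {
    if x.is_nan() || y.is_nan() {
        // At least one NaN: x > y and x < y are both false here,
        // so only the NaN terms contribute.
        let is_gt = (x > y) as i8; // contributes +1 when x > y
        let is_lt = (x < y) as i8; // contributes -1 when x < y
        let is_x_nan = x.is_nan() as i8;
        let is_y_nan = y.is_nan() as i8;

        is_y_nan - is_x_nan + is_gt - is_lt
    } else {
        if x == y {
            0
        } else if x > y {
            1
        } else {
            -1
        }
    }
}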

Richard Feldman (Mar 16 2022 at 02:58):

huh, interestingly negating the condition and switching the branches actually optimizes to what appears to be better code in the case where neither is NaN:

fn compare(x: f64, y: f64) -> i8 {
    if !(x.is_nan() || y.is_nan()) {
        // Neither is NaN: ordinary three-way comparison.
        if x == y {
            0
        } else if x > y {
            1
        } else {
            -1
        }
    } else {
        let is_gt = (x > y) as i8; // contributes +1 when x > y
        let is_lt = (x < y) as i8; // contributes -1 when x < y
        let is_x_nan = x.is_nan() as i8;
        let is_y_nan = y.is_nan() as i8;

        is_y_nan - is_x_nan + is_gt - is_lt
    }
}
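Before comparing the assembly of the two versions, it can be worth confirming they compute the same function. A minimal sketch that checks every pair from a handful of special values; `compare_branchless`, `compare_branchy`, and `versions_agree` are hypothetical names for the two versions in this thread.

```rust
/// Branchless version: NaN == NaN, NaN sorts before everything else.
fn compare_branchless(x: f64, y: f64) -> i8 {
    let is_gt = (x > y) as i8;
    let is_lt = (x < y) as i8;
    (y.is_nan() as i8) - (x.is_nan() as i8) + is_gt - is_lt
}

/// Branchy version with the negated NaN check first.
fn compare_branchy(x: f64, y: f64) -> i8 {
    if !(x.is_nan() || y.is_nan()) {
        if x == y {
            0
        } else if x > y {
            1
        } else {
            -1
        }
    } else {
        let is_gt = (x > y) as i8;
        let is_lt = (x < y) as i8;
        (y.is_nan() as i8) - (x.is_nan() as i8) + is_gt - is_lt
    }
}

/// True if both versions agree on every pair of the sample values.
fn versions_agree() -> bool {
    let values = [f64::NAN, f64::NEG_INFINITY, -1.0, -0.0, 0.0, 1.0, f64::INFINITY];
    values
        .iter()
        .all(|&x| values.iter().all(|&y| compare_branchless(x, y) == compare_branchy(x, y)))
}
```

The sample includes both zeros, so it also confirms that -0.0 and 0.0 compare as 0 in both versions.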

Richard Feldman (Mar 16 2022 at 02:59):

        ucomisd xmm0, xmm0
        jp      .LBB4_4
        ucomisd xmm1, xmm1
        jp      .LBB4_4
        ucomisd xmm0, xmm1
        jne     .LBB4_6
        jp      .LBB4_6
        xor     eax, eax
        ret
.LBB4_4:
        ucomisd xmm1, xmm1
        setp    al
        ucomisd xmm0, xmm0
        setp    cl
        ucomisd xmm0, xmm1
        seta    dl
        ucomisd xmm1, xmm0
        seta    sil
        sub     al, cl
        add     al, dl
        sub     al, sil
        ret
.LBB4_6:
        seta    al
        add     al, al
        add     al, -1
        ret

        ucomisd xmm0, xmm0
        jp      .LBB3_2
        ucomisd xmm1, xmm1
        jp      .LBB3_2
        mov     al, 1
        ucomisd xmm0, xmm1
        jne     .LBB3_4
        jp      .LBB3_4
        ret
.LBB3_2:
        ucomisd xmm0, xmm0
        setp    sil
        ucomisd xmm0, xmm1
        seta    cl
        ucomisd xmm1, xmm0
        seta    dl
        ucomisd xmm1, xmm1
        setp    al
        sub     al, sil
        add     al, cl
        cmp     al, dl
        sete    al
        ret
.LBB3_4:
        xor     eax, eax
        ret

Brendan Hansknecht (Mar 16 2022 at 03:05):

Brendan Hansknecht (Mar 16 2022 at 03:06):

I guess I should probably modify the benchmark to actually generate the full comparison in all cases.

Brendan Hansknecht (Mar 16 2022 at 03:16):

Richard Feldman (Mar 16 2022 at 03:20):

Richard Feldman (Mar 16 2022 at 03:21):

here's an interesting thought: if the design is "NaNs always sort to the front" (or to the back, if that's somehow faster; either way seems fine to me, really) what if we let NaNs have nonsensical comparisons?

Richard Feldman (Mar 16 2022 at 03:22):

e.g. make it so that if either value is NaN we return -1, or something like that

(edit: wait, that would incorrectly order a bunch of things that aren't NaNs :laughing:)

Richard Feldman (Mar 16 2022 at 03:23):

so the sorting algorithms would slow down in the presence of NaNs because they'd do way more swaps than necessary, but they'd still get a predictable answer

Richard Feldman (Mar 16 2022 at 03:23):

but maybe an algorithm like that could have the case where there are no NaNs go much faster?

Brendan Hansknecht (Mar 16 2022 at 03:25):

For Cmp, NaN is always -1.
For all other cases, NaN is less than everything else but equal to itself.
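Reading "NaN is always -1" as "any compare involving a NaN returns -1", a quick check shows why it cannot be a valid ordering: a three-way compare must be antisymmetric (cmp(x, y) == -cmp(y, x)), and this one is not as soon as a NaN is involved. Both function names are hypothetical.

```rust
/// The "NaN is always -1" compare: any comparison involving NaN reports Less.
fn cmp_nan_always_less(x: f64, y: f64) -> i8 {
    if x.is_nan() || y.is_nan() {
        -1
    } else if x > y {
        1
    } else if x < y {
        -1
    } else {
        0
    }
}

/// A valid three-way compare must satisfy cmp(x, y) == -cmp(y, x).
fn is_antisymmetric(cmp: fn(f64, f64) -> i8, x: f64, y: f64) -> bool {
    cmp(x, y) == -cmp(y, x)
}
```

Here NaN is "less than" 1.0 and 1.0 is also "less than" NaN, and NaN is even less than itself.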

Brendan Hansknecht (Mar 16 2022 at 03:26):

It is only when you use the full comparison to derive a specific comparison that it has major performance problems.

Brendan Hansknecht (Mar 16 2022 at 03:28):

Richard Feldman (Mar 16 2022 at 03:29):

Richard Feldman (Mar 16 2022 at 03:30):

Richard Feldman (Mar 16 2022 at 03:31):

Brendan Hansknecht (Mar 16 2022 at 03:35):

intentionally so. I think that is how it would get implemented in C++ or similar.

Though I guess in reality most sorting algorithms would just use the comparison operators directly, where they would just get false if they saw a NaN.
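A minimal sketch of a sort that uses `<` directly, as described above. Every `<` involving NaN is false, so a NaN acts as a barrier: nothing to its right ever moves past it. `insertion_sort` is a hypothetical textbook example, not any particular library's implementation.

```rust
/// Textbook insertion sort using the `<` operator directly on f64.
fn insertion_sort(v: &mut [f64]) {
    for i in 1..v.len() {
        let mut j = i;
        // Stops as soon as `<` is false, which is always the case next to a NaN.
        while j > 0 && v[j] < v[j - 1] {
            v.swap(j, j - 1);
            j -= 1;
        }
    }
}
```

On `[3.0, 1.0, 2.0]` this gives `[1.0, 2.0, 3.0]`, but `[3.0, NaN, 1.0]` comes out unchanged: no error, just an unsorted result.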

Richard Feldman (Mar 16 2022 at 03:36):

(moved this into its own thread so others can catch up on the design stuff more easily!)

Richard Feldman (Mar 16 2022 at 03:37):

Richard Feldman (Mar 16 2022 at 03:38):

I guess the idea is that the non-NaN value would get compared to its other neighbor and move up?

Brendan Hansknecht (Mar 16 2022 at 03:39):

It heavily depends on the algorithm used for sorting. If you used quicksort and a NaN was the pivot, nothing would move. If anything else was the pivot, all NaNs would move.

Richard Feldman (Mar 16 2022 at 03:46):

Brendan Hansknecht (Mar 16 2022 at 03:46):

Actually that is wrong. You would be checking if cmp pivot num == 1 and moving elements if that is true. So that would just never swap NaNs, which I guess would make them sort to the end of the list because they never move.

If you use cmp num pivot == -1 instead, NaNs will always move, so they will be sorted to the beginning of the list.
-> which is actually really broken if a NaN is ever the pivot, because it will move every single element to the other side of it. So all NaNs that are pivots will be sorted to the end of a list and all NaNs that aren't pivots will be sorted to the beginning. So yeah, broken.
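A sketch of the NaN-pivot case using one Lomuto-style partition step that moves elements where cmp(num, pivot) == -1, as described above. With the "NaN always -1" compare, a NaN pivot pulls every element to its left and lands at the end; with the consistent NaN-first compare it lands at the front. The partition scheme and all function names are assumptions chosen for illustration.

```rust
/// Consistent compare: NaN == NaN, NaN sorts before everything else.
fn cmp_nan_first(x: f64, y: f64) -> i8 {
    (y.is_nan() as i8) - (x.is_nan() as i8) + ((x > y) as i8) - ((x < y) as i8)
}

/// Broken compare: any comparison involving NaN returns -1.
fn cmp_nan_always_less(x: f64, y: f64) -> i8 {
    if x.is_nan() || y.is_nan() {
        -1
    } else {
        ((x > y) as i8) - ((x < y) as i8)
    }
}

/// One Lomuto partition step with the last element as the pivot.
/// Moves every element with cmp(num, pivot) == -1 to the left of the pivot
/// and returns the pivot's final index. Assumes a non-empty slice.
fn partition(v: &mut [f64], cmp: fn(f64, f64) -> i8) -> usize {
    let last = v.len() - 1;
    let pivot = v[last];
    let mut store = 0;
    for i in 0..last {
        if cmp(v[i], pivot) == -1 {
            v.swap(i, store);
            store += 1;
        }
    }
    v.swap(store, last);
    store
}
```

For `[3.0, 1.0, 2.0, NaN]`, the broken compare leaves the NaN pivot at index 3 (the end), while `cmp_nan_first` moves it to index 0, which is where every other NaN ends up.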

Richard Feldman (Mar 16 2022 at 03:46):

Brendan Hansknecht (Mar 16 2022 at 03:47):

So I think this generally works if you are specifically checking only < or only >, but if your algorithm has more checks, it may break due to NaNs being inconsistent.

Richard Feldman (Mar 16 2022 at 03:52):

maybe a better design would be making the default Ordering be consistent (but slower even if there are no NaNs) and then if you're confident your floats contain no NaNs, you can use sortWith and provide a custom compare that does if x > y { ... } else if x < y { ... } else { ... }

Richard Feldman (Mar 16 2022 at 03:52):

so basically assuming the else branch could only possibly be "they are equal" - which will be fine as long as you have no NaNs, and potentially break in exciting ways otherwise :stuck_out_tongue:

Brendan Hansknecht (Mar 16 2022 at 03:54):

if (x == y) {
  cmp = 0;
} else if (x > y || std::isnan(y)) {
  cmp = 1;   // x is greater, or y is NaN (NaN sorts first)
} else {
  cmp = -1;  // x is less, or x is NaN
}
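The same shape in Rust, reading the last branch as -1, with a few spot checks. A NaN on the left falls through to -1 and a NaN on the right hits `y.is_nan()`, so NaNs sort before numbers; note that with this shape two NaNs compare as 1 rather than 0. `cmp_nan_low` is a hypothetical name.

```rust
/// Rust translation of the C++ snippet: NaNs sort before every number.
fn cmp_nan_low(x: f64, y: f64) -> i8 {
    if x == y {
        0 // never taken when either side is NaN
    } else if x > y || y.is_nan() {
        1
    } else {
        -1
    }
}
```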

Richard Feldman (Mar 16 2022 at 03:54):

Brendan Hansknecht (Mar 16 2022 at 04:02):

Brendan Hansknecht (Mar 16 2022 at 04:06):

Brendan Hansknecht (Mar 16 2022 at 04:07):

So it is slower than NaN always resulting in Same/equality, but marginally better than what we had before with NaN being equal to NaN. About the same speed as NaN always being -1, but not broken like NaN always resulting in -1.

Brendan Hansknecht (Mar 16 2022 at 04:09):

If you can guarantee that you don't have NaN, it is still 2x the comparison cost. But for most sorting algorithms that probably doesn't matter; the cost of moving memory is probably way higher.

Brendan Hansknecht (Mar 16 2022 at 04:09):

Richard Feldman (Mar 16 2022 at 04:11):

Richard Feldman (Mar 16 2022 at 04:13):

seems worth considering either that or the slowest but least error-prone as a default, especially considering there's really no way to prevent someone from using sortWith on a custom "fastest but breaks if there are NaNs" implementation
