ok this is interesting: https://godbolt.org/z/Y4Gdonf7d
it appears that compare x y == Same optimizes to a single-instruction "compare if these two floats are equal, but while treating NaNs as equal" operation 🤨
Yeah, allowing NaN == NaN is faster.
Did a c++ bench run of the 2 different versions. (with clang and checking it generated the same instructions as rust)
nice!
wait, so compare x y == Same actually optimizes to code that runs faster than if you did the traditional floating point equality check which returns false for two NaNs?
yes
wooooow
Your compare function matches the C cmp function for binary search. Probably been hardware support for that for decades.
so that means there would be probably never be demand for a builtin which does the traditional "return false for two NaNs" case, yeah?
(except for the obligatory "I want to port code that relies on that behavior" use case)
I knew what you were linking to without even needing to look
it's a classic! :grinning_face_with_smiling_eyes:
You'll probably get complaints that Roc isn't IEEE compliant
But, I'm not sure that's enough of a downside to warrant breaking this fairly elegant design/performance matchup
it actually is though!
all the supported operations conform to IEEE-754
we never say NaN is equal to another NaN
we just sort them to the same places
okay. missed the subtelty in that argument
it's very pedantic :laughing:
I find this quite reasonable
it certainly rules out all the error-prone cases
I think the only unknown is how painful it will make testing to have == not work on nested structures that want to contain a float or two
I wonder if it makes sense to patch atan2, div, and mod to work the same way for 0 and -0 given that it turns out the existence of -0 means that x == y cannot guarantee f x == f y anyway due to binary serialization :thinking:
Probably less painful than a test failing because the LSB is off by 1
fair point!
Oh, but we actually kinda still have a problem. NaN will be the same with every number.
:joy:
ok so then I think it has to be this:
fn compare(x: f64, y: f64) -> i8 {
let is_gt = (x > y) as i8; // --> 1
let is_lt = (x < y) as i8; // --> -1
let is_x_nan = x.is_nan() as i8;
let is_y_nan = y.is_nan() as i8;
is_y_nan - is_x_nan + is_gt - is_lt
}
(edit: made it branchless!)
so then NaN sorts to lower than anything else
that means compare x y == Same no longer gets you the single-instruction version
but !(x < y || x > y) still does
so it's still achievable if you really need it
branchless version of compare that has NaN compare to itself as Same, and sorts earlier than everything else:
assembly:
ucomisd xmm0, xmm1
seta sil
ucomisd xmm1, xmm0
seta dl
ucomisd xmm0, xmm0
setp cl
ucomisd xmm1, xmm1
setp al
sub al, cl
add al, sil
sub al, dl
ret
We probably don't want it to be branchless unless we expect NaN to be common.
https://www.quick-bench.com/q/9UEMY26B2QJMC_tPrWhxJx-lIm0
hm, true
The above link is slightly off and slower than it should be for the branchless vesion. This is correct:
https://www.quick-bench.com/q/sDiORDcEEjBhv3DXYGKE8f9o_nc
Still slow but not as slow
if x.is_nan() || y.is_nan() {
let is_gt = (x > y) as i8; // ---> 1
let is_lt = (x < y) as i8; // ---> -1
let is_x_nan = x.is_nan() as i8;
let is_y_nan = y.is_nan() as i8;
is_y_nan - is_x_nan + is_gt - is_lt
} else {
if x == y {
0
} else if x > y {
1
} else {
-1
}
}
huh, interestingly negating the condition and switching the branches actually optimizes to what appears to be better code in the case where neither is NaN:
if !(x.is_nan() || y.is_nan()) {
if x == y {
0
} else if x > y {
1
} else {
-1
}
} else {
let is_gt = (x > y) as i8; // ---> 1
let is_lt = (x < y) as i8; // ---> -1
let is_x_nan = x.is_nan() as i8;
let is_y_nan = y.is_nan() as i8;
is_y_nan - is_x_nan + is_gt - is_lt
}
ucomisd xmm0, xmm0
jp .LBB4_4
ucomisd xmm1, xmm1
jp .LBB4_4
ucomisd xmm0, xmm1
jne .LBB4_6
jp .LBB4_6
xor eax, eax
ret
.LBB4_4:
ucomisd xmm1, xmm1
setp al
ucomisd xmm0, xmm0
setp cl
ucomisd xmm0, xmm1
seta dl
ucomisd xmm1, xmm0
seta sil
sub al, cl
add al, dl
sub al, sil
ret
.LBB4_6:
seta al
add al, al
add al, -1
ret
vs
ucomisd xmm0, xmm0
jp .LBB3_2
ucomisd xmm1, xmm1
jp .LBB3_2
mov al, 1
ucomisd xmm0, xmm1
jne .LBB3_4
jp .LBB3_4
ret
.LBB3_2:
ucomisd xmm0, xmm0
setp sil
ucomisd xmm0, xmm1
seta cl
ucomisd xmm1, xmm0
seta dl
ucomisd xmm1, xmm1
setp al
sub al, sil
add al, cl
cmp al, dl
sete al
ret
.LBB3_4:
xor eax, eax
ret
When using those to generate equality, it isn't great:
https://www.quick-bench.com/q/JdET2lFO_EURwMX4EES1RgQmvYE
I guess I should probably modify the benchmark to actually generate the full comparison in all case.
Ok. Full comparison makes a big difference here. This final version is the closest to regular comparison: https://www.quick-bench.com/q/HoelTlk5T2tAagguaax6TASi4ho
but still 2.5x slower
that's unfortunate
here's an interesting thought: if the design is "NaNs always sort to the front" (or to the back, if that's somehow faster; either way seems fine to me, really) what if we let NaNs have nonsensical comparisons?
could that help somehow?
e.g. try to make it be that if either value is NaN we return -1 or something like that
(edit: wait, that would incorrectly order a bunch of things that aren't NaNs :laughing: )
so the sorting algorithms would slow down in the presence of NaNs because they'd do way more swaps than necessary, but they'd still get a predictable answer
but maybe an algorithm like that could have the case where there are no NaNs go much faster?
Ok here is the benchmark of full comparison.
For Cmp, NaN is always -1.
For all other cases, NaN is less than everything else but equal to itself.
When looking at full comparison, the overhead can be pretty low
It is just when use full comparison to derive a specific comparison that it has major problems with performance.
so the sorting algorithms would slow down in the presence of NaNs because they'd do way more swaps than necessary, but they'd still get a predictable answer
I think this is what would happen in most languages rn, but maybe I am wrong.
I wonder if they would ever get stuck in an infinite loop :thinking:
oh that algorithm actually causes a problem though
because not only does cmp(NaN, 1.0) return -1, so does cmp(1.0, NaN)
so I think the else branch would need to do a nan check
because not only does
cmp(NaN, 1.0)return -1, so doescmp(1.0, NaN)
intentionally so. I think that is how it would get implemented in c++ or similar.
Though I guess in reality most sorting algorithms would just use the comparison operators directly where they would just get false if they saw a nan.
(moved this into its own thread so others can catch up on the design stuff more easily!)
would the NaNs actually get sorted to the front then?
if they always return -1 regardless of which way the comparison goes
I guess the idea is that the non-NaN value would get compared to its other neighbor and move up?
It heavily depends on the algorithm used for sorting. If you used quicksort, is a nan was a pivot, nothing would move. If anything else was a pivot, all nans would move.
hrm, yeah that's definitely a problem :sweat_smile:
Actually that is wrong. You would be checking if cmp pivot num == 1 and moving elements if that is true. So that would just never swap nans. Which I guess would make them sort to the end of the list because they never move.
If you use cmp num pivot == -1 instead. nans will always move. So they will be sorted to the beginning of the list.
-> which is actually really broken if nan ever is the pivot because it will move every single element to the other side of it. So all nans that are pivots will be sorted the end of a list and all nans that aren't pivots will be sorted to the begining. So yeah, broken.
:thumbs_up:
So I think this generally works if you are specifically checking only < or only checking >, but if your algorithm has more checks, it may break due to nans being inconsistent.
maybe a better design would be making the default Ordering be consistent (but slower even if there are no NaNs) and then if you're confident your floats contain no NaNs, you can use sortWith and provide a custom compare that does if x > y { ... } else if x < y { ... } else { ... }
so basically assuming the else branch could only possibly be "they are equal" - which will be fine as long as you have no NaNs, and potentially break in exciting ways otherwise :stuck_out_tongue:
What if this was the default:
if(x==y){
cmp = 0;
} else if (x > y || std::isnan(y)) {
cmp = 1;
} else {
cmp = 1;
}
how's the performance on that?
(also, shouldn't that be -1 at the end?)
Yep. Hopefully that doesn't ruin the performance...
https://www.quick-bench.com/q/SGd6RHzRVTyFY8AlPGzZijI4ILE
So it is slower than NaN always resulting in same/equality but marginally better than what we had before with NaN being equal to NaN. About the same speed as NaN always being -1, but not broken liken NaN always resulting in -1.
If you can guarantee that you don't have NaN, it is still a comparision cost of 2x. But for most sorting algorithms that probably doesn't matter. The cost to move memory is probably way higher.
But in a hot loop, would be expensive.
cool! :thumbs_up:
seems worth considering either that or the slowest but least error-prone as a default, especially considering there's really no way to prevent someone from using sortWith on a custom "fastest but breaks if there are NaNs" implementation
Last updated: Jun 16 2026 at 16:19 UTC