Branchless helper bug? · beginners

I've been experimenting with different ways to count the number of backslashes in a large file. I wrote a function which I hope will be branchless (but haven't checked the dissasembly yet) however it is giving me an incorrect answer and I am not sure if I am doing something wrong, or maybe we have a bug somewhere.

The below seems to work ok, the below tests pass etc... however when I run it on a large file of UTF-8 bytes that I know has 1230 backslashes it gives me an incorrect answer of 24_642_180.

countSlashesBranchlessHelp : Nat, U8 -> Nat
countSlashesBranchlessHelp = \count, byte ->
    byte
    |> Num.bitwiseXor '\\'          # XOR will be 0 if values are equal
    |> Num.subWrap 0b0000_0001      # MSB 1 if values are equal, 0 otherwise
    |> Num.shiftRightBy 7u8         # Shift MSB to LSB
    |> Num.toNat                    # Convert to Nat
    |> Num.addWrap count            # Add to running count

expect List.walk ['1','2','3'] 0 countSlashesBranchlessHelp == 0
expect List.walk ['1','2','\\','3','\\'] 0 countSlashesBranchlessHelp == 2

Anton (Aug 14 2023 at 11:56):

It could be valuable to progressively half the input file string with a function to see at which point the count starts going wrong.

Ayaz Hafiz (Aug 14 2023 at 13:59):

I think the xor/subtraction logic is incorrect. For example, for <, which is ascii code 60 0b0111100, the xor against \=92 will be 0b1011000 xor 0b0111100 = 0b1100100. The subtraction and right shift will then end up with 1 in the LSB, if I did my math right.

Luke Boswell (Aug 15 2023 at 07:56):

Luke Boswell (Aug 15 2023 at 08:36):

Update to my previous. I changed it to the below which now gives the expected answer!

countSlashesBranchlessHelp : Nat, U8 -> Nat
countSlashesBranchlessHelp = \count, byte ->

    # XOR will be 0 if values are equal
    a =  Num.bitwiseXor byte '\\'

    # Use shifts to propogate 1's
    b = Num.bitwiseOr a (Num.shiftRightBy a 4)
    c = Num.bitwiseOr b (Num.shiftRightBy b 2)
    d = Num.bitwiseOr c (Num.shiftRightBy c 1)

    # Invert the bits
    e = Num.bitwiseNot d

    # Mask for the LSB
    f = Num.bitwiseAnd e 1

    # Add to running count
    f |> Num.toNat |> Num.addWrap count

Luke Boswell (Aug 15 2023 at 08:43):

Unfortunately, though I think it looks nicer, if I write it out using a pipeline it is roughly 25% slower. I assume this is because it is because the lambdas are not being inlined in the function?

countSlashesBranchlessHelp : Nat, U8 -> Nat
countSlashesBranchlessHelp = \count, byte ->
    byte
    |> Num.bitwiseXor  '\\'                         # XOR will be 0 if values are equal
    |> \c -> Num.bitwiseOr c (Num.shiftRightBy c 4) # Use shifts to propogate 1's
    |> \c -> Num.bitwiseOr c (Num.shiftRightBy c 2)
    |> \c -> Num.bitwiseOr c (Num.shiftRightBy c 1)
    |> \c -> Num.bitwiseNot c                       # Invert the bits
    |> \c -> Num.bitwiseAnd c 1                     # Mask for the LSB
    |> Num.toNat                                    # Convert to Nat
    |> Num.addWrap count                            # Add to running count

Luke Boswell (Aug 15 2023 at 08:44):

I guess my question here is should I expect the pipeline version to run at the same speed? or is that unreasonable?

Folkert de Vries (Aug 15 2023 at 08:51):

Richard Feldman (Aug 15 2023 at 10:35):

we do want to do our own inlining at some point, but it's not a priority in the near term

Stream: beginners

Topic: Branchless helper bug?

Luke Boswell (Aug 14 2023 at 11:29):

Anton (Aug 14 2023 at 11:56):

Ayaz Hafiz (Aug 14 2023 at 13:59):

Luke Boswell (Aug 15 2023 at 07:56):

Luke Boswell (Aug 15 2023 at 08:36):

Luke Boswell (Aug 15 2023 at 08:43):

Luke Boswell (Aug 15 2023 at 08:44):

Folkert de Vries (Aug 15 2023 at 08:51):

Folkert de Vries (Aug 15 2023 at 08:51):

Richard Feldman (Aug 15 2023 at 10:35):

Brendan Hansknecht (Aug 15 2023 at 14:50):