Stream: beginners

Topic: Determine Endianness for Implementing SHA-1


view this post on Zulip Sam Mohr (Jun 09 2024 at 03:25):

I was trying to implement SHA-1 using Wikipedia's pseudocode and realized that I have no way to know whether the machine is running big or little endian mode. I need it for the UUID library I'm writing (specifically UUID v5). If I'm trying to convert a U128 to a series of big-endian U32 "words", how would I do that in Roc?

view this post on Zulip Sam Mohr (Jun 09 2024 at 03:38):

Worst case, I take an { endianness: [Big, Little] } module param on the SHA-1 module. Not ideal, but it's better than assuming little endian.

view this post on Zulip Notification Bot (Jun 09 2024 at 03:38):

A message was moved here from #beginners > List files in a directory by Sam Mohr.

view this post on Zulip Luke Boswell (Jun 09 2024 at 03:56):

I think this is by design. I'm not sure. But I guess it is so that Roc code executes the same everywhere.

view this post on Zulip Luke Boswell (Jun 09 2024 at 03:57):

Would it help if the platform provided a primitive that told you what the underlying system was?

view this post on Zulip Sam Mohr (Jun 09 2024 at 03:58):

I'm not a pro on endianness and its implementation in different languages, but that's actually the worry here: I'm not confident that Roc will execute the same way everywhere. If it can be either big or little endian, then byte-level operations may have different results on different machines.

view this post on Zulip Sam Mohr (Jun 09 2024 at 03:59):

Luke Boswell said:

Would it help if the platform provided a primitive that told you what the underlying system was?

Yes, I think that is necessary unless Roc makes a guarantee of which endianness it runs, which I'd be surprised by, since the architecture of the machine you're running on is optimized for one or the other.

view this post on Zulip Sam Mohr (Jun 09 2024 at 04:02):

Another reason why I'd see that being an issue is if Roc code can be distributed in a way that lets the platform compile it to either big or little endian without setting the kind of flag you're proposing, since the Roc code is already compiled to a .so. Not sure if that's possible, but it could be a problem.

view this post on Zulip Brendan Hansknecht (Jun 09 2024 at 05:31):

I think you should be able to implement this without worrying about endianness in Roc.

view this post on Zulip Brendan Hansknecht (Jun 09 2024 at 05:31):

Sure you could get the info from a platform, but I don't think it is needed here

view this post on Zulip Brendan Hansknecht (Jun 09 2024 at 05:32):

Bit shifting is the magical way to convert from your sha big endian numbers to a native endian number and back

view this post on Zulip Sam Mohr (Jun 09 2024 at 05:34):

Brendan Hansknecht said:

Bit shifting is the magical way to convert from your sha big endian numbers to a native endian number and back

Okay, so yeah, if I bit shift (which I'm already doing), that'll normalize the result. If that's true, which I think it is, then no need to worry about this anymore.

view this post on Zulip Brendan Hansknecht (Jun 09 2024 at 05:34):

If you have the bytes [ 0x00, 0x00, 0x00, 0x07 ] as your 32 bit big endian number input, you do:

when bytes is
    [b3, b2, b1, b0, ..] ->
        num = (b3 << 24) | (b2 << 16) | (b1 << 8) | b0
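
For readers who want to run the idea, here is a minimal sketch of the byte-combining step, written in Python rather than Roc so it executes as-is. The function name `u32_from_be_bytes` is hypothetical and not part of any Roc or Python library. Each of the four input bytes is shifted into its own position, most significant byte first.

```python
def u32_from_be_bytes(data: bytes) -> int:
    """Combine the first four bytes of `data`, most significant first, into one integer."""
    if len(data) < 4:
        raise ValueError("need at least 4 bytes")
    b3, b2, b1, b0 = data[0], data[1], data[2], data[3]
    # Shifts operate on the numeric value, not on how it is laid out in memory.
    return (b3 << 24) | (b2 << 16) | (b1 << 8) | b0


print(u32_from_be_bytes(bytes([0x00, 0x00, 0x00, 0x07])))  # 7
```

The result matches Python's built-in `int.from_bytes(data[:4], "big")`, which is a convenient cross-check.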

view this post on Zulip Brendan Hansknecht (Jun 09 2024 at 05:34):

Now num is stored in whatever the native endianness happens to be

view this post on Zulip Sam Mohr (Jun 09 2024 at 05:40):

Well, just to make sure we're on the same page, I need to go the other way. SHA1 requires big endian, and I "don't know" what I'm running on. So how do I get from native to big, not the other way?

view this post on Zulip Brendan Hansknecht (Jun 09 2024 at 05:42):

Still doable with bitshifting:

num = 7
b0 = Num.toU8 (num)
b1 = Num.toU8 (num >> 8)
b2 = Num.toU8 (num >> 16)
b3 = Num.toU8 (num >> 24)

# Big endian out
[b3, b2, b1, b0]

# little endian out
[b0, b1, b2, b3]
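
The reverse direction can be sketched the same way, again in Python rather than Roc. The function name `u32_to_be_bytes` is hypothetical, and the `& 0xFF` mask stands in for the truncation that the `Num.toU8` calls in the snippet above are assumed to perform.

```python
def u32_to_be_bytes(num: int) -> bytes:
    """Split a 32-bit value into four bytes, most significant byte first."""
    if not 0 <= num <= 0xFFFFFFFF:
        raise ValueError("value does not fit in 32 bits")
    b0 = num & 0xFF          # least significant byte
    b1 = (num >> 8) & 0xFF
    b2 = (num >> 16) & 0xFF
    b3 = (num >> 24) & 0xFF  # most significant byte
    return bytes([b3, b2, b1, b0])


print(list(u32_to_be_bytes(7)))  # [0, 0, 0, 7]
```

Returning `bytes([b0, b1, b2, b3])` instead would give the little-endian ordering; the extraction step is identical either way.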

view this post on Zulip Sam Mohr (Jun 09 2024 at 05:44):

Okay, this should be hard to screw up, thanks

view this post on Zulip Richard Feldman (Jun 09 2024 at 14:11):

oof, I didn’t realize that bit shifting leaks target details :sweat_smile:

view this post on Zulip Richard Feldman (Jun 09 2024 at 14:13):

as Luke mentioned earlier, it’s a design goal that Roc code should give the same answers regardless of what target it’s running on, and the fact that bit shifts give different answers depending on native endianness breaks that

view this post on Zulip Richard Feldman (Jun 09 2024 at 14:14):

so I think it would be best not to depend on that behavior, because I’d like to try to figure out a way to change it!

view this post on Zulip Brendan Hansknecht (Jun 09 2024 at 15:45):

oof, I didn’t realize that bit shifting leaks target details

It doesn't

view this post on Zulip Brendan Hansknecht (Jun 09 2024 at 15:47):

It always gives the same answer, which is why you can use it to extract bytes from native endian. Then you can order them as big or little endian

view this post on Zulip Brendan Hansknecht (Jun 09 2024 at 15:48):

So the Roc user doesn't know if the number is actually stored in big or little endian. They can just extract the bytes and then order them as they please.
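
Applied to the original question (splitting a 128-bit value into big-endian 32-bit words), the same shift-and-mask approach might look like the following Python sketch. The function name `u128_to_be_words` and the sample value are made up for illustration, and the code makes no claim about how any Roc UUID or SHA-1 library does this.

```python
def u128_to_be_words(value: int) -> list:
    """Split a 128-bit value into four 32-bit words, most significant word first."""
    if not 0 <= value < (1 << 128):
        raise ValueError("value does not fit in 128 bits")
    # Shift each 32-bit group down to the low bits, then mask off everything above it.
    return [(value >> shift) & 0xFFFFFFFF for shift in (96, 64, 32, 0)]


value = 0x000102030405060708090A0B0C0D0E0F  # arbitrary sample value
words = u128_to_be_words(value)
print([hex(w) for w in words])  # ['0x10203', '0x4050607', '0x8090a0b', '0xc0d0e0f']

# Serializing each word big-endian reproduces the big-endian bytes of the whole value.
packed = b"".join(w.to_bytes(4, "big") for w in words)
assert packed == value.to_bytes(16, "big")
```

Nothing in the sketch inspects the machine's byte order, which is the point made above: shifts and masks are defined on values, so the output is the same on any target.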

view this post on Zulip Richard Feldman (Jun 09 2024 at 19:54):

sweet, thanks for clarifying!


Last updated: Jul 06 2025 at 12:14 UTC