I was trying to implement SHA-1 using Wikipedia's pseudocode and realized that I have no way to know whether the machine is running in big- or little-endian mode. I need it for the UUID library I'm writing (specifically UUID v5). If I'm trying to convert a U128 to a series of big-endian U32 "words", how would I do that in Roc?
Worst case, I take an { endianness: [Big, Little] } module param on the SHA-1 module. Not ideal, but it's better than assuming little endian.
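For reference, here is a minimal sketch of the U128 to big-endian U32 word split using only shifts and masks. It is written in Python rather than Roc, and `u128_to_be_words` / `u32_to_be_bytes` are hypothetical helper names. Python integers are unbounded, so the masks stand in for Roc's fixed-width U128/U32 types. Nothing here asks which byte order the host uses.

```python
# Minimal sketch: split an unsigned 128-bit integer into four 32-bit words,
# most significant word first (big-endian word order), using shifts and masks.
# Python ints are arbitrary precision, so the masks emulate fixed-width types.

MASK32 = 0xFFFF_FFFF


def u128_to_be_words(n: int) -> list[int]:
    """Return [w0, w1, w2, w3] where w0 holds bits 127..96 of n."""
    if not 0 <= n < 1 << 128:
        raise ValueError("n must fit in an unsigned 128-bit integer")
    return [(n >> shift) & MASK32 for shift in (96, 64, 32, 0)]


def u32_to_be_bytes(w: int) -> list[int]:
    """Return the four bytes of a 32-bit word, most significant byte first."""
    return [(w >> shift) & 0xFF for shift in (24, 16, 8, 0)]


if __name__ == "__main__":
    value = 0x0011_2233_4455_6677_8899_AABB_CCDD_EEFF
    print([f"{w:08x}" for w in u128_to_be_words(value)])
    # ['00112233', '44556677', '8899aabb', 'ccddeeff']
```

Flattening the words with `u32_to_be_bytes` yields the same 16 bytes as writing the U128 big-endian directly.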
A message was moved here from #beginners > List files in a directory by Sam Mohr.
I think this is by design, though I'm not sure. I'd guess it's so that Roc code executes the same everywhere.
Would it help if the platform provided a primitive that told you what the underlying system was?
I'm not a pro on endianness and its implementation in different languages, but that's actually the worry here: I'm not confident that Roc will execute the same way everywhere. If it can be either big or little endian, then byte-level operations may have different results on different machines.
Luke Boswell said:
Would it help if the platform provided a primitive that told you what the underlying system was?
Yes, I think that's necessary unless Roc guarantees which endianness it runs with, which would surprise me, since the architecture of the machine you're running on is optimized for one or the other.
Another reason I'd see this being an issue: if Roc code can be distributed already compiled to a .so, the platform could end up running it as either big or little endian without ever setting the flag you're proposing. Not sure if that's possible, but it could be a problem.
I think you should be able to implement this without worrying about endianness in Roc.
Sure, you could get the info from a platform, but I don't think it's needed here.
Bit shifting is the magical way to convert from your SHA big-endian numbers to a native-endian number and back.
Brendan Hansknecht said:
Bit shifting is the magical way to convert from your SHA big-endian numbers to a native-endian number and back.
Okay, so yeah, if I bit shift (which I'm already doing), that'll normalize the result. If that's true, which I think it is, then no need to worry about this anymore.
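To check that claim end to end, here is a minimal SHA-1 sketch. It is in Python rather than Roc and follows the structure of the Wikipedia / FIPS 180-4 pseudocode. Every conversion between bytes and 32-bit words is done with shifts and masks, so nothing depends on the host's byte order. It is a teaching sketch, not a vetted implementation. The `__main__` block cross-checks it against Python's `hashlib`.

```python
# Minimal SHA-1 sketch: all byte <-> word conversions use shifts and masks,
# so the digest never depends on the host's native byte order.
import hashlib

MASK32 = 0xFFFF_FFFF


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def sha1(message: bytes) -> bytes:
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]

    # Padding: a single 1 bit, zeros until length % 64 == 56, then the original
    # length in bits as a 64-bit big-endian integer (built with shifts).
    bit_len = (len(message) * 8) & 0xFFFF_FFFF_FFFF_FFFF
    padded = bytearray(message)
    padded.append(0x80)
    while len(padded) % 64 != 56:
        padded.append(0)
    padded.extend((bit_len >> shift) & 0xFF for shift in range(56, -1, -8))

    for start in range(0, len(padded), 64):
        chunk = padded[start:start + 64]
        # Read 16 big-endian words: the first byte is the most significant.
        w = [
            (chunk[i] << 24) | (chunk[i + 1] << 16) | (chunk[i + 2] << 8) | chunk[i + 3]
            for i in range(0, 64, 4)
        ]
        for i in range(16, 80):
            w.append(rotl32(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1))

        a, b, c, d, e = h
        for i in range(80):
            if i < 20:
                f, k = (b & c) | (~b & d), 0x5A827999
            elif i < 40:
                f, k = b ^ c ^ d, 0x6ED9EBA1
            elif i < 60:
                f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
            else:
                f, k = b ^ c ^ d, 0xCA62C1D6
            temp = (rotl32(a, 5) + f + e + k + w[i]) & MASK32
            a, b, c, d, e = temp, a, rotl32(b, 30), c, d

        h = [(x + y) & MASK32 for x, y in zip(h, (a, b, c, d, e))]

    # Write each 32-bit state word out most significant byte first.
    return bytes((word >> shift) & 0xFF for word in h for shift in (24, 16, 8, 0))


if __name__ == "__main__":
    print(sha1(b"abc").hex())  # a9993e364706816aba3e25717850c26c9cd0d89d
    print(sha1(b"abc") == hashlib.sha1(b"abc").digest())  # True
```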
If you have the bytes [0x00, 0x00, 0x00, 0x07] as your 32-bit big-endian number input, you do:
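when bytes is
    [b3, b2, b1, b0, ..] ->
        # Widen each U8 to U32 first so shifting by 24 doesn't push it out of a U8
        high = Num.bitwiseOr (Num.shiftLeftBy (Num.toU32 b3) 24) (Num.shiftLeftBy (Num.toU32 b2) 16)
        low = Num.bitwiseOr (Num.shiftLeftBy (Num.toU32 b1) 8) (Num.toU32 b0)
        num = Num.bitwiseOr high low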
Now num is stored in whatever the native endianness happens to be.
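As a Python analogue of the shift-and-OR decode above (`be_bytes_to_u32` is a hypothetical name), each byte is moved to its position arithmetically. Only the byte's position in the input decides its weight, never the machine's memory layout.

```python
def be_bytes_to_u32(data: bytes) -> int:
    """Decode the first four bytes of `data` as a big-endian unsigned 32-bit integer."""
    if len(data) < 4:
        raise ValueError("need at least 4 bytes")
    b3, b2, b1, b0 = data[0], data[1], data[2], data[3]
    # The first byte is the most significant, so it is shifted the furthest.
    return (b3 << 24) | (b2 << 16) | (b1 << 8) | b0


if __name__ == "__main__":
    print(be_bytes_to_u32(bytes([0x00, 0x00, 0x00, 0x07])))  # 7
```

Extra trailing bytes are ignored, mirroring the `..` rest pattern in the Roc match.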
Well, just to make sure we're on the same page, I need to go the other way. SHA-1 requires big endian, and I "don't know" what I'm running on. So how do I get from native to big endian?
Still doable with bitshifting:
num : U32
num = 7
b0 = Num.toU8 num
b1 = Num.toU8 (Num.shiftRightZfBy num 8)
b2 = Num.toU8 (Num.shiftRightZfBy num 16)
b3 = Num.toU8 (Num.shiftRightZfBy num 24)
# Big endian out
[b3, b2, b1, b0]
# Little endian out
[b0, b1, b2, b3]
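The same encode step as a Python sketch (`u32_to_bytes` is a hypothetical name). Masking with `0xFF` stands in for Roc's `Num.toU8`, assuming, as the Roc snippet relies on, that the conversion keeps only the low byte. The chosen output order is purely a matter of list ordering.

```python
def u32_to_bytes(num: int) -> tuple[list[int], list[int]]:
    """Split a 32-bit unsigned value into bytes; return (big_endian, little_endian)."""
    if not 0 <= num <= 0xFFFF_FFFF:
        raise ValueError("num must fit in 32 bits")
    # `& 0xFF` keeps only the low 8 bits, like a truncating U8 conversion.
    b0 = num & 0xFF
    b1 = (num >> 8) & 0xFF
    b2 = (num >> 16) & 0xFF
    b3 = (num >> 24) & 0xFF
    return [b3, b2, b1, b0], [b0, b1, b2, b3]


if __name__ == "__main__":
    print(u32_to_bytes(7))  # ([0, 0, 0, 7], [7, 0, 0, 0])
```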
Okay, this should be hard to screw up, thanks
oof, I didn’t realize that bit shifting leaks target details :sweat_smile:
as Luke mentioned earlier, it’s a design goal that Roc code should give the same answers regardless of what target it’s running on, and the fact that bit shifts give different answers depending on native endianness breaks that
so I think it would be best not to depend on that behavior, because I’d like to try to figure out a way to change it!
oof, I didn’t realize that bit shifting leaks target details
It doesn't
It always gives the same answer, which is why you can use it to extract bytes from native endian. Then you can order them as big or little endian
So the Roc user doesn't know whether the number is actually stored in big or little endian. They can just extract the bytes, then order them as they please.
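To make the distinction concrete, here is a small Python sketch (not Roc). Shifts operate on the numeric value, so extracting bytes with them gives the same result on every host. Only reinterpreting the raw memory layout, done here with `struct`'s native-order `"="` format, exposes the host's byte order. The helper names are hypothetical.

```python
import struct
import sys


def shift_be_bytes(num: int) -> bytes:
    """Big-endian bytes of a 32-bit value, computed arithmetically with shifts."""
    return bytes((num >> shift) & 0xFF for shift in (24, 16, 8, 0))


def native_bytes(num: int) -> bytes:
    """The bytes as laid out in memory on this host (native byte order)."""
    return struct.pack("=I", num)


if __name__ == "__main__":
    n = 0x0A0B0C0D
    print("host byte order:", sys.byteorder)
    print("shift-based big endian:", shift_be_bytes(n).hex())  # 0a0b0c0d on every host
    print("native memory layout:", native_bytes(n).hex())  # depends on the host
```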
sweet, thanks for clarifying!
Last updated: Jul 06 2025 at 12:14 UTC