The current Num.bytesToU16 (and similar) functions don't take endianness into account, which means they'll give different answers on different CPUs. We've been eliminating those scenarios, and this seems like one worth eliminating too.
More broadly, I recently realized that adding things to Num which rely on endianness (like bytesToU16 and similar) was probably premature. I think they should be removed for now and revisited later in the context of specific concrete use cases.
For example, binary serialization formats necessarily specify things like endianness as part of the format, so it's not clear to me how helpful dedicated Num builtins for translating between bytes and certain integer sizes would be in practice.
I don't think this would block anything from being built in Roc. Consider Zig's std.mem.readVarPackedInt. It supports decoding bit offsets, and does everything in userspace using (as far as I can see) operations that are available in Num in Roc. If a function like that is implementable in userspace, I think all the relevant use cases should be unblocked here.
Obviously in the future we can revisit this if specific use cases come up which justify builtins, but I think in this case it's worth the forcing function of starting with userspace and seeing what that experience is like in practice.
The plan here was to change the API and make it take endianness.
Also to make it return a tuple when going from a U16 to bytes.
The APIs discussed were the following:
Num.u16ToBytes : U16, [BE, LE] -> (U8, U8)
Num.u16FromBytes : (U8, U8), [BE, LE] -> U16
Num.appendBytesToList : List U8, Num a, [BE, LE] -> List U8
yeah I remember the discussion, I just think a better plan is to try taking them out altogether :big_smile:
and seeing if it really feels like there's justification for making them builtins after all, based on how they end up being used in practice
I don't understand though. They will clearly be wanted for binary protocols
They also are much more performant as builtins than as bit shifting.
hm, why would they be more performant as builtins? :thinking:
Cause they just use type casting instead of bitshifting.
Feels like something llvm should be able to optimize to the same thing, but IIRC when working on hashing, that wasn't the case.
you mean in the case where the target endianness matches the requested endianness?
yeah I assumed LLVM would optimize that, surprising that it doesn't
Both are faster, but that is the most affected
well it's not safe to cast unless the endianness matches, right?
I can double check. Maybe I had something else off
otherwise you can end up with the wrong answer
There is a single instruction to flip the endianness though
whoa, I didn't know that! :astonished:
ok if LLVM doesn't optimize that, that's a very important consideration haha
Yeah cause all the network protocols are big endian but cpus are little endian.
Will go double check in godbolt now. Maybe before something else was messing up the generation.
appreciate it!
Ok, nvm, llvm gets it: https://zig.godbolt.org/z/xG4YPYcoG
I wonder what I hit before that messed this up... :shrug:
So verbose, but doable
Oh sorry, I remember the issue I was hitting. It is in bytes to integers.
We would need a way to load a tuple from a list. In current roc, you have to get each individual element. That is a huge cost.
So I guess the primitive that I would need is List.get8 : List a, Nat -> Result (a, a, a, a, a, a, a, a) [OutOfBounds]. Same for other numeric sizes.
Otherwise, you are stuck with n branches to check size (which hopefully optimize into one), and then loading each individual element a single byte at a time.
Which I guess the proposed Num.*FromBytes APIs don't actually fix. So good thing this was thought about now either way.
ha, makes sense!
yeah those list primitives seem totally reasonable :+1:
so how about we try adding those primitives and see how it goes in practice?
Oh wait, even better, we shouldn't need the primitives. Pattern matching and seamless slices for the win.
when List.dropFirst list index is
    [a, b, c, d, e, f, g, h, ..] ->
    _ ->
Yeah, I say let's leave it to userland. Should be trivial for someone to make a package for it if they want to.
wow, great point!!!
The specific use case I want this for is a binary encoder/decoder so I can efficiently cache data and reuse it between calls in basic-webserver. We can add something like set : U64, List U8 -> Task {} [OutOfSpace] and get : U64 -> Task (List U8) [NotFound] and encode/decode in the app or even platform if we want.
Sure, so when you define the binary encoder/decoder, you just have to manually implement this stuff in Encoder.i32 and friends.
It all can be done in userland
Maybe if you could give me a worked example for I32 or something I can make the rest happen. I'm not sure I follow how it all comes together
Let's pair, I think we have a few different things to discuss.
Ok, so there are a few functions we do need for binary encoding in the std lib:
We need dec to/from i128. We also need f32/f64 to/from sign, exponent, and mantissa.
We should use these for the impl
inline fn floatExponentBits(comptime T: type) comptime_int
inline fn floatMantissaBits(comptime T: type) comptime_int
@Richard Feldman For dec, it obviously can't use Num.toI128 to get the underlying I128. I'm not exactly sure of a good name for it. I guess it technically is atto in terms of metric prefixes. Num.decToAtto doesn't sound even vaguely discoverable. Could do Num.decToBytes and return a tuple of U8. Or maybe Num.decToRaw or something like that... yeah... not sure of a good name here.
I like Num.decToRaw or Num.decToBytes
Cause this will be useful for some of the fuzzing stuff I am currently looking at, I wanted to pin down the API a bit. Also, it will be needed for binary encoding/decoding in Roc.
For Dec, I think it can probably simply be this with a doc mentioning that these are integers scaled by 10^-18
Num.decToRaw : Dec -> I128
Num.decFromRaw : I128 -> Dec
For float, I think there are a few possibilities for the api.
Probably most direct would be:
Num.f32ToRaw : F32 -> { sign: Bool, exponent: U8, fraction: U32 }
Num.f64ToRaw : F64 -> { sign: Bool, exponent: U16, fraction: U64 }
-- Also the reverse
That said, we could also just allow bitcasting a float to/from a U32/U64.
Num.f32ToRaw : F32 -> U32
Num.f64ToRaw : F64 -> U64
-- Also the reverse
Or a direct byte function of some sort that gives a tuple, but I think that would be less useful.
For all the above float APIs, they could also use signed types instead of unsigned.
Anyone have any thoughts and what would be the nicest api here?
For extracting parts of a float, we could make each part its own function, but we can't really do that for building a float. I mean we could, but it would be kinda strange to apply the fractional part and then add on an exponent.
Looking at the postcard wire format for no particular reason other than it looks like a useful reference, https://postcard.jamesmunns.com/wire-format#13---f64
They specify that an f64 will be bitwise converted into a u64, and encoded as a little-endian array of eight bytes.
For example, the float value -32.005859375f64 would be bitwise represented as 0xc040_00c0_0000_0000u64, and encoded as [0x00, 0x00, 0x00, 0x00, 0xc0, 0x00, 0x40, 0xc0].
Would the above Num.f64ToRaw : F64 -> U64 be the same as this? I assume we might want the more explicit { sign: Bool, exponent: U16, fraction: U64 } if we want to support really specific encodings/decodings?
I guess we can always just bitshift things around if we need a different ordering. Though maybe the API should be more like Num.f64ToRaw : F64, [BE, LE] -> U64?
Converting to a U64 without an endian specifier should be fine. It will just stay in the same endian ambiguous form. Then you can write it in little endian form into the final buffer.
Cause both the float and the int will be in native endian. Then you write the int into the buffer lowest byte to highest byte to get a little endian buffer
what about these for names?
Num.withoutDecimalPt : Dec -> I128
Num.withDecimalPt : I128 -> Dec
Interesting. I definitely get the idea...
I definitely don't think I would ever think to reach for a function named that. Also, I am a bit concerned the name is too close to withoutDecimalPart, which sounds like it would return the whole number portion of the Dec.
yeah I just always try to avoid names that basically say "to internal implementation" because it pretty much guarantees you can never change the internal implementation
as opposed to a name that describes the transformation, which at least potentially leaves the door open to changing the internal representation in the future and then backporting the function to still return what it says it does
which in this particular case might never happen, but people look to builtins for naming conventions, so I want to avoid establishing "to internal representation" as a naming convention in builtins if possible! :big_smile:
True, but these functions are actually for serde and binary protocols. They truly are meant to get the raw bytes.
These types just don't allow raw access like integers (via bit shifts and masks)
Personally, I think I would prefer just a single generic Num.bitCast
Num.bitCast sounds nice. Does it always return a List U8?
No, it converts any numeric type into any other numeric type by just directly moving the bits.
No checks. If the old type is smaller, zero pad (maybe sign extend?). If the old type is bigger truncate.
I think Zig has @bitCast that would be the same.
I'm a little worried about that... I don't know if a lot of people appreciate the distinction between a bit cast and a numeric cast, and I could see people calling Num.bitCast thinking it will work more like Num.intCast.
What are intCast's semantics currently? I think it is a bitcast, just only for integers.
Though maybe it panics if a value doesn't fit, or is supposed to panic if a value doesn't fit?
I'm not sure...also I'm not totally sure we should have that one either :laughing:
Fair enough. In that case, sounds like a couple of bespoke methods is probably the way to go: one to get the parts of a float and one to remove the decimal point from a Dec.
Then of course the reverse method for building the types.
Num.withoutDecimalPoint : Dec -> I128
Num.withDecimalPoint : I128 -> Dec
Num.f32ToParts : F32 -> { sign : Bool, exponent : U8, fraction : U32 }
-- plus reverse
-- plus for f64
sounds good to me!
I will try to implement those
What should happen if the fraction is bigger than allowed in f32FromParts? Ignore the extra bits, or should it return a result?
Just saw this issue raised https://github.com/roc-lang/roc/issues/7739 -- is this thread effectively the direction we plan on going for this, so Num.f32ToParts?
I thought we had an issue for this but maybe we never made one, as I can't find it.
They are already implemented?
https://github.com/roc-lang/roc/blob/966d0459e7ccb1bd28cb77c05b8419953ef167af/crates/compiler/builtins/roc/Num.roc#L154-L157
That said, in practice, I think this ended up being the wrong decision (cause Roc doesn't have arbitrary width integers).
Frankly, at this point, I would suggest removing this and going with the raw conversion like the issue you linked suggested.
let users do the bit twiddling as they need.
Ahk, I forgot about that. I don't have a strong opinion here, but moving to the simpler API sounds good to me.
I think we should drop a comment/update on that Issue so Lars or someone else has a decision to reference and is unblocked to progress the change.
WDYT @Brendan Hansknecht ?
I commented on the issue
Last updated: Jul 06 2025 at 12:14 UTC