reading integers from bytes · API design

Richard Feldman (Jan 23 2024 at 20:26):

The current Num.bytesToU16 (and similar) functions don't take endianness into account, which means they'll give different answers on different CPUs. We've been eliminating those scenarios, and this seems like one worth eliminating too.

More broadly, I recently realized that adding things to Num which rely on endianness (like bytesToU16 and similar) was probably premature. I think they should be removed for now and revisited later in the context of specific concrete use cases.

For example, binary serialization formats necessarily specify things like endianness as part of the format, so it's not clear to me how helpful dedicated Num builtins for translating between bytes and certain integer sizes would be in practice.

I don't think this would block anything from being built in Roc. Consider Zig's std.mem.readVarPackedInt. It supports decoding bit offsets, and does everything in userspace using (as far as I can see) operations that are available in Num in Roc. If a function like that is implementable in userspace, I think all the relevant use cases should be unblocked here.

Obviously in the future we can revisit this if specific use cases come up which justify builtins, but I think in this case it's worth the forcing function of starting with userspace and seeing what that experience is like in practice.

Brendan Hansknecht (Jan 23 2024 at 20:43):

Brendan Hansknecht (Jan 23 2024 at 20:46):

Brendan Hansknecht (Jan 23 2024 at 20:48):

Num.u16ToBytes : U16, [BE, LE] -> (U8, U8)
Num.u16FromBytes : (U8, U8), [BE, LE] -> U16

Num.appendBytesToList : List U8, Num a, [BE, LE] -> List U8
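
As an illustration of what the proposed signatures above would compute, here is a minimal sketch in Python rather than Roc, using only shifts and masks. The function names mirror the proposal; the "BE"/"LE" string tags and the range check are assumptions made for this example, not part of any Roc builtin.

```python
def u16_to_bytes(value: int, endian: str) -> tuple:
    """Split a 16-bit unsigned value into two bytes in the requested order."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value does not fit in 16 bits")
    hi = (value >> 8) & 0xFF  # most significant byte
    lo = value & 0xFF         # least significant byte
    if endian == "BE":
        return (hi, lo)
    if endian == "LE":
        return (lo, hi)
    raise ValueError("endian must be 'BE' or 'LE'")


def u16_from_bytes(pair: tuple, endian: str) -> int:
    """Combine two bytes into a 16-bit unsigned value, given their order."""
    first, second = pair
    if endian == "BE":
        return (first << 8) | second
    if endian == "LE":
        return (second << 8) | first
    raise ValueError("endian must be 'BE' or 'LE'")
```

Because the byte order is an explicit argument, the result is the same on every CPU, which is the property the original functions lacked.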

Richard Feldman (Jan 23 2024 at 21:36):

yeah I remember the discussion, I just think a better plan is to try taking them out altogether :big_smile:

Richard Feldman (Jan 23 2024 at 21:37):

and seeing if it really feels like there's justification for making them builtins after all, based on how they end up being used in practice

Brendan Hansknecht (Jan 23 2024 at 21:37):

Richard Feldman (Jan 23 2024 at 21:38):

Brendan Hansknecht (Jan 23 2024 at 21:38):

Brendan Hansknecht (Jan 23 2024 at 21:39):

Feels like something llvm should be able to optimize to the same thing, but IIRC when working on hashing, that wasn't the case.

Richard Feldman (Jan 23 2024 at 21:39):

you mean in the case where the target endianness matches the requested endianness?

Richard Feldman (Jan 23 2024 at 21:40):

Brendan Hansknecht (Jan 23 2024 at 21:40):

Richard Feldman (Jan 23 2024 at 21:40):

Brendan Hansknecht (Jan 23 2024 at 21:40):

Richard Feldman (Jan 23 2024 at 21:40):

Brendan Hansknecht (Jan 23 2024 at 21:40):

Richard Feldman (Jan 23 2024 at 21:40):

Richard Feldman (Jan 23 2024 at 21:41):

Brendan Hansknecht (Jan 23 2024 at 21:41):

Will go double check in godbolt now. Maybe before something else was messing up the generation.

Richard Feldman (Jan 23 2024 at 21:45):

Brendan Hansknecht (Jan 23 2024 at 21:52):

Brendan Hansknecht (Jan 23 2024 at 21:54):

We would need a way to load a tuple from a list. In current roc, you have to get each individual element. That is a huge cost.

So I guess the primitive that I would need is List.get8: List a, Nat -> Result (a, a, a, a, a, a, a, a) [OutOfBounds]. Same for other numeric sizes.

Brendan Hansknecht (Jan 23 2024 at 21:56):

Otherwise, you are stuck with n branches to check size (which hopefully optimize into one), and then loading each individual element a single byte at a time.

Brendan Hansknecht (Jan 23 2024 at 21:57):

Which I guess the proposed Num.*FromBytes apis don't actually fix. So good thing this was thought about now either way.

Richard Feldman (Jan 23 2024 at 21:58):

Richard Feldman (Jan 23 2024 at 21:59):

Richard Feldman (Jan 23 2024 at 22:00):

Brendan Hansknecht (Jan 23 2024 at 22:03):

Oh wait, even better, we shouldn't need the primitives. Pattern matching and seamless slices for the win.

when List.dropFirst list index is
    [a, b, c, d, e, f, g, h, ...] ->
    _ ->
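
The same idea (take a slice starting at the index, then destructure eight bytes at once) can be sketched in Python. This is only an analogy to the Roc pattern match above: the name read_u64_le, the little-endian choice, and the None result for the out-of-bounds case are assumptions made for the example.

```python
def read_u64_le(data: bytes, index: int):
    """Return the little-endian U64 starting at index, or None if out of bounds."""
    if index < 0:
        return None
    window = data[index:index + 8]  # slicing never raises; it may come back short
    if len(window) < 8:
        return None                 # plays the role of the `_ ->` branch
    a, b, c, d, e, f, g, h = window  # one length check, then eight bytes at once
    return (a | b << 8 | c << 16 | d << 24
            | e << 32 | f << 40 | g << 48 | h << 56)
```

A single length check replaces eight separate bounds checks, which is the cost concern raised earlier in the thread.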

Brendan Hansknecht (Jan 23 2024 at 22:04):

Yeah, I say let's leave it to userland. Should be trivial for someone to make a package for it if they want to.

Richard Feldman (Jan 23 2024 at 22:13):

Luke Boswell (Jan 23 2024 at 22:21):

The specific use case I want this for is a binary encoder/decoder so I can efficiently cache data and reuse between calls in basic-webserver, we can add something like set : U64, List U8 -> Task {} [OutOfSpace] and get : U64 -> Task (List U8) [NotFound] and encode/decode in the app or even platform if we want.

Brendan Hansknecht (Jan 23 2024 at 22:25):

Sure, so when you define the binary encoder/decoder, just have to manually implement this stuff in Encoder.i32 and friends.

Brendan Hansknecht (Jan 23 2024 at 22:26):

Luke Boswell (Jan 23 2024 at 22:27):

Maybe if you could give me a worked example for I32 or something I can make the rest happen. I'm not sure I follow how it all comes together

Brendan Hansknecht (Jan 23 2024 at 22:30):

Brendan Hansknecht (Jan 23 2024 at 23:23):

We need dec to/from i128. We also need f32/f64 to/from sign, exponent, and mantissa.

Luke Boswell (Jan 23 2024 at 23:25):

inline fn floatExponentBits(comptime T: type) comptime_int
inline fn floatMantissaBits(comptime T: type) comptime_int

Brendan Hansknecht (Jan 24 2024 at 01:23):

@Richard Feldman For dec, it obviously can't use Num.toI128 to get the underlying I128. I'm not exactly sure of a good name for it. I guess it technically is atto in terms of metric prefixes. Num.decToAtto doesn't sound even vaguely discoverable. Could do Num.decToBytes and return a tuple of U8. Or maybe Num.decToRaw or something like that... yeah, not sure of a good name here.

Luke Boswell (Jan 24 2024 at 01:50):

Brendan Hansknecht (Jan 28 2024 at 02:29):

Cause this will be useful for some of the fuzzing stuff I am currently looking at, wanted to pin down api a bit. Also, will be needed for binary encoding/decoding in roc.

For Dec, I think it can probably simply be this with a doc mentioning that these are integers scaled by 10^-18

Num.decToRaw : Dec -> I128
Num.decFromRaw : I128 -> Dec

For float, I think there are a few possibilities for the api.
Probably most direct would be:

Num.f32ToRaw : F32 -> { sign: Bool, exponent: U8, fraction: U32 }
Num.f64ToRaw : F64 -> { sign: Bool, exponent: U16, fraction: U64 }

-- Also the reverse

Num.f32ToRaw : F32 -> U32
Num.f64ToRaw : F64 -> U64

-- Also the reverse

Or a direct byte function of some sort that gives a tuple, but I think that would be less useful.

For all the above float APIs, they could also use signed types instead of unsigned.
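
To show how the two float API shapes above relate, here is a minimal Python sketch for the 32-bit case. It assumes the standard IEEE 754 binary32 layout (1 sign bit, 8 exponent bits, 23 fraction bits); the function names follow the proposal, but the dictionary return value is only a stand-in for the Roc record.

```python
import struct


def f32_to_raw(x: float) -> int:
    """Reinterpret the 32-bit float encoding of x as an unsigned 32-bit integer."""
    # Packing and unpacking with the same byte order yields the value-level bits.
    return struct.unpack(">I", struct.pack(">f", x))[0]


def f32_to_parts(x: float) -> dict:
    """Split the 32-bit float encoding of x into sign, exponent, and fraction."""
    bits = f32_to_raw(x)
    return {
        "sign": bool(bits >> 31),         # top bit
        "exponent": (bits >> 23) & 0xFF,  # next 8 bits, still biased by 127
        "fraction": bits & 0x7FFFFF,      # low 23 bits
    }
```

The parts form is just the raw form with the shifts and masks already applied, so either one can be built from the other in userspace.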

Brendan Hansknecht (Jan 28 2024 at 02:30):

For extracting parts of a float, we could make each part its own function, but we can't really do that for building a float. I mean we could, but it would be kinda strange to, like, apply the fractional part and then add on an exponent.

Luke Boswell (Jan 28 2024 at 07:39):

They have that an f64 will be bitwise converted into a u64, and encoded as a little-endian array of eight bytes.

For example, the float value -32.005859375f64 would be bitwise represented as 0xc040_00c0_0000_0000u64, and encoded as [0x00, 0x00, 0x00, 0x00, 0xc0, 0x00, 0x40, 0xc0].

Would the above Num.f64ToRaw : F64 -> U64 be the same as this? I assume we might want the more explicit { sign: Bool, exponent: U16, fraction: U64 } if we want to support really specific encoding/decodings?

Luke Boswell (Jan 28 2024 at 07:41):

I guess we can always just bitshift things around if we need a different ordering. Though maybe the API should be more like Num.f64ToRaw : F64, [BE, LE] -> U64?

Brendan Hansknecht (Jan 28 2024 at 07:46):

Converting to a U64 without an endian specifier should be fine. It will just stay in the same endian ambiguous form. Then you can write it in little endian form into the final buffer.

Brendan Hansknecht (Jan 28 2024 at 07:48):

Cause both the float and the int will be in native endian. Then you write the int into the buffer lowest byte to highest byte to get a little endian buffer
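
The two-step process described here can be sketched in Python using the example value quoted earlier in the thread. The helper names f64_to_raw and write_u64_le are hypothetical; the point is that the float-to-integer step needs no endianness argument, and byte order only appears when the integer is written out.

```python
import struct


def f64_to_raw(x: float) -> int:
    """Reinterpret the 64-bit float encoding of x as an unsigned 64-bit integer."""
    # Native order ("=") on both sides: whatever the CPU's byte order is,
    # it cancels out, so the resulting integer value is the same everywhere.
    return struct.unpack("=Q", struct.pack("=d", x))[0]


def write_u64_le(value: int) -> list:
    """Emit the bytes of a 64-bit value from lowest to highest (little-endian)."""
    return [(value >> (8 * i)) & 0xFF for i in range(8)]


raw = f64_to_raw(-32.005859375)
encoded = write_u64_le(raw)
```

Here raw is 0xc040_00c0_0000_0000 and encoded is [0x00, 0x00, 0x00, 0x00, 0xc0, 0x00, 0x40, 0xc0], matching the quoted example.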

Richard Feldman (Jan 28 2024 at 12:18):

Num.withoutDecimalPt : Dec -> I128
Num.withDecimalPt : I128 -> Dec

Brendan Hansknecht (Jan 28 2024 at 16:29):

I definitely don't think I would ever think to reach for a function named that. Also, I am a bit concerned the name is too close to withoutDecimalPart which sounds like it would return the whole number portion of the Dec.

Richard Feldman (Jan 28 2024 at 16:58):

yeah I just always try to avoid names that basically say "to internal implementation" because it pretty much guarantees you can never change the internal implementation

Richard Feldman (Jan 28 2024 at 16:59):

as opposed to a name that describes the transformation, which at least potentially leaves the door open to changing the internal representation in the future and then backporting the function to still return what it says it does

Richard Feldman (Jan 28 2024 at 17:00):

which in this particular case might never happen, but people look to builtins for naming conventions, so I want to avoid establishing "to internal representation" as a naming convention in builtins if possible! :big_smile:

Brendan Hansknecht (Jan 28 2024 at 17:22):

True, but these functions are actually for serde and binary protocols. They truly are meant to get the raw bytes.

Brendan Hansknecht (Jan 28 2024 at 17:23):

These types just don't allow raw access like integers (via bit shifts and masks)

Brendan Hansknecht (Jan 28 2024 at 17:24):

Luke Boswell (Jan 28 2024 at 21:47):

Brendan Hansknecht (Jan 28 2024 at 23:19):

No, it converts any numeric type into any other numeric type by just directly moving the bits.

Brendan Hansknecht (Jan 28 2024 at 23:20):

No checks. If the old type is smaller, zero pad (maybe sign extend?). If the old type is bigger truncate.
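
The three behaviours mentioned here (zero pad, sign extend, truncate) differ only in how the extra bits are filled or dropped. A minimal Python sketch, treating every value as a non-negative bit pattern; the helper names are hypothetical and do not correspond to any Roc builtin:

```python
def mask(bits: int) -> int:
    """All-ones pattern of the given width."""
    return (1 << bits) - 1


def truncate(pattern: int, to_bits: int) -> int:
    """Bigger to smaller: keep only the low to_bits bits."""
    return pattern & mask(to_bits)


def zero_extend(pattern: int, from_bits: int) -> int:
    """Smaller to bigger: the new high bits are all zero."""
    return pattern & mask(from_bits)


def sign_extend(pattern: int, from_bits: int, to_bits: int) -> int:
    """Smaller to bigger: the new high bits copy the old top (sign) bit."""
    pattern &= mask(from_bits)
    if pattern >> (from_bits - 1):                # old sign bit set
        pattern |= mask(to_bits) & ~mask(from_bits)
    return pattern
```

For an I8 holding -1 (pattern 0xFF), zero padding to 16 bits gives 0x00FF (255) while sign extension gives 0xFFFF (still -1 as an I16), which is why the choice matters for signed types.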

Brendan Hansknecht (Jan 28 2024 at 23:20):

Richard Feldman (Jan 28 2024 at 23:34):

I'm a little worried about that...I don't know if a lot of people appreciate the distinction between bit cast and numeric cast, and I could see people calling Num.bitCast thinking it will work more like Num.intCast

Brendan Hansknecht (Jan 28 2024 at 23:38):

What are intCast's semantics currently? I think it is a bitcast, just only for integers.

Brendan Hansknecht (Jan 28 2024 at 23:41):

Though maybe it panics if a value doesn't fit or is supposed to panic if a value doesn't fit?

Richard Feldman (Jan 29 2024 at 00:23):

I'm not sure...also I'm not totally sure we should have that one either :laughing:

Brendan Hansknecht (Jan 29 2024 at 00:40):

Fair enough. In that case, sounds like a couple of bespoke methods. One to get the parts of a float and one to remove the decimal point from a Dec is probably the way to go.

Brendan Hansknecht (Jan 29 2024 at 00:40):

Brendan Hansknecht (Jan 29 2024 at 00:45):

Num.withoutDecimalPoint : Dec -> I128
Num.withDecimalPoint : I128 -> Dec

Num.f32ToParts : F32 -> { sign : Bool, exponent : U8, fraction : U32 }
-- plus reverse

-- plus for f64
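
For the Dec pair, a minimal Python sketch of the intended transformation, assuming (as stated earlier in the thread) that a Dec is an integer scaled by 10^-18. Python's Decimal stands in for Dec here, and the error raised for values with more than 18 decimal places is an assumption of this example.

```python
from decimal import Decimal, localcontext

SCALE = 10 ** 18  # Dec keeps 18 digits after the decimal point


def without_decimal_point(dec: Decimal) -> int:
    """Return the scaled integer, e.g. 1.5 -> 1500000000000000000."""
    with localcontext() as ctx:
        ctx.prec = 50  # enough digits that I128-sized values are never rounded
        scaled = dec * SCALE
        if scaled != scaled.to_integral_value():
            raise ValueError("more than 18 decimal places")
        return int(scaled)


def with_decimal_point(raw: int) -> Decimal:
    """Inverse: place the decimal point 18 digits from the right."""
    with localcontext() as ctx:
        ctx.prec = 50
        return Decimal(raw).scaleb(-18)
```

Described this way, the functions name a transformation (moving the decimal point) rather than exposing "the internal representation", which is the naming concern raised above.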

Richard Feldman (Jan 29 2024 at 00:55):

Fabian Schmalzried (Mar 15 2024 at 07:50):

Fabian Schmalzried (Mar 20 2024 at 07:07):

What should happen if the fraction is bigger than allowed in f32FromParts? Ignore the extra bits, or should it return a result?
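
The two options in this question can be put side by side. This is a Python sketch assuming the IEEE 754 binary32 layout with a 23-bit fraction; both function names are hypothetical, and None stands in for the error case of a Roc Result.

```python
import struct

FRACTION_BITS = 23
FRACTION_MASK = (1 << FRACTION_BITS) - 1


def f32_from_parts_masked(sign: bool, exponent: int, fraction: int) -> float:
    """Option 1: silently ignore any fraction bits above the low 23."""
    bits = (int(sign) << 31) | ((exponent & 0xFF) << 23) | (fraction & FRACTION_MASK)
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def f32_from_parts_checked(sign: bool, exponent: int, fraction: int):
    """Option 2: reject inputs that do not fit, returning None as the error case."""
    if not 0 <= fraction <= FRACTION_MASK or not 0 <= exponent <= 0xFF:
        return None
    return f32_from_parts_masked(sign, exponent, fraction)
```

With masking, a fraction of 1 << 23 quietly produces the same float as a fraction of 0, whereas the checked version surfaces the mistake to the caller.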

Luke Boswell (Apr 14 2025 at 01:47):

Luke Boswell (Apr 14 2025 at 01:49):

I thought we had an issue for this, but maybe we never made one as I can't find it.

Brendan Hansknecht (Apr 14 2025 at 02:12):

Brendan Hansknecht (Apr 14 2025 at 02:13):

That said, in practice, I think this ended up being the wrong decision (because Roc doesn't have arbitrary-width integers).

Brendan Hansknecht (Apr 14 2025 at 02:13):

Frankly, at this point, I would suggest removing this and going with the raw conversion like the issue you linked suggested.

Brendan Hansknecht (Apr 14 2025 at 02:13):

Luke Boswell (Apr 14 2025 at 02:35):

Ahk, I forgot about that. I don't have a strong opinion here, but moving to the simpler API sounds good to me.

I think we should drop a comment/update on that Issue so Lars or someone else has a decision to reference and is unblocked to progress the change.
