Number formats · beginners · Zulip Chat Archive

Should Str.toU128 support e? Roc currently supports this for F32 and F64, but not Dec or Int *. This has come up when adding tests for Json decoding of integers, as large numbers can be validly represented using an exponent in Json, but then are not able to be decoded into any Roc type except F64.

» Num.toStr 340_000_000_000_000_000_000_000_000_000_000_000_000u128

"340000000000000000000000000000000000000" : Str
                         # val20

» Str.toU128 "340000000000000000000000000000000000000"

Ok 340000000000000000000000000000000000000 : Result U128 [InvalidNumStr]
                         # val21

» Str.toU128 "34e37"

Err InvalidNumStr : Result U128 [InvalidNumStr]
                         # val22

Luke Boswell (Apr 23 2023 at 02:16):

~~Also should we also support E and +, e.g. 12e+2,12E+2 in as these are currently not supported but commonly used across various programming languages and number formats (XML, Python etc).~~

Luke Boswell (Apr 23 2023 at 02:16):

And a related question, json numbers technically should only ever be encoded as a double precision float and therefore should be a maximum 21 bytes. However, if we encode a U128 using Roc it could be "340282366920938463463374607431768211455" which is 39 bytes long. So should we support a max number of bytes in a json string for Decoding of the max for a double precision float (21), OR the max for a niaive Roc (Num.toStr) of the max U128 (39) bytes?

Brendan Hansknecht (Apr 23 2023 at 02:23):

Brendan Hansknecht (Apr 23 2023 at 02:26):

Cause the only json native type is f64. Of course you need to check errors when converting from f64 to other types

Luke Boswell (Apr 23 2023 at 02:28):

I'm not sure. My first thought was to collect the bytes that make up a valid json number, and then try to convert to the desired Roc type. What you suggest may be easier and more reliable. Wouldn't that mean there are two conversions, Str -> F64 -> U128 etc

Brendan Hansknecht (Apr 23 2023 at 02:32):

That is fair. I guess it isn't needed, but is 7.000 a valid U128? In js it is the same thing as 7. Idk. Also, i think the cost of f64 to other type will be small compared to parsing bytes

Luke Boswell (Apr 23 2023 at 02:34):

Current

decodeU16 = Decode.custom \bytes, @Json {} ->
    { taken, rest } = takeJsonNumber bytes

    result =
        taken
        |> Str.fromUtf8
        |> Result.try Str.toU16
        |> Result.mapErr \_ -> TooShort

    { result, rest }

Proposed

decodeU16 = Decode.custom \bytes, @Json {} ->
    { taken, rest } = takeJsonNumber bytes

    result =
        taken
        |> Str.fromUtf8
        |> Result.try Str.toF64
        |> Result.map Num.round
        |> Result.map Num.toU16
        |> Result.mapErr \_ -> TooShort

    { result, rest }

Luke Boswell (Apr 23 2023 at 02:35):

I guess one downside, is we can't support anything larger than a F64, but I guess that is a limitation with JSON. A workaround could be to use a Str or something and handle in roc. I think what you have suggested is probably the right way to go

Brendan Hansknecht (Apr 23 2023 at 02:35):

Luke Boswell (Apr 23 2023 at 02:36):

Brendan Hansknecht (Apr 23 2023 at 02:37):

Brendan Hansknecht (Apr 23 2023 at 02:40):

I would think that if an end user specifically needs a big int, they will have to deal with conversions. Store in a string or multiple ints. I don't think the json decoder should automatically deal with it, but i may be wrong. I have not worked a ton with json decoding and specs.

Brendan Hansknecht (Apr 23 2023 at 02:41):

I guess for encoding, this is where it really gets problematic. You don't want encoding a U64 to fail or lose data because it can't fit losslessly in a F64.

Luke Boswell (Apr 23 2023 at 03:01):

This works, is it too hacky? we could make a builtin that checks for fraction part in a float

decodeU16 = Decode.custom \bytes, @Json {} ->
    { taken, rest } = takeJsonNumber bytes

    result =
        taken
        |> Str.fromUtf8
        |> Result.try Str.toF64
        |> Result.try hasNoFractionPart
        |> Result.map Num.round
        |> Result.map Num.toU16
        |> Result.mapErr \_ -> TooShort

    { result, rest }

hasNoFractionPart : F64 -> Result F64 [HasFractionPart]
hasNoFractionPart = \a ->
    fraction = Num.floor((a-Num.toFrac(Num.floor(a/1.0))*1.0)*1000)

    if fraction == 0 then
        Ok a
    else
        Err HasFractionPart

expect
    result = hasNoFractionPart 12.0

    Result.isOk result

expect
    result = hasNoFractionPart 12.1

    Result.isErr result

Luke Boswell (Apr 23 2023 at 03:05):

Actually, this has a problem ... we need to use Result.try Num.toU16Checked instead

Brendan Hansknecht (Apr 23 2023 at 03:05):

Brendan Hansknecht (Apr 23 2023 at 03:06):

That said, as i am thinking about this more especially in the context of encode, i am a lot less sure which approach is better. I think we definitely should look at what other tools do. For example serde_json.

Luke Boswell (Apr 23 2023 at 03:10):

I might just leave this for now, and add some TODO comments for a later deep dive. I'm focussing right now on adding more test coverage and identifying issues like this. Don't want to get too off course here.

Brendan Hansknecht (Apr 23 2023 at 03:16):

Also, i would advise making an idea thread specifically for adding e to parsing with integer types. I feel like this one has been pretty derailed at this point.

Brendan Hansknecht (Apr 23 2023 at 03:25):

Brendan Hansknecht (Apr 23 2023 at 03:28):

They look to lean into how JS defines a number and not explicitly support any large numbers. So you hit issues with anything larger than the max i54. They also do not support u128 and i128 by default.

Luke Boswell (Apr 23 2023 at 03:38):

For Decoding; it defers to the Str builtins for this, e.g. Str.toI128. It takes the bytes for a valid json string (double precision float-64) and then attempts to convert it to the desired Roc number type. If that fails, then it is a decoding failure.

The story for Encoding is less compliant with Json right now, we just use Num.toStr which works fine 90% of the time, but will include far too many bytes for valid json if a large number like a U128, Dec, or precise float is used. I'm not sure what our preferred behaviour in these situations should be, we don't have any errors and can't fail when encoding. Would we want to panic in this situation?

Brendan Hansknecht (Apr 23 2023 at 03:58):

Wait decode gets errors, but not encode? I'm sure we made this decision for a reason, but sounds strange. I'm sure with certain output formats, encoding will definitely have error cases that should get reported.

Ajai Nelson (Apr 23 2023 at 04:42):

At least according to Wikipedia, json doesn't really specify anything about number precision:

Brendan Hansknecht (Apr 23 2023 at 18:00):

If you encode data in json as an arbitrary precision number, for example: {"myint": 9223372036854775807}, that precision will be lost in all browser. myint may claim to be 9223372036854775807, but in reality, when it is loaded on the frontend, it will be 9223372036854776000. This can be a nasty footgun.

So even though json does not specify precision, we should take it into account in order to build robust systems. Numbers that are too large should not be encoded as numbers in json. They should be strings or some sort of special large number format.

Richard Feldman (Apr 23 2023 at 19:21):

we should change encode to return a Result so it's allowed to fail based on the value

Richard Feldman (Apr 23 2023 at 19:21):

Brendan Hansknecht (Apr 23 2023 at 19:58):

With all of this, here is my current thought of something that could work nicely:

By default always take F64 restrictions into account.

When encoding, make sure the value can fit into an f64 without precision loss. If that is the case, encode it. Otherwise, return an error due to loss of precision.

When decoding, essentially decode to f64, then make sure the value can successfully convert to the correct number type without precision loss. so 7.0 is fine as a u16, but 7.3 is not.

Note: Dec may make this complicated. We probably should just always encode Dec as a Str. Given precision is very impotant to Dec, we need to be extra careful. We don't want 10.30 dollars to become 10.300000001 dollars or similar.

Have parameterization to allow ease of use.

One of those options would be to enable arbitrary precision mode. In that mode, all number types that could lose precision when converting to f64 (maybe all number types in general?) would just be encode as strings.

Another option could be ignoring precision loss and just converting all numbers to F64 without failure on precision loss.

Have a way to enable a specific field to be encoded as a string even though it is a number

This is probably not gonna work super nice in Roc, but still will likely be important form some application. The options to support this that I can think of are either to let the user do it manually by converting the type to a String, or adding some sort of opaque wrapper around a type that is the version that encodes as a string. Neither of these sound great, maybe someone else has a better way that we could support this. I think an opaque wrapper is the only way in roc to get a custom encode for a type.

This is nice in rust for example cause you can do it with an annotation to the field.

Stream: beginners

Topic: Number formats

Luke Boswell (Apr 23 2023 at 02:16):

Luke Boswell (Apr 23 2023 at 02:16):

Luke Boswell (Apr 23 2023 at 02:16):

Brendan Hansknecht (Apr 23 2023 at 02:23):

Brendan Hansknecht (Apr 23 2023 at 02:26):

Luke Boswell (Apr 23 2023 at 02:28):

Brendan Hansknecht (Apr 23 2023 at 02:32):

Luke Boswell (Apr 23 2023 at 02:34):

Current

Proposed

Luke Boswell (Apr 23 2023 at 02:35):

Brendan Hansknecht (Apr 23 2023 at 02:35):

Luke Boswell (Apr 23 2023 at 02:36):

Brendan Hansknecht (Apr 23 2023 at 02:37):

Brendan Hansknecht (Apr 23 2023 at 02:37):

Brendan Hansknecht (Apr 23 2023 at 02:40):

Brendan Hansknecht (Apr 23 2023 at 02:41):

Luke Boswell (Apr 23 2023 at 03:01):

Luke Boswell (Apr 23 2023 at 03:05):

Brendan Hansknecht (Apr 23 2023 at 03:05):

Brendan Hansknecht (Apr 23 2023 at 03:06):

Luke Boswell (Apr 23 2023 at 03:10):

Brendan Hansknecht (Apr 23 2023 at 03:16):

Brendan Hansknecht (Apr 23 2023 at 03:25):

Brendan Hansknecht (Apr 23 2023 at 03:28):

Luke Boswell (Apr 23 2023 at 03:38):

Brendan Hansknecht (Apr 23 2023 at 03:58):

Ajai Nelson (Apr 23 2023 at 04:42):

Brendan Hansknecht (Apr 23 2023 at 18:00):

Richard Feldman (Apr 23 2023 at 19:21):

Richard Feldman (Apr 23 2023 at 19:21):

Brendan Hansknecht (Apr 23 2023 at 19:58):

By default always take `F64` restrictions into account.

Have parameterization to allow ease of use.

Have a way to enable a specific field to be encoded as a string even though it is a number

Stream: beginners

Topic: Number formats

Luke Boswell (Apr 23 2023 at 02:16):

Luke Boswell (Apr 23 2023 at 02:16):

Luke Boswell (Apr 23 2023 at 02:16):

Brendan Hansknecht (Apr 23 2023 at 02:23):

Brendan Hansknecht (Apr 23 2023 at 02:26):

Luke Boswell (Apr 23 2023 at 02:28):

Brendan Hansknecht (Apr 23 2023 at 02:32):

Luke Boswell (Apr 23 2023 at 02:34):

Current

Proposed

Luke Boswell (Apr 23 2023 at 02:35):

Brendan Hansknecht (Apr 23 2023 at 02:35):

Luke Boswell (Apr 23 2023 at 02:36):

Brendan Hansknecht (Apr 23 2023 at 02:37):

Brendan Hansknecht (Apr 23 2023 at 02:37):

Brendan Hansknecht (Apr 23 2023 at 02:40):

Brendan Hansknecht (Apr 23 2023 at 02:41):

Luke Boswell (Apr 23 2023 at 03:01):

Luke Boswell (Apr 23 2023 at 03:05):

Brendan Hansknecht (Apr 23 2023 at 03:05):

Brendan Hansknecht (Apr 23 2023 at 03:06):

Luke Boswell (Apr 23 2023 at 03:10):

Brendan Hansknecht (Apr 23 2023 at 03:16):

Brendan Hansknecht (Apr 23 2023 at 03:25):

Brendan Hansknecht (Apr 23 2023 at 03:28):

Luke Boswell (Apr 23 2023 at 03:38):

Brendan Hansknecht (Apr 23 2023 at 03:58):

Ajai Nelson (Apr 23 2023 at 04:42):

Brendan Hansknecht (Apr 23 2023 at 18:00):

Richard Feldman (Apr 23 2023 at 19:21):

Richard Feldman (Apr 23 2023 at 19:21):

Brendan Hansknecht (Apr 23 2023 at 19:58):

By default always take F64 restrictions into account.

Have parameterization to allow ease of use.

Have a way to enable a specific field to be encoded as a string even though it is a number

By default always take `F64` restrictions into account.