custom numbers and strings · ideas

Richard Feldman (Jul 13 2025 at 20:18):

so it seems to me that the best way to do custom numbers is to have number literals have the type:

a where [module(a).{
    from_int_digits : {
        is_negative : Bool,
        digits : Iter(U8),
    } -> Result(a, OutOfRange),
}]

and then the compiler can unify that with either a builtin number type or a custom user space number

Richard Feldman (Jul 13 2025 at 20:19):

Richard Feldman (Jul 13 2025 at 20:20):

these can also be bigger than builtin numbers because it's just an iterator of digits (we'd strip out the underscores for you, and give you the negative as a bool)

Richard Feldman (Jul 13 2025 at 20:21):

so userspace arbitrary-size integers can Just Work with number literals even though builtin numbers only go up to I128/U128

Richard Feldman (Jul 13 2025 at 20:23):

Richard Feldman (Jul 13 2025 at 20:24):

we can call that function at compile time as part of compile-time evaluation of constants, and if it returns Err we give a compile error
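A rough Python model of the idea so far (not Roc, and not the compiler's actual implementation): `compile_literal` is a hypothetical stand-in for the compile-time step, `BigInt` and `U8` are illustrative types, and only the `from_int_digits` shape and the `OutOfRange` name come from the signature above. It assumes base-10 literals only.

```python
from dataclasses import dataclass
from typing import Iterable, Union


@dataclass(frozen=True)
class OutOfRange:
    """Stand-in for the OutOfRange error in the signature above."""


@dataclass(frozen=True)
class BigInt:
    value: int  # Python ints are arbitrary-size, so this never overflows

    @staticmethod
    def from_int_digits(is_negative: bool, digits: Iterable[int]) -> Union["BigInt", OutOfRange]:
        total = 0
        for d in digits:  # consume the digits one at a time, like an iterator
            total = total * 10 + d
        return BigInt(-total if is_negative else total)


@dataclass(frozen=True)
class U8:
    value: int

    @staticmethod
    def from_int_digits(is_negative: bool, digits: Iterable[int]) -> Union["U8", OutOfRange]:
        total = 0
        for d in digits:
            total = total * 10 + d
            if total > 255:  # stop early once the value cannot fit
                return OutOfRange()
        if is_negative and total != 0:
            return OutOfRange()
        return U8(total)


def compile_literal(source: str, target):
    """Hypothetical compile-time step: strip underscores, split off the sign,
    call the target type's from_int_digits, and turn Err into a compile error."""
    is_negative = source.startswith("-")
    digits = (int(ch) for ch in source.lstrip("-") if ch != "_")
    result = target.from_int_digits(is_negative, digits)
    if isinstance(result, OutOfRange):
        raise SyntaxError(f"literal {source} does not fit in {target.__name__}")
    return result
```

With this sketch, a literal one past the U128 maximum still converts to `BigInt`, while `compile_literal("256", U8)` raises instead of producing a value.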

Richard Feldman (Jul 13 2025 at 20:25):

we already do this for the builtins, and should be straightforward to generalize it to user-defined number types

Richard Feldman (Jul 13 2025 at 20:25):

the repl solution of not showing the type of number literals also seems good to me

Richard Feldman (Jul 13 2025 at 20:26):

this all brings up another thing that's been bothering me for a long time, and which could be nicely solved by extending this design to strings

Richard Feldman (Jul 13 2025 at 20:26):

Richard Feldman (Jul 13 2025 at 20:36):

and I often want to give them a string literal, but also I often want to give them a Path, and paths might not be valid utf-8 (e.g. if I got the path from a Dir.list! and the path contained invalid utf-8 on the filesystem, which every major filesystem permits)

Richard Feldman (Jul 13 2025 at 20:38):

one solution is to have File.read! and File.read_path!, one of which takes a Str and one of which takes a Path, and then duplicate every operation like that

Richard Feldman (Jul 13 2025 at 20:38):

Richard Feldman (Jul 13 2025 at 20:39):

it's super common for languages to have a solution for this, so that you can give file I/O operations a string literal or a Path and either one Just Works

Richard Feldman (Jul 13 2025 at 20:39):

Richard Feldman (Jul 13 2025 at 20:43):

a where [module(a).{
    from_int_digits : {
        is_negative : Bool,
        digits : Iter(U8),
    } -> Result(a, OutOfRange),
}]

a where [module(a).{
    from_str : Str -> Result(a, BadStr)
}]

Richard Feldman (Jul 13 2025 at 20:45):

so then as long as the Path module has that function, you can write a function that takes a Path and call it passing a string literal, e.g. both of these can Just Work:

contents = read!(my_path, Json.utf8)?

contents = read!("x.json", Json.utf8)?
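To make the string case concrete, here is a small Python model (not Roc). `coerce_str_literal` and `read_stub` are hypothetical names, and rejecting NUL bytes is just an assumed validation rule; only `from_str` and `BadStr` come from the signature above.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class BadStr:
    reason: str


@dataclass(frozen=True)
class Path:
    raw: bytes  # stored as bytes because real paths need not be valid UTF-8

    @staticmethod
    def from_str(s: str) -> Union["Path", BadStr]:
        # Assumed validation rule, purely for illustration.
        if "\0" in s:
            return BadStr("path contains a NUL byte")
        return Path(s.encode("utf-8"))


def coerce_str_literal(literal: str, target):
    """Hypothetical compiler step: a string literal used where `target` is
    expected is passed through target.from_str; Err becomes a compile error."""
    result = target.from_str(literal)
    if isinstance(result, BadStr):
        raise SyntaxError(f"bad {target.__name__} literal: {result.reason}")
    return result


def read_stub(path: Path) -> bytes:
    """Stand-in for a read! that only ever accepts a Path."""
    return b"read:" + path.raw


my_path = Path(b"x.json")
# Both calls end up handing read_stub a Path, as in the two read! lines above.
from_path = read_stub(my_path)
from_literal = read_stub(coerce_str_literal("x.json", Path))
```

The function being called only ever sees a `Path`; the literal is converted before the call.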

Richard Feldman (Jul 13 2025 at 20:45):

Richard Feldman (Jul 13 2025 at 20:46):

this is something I've always wanted for the file I/O and Http APIs, and if we're already doing it for numbers, it feels like a relatively small change to do it for strings too

Richard Feldman (Jul 13 2025 at 20:47):

and it would really cut down on API surface area without sacrificing correctness at all

Richard Feldman (Jul 13 2025 at 20:47):

Kiryl Dziamura (Jul 13 2025 at 21:16):

I would expect from_bytes instead of from_int_digits. What digit is a if it's written in hex notation? 10? If it's 10, then the hex was parsed by the compiler. If it's parsed by the compiler, why not bytes immediately? The same goes for numbers with a dot

Richard Feldman (Jul 13 2025 at 21:21):

Kiryl Dziamura (Jul 13 2025 at 21:21):

or we can introduce r notation (raw? idk)? 0r<whatever>. like, I want to use base17 so I want to have g as well (I'm not serious here, but you get the idea)

Richard Feldman (Jul 13 2025 at 21:21):

I assume bytes immediately would be less convenient for arbitrary-size types but I could be wrong

Richard Feldman (Jul 13 2025 at 21:24):

anyway I think we could iterate on exactly what format to provide; the real question in my mind is whether this is the overall design we want to use for literals!

Kiryl Dziamura (Jul 13 2025 at 21:26):

the core idea is reasonable. but my experience is too limited to come up with a use case :smile:

Kiryl Dziamura (Jul 13 2025 at 21:28):

I mean for numbers. huh. maybe built-in validation? e.g., a type for auto-wrapping uints

Kiryl Dziamura (Jul 13 2025 at 21:33):

and also string validation. besides path validation, it can be SQL validation! or css validation, or you name it. would it work with string interpolation? likely not

Richard Feldman (Jul 13 2025 at 21:34):

string interpolation Just Works bc first we resolve the interpolation, giving us a normal string, and then we pass that to the conversion function

Richard Feldman (Jul 13 2025 at 21:34):

Richard Feldman (Jul 13 2025 at 21:35):

custom numbers have been one of the most frequently requested language features over the years, and they have been deal-breakers

Richard Feldman (Jul 13 2025 at 21:36):

as in, people who have arrived on Zulip and asked whether Roc supports custom number types have mostly tended to leave after learning that we don't

Richard Feldman (Jul 13 2025 at 21:36):

which is unfortunate given Roc's goal of being nice for the long tail of use cases :smile:

Kiryl Dziamura (Jul 13 2025 at 21:36):

Luke Boswell (Jul 13 2025 at 21:44):

Do you think there is much of a cost to doing this? Is it just a constant bit of extra processing at compile time for every number and string literal in source?

Luke Boswell (Jul 13 2025 at 21:45):

Kiryl Dziamura (Jul 13 2025 at 21:46):

Maybe a.from_num : num where [ num : Num ] -> Result(a, err) is enough? So it's up to roc how to parse the number. And then, if you want to have it at runtime, it just works with any Num

x = CustomNum.from_num(42) # x : CustomNum
y = CustomNum.from_num(factorial(4)) # y : Result(CustomNum, err)

Kiryl Dziamura (Jul 13 2025 at 21:46):

Kiryl Dziamura (Jul 13 2025 at 21:48):

x : MyCoolBase17
x = "24g"

Richard Feldman (Jul 13 2025 at 21:50):

I think we could make it cheap by caching whether a particular type unifies with string literals, num literals, etc. so that after the first unification, the others are just checking a boolean
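A minimal Python sketch of that caching idea, assuming the check boils down to "does this type expose the conversion function". `LiteralSupportCache` and its `full_checks` counter are hypothetical, and real unification would be more involved than a `getattr`.

```python
class LiteralSupportCache:
    """Remembers, per (type, method name), whether the type can absorb a literal."""

    def __init__(self):
        self._cache = {}
        self.full_checks = 0  # how many times the expensive check actually ran

    def accepts(self, type_, method_name: str) -> bool:
        key = (type_, method_name)
        if key not in self._cache:
            # First time this type meets this kind of literal: do the real check.
            self.full_checks += 1
            self._cache[key] = callable(getattr(type_, method_name, None))
        # Every later literal of the same kind is just a boolean lookup.
        return self._cache[key]


class Path:
    @staticmethod
    def from_str(s):
        return Path()


cache = LiteralSupportCache()
first = cache.accepts(Path, "from_str")
second = cache.accepts(Path, "from_str")
numeric = cache.accepts(Path, "from_int_digits")
```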

Richard Feldman (Jul 13 2025 at 21:51):

this doesn't work if you want to make a numeric type that's bigger than any of Roc's builtin types, e.g. an arbitrary-sized integer that can be bigger than U128

Richard Feldman (Jul 13 2025 at 21:51):

Kiryl Dziamura (Jul 13 2025 at 21:51):

Kiryl Dziamura (Jul 13 2025 at 21:52):

Richard Feldman (Jul 13 2025 at 21:56):

then we might get inconsistent handling for underscores, we can't introduce new syntax without breaking existing parsers, etc.

Richard Feldman (Jul 13 2025 at 21:57):

easy fix for the hex/octal/etc is to add a base field, which the compiler would supply as 2, 8, 10, 16, etc:

a where [module(a).{
    from_int_digits : {
        is_negative : Bool,
        digits : Iter(U8),
        base : U8,
    } -> Result(a, OutOfRange),
}]
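Sketched in Python (not Roc), the record with a base field might be produced and consumed roughly like this. `parse_literal` is a hypothetical compiler-side helper that assumes the lexer has already validated the literal; the standalone `from_int_digits` here just returns a Python int.

```python
def parse_literal(source: str):
    """Hypothetical compiler-side step: split a literal into
    (is_negative, digits, base), dropping underscores and the base prefix."""
    is_negative = source.startswith("-")
    body = source[1:] if is_negative else source
    prefixes = {"0x": 16, "0o": 8, "0b": 2}
    base = prefixes.get(body[:2].lower(), 10)
    if base != 10:
        body = body[2:]
    # int(ch, 16) maps '0'-'9' and 'a'-'f' to 0-15; the lexer is assumed to
    # have rejected digits that are too large for the base.
    digits = [int(ch, 16) for ch in body if ch != "_"]
    return is_negative, digits, base


def from_int_digits(is_negative: bool, digits, base: int) -> int:
    """What a custom number type would have to do with the record."""
    total = 0
    for d in digits:
        total = total * base + d
    return -total if is_negative else total
```

For example, `parse_literal("0x2A")` yields `(False, [2, 10], 16)`, which `from_int_digits` turns into 42.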

Kiryl Dziamura (Jul 13 2025 at 21:59):

num literals already have their meaning. if you use 101 you know the value is always 101. but if your custom type implements base2 there - you get an obscure bug. thus there's 0b101 notation

Richard Feldman (Jul 13 2025 at 22:00):

I mean custom number types can always have bugs haha, the question is how error-prone it is for them to implement what we want correctly

Richard Feldman (Jul 13 2025 at 22:00):

and I think giving them arbitrary strings is a lot more error-prone, especially because most number literals won't have underscores (and also they might not think to parse hexadecimal)

Richard Feldman (Jul 13 2025 at 22:01):

seems super easy to just give them digits, a base, and whether it's negative; that way they know exactly what they need to handle. seems like you'd almost need to go out of your way to not handle the edge cases correctly haha

Kiryl Dziamura (Jul 13 2025 at 22:02):

I'm wondering what it is that we want here, because x = CustomType.from_num(101) is one thing and x = 101 with an inferred type for x is another

Richard Feldman (Jul 13 2025 at 22:03):

x = 101 works the same way as today, aside from the type behind the scenes being different

Kiryl Dziamura (Jul 13 2025 at 22:03):

Kiryl Dziamura (Jul 13 2025 at 22:04):

Richard Feldman (Jul 13 2025 at 22:04):

right, which I think is exactly what you want when it comes to number literals (and string literals in the cases mentioned above)

Richard Feldman (Jul 13 2025 at 22:05):

like nobody wants to have to do MyCustomNum.from_num(42) * MyCustomNum.from_num(2) when they could write 42 * 2

Luke Boswell (Jul 13 2025 at 22:17):

One risk... could this be abused to make roc code hard to understand at first glance? Particularly if there's no type annotation. The nominal custom number or thing type might be far away, which makes it hard to understand what you're creating :thinking:

Luke Boswell (Jul 13 2025 at 22:18):

I can't really think of ways to abuse it... but the misuse cases might be good to think about too

Luke Boswell (Jul 13 2025 at 22:19):

Could you accidentally put a number or string literal in and it silently becomes the thing because they implemented that method... and now you're confused somehow?

Kiryl Dziamura (Jul 13 2025 at 22:24):

I think that array.map(calc) where calc does math with num literals would work differently based on the array type, and it bothers me. but that's already true with the operator overloading that roc would provide, so num literal overloading won't add any more confusion to the mix. it would only make the mix more consistent

Kiryl Dziamura (Jul 13 2025 at 22:25):

yes, technically it's not overloading, but it feels like that when you don't see types

Richard Feldman (Jul 13 2025 at 22:41):

yeah I'd be curious to see any examples anyone can think of of where this could be misused or lead to confusion

Luke Boswell (Jul 13 2025 at 22:49):

I've scratched my brain, and explored a bit with various LLMs... I cannot think of anything but upsides.

Luke Boswell (Jul 13 2025 at 22:50):

Anton (Jul 14 2025 at 06:20):

It feels a bit like hidden magic, but I'm in favor overall, syncing between Path and File in basic-cli is already a chore.

Brendan Hansknecht (Jul 14 2025 at 14:08):

I would just make it from bytes instead of digits, because there is also handling for decimal points and exponents

Brendan Hansknecht (Jul 14 2025 at 14:08):

On top of that, I think a list would be more convenient for that function than an iterator

Brendan Hansknecht (Jul 14 2025 at 14:09):

Brendan Hansknecht (Jul 14 2025 at 14:10):

I mean maybe something richer is fine, but I would make it like a list of an enum that is a digit wrapper or decimal point and a list for the exponent digits. Maybe make the base explicit as well in the enum...but not sure

Richard Feldman (Jul 14 2025 at 14:14):

yeah I guess a List would be fine since the allocation would happen at compile time anyway

Brendan Hansknecht (Jul 14 2025 at 14:44):

Even better, since it is injected by the compiler, you can make a seamless slice targeting the original source bytes.

Brendan Hansknecht (Jul 14 2025 at 14:44):

Richard Feldman (Jul 14 2025 at 15:19):

not if there are underscores in it, plus we need to convert from ASCII to a number (including hexadecimal conversion if necessary)

Brendan Hansknecht (Jul 14 2025 at 15:21):

Ok...we could still do a one-time copy with reformatting into an arena for all numbers and then give slices to each call

Brendan Hansknecht (Jul 14 2025 at 15:21):

Kiryl Dziamura (Jul 14 2025 at 15:52):

I still don't understand how num literals which have established meaning because of their format (0b, 0x, 0o, 42), have any meaning in decoding not from value but from representation. I get that i128 is the biggest range, but if we need bigger dec, why can't we introduce bigint notation that can be used only with custom types?

Kiryl Dziamura (Jul 14 2025 at 15:55):

x = 42
y = 0x2A

Sky Rose (Jul 14 2025 at 16:09):

1.0e2 and 1.00e2 could be meaningfully different to a scientific library that keeps track of precision.

Kiryl Dziamura (Jul 14 2025 at 16:27):

I honestly don't understand it. It means that roc provides a limited number of num notations and the user is able to change their meaning, with no possibility of creating their own notations.
Also, num notation becomes a union itself implicitly (because it's possible to have 42 != 0x2A or 1.0e2 != 1.00e2)

Kiryl Dziamura (Jul 14 2025 at 16:39):

Richard Feldman (Jul 14 2025 at 16:52):

Richard Feldman (Jul 14 2025 at 16:53):

giving people control of the meaning of arbitrary number strings is how we end up being unable to change things in the future

Richard Feldman (Jul 14 2025 at 16:53):

an extreme example of userspace constricting language design space being smooshMap

Richard Feldman (Jul 14 2025 at 16:54):

I think it's a good point that different notations being different seems bad, but then maybe that means we should put them all in a consistent format (e.g. take care of the base for you)

Brendan Hansknecht (Jul 14 2025 at 17:03):

I think we likely should normalize numbers to a single format before sending them to the underlying number constructor

Brendan Hansknecht (Jul 14 2025 at 17:04):

So I think that we should guarantee that 0x2 0b10 and 2 are all indistinguishable in the number constructor

Brendan Hansknecht (Jul 14 2025 at 17:05):

Brendan Hansknecht (Jul 14 2025 at 17:06):

If we normalize, that guarantees that 0x12 == 18 for all number types if they support that value (it could also just fail to initialize at compile time)
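A small Python sketch of what normalization could mean here, assuming the compiler reduces every integer literal to a sign plus base-10 digits before calling the number constructor. `normalize` is a hypothetical helper, and it leans on Python's `int(text, 0)` to recognize the `0x`, `0o`, and `0b` prefixes.

```python
def normalize(source: str):
    """Hypothetical compiler step: reduce any integer literal to
    (is_negative, base-10 digits) so the notation is no longer visible."""
    is_negative = source.startswith("-")
    body = source.lstrip("-").replace("_", "")
    # Base 0 tells Python to infer the base from a 0x / 0o / 0b prefix.
    # Python ints are arbitrary-size, so nothing is lost for huge literals.
    value = int(body, 0)
    digits = [int(ch) for ch in str(value)]
    return is_negative, digits


# The constructor receives the same input for every spelling of 18...
eighteen = [normalize(s) for s in ("0x12", "0b1_0010", "0o22", "18")]
# ...and likewise for 0x2, 0b10, and 2.
two = [normalize(s) for s in ("0x2", "0b10", "2")]
```

Under this assumption a custom type has no way to tell `0x12` from `18`, so it cannot give them different meanings.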

Richard Feldman (Jul 14 2025 at 17:06):

Richard Feldman (Jul 14 2025 at 17:07):

Brendan Hansknecht (Jul 14 2025 at 17:12):

Brendan Hansknecht (Jul 14 2025 at 17:13):

Kiryl Dziamura (Jul 14 2025 at 17:18):

Normalization is a good idea, thank you.
A list of digits is required to support bigint implementations?

Richard Feldman (Jul 14 2025 at 17:23):

Richard Feldman (Jul 14 2025 at 17:24):

would it be sufficient in the general case to do digits plus exponent digits? :thinking:

Kiryl Dziamura (Jul 14 2025 at 17:24):

Richard Feldman (Jul 14 2025 at 17:24):

Kiryl Dziamura (Jul 14 2025 at 17:28):

CustomNum.from_digits([2, 0, 2, 5])

It's ok, but now if you want to implement serialization, you need to reproduce the implementation for bytes.
What I mean is, isn't a bytes list superior?

Richard Feldman (Jul 14 2025 at 17:29):

Kiryl Dziamura (Jul 14 2025 at 17:30):

Richard Feldman (Jul 14 2025 at 17:30):

Kiryl Dziamura (Jul 14 2025 at 17:31):

Richard Feldman (Jul 14 2025 at 17:32):

Kiryl Dziamura (Jul 14 2025 at 17:34):

Richard Feldman (Jul 14 2025 at 17:34):

Richard Feldman (Jul 14 2025 at 17:35):

I think we may actually want to name it more explicitly for its purpose, e.g. from_literal

Richard Feldman (Jul 14 2025 at 17:35):

so it's clear that this type is intended to be compatible with number literals (in a similar way to why we want it to be named plus if you want to work with +, as opposed to naming it add)

Kiryl Dziamura (Jul 14 2025 at 19:52):

Signed(a) : [ Positive(a), Negative(a) ]
Digits : (List(U8), Signed(List(U8))) # value, exp
from_literal : [ NaN, ..Signed([Infinity, Value(Digits)]) ]-> ...

Jasper Woudenberg (Jul 15 2025 at 05:56):

Couple of potential downsides inspired by Haskell, which does a similar thing for literals (if the right extensions are enabled):

Jasper Woudenberg (Jul 15 2025 at 06:10):

(I quite like the proposed approach by the way, think I would enjoy using it, just kicking the tires a bit.)

Possible way from_str could be abused: building library APIs that put some complicated parsing in it. For instance:

The lure of using from_str for things like this is that you can get the parsing to happen at compile time. The downside is that error message quality when parsing fails will be bad.

Kiryl Dziamura (Jul 15 2025 at 06:13):

If Foo.from_string : Str -> Result(t, err) is defined, the string literal would be passed to it at comptime and the error would be whatever err returns. So the message will be something like error occurred during creation of type Foo from string literal: err

Good point. The problem is not even in the literal vs string but in Result(type, error) vs just type

Kiryl Dziamura (Jul 15 2025 at 06:15):

Can the same abuse happen without the comptime evaluation and literal conversion?

Luke Boswell (Jul 15 2025 at 06:17):

These sound cool :smiley: -- shame about the errors, but that's probably ok if you're ok with something as quick and ergonomic as this. Maybe it might be a problem if people start building webservers and all the endpoints are set up this way :shrug:
