Stream: ideas

Topic: custom numbers and strings


view this post on Zulip Richard Feldman (Jul 13 2025 at 20:18):

so it seems to me that the best way to do custom numbers is to have number literals have the type:

a where [module(a).{
    from_int_digits : {
        is_negative : Bool,
        digits : Iter(U8),
    } -> Result(a, OutOfRange),
}]

and then the compiler can unify that with either a builtin number type or a custom user space number

view this post on Zulip Richard Feldman (Jul 13 2025 at 20:19):

we can also have from_dec_digits for fractional numbers

view this post on Zulip Richard Feldman (Jul 13 2025 at 20:20):

these can also be bigger than builtin numbers because it's just an iterator of digits (we'd strip out the underscores for you, and give you the negative as a bool)

view this post on Zulip Richard Feldman (Jul 13 2025 at 20:21):

so userspace arbitrary-size integers can Just Work with number literals even though builtin numbers only go up to I128/U128
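
A rough illustration of the digits idea, written in Python rather than Roc since the proposed API is still hypothetical. The names `BigInt` and `Tiny`, and the use of `None` to stand in for `Err(OutOfRange)`, are assumptions made only for this sketch.

```python
from typing import Iterable, Optional


def fold_decimal(digits: Iterable[int]) -> int:
    """Fold decimal digits (each 0-9, most significant first) into an int."""
    total = 0
    for d in digits:
        total = total * 10 + d
    return total


class BigInt:
    """Hypothetical user-space integer with no upper bound."""

    def __init__(self, value: int):
        self.value = value

    @classmethod
    def from_int_digits(cls, is_negative: bool, digits: Iterable[int]) -> Optional["BigInt"]:
        total = fold_decimal(digits)
        return cls(-total if is_negative else total)


class Tiny:
    """Hypothetical bounded type that only accepts 0 through 9."""

    def __init__(self, value: int):
        self.value = value

    @classmethod
    def from_int_digits(cls, is_negative: bool, digits: Iterable[int]) -> Optional["Tiny"]:
        total = fold_decimal(digits)
        if is_negative or total > 9:
            return None  # stands in for Err(OutOfRange)
        return cls(total)


# 2**128 is one more than the largest U128 value, so no builtin could hold it.
too_big_for_u128 = [int(ch) for ch in str(2**128)]
big = BigInt.from_int_digits(False, too_big_for_u128)
```

Because the input is only a sequence of digits, the same literal can feed a type of any size; each type decides for itself what counts as out of range.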

view this post on Zulip Richard Feldman (Jul 13 2025 at 20:23):

this all seems reasonable and nice!

view this post on Zulip Richard Feldman (Jul 13 2025 at 20:24):

we can call that function at compile time as part of compile-time evaluation of constants, and if it returns Err we give a compile error
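
A minimal Python model of that compile-time step, under stated assumptions: `Percent`, `CompileError`, and `eval_int_literal` are hypothetical names, and `None` stands in for `Err(OutOfRange)`. It is only meant to show the shape of "call the conversion, turn `Err` into a compile error".

```python
from typing import List, Optional


class CompileError(Exception):
    """Stands in for a compiler diagnostic."""


class Percent:
    """Hypothetical custom number type that accepts 0 through 100."""

    def __init__(self, value: int):
        self.value = value

    @classmethod
    def from_int_digits(cls, is_negative: bool, digits: List[int]) -> Optional["Percent"]:
        total = 0
        for d in digits:
            total = total * 10 + d
        if is_negative or total > 100:
            return None  # stands in for Err(OutOfRange)
        return cls(total)


def eval_int_literal(target, text: str):
    """Lex a decimal literal, call the target type's conversion function,
    and report a failed conversion as a compile error."""
    is_negative = text.startswith("-")
    # Underscores are stripped before the type ever sees the digits.
    digits = [int(ch) for ch in text.lstrip("-").replace("_", "")]
    result = target.from_int_digits(is_negative, digits)
    if result is None:
        raise CompileError(f"literal {text} is out of range for {target.__name__}")
    return result
```

With this model, `eval_int_literal(Percent, "42")` produces a value, while `eval_int_literal(Percent, "1_000")` raises `CompileError` before any program code would run.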

view this post on Zulip Richard Feldman (Jul 13 2025 at 20:25):

we already do this for the builtins, and should be straightforward to generalize it to user-defined number types

view this post on Zulip Richard Feldman (Jul 13 2025 at 20:25):

the repl solution of not showing the type of number literals also seems good to me

view this post on Zulip Richard Feldman (Jul 13 2025 at 20:26):

this all brings up another thing that's been bothering me for a long time, and which could be nicely solved by extending this design to strings

view this post on Zulip Richard Feldman (Jul 13 2025 at 20:26):

the basic problem is that we have functions like read! for reading files

view this post on Zulip Richard Feldman (Jul 13 2025 at 20:36):

and I often want to give them a string literal, but also I often want to give them a Path, and paths might not be valid utf-8 (e.g. if I got the path from a Dir.list! and the path contained invalid utf-8 on the filesystem, which every major filesystem permits)

view this post on Zulip Richard Feldman (Jul 13 2025 at 20:38):

one solution is to have File.read! and File.read_path!, one of which takes a Str and one of which takes a Path, and then duplicate every operation like that

view this post on Zulip Richard Feldman (Jul 13 2025 at 20:38):

it's a similar situation with Url

view this post on Zulip Richard Feldman (Jul 13 2025 at 20:39):

it's super common for languages to have a solution for this, so that you can give file I/O operations a string literal or a Path and either one Just Works

view this post on Zulip Richard Feldman (Jul 13 2025 at 20:39):

we could have this if we did string literals the same way as number literals

view this post on Zulip Richard Feldman (Jul 13 2025 at 20:43):

in other words, number literals have this type:

a where [module(a).{
    from_int_digits : {
        is_negative : Bool,
        digits : Iter(U8),
    } -> Result(a, OutOfRange),
}]

and then string literals have this type:

a where [module(a).{
    from_str : Str -> Result(a, BadStr)
}]

view this post on Zulip Richard Feldman (Jul 13 2025 at 20:45):

so then as long as the Path module has that function, you can write a function that takes a Path and call it passing a string literal, e.g. both of these can Just Work:

contents = read!(my_path, Json.utf8)?
contents = read!("x.json", Json.utf8)?
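
To make the two calls above concrete, here is a hedged Python sketch of how a literal might be resolved to a `Path` through `from_str`. `resolve_literal` is a hypothetical stand-in for what the compiler would do, the rule about NUL characters is an invented example of a `BadStr` condition, and `read` is a stub that does no real I/O.

```python
from typing import Optional


class Path:
    """Hypothetical path type that stores raw bytes, so it can hold
    paths that are not valid UTF-8."""

    def __init__(self, raw: bytes):
        self.raw = raw

    @classmethod
    def from_str(cls, s: str) -> Optional["Path"]:
        if "\0" in s:
            return None  # stands in for Err(BadStr)
        return cls(s.encode("utf-8"))


def resolve_literal(target, literal: str):
    """Model of the compiler's handling of a string literal whose
    expected type is `target`."""
    value = target.from_str(literal)
    if value is None:
        raise ValueError(f"bad string literal for {target.__name__}: {literal!r}")
    return value


def read(path: Path) -> bytes:
    # A real read! would touch the filesystem; this stub echoes the path bytes.
    return path.raw


my_path = Path(b"caf\xe9.json")  # not valid UTF-8, but still a Path
from_value = read(my_path)
from_literal = read(resolve_literal(Path, "x.json"))
```

The function `read` only ever sees a `Path`; the literal case differs solely in that the conversion is inserted on the caller's behalf.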

view this post on Zulip Richard Feldman (Jul 13 2025 at 20:45):

same with Urls

view this post on Zulip Richard Feldman (Jul 13 2025 at 20:46):

this is something I've always wanted for the file I/O and Http APIs, and if we're already doing it for numbers, it feels like a relatively small change to do it for strings too

view this post on Zulip Richard Feldman (Jul 13 2025 at 20:47):

and it would really cut down on API surface area without sacrificing correctness at all

view this post on Zulip Richard Feldman (Jul 13 2025 at 20:47):

I'm curious what others think of all this!

view this post on Zulip Kiryl Dziamura (Jul 13 2025 at 21:16):

I would expect from_bytes instead of from_int_digits. What digit a is if it's written in hex notation? 10? If it's 10, then hex was parsed by compiler. If it's parsed by compiler - why not bytes immediately? The same with numbers with dot

view this post on Zulip Richard Feldman (Jul 13 2025 at 21:21):

I was assuming we'd convert from hex to decimal for you

view this post on Zulip Kiryl Dziamura (Jul 13 2025 at 21:21):

or we can introduce r notation (raw? idk)? 0r<whatever>. like, I want to use base17 so I want to have g as well (I'm not serious here, but you get the idea)

view this post on Zulip Richard Feldman (Jul 13 2025 at 21:21):

I assume bytes immediately would be less convenient for arbitrary-size types but I could be wrong

view this post on Zulip Richard Feldman (Jul 13 2025 at 21:24):

anyway I think we could iterate on exactly what format to provide; the real question in my mind is whether this is the overall design we want to use for literals!

view this post on Zulip Kiryl Dziamura (Jul 13 2025 at 21:26):

the core idea is reasonable. but my experience is too limited to come up with a use case :smile:

view this post on Zulip Kiryl Dziamura (Jul 13 2025 at 21:28):

I mean for numbers. huh. maybe built-in validation? e.g., a type for auto-wrapping uints

view this post on Zulip Kiryl Dziamura (Jul 13 2025 at 21:33):

and also string validation. besides path validation, it can be SQL validation! or css validation, or you name it. would it work with string interpolation? likely not

view this post on Zulip Richard Feldman (Jul 13 2025 at 21:34):

string interpolation Just Works bc first we resolve the interpolation, giving us a normal string, and then we pass that to the conversion function

view this post on Zulip Richard Feldman (Jul 13 2025 at 21:34):

so it never needs to know it was interpolated in the first place
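
A small Python sketch of that two-step order, assuming a hypothetical `Url` type whose `from_str` only accepts http(s) URLs; `interpolated_literal` and the use of `str.format` to model interpolation are illustrative choices, not Roc's mechanism.

```python
from typing import Optional


class Url:
    """Hypothetical URL type; validation here is deliberately simplistic."""

    def __init__(self, text: str):
        self.text = text

    @classmethod
    def from_str(cls, s: str) -> Optional["Url"]:
        if not s.startswith(("http://", "https://")):
            return None  # stands in for Err(BadStr)
        return cls(s)


def interpolated_literal(target, template: str, **values):
    # Step 1: resolve the interpolation, producing an ordinary string.
    resolved = template.format(**values)
    # Step 2: pass the resolved string to the conversion function,
    # which cannot tell that interpolation ever happened.
    return target.from_str(resolved)


url = interpolated_literal(Url, "https://example.com/users/{id}", id=42)
```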

view this post on Zulip Richard Feldman (Jul 13 2025 at 21:35):

custom numbers have been one of the most frequently requested language features over the years, and they have been deal-breakers

view this post on Zulip Richard Feldman (Jul 13 2025 at 21:36):

as in, people who have arrived on Zulip and asked whether Roc supports custom number types have mostly tended to leave after learning that we don't

view this post on Zulip Richard Feldman (Jul 13 2025 at 21:36):

which is unfortunate given Roc's goal of being nice for the long tail of use cases :smile:

view this post on Zulip Kiryl Dziamura (Jul 13 2025 at 21:36):

I mean, custom numbers and custom parsers for num literals are not the same

view this post on Zulip Luke Boswell (Jul 13 2025 at 21:44):

Do you think there is much of a cost to doing this? Is it just a constant bit of extra processing at compile time for every number and string literal in source?

view this post on Zulip Luke Boswell (Jul 13 2025 at 21:45):

The ergonomics of this look amazing :heart_eyes:

view this post on Zulip Kiryl Dziamura (Jul 13 2025 at 21:46):

Maybe a.from_num : num where [ num : Num ] -> Result(a, err) is enough? So it's up to roc how to parse number. And then, if you want to have it in runtime - it just works with any Num

x = CustomNum.from_num(42) # x : CustomNum
y = CustomNum.from_num(factorial(4)) # y : Result(CustomNum, err)

view this post on Zulip Kiryl Dziamura (Jul 13 2025 at 21:46):

so there's also symmetry with from_str

view this post on Zulip Kiryl Dziamura (Jul 13 2025 at 21:48):

and you still can probably do this:

x : MyCoolBase17
x = "24g"

and then manually parse the string because MyCoolBase17 has from_str

view this post on Zulip Richard Feldman (Jul 13 2025 at 21:50):

Luke Boswell said:

Do you think there is much of a cost to doing this? Is it just a constant bit of extra processing at compile time for every number and string literal in source?

I think we could make it cheap by caching whether a particular type unifies with string literals, num literals, etc. so that after the first unification, the others are just checking a boolean
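
One way to picture that caching, as a Python sketch. `LiteralSupportCache`, the `kind` strings, and the method-name lookup are all assumptions for illustration; a real type checker would cache a unification result rather than an attribute check.

```python
class LiteralSupportCache:
    """Remembers whether a type accepts a given kind of literal."""

    METHOD_FOR_KIND = {"str": "from_str", "int": "from_int_digits"}

    def __init__(self):
        self._cache = {}
        self.slow_checks = 0  # counts how often the full check actually ran

    def accepts(self, target, kind: str) -> bool:
        key = (target, kind)
        if key not in self._cache:
            # Slow path: stands in for the first full unification.
            self.slow_checks += 1
            method = getattr(target, self.METHOD_FOR_KIND[kind], None)
            self._cache[key] = callable(method)
        # Fast path: later literals of the same type just read a boolean.
        return self._cache[key]


class Path:
    """Hypothetical type that supports string literals only."""

    @classmethod
    def from_str(cls, s: str):
        return cls()


cache = LiteralSupportCache()
results = [cache.accepts(Path, "str") for _ in range(1000)]
```

After a thousand string literals of type `Path`, the slow check has still run only once.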

view this post on Zulip Richard Feldman (Jul 13 2025 at 21:51):

Kiryl Dziamura said:

Maybe a.from_num : num where [ num : Num ] -> Result(a, err) is enough? So it's up to roc how to parse number. And then, if you want to have it in runtime - it just works with any Num

x = CustomNum.from_num(42) # x : CustomNum
y = CustomNum.from_num(factorial(4)) # y : Result(CustomNum, err)

this doesn't work if you want to make a numeric type that's bigger than any of Roc's builtin types, e.g. an arbitrary-sized integer that can be bigger than U128

view this post on Zulip Richard Feldman (Jul 13 2025 at 21:51):

that's where the digits design comes from

view this post on Zulip Kiryl Dziamura (Jul 13 2025 at 21:51):

this doesn't work if you want to make a numeric type that's bigger than any of Roc's builtin types, e.g. an arbitrary-sized integer that can be bigger than U128

but from_str works

view this post on Zulip Kiryl Dziamura (Jul 13 2025 at 21:52):

you have to implement parsing on your own, yes, but is it a big deal?

view this post on Zulip Richard Feldman (Jul 13 2025 at 21:56):

then we might get inconsistent handling for underscores, we can't introduce new syntax without breaking existing parsers, etc.

view this post on Zulip Richard Feldman (Jul 13 2025 at 21:57):

easy fix for the hex/octal/etc is to add a base field, which the compiler would supply as 2, 8, 10, 16, etc:

a where [module(a).{
    from_int_digits : {
        is_negative : Bool,
        digits : Iter(U8),
        base: U8,
    } -> Result(a, OutOfRange),
}]
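
A Python sketch of what a type author would write against that signature with the extra `base` field. The free function `from_int_digits` returning a plain `int`, and raising `ValueError` for a malformed digit, are simplifications assumed here.

```python
from typing import Iterable


def from_int_digits(is_negative: bool, digits: Iterable[int], base: int) -> int:
    """Fold digits (most significant first) in the given base into an int."""
    total = 0
    for d in digits:
        if not 0 <= d < base:
            raise ValueError(f"digit {d} is not valid in base {base}")
        total = total * base + d
    return -total if is_negative else total


# 0x2A arrives as digits [2, 10] with base 16; 0b101 as [1, 0, 1] with base 2.
hex_value = from_int_digits(False, [2, 10], 16)
bin_value = from_int_digits(False, [1, 0, 1], 2)
```

The only change from the decimal-only version is that the multiplier comes from `base` instead of being fixed at 10.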

view this post on Zulip Kiryl Dziamura (Jul 13 2025 at 21:59):

num literals already have their meaning. if you use 101 you know the value is always 101. but if your custom type implements base2 there - you get an obscure bug. thus there's 0b101 notation

view this post on Zulip Richard Feldman (Jul 13 2025 at 22:00):

I mean custom number types can always have bugs haha, the question is how error-prone it is for them to implement what we want correctly

view this post on Zulip Richard Feldman (Jul 13 2025 at 22:00):

and I think giving them arbitrary strings is a lot more error-prone, especially because most number literals won't have underscores (and also they might not think to parse hexadecimal)

view this post on Zulip Richard Feldman (Jul 13 2025 at 22:01):

seems super easy to just give them digits, a base, and whether it's negative; that way they know exactly what they need to handle. seems like you'd almost need to go out of your way to not handle the edge cases correctly haha

view this post on Zulip Kiryl Dziamura (Jul 13 2025 at 22:02):

I'm wondering what is what we want here. because one thing is x = CustomType.from_num(101) and another x = 101 with inferred type for x

view this post on Zulip Richard Feldman (Jul 13 2025 at 22:03):

x = 101 works the same way as today, aside from the type behind the scenes being different

view this post on Zulip Kiryl Dziamura (Jul 13 2025 at 22:03):

but you're right. it's not related to how custom type parses num literal

view this post on Zulip Kiryl Dziamura (Jul 13 2025 at 22:04):

It's just implicit without any additional notation

view this post on Zulip Richard Feldman (Jul 13 2025 at 22:04):

right, which I think is exactly what you want when it comes to number literals (and string literals in the cases mentioned above)

view this post on Zulip Richard Feldman (Jul 13 2025 at 22:05):

like nobody wants to have to do MyCustomNum.from_num(42) * MyCustomNum.from_num(2) when they could write 42 * 2

view this post on Zulip Luke Boswell (Jul 13 2025 at 22:17):

One risk... could this be abused to make roc code hard to understand at first glance? Particularly if there's no type annotation. The nominal custom number or thing type might be far away, which makes it hard to understand what you're creating :thinking:

view this post on Zulip Luke Boswell (Jul 13 2025 at 22:18):

I can't really think of ways to abuse it... but the misuse cases might be good to think about too

view this post on Zulip Luke Boswell (Jul 13 2025 at 22:19):

Could you accidentally put a number or string literal in and it silently becomes the thing because they implemented that method... and now you're confused somehow?

view this post on Zulip Kiryl Dziamura (Jul 13 2025 at 22:24):

I think that array.map(calc) where calc does math with num literals would work differently based on array type and it bothers me. but that's already true with the operator overloading that roc would provide, so num literal overloading won't add any more confusion to the mix. it would only make the mix more consistent

view this post on Zulip Kiryl Dziamura (Jul 13 2025 at 22:25):

yes, technically it's not overloading, but it feels like that when you don't see types

view this post on Zulip Richard Feldman (Jul 13 2025 at 22:41):

yeah I'd be curious to see any examples anyone can think of of where this could be misused or lead to confusion

view this post on Zulip Luke Boswell (Jul 13 2025 at 22:49):

I've scratched my brain, and explored a bit with various LLMs... I cannot think of anything but upsides.

view this post on Zulip Luke Boswell (Jul 13 2025 at 22:50):

Richard Feldman said:

we can call that function at compile time as part of compile-time evaluation of constants, and if it returns Err we give a compile error

This comptime Err part is really nice

view this post on Zulip Anton (Jul 14 2025 at 06:20):

It feels a bit like hidden magic, but I'm in favor overall, syncing between Path and File in basic-cli is already a chore.

view this post on Zulip Brendan Hansknecht (Jul 14 2025 at 14:08):

I would just make from bytes instead of digits. Cause there is also handling for decimal point and exponents

view this post on Zulip Brendan Hansknecht (Jul 14 2025 at 14:08):

On top of that, I think a list would be more convenient for that function than an iterator

view this post on Zulip Brendan Hansknecht (Jul 14 2025 at 14:09):

But maybe neither is a big deal

view this post on Zulip Brendan Hansknecht (Jul 14 2025 at 14:10):

I mean maybe something richer is fine, but I would make it like a list of an enum that is a digit wrapper or decimal point and a list for the exponent digits. Maybe make the base explicit as well in the enum...but not sure

view this post on Zulip Richard Feldman (Jul 14 2025 at 14:14):

yeah I guess a List would be fine since the allocation would happen at compile time anyway

view this post on Zulip Brendan Hansknecht (Jul 14 2025 at 14:44):

Even better, since it is injected by the compiler, you can make a seamless slice targeting the original source bytes.

view this post on Zulip Brendan Hansknecht (Jul 14 2025 at 14:44):

Or some arena with all the strings in it

view this post on Zulip Richard Feldman (Jul 14 2025 at 15:19):

not if there are underscores in it, plus we need to convert from ASCII to a number (including hexadecimal conversion if necessary)

view this post on Zulip Brendan Hansknecht (Jul 14 2025 at 15:21):

Ok...we still could do a one-time copy with reformatting to an arena for all numbers and then give slices to each call

view this post on Zulip Brendan Hansknecht (Jul 14 2025 at 15:21):

But yeah, less convenient

view this post on Zulip Kiryl Dziamura (Jul 14 2025 at 15:52):

I still don't understand how num literals which have established meaning because of their format (0b, 0x, 0o, 42), have any meaning in decoding not from value but from representation. I get that i128 is the biggest range, but if we need bigger dec, why can't we introduce bigint notation that can be used only with custom types?

view this post on Zulip Kiryl Dziamura (Jul 14 2025 at 15:55):

Putting it differently,

x = 42
y = 0x2A

Do you expect x and y are equal or not?

view this post on Zulip Sky Rose (Jul 14 2025 at 16:09):

1.0e2 and 1.00e2 could be meaningfully different to a scientific library that keeps track of precision.

view this post on Zulip Kiryl Dziamura (Jul 14 2025 at 16:27):

I honestly don't understand it. It means that roc provides a limited number of num notations and user is able to change their meaning, with no possibility of creating their own notations.
Also, num notation becomes a union itself implicitly (because it's possible to have 42 != 0x2A or 1.0e2 != 1.00e2)

view this post on Zulip Kiryl Dziamura (Jul 14 2025 at 16:39):

It's the same as if this was possible: "x" != "\u{0078}"

view this post on Zulip Richard Feldman (Jul 14 2025 at 16:52):

Kiryl Dziamura said:

I honestly don't understand it. It means that roc provides a limited number of num notations and user is able to change their meaning, with no possibility of creating their own notations.

this is a feature, not a bug

view this post on Zulip Richard Feldman (Jul 14 2025 at 16:53):

giving people control of the meaning of arbitrary number strings is how we end up being unable to change things in the future

view this post on Zulip Richard Feldman (Jul 14 2025 at 16:53):

an extreme example of userspace constricting language design space being smooshMap

view this post on Zulip Richard Feldman (Jul 14 2025 at 16:54):

I think it's a good point that different notations being different seems bad, but then maybe that means we should put them all in a consistent format (e.g. take care of the base for you)

view this post on Zulip Brendan Hansknecht (Jul 14 2025 at 17:03):

I think we likely should normalize numbers to a single format before sending them to the underlying number constructor

view this post on Zulip Brendan Hansknecht (Jul 14 2025 at 17:04):

So I think that we should guarantee that 0x2 0b10 and 2 are all indistinguishable in the number constructor

view this post on Zulip Brendan Hansknecht (Jul 14 2025 at 17:05):

I don't think we can guarantee that for exponents though

view this post on Zulip Brendan Hansknecht (Jul 14 2025 at 17:05):

But they should still be given as just decimal digits

view this post on Zulip Brendan Hansknecht (Jul 14 2025 at 17:06):

If we normalize, that guarantees that 0x12 == 18 for all number types if they support that value (could also just fail to initialize at compile time)
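
A hedged Python sketch of such a normalization pass: every integer notation is reduced to the same `(is_negative, decimal digits)` pair before any number constructor sees it. `normalize_int_literal` and the prefix table are assumptions, and Python's arbitrary-precision `int` is used only as a convenient intermediate.

```python
from typing import List, Tuple

PREFIXES = {"0x": 16, "0o": 8, "0b": 2}


def normalize_int_literal(text: str) -> Tuple[bool, List[int]]:
    """Reduce an integer literal in any supported notation to a sign flag
    plus its decimal digits, most significant first."""
    is_negative = text.startswith("-")
    body = text.lstrip("-").replace("_", "")
    base = PREFIXES.get(body[:2].lower(), 10)
    if base != 10:
        body = body[2:]  # drop the 0x / 0o / 0b prefix
    value = int(body, base)  # Python ints are unbounded, so no I128 cap here
    return is_negative, [int(ch) for ch in str(value)]
```

Since `0x12` and `18` both come out as `(False, [1, 8])`, a custom type has no way to treat them differently.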

view this post on Zulip Richard Feldman (Jul 14 2025 at 17:06):

yep, makes sense!

view this post on Zulip Richard Feldman (Jul 14 2025 at 17:06):

so then we send decimal digits I guess?

view this post on Zulip Richard Feldman (Jul 14 2025 at 17:07):

(since we can't just send a plain integer since that would cap us at i128)

view this post on Zulip Brendan Hansknecht (Jul 14 2025 at 17:12):

Yeah, decimal digits with potential decimal point?

view this post on Zulip Brendan Hansknecht (Jul 14 2025 at 17:13):

And then exponent in the same form but never with a decimal point?

view this post on Zulip Kiryl Dziamura (Jul 14 2025 at 17:18):

Normalization is a good idea, thank you.
A list of digits is required to support bigint implementations?

view this post on Zulip Richard Feldman (Jul 14 2025 at 17:23):

I can't think of a better way to support bigint implementations :smile:

view this post on Zulip Richard Feldman (Jul 14 2025 at 17:24):

would it be sufficient in the general case to do digits plus exponent digits? :thinking:

view this post on Zulip Kiryl Dziamura (Jul 14 2025 at 17:24):

I'm thinking about how would it work without comptime evaluation

view this post on Zulip Richard Feldman (Jul 14 2025 at 17:24):

at that point I don't think we need to care about decimal point
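
A Python sketch of why digits plus an exponent can absorb the decimal point: each fractional digit simply lowers the exponent by one. `split_decimal_literal` is a hypothetical helper, and for brevity it returns the exponent as a plain Python `int` rather than as a list of exponent digits.

```python
from typing import List, Tuple


def split_decimal_literal(text: str) -> Tuple[bool, List[int], int]:
    """Return (is_negative, digits, exponent) such that the literal's
    magnitude equals the digits read as an integer, times 10**exponent."""
    is_negative = text.startswith("-")
    body = text.lstrip("-").replace("_", "").lower()
    mantissa, _, exp_text = body.partition("e")
    whole, _, frac = mantissa.partition(".")
    digits = [int(ch) for ch in whole + frac]
    # Moving the decimal point past each fractional digit costs one power of ten.
    exponent = (int(exp_text) if exp_text else 0) - len(frac)
    return is_negative, digits, exponent
```

Under this scheme `1.0e2` becomes digits `[1, 0]` with exponent 1, while `1.00e2` becomes `[1, 0, 0]` with exponent 0, so a precision-tracking library could still tell them apart even though no decimal point is passed.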

view this post on Zulip Kiryl Dziamura (Jul 14 2025 at 17:28):

CustomNum.from_digits([2, 0, 2, 5])

It's ok, but now if you want to implement serialization, you need to reproduce the implementation but for bytes.
What I mean is, isn't a bytes list superior?

view this post on Zulip Richard Feldman (Jul 14 2025 at 17:29):

what do the bytes mean?

view this post on Zulip Kiryl Dziamura (Jul 14 2025 at 17:30):

Memory representation

view this post on Zulip Richard Feldman (Jul 14 2025 at 17:30):

what does the memory mean though? haha

view this post on Zulip Richard Feldman (Jul 14 2025 at 17:30):

like a base-2 integer?

view this post on Zulip Richard Feldman (Jul 14 2025 at 17:30):

so it's as if you had a big-endian integer type that just keeps going?

view this post on Zulip Kiryl Dziamura (Jul 14 2025 at 17:31):

Like, how you serialize your custom number to send over the wire

view this post on Zulip Richard Feldman (Jul 14 2025 at 17:32):

well we can't know how the author of the custom number wants to do that

view this post on Zulip Kiryl Dziamura (Jul 14 2025 at 17:34):

Hm, true. so from_digits is useful only at comptime?

view this post on Zulip Richard Feldman (Jul 14 2025 at 17:34):

probably

view this post on Zulip Richard Feldman (Jul 14 2025 at 17:35):

I think we may actually want to name it more explicitly for its purpose, e.g. from_literal

view this post on Zulip Richard Feldman (Jul 14 2025 at 17:35):

so it's clear that this type is intended to be compatible with number literals (in a similar way to why we want it to be named plus if you want to work with +, as opposed to naming it add)

view this post on Zulip Kiryl Dziamura (Jul 14 2025 at 19:52):

Signed(a) : [ Positive(a), Negative(a) ]
Digits : (List(U8), Signed(List(U8))) # value, exp
from_literal : [ NaN, ..Signed([Infinity, Value(Digits)]) ]-> ...

Would it be sufficient? (I started forgetting roc syntax)

view this post on Zulip Jasper Woudenberg (Jul 15 2025 at 05:56):

Couple of potential downsides inspired by Haskell, which does a similar thing for literals (if the right extensions are enabled):

view this post on Zulip Jasper Woudenberg (Jul 15 2025 at 06:10):

(I quite like the proposed approach by the way, think I would enjoy using it, just kicking the tires a bit.)

Possible way from_str could be abused: building library APIs that put some complicated parsing in it. For instance:

The lure of using from_str for things like this is that you can get the parsing to happen at compile time. The downside is that error message quality when parsing fails will be bad.

view this post on Zulip Kiryl Dziamura (Jul 15 2025 at 06:13):

If a user passes a string literal as an argument to a function that expects a custom type Foo, what error would show?

If Foo.from_string : Str -> Result(t, err) is defined - the string literal would be passed to it at comptime and the error would be whatever err returns. So the message will be something like error occurred during creation of type Foo from string literal: err

maybe folks would be surprised when passing in a Str constructed some other way doesn't work.

Good point. The problem is not even in the literal vs string but in Result(type, error) vs just type

view this post on Zulip Kiryl Dziamura (Jul 15 2025 at 06:15):

abuse

Can the same abuse happen without the comptime evaluation and literal conversion?

view this post on Zulip Luke Boswell (Jul 15 2025 at 06:17):

A CLI arg-parsing API takes a string like mycmd [ARG] --help --foo <int> and generates a parser for it (some libraries in other languages take this approach).

A http server takes a string like /some/path/:arg?query and generates an endpoint pattern from it.

These sound cool :smiley: -- shame about the errors, but that's probably ok if you're ok with something as quick and ergonomic as this. Maybe it might be a problem if people start building webservers and all the endpoints are set up this way :shrug:


Last updated: Jun 16 2026 at 16:19 UTC