so it seems to me that the best way to do custom numbers is to have number literals have the type:
a where [module(a).{
    from_int_digits : {
        is_negative : Bool,
        digits : Iter(U8),
    } -> Result(a, OutOfRange),
}]
and then the compiler can unify that with either a builtin number type or a custom userspace number type
we can also have from_dec_digits for fractional numbers
these can also be bigger than builtin numbers, because it's just an iterator of digits (we'd strip out the underscores for you and tell you whether it's negative via a Bool)
so userspace arbitrary-size integers can Just Work with number literals even though builtin numbers only go up to I128/U128
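A rough Python model of the number-literal constraint above may help show why an iterator of digits lets userspace types go past U128. It is only a sketch under stated assumptions: Roc's Result is modeled as ("Ok", value) / ("Err", reason) tuples, a classmethod stands in for module(a).from_int_digits, and the names U128, BigInt, and _digits_to_int are hypothetical illustrations, not Roc APIs.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical Python model of the proposed constraint: a type "supports number
# literals" if it has a from_int_digits classmethod. Roc's Result is modeled as
# ("Ok", value) or ("Err", reason) tuples.


def _digits_to_int(is_negative: bool, digits: Iterable[int]) -> int:
    # Accumulate base-10 digits, most significant first.
    value = 0
    for d in digits:
        value = value * 10 + d
    return -value if is_negative else value


@dataclass(frozen=True)
class U128:
    value: int

    @classmethod
    def from_int_digits(cls, is_negative: bool, digits: Iterable[int]):
        n = _digits_to_int(is_negative, digits)
        if 0 <= n < 2**128:
            return ("Ok", cls(n))
        return ("Err", "OutOfRange")


@dataclass(frozen=True)
class BigInt:
    # Userspace arbitrary-size integer: any digit sequence is in range.
    value: int

    @classmethod
    def from_int_digits(cls, is_negative: bool, digits: Iterable[int]):
        return ("Ok", cls(_digits_to_int(is_negative, digits)))


# One more than U128's maximum, as the digits the compiler would hand over.
too_big = [int(ch) for ch in str(2**128)]
print(U128.from_int_digits(False, iter(too_big)))  # ('Err', 'OutOfRange')
print(BigInt.from_int_digits(False, iter(too_big)))
```

The bounded type and the unbounded type receive exactly the same input; only the conversion function decides whether the literal fits.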
this all seems reasonable and nice!
we can call that function at compile time as part of compile-time evaluation of constants, and if it returns Err we give a compile error
we already do this for the builtins, and it should be straightforward to generalize it to user-defined number types
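A minimal sketch of that compile-time step, again as a Python model. The compiler (here the hypothetical eval_int_literal) splits a decimal literal into digits, calls the inferred type's from_int_digits, and reports an Err as a compile error. U8, CompileError, and the message wording are assumptions for illustration, and hex and other bases are ignored.

```python
class CompileError(Exception):
    """Stands in for a compile-time diagnostic."""


class U8:
    def __init__(self, value: int):
        self.value = value

    @classmethod
    def from_int_digits(cls, is_negative, digits):
        n = 0
        for d in digits:
            n = n * 10 + d
        if is_negative:
            n = -n
        return ("Ok", cls(n)) if 0 <= n <= 255 else ("Err", "OutOfRange")


def eval_int_literal(target_type, source_text: str):
    # Hypothetical compiler step for a decimal integer literal whose type was
    # inferred as target_type: strip underscores and the sign, split the rest
    # into digits, run the conversion at compile time, and turn Err into an error.
    text = source_text.replace("_", "")
    is_negative = text.startswith("-")
    digits = [int(ch) for ch in text.lstrip("-")]
    tag, payload = target_type.from_int_digits(is_negative, iter(digits))
    if tag == "Err":
        raise CompileError(
            f"literal {source_text} does not fit in {target_type.__name__}: {payload}"
        )
    return payload


print(eval_int_literal(U8, "2_00").value)  # 200
```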
the repl solution of not showing the type of number literals also seems good to me
this all brings up another thing that's been bothering me for a long time, and which could be nicely solved by extending this design to strings
the basic problem is that we have functions like read! for reading files
and I often want to give them a string literal, but I also often want to give them a Path, and paths might not be valid UTF-8 (e.g. if I got the path from Dir.list! and it contained invalid UTF-8 on the filesystem, which every major filesystem permits)
one solution is to have File.read! and File.read_path!, one of which takes a Str and one of which takes a Path, and then duplicate every operation like that
it's a similar situation with Url
it's super common for languages to have a solution for this, so that you can give file I/O operations a string literal or a Path and either one Just Works
we could have this if we did string literals the same way as number literals
in other words, number literals have this type:
a where [module(a).{
    from_int_digits : {
        is_negative : Bool,
        digits : Iter(U8),
    } -> Result(a, OutOfRange),
}]
and then string literals have this type:
a where [module(a).{
    from_str : Str -> Result(a, BadStr)
}]
so then as long as the Path module has that function, you can write a function that takes a Path and call it passing a string literal, e.g. both of these can Just Work:
contents = read!(my_path, Json.utf8)?
contents = read!("x.json", Json.utf8)?
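A Python sketch of how a string literal could flow into a Path-only API under this design. Path, coerce_str_literal, and read_file (standing in for read!) are hypothetical, and the NUL check is an assumed example of something from_str might reject. read_file only ever accepts a Path, yet a literal still works because the compiler would convert it via from_str, and non-UTF-8 paths remain representable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    # Paths are raw bytes: filesystems allow names that are not valid UTF-8.
    raw: bytes

    @classmethod
    def from_str(cls, s: str):
        # Any Str is valid UTF-8, so it can always become a path's bytes.
        # Assumption for illustration: reject NUL, which POSIX paths cannot contain.
        if "\x00" in s:
            return ("Err", "BadStr")
        return ("Ok", cls(s.encode("utf-8")))


def coerce_str_literal(target_type, literal: str):
    # Hypothetical compiler step: a string literal inferred as target_type is
    # converted via from_str; an Err would become a compile error.
    tag, payload = target_type.from_str(literal)
    if tag == "Err":
        raise ValueError(f"bad {target_type.__name__} literal: {payload}")
    return payload


def read_file(path: Path) -> str:
    # Stand-in for read!: it takes only a Path, and here just reports what it
    # would open instead of touching the filesystem.
    return f"would open {path.raw!r}"


from_literal = coerce_str_literal(Path, "x.json")  # like read!("x.json", ...)
from_listing = Path(b"caf\xe9.json")  # e.g. a name from Dir.list!, not UTF-8
print(read_file(from_literal))  # would open b'x.json'
print(read_file(from_listing))
```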
same with Urls
this is something I've always wanted for the file I/O and HTTP APIs, and if we're already doing it for numbers, it feels like a relatively small change to do it for strings too
and it would really cut down on API surface area without sacrificing correctness at all
I'm curious what others think of all this!
I would expect from_bytes instead of from_int_digits. What digit is a if the literal is written in hex notation? 10? If so, the hex was already parsed by the compiler, and if the compiler parses it anyway, why not pass bytes directly? The same goes for numbers with a decimal point.
I was assuming we'd convert from hex to decimal for you
or we could introduce an r notation (raw? idk): 0r<whatever>. like, if I want to use base 17, I'd want g as a digit too (I'm not serious here, but you get the idea)
I assume passing bytes directly would be less convenient for arbitrary-size types, but I could be wrong
anyway I think we could iterate on exactly what format to provide; the real question in my mind is whether this is the overall design we want to use for literals!
the core idea is reasonable. but my experience is too limited to come up with a use case :smile:
I mean for numbers. huh. maybe built-in validation? e.g., a type for auto-wrapping uints
and also string validation. besides path validation, it could be SQL validation! or CSS validation, or you name it. would it work with string interpolation? likely not
string interpolation Just Works bc first we resolve the interpolation, giving us a normal string, and then we pass that to the conversion function
so it never needs to know it was interpolated in the first place
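The two-step order described above can be sketched in Python, assuming str.format as a stand-in for Roc's string interpolation and a hypothetical HexColor type as the validating string type (the kind of CSS validation mentioned earlier).

```python
import re


class HexColor:
    # Hypothetical validating string type: accepts only "#rrggbb".
    _pattern = re.compile(r"#[0-9a-fA-F]{6}")

    def __init__(self, text: str):
        self.text = text

    @classmethod
    def from_str(cls, s: str):
        if cls._pattern.fullmatch(s):
            return ("Ok", cls(s))
        return ("Err", "BadStr")


def eval_interpolated_literal(target_type, template: str, **values):
    # Step 1: resolve the interpolation into an ordinary string
    # (Python's str.format stands in for Roc's string interpolation).
    resolved = template.format(**values)
    # Step 2: hand the plain string to the conversion function, which never
    # learns that interpolation happened.
    return target_type.from_str(resolved)


tag, color = eval_interpolated_literal(HexColor, "#{r}{g}{b}", r="ff", g="80", b="00")
print(tag, color.text)  # Ok #ff8000
```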
custom numbers have been one of the most frequently requested language features over the years, and not having them has been a deal-breaker
as in, people who have arrived on Zulip and asked whether Roc supports custom number types have mostly tended to leave after learning that we don't
which is unfortunate given Roc's goal of being nice for the long tail of use cases :smile:
I mean, custom numbers and custom parsers for num literals are not the same thing
Do you think there is much of a cost to doing this? Is it just a constant bit of extra processing at compile time for every number and string literal in source?
The ergonomics of this look amazing :heart_eyes:
Maybe a.from_num : num where [ num : Num ] -> Result(a, err) is enough? Then it's up to Roc how to parse the number. And then, if you want to use it at runtime, it just works with any Num
x = CustomNum.from_num(42) # x : CustomNum
y = CustomNum.from_num(factorial(4)) # y : Result(CustomNum, err)
so there's also symmetry with from_str
and you still can probably do this:
x : MyCoolBase17
x = "24g"
and then manually parse the string because MyCoolBase17 has from_str
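For comparison, here is roughly what the from_str route looks like for the base-17 example, as a Python sketch with a hypothetical MyCoolBase17 class: the type author writes the digit parser by hand.

```python
DIGITS_17 = "0123456789abcdefg"  # "g" is the digit worth 16


class MyCoolBase17:
    def __init__(self, value: int):
        self.value = value

    @classmethod
    def from_str(cls, s: str):
        # Hand-rolled base-17 parser, as the from_str route would require.
        if not s:
            return ("Err", "BadStr")
        value = 0
        for ch in s.lower():
            digit = DIGITS_17.find(ch)
            if digit < 0:
                return ("Err", "BadStr")
            value = value * 17 + digit
        return ("Ok", cls(value))


tag, x = MyCoolBase17.from_str("24g")
print(tag, x.value)  # Ok 662
```

Note that this hand-written parser would reject 1_000-style underscores unless the author remembers to handle them.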
Luke Boswell said:
Do you think there is much of a cost to doing this? Is it just a constant bit of extra processing at compile time for every number and string literal in source?
I think we could make it cheap by caching whether a particular type unifies with string literals, num literals, etc. so that after the first unification, the others are just checking a boolean
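A sketch of the caching idea, assuming a per-(type, constraint) memo table inside the compiler. LiteralSupportCache and Meters are hypothetical, and the getattr check is only a stand-in for real unification.

```python
class LiteralSupportCache:
    # Hypothetical compiler-side cache: the first time a type is checked against a
    # literal constraint we do the full check; later literals of the same type
    # only read back a stored bool.
    def __init__(self):
        self._results = {}
        self.full_checks = 0

    def supports(self, type_, method_name: str) -> bool:
        key = (type_, method_name)
        if key not in self._results:
            self.full_checks += 1
            # Stand-in for real unification: does the type define the method?
            self._results[key] = callable(getattr(type_, method_name, None))
        return self._results[key]


class Meters:
    @classmethod
    def from_int_digits(cls, is_negative, digits):
        ...


cache = LiteralSupportCache()
for _ in range(1_000):  # a thousand number literals of type Meters
    cache.supports(Meters, "from_int_digits")
print(cache.full_checks)  # 1
```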
Kiryl Dziamura said:
Maybe a.from_num : num where [ num : Num ] -> Result(a, err) is enough? So it's up to roc how to parse number. And then, if you want to have it in runtime - it just works with any Num
x = CustomNum.from_num(42) # x : CustomNum
y = CustomNum.from_num(factorial(4)) # y : Result(CustomNum, err)
this doesn't work if you want to make a numeric type that's bigger than any of Roc's builtin types, e.g. an arbitrary-sized integer that can be bigger than U128
that's where the digits design comes from
this doesn't work if you want to make a numeric type that's bigger than any of Roc's builtin types, e.g. an arbitrary-sized integer that can be bigger than U128
but from_str works
you have to implement parsing on your own, yes, but is it a big deal?
then we might get inconsistent handling for underscores, we can't introduce new syntax without breaking existing parsers, etc.
easy fix for the hex/octal/etc is to add a base field, which the compiler would supply as 2, 8, 10, 16, etc:
a where [module(a).{
    from_int_digits : {
        is_negative : Bool,
        digits : Iter(U8),
        base : U8,
    } -> Result(a, OutOfRange),
}]
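A Python sketch of both sides of the base field, assuming the compiler hands over digit values (so "A" in a hex literal arrives as 10) and strips underscores and the sign first. split_literal and int_from_digits are hypothetical names, not part of any Roc API.

```python
PREFIX_BASES = {"0x": 16, "0o": 8, "0b": 2}


def split_literal(text: str):
    # Hypothetical compiler side: turn source text such as "0x2A" or "-1_000"
    # into the proposed record fields (is_negative, digits, base).
    text = text.replace("_", "")
    is_negative = text.startswith("-")
    if is_negative:
        text = text[1:]
    base = PREFIX_BASES.get(text[:2].lower(), 10)
    if base != 10:
        text = text[2:]
    digits = [int(ch, 16) for ch in text]  # "A" -> 10, "7" -> 7
    return is_negative, digits, base


def int_from_digits(is_negative: bool, digits, base: int) -> int:
    # What a custom type's from_int_digits might do with those fields.
    value = 0
    for d in digits:
        if not 0 <= d < base:
            raise ValueError(f"digit {d} is not valid in base {base}")
        value = value * base + d
    return -value if is_negative else value


for lit in ["42", "0x2A", "0o52", "0b10_1010"]:
    print(lit, int_from_digits(*split_literal(lit)))  # every line ends with 42
```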
num literals already have their meaning. if you write 101, you know the value is always 101. but if your custom type interprets those digits as base 2, you get an obscure bug. that's why there's 0b101 notation
I mean custom number types can always have bugs haha, the question is how error-prone it is for them to implement what we want correctly
and I think giving them arbitrary strings is a lot more error-prone, especially because most number literals won't have underscores (and also they might not think to parse hexadecimal)
seems super easy to just give them digits, a base, and whether it's negative; that way they know exactly what they need to handle. seems like you'd almost need to go out of your way to not handle the edge cases correctly haha
I'm wondering what exactly we want here, because x = CustomType.from_num(101) is one thing and x = 101 with an inferred type for x is another
x = 101 works the same way as today, aside from the type behind the scenes being different
but you're right, it's not related to how a custom type parses num literals
It's just implicit without any additional notation
right, which I think is exactly what you want when it comes to number literals (and string literals in the cases mentioned above)
like nobody wants to have to do MyCustomNum.from_num(42) * MyCustomNum.from_num(2) when they could write 42 * 2
One risk... could this be abused to make Roc code hard to understand at first glance? Particularly if there's no type annotation. The nominal custom number (or other) type might be defined far away, which makes it hard to understand what you're creating :thinking:
I can't really think of ways to abuse it... but the misuse cases might be good to think about too
Could you accidentally pass in a number or string literal that silently becomes that type because its module implemented the method... and now you're confused somehow?
I think array.map(calc), where calc does math with num literals, would work differently depending on the array's element type, and that bothers me. but that's already true with the operator overloading Roc would provide, so num literal overloading won't add any more confusion to the mix; it would only make the mix more consistent
yes, technically it's not overloading, but it feels like that when you don't see types
yeah I'd be curious to see any examples anyone can think of of where this could be misused or lead to confusion
I've racked my brain, and explored a bit with various LLMs... I cannot think of anything but upsides.
Richard Feldman said:
we can call that function at compile time as part of compile-time evaluation of constants, and if it returns Err we give a compile error
This comptime Err part is really nice
It feels a bit like hidden magic, but I'm in favor overall; keeping Path and File in sync in basic-cli is already a chore.
I would just make it from bytes instead of digits, because there's also handling for decimal points and exponents
On top of that, I think a list would be more convenient for that function than an iterator
But maybe neither is a big deal
I mean, maybe something richer is fine, but I would make it something like a list of an enum whose variants are a digit wrapper or a decimal point, plus a list for the exponent digits. Maybe make the base explicit in the enum as well... but not sure
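One reading of that suggestion, sketched as Python dataclasses: a mantissa made of Digit or DecimalPoint tokens plus a separate list of exponent digits. All names here are hypothetical, only decimal literals are handled, and the exponent sign is an extra field the suggestion did not specify.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Digit:
    value: int  # 0-9


@dataclass(frozen=True)
class DecimalPoint:
    pass


Token = Union[Digit, DecimalPoint]


@dataclass(frozen=True)
class NumLiteral:
    mantissa: List[Token]  # digits with at most one decimal point
    exponent_digits: List[int]  # empty when there is no exponent
    exponent_negative: bool


def tokenize(text: str) -> NumLiteral:
    # Decimal literals only; underscores are dropped.
    text = text.replace("_", "").lower()
    mantissa, _, exp = text.partition("e")
    tokens = [DecimalPoint() if ch == "." else Digit(int(ch)) for ch in mantissa]
    exponent_negative = exp.startswith("-")
    exp = exp.lstrip("+-")
    return NumLiteral(tokens, [int(ch) for ch in exp], exponent_negative)


print(tokenize("12.5e3"))
```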
yeah I guess a List would be fine since the allocation would happen at compile time anyway
Even better, since it is injected by the compiler, you can make a seamless slice targeting the original source bytes.
Or some arena with all the strings in it
not if there are underscores in it, plus we need to convert from ASCII to a number (including hexadecimal conversion if necessary)
Ok... we could still do a one-time copy (with reformatting) into an arena for all numbers and then give slices to each call
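A rough sketch of the arena idea in Python, assuming decimal-only literals. All literals are copied once into a single buffer (underscores removed, ASCII digits converted to digit values), and each conversion call gets a memoryview slice instead of its own list. DigitArena is a hypothetical name.

```python
class DigitArena:
    # Hypothetical: one up-front copy of every numeric literal in a module into a
    # single buffer, then a zero-copy slice per conversion call.
    def __init__(self, literal_texts):
        buf = bytearray()
        self._spans = []
        for text in literal_texts:
            digits = bytes(int(ch) for ch in text.replace("_", ""))  # decimal only
            self._spans.append((len(buf), len(digits)))
            buf.extend(digits)
        self._buf = bytes(buf)  # immutable once the one-time copy is done

    def view(self, index: int) -> memoryview:
        start, length = self._spans[index]
        return memoryview(self._buf)[start:start + length]


arena = DigitArena(["1_000", "42", "7"])
print(list(arena.view(0)), list(arena.view(1)))  # [1, 0, 0, 0] [4, 2]
```

Freezing the buffer into bytes before handing out views keeps the slices valid, since nothing can resize the storage afterwards.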
But yeah, less convenient
I still don't understand why num literals, which have an established meaning because of their format (0b, 0x, 0o, 42), should be decoded from their representation rather than their value. I get that I128 is the biggest range, but if we need bigger decimal numbers, why can't we introduce a bigint notation that can only be used with custom types?
Putting it differently,
x = 42
y = 0x2A
Do you expect x and y are equal or not?
1.0e2 and 1.00e2 could be meaningfully different to a scientific library that keeps track of precision.
I honestly don't understand it. It means that Roc provides a limited number of num notations and users are able to change their meaning, with no possibility of creating their own notations.
Also, num notation becomes a union itself implicitly (because it's possible to have 42 != 0x2A or 1.0e2 != 1.00e2)
It's the same as if this was possible: "x" != "\u{0078}"
Kiryl Dziamura said:
I honestly don't understand it. It means that roc provides a limited number of num notations and user is able to change their meaning, with no possibility of creating their own notations.
this is a feature, not a bug
giving people control of the meaning of arbitrary number strings is how we end up being unable to change things in the future
an extreme example of userspace constricting the language's design space being smooshMap
I think it's a good point that different notations being different seems bad, but then maybe that means we should put them all in a consistent format (e.g. take care of the base for you)
I think we likely should normalize numbers to a single format before sending them to the underlying number constructor
So I think that we should guarantee that 0x2, 0b10, and 2 are all indistinguishable in the number constructor
I don't think we can guarantee that for exponents though
But they should still be given as just decimal digits
If we normalize, that guarantees that 0x12 == 18 for all number types if they support that value (they could also just fail to initialize at compile time)
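A sketch of that normalization pass in Python, assuming the compiler sends (is_negative, decimal digits) regardless of the notation used; normalize_to_decimal_digits is a hypothetical name. Because Python integers are unbounded, the sketch also shows that normalizing does not reintroduce the I128 cap.

```python
BASES = {"0x": 16, "0o": 8, "0b": 2}


def normalize_to_decimal_digits(text: str):
    # Hypothetical normalization pass: whatever notation the literal used, the
    # number constructor only ever receives (is_negative, base-10 digits).
    text = text.replace("_", "")
    is_negative = text.startswith("-")
    body = text[1:] if is_negative else text
    base = BASES.get(body[:2].lower(), 10)
    if base != 10:
        body = body[2:]
    value = int(body, base)  # Python ints are unbounded, so no I128 cap here
    return is_negative, [int(ch) for ch in str(value)]


print(normalize_to_decimal_digits("0x12"))  # (False, [1, 8])
```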
yep, makes sense!
so then we send decimal digits I guess?
(since we can't just send a plain integer since that would cap us at i128)
Yeah, decimal digits with potential decimal point?
And then exponent in the same form but never with a decimal point?
Normalization is a good idea, thank you.
A list of digits is required to support bigint implementations?
I can't think of a better way to support bigint implementations :smile:
would it be sufficient in the general case to do digits plus exponent digits? :thinking:
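One way to check that digits plus an exponent suffice is to fold the decimal point into the exponent. The Python sketch below assumes a hypothetical to_digits_and_exponent helper and keeps the exponent as a plain int for brevity. It also keeps 1.0e2 and 1.00e2 distinguishable, which matters for the precision-tracking case raised earlier.

```python
def to_digits_and_exponent(text: str):
    # Hypothetical: represent a decimal literal as (digits, exponent), meaning
    # int(digits) * 10**exponent, so no separate decimal point is needed.
    # (The exponent is a plain int here for brevity; the sign is handled separately.)
    text = text.replace("_", "").lower()
    mantissa, _, exp = text.partition("e")
    exponent = int(exp) if exp else 0
    whole, _, frac = mantissa.partition(".")
    digits = [int(ch) for ch in whole + frac]
    # Each digit after the decimal point shifts the exponent down by one.
    return digits, exponent - len(frac)


print(to_digits_and_exponent("1.0e2"))  # ([1, 0], 1)
print(to_digits_and_exponent("1.00e2"))  # ([1, 0, 0], 0)
```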
I'm thinking about how it would work without comptime evaluation
at that point I don't think we need to care about decimal point
CustomNum.from_digits([2, 0, 2, 5])
It's ok, but now if you want to implement serialization, you need to reproduce the implementation, but for bytes.
What I mean is, isn't a list of bytes superior?
what do the bytes mean?
Memory representation
what does the memory mean though? haha
like a base-2 integer?
so it's as if you had a big-endian integer type that just keeps going?
Like, how you serialize your custom number to send over the wire
well we can't know how the author of the custom number wants to do that
Hm, true. so from_digits is useful only at comptime?
probably
I think we may actually want to name it more explicitly for its purpose, e.g. from_literal
so it's clear that this type is intended to be compatible with number literals (in a similar way to why we want it to be named plus if you want to work with +, as opposed to naming it add)
Signed(a) : [ Positive(a), Negative(a) ]
Digits : (List(U8), Signed(List(U8))) # value, exp
from_literal : [ NaN, ..Signed([Infinity, Value(Digits)]) ] -> ...
Would it be sufficient? (I've started forgetting Roc syntax)
Couple of potential downsides inspired by Haskell, which does a similar thing for literals (if the right extensions are enabled):
If a user passes a string literal as an argument to a function that expects a custom type Foo, what error would show?
If read!("x.json", Json.utf8)? is allowed, maybe folks would be surprised when passing in a Str constructed some other way doesn't work.
(I quite like the proposed approach by the way, think I would enjoy using it, just kicking the tires a bit.)
Possible way from_str could be abused: building library APIs that put some complicated parsing in it. For instance:
A CLI arg-parsing API takes a string like mycmd [ARG] --help --foo <int> and generates a parser for it (some libraries in other languages take this approach).
An HTTP server takes a string like /some/path/:arg?query and generates an endpoint pattern from it.
The lure of using from_str for things like this is that you can get the parsing to happen at compile time. The downside is that error message quality when parsing fails will be bad.
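To make the abuse concern concrete, here is a Python sketch of the HTTP example: a hypothetical Route type whose from_str parses an endpoint pattern. It works, but on bad input all it can return is a bare error value such as BadStr, which is all the compile error would have to show.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Route:
    # Hypothetical endpoint pattern built from a string literal via from_str.
    segments: Tuple[Tuple[str, str], ...]  # ("lit", "some") or ("param", "arg")
    query: Optional[str]

    @classmethod
    def from_str(cls, s: str):
        path, _, query = s.partition("?")
        if not path.startswith("/"):
            # All the parser can report is the Result's error value, so the
            # compile error would carry little context about what went wrong.
            return ("Err", "BadStr")
        segments = []
        for part in path.strip("/").split("/"):
            if part.startswith(":"):
                if len(part) == 1:
                    return ("Err", "BadStr")  # ":" with no parameter name
                segments.append(("param", part[1:]))
            elif part:
                segments.append(("lit", part))
        return ("Ok", cls(tuple(segments), query or None))


print(Route.from_str("/some/path/:arg?query"))
```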
If a user passes a string literal as an argument to a function that expects a custom type Foo, what error would show?
If Foo.from_str : Str -> Result(t, err) is defined, the string literal would be passed to it at comptime and the error would be whatever err it returns. So the message would be something like: error occurred while creating type Foo from string literal: err
maybe folks would be surprised when passing in a Str constructed some other way doesn't work.
Good point. The problem isn't even literal vs. string, but Result(type, error) vs. just type
abuse
Can the same abuse happen without the comptime evaluation and literal conversion?
A CLI arg-parsing API takes a string like mycmd [ARG] --help --foo <int> and generates a parser for it (some libraries in other languages take this approach).
A http server takes a string like /some/path/:arg?query and generates an endpoint pattern from it.
These sound cool :smiley:. Shame about the errors, but that's probably ok if you're ok with something as quick and ergonomic as this. It might become a problem if people start building webservers with all the endpoints set up this way :shrug:
Last updated: Jun 16 2026 at 16:19 UTC