so today, we have a Roc type called Nat which is Roc's only target-dependent type
that is, it actually gives different answers depending on what target system you're building for
specifically, if you do an overflow-checked addition on a sufficiently large Nat, it might return Err Overflow on a 32-bit target (e.g. wasm) but not on a 64-bit target
this in turn means you can have pure Roc code which works differently depending on what target machine you're building it for
this in turn means that theoretically you could have pure Roc unit tests that pass on one system but fail on another - and not because they ran out of memory on one system but not another (which can always happen; there's no fixing that!) but rather because they just got different answers
in contrast, in Elm (for example) there's no need to ever have multiple different CI builds for your Elm package in case they get different answers on different targets
so I'd like to explore a world where we didn't have Nat in Roc at all - what would the pros and cons be compared to the current world where we do have it?
so the main place where Nat is used is in length and indexing
e.g. List.len : List * -> Nat
and List.get : List elem, Nat -> Result elem [OutOfBounds]*
for performance reasons, it's important that under the hood we store these as (the equivalent of) Nat
that is, at runtime, a List on WASM should be a 32-bit pointer, a 32-bit length integer, and a 32-bit capacity integer
however, that doesn't mean we have to represent it the same way in userspace
we could have: List.len : List * -> U64
and List.get : List elem, U64 -> Result elem [OutOfBounds]*
and so on
on 32-bit targets, this would involve a cast from U32 to U64
(after inlining and LLVM optimization, it's conceivable that this cast might end up getting removed in a lot of cases, but I wouldn't want to count on it)
and on 64-bit targets it wouldn't involve anything
if there are ever 128-bit targets, and we've hardcoded to U64, then we'd either need to make a major breaking language change, or else miss out on the larger address space
(Java is in this boat; the get method on Array is hardcoded to be a 32-bit signed integer, and decided not to break backwards compatibility by upgrading it for 64-bit targets)
one thing I worry about a bit with the current Nat design is that people will default to choosing it because they don't want to think about number sizes, and it feels like a reasonable default because things like List.len return it
this could theoretically result in code that works fine on their local machine, but then stops working as soon as they try it on wasm - but to be fair, that doesn't seem very likely to come up in practice
another consideration is that if I have a data structure which wants to store a list length init (like for example a parser's current state), then on 32-bit targets, having no alternative but to store U64 means wasting some bytes
also, on machines that don't have hardware support 64-bit arithmetic (unlike wasm, which does support it even though its pointers are 32 bits), having to do 64-bit arithmetic (assuming LLVM didn't optimize away the cast) would be a lot more expensive than being able to do 32-bit arithmetic. So U64 could be slower than Nat for arithmetic operations on those machines.
there's also a very minor learning curve consideration here
right now when teaching numbers Nat needs its own special section where we introduce the concept of "the machine you're building for," which otherwise wouldn't need to be in a beginner tutorial at all
which is a bit of an annoyance for a high-level language, since many high-level languages (e.g. Python, Java, JavaScript, to name the top 3 most popular ones) don't have this distinction when it comes to numbers
the only high-level languages I know of which have a Nat equivalent are Go, which has uintptr, and Swift, which has uint
How would a pointer get passed into roc? Like if I need to pass a pointer to into roc so it can pass it to an effect?
a Box of an empty tag union should work
of note, both Go and Swift overwhelmingly compile to 64-bit targets (e.g. I'm not aware of significant interest for Go or Swift on wasm, or on embedded systems)
Ah, forgot about box
another possibility, if we were worried about the Java "future compatibility" issue, is that we could define Nat to be an unsigned 64-bit integer on all targets, but reserve the right to change that in the future
that might be more error-prone than it's worth though
if we ever do end up in a world of 128-bit targets where collections of more than ~16,000 petabytes (the maximum addressable in a 64-bit integer) are desirable, I guess there are other options like introducing a HugeList builtin or something like that
but that also feels like a distant enough hypothetical future that I'm not convinced it should be much of a factor here :big_smile:
who knows, maybe by then we're all programming telepathically
If someone is concerned with performance and needs to also target multiple platforms could they instead do this?
module MyNat exposing (..)
type MyNat = UInt64
toUInt64 = identity
toUInt32 nat = nat % UInt32.maxValue
when targetting a 64 bit platform. And then when targetting a 32 bit platform they replace the definitions so it looks like this
module MyNat exposing (..)
type MyNat = UInt32
toUInt64 = UInt64.fromUInt32
toUInt32 = identity
I guess a drawback with this approach (besides needing some kind of tool to switch between these two definitions depending on which platform is targetted) is that only user code and 3rd party code that lets the user choose the int size will be efficient. Things like List.len : List * -> U64 won't use the most efficient int size.
yeah I guess I'd be surprised if someone actually wanted to do that in practice haha
another potentially interesting possibility is to disallow overflow-checked arithmetic on Nat
that wouldn't have the learning benefit, but it would mean if you wanted to do overflow checking, you'd first have to convert to U64 or something like that
Richard Feldman said:
yeah I guess I'd be surprised if someone actually wanted to do that in practice haha
I agree! But I think that's because Nat is a narrow use case that shouldn't burden normal users. If someone really needs it then there is a work around
the workaround wouldn't work for packages
only for applications
How often will packages choose a particular int size? In many cases won't it be something like this Int * -> OtherStuff -> Int *?
Do we think roc might ever support a smaller target due to embedded or something of that nature? If so, the chance of 8 or 16 bit Nat are probably a bigger concern than the chance of a 128 bit Nat.
Otherwise, I think that changing the List API in a way that lies to users about the real type and potentially has performance implications is likely a mistake.
Sure we can try and defend against an adversarial developer, but I don't think that is a proper basis for removing Nat. Sure a user might have to port a package to support wasm, but I don't think that will be a high cost or the normal case.
Relatedly the size of some high performance data structures might best be tuned based on the pointer size. So leaving it exposed may enable even more performance in Roc than if it was not exposed.
But overall, I mostly just have a negative gut reaction to the idea. So I am still quite open to it.
I also have a negative gut reaction, but will give it some more thought to possibly come up with any concrete arguments
one useful way to think about it: pretend we have U64 only.
In that world, what's the pitch for introducing Nat to the language? What benefits will it bring to justify the costs?
On 32 bit systems Nat makes it so that:
50% or more work and memory usage is huge. I get this really only applies to indices in Roc, but that is a lot of performance to potentially leave on the table.
only on dev backends though, right?
hm, maybe not actually
? how would LLVM fare better?
Should affect llvm too.
yeah nm
I was thinking about it terms of how LLVM can see unnecessary casts and remove them, but that wouldn't apply here
btw I'm not thinking about this in terms of adversarial developers, more in terms of people doing things that seem like a good idea without realizing the cost
e.g. someone makes a package which advertises that it will tell you whether or not your application is running in a browser, so you can recommend downloading and installing the native app
and the way it "determines" this is by assuming that if Nat is 32 bits it must be wasm, and otherwise it must be native
this might sound like something nobody would do, but consider the history of user agents in browsers https://stackoverflow.com/questions/1114254/why-do-all-browsers-user-agents-start-with-mozilla
heh, I just realized that even removing Nat's ability to do overflow-checked arithmetic wouldn't be sufficient to rule this out
just having wrapping arithmetic is enough, because you could use Num.addWrap with an amount that would overflow, and then see if the result got smaller (meaning that it overflowed)
and if we're going to have Nat, then for performance reasons we'd most likely want to have addWrap (since otherwise you'd have to cast to U64 to get access to addWrap anyway)
so if we did turn length and capacity into I64, then how big would a list or string be on 32-bit targets? Does it become 4+8+8?
or do we leave it as 4+4+4 and do some casting op?
currently it's 4+4+4 for pointer, length, capacity
I kinda feel like the answer here is: There are different kinds of computers in the world and sometimes you just need to deal with that. :shrug:
oh I definitely think we'd leave the storage as-is and do casting at the last minute
yeah de facto ruling out 16-bit targets entirely would be sad
just because they'd be incapable of doing fast register-based 64-bit arithmetic
and wouldn't have any option not to if lengths and indices were always U64
Richard Feldman said:
heh, I just realized that even removing
Nat's ability to do overflow-checked arithmetic wouldn't be sufficient to rule this out
Here is the thing: Either we expose a proper way to check the capabilities of the current platform, or people will find another way.
Another crazy example I recently came across was this thread on Twitter: https://twitter.com/fasterthanlime/status/1537184155536355328
Seemingly, a Go cryptography library turned unexported internal type names into strings to do some kind of ad-hoc ill-defined bug-ridden kind of 'generic dispatch'.
contrary to popular belief, Go has always had generics https://twitter.com/fasterthanlime/status/1537184155536355328/photo/1
- fasterthanlime ๐ (@fasterthanlime)Here is the thing: Either we expose a proper way to check the capabilities of the current platform, or people will find another way.
well, really it's "either there exists any possible way to check the capabilities of the current target, or there isn't"
like if there is no Nat, I'm not sure how someone could even possibly do it
(using pure Roc code I mean - if you're a platform author, it's trivial; you can pass a value from the host to the platform indicating what it is)
I'm taking it as a given (because of Hyrum's law!) that if it's possible, people will do it; the question is whether the benefits of it being possible are worth the costs :big_smile:
thinking about this some more, I think the actual answer here is that it's not feasible to prevent people from determining whether their pure Roc code is being built for a 32-bit or 64-bit target, which eliminates the potential selling point of removing Nat; closing this as resolved!
Richard Feldman has marked this topic as resolved.
Last updated: Jun 16 2026 at 16:19 UTC