✔ what if we didn't have Nat? · ideas

so today, we have a Roc type called Nat which is Roc's only target-dependent type

Richard Feldman (Jul 13 2022 at 18:51):

that is, it actually gives different answers depending on what target system you're building for

Richard Feldman (Jul 13 2022 at 18:52):

specifically, if you do an overflow-checked addition on a sufficiently large Nat, it might return Err Overflow on a 32-bit target (e.g. wasm) but not on a 64-bit target
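
(A minimal sketch of that difference, written in Rust rather than Roc. The two fixed-width functions stand in for Nat on a 32-bit and a 64-bit target; the function names and the `"Overflow"` error string are illustrative assumptions, not Roc's API.)

```rust
/// Overflow-checked addition at 32-bit width, standing in for `Nat`
/// on a 32-bit target such as wasm32.
fn add_checked_nat32(a: u32, b: u32) -> Result<u32, &'static str> {
    a.checked_add(b).ok_or("Overflow")
}

/// The same operation at 64-bit width, standing in for `Nat`
/// on a 64-bit target.
fn add_checked_nat64(a: u64, b: u64) -> Result<u64, &'static str> {
    a.checked_add(b).ok_or("Overflow")
}
```

With the inputs 4,000,000,000 and 1,000,000,000, the 32-bit version reports an overflow while the 64-bit version returns 5,000,000,000.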

Richard Feldman (Jul 13 2022 at 18:52):

this in turn means you can have pure Roc code which works differently depending on what target machine you're building it for

Richard Feldman (Jul 13 2022 at 18:53):

this in turn means that theoretically you could have pure Roc unit tests that pass on one system but fail on another - and not because they ran out of memory on one system but not another (which can always happen; there's no fixing that!) but rather because they just got different answers

Richard Feldman (Jul 13 2022 at 18:54):

in contrast, in Elm (for example) there's no need to ever have multiple different CI builds for your Elm package in case they get different answers on different targets

Richard Feldman (Jul 13 2022 at 18:55):

so I'd like to explore a world where we didn't have Nat in Roc at all - what would the pros and cons be compared to the current world where we do have it?

Richard Feldman (Jul 13 2022 at 18:55):

for performance reasons, it's important that under the hood we store list lengths and capacities as (the equivalent of) Nat

Richard Feldman (Jul 13 2022 at 18:56):

that is, at runtime, a List on WASM should be a 32-bit pointer, a 32-bit length integer, and a 32-bit capacity integer
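
(To make the sizes concrete, here is a rough Rust sketch of the two layouts. The struct names and field names are hypothetical, and the pointer is modelled as a plain integer so that both layouts can be measured on any machine.)

```rust
use std::mem::size_of;

/// Hypothetical list header on a 32-bit target: three 32-bit fields.
#[allow(dead_code)]
#[repr(C)]
struct ListHeader32 {
    ptr: u32, // 32-bit pointer, modelled as an integer
    len: u32,
    cap: u32,
}

/// Hypothetical list header on a 64-bit target: three 64-bit fields.
#[allow(dead_code)]
#[repr(C)]
struct ListHeader64 {
    ptr: u64,
    len: u64,
    cap: u64,
}

/// Returns the size in bytes of the 32-bit and 64-bit headers.
fn header_sizes() -> (usize, usize) {
    (size_of::<ListHeader32>(), size_of::<ListHeader64>())
}
```

Under these assumptions the 32-bit header takes 12 bytes and the 64-bit header takes 24.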

Richard Feldman (Jul 13 2022 at 18:56):

Richard Feldman (Jul 13 2022 at 18:57):

(after inlining and LLVM optimization, it's conceivable that this cast might end up getting removed in a lot of cases, but I wouldn't want to count on it)

Richard Feldman (Jul 13 2022 at 18:58):

Richard Feldman (Jul 13 2022 at 18:59):

if there are ever 128-bit targets, and we've hardcoded to U64, then we'd either need to make a major breaking language change, or else miss out on the larger address space

Richard Feldman (Jul 13 2022 at 18:59):

(Java is in this boat; the index taken by the get method on Array is hardcoded to be a 32-bit signed integer, and Java decided not to break backwards compatibility by widening it for 64-bit targets)

Richard Feldman (Jul 13 2022 at 19:00):

one thing I worry about a bit with the current Nat design is that people will default to choosing it because they don't want to think about number sizes, and it feels like a reasonable default because things like List.len return it

Richard Feldman (Jul 13 2022 at 19:01):

this could theoretically result in code that works fine on their local machine, but then stops working as soon as they try it on wasm - but to be fair, that doesn't seem very likely to come up in practice

Richard Feldman (Jul 13 2022 at 19:02):

another consideration is that if I have a data structure which wants to store a list length in it (like for example a parser's current state), then on 32-bit targets, having no alternative but to store U64 means wasting some bytes

Richard Feldman (Jul 13 2022 at 19:03):

also, on machines that don't have hardware support for 64-bit arithmetic (unlike wasm, which does support it even though its pointers are 32 bits), having to do 64-bit arithmetic (assuming LLVM didn't optimize away the cast) would be a lot more expensive than being able to do 32-bit arithmetic. So U64 could be slower than Nat for arithmetic operations on those machines.

Richard Feldman (Jul 13 2022 at 19:04):

right now when teaching numbers Nat needs its own special section where we introduce the concept of "the machine you're building for," which otherwise wouldn't need to be in a beginner tutorial at all

Richard Feldman (Jul 13 2022 at 19:05):

which is a bit of an annoyance for a high-level language, since many high-level languages (e.g. Python, Java, JavaScript, to name the top 3 most popular ones) don't have this distinction when it comes to numbers

Richard Feldman (Jul 13 2022 at 19:06):

the only high-level languages I know of which have a Nat equivalent are Go, which has uintptr, and Swift, which has UInt

Brendan Hansknecht (Jul 13 2022 at 19:07):

How would a pointer get passed into Roc? Like if I need to pass a pointer into Roc so it can pass it to an effect?

Richard Feldman (Jul 13 2022 at 19:07):

of note, both Go and Swift overwhelmingly compile to 64-bit targets (e.g. I'm not aware of significant interest for Go or Swift on wasm, or on embedded systems)

Brendan Hansknecht (Jul 13 2022 at 19:08):

Richard Feldman (Jul 13 2022 at 19:11):

another possibility, if we were worried about the Java "future compatibility" issue, is that we could define Nat to be an unsigned 64-bit integer on all targets, but reserve the right to change that in the future

Richard Feldman (Jul 13 2022 at 19:12):

Richard Feldman (Jul 13 2022 at 19:15):

if we ever do end up in a world of 128-bit targets where collections of more than ~16,000 petabytes (the maximum addressable in a 64-bit integer) are desirable, I guess there are other options like introducing a HugeList builtin or something like that
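
(The "~16,000 petabytes" figure can be checked with a little arithmetic, sketched here in Rust. It assumes binary units, where one pebibyte is 2^50 bytes; the function names are made up for illustration.)

```rust
/// Number of distinct byte addresses a 64-bit integer can index: 2^64.
fn addressable_bytes_64() -> u128 {
    1u128 << 64
}

/// Converts a byte count to whole pebibytes (1 PiB = 2^50 bytes).
fn in_pebibytes(bytes: u128) -> u128 {
    bytes / (1u128 << 50)
}
```

2^64 / 2^50 = 2^14, so the result is 16,384 pebibytes.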

Richard Feldman (Jul 13 2022 at 19:16):

but that also feels like a distant enough hypothetical future that I'm not convinced it should be much of a factor here :big_smile:

Richard Feldman (Jul 13 2022 at 19:16):

Martin Stewart (Jul 13 2022 at 19:18):

If someone is concerned with performance and needs to also target multiple platforms could they instead do this?

module MyNat exposing (..)

type MyNat = UInt64

toUInt64 = identity

toUInt32 nat = nat % UInt32.maxValue

when targeting a 64-bit platform. And then when targeting a 32-bit platform they replace the definitions so it looks like this

module MyNat exposing (..)

type MyNat = UInt32

toUInt64 = UInt64.fromUInt32

toUInt32 = identity

I guess a drawback with this approach (besides needing some kind of tool to switch between these two definitions depending on which platform is targeted) is that only user code and 3rd party code that lets the user choose the int size will be efficient. Things like List.len : List * -> U64 won't use the most efficient int size.
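
(For comparison, a rough Rust sketch of the same idea, where the compiler picks the definition instead of an external tool. `MyNat`, `to_u64` and `to_u32` are hypothetical names mirroring the pseudocode above; Rust's built-in `usize` already plays this role, so this only illustrates the switching mechanism.)

```rust
// Pick the alias from the target's pointer width at compile time.
#[cfg(target_pointer_width = "64")]
pub type MyNat = u64;
#[cfg(target_pointer_width = "32")]
pub type MyNat = u32;
#[cfg(target_pointer_width = "16")]
pub type MyNat = u16;

/// Widening (or identity) conversion; never loses information.
pub fn to_u64(n: MyNat) -> u64 {
    u64::from(n)
}

/// Keeps the low 32 bits when `MyNat` is wider than 32 bits
/// (that is, reduces modulo 2^32); otherwise a no-op or a widening.
pub fn to_u32(n: MyNat) -> u32 {
    n as u32
}
```

As in the pseudocode, only code written against `MyNat` benefits; an API that is fixed to U64 still pays for the wider integer.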

Richard Feldman (Jul 13 2022 at 19:20):

yeah I guess I'd be surprised if someone actually wanted to do that in practice haha

Richard Feldman (Jul 13 2022 at 19:20):

another potentially interesting possibility is to disallow overflow-checked arithmetic on Nat

Richard Feldman (Jul 13 2022 at 19:21):

that wouldn't have the learning benefit, but it would mean if you wanted to do overflow checking, you'd first have to convert to U64 or something like that

Martin Stewart (Jul 13 2022 at 19:23):

I agree! But I think that's because Nat is a narrow use case that shouldn't burden normal users. If someone really needs it then there is a workaround

Richard Feldman (Jul 13 2022 at 19:23):

Martin Stewart (Jul 13 2022 at 19:25):

How often will packages choose a particular int size? In many cases won't it be something like this Int * -> OtherStuff -> Int *?

Brendan Hansknecht (Jul 13 2022 at 19:42):

Do we think Roc might ever support a smaller target due to embedded or something of that nature? If so, the chances of an 8- or 16-bit Nat are probably a bigger concern than the chance of a 128-bit Nat.

Brendan Hansknecht (Jul 13 2022 at 19:46):

Otherwise, I think that changing the List API in a way that lies to users about the real type and potentially has performance implications is likely a mistake.

Sure we can try and defend against an adversarial developer, but I don't think that is a proper basis for removing Nat. Sure a user might have to port a package to support wasm, but I don't think that will be a high cost or the normal case.

Brendan Hansknecht (Jul 13 2022 at 19:48):

Relatedly the size of some high performance data structures might best be tuned based on the pointer size. So leaving it exposed may enable even more performance in Roc than if it was not exposed.

Brendan Hansknecht (Jul 13 2022 at 19:49):

But overall, I mostly just have a negative gut reaction to the idea. So I am still quite open to it.

Qqwy / Marten (Jul 13 2022 at 19:49):

I also have a negative gut reaction, but will give it some more thought to possibly come up with any concrete arguments

Richard Feldman (Jul 13 2022 at 20:04):

In that world, what's the pitch for introducing Nat to the language? What benefits will it bring to justify the costs?

Brendan Hansknecht (Jul 13 2022 at 20:13):

Brendan Hansknecht (Jul 13 2022 at 20:15):

50% or more work and memory usage is huge. I get this really only applies to indices in Roc, but that is a lot of performance to potentially leave on the table.

Richard Feldman (Jul 13 2022 at 20:16):

Richard Feldman (Jul 13 2022 at 20:17):

Folkert de Vries (Jul 13 2022 at 20:17):

Brendan Hansknecht (Jul 13 2022 at 20:17):

Richard Feldman (Jul 13 2022 at 20:18):

I was thinking about it in terms of how LLVM can see unnecessary casts and remove them, but that wouldn't apply here

Richard Feldman (Jul 13 2022 at 20:19):

btw I'm not thinking about this in terms of adversarial developers, more in terms of people doing things that seem like a good idea without realizing the cost

Richard Feldman (Jul 13 2022 at 20:20):

e.g. someone makes a package which advertises that it will tell you whether or not your application is running in a browser, so you can recommend downloading and installing the native app

Richard Feldman (Jul 13 2022 at 20:20):

and the way it "determines" this is by assuming that if Nat is 32 bits it must be wasm, and otherwise it must be native

Richard Feldman (Jul 13 2022 at 20:21):

Richard Feldman (Jul 13 2022 at 20:22):

heh, I just realized that even removing Nat's ability to do overflow-checked arithmetic wouldn't be sufficient to rule this out

Richard Feldman (Jul 13 2022 at 20:23):

just having wrapping arithmetic is enough, because you could use Num.addWrap with an amount that would overflow, and then see if the result got smaller (meaning that it overflowed)
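
(A minimal Rust sketch of that probe, using `wrapping_add` in place of Roc's Num.addWrap. The function names are hypothetical, and the `usize` version assumes the target is either 32-bit or 64-bit.)

```rust
/// True if a wrapping 32-bit addition produced a smaller number,
/// meaning that it overflowed.
fn wrapped_u32(start: u32, amount: u32) -> bool {
    start.wrapping_add(amount) < start
}

/// The same check at 64-bit width.
fn wrapped_u64(start: u64, amount: u64) -> bool {
    start.wrapping_add(amount) < start
}

/// The probe applied to a pointer-sized integer: start from the largest
/// 32-bit value and add one. The sum wraps to zero only if `usize`
/// is 32 bits wide.
fn nat_looks_32_bit() -> bool {
    let start = u32::MAX as usize;
    start.wrapping_add(1) < start
}
```

No overflow-checked operation is involved, yet `nat_looks_32_bit` still distinguishes the two kinds of target.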

Richard Feldman (Jul 13 2022 at 20:24):

and if we're going to have Nat, then for performance reasons we'd most likely want to have addWrap (since otherwise you'd have to cast to U64 to get access to addWrap anyway)

Brian Carroll (Jul 13 2022 at 20:34):

so if we did turn length and capacity into I64, then how big would a list or string be on 32-bit targets? Does it become 4+8+8?

Brian Carroll (Jul 13 2022 at 20:34):

Brian Carroll (Jul 13 2022 at 20:35):

Brian Carroll (Jul 13 2022 at 20:45):

I kinda feel like the answer here is: There are different kinds of computers in the world and sometimes you just need to deal with that. :shrug:

Richard Feldman (Jul 13 2022 at 20:45):

oh I definitely think we'd leave the storage as-is and do casting at the last minute
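
(Roughly what "leave the storage as-is and cast at the last minute" could look like, sketched in Rust. `List32` and its fields are hypothetical; the point is only that the 32-bit value stays in memory and is widened when it is read.)

```rust
/// Hypothetical list metadata that keeps 32-bit fields in memory,
/// as it would on a 32-bit target.
#[allow(dead_code)]
struct List32 {
    len: u32,
    cap: u32,
}

impl List32 {
    /// The public API reports a 64-bit length; the widening cast
    /// happens only here, not in the stored representation.
    fn len(&self) -> u64 {
        u64::from(self.len)
    }
}
```

The stored metadata stays at 4 + 4 bytes, and callers only ever see a 64-bit length.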

Richard Feldman (Jul 13 2022 at 20:46):

Qqwy / Marten (Jul 13 2022 at 21:57):

Here is the thing: Either we expose a proper way to check the capabilities of the current platform, or people will find another way.
Another crazy example I recently came across was this thread on Twitter: https://twitter.com/fasterthanlime/status/1537184155536355328
Seemingly, a Go cryptography library turned unexported internal type names into strings to do some kind of ad-hoc ill-defined bug-ridden kind of 'generic dispatch'.

contrary to popular belief, Go has always had generics https://twitter.com/fasterthanlime/status/1537184155536355328/photo/1

- fasterthanlime 🌌 (@fasterthanlime)

Brendan Hansknecht (Jul 13 2022 at 22:09):

Richard Feldman (Jul 13 2022 at 22:21):

well, really it's "either there exists any possible way to check the capabilities of the current target, or there isn't"

Richard Feldman (Jul 13 2022 at 22:21):

(using pure Roc code I mean - if you're a platform author, it's trivial; you can pass a value from the host to the platform indicating what it is)

Richard Feldman (Jul 13 2022 at 22:22):

I'm taking it as a given (because of Hyrum's law!) that if it's possible, people will do it; the question is whether the benefits of it being possible are worth the costs :big_smile:

Richard Feldman (Jul 20 2022 at 00:07):

thinking about this some more, I think the actual answer here is that it's not feasible to prevent people from determining whether their pure Roc code is being built for a 32-bit or 64-bit target, which eliminates the potential selling point of removing Nat; closing this as resolved!
