so I think there's a strong case to be made that we should replace Nat with U64
not to change any data structures, mind you - just any builtin function that currently accepts or returns Nat instead uses U64
so 64-bit targets would have no change, and on 32-bit targets there would be a cast behind the scenes from 32-bit to 64-bit integers (which after LLVM optimization would usually be no difference in practice I suspect)
so an obvious motivation (but not the main one, at least to me) is that it's one less concept to learn in the language, and also it means that when teaching the language to beginners, they don't need to encounter the concept of compilation targets (which isn't a thing you have to learn at all in lots of languages, e.g. the 3 most popular languages - Python, Java, JavaScript)
to me, the main reasons are:
- Nat is the only reason Roc code given the same inputs can give different outputs on different targets; without Nat in the language, Roc code given the same inputs gives the same outputs on every target (or crashes, since different targets may have different system resources available). This means that, for example, if I'm running roc test on my 64-bit machine and on my 64-bit CI server, I can have code that will give different answers once deployed to my 32-bit WebAssembly target, despite having passed roc test locally and on CI. This means I can have bugs in production that would have been caught by my roc test tests except that Nat meant something different happened in my build. Also, this can be used by nefarious package authors who want to sneak exploits past my test suite. To prevent this, I would need to run roc test via WebAssembly, which is not currently supported - we'd need to introduce something like roc test --target=wasm32 and then ship a wasm interpreter with every roc binary. The only reason to even consider this is that Nat is in the language.
- Nat means that introducing a new kind of target can break existing Roc code, sometimes silently. If Nat is not in the language, we have the option of introducing 16-bit or 128-bit targets as a nonbreaking change (but still might choose not to)
the reason for both of these is that you can use Nat to write Roc code that runs differently on different targets, because it overflows at different points - so you can do Num.addChecked 1nat 2^32 and on 32-bit targets it will return Err but on 64-bit targets it will return Ok
this isn't true of any other type in Roc
also, code that casts Nat to U64 today is lossless, but in a hypothetical future where 128-bit targets are a thing, that code will all silently become lossy and can cause bugs
similarly, the code isTarget32bit = Result.isErr (Num.addChecked 1nat 2^32) is accurate today, but inaccurate if we introduce a 16-bit target
(likewise for isTarget64bit = Result.isOk (Num.addChecked 1nat 2^32) and 128-bit targets)
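To make the overflow point concrete, here is a minimal sketch in Rust (not Roc) that simulates checked addition at a chosen integer width. The names add_checked_at_width and looks_32_bit are made up for illustration; they only model what Num.addChecked on a Nat would do on a target of that width, and they are not part of any Roc or Rust API.

```rust
/// Checked addition for an unsigned integer that is `bits` wide.
/// Returns None when an operand or the sum does not fit in that width.
fn add_checked_at_width(a: u128, b: u128, bits: u32) -> Option<u128> {
    // Largest value representable in `bits` bits (any width of 128 or more
    // is treated as a full u128).
    let max = if bits >= 128 { u128::MAX } else { (1u128 << bits) - 1 };
    if a > max || b > max {
        return None;
    }
    // checked_add guards the 128-bit case; the filter guards narrower widths.
    a.checked_add(b).filter(|sum| *sum <= max)
}

/// Hypothetical probe mirroring isTarget32bit from the discussion:
/// "adding 1 and 2^32 fails, so this must be a 32-bit target".
fn looks_32_bit(bits: u32) -> bool {
    add_checked_at_width(1, 1u128 << 32, bits).is_none()
}
```

With these definitions, looks_32_bit(32) is true and looks_32_bit(64) is false, but looks_32_bit(16) is also true. That is the sense in which the probe stops being accurate as soon as a 16-bit target exists.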
granted, it's not a given that 128-bit targets will ever be a thing (since hardware today only permits using 48 of the 64 bits in practice, and we'd presumably want to start using all 64 bits before there would be demand for upgrading to 128) but in a future where 128-bit targets never happen (or perhaps by the time we want addresses that big, it's so far in the future that something more fundamental has changed about operating systems, and we need to rethink various representations of things anyway), and we only ever have 32-bit and 64-bit targets like today...then what was Nat getting us?
We could have just had U64 and actual target-independence and it would have been fine
supposing 128-bit targets do happen, at least having hardcoded List.len and friends to return U64 gives us options; we can, for example, do any one of these:
- Change the former Nat builtin functions to return U128 instead of U64. This would theoretically affect a lot of code, but considering how often these values get used without being annotated (e.g. List.get list (List.len list - 2)), they might be nonbreaking in practice in a lot of cases. Also, when it does break code, the upgrade process should be automatable 99% of the time, especially considering it's a safe bet AI will be better by then (128-bit targets are so distant I haven't even seen anyone predict when they might be desirable), so transitioning a given code base from U64 to U128 might actually be completely automatable by then. Certainly it would seem likely to be way less painful than Python 2 -> 3, which was legendarily painful but which did not stop Python becoming arguably the #1 most popular language in the world (at least top 3 if not top 1), so there's precedent for a language making a substantial breaking change, even years after becoming popular, and still continuing to grow in popularity.
- Keep U64, meaning the change is nonbreaking, but collections on 128-bit targets could only go up to 2^64 elements, which is about 16 million terabytes even at one byte per element (whereas 128-bit addressing would technically allow more). Today, Java collections are capped at about 2 billion elements because they use I32 for their collection lengths, and considering it's one of the 3 most popular languages ever, this restriction - while presumably frustrating in some situations - clearly has not been deadly to Java's popularity. Maybe the 16 exabyte restriction would be different in this hypothetical future, but if so, there's always the "upgrade to 128-bit and do a breaking change" option.
anyway, all things considered, it seems to me that replacing Nat with U64 is the right choice for the language
what do others think?
Just to clarify:
- Nat is no longer in the language at all
- All builtins use U64 instead (so walking with an index, U64. also len, U64)
- If a host passes a pointer into the app, it would be as Box {}?
What about the size cost of using larger values everywhere? If roc starts supporting 16bit platforms and I want to write a program for a low-resource machine (pico-8?) would forcing all Nats to be 4 times as big create problems with using too much memory?
(Don't weight my comment heavily. I don't have experience with that sort of programming and don't know if that'd be an issue, and it's all just hypothetical anyway.)
That's a good point @Sky Rose, what do you think @Folkert de Vries?
well part of the idea here is that in data structures, we still use the pointer size. I think that mitigates most of the downsides
yes you do have to do some stuff with 64-bit values that could strictly speaking be done with a smaller value, but I suspect that the cost is marginal.
Brendan Hansknecht said:
Just to clarify:
- Nat is no longer in the language at all
- All builtins use U64 instead (so walking with an index, U64. also len, U64)
- If a host passes a pointer into the app, it would be as Box {}?
yep! Although just to be clear, the builtin functions would use U64 but under the hood the data structures would still be usize internally - so for example on 32-bit targets we wouldn't be wasting memory storing 64-bit lengths in memory
I was assuming builtins would all use usize everywhere and just add casting at the boundaries?
yeah exactly!
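As a rough sketch of "usize internally, U64 at the boundary", here is what that split could look like in Rust. The List wrapper and its method names are hypothetical and are not how Roc's builtins are actually implemented; the only point is that storage keeps the native width while the API always speaks 64-bit.

```rust
use std::convert::TryFrom;

/// Hypothetical list wrapper: storage uses the native pointer width (usize),
/// while the public API speaks u64 on every target.
struct List<T> {
    items: Vec<T>,
}

impl<T> List<T> {
    fn from_vec(items: Vec<T>) -> Self {
        List { items }
    }

    /// Length is stored as usize internally and widened at the boundary.
    /// The cast is lossless on targets whose usize is at most 64 bits.
    fn len(&self) -> u64 {
        self.items.len() as u64
    }

    /// Index arrives as u64 and is narrowed at the boundary; an index that
    /// does not fit in usize cannot be in bounds, so it yields None.
    fn get(&self, index: u64) -> Option<&T> {
        let native = usize::try_from(index).ok()?;
        self.items.get(native)
    }
}
```

On a 64-bit target both conversions are no-ops; on a 32-bit target the struct still only stores a 32-bit length, and the widening happens only where a value crosses into user code.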
Ok
I think this will hurt if we ever have 16bit. I don't think it will hurt too bad on 32bit.
I haven't verified on godbolt, but I expect that the fact that we convert builtins to llvm bitcode will make a lot of the casts go away
for example:
List.get list (List.len list - 2)
List.len on a 16-bit target would compile to:
struct
I see 2 main issues:
List.get on a 16-bit target would compile to:
struct
so I'm pretty sure LLVM will see all that and realize that the casts from 16-bit to 64-bit and then back to 16-bit are not necessary, and will drop them
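The cast round trip being described can be modeled with a small Rust sketch. The function names here are invented for illustration and the original compiled output is not reproduced; this only shows that widening a 16-bit length to 64 bits and narrowing the derived index back is value-preserving, which is the property an optimizer could exploit once both sides end up in the same function. Whether LLVM actually removes the casts is not verified here.

```rust
use std::convert::TryFrom;

/// Models List.len on a 16-bit target: the stored length is 16 bits wide
/// and is widened to u64 for the caller.
fn len_as_u64(stored_len: u16) -> u64 {
    u64::from(stored_len)
}

/// Models the index side of List.get on a 16-bit target: the u64 index is
/// narrowed back to 16 bits, failing if it does not fit.
fn index_to_native(index: u64) -> Option<u16> {
    u16::try_from(index).ok()
}

/// Models `List.len list - 2` flowing straight into List.get.
/// Returns None when the list has fewer than two elements.
fn second_to_last_index(stored_len: u16) -> Option<u16> {
    let wide = len_as_u64(stored_len).checked_sub(2)?;
    index_to_native(wide)
}
```

For every stored length of at least 2, second_to_last_index returns exactly stored_len - 2, so the 64-bit detour never changes the answer.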
Only for small functions that get inlined
sure, but I think that tends to describe the way these functions are used in practice
for example, I think if I were writing some sort of SoA thing in pure Roc, I wouldn't use Nat anyway
I'd use the smallest type that works for my use case, e.g. U32
I don't think so. Many builtins don't get inlined or at least not fully. Even something like List.set is large enough and used in enough places that it doesn't get inlined (though maybe this one got fixed when I increased the inline threshold). List.set is not that large of a function.
Also, many user lambdas may only get partial inlining that hits these issues.
I'm okay with adding "inline always" to those
as in telling LLVM to explicitly inline them
If you catch everything and a user is careful to define types (including intermediate values inside of a function), i guess it could be ok. I still think this is likely to cause weird perf edges.
That said, I really only think it will hurt when dropping to 16 bit. Where a 64 bit value is giant and often you don't have a lot of memory so inlining everything and bloating the executable is bad.
yeah, I guess I'm kind of assuming that:
it would be cool in theory that you could just write 16-bit Roc code in the same style as 32-bit or 64-bit, I just have a hard time imagining that would work in practice :big_smile:
I think we have already pointed out a number of reasons that roc isn't necessarily gonna be great for embedded. If we accept that embedded long term will remain second class for roc (which i think is totally fine, still could be fun to use on embedded and people manage to hack python to work on embedded), i think this is a totally fine change.
The biggest other concern being accidental register pressure leading to more spilling to cache and potential perf hits. Just maybe you have to add a lot of casts to random variables that you normally wouldn't have typed in roc.
I do agree that the overall concern is pretty minor, but it can just randomly be wasteful and is easy to miss when many variables within functions in roc never get a type added to them.
Aside, overall, i think i am for this change and saying that roc is focused on desktop and server systems. As such, though 32 bit and 16bit may be supported (even very well), they are not the core focus. In all cases, high perf should be possible, but smaller bit systems are slightly lower class.
that makes sense to me!
Hmm...what if i want to use a 32 bit hash on 32 bit systems and for that to be faster? Will we expose a built-in to check system info?
Cause can't get that info with nat anymore
I guess we kinda have to expose similar info anyway if we want to expose a good simd chunk size to the user
hm, is that true? I thought we could do simd by offering primitives that are sort of agnostic to the chunk size
either that or we offer primitives that say "this will operate on an exact chunk size of N, but if that's not available in hardware, we'll automatically emulate it in software"
basically so the "simd" logic always works exactly the same way no matter the target - it just might be more or less efficient
Brendan Hansknecht said:
Hmm...what if i want to use a 32 bit hash on 32 bit systems and for that to be faster? Will we expose a built-in to check system info?
I think this only specifically comes up when hashing collection lengths, which seems like it should be a small enough percentage of all hashes being performed that I'm not worried about it
it doesn't come up when hashing pointers because we never hash the pointer itself, but rather dereference its contents and hash that instead
No, you misunderstand. I want to change a huge chunk of my code to use u32 and different algorithms instead of u64. So full replacement for faster code. I am not talking about just hashing a length.
For example, maybe I want to use sha256 on 32bit systems and sha512 on 64bit systems.
:thinking: if sha256 would suffice for the use case, why not use it on both?
This is a contrived example, but imagine that we would prefer sha512 everywhere, but it is too slow on 32bit systems.
hm, but why would we prefer it? :big_smile:
Let's just keep to the contrived example, assume sha512 is way faster on 64bit systems.
So pretend that sha512 is way faster on 64bit and sha256 is way faster on 32bit
I want my code to be fast on both systems, so I need a way to distinguish the systems and pick different code.
Brendan Hansknecht said:
So pretend that sha512 is way faster on 64bit and sha256 is way faster on 32bit
do such algorithms exist though? Like is there any algorithm where the same algorithm runs faster on 64-bit systems but slower on 32-bit systems, and there's another algorithm which runs faster on 32-bit systems but slower on 64-bit systems?
(and does the same thing)
Hashing is always the obvious example, but the general answer is yes.
generally 64bit also has more memory. So it can use more cache and may have different levels of dependencies that make sense.
Also, since hashing is the base for sets and dictionaries and some algorithms, it indirectly means it affects all of those uses as well.
Also, in some cases you may opt for more overflow safety on 64 bit systems, but give that up intentionally on 32 bit systems to save memory.
hm, ok so I wonder what's the specific application scenario where someone wants this
it has to be something where I'm writing an application that gets deployed both to 32-bit and to 64-bit targets, and hashing is a significant part of performance
Also, all integer operations will be slower if you use a U64 instead of U32 on a 32bit system. So if you want to not pessimize on 32bit systems, some code may want to use U32 instead of U64 on 32bit systems. Though I guess you could argue that you should use U32 on both because it is faster on both due to less memory pressure.
it has to be something where I'm writing an application that gets deployed both to 32-bit and to 64-bit targets, and hashing is a significant part of performance
Writing a library? That you want usable fast on both?
yeah I think that's just a specific case of "use the smallest integer type you can get away with"
ah fair point
so I guess the "I want to have a library that does something different on 32-bit vs 64-bit targets" is kind of a separate discussion from Nat itself
and the main question on my mind there is whether it's worth it to enable that at the cost of giving up "Roc code gives the same outputs for the same inputs on all targets, so you never need to run roc test on different targets"
and my feeling right now is that it's not worth it, and target-aware Roc code shouldn't be a thing, even if that means we give up some perf in the specific scenario where you have one hashing algorithm that runs faster on 32-bit target and another that would run faster on 64-bit targets
kinda... Though currently Nat enables this.
Also, if your goal is to fix "so you never need to run roc test on different targets". Enabling doing different things based on 32 vs 64 would add this problem right back in. Hopefully library code is written well and only they need to test on multiple systems, but it could definitely affect end users. So I think saying we will remove Nat, but maybe add doing different things based on 32 vs 64, it feels like you haven't gained much.
Though I guess most places will probably only ever run code on 32 bit or 64 bit computation systems (note, wasm is a 64bit computation system), not both, so maybe it is just fine.
Brendan Hansknecht said:
Also, if your goal is to fix "so you never need to run roc test on different targets". Enabling doing different things based on 32 vs 64 would add this problem right back in.
oh that's what I'm saying - I don't think we should enable doing different things based on 32 bit vs 64 bit :big_smile:
I don't think the upsides would outweigh the downsides in practice
I guess what I am trying to say is that Nat doesn't have much value, but having the best perf when using hashing, dictionaries, and sets matters a lot. So I think those should be the focus of the discussion around roc test being consistent on all platforms.
well the builtin dictionaries and sets wouldn't be affected by this
we can have them do whatever we want under the hood, and it won't be observable in userspace because we've already made their hashing functions unobservable so we can upgrade them as a nonbreaking change :grinning:
so it would only affect hashing functions written in userspace
Currently it wouldn't but personally I don't think that solves the issue. What if I want to hash a file.
There are lots of cases for userland hashing.
sure
Also, we may want to expose changing the hash for dict in the long term. That or we want to implement at least 4 different hashing algorithms in the standard library dictionary.
interesting! what would be the use cases there? some hashing algorithms being faster for some key types than others?
2 per target. One for short and one for long data. That would at least be a rough approximation.
nice, yeah that makes sense - although we can just do that automatically
since we know the types at compile time
I agree that we can make an extremely fast standard library dictionary for most types. I still think that having fast and flexible userland hashing is important.
I personally don't like too much bespoke standard when it isn't needed.
Only the standard library dict is fast. No other userland datastructures can get close because they can't use the magic hash function in the standard library that is impossible to write in roc.
hm, ok I think it would help to walk through a specific example - what are some specific pairs of hashing algorithms where:
oh, that's easy. wyhash and wyhash32.
wyhash operates on larger blocks and uses 64bit math. wyhash32 uses smaller blocks and 32bit math
The larger blocks and low cost of 64bit math make wyhash faster on 64bit systems. The cheaper math makes wyhash32 faster on 32bit systems
A number of hashing algorithms are developed in pairs like this.
gotcha, makes sense!
Also, wyhash has a few more variants if you want to tailor more to arm cpus that are missing features or occasionally have extra ones.
so in that situation, always using wyhash64 would be optimal as long as it's being run on 64-bit CPUs, even if you're running on a 32-bit target like wasm32 (so long as it's actually running on a 64-bit CPU)
so the issue would be the specific scenario where I have an application that wants to be used on a machine with a 64-bit CPU and also on a machine with a 32-bit CPU, like a raspberry pi
libraries aside (which could presumably be vendored if absolutely necessary), if I really wanted wyhash32 on one and wyhash64 on the other, it's still possible to do this by:
Wyhash.roc
obviously that's much less ergonomic than having userspace selection between the two targets, but it would work. If I really really needed that performance, I could get it.
but in that specific scenario I also wonder about: if my application runs well on a rpi, it can't be too CPU-intensive in general...so is it really going to be a noticeable problem on 64-bit targets if I just choose wyhash32 always?
If someone told you that all hashing would be half as fast, would you accept that?
waste is still waste, why give a worse experience on 64bit systems.
but the experience is only worse if you notice it :big_smile:
Also, I do specifically expect this to come up the most with libraries. We totally could standardize on maintaining two versions of all functions where this perf matters. Then just manually change? Seems like terrible ergonomics for an easy to solve problem.
when Target.registerWidth is
32 ->
64 ->
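A sketch of what such a branch might select between, written in Rust rather than Roc. Target.registerWidth is only a proposal in this thread, so the width is passed in here as a plain argument (which also lets both branches run on one machine). FNV-1a in its 32-bit and 64-bit forms stands in for wyhash32 and wyhash purely because it is short; it is not the algorithm anyone in the discussion proposed, and hash_for_width is a made-up name.

```rust
/// 32-bit FNV-1a, standing in for a hash tuned to 32-bit math.
fn fnv1a_32(data: &[u8]) -> u32 {
    let mut hash: u32 = 0x811c_9dc5; // FNV-1a 32-bit offset basis
    for byte in data {
        hash ^= u32::from(*byte);
        hash = hash.wrapping_mul(0x0100_0193); // FNV 32-bit prime
    }
    hash
}

/// 64-bit FNV-1a, standing in for a hash tuned to 64-bit math.
fn fnv1a_64(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a 64-bit offset basis
    for byte in data {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV 64-bit prime
    }
    hash
}

/// Hypothetical selection by register width: 32 picks the 32-bit hash,
/// anything else falls back to the 64-bit one.
fn hash_for_width(register_width: u32, data: &[u8]) -> u64 {
    match register_width {
        32 => u64::from(fnv1a_32(data)),
        _ => fnv1a_64(data),
    }
}
```

The two branches return different values for the same bytes. As long as the hash is only used internally (say, to pick a bucket), callers cannot tell; if the raw value escapes to user code, the result becomes target-dependent, which is the tension this thread keeps circling.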
like if you have someone run two versions of the program, one using wyhash64 and one using wyhash32, and they can't tell a difference, I don't think there's a problem
Do you ever expect a roc desktop or cli app to hash files that are large (or many of them so it adds up to a lot of data)? You will feel the 2x there.
but remember we're only talking about apps that run on raspberry pi too
Sounds like any linux cli app.
I don't think finding examples where this could matter is hard.
nice! examples would be helpful :thumbs_up:
Do none of the things I listed above count?
oh I mean like a specific CLI app that you'd want to run on both a desktop and on a rpi
that does hashing of large files
git, roc, any compiler with incremental compilation, any build tool
Essentially anything that interacts with files and wants to be able to short circuit by using a hash.
:thinking: for files specifically, platforms could offer a function for hashing them
that could be target-specific, since platforms can already run target-specific host code
Sure, a platform can do anything.
We could say that about many features in Roc.
sure, but file I/O is already coupled to platforms
I think you should think about sockets, the postgres library recently built in roc, and general platform fragmentation.
all of that postgres library could be in a platform, but we want it to be possible in roc.
I think this general class should fall in the same category.
true
but another good question to ask is: suppose everyone in the Roc ecosystem does 64-bit hashing, what are the specific bad things that happen?
Many 32bit systems are multiple times slower any time they hash. Any app that depends decently on hashing has a significant unnecessary slowdown. All applications that use hashing take more battery life due to increased computation. We can never do hashing on 16 bit systems (well we can, but the perf would make it non-viable)
so specifically, if someone builds a roc app that compiles to 32-bit raspberry pi, and does hashing of large files, then it will run slower on raspberry pi unless they're using a platform that exposes a primitive for reading and hashing files (which uses 32-bit or 64-bit under the hood depending on target), or they go out of their way to accept worse ergonomics specifically for their hashing function (and possibly vendoring libraries if need be) in that they need to swap out the file that does the hashing when building for rpi
I think it's important to note that "it has to run massively slower and everyone just has a bad experience" is only true if the application author is willing to go to the trouble of building and distributing 32-bit binaries for rpi but not willing to do a custom build to swap out the hashing function
that's an important distinction to me, because there is a big difference in my mind between:
especially when it's a very uncommon use case
Why add the friction though? Also, many apps are death by 1000 papercuts. Repeat a small slowdown in a bunch of places (or just bad architecture from the beginning) and you end up with the piles of slow apps that exist today. Everyone ends up waiting longer or with a jankier experience.
well the reason to add the friction is to have the language-wide guarantee that Roc code gives the same answers regardless of target
Also, I don't see the value in roc test being the same on all targets. You are still building on top of a platform that needs to be tested. So you fundamentally need to test on all the targets you deploy to.
to flip that around, "why sacrifice that language-wide guarantee for the entire ecosystem for the sake of making a build step a bit more convenient for the 0.0001% of Roc programmers who are building for both 32-bit raspberry pi and desktop applications and doing enough hashing that target-specific hashing makes a noticeable performance difference"
hm, yeah I guess that's true of roc test with expect-fx tests
really I'm trying to use roc test as a shorthand for "cross-target concerns"
like concretely I started thinking about this because of wanting to use Roc at work by calling Roc functions from NodeJS via WebAssembly
and I think about:
which means all of our top-level expects, everything we try out in the repl...it might actually do something different in production
like there might be bugs because somewhere in some Roc code path there's a conditional based on the target, and it's just doing something different - and there's a bug in that much-less-tested code path
and we don't notice it until we've done a production deploy and got bitten by it
and it seems like today, this same concern could happen to any wasm application
and more to the point, this is not a concern in JavaScript, Java, Python, Elm, ...
it's a case where Roc has a category of potential production errors that similar languages don't have
or rather, similarly high-level languages
and generally speaking we're trying to remove entire categories of errors compared to other high-level languages, rather than introducing them :sweat_smile:
but at the same time, we also want to run faster than them, so there is definitely tension here!
especially considering we also want to be useful on a variety of targets - e.g. Golang can say "this is designed for servers" but our scope isn't that limited
I think this is a case of boundaries. When something like Target.registerWidth is used correctly, it should never visibly do something different in a way that affects tests. That is the same with all of the standard library. Even though small string size is different or the hashing algorithm, it shouldn't cause userspace effects based on target. That is also generally the case with platforms.
What I am trying to say is that I do think the cases where something like Target.registerWidth should be used are extremely limited, but they tend to be the kind of stuff where performance really matters, and a whole chain of things depend on it. Hopefully because of the huge chain of dependencies, you can trust the code and not worry about cross platform issues.
This happens all the time: v8, cpython, numpy, any python library that calls c code, any java library that calls c code, the jvm itself. I bet multiple of these have had bugs that affect specific targets. Nonetheless, people still use them, test them, and ignore the platform constraints (while getting platform specific speed ups).
Roc aims to have higher peak performance than all of these languages. As such, I think there will be some libraries where this matters. I agree with you that the rest of Roc should not need to care about this. They should just use the library, get amazing speeds, and never think about the fact it is optimized based on the target. I think removing Nat is completely reasonable, but if we do so, I think that we should add in some sort of Target.registerWidth.
Other languages and tools aren't immune to the issues you list. Also, roc still wouldn't be immune due to being on top of a platform.
these are good points!
I wonder if there's some way to sort of "limit the blast radius" here, or maybe make it more visible what subset of the program is target-specific, kinda like Rust unsafe
for example, maybe instead of having it be an expression-level thing, it's that packages can use one of two different modules depending on target
so you could tell right in the package's module header whether it was doing this
and if so, what the affected code would be
interesting idea. Yeah something of that nature could potentially be reasonable.
Maybe a bit more bug prone though. I would assume that would make it more likely that they don't get updated together
having them close in code is probably a good thing.
Also, other thought on testing, if we only expose 32 vs 64 register width, it totally can be fully tested on one system. So instead of testing for wasm, you are testing for 32bit and just using only 32bit assembly. So running a 32bit application on a 64bit system
whoa that's very cool, I didn't think of that!
theoretically we could even consider doing it automatically for all the relevant targets, if we could cheaply enough (e.g. using the module dependency graph) detect which code paths could possibly give different answers per target :thinking:
it would be pretty cool to get an error like "hey this test passes on 64-bit targets but not on 32-bit targets" after running roc test normally, not even thinking about it
Could even have more than 1 mode if it ends up being too slow.
it's good that this is decoupled from Nat, since it means figuring out what to do (e.g. per-target modules vs builtin constant vs etc.) doesn't block changing the Nat APIs
Yeah. I totally agree. I think all of these Nat changes are completely great assuming we recognize what restrictions it's adding and work on a plan to alleviate that.
Computers sure are a pain aren’t they? 😅
I like this idea of being able to choose which size you use, but in an obvious explicit way instead of the compiler silently choosing for you. And especially the idea of being able to run the 32 bit version to test it on a 64bit machine (or vice versa). Most people will never think about it, so get all the benefits of removing Nat.
Another idea for if we get rid of Nat: What if by default List functions always return u64, but if you know you're on a platform where the performance/size matters, and you can guarantee that the list will never be too big, you could use a List8 that returned u8, or a Dict32 that used a 32bit hash.
It's explicit and opt in, so most people get the benefits of not using Nat. It would have consistent behavior no matter which target you compile for. It allows optimizing which implementation to use independent of the target (what if you want the performance of a List8 in one place, and capacity of a List64 List in another), and is pretty future proof for adding 16bit or 128bit targets. And it combines well with the other idea of explicitly checking the target, if you want to use that to decide which implementation to use.
The downside is that it's a step towards the land of Java offering 13 different implementations of Dictionaries.
Just to clarify, in the current proposal, on a 32bit system, you would get what you have named a List32 by default. It is just when we return a size or index to the user, we up cast it to a U64
So impl matches the host machine, but the API always matches U64.
Ah, right, thanks.
Brendan Hansknecht said:
So impl matches the host machine, but the API always matches U64.
So, if I'm building for a 32bit system and I use, say List.length the type will be U64? wouldn't that confuse the end user? Is that why we have Nat currently? for typing system-bound values?
as a complete noob in system development, I would prefer to have something explicitly dynamic, knowing that it could break at different points in different systems - and if I wanted to deploy in a system different from my own I would probably need to test my application in that system as well.
wouldn't it still make for a simple experience for 99.9% of the systems that are being developed and deployed on similar hardware? (and the example given of CI using 64bit and then production using 32bit -- would you really catch these types of errors on unit tests? and wouldn't it be really advised to test your application on a similar machine?) - seems like solving it in Roc is what is at hand, but ideally this would be an infrastructure problem, am I tripping?
So, if I'm building for a 32bit system and I use, say List.length the type will be U64? wouldn't that confuse the end user?
Why is that confusing? List.length always returns a U64. That seems like a very clear contract. If you don't know systems dev, you will not know that returning U64 is a little strange. If you do know systems dev, you may question it, but fundamentally a U32 fits in a U64.
knowing that it could break at different points in different systems
What can break? the U64 will always work.
and wouldn't it be really advised to test your application on a similar machine?
Probably, though roc being a sandboxed language does have some ability to make this unnecessary. Think of python or java or js. You don't feel the need to test on an rpi and on your windows desktop and on your mac laptop. You just run the tests on your linux ci server and move on.
@Brendan Hansknecht makes sense!
@Brendan Hansknecht few more questions, sir?
how do python, js and java deal with this particular problem? or they don't because their scope is narrower? (embedded systems for instance)
I totally hear what Richard said about maybe our use case falls out of embedded systems most of the time but would things like console game development also face these issues?
what is the best ergonomic strategy currently used for the code-once-run-anywhere when there are multiple system architectures involved?
is the code-once approach not the best ergonomic in these scenarios since you would need to optimize for the opaque system-optimizer algorithm instead of just accessing the different strategies for each system directly for your own, very specific, use case?
how does python, js and java deal with this particular problem?
Looking at Nat specifically, none of them have it. Python has an infinitely growing integer type. Java uses a 32-bit int for indexing (so you can't have an array with more than 2 billion-ish elements). JS has their weird number type (and is limited to 32 bits worth of elements in an array)
As for general optimization based on platform, I think essentially no code written in these languages does that. They may call C code that is optimized for the platform, but generally that is in large, well tested libraries that are assumed to be correct on all platforms. Of course in the case of Java and JS, the jit can optimize for the target machine (though that will be limited by data types), and python only gets any architecture specific optimizations added to the interpreter.
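To put rough numbers on those limits, here is a small Python sketch. The constant names are made up for illustration. Note that Java's `int` index is signed, so the ceiling implied by the index type is 2^31 - 1, while a JS array's `length` is an unsigned 32-bit value. Neither limit depends on whether the machine is 32-bit or 64-bit.

```python
# Hypothetical constant names; the values come from each language's index type.
JAVA_MAX_ARRAY_INDEX = 2**31 - 1  # Java indexes arrays with a signed 32-bit int
JS_MAX_ARRAY_LENGTH = 2**32 - 1   # JS array length is an unsigned 32-bit value
U64_MAX = 2**64 - 1               # largest value a 64-bit unsigned integer holds

# Python integers just keep growing: going past the 64-bit range does not
# overflow or wrap around.
beyond_u64 = U64_MAX + 1

print(JAVA_MAX_ARRAY_INDEX)      # 2147483647
print(JS_MAX_ARRAY_LENGTH)       # 4294967295
print(beyond_u64.bit_length())   # 65
```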
maybe our use case falls out of embedded systems most of the time but would things like console game development also face these issues?
Consoles shouldn't hit issues with this, but will probably hit issues with reference counting and memory related costs. They are basically modern desktops with slightly limited resources where you want to use every drop of performance. Roc should have no issue replacing parts of games where lua would be used, but a pure roc console game might be very allocation heavy depending on how it is written (so would a bad c++ program though)
As an aside on embedded. Embedded systems are honestly really powerful. Micropython can run on a lot of embedded devices. It has a cost, but it is usable. I believe that they mostly just lock down python types and restrict them to more basic c like types (e.g. no growing ints, just i32)
what is the best ergonomic strategy currently used for the code-once-run-anywhere when there are multiple system architectures involved?
Write zig/c with good data oriented design principles and architecture generic simd/swar. Compile it for every target you care about.
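For anyone unfamiliar with SWAR (SIMD within a register): the idea is to treat one ordinary machine word as several small lanes and process them all with plain integer operations, so the code does not depend on any particular vector instruction set. A minimal sketch of the classic "does this 64-bit word contain a zero byte" trick follows. It is written in Python purely to show the bit manipulation (real code would be in zig/c as said above), and the names `has_zero_byte`, `LOW_BITS`, `HIGH_BITS` and `MASK64` are illustrative.

```python
MASK64 = (1 << 64) - 1            # emulate a 64-bit register in Python
LOW_BITS = 0x0101010101010101     # lowest bit of each of the 8 bytes
HIGH_BITS = 0x8080808080808080    # highest bit of each of the 8 bytes


def has_zero_byte(word: int) -> bool:
    """Check all 8 bytes of a 64-bit word for a zero byte at once.

    Subtracting 1 from every byte sets a byte's high bit when that byte
    was 0x00; and-ing with ~word discards bytes whose high bit was
    already set before the subtraction.
    """
    word &= MASK64
    borrowed = (word - LOW_BITS) & MASK64
    return (borrowed & (~word & MASK64) & HIGH_BITS) != 0


with_nul = int.from_bytes(b"abc\x00efgh", "little")
without_nul = int.from_bytes(b"abcdefgh", "little")

print(has_zero_byte(with_nul))     # True
print(has_zero_byte(without_nul))  # False
```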
Yeah, I don't think there is a real answer here. Nothing I know of truly targets both 16bit machines and 64 bit machines. Many target 32 and 64, but even those systems have restrictions because of it. Like java having restricted array size.
is the code-once approach not the best ergonomic in these scenarios since you would need to optimize for the opaque system-optimizer algorithm instead of just accessing the different strategies for each system directly for your own, very specific, use case?
I think Mike Acton in his DOD talk went over this some. If you want to make something with good performance, you need to know all of the details of the concrete set of systems that you are optimizing for. Of course, it would be better to target individual systems and design for each of those specifically, but a small or similar enough group of systems is also fine. Having a vague or large set of systems is simply not possible to optimize for.
In the case of libraries/core language design, generally scope is small enough that you can relatively well design each feature to run well on most systems. This does require picking specific code to run based on the systems though.
Brendan Hansknecht said:
Also, other thought on testing, if we only expose 32 vs 64 register width, it totally can be fully tested on one system. So instead of testing for wasm, you are testing for 32bit and just using only 32bit assembly. So running a 32bit application on a 64bit system
Does every reasonably common 64-bit arch include compatibility support for a 32-bit ISA? I know that's the case for x86_64, but iirc, that's not the case for all others... Or do you mean compiling to use 32-bit registers and ops despite being on a 64-bit arch?
Aren't these properties we can just prove or constrain in the compiler for Roc code (assuming the platform code is doing the right thing)?
Last updated: Jun 16 2026 at 16:19 UTC