Currently everything in Roc is essentially passed as a constant value, with the special exception that lists and strings can be edited in place if they only have a single reference. That being said, they are technically still passed by constant value. Just the data they point to is not constant.
In many cases this does not matter, but depending on the calling convention, it can have horrible performance characteristics. In Roc's case specifically, passing large structs and unions has a lot of overhead in many calling conventions: for every call, the large struct must be copied to a new stack location. This means that if you are passing a model down the stack, you are copying the entire model for every single call, wasting stack space and time. Here is an example in C++ of the difference between passing by const value and passing by const ref. The cost is approximately 30 memory copy assembly instructions even in this simple example. Looking at the performance cost with a struct of only 4 usizes (pretty tiny), it is already about 3x slower to pass by const value instead of const reference for these small functions.
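For anyone who wants to poke at the difference locally, here is a minimal sketch of the two styles. The `Model` struct and all function names are made up for illustration, and the comments describe what typical native calling conventions do when the calls are not inlined; it is not the exact code behind the numbers above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 4-usize model (names are invented for this sketch).
struct Model {
    std::size_t a, b, c, d;
};

// By const value: without inlining, the caller makes a fresh copy of all
// four fields for the callee on every call.
std::size_t sum_by_value(const Model m) { return m.a + m.b + m.c + m.d; }

// By const reference: only the address of the existing Model is passed.
std::size_t sum_by_ref(const Model& m) { return m.a + m.b + m.c + m.d; }

// Threading the model through one more frame: the by-value version has to
// copy the struct again, the by-reference version forwards the same address.
std::size_t forward_by_value(const Model m) { return sum_by_value(m); }
std::size_t forward_by_ref(const Model& m) { return sum_by_ref(m); }

// Small self-check; call it from main() to exercise the sketch.
inline void check_sums() {
    const Model m{1, 2, 3, 4};
    assert(forward_by_value(m) == 10);
    assert(forward_by_ref(m) == 10);
}
```

Both variants compute the same result; comparing the generated assembly at a fixed optimization level (with inlining disabled) is one way to see the extra copies.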
Given Roc is a functional language focused on high performance, I think taking advantage of passing by const reference is extremely important. I think we should follow the C++ guidelines: for anything that is cheap to copy, pass it by value; for everything else, pass it by const reference. In our specific case, I propose that any value over 2 usizes should be passed by const reference. Then it will only be copied if a sub-function needs to return a modified version of the struct, rather than being copied for every single function call. The reason for 2 usizes is that it matches most modern calling conventions. Things that are 2 usizes or smaller will be passed in registers instead of on the stack. Anything larger than 2 usizes requires copying to the stack for passing.
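To make the proposed cutoff concrete, here is a rough C++ sketch of the rule as a compile-time check. `kMaxByValueBytes`, `passes_by_const_ref`, `Param`, `Pair`, and `Quad` are all hypothetical names, and this only illustrates the 2-usize heuristic; it says nothing about how the Roc compiler would actually implement it.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Assumed cutoff from the proposal: two machine words.
constexpr std::size_t kMaxByValueBytes = 2 * sizeof(std::size_t);

// Hypothetical helper: would T be passed by const reference under the rule?
template <typename T>
constexpr bool passes_by_const_ref = sizeof(T) > kMaxByValueBytes;

// Parameter type the rule selects for T.
template <typename T>
using Param = std::conditional_t<passes_by_const_ref<T>, const T&, const T>;

struct Pair { std::size_t x, y; };        // exactly 2 usizes: by value
struct Quad { std::size_t a, b, c, d; };  // 4 usizes: by const reference

static_assert(!passes_by_const_ref<Pair>, "2 usizes stay by value");
static_assert(passes_by_const_ref<Quad>, "4 usizes go by const reference");

std::size_t sum_pair(Param<Pair> p) { return p.x + p.y; }
std::size_t sum_quad(Param<Quad> q) { return q.a + q.b + q.c + q.d; }

// Small self-check; call it from main() to exercise the sketch.
inline void check_param_rule() {
    assert(sum_pair(Pair{1, 2}) == 3);
    assert(sum_quad(Quad{1, 2, 3, 4}) == 10);
}
```

The comparison is strictly "greater than", so a value of exactly 2 usizes still goes by value and can use registers.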
I think this will be paramount to performance and avoiding stack overflows for large models and structs being passed around. It also will make boxing fit in naturally. If we are boxing a large object, we can still just pass the pointer down the call stack like with all other structs. No need to ever truly unbox and copy all of the data out of the box. A box is just a constant reference to a chunk of data.
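As a loose analogy for the boxing point, here is a sketch that uses `std::unique_ptr` as a stand-in for a box. That mapping is my assumption, not how Roc's `Box` is implemented, and `Model`, `total`, and `total_boxed` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical large-ish model.
struct Model {
    std::size_t a, b, c, d;
};

// Ordinary callee that takes the model by const reference.
std::size_t total(const Model& m) { return m.a + m.b + m.c + m.d; }

// Passing a boxed model down the stack: dereferencing the box yields a
// reference to the heap data, so nothing is copied out of the box.
std::size_t total_boxed(const std::unique_ptr<Model>& box) {
    assert(box != nullptr);  // unique_ptr can be empty, so guard against it
    return total(*box);
}
```

The same `total` function serves both boxed and unboxed callers, which is the sense in which a box is "just a constant reference to a chunk of data".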
Thoughts?
In our specific case, I propose that any value over 2 usizes should be passed by const reference.
that sounds like a great heuristic! I've been thinking about this for a while, but didn't know what a good cutoff point would be...I like your reasoning for choosing that!
actually a good example of where this will come up is GUIs, e.g. for editor plugins
you have the entire application state value being passed around to all sorts of functions
if that thing has like 20 fields in it, RIP performance unless we're passing pointers :laughing:
a separate consideration: what if we have a bunch of arguments?
for example:
As an extra note, in some simple benchmarks, even at 3 usizes, it is still faster to pass by reference than by value. Of course, at 2 usizes, passing by ref is much worse due to not taking advantage of registers.
consider a function with 20 arguments. Should we copy all 20 every time the function gets called?
I could see arguments for two designs:
usually I'd default to #2 in a case like this - give people some way to get the behavior, so we're not eliminating potential performance optimizations - but the thing is, I can't think of a single motivating use case to support #2 where you'd ever actually prefer that it copy all 20 arguments to having them be automatically "tupled" and passed by reference :sweat_smile:
So with many arguments, it gets a lot more calling-convention specific. For example, on x86-64 Linux, I can pass 14 arguments in registers if 6 of them are ints and the other 8 are floats, but on Windows, I get 4 arguments total passed in registers, int or float.
This being said, I don't think making a tuple would help much. The only case where it would be an advantage is if the same 20 arguments are passed to multiple functions, or if the callees always take in-order, contiguous subsets of the arguments.
Because if you swap the order or only use some of the parameters, you would still have to copy values around to build the new tuple that the sub-function expects.
Also with 20 args, as many as possible will still be in registers before we start pushing to the stack, so that is another advantage over the tuple. With individual args, I might get 8 in registers and 12 on the stack instead of 20 on the stack with a reference to it being passed as an arg.
In fact, in shallow cases, or cases where the args for each function change a lot, splitting a struct into many values and passing them as a ton of arguments instead of just one can be faster. But in most cases where you pass the same struct around to multiple functions, passing the struct itself will generally be better than direct args.
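The register counts mentioned above can be sketched as a tiny calculation. This is a deliberately simplified model: it only counts scalar int and float arguments, ignores struct arguments, varargs, and return-value slots, and the `ArgCounts` type and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Scalar argument counts for a hypothetical function signature.
struct ArgCounts {
    std::size_t ints;
    std::size_t floats;
};

// System V AMD64 (x86-64 Linux): 6 integer registers and 8 SSE registers,
// allocated independently of each other.
constexpr std::size_t sysv_register_args(ArgCounts a) {
    return std::min<std::size_t>(a.ints, 6) + std::min<std::size_t>(a.floats, 8);
}

// Windows x64: only the first 4 argument positions get a register,
// whether they are ints or floats.
constexpr std::size_t win64_register_args(ArgCounts a) {
    return std::min<std::size_t>(a.ints + a.floats, 4);
}

// Small self-check; call it from main() to exercise the sketch.
inline void check_register_counts() {
    assert(sysv_register_args({6, 8}) == 14);   // the 14-argument case
    assert(win64_register_args({6, 8}) == 4);   // 4 total on Windows
    assert(sysv_register_args({0, 20}) == 8);   // 20 floats: 8 in registers
}
```

With 20 float arguments this model gives 8 in registers and 12 on the stack, which matches the "8 in registers and 12 on the stack" case.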
oh that's a good point! :thumbs_up:
The Wasm backend is already passing all data structure arguments as pointers. Only primitive numbers are passed by value.
Is it possible to in-place mutate an argument?
The answer should be no to my knowledge, which is why I want to make passing by const ref an explicit rule of the language. That being said, with the LLVM backend, its optimizations are totally allowed to violate that. For example, if it is passed a struct on the stack, modifies a field, and then uses it, LLVM could update that struct argument in place because it knows the value was copied in and it has ownership. Technically, basic linear scan optimizations could probably be made in the dev backend, but that is not part of our mono IR.
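A small C++ sketch of the ownership difference being described, with hypothetical names (`Point`, `bump_by_value`, `with_bumped_x`). It only illustrates the semantics; it is not a description of what LLVM or the dev backend actually emit.

```cpp
#include <cassert>

// Hypothetical 4-field struct.
struct Point {
    long x, y, z, w;
};

// By value: the callee owns its copy, so updating a field in place is
// invisible to the caller. An optimizer is free to reuse that storage.
long bump_by_value(Point p) {
    p.x += 1;
    return p.x + p.y;
}

// By const reference: the callee may not touch the caller's data, so a
// copy is made only here, where a modified version is actually returned.
Point with_bumped_x(const Point& p) {
    Point copy = p;
    copy.x += 1;
    return copy;
}

// Small self-check; call it from main() to exercise the sketch.
inline void check_no_caller_mutation() {
    Point p{1, 2, 3, 4};
    assert(bump_by_value(p) == 4);
    assert(p.x == 1);  // caller's value is unchanged
    const Point q = with_bumped_x(p);
    assert(q.x == 2 && p.x == 1);
}
```

In both cases the caller's `p` is untouched; the const-ref version just moves the copy to the one place that needs it.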
So for wasm, does the "C" calling convention just always use pointers? Also, this means that the wasm backend already doesn't copy data if it passes a struct from one function to the next, just sends the pointer farther down?
Brendan Hansknecht said:
So for wasm, does the "C" calling convention just always use pointers? Also, this means that the wasm backend already doesn't copy data if it passes a struct from one function to the next, just sends the pointer farther down?
That's right, yes.
It always uses pointers for data structures.
Yeah, so this all just doesn't affect wasm, I guess.
In x86-64, if a function is passed a 4 element struct and it then passes the struct to another function, it has to copy the struct to a new stack offset. So switching to const ref is important. I assume everything in wasm is already pointers because registers and some memory management are abstracted away.
for LLVM this should be mitigated by inlining, but is probably still worth looking at
question is though: at what level of abstraction do we want to think about this? in the mono IR, or in the backends themselves?
wasm already does its own thing, so that makes it a bit weird to put it into mono IR
inlining will help, but will not solve all problems, especially when a large model (or subsets of it) is threaded through most functions. It is unlikely that all of the functions will be inlined, and the remaining calls will lead to copying.
As for IR or not, I think it should probably be in the IR.
I think it needs to be explicit because the platform needs to know about it in order to pass args correctly.
If struct greater than 2 usize, platforms must pass by constant ref.
Also, if the platform is passing a value, even in wasm (to my understanding), the host might still need to make a copy of the value and then pass it in, rather than just passing the pointer to the existing value.
The issue is that const values that are function args can still be modified and a copy will be required if the host wants to use or modify the value later.
I don't see why anything other than the copy needs to be explicit in the IR
Checking if something is bigger than 2 usize can be done anywhere. I'm already doing it in the wasm backend.
From a certain point of view it's already in the IR because the Layout tells you the size of the structure so you have the info you need.
That is true. I guess it is just important that the platform author has an explicit rule to follow when passing data to Roc (also builtins from roc to zig). Otherwise, internally, all backends can technically make up whatever calling conventions they want.
I think that I would prefer something in the IR because it will enforce consistency across the backends and help to ensure a platform won't be messed up on a specific backend. That being said, the explicitness could just be a bool flag or another enum value with the same data (Struct and StructRef). I don't think we need to explicitly add pointer loading and such. So for the wasm backend, it actually wouldn't change anything since everything is already references.
On the flip side, in some situations, not having the explicit IR could enable a tiny bit more performance optimizations. For example, with avx2, it is technically possible to pass 4 doubles by value in a single register using the __m256d type.
Not sure if Roc would or could ever take advantage of that, but maybe.
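For the Struct / StructRef idea, here is a rough sketch of what a size-based tag could look like, written in C++ only to stay consistent with the C++ example mentioned at the top of the thread. `StructPassing`, `StructLayout`, and `classify` are hypothetical and are not Roc's actual mono IR or Layout types. It assumes the 2-usize rule is applied against the target's word size.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical tag, loosely modeled on the Struct / StructRef suggestion.
enum class StructPassing { Struct, StructRef };

// Hypothetical, simplified layout: only the total size in bytes.
struct StructLayout {
    std::size_t size_bytes;
};

// word_bytes is the target's usize width (8 on x86-64, 4 on wasm32).
constexpr StructPassing classify(StructLayout layout, std::size_t word_bytes) {
    return layout.size_bytes > 2 * word_bytes ? StructPassing::StructRef
                                              : StructPassing::Struct;
}

// Small self-check; call it from main() to exercise the sketch.
inline void check_classify() {
    const StructLayout sixteen_bytes{16};
    // Fits in two 64-bit words, so it stays by value on x86-64.
    assert(classify(sixteen_bytes, 8) == StructPassing::Struct);
    // The same bytes span four words on a 32-bit target.
    assert(classify(sixteen_bytes, 4) == StructPassing::StructRef);
    assert(classify(StructLayout{32}, 8) == StructPassing::StructRef);
}
```

Since the check only needs the size from the layout, it could live in the IR or in each backend; the tag mainly gives platform authors one rule to read.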
Last updated: Jun 16 2026 at 16:19 UTC