Passing by const ref · ideas · Zulip Chat Archive

Stream: ideas

Topic: Passing by const ref

Brendan Hansknecht (Feb 22 2022 at 01:31):

Currently everything in Roc is essentially passed as a constant value, with the special exception that lists and strings can be edited in place if they only have a single reference. That being said, they are technically still passed by constant value. Just the data they point to is not constant.

In many cases this does not matter, but depending on the calling convention, this can have horrible performance characteristics. Specifically in Roc's case, passing large structs and unions has a lot of overhead in many calling conventions. For every call, the large struct must be copied to a new stack location. This means that if you are passing a model down the stack, you are copying the entire model for every single call, wasting stack space and time. Here is an example in C++ of the difference between passing by const value and passing by const ref. The cost is approximately 30 memory copy assembly instructions even in this simple example. Measuring with a struct of only 4 usizes (pretty tiny), passing by const value is already about 3x slower than passing by const reference for these small functions.
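The linked C++ example is not preserved in this archive. As a rough illustration only (not the original code), here is a minimal sketch of the two styles being compared. The `Model` struct and all function names are hypothetical.

```cpp
#include <cstddef>

// Hypothetical stand-in for a "model": four usize-sized fields
// (32 bytes on a typical 64-bit target).
struct Model {
    std::size_t a, b, c, d;
};

// By const value: unless the call is inlined, the caller has to
// materialize a fresh copy of the struct for the callee.
std::size_t sum_by_value(const Model m) {
    return m.a + m.b + m.c + m.d;
}

// By const reference: only the address of the existing struct is passed.
std::size_t sum_by_ref(const Model& m) {
    return m.a + m.b + m.c + m.d;
}

// Threading the model one level further down the stack.
// The by-value version copies again for the inner call;
// the by-ref version just forwards the same address.
std::size_t outer_by_value(const Model m) { return sum_by_value(m) + 1; }
std::size_t outer_by_ref(const Model& m) { return sum_by_ref(m) + 1; }
```

Both versions compute the same result; the difference only shows up in the generated code, so comparing the assembly with optimizations on and inlining disabled is the way to see the extra copies.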

Given Roc is a functional language focused on high performance, I think taking advantage of passing by const reference is extremely important. I think we should follow the C++ guidelines: for anything that is trivially copyable, pass it by value; for everything else, pass it by const reference. In our specific case, I propose that any value over 2 usizes should be passed by const reference. Then it will only be copied if a sub-function needs to return a modified version of the struct, rather than being copied for every single function call. The reason for 2 usizes is that it matches most modern calling conventions: things that are 2 usizes or smaller will be passed in registers instead of on the stack, while anything larger than 2 usizes requires copying to the stack for passing.
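The proposed cutoff can be sketched as a compile-time rule in C++. This is only an illustration of the heuristic, not anything from the Roc compiler. The names `kMaxByValueBytes`, `param_t`, `Pair`, and `Quad` are made up, and combining the size check with the trivially-copyable check is an assumption.

```cpp
#include <cstddef>
#include <type_traits>

// Assumed cutoff from the proposal: two usize-sized words.
constexpr std::size_t kMaxByValueBytes = 2 * sizeof(std::size_t);

// Hypothetical helper: resolves to `T` for small, trivially copyable
// types and to `const T&` for everything else.
template <typename T>
using param_t = std::conditional_t<
    std::is_trivially_copyable_v<T> && sizeof(T) <= kMaxByValueBytes,
    T,
    const T&>;

struct Pair { std::size_t x, y; };        // 2 usizes: stays by value
struct Quad { std::size_t a, b, c, d; };  // 4 usizes: becomes const ref

static_assert(std::is_same_v<param_t<Pair>, Pair>);
static_assert(std::is_same_v<param_t<Quad>, const Quad&>);

// The signature picks the passing style from the type's size.
std::size_t sum(param_t<Quad> q) { return q.a + q.b + q.c + q.d; }
```

A compiler would apply this rule to layouts rather than C++ types, but the decision is the same: size at or under two words goes by value, anything bigger goes by const reference.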

I think this will be paramount to performance and avoiding stack overflows for large models and structs being passed around. It also will make boxing fit in naturally. If we are boxing a large object, we can still just pass the pointer down the call stack like with all other structs. No need to ever truly unbox and copy all of the data out of the box. A box is just a constant reference to a chunk of data.

Thoughts?

Richard Feldman (Feb 22 2022 at 01:42):

In our specific case, I propose that any value over 2 usizes should be passed by const reference.

that sounds like a great heuristic! I've been thinking about this for a while, but didn't know what a good cutoff point would be...I like your reasoning for choosing that!

Richard Feldman (Feb 22 2022 at 01:43):

actually a good example of where this will come up is GUIs, e.g. for editor plugins

Richard Feldman (Feb 22 2022 at 01:43):

you have the entire application state value being passed around to all sorts of functions

Richard Feldman (Feb 22 2022 at 01:43):

if that thing has like 20 fields in it, RIP performance unless we're passing pointers :laughing:

Richard Feldman (Feb 22 2022 at 01:45):

a separate consideration: what if we have a bunch of arguments?

Richard Feldman (Feb 22 2022 at 01:46):

for example:

Brendan Hansknecht (Feb 22 2022 at 01:46):

As an extra note, in some simple benchmarks, even at 3 usizes, it is still faster to pass by reference than by value. Of course, at 2 usizes, passing by ref is much worse due to not taking advantage of registers.

Richard Feldman (Feb 22 2022 at 01:47):

consider a function with 20 arguments. Should we copy all 20 every time the function gets called?

I could see arguments for two designs:

1. Automatically turn some of those arguments into a struct and pass them that way, so we aren't copying 20 things on the stack
2. If you're doing it this way, it's because for some reason you explicitly want that to happen (if you don't want it to happen, you could just accept a single-tag union with a 20-field payload), and we shouldn't take away that option from you

Richard Feldman (Feb 22 2022 at 01:48):

usually I'd default to #2 in a case like this - give people some way to get the behavior, so we're not eliminating potential performance optimizations - but the thing is, I can't think of a single motivating use case to support #2 where you'd ever actually prefer that it copy all 20 arguments to having them be automatically "tupled" and passed by reference :sweat_smile:

Brendan Hansknecht (Feb 22 2022 at 01:49):

So with many arguments, it gets a lot more calling-convention specific. For example, on x86-64 Linux, I can pass 14 arguments in registers if 6 of them are ints and the other 8 are floats, but on Windows, I get 4 arguments total passed in registers, int or float.
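To make the register counts above concrete, here is a deliberately simplified sketch that estimates how many scalar arguments land in registers under each convention. The type and function names are hypothetical, and the model ignores struct arguments, varargs, and hidden return pointers.

```cpp
#include <algorithm>
#include <cstddef>

enum class Abi { SysVAmd64, Win64 };

struct RegisterSplit {
    std::size_t in_registers;
    std::size_t on_stack;
};

// Simplified model for scalar arguments only.
RegisterSplit split_args(Abi abi, std::size_t int_args, std::size_t float_args) {
    std::size_t in_regs = 0;
    if (abi == Abi::SysVAmd64) {
        // System V x86-64: 6 integer registers and 8 SSE registers,
        // drawn from two independent pools.
        in_regs = std::min<std::size_t>(int_args, 6) +
                  std::min<std::size_t>(float_args, 8);
    } else {
        // Windows x64: 4 argument slots total, shared by ints and floats.
        in_regs = std::min<std::size_t>(int_args + float_args, 4);
    }
    return {in_regs, int_args + float_args - in_regs};
}
```

Under this model, 6 ints plus 8 floats all fit in registers on System V, while the same call on Windows x64 keeps 4 in registers and spills the other 10 to the stack.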

Brendan Hansknecht (Feb 22 2022 at 01:50):

This being said, I don't think making a tuple would help much. The only case where it would be an advantage is if the same 20 arguments are passed to multiple functions, or if sub-functions always take in-order, contiguous subsets of the arguments.

Brendan Hansknecht (Feb 22 2022 at 01:51):

Because if you swap the order or only use some of the parameters, you would still have to copy values around to build the new tuple the sub-function expects.

Brendan Hansknecht (Feb 22 2022 at 01:52):

Also with 20 args, as many as possible will still be in registers before we start pushing to the stack, so that is another advantage over the tuple. With individual args, I might get 8 in registers and 12 on the stack instead of 20 on the stack with a reference to it being passed as an arg.

Brendan Hansknecht (Feb 22 2022 at 01:53):

In fact, in shallow cases, or cases where the args for each function change a lot, splitting a struct into many values and passing them as a ton of arguments instead of just one can be faster. But in most cases where you pass the same struct around to multiple functions, passing the struct will generally be better than direct args.

Richard Feldman (Feb 22 2022 at 02:07):

oh that's a good point! :thumbs_up:

Brian Carroll (Feb 22 2022 at 09:21):

The Wasm backend is already passing all data structure arguments as pointers. Only primitive numbers are passed by value.

Brian Carroll (Feb 22 2022 at 09:21):

Is it possible to in-place mutate an argument?

Brendan Hansknecht (Feb 22 2022 at 17:08):

The answer should be no to my knowledge, which is why I want to make passing by const ref an explicit rule of the language. That being said, with the LLVM backend, its optimizations are totally allowed to violate that. For example, if it is passed a struct on the stack, modifies a field, and then uses it, LLVM could update that struct argument in place because it knows the value was copied in and it has ownership. Technically, in the dev backend, basic linear scan optimizations could probably be made, but that is not part of our mono IR.
So for wasm, does the "C" calling convention just always use pointers? Also, this means that the wasm backend already doesn't copy data if it passes a struct from one function to the next, just sends the pointer farther down?
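The ownership point can be illustrated with a small C++ sketch. It is only an analogy for what a backend may do, not Roc or LLVM code, and `Model` and both function names are hypothetical. A by-value parameter is the callee's private copy and can be updated in place, whereas a const-ref parameter must be copied before a modified version can be returned.

```cpp
#include <cstddef>

// Hypothetical four-field model.
struct Model {
    std::size_t a, b, c, d;
};

// By value: `m` is already a private copy owned by the callee,
// so updating a field in place cannot affect the caller's value.
Model with_a_by_value(Model m, std::size_t new_a) {
    m.a = new_a;
    return m;
}

// By const reference: the callee does not own the data, so it copies
// only at the point where a modified version is actually needed.
Model with_a_by_ref(const Model& m, std::size_t new_a) {
    Model updated = m;
    updated.a = new_a;
    return updated;
}
```

In both cases the caller's original value is left untouched. The by-ref version simply moves the copy from every call site to the one place that modifies the struct.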

Brian Carroll (Feb 22 2022 at 17:42):

Brendan Hansknecht said:

So for wasm, does the "C" calling convention just always use pointers? Also, this means that the wasm backend already doesn't copy data if it passes a struct from one function to the next, just sends the pointer farther down?

That's right, yes.

Brian Carroll (Feb 22 2022 at 17:43):

It always uses pointers for data structures.

Brendan Hansknecht (Feb 22 2022 at 19:21):

Yeah, so this all just doesn't affect wasm, I guess.

Brendan Hansknecht (Feb 22 2022 at 19:24):

In x86-64, if a function is passed a 4 element struct and it then passes the struct to another function, it has to copy the struct to a new stack offset. So switching to const ref is important. I assume everything in wasm is already pointers because registers and some memory management are abstracted away.

Folkert de Vries (Feb 22 2022 at 19:30):

for LLVM this should be mitigated by inlining, but is probably still worth looking at

Folkert de Vries (Feb 22 2022 at 19:31):

question is though: at what level of abstraction do we want to think about this? in the mono IR, or in the backends themselves?

Folkert de Vries (Feb 22 2022 at 19:31):

wasm already does its own thing, so that makes it a bit weird to put it into mono IR

Brendan Hansknecht (Feb 22 2022 at 19:48):

Inlining will help, but will not solve all problems, especially when a large model (or subsets of it) is threaded through most functions. It is unlikely that all of the functions will be inlined, and then some calls will lead to copying.

Brendan Hansknecht (Feb 22 2022 at 19:50):

As for IR or not, I think it should probably be in the IR.
I think it needs to be explicit because the platform needs to know about it in order to pass args correctly.

Brendan Hansknecht (Feb 22 2022 at 19:51):

If struct greater than 2 usize, platforms must pass by constant ref.

Brendan Hansknecht (Feb 22 2022 at 19:53):

Also, if the platform is passing a value, even in wasm (to my understanding), the host might still need to make a copy of the value and then pass it in, rather than just passing the pointer to the existing value.

Brendan Hansknecht (Feb 22 2022 at 19:53):

The issue is that const values that are function args can still be modified and a copy will be required if the host wants to use or modify the value later.

Brian Carroll (Feb 22 2022 at 22:07):

I don't see why anything other than the copy needs to be explicit in the IR

Brian Carroll (Feb 22 2022 at 22:08):

Checking if something is bigger than 2 usize can be done anywhere. I'm already doing it in the wasm backend.

Brian Carroll (Feb 22 2022 at 22:09):

From a certain point of view it's already in the IR because the Layout tells you the size of the structure so you have the info you need.

Brendan Hansknecht (Feb 22 2022 at 22:40):

That is true. I guess it is just important that the platform author has an explicit rule to follow when passing data to Roc (also builtins from roc to zig). Otherwise, internally, all backends can technically make up whatever calling conventions they want.

Brendan Hansknecht (Feb 22 2022 at 22:49):

I think that I would prefer something in the IR because it will specify consistency across the backends and help to ensure a platform won't be messed up on a specific backend. That being said, the explicitness could just be a bool flag or another enum value with the same data (Struct and StructRef). I don't think we need to explicitly add pointer loading and such. So for the wasm backend, it actually wouldn't change anything since everything is already references.
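A rough sketch of what the Struct/StructRef distinction could look like, written in C++ purely for illustration (the actual compiler is not written in C++). `Layout`, `StructPassing`, and `classify` are hypothetical names, and the target's usize is passed in explicitly so that 32-bit targets can be modeled.

```cpp
#include <cstddef>

// Hypothetical layout record: only the size matters for this decision.
struct Layout {
    std::size_t size_bytes;
};

// Same data, two passing styles, as suggested in the message above.
enum class StructPassing { Struct, StructRef };

// Assumed rule from the thread: anything over 2 usizes goes by reference.
constexpr StructPassing classify(Layout layout, std::size_t usize_bytes) {
    return layout.size_bytes > 2 * usize_bytes ? StructPassing::StructRef
                                               : StructPassing::Struct;
}

// 64-bit target: 16 bytes fits in two words, 24 bytes does not.
static_assert(classify(Layout{16}, 8) == StructPassing::Struct);
static_assert(classify(Layout{24}, 8) == StructPassing::StructRef);
// 32-bit target: the cutoff is 8 bytes, so 12 bytes goes by ref.
static_assert(classify(Layout{12}, 4) == StructPassing::StructRef);
```

Because the decision depends only on the layout size and the target word size, every backend and every platform author would arrive at the same answer for a given struct.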

Brendan Hansknecht (Feb 22 2022 at 22:56):

On the flip side, in some situations, not having the explicit IR could enable a few more performance optimizations. For example, with AVX2, it is technically possible to pass 4 doubles by value in a single register using the __m256d type.

Brendan Hansknecht (Feb 22 2022 at 22:56):

Not sure if Roc would or could ever take advantage of that, but maybe.

Last updated: Jul 23 2026 at 13:15 UTC