very interesting! https://gist.github.com/FeepingCreature/5dff669aad380a123b15659e195fb96c
Now, LLVM is a very good optimizer, but this does not leave it much room. The value has to go on the stack, which means there must be space for it there, it must be copied out of the register it is probably living in, and it has to remember which parts of the stack are in use and which ones can be reused by another call, which it turns out to be pretty poor at.
I think this actually stems from a misunderstanding: LLVM doesn't really deal with calling conventions and such. The proper solution is generally to pass a single pointer to the entire struct all the way down the call stack, so there shouldn't be any of this copying to begin with.
So fundamentally they are passing by value when they should be passing by reference.
As such they keep copying over and over again.
Also, LLVM's fastcc calling convention should enable skipping the stack for things like this instead of following the AMD64 SysV argument-passing rules.
Also, this is a dangerous microbenchmark
Passing 3x the args will eat up all of the registers very quickly on an x86-64 system, so they will hit bad perf from that very fast. Also, passing everything in registers likely means way more shuffling of data around and more pushing and popping (as arg lists get longer and functions get more complex).
Lastly, they explicitly block inlining, which would otherwise make small functions like those equivalent. And for larger functions, the cost diminishes quickly.
So I think this falls into the category of a bad microbenchmark for the most part
Interesting!
At the same time, I should clarify: as long as the function doesn't/can't be inlined (or changed to LLVM fastcc), that specific benchmark will be faster with the split version, though with smaller gains the longer the function gets. Also, depending on the calling context, it could require popping a bunch of stuff to make the function call, but that is unlikely.
SysV has exactly 6 integer argument registers, so they are using those 6 registers perfectly and never put anything on the stack.
As a fun note, enabling lto on that example (and thus enabling inlining) is more than 2x faster than having a function call at all.
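For anyone wanting to reproduce that, a hedged sketch of what "enabling LTO" looks like when the benchmark is split across translation units (file names are hypothetical; `-flto` is the relevant flag):

```shell
# Without LTO: the call crosses an object-file boundary, so it
# can't be inlined at link time.
clang -O2 -c bench.c -o bench.o
clang -O2 -c funcs.c -o funcs.o
clang -O2 bench.o funcs.o -o bench-nolto

# With LTO: the linker sees LLVM bitcode and can inline across
# translation units, eliminating the call entirely.
clang -O2 -flto -c bench.c -o bench.o
clang -O2 -flto -c funcs.c -o funcs.o
clang -O2 -flto bench.o funcs.o -o bench-lto
```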
Tangent: the related benchmark linked in that post might be interesting to implement in Roc (assuming our JSON parsing is far enough along).
Last updated: Jul 06 2025 at 12:14 UTC