This is a pretty open thought, but I am questioning if we should change the Dev backend to have an ir that can be stored in a flat array (rather than a tree structure/not existing), that is essentially a portable assembly (one op per function in the current Assembler trait).
So it would be kinda akin to llvm ir, but just what we need to support the dev backend. It would also just be for defining within a single procedure, and any jmp type instructions would just use an integer as the target, i.e. which other instruction in the array to jump to.
The main gain of doing this is for future minor optimizations that we want to run on the ir, the big one being register allocation. With the current IR, register allocation, and lifetimes in general, are very much not a clean problem. Mono is a tree. It also is not nearly as fine-grained as individual assembly instructions. So it is very hard to have any idea how many registers will be needed for a specific mono ir node.
I think that we should have a very fast pass that runs over mono and blits out a flat array of assembly-like instructions (essentially equivalent to the current dev backend, but less complex due to having infinite registers, for example).
Then we would run register allocation on the flat array (lots of good linear-time algorithms exist for this). Given the assembly ir won't perfectly match real assembly instructions, we will have a backend-dependent function that labels exactly how many temporary registers an instruction will need. That is all the information we should need to deal with register allocation.
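For concreteness, here's a rough sketch of what such a flat, per-procedure ir could look like in Rust. Everything here is hypothetical (made-up names, not the real Assembler trait ops); the point is just virtual registers plus integer jump targets into the same array:

```rust
// Hypothetical sketch of a flat, per-procedure IR.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct VReg(u32); // "infinite" virtual registers before allocation

#[derive(Clone, Copy, Debug)]
enum Op {
    LoadArg { dst: VReg, index: u8 },
    AddI64 { dst: VReg, lhs: VReg, rhs: VReg },
    // Jump targets are plain indices of other instructions in the same array.
    Jump { target: u32 },
    JumpIfZero { cond: VReg, target: u32 },
    Ret { value: VReg },
}

// One procedure is a flat Vec of ops, blitted out in a single pass over mono.
struct FlatProc {
    ops: Vec<Op>,
}
```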
Once that works, we can consider other minor but fast optimizations:
- grouping instructions for final output
- intentionally reusing source and dest reg to avoid extra data movement instructions
- freeing up the base pointer by calculating all offsets with the stack pointer
- storing some aggregate values in registers if available instead of on the stack
- maybe some minor and quick rewrites
Especially given we generate at the procedure level, it should be possible to run this part of the dev backend in parallel. Only merging the final generated assembly into an output object file.
I think this will help clean up some of the complexities of the dev backend related to where data is stored and how it is loaded. It should help separate that work cleanly from the rest of the ir selection at a minimum. It also gives us some opportunities for minor but important optimization passes over dev backend ir.
What are people's thoughts on this? I would guess @Folkert de Vries and @Richard Feldman would have the most input.
so my default thinking on anything to do with the dev backend is "fewer passes is faster, so let's minimize passes"
obviously there can possibly exist a way to take a given implementation and actually make a faster implementation which does more passes, but by default it's going to be slower
so my immediate reaction is "to what extent could we achieve some of these goals without introducing any new passes or IRs?"
I think it is important to start with, we definitely want ok register allocation in the long term. It will be important to get enough performance that end users actually want to use the dev backend for reasonably sized projects. On top of that, it will be useful for making the dev backend output assembly easier to read and debug.
I don't think it will be feasible to generate ok register allocation while depending only on Mono IR.
Or more accurately, we probably could find a way to force it to happen, but it would be exceptionally messy. The current super simple register and stack allocation strategy built on mono is already messy.
I'll just say that in principle I'm on board with this idea. The current register allocation is fundamentally limited, I don't think there is a (good) way to traverse the tree structure and perform the register allocation in one pass
so my default thinking on anything to do with the dev backend is "fewer passes is faster, so let's minimize passes"
My general thoughts are:
so could it be done in 1 extra pass?
as in:
Yeah. Though technically today we do one pass on mono to record lifetimes and in the new form we would do one pass on the new ir to do register allocation instead.
that sounds good to me then! :+1:
One extra note, if we do any of the other extra optimizations that I mentioned, they would all run on this IR (and some may actually just be extensions to the linear register allocator pass). So definitely no other IRs.
What's the status of this?
nobody is working on it right now, as far as I know
but it would be awesome if anyone wanted to get it going! :smiley:
Yeah, totally unstarted
Just documented
also how would this pass / preserve registers when jumps happen (what's the calling convention)?
Jumps or calls?
Either; calls would be the c calling convention
Jumps are up to us but theoretically can keep everything alive.
If stashing would happen inside a loop from a join point, we would just want to hoist whatever is possible before the join point so it isn't stashing on repeat in the loop
calls
Of course still might have stashing in a loop due to locals for the loop
Yeah, calls are just c abi.
c is just preserving the arguments on the stack, right?
Anytime one comes up, we just ban a list of registers and have to stash what's in them
Yeah, we stash to the stack whenever out of register space
got it
In the GitHub issue, I link to some good articles on linear-time register allocation passes
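For reference, the core of the linear-scan idea over a flat instruction array looks roughly like this. This is only a sketch with made-up types, and the spilling step is elided:

```rust
// Live intervals come from one pass over the flat array; allocation is one
// pass over the intervals sorted by start point.
#[derive(Clone, Copy)]
struct Interval {
    vreg: u32,  // virtual register this interval belongs to
    start: u32, // first instruction index that defines or uses it
    end: u32,   // last instruction index that uses it
}

fn linear_scan(mut intervals: Vec<Interval>, num_phys_regs: usize) -> Vec<Option<u8>> {
    let max_vreg = intervals.iter().map(|i| i.vreg).max().unwrap_or(0) as usize;
    intervals.sort_by_key(|i| i.start);

    let mut assignment: Vec<Option<u8>> = vec![None; max_vreg + 1];
    let mut free: Vec<u8> = (0..num_phys_regs as u8).rev().collect();
    let mut active: Vec<Interval> = Vec::new(); // intervals currently holding a register

    for interval in intervals {
        // Expire intervals that ended before this one starts, freeing their registers.
        active.retain(|a| {
            if a.end < interval.start {
                if let Some(reg) = assignment[a.vreg as usize] {
                    free.push(reg);
                }
                false
            } else {
                true
            }
        });

        if let Some(reg) = free.pop() {
            assignment[interval.vreg as usize] = Some(reg);
            active.push(interval);
        } else {
            // Out of registers: a real implementation would spill the interval
            // with the furthest end point to the stack here.
        }
    }

    assignment
}
```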
@Wizard ish are you interested in diving into this? :smiley:
@Richard Feldman Perhaps, although I'll have to freshen up on my assembly first
that's awesome! feel free to post any questions here! :grinning_face_with_smiling_eyes:
For the most part you should not have to generate any assembly directly. Cause we already have the assembler trait. That said, brushing up on assembly will definitely help use the trait, map it to an ir, and debug issues.
yeah and i also want to re-familiarize myself with just the whole thing yk
been like a year or two since ive done any assembly
Makes sense
only one thing i really disagree with, and that's that the ir should use an integer as the target for jumps, which, given that the whole point would be for optimization, seems like it would be pretty hard to deal with (i would make essentially "label" no-ops that are used when generating the actual machine code, so basically like assembly). For instance, say we have a function like this (a very contrived example, and i also didn't test this on any assembler; note that destination-first order is used):
store LOC eax ;0
add ebx eax ;1
load eax [LOC] ;2
JumpTo: ;3
sub ebx #1 ;4
cmp ebx #5 ;5
jge JumpTo ;6
if we first transform the labels to the instructions they reference (technically LOC is also a label but that's not important) we get:
store LOC eax ;0
add ebx eax ;1
load eax [LOC] ;2
nop ;3
sub ebx #1 ;4
cmp ebx #5 ;5
jge [3] ;6
Then when optimized, we get
add ebx eax ;0
sub ebx #1 ;1
cmp ebx #5 ;2
jge [3] ;3
Where the jge is now a loop, not the desired behavior! However, if we look up the labels after, it is first optimized to:
add ebx eax ;0
JumpTo: ;1
sub ebx #1 ;2
cmp ebx #5 ;3
jge JumpTo ;4
Then we realize the labels:
add ebx eax ;0
nop ;1
sub ebx #1 ;2
cmp ebx #5 ;3
jge [1] ;4
We fix the problem
I don't think we plan to support those kinds of optimizations
Moving a node's index is really expensive. It would require shifting all later nodes in the array.
Not if it was a linked list
That is a lot of unnecessary cost
We only want something super slim
This is the dev backend
got it
We can pattern match on final assembly generation if we want, but I don't think we want any passes that actually mutate this array in large ways (like inserting or deleting a node)
Register and stack info probably will be sideband to avoid inserting a bunch of load and store instructions into the ir.
At least that is the hope if it works out in practice
Just one quick pass to go from mono to a flat array of roughly-assembly. N passes that just edit nodes in place or add sideband info. A final pass to take all that info and actually generate the final assembly.
That's roughly the theory and hope.
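As a sketch, the overall shape would be something like this (reusing the hypothetical Op from the earlier snippet; again, all names made up):

```rust
// Hypothetical pipeline shape: passes never insert or delete ops, they only
// rewrite ops in place or fill in sideband tables.
struct Sideband {
    // e.g. physical register per virtual register (from linear scan),
    // stack offsets, lifetimes, ...
    reg_assignment: Vec<Option<u8>>,
}

fn compile_proc(ops: &mut [Op]) -> Vec<u8> {
    // Passes 1..N: in-place edits and sideband info only.
    let sideband = Sideband { reg_assignment: Vec::new() };

    // Final pass: walk the flat array once and emit real machine code,
    // consulting the sideband tables for registers and stack slots.
    let mut machine_code = Vec::new();
    for op in ops.iter() {
        emit_one(op, &sideband, &mut machine_code);
    }
    machine_code
}

// Backend-specific: maps one Op to one or more real instructions.
fn emit_one(_op: &Op, _info: &Sideband, _out: &mut Vec<u8>) {}
```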
Note: I totally could be off about a constraint here. I have not thought about this in a long time or dug through it in a ton of detail. For background context: there was debate around whether we should even add another ir. What we do currently is super ad hoc, but it generates assembly crazy fast. We really want to keep as much of that speed as possible, just get slightly more reasonable dev assembly.
But the target is still roughly -O0
assembly
oh yeah i thought it was gonna be something that would act as an IR for all the backends, so in other words, Roc -> IR
, and then IR -> LLVM
, IR -> WASM
, and, for the dev backend, just have an evaluator to get rid of the giant overhead of compilation
but that makes a lot less sense now that i think about it
Also why isn’t the dev backend just an interpreter? Wouldn’t that make it much faster
Roc isn't limited to being used through the cli, you can also compile it and embed it in other things. Also this interpreter would need to be platform specific.
No, simply for the dev backend, as the idea for it is to be used for... developing. So keep LLVM, use that for release builds, and use dev for debugging.
I think @Ryan Barth had some interesting ideas in this direction. Not sure where we got to with that.
ok, also, just because i feel like there is definitely something I'm doing wrong in #7249, how does one sign old commits that have already been pushed
An admin can bypass and merge, or you can interactively rebase and sign. I have a note on my laptop how to do it if you want to try that.
yeah the rebasing, ive tried that a couple times but for some reason it appears to not fix the issue...
Nvm, I just realised what we were talking about. All done :grinning:
Thanks for working on that, and your patience
ok ty and np (alas this is why i need to read the fine print of stuff...)
Pro tip, you can do git rebase --exec 'git commit --amend --no-edit -S' main
to rebase on main and sign all your commits.
It can be a lot more complicated if you got merges in your commits as well
Merges? I thought we lived in a civilized world?
Sorry I did not see your comment, I did a force push after my rebase :p
Hopefully no one else is working on my feature branch :-)
Anthony Bullard said:
Merges? I thought we lived in a civilized world?
I think that a rebase only world is delightful, but I have really learned that many people can't cope with rebase. To be fair, rebase gets absolutely horrid if you make millions of small commits. (Which I think is way more common in open source).
Also, git and GitHub are only mid at best. If they were better, rebasing might be more common.
Should 128-bit registers be part of this IR? as far as I understand, 128-bit values have to be moved to stack, then mov'd to 2 64-bit registers in our current design, but if we're going to have some abstract IR register map to chip register anyway, I wonder how tricky it would be to allow 1 IR register to map to one or two chip registers based on the size?
I don't think so. Cause you don't necessarily have to have both parts loaded at the same time. And that can enable better register use.
So better to see it as a tuple of 2 u64
do you have an example in mind? I suspect it won't be useful at the Mono->IR translation level to split across two IR registers, but rather something we do when translating IR->asm. In other words, it's not useful to work on one half at a time in gen_dev/generic64/mod.rs (which is where I imagine translating Mono to IR happens), but rather in gen_dev/generic64/x86_64.rs (where translating IR to x86 asm happens).
(I'm assuming also that allocating registers happens in the IR->asm steps)
I actually think that registers should just be unspecified in size, and however large they need to be is specified at IR->Asm according to instruction size, with perhaps an 'extend' and 'truncate' op, as it doesn't really seem to be about "how much can we optimize instructions" so much as "how much can we optimize memory usage", but that's just my opinion
The main goal is to avoid tons of extra instructions that are just spilling and loading from stack.
Not really about memory usage, more about huge time sink and perf hit
Just too heavy to be reasonable in a loop
ASM doesn't work on 128 bit values, only register size or smaller values. I think keeping that abstraction is useful to make the target as simple as possible
That said, infinite sized registers are common with more complex register allocation algorithms.
So maybe would be valuable, haven't thought this through fully
Given that, due to following c abi, most things will be required on the stack anyway, I had leaned towards keeping it simple and storing the true value of more things on the stack (like is done today). Then only solve register allocation for live subvalues that are actually being acted on.
Definitely possible to do the reverse. Just changes the infrastructure farther away from the simpler and more direct design of today (everything large has space on the stack and we only ever load register sized chunks of it)
I could be wrong, but I think with infinite sized registers, you likely need more passes and a more complex algorithm to complete register allocation
fair, and just to be clear, I haven't thought much past my recent attempt to implement add/sub/mul x checked/overflow/wrap for u128/i128 as asm rather than bitcode call. It feels like it would make those cases simpler, but raised the question here because that's a pretty narrow view.
Hmmm yeah that makes sense, although abstracting that way makes it fundamentally harder to translate it to 32 bit
And also it might save a bit of memory by splitting registers (which would only be possible if we weren’t assuming each register is 64 bits)
But that would undoubtedly make it much more complex
One thing that would be hard to implement without memory apart from 64-bit registers is structs
As you'd need to use multiple registers
Although you could just have “large registers” that will probably just become memory locations
re "following c abi, most things will be required on the stack", for the very specific case of those arithmetic functions that only take two u128/i128 arguments, they fit entirely in registers; and the return fits in rax+rdx as well. I haven't found a way to work with u128/i128 args that doesn't first push them to stack, so that's a case where not having 128bit registers (or a way to say "give me the two 64bit registers that this 128bit argument has been already loaded into") ends up with more mov's between stack/register.
This could also be over-optimizing an edge case. I don't know how much roc code will be using Num.mul on u128/i128, but maybe Dec?
Yeah, I do think that having registers of different sizes would be useful, even if it was a little more complex, either with or without splitting them. And yeah, Dec is a case, and a very large case, of where a 128-bit register would be useful (though frankly I think if performance is a concern, f64 or perhaps Frac should be used instead)
Wizard ish said:
Although you could just have “large registers” that will probably just become memory locations
Yeah, I guess this is essentially what we have now
Perhaps a way to "unsplit" registers, ie, 'rax' and 'rbx' together become a 128-bit 'rabx'
if an example is any help, here's the change to impl add u128/i128 as asm.
I don't quite follow that change, why is the argument pushed onto the stack? If it is already in registers, couldn't it be just used directly from those registers?
We could have it so the virtual registers can have "hints" for the max size, so if they're moved into memory they can be smaller than the regular guarantee (ie if a register is only used in add-32 and mul-16 instructions and it has to be moved into memory at that point, instead of giving it a full 64 bits, it could just be given a 32-bit slot of memory). This would happen in IR->Backend
Brendan Hansknecht said:
I don't quite follow that change, why is the argument pushed onto the stack? If it is already in registers, couldn't it be just used directly from those registers?
that's just me not understanding the best way to get u128 args. I couldn't find any examples using registers, only other things that pushed to stack, then read what was pushed.
ok
Also, I had assumed this ir would be just for generic64, I guess if it is for all of gen dev and a future theoretical generic32, it would need a more featured solution
That said, if it is too generic, it can't do register allocation
For example, you can't register allocate until you know if an i64 fits in a register or belongs on the stack
If it belongs on the stack you have to generate multiple instructions to handle the addition
And that set of instructions needs register allocation
Given we don't want too much complexity, it is nice to have the abstraction that everything larger than a register is always on the stack. An optimization would be to allow things to be partially on the stack and partially loaded. A different optimization would be to use 2 registers as the cut off cause that matches c abi (this is the issue referenced with adding u128 currently and requiring dumping to stack). I'm not sure any of those optimizations need to be in the dev backend
My biggest concern with being too generic is that it will lead to more required transformations and a significant cost in time to generate a binary with the dev backend.
Yeah that's the thing ain't it, can't be too complex
That is why I would lean towards just solving a bespoke solution for generic64 and not worrying about 32 bit at all
32bit really needs a different asm trait. Which means it needs an entirely different ir.
yeah, I don't know what the right answer is, but "complex structs on the stack, primitives in registers" sounds correct. The question is what to do with u128/i128 and Dec. Are these primitives or complex?
Although I feel that the stack should be abstracted as “struct registers” just to be the abstraction, although that might cause a performance problem
I think you are probably right
About the abstraction
The assembly would have functions to load from and store to "struct registers". Then it would have tons more functions to operate on normal registers. It would later calculate stack offsets for all struct registers (maybe even using lifetime analysis to reuse stack space)
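Roughly like this, say (a hypothetical trait, not the real Assembler; VReg as in the earlier sketch):

```rust
// "Struct registers" only support loads and stores at compile-time-known
// offsets; everything else operates on ordinary register-sized values.
#[derive(Clone, Copy, Debug)]
struct StructReg(u32); // later lowered to a stack offset

trait RoarOps {
    // struct registers: load/store register-sized chunks at known offsets
    fn load64(&mut self, dst: VReg, src: StructReg, offset: u32);
    fn store64(&mut self, dst: StructReg, offset: u32, src: VReg);

    // normal registers: the bulk of the instruction set
    fn add_i64(&mut self, dst: VReg, lhs: VReg, rhs: VReg);
    fn mul_i64(&mut self, dst: VReg, lhs: VReg, rhs: VReg);
    // ...and so on, mirroring the existing Assembler ops
}
```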
Wait so I was right about it being a good idea or a bad idea?
Also it could just have functions to go from struct to normal
That having struct registers as an abstraction for the stack is a good idea
shua said:
yeah, I don't know what the right answer is, but "complex structs on the stack, primitives in registers" sounds correct. The question is what to do with u128/i128 and Dec. Are these primitives or complex?
I think they have to be seen as complex due to no real assembly operations being available for them. If we want to do a minor optimization, we could make a special optimization to allow storing them in 2 registers.
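For example, a 128-bit add really decomposes into two 64-bit adds where the high half consumes the low half's carry (add/adc on x86-64, adds/adc on aarch64). Illustrated in plain Rust rather than asm:

```rust
// Each u128 is kept as two 64-bit halves (two ordinary registers).
fn add_u128(lo_a: u64, hi_a: u64, lo_b: u64, hi_b: u64) -> (u64, u64) {
    let (lo, carry) = lo_a.overflowing_add(lo_b);
    let hi = hi_a.wrapping_add(hi_b).wrapping_add(carry as u64);
    (lo, hi)
}

fn main() {
    let a: u128 = u64::MAX as u128 + 5;
    let b: u128 = 123;
    let (lo, hi) = add_u128(a as u64, (a >> 64) as u64, b as u64, (b >> 64) as u64);
    assert_eq!(((hi as u128) << 64) | lo as u128, a.wrapping_add(b));
}
```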
I think you're right. I only have one counterpoint, which is that in x86_64, mul rcx will have as dest rax and rdx, which is a 128-bit value split over two 64-bit registers.
Yeah, it is definitely imperfect. We need a smart way for assembly operations to claim registers temporarily.
Hmmm well for the meantime tomorrow ill write up something to describe it just so we’re all on the same page
:folded_hands:
Oh, and one final thing, how should results be given? (all of the following accomplish the same thing; ACC is an implicit accumulator)
1. add A B, then mov C ACC
2. mov C A, then add C B
3. add C A B
also IMO it should be called ROAR :)
ROAR = Roc Optimizable Abstract Representation
I would suggest (2)
yeah that is the most common, but i also like option 1, as it allows for calls to functions and their returns to be modeled quite simply, although of course that increases complexity
Current assembly is all 3 arg like arm
I think that is nicest
Encode result register in the ASM directly
i think all have their merits personally
So it all matches 3 today
although given its purpose 3 is probably the best (it makes it very apparent what registers are being modified)
Given risc v and arm are both 3 and x86 is the only special one (and it is trivial to decompose to x86), I would push for 3.
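As a sketch of how trivial that decomposition is (hypothetical, registers as plain numbers): a 3-operand add C A B on x86 becomes a mov plus a 2-operand add, and the mov disappears whenever the destination already aliases an operand:

```rust
// Lower a 3-operand `add dst, src1, src2` to 2-operand x86-style text.
fn lower_add_x86(dst: u8, src1: u8, src2: u8, emit: &mut Vec<String>) {
    if dst == src1 {
        emit.push(format!("add r{dst}, r{src2}"));
    } else if dst == src2 {
        // addition commutes, so fold the other operand in instead of clobbering src2
        emit.push(format!("add r{dst}, r{src1}"));
    } else {
        emit.push(format!("mov r{dst}, r{src1}"));
        emit.push(format!("add r{dst}, r{src2}"));
    }
}
```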
ok then itll be 3 :)
so i think thats it, which is to say in about fifteen minutes ill probably have another question :)
Also, you can always do (add a a b for in-place, if register allocation realizes it is ok)
Wizard ish said:
also IMO it shoud be called ROAR :)
ROAR = Roc Optimizable Abstract Representation
Also, what do yall think, yea or nay for ROAR?
I'm here for the name. It's fun to pick something like that and have it persist for a decade or so and have people take it seriously
Made a first draft of the proposal
Looking great, thanks for taking the time to do this design work. I think it will really help to clarify things.
I've copied some random notes from skimming back through the thread.
I think it would be helpful to provide a little more context/background or a summary to motivate this change. We can talk about how the dev backend is currently very fast but maybe fragile or overly complicated... (my words, no idea if this is accurate).
Goal: reduce complexity of the dev backend related to where data is stored and how it is loaded
Goal: maintain fast compilation speed for the development backend, while improving runtime performance.
we definitely want ok register allocation in the long term. It will be important to get enough performance that end users actually want to use the dev backend for reasonably sized projects.
it will be useful for making the dev backend output assembly easier to read and debug.
Does this improve our ability to test things? should we mention that?
change the Dev backend to have an ir that can be stored in a flat array (rather than a tree structure/not existing), that is essentially a portable assembly (one op per function in the current Assembler trait). Main gain of doing this is for future minor optimizations that we want to run on the ir. The big one being register allocation. With the current IR, register allocation, and lifetimes in general, are very much not a clean problem. Mono is a tree. It also is not nearly as fine grain as individual assembly instructions. So it is very hard to have any idea how many registers will be needed for a specific mono ir node.
Should we include this motivation in the ROAR proposal?
I may have missed it, but is this mentioned in the ROAR proposal? 32-bit architectures are out of scope.
"It would also just be for defining within a single procedure and any jmp type instructions would just use a integer as the target that is which other instruction in the array to jump to."
Is this covered in the proposal?
Possible future optimisations
- grouping instructions for final output
- intentionally reusing source and dest reg to avoid extra data movement instructions
- freeing up the base pointer by calculating all offsets with the stack pointer
- storing some aggregate values in registers if available instead of on the stack
- maybe some minor and quick rewrites.
Would be good to include future work in the context of ROAR
All "extra" optimizations can be evaluated one at a time to see if they meet our tradeoffs.
Future optimizations can be evaluated individually. how would we do this? is this something we can benchmark easily?
it should be possible to run this part of the dev backend in parallel. Only merging the final generated assembly into an output object file.
Is this possible or a goal? would be good to mention it.
Ok, yeah Ill get to adding those (thanks for taking time to review it)
Sorry for being slow about commenting on this (energy for roc has been low lately). Will try to write up more comprehensive comments hopefully this afternoon PST.
I have two ideas, both of which involve function calls, that I'm interested to hear your opinions on:
- While local jumps have been covered as being absolute addresses, I think that there should be more addressing modes. As mentioned before, ROAR will definitely be split up into functions, with each one being a discrete chunk of code. However, I think we should also have larger segments, with those being groups of functions, each one representing a single program unit (a library or file, perhaps).
- Relatedly, ROAR does not have a way to deal with IO and other specific needs. As the goal of ROAR is to be simple, I would propose a simple "black box" instruction, int N (derived from the instruction in x86 for interrupts), which would do a special action N which can't be modeled in ROAR. IMO this would go right along with Roc's notion of "platforms", and they could be correlated (although this isn't substantiated by any logical reason, just a general sort of feeling).
Also, just making sure about this (because it's a pretty big thing): the goal would be to go from Roc->ROAR->Bin as opposed to the current Roc->Mono->Bin?
Not an expert, but presumably it'd be Roc->Mono->ROAR->Bin
Monomorphization is the process that takes generic functions and spits out statically typed functions
It's probably not worth having to duplicate that effort in ROAR
Let me read the proposal first
wait a moment, is Roc Mono different from .Net Mono?
it's not like an extension?
I don't know dot net's Mono
But Roc mono is a lot like Rust's mono
oh that's why a lot of stuff about it didn't make much sense
It's a phase of the compiler after type checking where we take type checked code and spit out IR that is simple yet concrete
So if I have a function that typechecked to numToStr : Num * -> Str, and I use it with the inputs U64 and I8, I'll get two functions out of mono: numToStr : U64 -> Str and numToStr : I8 -> Str
Yeah that makes more sense...
I was wondering why Roc used the Microsoft IR, but this makes way more sense
Now, that's based on my incomplete understanding of what:
https://en.wikipedia.org/wiki/Monomorphization
Monomorphization is good because it trades a larger binary size for better performance
If you don't do monomorphization, you end up with a single function that needs to check the input's type at runtime
More or less
I thought Mono Ir referred to this :speechless:
https://github.com/roc-lang/roc/tree/main/crates#roc-compiler-stages
This might be helpful
Specialize is the term in that diagram
yeah that was my bad for really reading things too quickly
As in "take each specific use of generic entities and make a specialized version"
Welcome to the club
Don't worry about it too much. I think the more important part is that you're helping us make a big improvement to a tricky part of the compiler. That's awesome!
yeah, well fortunately it doesn't actually affect my understanding of it that much, as the idea is basically the same, but thank you for the patience :)
Of course
Wizard ish said:
I have two ideas, both of which involve function calls, that I'm interested to hear your opinions on:
- While local jumps have been covered as being absolute addresses, I think that there should be more addressing modes. As mentioned before, ROAR will definitely be split up into functions, with each one being a discrete chunk of code. However, I think we should also have larger segments, with those being groups of functions, each one representing a single program unit (a library or file, perhaps).
- Relatedly, ROAR does not have a way to deal with IO and other specific needs. As the goal of ROAR is to be simple, I would propose a simple "black box" instruction, int N (derived from the instruction in x86 for interrupts), which would do a special action N which can't be modeled in ROAR. IMO this would go right along with Roc's notion of "platforms", and they could be correlated (although this isn't substantiated by any logical reason, just a general sort of feeling).
So yeah, what do yall think about these?
I don't think we'll have a need for either. The dev backend already has solutions for both.
The dev backend only ever worries about generating a single procedure at a time. As such, each procedure can be 100% self contained without any worry of what other procedures are doing. So we never need long jumps or other special things like segments. I think it would be 100% fine for ROAR to slot in such that it is only used during single function compilation. I think that it will actually fit best there.
This is the same for io. There is no need to worry about it. We never do it directly in roc. All of this will always go through a function call. So the call instruction is all you will need.
Using relocations + call instructions solve all of the above nicely.
So I would imagine the compilation process as:
You say each "function in mono", which is currently all top-level values, since we convert (at least most) top-level constants to thunks with force_thunk AFAICT. I presume this strategy would work if/when we decided to move towards non-thunked top level constants, but just wanna call that out in case there would be an issue
Sure, we should either force thunks for the dev backend or we would also have to do constant generation, but for constants, we wouldn't need ROAR at all.
Maybe I missed something in the discussion, but it would still need to translate complex constants, right? We said above that even I128 would get multiple "registers", so a struct/discriminated union would need to be loaded like any other variable that was stored in a prior register. This should be very minor compared to the main scope of ROAR, though
no, they will all just be equivalent to the constant stored on the stack. So they are just a chunk of bytes that we get the address of and load data from.
So they would in the roar proposal just be represented as a structured register (which is really just a pointer to memory)
Oh and relating to your comment on the PR, I've moved the proposal to its own repo here
comments on the proposal:
Just one thing, if registers aren't being explicitly saved, how will the program differentiate registers that are set for the purpose of returning values and those that simply hold intermediate values? Would the plan be for the return and call ops to take an argument that is which register should be returned / saved to?
Oh, this is actually a really good point I hadn't fully thought through. In the current design, due to having access to mono, a call or return instruction is given a list of symbols. Those symbols are then loaded and dealt with according to the c abi for that system. If we are abstracting into ROAR, we have to decide how to deal with that mapping and calling convention. Hmm.
So that gives three main options I can think of:
I lean 3, but am open to other opinions and ideas for sure.
Personally I think 1 is better, as it keeps the process pure (ie, machine agnostic and separate from mono), which would probably help with maintainability
Also if we restricted ROAR functions to returning one argument, we could just have 'call' instructions take an output register that would then be set to a corresponding register returned with 'ret'
As for passing arguments, this could be even more simple, just pass them in registers, and only differentiate it from the original once it’s been set
It definitely has to learn to speak c abi no matter what, sadly. It has to be callable from the platform and be able to call the zig builtins and platform effects.
This is why the dev backend currently speaks c abi. I found it easier to be consistent than to make exceptions for the edges.
but shouldn't this be abstracted as part of the ROAR->Binary interface, given that one of the primary goals is to be platform agnostic? (which could be solved by something like this?)
I don't think the goal of roar is really to be platform agnostic. It is trying to abstract away a level of assembly details, but register allocation (which will run on roar) is platform specific. On top of that, the specific handling of the stack that will come out of converting from structured registers to stack use is also platform specific.
So roar should be expected to pull in platform details. That said, I think that 3 is platform agnostic. Roar will simply record the information it needs to pass to the final assembly generation for it to deal with generating the correct call conv.
The reason I don't think we need something like black box is that nothing is black box. They are just function calls, the same as any other function call.
ok got it
I'm not sure how much you have looked at the current traits, but we actually have one for assembly and a second for callconv. If you look at a callconv trait impl, this is what making a function call looks like today: https://github.com/roc-lang/roc/blob/f8b0be725c1677dd1c39cc68723fe4ad5ce09ade/crates/compiler/gen_dev/src/generic64/x86_64.rs#L292-L337
ROAR could call essentially the same primitive.
Just instead of using Symbol from mono, it would use abstract and struct registers.
so if i'm understanding correctly you would just have the call function accept any number of args, including one for where to place the return value?
yep
With the minor caveat that we may want to handle storage in a special way to avoid bloating the roar enum in rust. Don't want to take up too many bytes per assembly instruction
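For example (hypothetical, reusing VReg from the earlier sketch), the call op could store just a range into a sideband args array, so every enum variant stays a few words wide:

```rust
// The op itself only holds a (start, len) range; the actual argument list
// lives in a sideband Vec owned by the procedure.
#[derive(Clone, Copy)]
struct ArgRange {
    start: u32,
    len: u32,
}

enum CallishOp {
    Call { target: u32, args: ArgRange, ret: Option<VReg> },
    // ... other ops
}

struct ProcWithCalls {
    ops: Vec<CallishOp>,
    call_args: Vec<VReg>, // sideband storage indexed by ArgRange
}
```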
I've been making a little progress with the proposal, just wanted to make sure I'm putting down correctly what yall mean, in particular about the function calls https://github.com/wizard7377/ROAR
I'm really liking how this is looking!
I'm not sure the value of making cmp, jmp, ret, etc. functions return a null value, but if in practice it helps build out a consistent interface, that sounds fine. I assume the assembly will map to a rust enum. The powerful pattern matching in rust should make it ok for some ops to differ.
ROAR is that every function is pure (with respect to registers)
I'm not 100% sure of the caveats of the meaning here, but it is important to note that some values passed into functions will get mutated. That said, it will never be a register value. It will always be a pointer to data on the heap that gets mutated (like a list with a refcount of one).
I think this likely falls into what you are saying, but wanted to clarify
Also, for structured registers (at least as roc exists today), all loads and stores should be to a compile-time known offset. Also, we will need the ability to load at different widths, but that can just be a load8 vs load32 vs load64.
And roc will make sure that all loads are properly aligned based on the layout of the data. So we should have no concerns there.
Also, for jumps, I know that roc only has structured loops, so we may be able to take advantage of something there. But I'm not sure what exactly. Just noting.
Oh, one other note for jmps. Roc's jumps technically have arguments, but we should be able to map that into editing the same roar registers over and over again.
Yeah, I might've forgotten that the references not being mutated doesn't correlate to the referents not being mutated
Also can you explain a little bit more about what you mean about them taking arguments, do you mean they should be similar to RISC branch instructions (ie, the instruction contains the condition)
I just mean that in mono ir, jumps look a lot like function calls.
There is a block declared with a joinpoint. The join point has a list of args. Within the block you can access any constants from before the joinpoint and the args as well. At some point in the main code flow, it calls a jump instruction to a join point with a list of args.
In roar, this could be modeled as raw assembly style jumps with register updates before the jump to set the args. Or it could be modeled closer to a function call. Not sure either is better, just worth noting and thinking about.
yeah personally im much more familiar with x86 and 6502 assembly, and in those, jmps basically only change the instruction pointer, so that just seems more natural to me, but only because that's the only thing ive ever encountered
Yeah, rocs mono isn't trying to be assembly. So makes sense it has a high level concept
should ROAR have separate floating point registers?
Would be nice for a tiny bit of extra safety, but would be fine to start with unified registers and then only convert to float vs general registers during allocation.
So I think either is fine before register allocation. I lean slightly towards just having them
yeah i also lean on having them, there isn't really a reason not to, and basically all systems do, so it probably will be more efficient
Did some work on the proposal (still not done)... and i may have just realized gh md has a diagramming feature... https://github.com/wizard7377/ROAR#optimizations
Ok I think this is the first draft, is there anything any of you think needs to be added to this (obviously needs to be cleaned up a lot)
I'll read through it when I get the chance, but packing tonight and traveling tomorrow.
nice safe travels :)
Also, on the llvm side, I think I am going to greatly simplify abi (including the c abi we generate). Dev backend may want to do the same (might not really affect the proposal). The core change is that all aggregate types will be passed by const ref. Then only enums and numeric types are passed in registers.
if anything that actually works with the proposal quite nicely
I don't know very much about c abi, but I can say that this was a well-written doc that even a mid-wit (as regards assembly) like myself could understand
Thank you :)
So could I say (as a midwit as noted above) that this is a SSA representation that's meant to be a level higher than say LLVMIR because it can assume a lot of Roc's language semantics?
But still low-level enough that we can use those semantics to drive efficient optimization passes before codegen?
Yes, that is the primary purpose
Ok, I passed my comprehension test. I'm going to bed now :-)
I'll be following the (assumed) implementation of this, as I am very interested in learning a lot in this area
A few thoughts on the ROAR document:
type Stmt { Load (Register, Expr), Store (Register, Expr), Ret, ...} and type Expr { Add (Register, Register), Sub (Register, Register), ... }
jmp
other than jmp
itself, and model conditional jumps via branches to basic blocks. Reducing the number of nodes in the IR makes it easier to implement a backend, which I think is the best cost function for the IR.
I personally think the null register is quite important. It allows statements to be consistent internally as one structure, rather than having expressions which can be statements, which would significantly complicate the process of obtaining modified registers. In addition, sometimes one wants to have expressions which don't write to anything, for example on x64 'cmp' vs 'sub', where the only difference is whether they write to something or just set flags. In addition, the null register wouldn't be that hard to implement, and in case this wasn't clear, the null register can't be read; it just serves as a "trash" register
Also the purpose of this is to be stored in a flat array, which wouldn’t allow for nested expressions
(Note, refer back to the first message in this channel)
Brendan Hansknecht said:
This is a pretty open thought, but I am questioning if we should change the Dev backend to have an ir that can be stored in a flat array (rather than a tree structure/not existing), that is essentially a portable assembly (one op per function in the current Assembler trait).
So it would kinda be akin to llvm ir, but just what we need to support the dev backend. It would also just be for defining within a single procedure and any jmp type instructions would just use a integer as the target that is which other instruction in the array to jump to.
The main gain of doing this is for future minor optimizations that we want to run on the ir. The big one being register allocation. With the current IR, register allocation, and lifetimes in general, are very much not a clean problem. Mono is a tree. It also is not nearly as fine grain as individual assembly instructions. So it is very hard to have any idea how many registers will be needed for a specific mono ir node.
I think that we should have a very fast pass that runs over mono and blits out a flat array of assembly like instructions. (essentially will be equivalent to the current dev backend but less complex due to having infinite registers for example).
Then we would run register allocation on the flat array (lots of good algorithms with linear time exist for this). Given the assembly ir won't perfectly match to assembly instructions, we will have a backend dependent function that will label exactly how many temporary registers an instruction will need. That is all the information we should need to deal with register allocations.
Once that works, we can consider other minor but fast optimizations:
- grouping instructions for final output
- intentionally reusing source and dest reg to avoid extra data movement instructions
- freeing up the base pointer by calculating all offsets with the stack pointer
- storing some aggregate values in registers if available instead of on the stack
- maybe some minor and quick rewrites.
Especially given we generate at the procedure level, it should be possible to run this part of the dev backend in parallel. Only merging the final generated assembly into an output object file.
I think this will help clean up some of the complexities of the dev backend related to where data is stored and how it is loaded. It should help separate that work cleanly from the rest of the ir selection at a minimum. It also gives us some opportunities for minor but important optimization passes over dev backend ir.
What are people's thoughts on this? I would guess Folkert de Vries and Richard Feldman would have the most input.
Also, just making sure of something: ROAR would require reimplementing the Backend trait, but not the Assembler and CallConv traits (as the Backend trait takes in mono as input)
Yeah, might even end up replacing the backend trait with just a raw method to go from mono to roar and a separate method that directly calls out to the assembler and call conv traits
also was just thinking, it might be useful to have a concept of "true global variables" in ROAR. While struct registers persist, they require references being passed around, and what I'm thinking of would just be accessed with its symbol. This wouldn't be preferred (ie, local vars would still be in abstract registers) but i think it might make it a lot more clean
oh also if no one has any more comments on this, ill probably start working on it tomorrow
While struct registers persist, they require references being passed around, and what I'm thinking of would just be accessed with its symbol.
I'm not sure I fully follow what you are suggesting here
basically having things like struct registers, but instead of symbols being dynamically generated their symbols persist from mono
I think most of the symbols in mono are locals. There are true globals. For those, we will need some solution. The final output is a chunk of data and a relocation to load that data locally in a function.
Also, my only main comment on the proposal is that I would try to stay very close to the assembler trait to start with (in terms of which operations to support). For example, it looks like we currently only need a jmp and a jne instruction. Clearly more optimizations are possible and in some cases other jump instructions might map better, but I think it will make the transition smoothest if we start as close as possible. Then incrementally make things more powerful.
Oh and instead of using things like u32 directly, wrap them in nominal types to avoid using an index as a register or a float register as an int register etc
I think that was already the plan, just explicitly clarifying
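i.e. the usual Rust newtype pattern (hypothetical names), so the different index spaces can't be mixed up at compile time:

```rust
// Zero-cost wrappers: each index space gets its own type.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct GeneralReg(u32);

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct FloatReg(u32);

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct OpIndex(u32); // jump targets: an index into the instruction array

fn jump_to(target: OpIndex) {
    let OpIndex(_i) = target;
}
// jump_to(GeneralReg(3)); // would be a compile error, which is the point
```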
also just so yall can comment if you like, i made a draft pr at #7397
first piece of ROAR (handcoded into rust), only the printing is done
[%32] {
%78 <- addu %73 %32
_ <- subs %73 %32
}
also i do plan to make the code not break every single style guideline possible
Left a handful of comments
ok, yeah i wasn't thinking that much about the 128 bit types, those might be a small problem
actually having the abstract registers be sized might not be that bad of an idea
also maybe separating structure references from regular registers
I think it should be fine to see 128-bit values as a structured register containing two 64-bit registers
That is how operations function on them anyways
okay
Mostly called them out to make sure they are being thought of
Relating to your comment here, what do you think should be the max size per instruction?
I would hope it can fit into 4 u64s, but that may not be realistic.
1 for op code and any sort of metadata
1 for output register
2 for input registers
If an op might need more than 2 input registers, the input u64s would instead be a pointer and length into a sideband array of extra inputs.
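Something like this hypothetical packing would fit that budget:

```rust
// One u64 of opcode + metadata, one output register, and either two inline
// inputs or a (start, len) pair into a sideband array of extra inputs.
#[repr(C)]
struct PackedOp {
    opcode_and_meta: u64,
    output: u64,
    input_a: u64, // or sideband start index
    input_b: u64, // or sideband length
}

const _: () = assert!(std::mem::size_of::<PackedOp>() == 32); // 4 * u64
```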
i don't think it will be necessary to have them actually modeled as inputs, as they should be constant, and therefore are not a part of the internal logic
also just realized it was never discussed if function pointers should be included or not
I think a function pointer can be treated the same as an int constant with a special op to create it.
ok!
why does roc use bump allocation? It does seem to speed it up, but it makes the code much less clean (same thing with layout interning)
Cause we want a really fast compiler. That is also why it is written in rust
Obviously there is still large room for improvement, but we care a lot about compile time.
fair enough
Also, I won’t be able to work on this for a couple days
On a related note, anyone know of any cheap, small, low-middle end pc’s?
(I somehow managed to break my laptops keyboard by cleaning it… so now the whole thing is crippled)
update on this, the classic strategy of doing nothing about it has worked, and now my keyboard is working again :partying_face:
Hey everyone, sorry about the slow progress, it’s just that school takes up a lot of my time, and in particular recently I’ve just been doing way too much stuff, hopefully however I’ll be able to do more work on ROAR this week (and after that) :)
It's alg, we're all volunteers here juggling life things. :smiley:
error[E0515]: cannot return value referencing local variable `env`
--> crates/compiler/build/src/program.rs:589:5
|
577 | roc_gen_dev::build_module(&env, &mut interns, &mut layout_interner, target, procedures);
| ---- `env` is borrowed here
...
589 | / (
590 | | CodeObject::Vector(module_out),
591 | | CodeGenTiming {
592 | | generate_final_ir,
... |
600 | | },
601 | | )
| |_____^ returns a value referencing data owned by the current function
error[E0515]: cannot return value referencing local variable `layout_interner`
--> crates/compiler/build/src/program.rs:589:5
|
577 | roc_gen_dev::build_module(&env, &mut interns, &mut layout_interner, target, procedures);
| -------------------- `layout_interner` is borrowed here
...
589 | / (
590 | | CodeObject::Vector(module_out),
591 | | CodeGenTiming {
592 | | generate_final_ir,
... |
600 | | },
601 | | )
| |_____^ returns a value referencing data owned by the current function
error[E0515]: cannot return value referencing local variable `interns`
--> crates/compiler/build/src/program.rs:589:5
|
577 | roc_gen_dev::build_module(&env, &mut interns, &mut layout_interner, target, procedures);
| ------------ `interns` is borrowed here
...
589 | / (
590 | | CodeObject::Vector(module_out),
591 | | CodeGenTiming {
592 | | generate_final_ir,
... |
600 | | },
601 | | )
| |_____^ returns a value referencing data owned by the current function
error[E0505]: cannot move out of `interns` because it is borrowed
--> crates/compiler/build/src/program.rs:597:13
|
549 | fn gen_from_mono_module_dev_assembly<'a>(
| -- lifetime `'a` defined here
...
562 | mut interns,
| ----------- binding `interns` declared here
...
577 | roc_gen_dev::build_module(&env, &mut interns, &mut layout_interner, target, procedures);
| ------------ borrow of `interns` occurs here
...
589 | / (
590 | | CodeObject::Vector(module_out),
591 | | CodeGenTiming {
592 | | generate_final_ir,
... |
597 | | interns,
| | ^^^^^^^ move out of `interns` occurs here
... |
600 | | },
601 | | )
| |_____- returning this value requires that `interns` is borrowed for `'a`
error[E0505]: cannot move out of `layout_interner` because it is borrowed
--> crates/compiler/build/src/program.rs:598:13
|
549 | fn gen_from_mono_module_dev_assembly<'a>(
| -- lifetime `'a` defined here
...
564 | mut layout_interner,
| ------------------- binding `layout_interner` declared here
...
577 | roc_gen_dev::build_module(&env, &mut interns, &mut layout_interner, target, procedures);
| -------------------- borrow of `layout_interner` occurs here
...
589 | / (
590 | | CodeObject::Vector(module_out),
591 | | CodeGenTiming {
592 | | generate_final_ir,
... |
598 | | layout_interner,
| | ^^^^^^^^^^^^^^^ move out of `layout_interner` occurs here
599 | | expectations: loaded.expectations,
600 | | },
601 | | )
| |_____- returning this value requires that `layout_interner` is borrowed for `'a`
Some errors have detailed explanations: E0505, E0515.
For more information about an error, try `rustc --explain E0505`.
anyone know how to fix this? (full state is here https://github.com/wizard7377/roc/tree/roar)
(note the change in this signature https://github.com/wizard7377/roc/blob/d4de3b1545fcfd3bbb4c6ccc0ee7b918c663208d/crates/compiler/gen_dev/src/object_builder.rs#L49)
anyone able to help? (rust should definitely have something to bypass the borrow checker in development...)
I can look when I'm done with #7530
thank you!
(this has been driving me insane)
@Wizard ish using the latest roar branch commit I currently only get a mismatched types error at crates/compiler/build/src/program.rs:456:17. That's with cargo build --release. How can I produce your borrow checker errors?
run with -F use_roar
(the flag is temporary and will be replaced with a sensible build system later)
also if it's helpful this is the mess that i use to compile it clear && RUSTFLAGS="-Awarnings" cargo run -F use_roar -- --dev ../tests/fizzbuzz.roc
sorry it's really a mess
No problem, we all have to learn by doing :)
fn build_module_help<'a, 'r : 'a>(
env: &'r Env<'a>,
interns: &'r mut Interns,
layout_interner: &'r mut STLayoutInterner<'a>,
target: Target,
procedures: MutMap<(symbol::Symbol, ProcLayout<'a>), Proc<'a>>,
) -> Object<'a> {
Why are 'a and 'r linked? I don't think that is correct: 'r : 'a. Not that I 100% know what that syntax means. Anyway, I think you should be able to fully remove 'r?
yeah it was that i changed it bc in the Converter it (the borrow checker) required 'b to outlive 'a (the Converter's 'a), which then required 'r to outlive 'a (of the above function)
Something is definitely wrong if 'r needs to outlive 'a. 'a is essentially a global lifetime. 'r is temporary for the current function. So the code creating that requirement must be wrong
Oh wait :skull: i somehow forgot the most crucial point, without that requirement, i get this error
error: lifetime may not live long enough
--> crates/compiler/gen_dev/src/roar/convert.rs:240:22
|
162 | ...c, 'a: 'c, 'b: 'c> Converter<'a, 'b>
| -- -- lifetime `'b` defined here
| |
| lifetime `'a` defined here
...
240 | ... &mut sym_map.insert(*sym, (Output::Register(new_reg.clone()), repr, form...
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ argument requires that `'b` must outlive `'a`
|
= help: consider adding the following bound: `'b: 'a`
error: could not compile `roc_gen_dev` (lib) due to 1 previous error
(all the random lifetimes are me just trying to fix this)
I think you have a loop that is angering rust. env -> arena -> converter -> back to env.
That in general won't work
likely converter shouldn't be allocated in the env at all
Should just be passed around on the stack
Then just converter -> env, interner, etc
no loop
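A minimal sketch of that shape (hypothetical names, not the real roc types):

```rust
struct Env<'a> {
    module_name: &'a str,
    // arena, interns, etc. would live here
}

// The converter borrows the env instead of living inside it, so there is no
// env -> converter -> env cycle for the borrow checker to reject.
struct Converter<'e, 'a> {
    env: &'e Env<'a>,
    next_register: u64, // scratch state for converting one procedure
}

impl<'e, 'a> Converter<'e, 'a> {
    fn new(env: &'e Env<'a>) -> Self {
        Converter { env, next_register: 0 }
    }

    fn fresh_register(&mut self) -> u64 {
        let r = self.next_register;
        self.next_register += 1;
        r
    }
}

fn build_proc(env: &Env<'_>) {
    // Created per procedure on the stack, dropped when the procedure is done.
    let mut converter = Converter::new(env);
    let _r0 = converter.fresh_register();
    let _name = converter.env.module_name;
}
```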
ok
Wait, so only the converter object, or also everything created by it?
I assume most things can be created in the env. The problem with converter is specifically the loop.
so just converter
And I think you will be able to remove a bunch of type variables; &'c mut self will just be &mut self.
And I think proc_queue: RefCell<Vec<'b, (symbol::Symbol, &'b ir::ProcLayout<'b>, &'b ir::Proc<'b>)>> likely should be all 'a? Cause those are all allocated in the env, so they have the same lifetime as the bump in the env.
yeah this is a little bit of a mess...
Part of learning rust in a codebase that uses arenas (thus tons of extra lifetimes to manage)
the good news is it's fixed :)
the bad news is that it was fixed by using incredibly bad practice to get around the borrow checker...
Hey, given the plans to completely rewrite the compiler, does it make sense to continue this project in Rust (as opposed to switching it to Zig)?
I'd say switch to zig!
okay :)
As a note, some of the rewrite plans may make this project obsolete. As part of the switch, we want to try using an interpreter instead of the dev backend.
We need an interpreter anyway for constant folding and some other work and we think that an interpreter should be capable of running fast enough that it makes the dev backend a lot less useful as a proposition
So we want to start by trying to only have the interpreter and skipping the dev backend.
hmmm okay, although still might be useful as a sort of in between for the interpreter and the new mono ir
Interpreter is planned to run higher up the stack (it may still want a specific IR, but haven't thought through the details yet).
Totally will be lots of things to help with and IR design work, just might not map straight to roar.
I'm really sorry about this and any time that was wasted. The big zig rewrite seems like the best time to try something like only using an interpreter. Assuming only using an interpreter doesn't work out, or if we end up wanting a dev backend again, this work would revive, but at least for now, the hope is to avoid any dev backends if possible.
Nah it’s all cool, plus I’ve been putting off learning zig so this’ll be fun
I appreciate your having an awesome attitude about it! :smiley: