Stream: compiler development

Topic: Dev Backend IR


view this post on Zulip Brendan Hansknecht (Feb 01 2024 at 22:33):

This is a pretty open thought, but I am questioning if we should change the Dev backend to have an ir that can be stored in a flat array (rather than a tree structure/not existing), that is essentially a portable assembly (one op per function in the current Assembler trait).

So it would kinda be akin to llvm ir, but just what we need to support the dev backend. It would also just be for defining within a single procedure and any jmp type instructions would just use a integer as the target that is which other instruction in the array to jump to.


The main gain of doing this is for future minor optimizations that we want to run on the ir. The big one being register allocation. With the current IR, register allocation, and lifetimes in general, are very much not a clean problem. Mono is a tree. It also is not nearly as fine grain as individual assembly instructions. So it is very hard to have any idea how many registers will be needed for a specific mono ir node.

I think that we should have a very fast pass that runs over mono and blits out a flat array of assembly like instructions. (essentially will be equivalent to the current dev backend but less complex due to having infinite registers for example).

Then we would run register allocation on the flat array (lots of good algorithms with linear time exist for this). Given the assembly ir won't perfectly match to assembly instructions, we will have a backend dependent function that will label exactly how many temporary registers an instruction will need. That is all the information we should need to deal with register allocations.

Once that works, we can consider other minor but fast optimizations:

Especially given we generate at the procedure level, it should be possible to run this part of the dev backend in parallel. Only merging the final generated assembly into an output object file.

I think this will help clean up some of the complexities of the dev backend related to where data is stored and how it is loaded. It should help separate that work cleanly from the rest of the ir selection at a minimum. It also gives us some opportunities for minor but important optimization passes over dev backend ir.


What are people's thoughts on this? I would guess @Folkert de Vries and @Richard Feldman would have the most input.

view this post on Zulip Richard Feldman (Feb 01 2024 at 23:19):

so my default thinking on anything to do with the dev backend is "fewer passes is faster, so let's minimize passes"

view this post on Zulip Richard Feldman (Feb 01 2024 at 23:20):

obviously there can possibly exist a way to take a given implementation and actually make a faster implementation which does more passes, but by default it's going to be slower

view this post on Zulip Richard Feldman (Feb 01 2024 at 23:21):

so my immediate reaction is "to what extent could we achieve some of these goals without introducing any new passes or IRs?"

view this post on Zulip Brendan Hansknecht (Feb 01 2024 at 23:27):

I think it is important to start with, we definitely want ok register allocation in the long term. It will be important to get enough performance that end users actually want to use the dev backend for reasonably sized projects. On top of that, it will be useful for making the dev backend output assembly easier to read and debug.

view this post on Zulip Brendan Hansknecht (Feb 01 2024 at 23:28):

I don't think it will be feasible to generate ok register allocation while depending only on Mono IR.

Or more accurately, we probably could find a way to force it to happen, but it would be exceptionally messy. The current super simple register and stack allocation strategy built on mono is already messy.

view this post on Zulip Folkert de Vries (Feb 01 2024 at 23:28):

I'll just say that in principle I'm on board with this idea. The current register allocation is fundamentally limited, I don't think there is a (good) way to traverse the tree structure and perform the register allocation in one pass

view this post on Zulip Brendan Hansknecht (Feb 01 2024 at 23:37):

so my default thinking on anything to do with the dev backend is "fewer passes is faster, so let's minimize passes"

My general thoughts are:

  1. The dev backend is already fast. Frontend and typechecking and such is likely to be the bottleneck by a large margin currently.
  2. The dev backend can easily be n times faster by running procedures in parallel.
  3. This change should make the dev backend simplier and easier to maintain my avoiding intermixing the handling of multiple different concerns.
  4. This is practically required to do ok register allocation
  5. All "extra" optimizations can be evaluated one at a time to see if they meet our tradeoffs.

view this post on Zulip Richard Feldman (Feb 01 2024 at 23:50):

so could it be done in 1 extra pass?

view this post on Zulip Richard Feldman (Feb 01 2024 at 23:50):

as in:

view this post on Zulip Brendan Hansknecht (Feb 01 2024 at 23:52):

Yeah. Though technically today we do one pass on mono to record lifetimes and in the new form we would do one pass on the new ir to do register allocation instead.

view this post on Zulip Richard Feldman (Feb 02 2024 at 00:04):

that sounds good to me then! :+1:

view this post on Zulip Brendan Hansknecht (Feb 02 2024 at 00:07):

One extra note, if we do any of the other extra optimizations that I mentioned, they would all run on this IR (and some may actually just be extensions to the linear register allocator pass). So definitely no other IRs.

view this post on Zulip Wizard ish (Dec 03 2024 at 22:02):

What's the status of this?

view this post on Zulip Richard Feldman (Dec 03 2024 at 22:06):

nobody is working on it right now, as far as I know

view this post on Zulip Richard Feldman (Dec 03 2024 at 22:06):

but it would be awesome if anyone wanted to get it going! :smiley:

view this post on Zulip Brendan Hansknecht (Dec 03 2024 at 22:15):

Yeah, totally unstarted

view this post on Zulip Brendan Hansknecht (Dec 03 2024 at 22:15):

Just documented

view this post on Zulip Wizard ish (Dec 03 2024 at 22:35):

also how would this pass / perserve registers when jumps happen (what's the calling convention)?

view this post on Zulip Brendan Hansknecht (Dec 03 2024 at 22:41):

Jumps or calls?

view this post on Zulip Brendan Hansknecht (Dec 03 2024 at 22:44):

Any, calls sound be c calling convention

view this post on Zulip Brendan Hansknecht (Dec 03 2024 at 22:45):

Jumps are up to us but theoretically can keep everything alive.

view this post on Zulip Brendan Hansknecht (Dec 03 2024 at 22:46):

If stashing out happen inside a loop from a join point, we would just want to hoist whatever possible before the join point so it isn't stashing on repeat in the loop

view this post on Zulip Wizard ish (Dec 03 2024 at 22:47):

calls

view this post on Zulip Brendan Hansknecht (Dec 03 2024 at 22:47):

Of course still might have stashing in a loop due to locals for the loop

view this post on Zulip Brendan Hansknecht (Dec 03 2024 at 22:47):

Yeah, calls are just c abi.

view this post on Zulip Wizard ish (Dec 03 2024 at 22:47):

c is just preserve the arguement on a stack right?

view this post on Zulip Brendan Hansknecht (Dec 03 2024 at 22:48):

Anytime one comes up, we just ban a list of registers and have to stash what's in them

view this post on Zulip Brendan Hansknecht (Dec 03 2024 at 22:48):

Yeah, we stash to the stack whenever out of register space

view this post on Zulip Wizard ish (Dec 03 2024 at 22:48):

got it

view this post on Zulip Brendan Hansknecht (Dec 03 2024 at 22:50):

In the GitHub issue, I link to some good article on linear time register allocation passes

view this post on Zulip Richard Feldman (Dec 03 2024 at 23:57):

@Wizard ish are you interested in diving into this? :smiley:

view this post on Zulip Wizard ish (Dec 04 2024 at 00:01):

@Richard Feldman Perhaps, although I'll have to freshen up on my assembly first

view this post on Zulip Richard Feldman (Dec 04 2024 at 00:06):

that's awesome! feel free to post any questions here! :grinning_face_with_smiling_eyes:

view this post on Zulip Brendan Hansknecht (Dec 04 2024 at 02:37):

For the most part you should not have to generate any assembly directly. Cause we already have the assembler trait. That said, brushing up on assembly will definitely help use the trait, map it to an ir, and debug issues.

view this post on Zulip Wizard ish (Dec 04 2024 at 02:37):

yeah and i also want to re-famaliarize myself with just the whole thing yk

view this post on Zulip Wizard ish (Dec 04 2024 at 02:38):

been like a year or two since ive done any assembly

view this post on Zulip Brendan Hansknecht (Dec 04 2024 at 02:42):

Makes sense

view this post on Zulip Wizard ish (Dec 05 2024 at 01:12):

only one thing i really disagree with, and that's that the ir should use a integer as the target for jumps, which, given that the whole point would be for optimization, seems like it would be pretty hard to deal with (i would make essientally "label" no-ops that are used when generating the actual machine code, so basically like assembly). For instance, if we have a function that is (this is a very contrived example and also i didn't test that this on any assembler (note that destination first is used)):

store LOC eax ;0
add ebx eax ;1
load eax [LOC] ;2
JumpTo: ;3
sub ebx #1 ;4
cmp ebx #5 ;5
jge JumpTo ;6

if we first transform the labels to the instructions they refrence (technically LOC is also a label but that's not important) we get:

store LOC eax ;0
add ebx eax ;1
load eax [LOC] ;2
nop ;3
sub ebx #1 ;4
cmp ebx #5 ;5
jge [3] ;6

Then when optimized, we get

add ebx eax ;0
sub ebx #1 ;1
cmp ebx #5 ;2
jge [3] ;3

Where the jge is now a loop, not the desired behavior! However, if we looked up the labels after, it is first optimized to:

add ebx eax ;0
JumpTo: ;1
sub ebx #1 ;2
cmp ebx #5 ;3
jge JumpTo ;4

Then we realize the labels:

add ebx eax ;0
nop ;1
sub ebx #1 ;2
cmp ebx #5 ;3
jge [1] ;4

We fix the problem

view this post on Zulip Brendan Hansknecht (Dec 05 2024 at 01:30):

I dont think we plan to support those kinds of optimizations

view this post on Zulip Brendan Hansknecht (Dec 05 2024 at 01:30):

Moving a nodes index is really expensive. Would requiring shifting all later nodes in the array.

view this post on Zulip Wizard ish (Dec 05 2024 at 01:31):

Not if it was a linked list

view this post on Zulip Brendan Hansknecht (Dec 05 2024 at 01:31):

That is a lot of unnecessary cost

view this post on Zulip Brendan Hansknecht (Dec 05 2024 at 01:32):

We only want something super slim

view this post on Zulip Brendan Hansknecht (Dec 05 2024 at 01:32):

This is the dev backend

view this post on Zulip Wizard ish (Dec 05 2024 at 01:32):

got it

view this post on Zulip Brendan Hansknecht (Dec 05 2024 at 01:33):

We can pattern match on final assembly generation if we want, but I don't think we want any passes that actually mutate this array in large ways (like inserting or deleting a node)

view this post on Zulip Brendan Hansknecht (Dec 05 2024 at 01:33):

Register and stack info probably will be sideband to avoid inserting a bunch of load and store instructions into the ir.

view this post on Zulip Brendan Hansknecht (Dec 05 2024 at 01:34):

At least that is the hope if it works out in practice

view this post on Zulip Brendan Hansknecht (Dec 05 2024 at 01:35):

Just one quick write to go from mono to a flat array of roughly assembly. N passes that just edit nodes inplace or add sideband info. A final pass to take all that info and actually generate the final assembly.

view this post on Zulip Brendan Hansknecht (Dec 05 2024 at 01:35):

That's roughly the theory and hope.

view this post on Zulip Brendan Hansknecht (Dec 05 2024 at 01:42):

Note: I totally could be off about a constraint here. I have not thought about this in a long time or dug through it in a ton of detail. For background context. There was debate around if we should even add another ir. What we do currently is super adhoc, but it generates assembly crazy fast. We really want to keep as much of that speed as possible, just get slightly more reasonable dev assembly.

view this post on Zulip Brendan Hansknecht (Dec 05 2024 at 01:42):

But the target is still roughly -O0 assembly

view this post on Zulip Wizard ish (Dec 05 2024 at 02:06):

oh yeah i thought it was gonna be something that would act as an IR for all the backends, so in other words, Roc -> IR , and then IR -> LLVM, IR -> WASM, and, for the dev backend, just have an evaluator to get rid of the giant overhead of compilation

view this post on Zulip Wizard ish (Dec 05 2024 at 02:07):

but that makes a lot less sense now that i think about it

view this post on Zulip Wizard ish (Dec 05 2024 at 19:45):

Also why isn’t the dev backend just an interpreter? Wouldn’t that make it much faster

view this post on Zulip Luke Boswell (Dec 05 2024 at 20:09):

Roc isn't limited to being used through the cli, you can also compile it and embed it in other things. Also this interpreter would need to be platform specific.

view this post on Zulip Wizard ish (Dec 05 2024 at 20:10):

No simply for the dev backend, as the idea for it is to be used for... devoloping. So keep LLVM, use that for release builds, and use dev for debugging.

view this post on Zulip Luke Boswell (Dec 05 2024 at 20:10):

I think @Ryan Barth had some interesting ideas in this direction. Not sure where we got to with that.

view this post on Zulip Wizard ish (Dec 05 2024 at 20:10):

ok, also, just because i feel like there is definitly something I'm doing wrong in #7249, how does one sign old commits that have already been pushed

view this post on Zulip Luke Boswell (Dec 05 2024 at 20:12):

An admin can bypass and merge, or you can interactively rebase and sign. I have a note on my laptop how to do it if you want to try that.

view this post on Zulip Wizard ish (Dec 05 2024 at 20:12):

yeah the rebasing, ive tried that a couple times but for some reason it appears to not fix the issue...

view this post on Zulip Luke Boswell (Dec 05 2024 at 20:13):

Nvm, I just realised what we were talking about. All done :grinning:

view this post on Zulip Luke Boswell (Dec 05 2024 at 20:13):

Thanks for working on that, and your patience

view this post on Zulip Wizard ish (Dec 05 2024 at 20:14):

ok ty and np (alas this is why i need to read the fine print of stuff...)

view this post on Zulip Joshua Warner (Dec 06 2024 at 06:29):

Pro tip, you can do git rebase --exec 'git commit --amend --no-edit -S' main to rebase on main and sign all your commits.

view this post on Zulip Anton (Dec 06 2024 at 11:04):

It can be a lot more complicated if you got merges in your commits as well

view this post on Zulip Anthony Bullard (Dec 06 2024 at 11:44):

Merges? I thought we lived in a civilized world?

view this post on Zulip Anton (Dec 06 2024 at 11:47):

Sorry I did not see your comment, I did a force push after my rebase :p

view this post on Zulip Anthony Bullard (Dec 06 2024 at 11:48):

Hopefully no one else is working on my feature branch :-)

view this post on Zulip Brendan Hansknecht (Dec 06 2024 at 16:22):

Anthony Bullard said:

Merges? I thought we lived in a civilized world?

I think that a rebase only world is delightful, but I have really learned that many people can't cope with rebase. To be fair, rebase gets absolutely horrid if you make millions of small commits. (Which I think is way more common in open source).

Also, gits and GitHub are only mid at best. If they were better, rebasing might be more common.

view this post on Zulip shua (Dec 07 2024 at 16:00):

Should 128-bit registers be part of this IR? as far as I understand, 128-bit values have to be moved to stack, then mov'd to 2 64-bit registers in our current design, but if we're going to have some abstract IR register map to chip register anyway, I wonder how tricky it would be to allow 1 IR register to map to one or two chip registers based on the size?

view this post on Zulip Brendan Hansknecht (Dec 07 2024 at 17:15):

I don't think so. Cause you don't necessarily have to have both parts loaded at the same time. And that can enable better register use.

view this post on Zulip Brendan Hansknecht (Dec 07 2024 at 17:15):

So better to see it as a tuple of 2 u64

view this post on Zulip shua (Dec 07 2024 at 18:17):

do you have an example in mind? I suspect it won't be useful at the Mono->IR translation level to split across two IR registers, but rather something we do when translating IR->asm In other words, it's not useful to work on one half at a time in gen_dev/generic64/mod.rs which is where I imagine translating Mono to IR happens, but rather in gen_dev/generic64/x86_64.rs where translating IR to x86 asm happens.

view this post on Zulip shua (Dec 07 2024 at 18:19):

(I'm assuming also that allocating registers happens in the IR->asm steps)

view this post on Zulip Wizard ish (Dec 07 2024 at 18:26):

I actually think that it should just be registers are unspecified in size, and however large they need to be is specified at IR->Asm according to how large it needs to be according to instruction size, with perhaps a ‘extend’ and ‘truncate’ op, as it dosent really seem to be about “how much can we optimize instructions as “how much can we optimize memory usage” but that’s just my opinion

view this post on Zulip Brendan Hansknecht (Dec 07 2024 at 18:34):

The main goal is to avoid tons of extra instructions that are just spilling and loading from stack.

view this post on Zulip Brendan Hansknecht (Dec 07 2024 at 18:34):

Not really about memory usage, more about huge time sink and perf hit

view this post on Zulip Brendan Hansknecht (Dec 07 2024 at 18:35):

Just too heavy to be reasonable in a loop

view this post on Zulip Brendan Hansknecht (Dec 07 2024 at 18:36):

ASM doesn't work on 128 but values. Only register size or less values. I think keeping that abstraction is useful to make the target as simple as possible

view this post on Zulip Brendan Hansknecht (Dec 07 2024 at 18:37):

That said, infinite sized registers are common with more complex register allocation algorithms.

view this post on Zulip Brendan Hansknecht (Dec 07 2024 at 18:37):

So maybe would be valuable, haven't thought this through fully

view this post on Zulip Brendan Hansknecht (Dec 07 2024 at 18:39):

Given that, due to following c abi, most things will be required on the stack anyway, I had leaned towards keeping it simple and storing the true value of more things on the stack (like is done today). Then only solve register allocation for live subvalues that are actually being acted on.

view this post on Zulip Brendan Hansknecht (Dec 07 2024 at 18:40):

Definitely possible to do the reverse. Just changes the infrastructure father away from the simpler and more direct design of today (everything large has space on the stack and we only ever load register sized chunks of it)

view this post on Zulip Brendan Hansknecht (Dec 07 2024 at 18:42):

I could be wrong, but I think with infinite sized registers, you likely need more passes and a more complex algorithm to complete register allocation

view this post on Zulip shua (Dec 07 2024 at 19:50):

fair, and just to be clear, I haven't thought much past my recent attempt to implement add/sub/mul x checked/overflow/wrap for u128/i128 as asm rather than bitcode call. It feels like it would make those cases simpler, but raised the question here because that's a pretty narrow view.

view this post on Zulip Wizard ish (Dec 07 2024 at 19:53):

Hmmm yeah that makes sense, although abstracting that way makes it fundamentally harder to translate it to 32 bit

view this post on Zulip Wizard ish (Dec 07 2024 at 19:54):

And also it might save a bit of memory by splitting registers (which would only be possible if we weren’t assuming each register is 64 bits)

view this post on Zulip Wizard ish (Dec 07 2024 at 19:54):

But that would undoubtedly make it much more complex

view this post on Zulip Wizard ish (Dec 07 2024 at 19:55):

One thing that would be hard to implement w/o memory apart from 64 bit registers are structs

view this post on Zulip Wizard ish (Dec 07 2024 at 19:55):

As you’d need to use mutiple registers

view this post on Zulip Wizard ish (Dec 07 2024 at 19:56):

Although you could just have “large registers” that will probably just become memory locations

view this post on Zulip shua (Dec 07 2024 at 19:59):

re "following c abi, most things will be required on the stack", for the very specific case of those arithmetic functions that only take two u128/i128 arguments, they fit entirely in registers; and the return fits in rax+rdx as well. I haven't found a way to work with u128/i128 args that doesn't first push them to stack, so that's a case where not having 128bit registers (or a way to say "give me the two 64bit registers that this 128bit argument has been already loaded into") ends up with more mov's between stack/register.

view this post on Zulip shua (Dec 07 2024 at 20:00):

This could also be over-optimizing an edge case. I don't know how much roc code will be using Num.mul on u128/i128, but maybe Dec?

view this post on Zulip Wizard ish (Dec 07 2024 at 20:05):

Yeah I do think that having registers of diffrent sizes would be useful, even if it was a little more complex, either with or without splitting them, and yeah ‘Dec’ is a case, and a very large case, of where a 128 bit registers would be useful (though frankly I think if performance is a concern f64 or perhaps Frac should be used instead)

view this post on Zulip Brendan Hansknecht (Dec 07 2024 at 20:06):

Wizard ish said:

Although you could just have “large registers” that will probably just become memory locations

Yeah, I guess this is essentially what we have now

view this post on Zulip Wizard ish (Dec 07 2024 at 20:07):

Perhaps a way to “unsplit” registers, ie, if one did ‘rax’ and ‘rbx’ become a 128bit ‘rabx’

view this post on Zulip shua (Dec 07 2024 at 20:08):

if an example is any help, here's the change to impl add u128/i128 as asm.

view this post on Zulip Brendan Hansknecht (Dec 07 2024 at 20:11):

I don't quite follow that change, why is the argument pushed onto the stack? If it is already in registers, couldn't it be just used directly from those registers?

view this post on Zulip Wizard ish (Dec 07 2024 at 20:11):

We could have it so the virtual registers can have “hints” for the max size so if they’re moved into memory they can be smaller then the regular guarantee (ie if a register is used in a add 32 and mul 16 instructions and it has to at that point be moved into memory, instead of giving it a full 64 bits, it could just be given a 32 bit slot of memory (this would happen is IR->Backend

view this post on Zulip shua (Dec 07 2024 at 20:12):

Brendan Hansknecht said:

I don't quite follow that change, why is the argument pushed onto the stack? If it is already in registers, couldn't it be just used directly from those registers?

that's just my not understanding how is the best way to get u128 args. I couldn't find any examples using registers, only other things that pushed to stack, then read what was pushed.

view this post on Zulip Brendan Hansknecht (Dec 07 2024 at 20:13):

ok

view this post on Zulip Brendan Hansknecht (Dec 07 2024 at 20:15):

Also, I had assumed this ir would be just for generic64, I guess if it is for all of gen dev and a future theoretical generic32, it would need a more featured solution

view this post on Zulip Brendan Hansknecht (Dec 07 2024 at 20:16):

That said, if it is too generic, it can't do register allocation

view this post on Zulip Brendan Hansknecht (Dec 07 2024 at 20:16):

For example, you can't register allocate until you know if an i64 fits in a register or belongs on the stack

view this post on Zulip Brendan Hansknecht (Dec 07 2024 at 20:17):

If it belongs on the stack you have to generate multiple instructions to handle the addition

view this post on Zulip Brendan Hansknecht (Dec 07 2024 at 20:17):

And that set of instructions needs register alloctions

view this post on Zulip Brendan Hansknecht (Dec 07 2024 at 20:19):

Given we don't want too much complexity, it is nice to have the abstraction that everything larger than a register is always on the stack. An optimization would be to allow things to be partially on the stack and partially loaded. A different optimization would be to use 2 registers as the cut off cause that matches c abi (this is the issue referenced with adding u128 currently and requiring dumping to stack). I'm not sure any of those optimizations need to be in the dev backend

view this post on Zulip Brendan Hansknecht (Dec 07 2024 at 20:20):

My biggest concern is with being too generic is that it will lead to more required transformation and a significant cost in time to generate a binary with the dev backend.

view this post on Zulip Wizard ish (Dec 07 2024 at 20:20):

Yeah that’s the thing ain’t it, can’t be to complex

view this post on Zulip Brendan Hansknecht (Dec 07 2024 at 20:20):

That is why I would learn towards just solving a bespoke solution for generic64 and not worring about 32 bit at all

view this post on Zulip Brendan Hansknecht (Dec 07 2024 at 20:21):

32bit really needs a different asm trait. Which means it needs an entirely different ir.

view this post on Zulip shua (Dec 07 2024 at 20:21):

yeah, I don't know what the right answer is, but "complex structs on the stack, primitives in registers" sounds correct. The question is what to do with u128/i128 and Dec. Are these primitives or complex?

view this post on Zulip Wizard ish (Dec 07 2024 at 20:22):

Although I feel that the stack should be abstracted as “struct registers” just to be the abstraction, although that might cause a performance problem

view this post on Zulip Brendan Hansknecht (Dec 07 2024 at 20:22):

I think you are probably right

view this post on Zulip Brendan Hansknecht (Dec 07 2024 at 20:23):

About the abstraction

view this post on Zulip Brendan Hansknecht (Dec 07 2024 at 20:24):

The assembly would have functions to load from and store to "struct registers". Then it would have tons more functions to operate on normal registers. It would later calculate stack offsets for all struct registers (maybe even using lifetime analysis to reuse stack space)

view this post on Zulip Wizard ish (Dec 07 2024 at 20:25):

Wait so I was right about it being a good idea or a bad idea?

view this post on Zulip Wizard ish (Dec 07 2024 at 20:25):

Also it could just have functions to go from struct to normal

view this post on Zulip Brendan Hansknecht (Dec 07 2024 at 20:25):

That having struct registers as an abstraction for the stack is a good idea

view this post on Zulip Brendan Hansknecht (Dec 07 2024 at 20:34):

shua said:

yeah, I don't know what the right answer is, but "complex structs on the stack, primitives in registers" sounds correct. The question is what to do with u128/i128 and Dec. Are these primitives or complex?

I think they have to be seen as complex due to no real assembly operations being available for them. If we want to do a minor optimization, we could make a special optimization to allow storing them in 2 registers.

view this post on Zulip shua (Dec 07 2024 at 20:53):

I think you're right. I only have one counterpoint which is in x86_64, mul rcx will have as dest, rax and rdx which is a 128bit value split over two 64bit registers.

view this post on Zulip Brendan Hansknecht (Dec 07 2024 at 22:16):

Yeah, it is definitely imperfect. We need a smart way for assembly operations to claim registers temporarily.

view this post on Zulip Wizard ish (Dec 08 2024 at 00:14):

Hmmm well for the meantime tomorrow ill write up something to describe it just so we’re all on the same page

view this post on Zulip Brendan Hansknecht (Dec 08 2024 at 00:35):

:folded_hands:

view this post on Zulip Wizard ish (Dec 08 2024 at 18:09):

Oh, and one final thing, how should results be given, should they (all the following all accomplish

C=A+BC=A+B

  1. Be stored in an Accumulator (ACC)
add A B
mov C ACC
  1. Be stored in a operand (to the function itself)
mov C A
add C B
  1. Or stored in a operand given seperatly
add C A B

view this post on Zulip Wizard ish (Dec 08 2024 at 18:19):

also IMO it shoud be called ROAR :)
ROAR = Roc Optimizable Abstract Representation

view this post on Zulip Ayaz Hafiz (Dec 08 2024 at 18:51):

I would suggest (2)

view this post on Zulip Wizard ish (Dec 08 2024 at 18:53):

yeah that is the most common, but i also like option 1, as it allows for calls to functions and their returns to be modeled quite simply, although of course that increases complexity

view this post on Zulip Brendan Hansknecht (Dec 08 2024 at 19:31):

Current assembly is all 3 arg like arm

view this post on Zulip Brendan Hansknecht (Dec 08 2024 at 19:31):

I think that is nicest

view this post on Zulip Brendan Hansknecht (Dec 08 2024 at 19:31):

Encode result register in the ASM directly

view this post on Zulip Wizard ish (Dec 08 2024 at 19:31):

i think all have their merits personally

view this post on Zulip Brendan Hansknecht (Dec 08 2024 at 19:31):

So it all matches 3 today

view this post on Zulip Wizard ish (Dec 08 2024 at 19:32):

although given it's purpose 3 is probably the best (it makes it very apparent what registers are being modified)

view this post on Zulip Brendan Hansknecht (Dec 08 2024 at 19:32):

Given risc v and arm are both 3 and x86 is the only special on (and it is trivial to decompose to x86), I would push for 3.

view this post on Zulip Wizard ish (Dec 08 2024 at 19:33):

ok then itll be 3 :)

view this post on Zulip Wizard ish (Dec 08 2024 at 19:33):

so i think thats it, which is to say in about fifteen minutes ill probably have another question :)

view this post on Zulip Brendan Hansknecht (Dec 08 2024 at 19:33):

Also, you can always do (add a a b for I place if register allocation realizes it is ok)

view this post on Zulip Wizard ish (Dec 08 2024 at 21:45):

Wizard ish said:

also IMO it shoud be called ROAR :)
ROAR = Roc Optimizable Abstract Representation

Also what yall think, yae or nae for ROAR

view this post on Zulip Sam Mohr (Dec 08 2024 at 21:46):

I'm here for the name. It's fun to pick something like that and have it persist for a decade or so and have people take it seriously

view this post on Zulip Wizard ish (Dec 09 2024 at 00:26):

Made a first draft of the proposal

view this post on Zulip Luke Boswell (Dec 09 2024 at 01:34):

Looking great, thanks for taking the time to do this design work. I think it will really help to clarify things.

I've copied some random notes from skimming back through the thread.

I think it would be helpful to provide a little more context/background or a summary to motivate this change. We can talk about how the dev backend is currently very fast but maybe fragile or overly complicated... (my words, no idea if this is accurate).

we definitely want ok register allocation in the long term. It will be important to get enough performance that end users actually want to use the dev backend for reasonably sized projects.

it will be useful for making the dev backend output assembly easier to read and debug.

Does this improve our ability to test things? should we mention that?

change the Dev backend to have an ir that can be stored in a flat array (rather than a tree structure/not existing), that is essentially a portable assembly (one op per function in the current Assembler trait). Main gain of doing this is for future minor optimizations that we want to run on the ir. The big one being register allocation. With the current IR, register allocation, and lifetimes in general, are very much not a clean problem. Mono is a tree. It also is not nearly as fine grain as individual assembly instructions. So it is very hard to have any idea how many registers will be needed for a specific mono ir node.

Should we include this motivation in the ROAR proposal?

I may have missed it, but is this mentioned in the ROAR proposal? 32-bit architectures are out of scope.

"It would also just be for defining within a single procedure and any jmp type instructions would just use a integer as the target that is which other instruction in the array to jump to."

Is this covered in the proposal?

Possible future optimisations
- grouping instructions for final output
- intentionally reusing source and dest reg to avoid extra data movement instructions
- freeing up the base pointer by calculating all offsets with the stack pointer
- storing some aggregate values in registers if available instead of on the stack
- maybe some minor and quick rewrites.

Would be good to include future work in the context of ROAR

All "extra" optimizations can be evaluated one at a time to see if they meet our tradeoffs.

Future optimizations can be evaluated individually. how would we do this? is this something we can benchmark easily?

it should be possible to run this part of the dev backend in parallel. Only merging the final generated assembly into an output object file.

Is this possible or a goal? would be good to mention it.

view this post on Zulip Wizard ish (Dec 09 2024 at 01:52):

Ok, yeah Ill get to adding those (thanks for taking time to review it)

view this post on Zulip Brendan Hansknecht (Dec 09 2024 at 19:14):

Sorry for being slow about commenting on this (energy for roc has been low lately). Will try to write up more comprehensive comments hopefully this afternoon PST.

view this post on Zulip Wizard ish (Dec 09 2024 at 21:41):

I have two ideas, both of which involve function calls, that I'm interested to hear your opionions on:

  1. While local jumps have been covered as being absolute addresses, I think that there should be a more adressing modes. As mentioned before, ROAR will diffenitly be split up into functions, with each one being a discrete chunk of code. However, I think we should also have larger segments, with those being groups of function, each one reperesenting a single program unit (a library or file, perhaps)
  2. Relatedly, ROAR does not have a way to deal with IO and other specific needs. As the goal of ROAR is to be simple, I would propose a simple "black box" instruction, int N (derived from the instruction in x86 for interupts) which would be do a special action N, which can't be modeled in ROAR. IMO this would go right along with Roc's notion of "platforms", and they could be correlated (although this isn't substantiated by any logical reason, just a general sort of feeling)
    So yeah what do yall think about these?

view this post on Zulip Wizard ish (Dec 09 2024 at 23:38):

also just making sure about this (because its a pretty big thing) the goal would be to go from Roc->ROAR->Bin as opposed to the current Roc->Mono->Bin?

view this post on Zulip Sam Mohr (Dec 09 2024 at 23:40):

Not an expert, but presumably it'd be Roc->Mono->ROAR->Bin

view this post on Zulip Sam Mohr (Dec 09 2024 at 23:40):

Monomorphization is the process that takes generic functions and spits out statically typed functions

view this post on Zulip Sam Mohr (Dec 09 2024 at 23:41):

It's probably not worth having to duplicate that effort in ROAR

view this post on Zulip Sam Mohr (Dec 09 2024 at 23:42):

Let me read the proposal first

view this post on Zulip Wizard ish (Dec 09 2024 at 23:42):

wait a moment, is Roc Mono diffrent from .Net Mono?

view this post on Zulip Wizard ish (Dec 09 2024 at 23:42):

it's not like an extension?

view this post on Zulip Sam Mohr (Dec 09 2024 at 23:43):

I don't know dot net's Mono

view this post on Zulip Sam Mohr (Dec 09 2024 at 23:43):

But Roc mono is a lot like Rust's mono

view this post on Zulip Wizard ish (Dec 09 2024 at 23:43):

oh that's why a lot of stuff about it didn't make much sense

view this post on Zulip Sam Mohr (Dec 09 2024 at 23:44):

It's a phase of the compiler after type checking where we take type checked code and spit out IR that is simple yet concrete

view this post on Zulip Sam Mohr (Dec 09 2024 at 23:44):

So if I have a function that typechecked to numToStr : Num * -> Str

view this post on Zulip Sam Mohr (Dec 09 2024 at 23:44):

And I use it with the inputs U64 and I8

view this post on Zulip Sam Mohr (Dec 09 2024 at 23:45):

I'll get two functions out of mono, numToStr : U64 -> Str and numToStr : I8 -> Str

view this post on Zulip Wizard ish (Dec 09 2024 at 23:45):

Yeah that makes more sense...
I was wondering why Roc used the Microsoft IR, but this makes way more sense

view this post on Zulip Sam Mohr (Dec 09 2024 at 23:45):

Now, that's based on my incomplete understanding of what:

view this post on Zulip Sam Mohr (Dec 09 2024 at 23:46):

https://en.wikipedia.org/wiki/Monomorphization

view this post on Zulip Sam Mohr (Dec 09 2024 at 23:46):

Monomorphization is good because it trades a larger binary size for better performance

view this post on Zulip Sam Mohr (Dec 09 2024 at 23:47):

If you don't do monomorphization, you end up with a single function that needs to check the input's type at runtime

view this post on Zulip Sam Mohr (Dec 09 2024 at 23:47):

More or less

view this post on Zulip Wizard ish (Dec 09 2024 at 23:50):

I thought Mono Ir refered to this :speechless:

view this post on Zulip Luke Boswell (Dec 09 2024 at 23:51):

https://github.com/roc-lang/roc/tree/main/crates#roc-compiler-stages

This might be helpful

view this post on Zulip Sam Mohr (Dec 09 2024 at 23:52):

Specialize is the term in that diagram

view this post on Zulip Wizard ish (Dec 09 2024 at 23:52):

yeah that was my bad for really reading things too quickly

view this post on Zulip Sam Mohr (Dec 09 2024 at 23:52):

As in "take each specific use of generic entities and make a specialized version"

view this post on Zulip Sam Mohr (Dec 09 2024 at 23:52):

Welcome to the club

view this post on Zulip Sam Mohr (Dec 09 2024 at 23:53):

Don't worry about it too much. I think the more important part is that you're helping us make a big improvement part to a tricky part of the compiler. That's awesome!

view this post on Zulip Wizard ish (Dec 09 2024 at 23:54):

yeah, well fourtanetly it dosen't actually affect my understanding of it that much, as the idea is basically the same, but thank you for the patience :)

view this post on Zulip Sam Mohr (Dec 09 2024 at 23:55):

Of course

view this post on Zulip Brendan Hansknecht (Dec 10 2024 at 01:19):

Wizard ish said:

I have two ideas, both of which involve function calls, that I'm interested to hear your opionions on:

  1. While local jumps have been covered as being absolute addresses, I think that there should be a more adressing modes. As mentioned before, ROAR will diffenitly be split up into functions, with each one being a discrete chunk of code. However, I think we should also have larger segments, with those being groups of function, each one reperesenting a single program unit (a library or file, perhaps)
  2. Relatedly, ROAR does not have a way to deal with IO and other specific needs. As the goal of ROAR is to be simple, I would propose a simple "black box" instruction, int N (derived from the instruction in x86 for interupts) which would be do a special action N, which can't be modeled in ROAR. IMO this would go right along with Roc's notion of "platforms", and they could be correlated (although this isn't substantiated by any logical reason, just a general sort of feeling)
    So yeah what do yall think about these?

I don't think we'll have a need for either. The dev backend already has solutions for both.

The dev backend only ever worries about generating a single procedure at a time. As such, each procedure can be 100% self contained without any worry of what other procedures are doing. So we never need long jumps or other special things like segments. I think it would be 100% fine for ROAR to slot in such that it is only used during single function compilation. I think that it will actually fit best there.

This is the same for io. There is no need to worry about it. We never do it directly in roc. All of this will always go through a function call. So the call instruction is all you will need.

Using relocations + call instructions solve all of the above nicely.

view this post on Zulip Brendan Hansknecht (Dec 10 2024 at 01:21):

So I would imagine the compilation process as:

  1. roc source
  2. roc compilation to mono
  3. For each function in mono, run the dev backend. This consists of:
    1. convert to ROAR
    2. run register allocation (plus any other future passes)
    3. output assembly + relocations

view this post on Zulip Sam Mohr (Dec 10 2024 at 01:27):

You say each "function in mono", which is currently all top-level values, since we convert (at least most) top-level constants to thunks with force_thunk AFAICT. I presume this strategy would work if/when we decided to move towards non-thunked top level constants, but just wanna call that out in case there would be an issue

view this post on Zulip Brendan Hansknecht (Dec 10 2024 at 01:27):

Sure, we should either force thunks for the dev backend or we would also have to do constant generation, but for constants, we wouldn't need ROAR at all.

view this post on Zulip Sam Mohr (Dec 10 2024 at 01:32):

Maybe I missed something in the discussion, but it would still need to translate complex constants, right? We said above that even I128 would get multiple "registers", so a struct/discriminated union would need to be loaded like any other variable that was stored in a prior register. This should be very minor compared to the main scope of ROAR, though

view this post on Zulip Brendan Hansknecht (Dec 10 2024 at 01:37):

no, they will all just be equivalent to the constant stored on the stack. So they are just a chunk of bytes that we get the address of and load data from.

view this post on Zulip Brendan Hansknecht (Dec 10 2024 at 01:38):

So they would in the roar proposal just be represented as a structured register (which is really just a pointer to memory)

view this post on Zulip Wizard ish (Dec 10 2024 at 01:39):

Oh and relating to your comment on the PR, I've moved the proposal to it's own repo here

view this post on Zulip Brendan Hansknecht (Dec 10 2024 at 01:42):

comments on the proposal:

  1. main idea around instructions and operands sounds great.
  2. abstract registers and the lack of types sounds perfect
  3. I think we should iterate on the abstract stack. I think that by default that ROAR should not care about saving registers for call instructions. I think it is safe to just use the callee saved registers first followed by the caller saved registers. When generating a call assembly, the backend assembly can add in all of the pushes and pops. If we later find this to be a bottleneck, we can add caller and callee saved register info into the register allocation pass to make it a bit smarter. Either way, this info would not be need in the IR and would likely just be part of an optimization pass. I need to think more about jump instructions, but I think due to how roc organizes ir, something similar should hold true. I belive that roc explicitly labels all data that needs to be passed to a jump instruction (such that it looks like a call instruction as well). Need to double check on that, but I think we can get away with something simple here.
  4. For structured registers, I think that we should use an explicit load and store instruction instead of overloading the move instruction. I think it will help keep things clean and clear. We also will need some way to represent data that starts on the stack due to the c calling convention. So I think we likely need to actually store these outside of the core ir. I would propose that we have a flat dictionary of structured registers/stack values. It would be initialized with the arguments to the function that are on the stack. Instead of having a create instruction, it would be a function that adds a new index to this map. Each value in the map will hold where the data is stored and how large it is. For starters, most data will not have a real location, at some point, we will have to map all of that data to concrete stack locations (likely with a pass very similar to register allocation). Hopefully this rough idea makes sense. It essentially allows the almost everything to think in registers, but when it doesn't think in registers, it thinks in abstract memory that it can just call load and store instructions to. Later the abstract memory is mapped to real memory, but no need to change the ir, just the list of memory locations. Also, just realized, doesn't need to be a map, would just be a list where the index is the structured register number.

view this post on Zulip Wizard ish (Dec 10 2024 at 02:14):

Just one thing, if registers aren’t being explicitly saved, how will the program differentiate registers that are set to for the purpose of returning values and those that simply hold intermediate values? Would the plan be for the return and call op take an argument that is what register should be returned / saved to?

view this post on Zulip Brendan Hansknecht (Dec 10 2024 at 02:54):

Oh, this is actually a really good point I hadn't full though through. In the current design, due to having access to mono, a call or return instruction is given a list of symbols. Those symbols are the loaded and dealt with according to the c abi for that system. If we are abstracting into ROAR, we have to decide how to deal with that mapping and call convention. Hmm.

So that gives three main options I can think of:

  1. Define a restricted machine agnostic call convention such that roar deals with the call conv including stack vs register.
  2. Have roar be partially machine specific. It would query the backend to learn about it's call conv and then inserts structure register stores and data swaps to prep everything for the call.
  3. For call and return instructions, keep a mapping to the original mono nodes in roar. When generating assembly for a call instruction we pass in the layout info from mono along with a list of abstract and structural registers for the arguments. The backend then deals with loading those to match the calling convention.

I lean 3, but am open to other opinions and ideas for sure.

view this post on Zulip Wizard ish (Dec 10 2024 at 12:47):

Personally I think 1 is better, as it keeps the process pure (ie, machine agnostic and seperate from mono), which would probably help with maintainability

view this post on Zulip Wizard ish (Dec 10 2024 at 16:00):

Also if we restricted ROAR functions to returning one argument, we could just have ‘call’ instructions take an output register that would the be set to a corresponding register returned with ‘ret’
As for passing arguments, this could be even more simple, just pass them in registers, and only differentiate it from the original once it’s been set

view this post on Zulip Brendan Hansknecht (Dec 10 2024 at 16:09):

It definitely has to learn to speak c abi no matter what, sadly. It has to be callable from the platform and be able to call the zig builtins and platform effects.

view this post on Zulip Brendan Hansknecht (Dec 10 2024 at 16:22):

This is why the dev backend currently speaks c abi. I found it easier to be consistent than makes exceptions for the edges.

view this post on Zulip Wizard ish (Dec 10 2024 at 20:11):

but shouldn't this be abstracted as part of the ROAR->Binary interface, given that one of the primary goals is to be platform agnostic? (which ccould be solved by somethig like this?)

view this post on Zulip Brendan Hansknecht (Dec 10 2024 at 21:11):

I don't think the goal of roar is really to be platform agnostic. It is trying to abstract away a level of assembly details, but register allocation (which will run on roar) is platform specific. On top of that specific handling of the stack, that will come out of converting from structured registers to stack use is also platform specific.

So roar should be expected to pull in platform details. That said, I think that 3 is platform agnostic. Roar will simply record the information it needs to pass to the final assembly generation for it to deal with generating the correct call conv.

view this post on Zulip Brendan Hansknecht (Dec 10 2024 at 21:14):

The reason I don't think we need something like black box. is that nothing is black box. They are just function calls. The same as any other function call.

view this post on Zulip Wizard ish (Dec 10 2024 at 21:17):

ok got it

view this post on Zulip Brendan Hansknecht (Dec 10 2024 at 21:21):

I'm not sure how much you have look at the current traits, but we actually have one for assembly and a second for callconv. If you look at a callconv trait impl this is what making a function call looks like today: https://github.com/roc-lang/roc/blob/f8b0be725c1677dd1c39cc68723fe4ad5ce09ade/crates/compiler/gen_dev/src/generic64/x86_64.rs#L292-L337

ROAR could call essentially the same primitive.

view this post on Zulip Brendan Hansknecht (Dec 10 2024 at 21:23):

Just instead of using Symbol from mono, it would use abstract and struct registers.

view this post on Zulip Wizard ish (Dec 10 2024 at 21:37):

so if i'm understanding correctly you would just have the call function accept any number of args, including one for where to place the return value?

view this post on Zulip Brendan Hansknecht (Dec 10 2024 at 21:41):

yep

view this post on Zulip Brendan Hansknecht (Dec 10 2024 at 21:41):

With the minor caveat that we may want to handle storage in a special way to avoid bloating the roar enum in rust. Don't want to take up too many bytes per assembly instruction

view this post on Zulip Wizard ish (Dec 12 2024 at 02:22):

I've been making a little progress with the proposal, just wanted to make sure I'm putting down correctly what yall mean, in particular about the function calls https://github.com/wizard7377/ROAR

view this post on Zulip Brendan Hansknecht (Dec 12 2024 at 04:57):

I'm really liking how this is looking!

view this post on Zulip Brendan Hansknecht (Dec 12 2024 at 04:58):

I'm not sure the value of making cmp, jmp, ret, etc functions return a null value, but if in practice it helps build out a consistent interface, that sounds fine. I assume the assembly will map to rust enum. The powerful pattern matching in rust should make it ok for some ops to differ.

view this post on Zulip Brendan Hansknecht (Dec 12 2024 at 05:00):

ROAR is that every function is pure (with respect to registers)

I'm not 100% sure the caveats of the meanings here, but it is important to note that some values passed into function will get mutated. That said, it will never be a register value. It will always be a pointer to data on the heap that gets mutated (like a list with a refcount of one).

view this post on Zulip Brendan Hansknecht (Dec 12 2024 at 05:00):

I think this likely falls into what you are saying, but wanted to clarify

view this post on Zulip Brendan Hansknecht (Dec 12 2024 at 05:02):

Also, for structured registers (at least as roc exists today), all loads and stores should be to a compile time known offset. Also, we will need the ability to load at different widths, but that can just be a load8 vs load32 vs load64.

view this post on Zulip Brendan Hansknecht (Dec 12 2024 at 05:02):

And roc will make sure that all loads are properly aligned based on the layout of the data. So we should have no concerns there.

view this post on Zulip Brendan Hansknecht (Dec 12 2024 at 05:03):

Also, for jumps, I know that roc only has structured loops, so we may be able to take advantage of something there. But I'm not sure what exactly. Just noting.

view this post on Zulip Brendan Hansknecht (Dec 12 2024 at 05:05):

Oh, one other note for jmps. Roc's jumps technically have arguments, but we should be able to map that into editing the same roar registers over and over again.

view this post on Zulip Wizard ish (Dec 12 2024 at 14:53):

Yeah, I might’ve forgotten about the references not being mutated dosent correlate to the referents being mutated

view this post on Zulip Wizard ish (Dec 12 2024 at 14:56):

Also can you explain a little bit more about what you mean about them taking arguments, do you mean they should be similar to RISC branch instructions (ie, the instruction contains the condition)

view this post on Zulip Brendan Hansknecht (Dec 12 2024 at 16:32):

I just mean that in mono ir, jumps look a lot like function calls.

view this post on Zulip Brendan Hansknecht (Dec 12 2024 at 16:34):

There is a block declared with a joinpout. The join point has a list of args. Within the block you can access any constants from before the jointpoint and the args as well. At some point in the main code flow, it calls a jump instruction to a join point with a list of args.

view this post on Zulip Brendan Hansknecht (Dec 12 2024 at 16:37):

In roar, this could be modeled as raw assembly style jumps with register updates before the jump to set the args. Or it could be modeled closer to a function call. Not sure either is better, just worth noting and thinking about.

view this post on Zulip Wizard ish (Dec 12 2024 at 20:15):

yeah personally im much more familiar with x86 and 6502 assembly, and in those, jmps basically only change the instruction pointer, so that just seems more natrual to me, but only because that's the only thing ive ever encountered

view this post on Zulip Brendan Hansknecht (Dec 12 2024 at 20:41):

Yeah, rocs mono isn't trying to be assembly. So makes sense it has a high level concept

view this post on Zulip Wizard ish (Dec 13 2024 at 21:27):

should ROAR have seperate floating point registers?

view this post on Zulip Brendan Hansknecht (Dec 13 2024 at 21:32):

Would be nice for a tiny bit of extra safety, but would be fine to start with unified registers and then only convert to float vs general registers during allocation.

So I think either is fine before register allocation. I lean slightly towards just having them

view this post on Zulip Wizard ish (Dec 13 2024 at 21:44):

yeah i also lean on having them, theyre isnt really a reason not to and basically all systems do so it probably will be more efficient

view this post on Zulip Wizard ish (Dec 16 2024 at 01:45):

Did some work on the proposal (still not done)... and i may have just realized gh md has a diagramming feature... https://github.com/wizard7377/ROAR#optimizations

view this post on Zulip Wizard ish (Dec 17 2024 at 02:44):

Ok I think this is the first draft, is there anything any of you think needs to be added to this (obviously needs to be cleaned up a lot)

view this post on Zulip Brendan Hansknecht (Dec 17 2024 at 02:47):

I'll read through it when I get the chance, but packing tonight and traveling tomorrow.

view this post on Zulip Wizard ish (Dec 17 2024 at 02:48):

nice safe travels :)

view this post on Zulip Brendan Hansknecht (Dec 17 2024 at 02:49):

Also, on the llvm side, I think I am going to greatly simplify abi (including the c abi we generate). Dev backend may want to do the same (might not really affect the proposal). The core change is that all aggregate types will be passed by const ref. Then only enums and numeric types are passed in registers.

view this post on Zulip Wizard ish (Dec 17 2024 at 02:50):

if anything that actually works with the proposal quite nicely

view this post on Zulip Anthony Bullard (Dec 17 2024 at 03:18):

I don't know very much about c abi, but I can say that this was a well-written doc that even a mid-wit (as regards assembly) like myself could understand

view this post on Zulip Wizard ish (Dec 17 2024 at 03:23):

Thank you :)

view this post on Zulip Anthony Bullard (Dec 17 2024 at 03:24):

So could I say (as a midwit as noted above) that this is a SSA representation that's meant to be a level higher than say LLVMIR because it can assume a lot of Roc's language semantics?

view this post on Zulip Anthony Bullard (Dec 17 2024 at 03:25):

But still low-level enough that we can use those semantics to drive efficient optimization passes before codegen?

view this post on Zulip Wizard ish (Dec 17 2024 at 03:26):

Yes, that is the primary purpose

view this post on Zulip Anthony Bullard (Dec 17 2024 at 03:26):

Ok, I passed my comprehension test. I'm going to bed now :-)

view this post on Zulip Anthony Bullard (Dec 17 2024 at 03:27):

I'll be following the (assumed) implementation of this, as I am very interested in learning a lot in this area

view this post on Zulip Ayaz Hafiz (Dec 18 2024 at 05:50):

A few thoughts on the ROAR document:

view this post on Zulip Wizard ish (Dec 18 2024 at 12:37):

I personally think the null register is quite important. It allows statements to be consistent internally as one structure, rather than having expressions which can be statements, which would significantly complicate the process of obtaining modified registers. In addition, sometimes one wants to have expressions which don’t write to anything, for example on x64 ‘cmp’ vs ‘sub’, where the only difference is whether they write to something or just set flags. In addition, the null register wouldn’t be that hard to implement, and in case this wasent clear the null register can’t be read, it just serves as a “trash” register

view this post on Zulip Wizard ish (Dec 18 2024 at 12:49):

Also the purpose of this is to be stored in a flat array, which wouldn’t allow for nested expressions

view this post on Zulip Wizard ish (Dec 18 2024 at 12:51):

(Note, refer back to the first message in this channel)

view this post on Zulip Wizard ish (Dec 18 2024 at 12:52):

Brendan Hansknecht said:

This is a pretty open thought, but I am questioning if we should change the Dev backend to have an ir that can be stored in a flat array (rather than a tree structure/not existing), that is essentially a portable assembly (one op per function in the current Assembler trait).

So it would kinda be akin to llvm ir, but just what we need to support the dev backend. It would also just be for defining within a single procedure and any jmp type instructions would just use a integer as the target that is which other instruction in the array to jump to.


The main gain of doing this is for future minor optimizations that we want to run on the ir. The big one being register allocation. With the current IR, register allocation, and lifetimes in general, are very much not a clean problem. Mono is a tree. It also is not nearly as fine grain as individual assembly instructions. So it is very hard to have any idea how many registers will be needed for a specific mono ir node.

I think that we should have a very fast pass that runs over mono and blits out a flat array of assembly like instructions. (essentially will be equivalent to the current dev backend but less complex due to having infinite registers for example).

Then we would run register allocation on the flat array (lots of good algorithms with linear time exist for this). Given the assembly ir won't perfectly match to assembly instructions, we will have a backend dependent function that will label exactly how many temporary registers an instruction will need. That is all the information we should need to deal with register allocations.

Once that works, we can consider other minor but fast optimizations:

Especially given we generate at the procedure level, it should be possible to run this part of the dev backend in parallel. Only merging the final generated assembly into an output object file.

I think this will help clean up some of the complexities of the dev backend related to where data is stored and how it is loaded. It should help separate that work cleanly from the rest of the ir selection at a minimum. It also gives us some opportunities for minor but important optimization passes over dev backend ir.


What are people's thoughts on this? I would guess Folkert de Vries and Richard Feldman would have the most input.

view this post on Zulip Wizard ish (Dec 20 2024 at 19:43):

Also, just making sure of something: ROAR would require reimplementating the Backend trait, but not the Assembler and CallConv trait (as the backend trait takes in mono as input)

view this post on Zulip Brendan Hansknecht (Dec 20 2024 at 20:15):

Yeah, might even end up replacing the backend trait with just a raw method to go from mono to roar and a separate method that directly calls out to the assembler and call conv traits

view this post on Zulip Wizard ish (Dec 20 2024 at 20:25):

also was just thinking, it might be useful to have a concept of "true global variables" in ROAR. While struct register persist, they require refrences being passed around, and what I'm thinking of would just be accessed with it's symbol. This wouldn't be prefered (ie, local vars would still be in abstract registers) but i think might make it a lot more clean

view this post on Zulip Wizard ish (Dec 20 2024 at 21:25):

oh also if no one has anymore comments on this, ill probably start working on it tommorow

view this post on Zulip Brendan Hansknecht (Dec 20 2024 at 21:36):

While struct register persist, they require refrences being passed around, and what I'm thinking of would just be accessed with it's symbol.

I'm not sure I fully follow what you are suggesting here

view this post on Zulip Wizard ish (Dec 20 2024 at 21:38):

basically having things like struct registers, but instead of symbols being dynamically generated their symbols persist from mono

view this post on Zulip Brendan Hansknecht (Dec 20 2024 at 21:42):

I think most of the symbols in mono are locals. There are true globals. For those, we will need some solution. The final output is a chunk of data and a relocation to load that data locally in a function.

view this post on Zulip Brendan Hansknecht (Dec 20 2024 at 21:47):

Also, my only main comment on the proposal is that I would try to stay very close to the assembler trait to start with (in terms of which operations to support). For example, it looks like we currently only need a jmp and jne instruction. Clearly more optimizations are possible and in some cases other jump instructions might map better, but I think it will make the transition smoothest if we start as close as possible. Then incrementally make things more powerful.

view this post on Zulip Brendan Hansknecht (Dec 20 2024 at 22:04):

Oh and instead of using things like u32 directly, wrap them in nominal types to avoid using an index as a register or a float register as an int register etc

view this post on Zulip Brendan Hansknecht (Dec 20 2024 at 22:05):

I think that was already the plan, just explicitly clarifying

view this post on Zulip Wizard ish (Dec 21 2024 at 14:20):

also just so yall can comment if you like, i made a draft pr at #7397

view this post on Zulip Wizard ish (Dec 21 2024 at 15:02):

first piece of ROAR (handcoded into rust), only the printing is done

[%32] {
    %78 <- addu %73 %32
    _ <- subs %73 %32
}

also i do plan to make the code not break every single style guideline possible

view this post on Zulip Brendan Hansknecht (Dec 21 2024 at 17:16):

Left a handful of comments

view this post on Zulip Wizard ish (Dec 21 2024 at 18:02):

ok, yeah i wasn't thinking that much about the 128 bit types, those might be a small problem

view this post on Zulip Wizard ish (Dec 21 2024 at 18:03):

actually having the abstract registers be sized might not be that bad of an idea

view this post on Zulip Wizard ish (Dec 21 2024 at 18:05):

also maybe seperating structure refrences from regular registers

view this post on Zulip Brendan Hansknecht (Dec 21 2024 at 18:17):

I think it should be fine to see 128 bit values as a structured registers containing two 64bit registers

view this post on Zulip Brendan Hansknecht (Dec 21 2024 at 18:18):

That is how operations function on them anyways

view this post on Zulip Wizard ish (Dec 21 2024 at 18:18):

okay

view this post on Zulip Brendan Hansknecht (Dec 21 2024 at 18:18):

Mostly called them out to make sure they are being thought of

view this post on Zulip Wizard ish (Dec 22 2024 at 17:22):

Relating to your comment here, what do you think should be the max size per instruction?

view this post on Zulip Brendan Hansknecht (Dec 22 2024 at 18:08):

I would hope it can fit into 4 u64s, but that may not be realistic.

1 for op code and any sort of metadata
1 for output register
2 for input registers

If an op might need more than 2 input registers. The input u64s would be instead a pointer and length to a sideband array of extra inputs.

view this post on Zulip Wizard ish (Dec 22 2024 at 20:23):

i dont it will be naccasary to have them actually modeled as inputs, as they should be consant, and therefore are not a part of the internal logic

view this post on Zulip Wizard ish (Dec 22 2024 at 20:50):

also just realized it was never discussed if function pointers should be included or not

view this post on Zulip Brendan Hansknecht (Dec 22 2024 at 21:13):

I think a function pointer can be treated the same as an int constant with a special op to create it.

view this post on Zulip Wizard ish (Dec 22 2024 at 21:14):

ok!

view this post on Zulip Wizard ish (Dec 23 2024 at 00:56):

why does roc use bump allocation? It does seem to speed it up, but it makes the code much less clean (same thing with layout interning)

view this post on Zulip Brendan Hansknecht (Dec 23 2024 at 01:00):

Cause we want a really fast compiler. That is also why it is written in rust

view this post on Zulip Brendan Hansknecht (Dec 23 2024 at 01:00):

Obviously there is still large room for improvement, but we care a lot about compile time.

view this post on Zulip Wizard ish (Dec 23 2024 at 01:00):

fair enough

view this post on Zulip Wizard ish (Dec 25 2024 at 18:54):

Also, I won’t be able to work on this for a couple days

view this post on Zulip Wizard ish (Dec 25 2024 at 18:54):

On a related note, anyone know of any cheap, small, low-middle end pc’s?

view this post on Zulip Wizard ish (Dec 25 2024 at 18:56):

(I somehow managed to break my laptops keyboard by cleaning it… so now the whole thing is crippled)

view this post on Zulip Wizard ish (Dec 26 2024 at 16:08):

update on this, the classic strategy of doing nothing about it has worked, and now my keyboard is working again :partying_face:

view this post on Zulip Wizard ish (Jan 18 2025 at 23:15):

Hey everyone, sorry about the slow progress, it’s just that school takes up a lot of my time, and in particular recently I’ve just been doing way too much stuff, hopefully however I’ll be able to do more work on ROAR this week (and after that) :)

view this post on Zulip Luke Boswell (Jan 18 2025 at 23:16):

It's alg, we're all volunteers here juggling life things. :smiley:

view this post on Zulip Wizard ish (Jan 19 2025 at 18:54):

error[E0515]: cannot return value referencing local variable `env`
   --> crates/compiler/build/src/program.rs:589:5
    |
577 |           roc_gen_dev::build_module(&env, &mut interns, &mut layout_interner, target, procedures);
    |                                     ---- `env` is borrowed here
...
589 | /     (
590 | |         CodeObject::Vector(module_out),
591 | |         CodeGenTiming {
592 | |             generate_final_ir,
...   |
600 | |         },
601 | |     )
    | |_____^ returns a value referencing data owned by the current function

error[E0515]: cannot return value referencing local variable `layout_interner`
   --> crates/compiler/build/src/program.rs:589:5
    |
577 |           roc_gen_dev::build_module(&env, &mut interns, &mut layout_interner, target, procedures);
    |                                                         -------------------- `layout_interner` is borrowed here
...
589 | /     (
590 | |         CodeObject::Vector(module_out),
591 | |         CodeGenTiming {
592 | |             generate_final_ir,
...   |
600 | |         },
601 | |     )
    | |_____^ returns a value referencing data owned by the current function

error[E0515]: cannot return value referencing local variable `interns`
   --> crates/compiler/build/src/program.rs:589:5
    |
577 |           roc_gen_dev::build_module(&env, &mut interns, &mut layout_interner, target, procedures);
    |                                           ------------ `interns` is borrowed here
...
589 | /     (
590 | |         CodeObject::Vector(module_out),
591 | |         CodeGenTiming {
592 | |             generate_final_ir,
...   |
600 | |         },
601 | |     )
    | |_____^ returns a value referencing data owned by the current function

error[E0505]: cannot move out of `interns` because it is borrowed
   --> crates/compiler/build/src/program.rs:597:13
    |
549 |   fn gen_from_mono_module_dev_assembly<'a>(
    |                                        -- lifetime `'a` defined here
...
562 |           mut interns,
    |           ----------- binding `interns` declared here
...
577 |           roc_gen_dev::build_module(&env, &mut interns, &mut layout_interner, target, procedures);
    |                                           ------------ borrow of `interns` occurs here
...
589 | /     (
590 | |         CodeObject::Vector(module_out),
591 | |         CodeGenTiming {
592 | |             generate_final_ir,
...   |
597 | |             interns,
    | |             ^^^^^^^ move out of `interns` occurs here
...   |
600 | |         },
601 | |     )
    | |_____- returning this value requires that `interns` is borrowed for `'a`

error[E0505]: cannot move out of `layout_interner` because it is borrowed
   --> crates/compiler/build/src/program.rs:598:13
    |
549 |   fn gen_from_mono_module_dev_assembly<'a>(
    |                                        -- lifetime `'a` defined here
...
564 |           mut layout_interner,
    |           ------------------- binding `layout_interner` declared here
...
577 |           roc_gen_dev::build_module(&env, &mut interns, &mut layout_interner, target, procedures);
    |                                                         -------------------- borrow of `layout_interner` occurs here
...
589 | /     (
590 | |         CodeObject::Vector(module_out),
591 | |         CodeGenTiming {
592 | |             generate_final_ir,
...   |
598 | |             layout_interner,
    | |             ^^^^^^^^^^^^^^^ move out of `layout_interner` occurs here
599 | |             expectations: loaded.expectations,
600 | |         },
601 | |     )
    | |_____- returning this value requires that `layout_interner` is borrowed for `'a`

Some errors have detailed explanations: E0505, E0515.
For more information about an error, try `rustc --explain E0505`.

anyone know how to fix this? (full state is here https://github.com/wizard7377/roc/tree/roar)

view this post on Zulip Wizard ish (Jan 19 2025 at 18:55):

(note the change in this signature https://github.com/wizard7377/roc/blob/d4de3b1545fcfd3bbb4c6ccc0ee7b918c663208d/crates/compiler/gen_dev/src/object_builder.rs#L49)

view this post on Zulip Wizard ish (Jan 20 2025 at 13:52):

anyone can help (rust should definitly have something to bypass the borrow check in development...)

view this post on Zulip Anton (Jan 20 2025 at 14:39):

I can look when I'm done with #7530

view this post on Zulip Wizard ish (Jan 20 2025 at 14:40):

thank you!
(this has been driving me insane)

view this post on Zulip Anton (Jan 20 2025 at 16:22):

@Wizard ish using the latest roar branch commit I currently only get a mismatched types error at crates/compiler/build/src/program.rs:456:17. That's with cargo build --release. How can I produce your borrow checker errors?

view this post on Zulip Wizard ish (Jan 20 2025 at 16:24):

run with -F use_roar

view this post on Zulip Wizard ish (Jan 20 2025 at 16:25):

(the flag is temporary and will be replaced with a sensible build system later)

view this post on Zulip Wizard ish (Jan 20 2025 at 16:29):

also if it's helpful this is the mess that i use to compile it clear && RUSTFLAGS="-Awarnings" cargo run -F use_roar -- --dev ../tests/fizzbuzz.roc

view this post on Zulip Wizard ish (Jan 20 2025 at 17:41):

sorry it's really a mess

view this post on Zulip Anton (Jan 20 2025 at 18:20):

No problem, we all have to learn by doing :)

view this post on Zulip Brendan Hansknecht (Jan 20 2025 at 18:27):

fn build_module_help<'a, 'r : 'a>(
    env: &'r Env<'a>,
    interns: &'r mut Interns,
    layout_interner: &'r mut STLayoutInterner<'a>,
    target: Target,
    procedures: MutMap<(symbol::Symbol, ProcLayout<'a>), Proc<'a>>,
) -> Object<'a> {

Why are 'a and 'r linked? I don't think that is correct. 'r : 'a. Not that I 100% know what that syntax means.

Anyway, I think you should be able to fully remove 'r?

view this post on Zulip Wizard ish (Jan 20 2025 at 18:30):

yeah it was that i changed it bc in the Converter it (the borrow checker) required 'b to outlive 'a (the Converter's 'a), which then required 'r to outlive 'a (of the above funciton)

view this post on Zulip Brendan Hansknecht (Jan 20 2025 at 18:32):

Something is definitely wrong is 'r needs to outlive 'a. 'a is essentially a global lifetime. 'r is temporary for the current function. So the code creating that requirement must be wrong

view this post on Zulip Wizard ish (Jan 20 2025 at 18:34):

Oh wait :skull: i somehow forgot the most crucial point, without that requirement, i get this error

error: lifetime may not live long enough
   --> crates/compiler/gen_dev/src/roar/convert.rs:240:22
    |
162 | ...c, 'a: 'c, 'b: 'c> Converter<'a, 'b>
    |       --      -- lifetime `'b` defined here
    |       |
    |       lifetime `'a` defined here
...
240 | ...          &mut sym_map.insert(*sym, (Output::Register(new_reg.clone()), repr, form...
    |                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ argument requires that `'b` must outlive `'a`
    |
    = help: consider adding the following bound: `'b: 'a`

error: could not compile `roc_gen_dev` (lib) due to 1 previous error

view this post on Zulip Wizard ish (Jan 20 2025 at 18:34):

(all the random lifetimes are me just trying to fix this)

view this post on Zulip Brendan Hansknecht (Jan 20 2025 at 18:38):

I think you have a loop that is angering rust. env -> arena -> converter -> back to env.

view this post on Zulip Brendan Hansknecht (Jan 20 2025 at 18:39):

That in general won't work

view this post on Zulip Brendan Hansknecht (Jan 20 2025 at 18:39):

likely converter shouldn't be allocated in the env at all

view this post on Zulip Brendan Hansknecht (Jan 20 2025 at 18:39):

Should just be passed around on the stack

view this post on Zulip Brendan Hansknecht (Jan 20 2025 at 18:40):

Then just converter -> env, interner, etc

view this post on Zulip Brendan Hansknecht (Jan 20 2025 at 18:40):

no loop

view this post on Zulip Wizard ish (Jan 20 2025 at 18:41):

ok

view this post on Zulip Wizard ish (Jan 20 2025 at 18:45):

Wait so onlyy the converter object or also everything created by it

view this post on Zulip Brendan Hansknecht (Jan 20 2025 at 18:46):

I assume most things can be created in the env. The problem with converter is specifically the loop.

view this post on Zulip Brendan Hansknecht (Jan 20 2025 at 18:46):

so just converter

view this post on Zulip Brendan Hansknecht (Jan 20 2025 at 18:48):

And I think you will be a able to remove a bunch of type variable like &'c mut self will just be &mut self

view this post on Zulip Brendan Hansknecht (Jan 20 2025 at 18:49):

And I think proc_queue: RefCell<Vec<'b, (symbol::Symbol, &'b ir::ProcLayout<'b>, &'b ir::Proc<'b>)>>, Likely should be all 'a? Cause those are all allocated in the env. So they have a lifetime same as the bump in the env.

view this post on Zulip Wizard ish (Jan 20 2025 at 18:51):

yeah this is a little bit of a mess...

view this post on Zulip Brendan Hansknecht (Jan 20 2025 at 18:51):

Part of learning rust in a codebase that uses arenas (thus tons of extra lifetimes to manage)

view this post on Zulip Wizard ish (Jan 27 2025 at 01:12):

the good news is it's fixed :)
the bad news is that it was fixed by using incredibly bad practice to get around the borrow checker...

view this post on Zulip Wizard ish (Feb 02 2025 at 17:55):

Hey, given the plans to completly rewrite the compiler, does it make sense to continue this project in Rust (as opposed to switching it to Zig)?

view this post on Zulip Richard Feldman (Feb 02 2025 at 18:02):

I'd say switch to zig!

view this post on Zulip Wizard ish (Feb 02 2025 at 18:02):

okay :)

view this post on Zulip Brendan Hansknecht (Feb 02 2025 at 18:05):

As a note, some of the rewrite plans may make this project obsolete. As part of the switch, we want to try using an interpreter instead of the dev backend.

view this post on Zulip Brendan Hansknecht (Feb 02 2025 at 18:06):

We need an interpreter anyway for constant folding and some other work and we think that an interpreter should be capable of running fast enough that it makes the dev backend a lot less useful as a proposition

view this post on Zulip Brendan Hansknecht (Feb 02 2025 at 18:06):

So we want to start by trying to only have the interpreter and skipping the dev backend.

view this post on Zulip Wizard ish (Feb 02 2025 at 18:07):

hmmm okay, although still might be useful as a sort of in between for the interpreter and the new mono ir

view this post on Zulip Brendan Hansknecht (Feb 02 2025 at 18:14):

Interpreter is planned to run higher up the stack (it may still want a specific IR, but haven't thought through the details yet).

view this post on Zulip Brendan Hansknecht (Feb 02 2025 at 18:15):

Totally will be lots of things to help with and IR design work, just might not map straight to roar.

view this post on Zulip Brendan Hansknecht (Feb 02 2025 at 18:24):

I'm really sorry about this and any time that was wasted. The big zig rewrite seems like the best time to try something like only using an interpreter? Assuming only using an interpreter dosen't work out or if we end up wanting a dev backend again, this work would revive, but at least for now, the hope is to avoid any dev backends if possible.

view this post on Zulip Wizard ish (Feb 02 2025 at 18:50):

Nah it’s all cool, plus I’ve been putting off learning zig so this’ll be fun

view this post on Zulip Richard Feldman (Feb 02 2025 at 18:50):

I appreciate your having an awesome attitude about it! :smiley:


Last updated: Jul 06 2025 at 12:14 UTC