I want to take a crack at getting desugaring going so that we have a sugar-free IR to work with for canonicalization. Is the correct approach here to just use the existing Parse IR and walk through it, moving over nodes that have no sugar as-is, and breaking down and re-inserting the ones that do?
And also, do we even have enough sugar at this point to warrant a whole separate step in the compiler for it? And if we should, should we try to minimize the number of modules that we touch in desugar by marking a Parse IR struct as "needing desugaring" when we insert nodes that require it?
I think it would even be nice if we can track nodes that need desugaring and literally just mutate the existing IR (a downside being that those nodes would then probably end up quite far away from each other, especially in larger files)
to warrant a whole separate step in the compiler for it
I am in favor of a separate step. I feel like it would help maintainability. It's nice for on-boarding and debugging to have a place where all the desugaring is happening, vs having it intertwined with a bunch of other stuff.
Another approach could be to have the sugar nodes describe themselves for the purposes of formatting and parse error reporting, but also insert the sugar when we insert the node, and then the node would have a link to it - and then Can can just skip past its details and fetch the real nodes
should we try to minimize the number of modules that we touch in desugar
That seems like something we could leave for later if need be?
Anton said:
should we try to minimize the number of modules that we touch in desugar
That seems like something we could leave for later if need be?
Yeah, I'm trying not to OVERLY optimize too early, but I think this is plain perf lying on the floor if we are using the same IR
sugar nodes describe themselves for the purposes of formatting and parse error reporting
How does this work in the old compiler?
We just walk the entire AST and return new AST nodes that are still Parse AST nodes
(And just a note, I'm not going to start on this today - just trying to line up some tasks for the coming weeks. I want to have a broad high-level consensus before I put hands on keyboard)
Anthony Bullard said:
Another approach could be to have the sugar nodes describe themselves for the purposes of formatting and parse error reporting, but also insert the sugar when we insert the node and then node would have a link to it - and then Can can just skip past it's details and fetch the real nodes
That seems like a reasonable approach but it's not my area of expertise
I think it's fine to start as a separate step
I have a vague intuition that our current set of sugar can all be done during canonicalization without a separate pass (which was not true of our pre-0.1.0 sugar!)
but if that turns out to be true, then it should be super easy to just take the separate pass and incorporate it into canonicalization anyway, so having it as a separate pass shouldn't be a significant downside
I could be misremembering, but I think we already did Pratt parsing to resolve operator precedence during parsing, right?
if so, I assume the only remaining concern is associativity (and reporting errors there), which can be a desugaring thing
Richard Feldman said:
if so, I assume the only remaining concern is associativity (and reporting errors there), which can be a desugaring thing
i thought pratt parsing took associativity into account? no?
i implemented that so long ago i forgot the specifics and im not at my machine
oh maybe it does, I forget too :laughing:
i've been digging through the canonicalization code wrapping my mind around what @Sam Mohr has already built. fixed a number of assumptions that have changed since then, but there is a lot around the way idents, modules, and aliases are handled that i'm having a whale of a time wrapping my mind around
@Anthony Bullard want to chat about it? I'm free in like 30 min
i'd love to! i have a "wellbeing day" on friday and would love to use it to make some progress while the kids are at school
I can also answer one-off questions in DMs or in the channel
Though I'm sure Richard knows what he's talking about
I am going to take you up on this @Sam Mohr. I'm trying to understand how you imagined the Can IR working in later stages. It seems that it stores things in a way that makes it difficult to walk down through the AST in order - but maybe I'm missing something
This is my initial PR with some basic tests on adding idents and being able to see if they are or are not in scope
https://github.com/roc-lang/roc/pull/7806
In a convo I had with @Richard Feldman and @Luke Boswell, we talked about not needing typevars in the Can IR. I also think we should probably move to a Node/Extra Data structure like we did for parse
As opposed to today where every type of node has its own list
yeah I think if we start with "`TypeVar` is just a type alias for Can IR Node Index" then a lot of things will fall out of that
e.g. if all the info we need to store for 1 node doesn't fit in 1 node slot (without the size of one node getting unacceptably large), such as an `if` node, then ideally we use multiple node slots to store that extra info
so for example, in the case of `if`, the information we need is: the conditional, the `then` branch, and the `else` branch, if there is one
in the Rust compiler's `If` node, we store an explicit type variable for the condition and another for the branches.
We don't need to store one for the conditional anymore, because the node index of the condition is automatically a type variable, so we just use that.
We also don't need to store one for the branches, because we're going to unify all the branches together anyway, so we just need to know what the first one's variable is and we're all set. And if we have the node ID of the first branch, then we know its variable, because in this design they're the same.
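A minimal sketch of that identification (hypothetical names, not the actual Can IR):

```zig
// Hypothetical sketch, not the actual Can IR:
pub const Node = struct {
    pub const Idx = enum(u32) { _ };
};

// "TypeVar is just a type alias for Can IR Node Index": a node's index
// is simultaneously its type variable, so e.g. the condition of an `if`
// needs no separately stored TypeVar - its own index is the variable.
pub const TypeVar = Node.Idx;
```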
so an example of how we could do `If` completely inline:
- we have an `If` node which stores the node index of the first `else` branch, and that's it
- immediately after it (at the `If` node's index + 1 in the flat array of nodes) we have the conditional IR node. We don't need to write down its index because we already know it's just at our own index + 1.
- if we put the `then` branch immediately after the conditional, then we don't need to store an index for that either, because we'll be right there as soon as we finish processing the conditional
- the one index we do need up front is where the `else` branch is (without traversing the whole `then` branch), because if the conditional is false, it will want to jump straight there - but fortunately, that's the one index we did write down in the original node, so we're all set there
- if there's an `else if`, then the `else` branch will immediately begin with an `if`, and this whole pattern repeats
- for an `if` without an `else`, we can record that in the original node (e.g. have a different discriminant for `if` and `if_without_else`, and they work slightly differently)

so basically what we're building is kind of like a linked list of nested `if`/`else`s, except it's all inline in the one flat array of nodes, no side tables needed
and all the info necessary is already there for traversing it, checking its types (including type vars), interpreting it, code-genning it, etc.
and memory locality is pretty much as good as we can get it because it's all contiguous in one flat array
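To make that concrete, a rough traversal sketch under that layout (all names hypothetical, not actual compiler code):

```zig
const Tag = enum { if_then_else, if_without_else, other };

const Node = struct {
    tag: Tag,
    // for `if` nodes: the node index where the `else` branch starts
    else_idx: u32,
};

// Walks the expression starting at `idx`; returns the index just past it.
// (Stubbed here; a real version would dispatch on every tag.)
fn walkExpr(nodes: []const Node, idx: u32) u32 {
    switch (nodes[idx].tag) {
        .if_then_else, .if_without_else => return walkIf(nodes, idx),
        .other => return idx + 1, // leaf node takes one slot
    }
}

fn walkIf(nodes: []const Node, if_idx: u32) u32 {
    // the conditional is always at our own index + 1; no index stored for it
    const after_cond = walkExpr(nodes, if_idx + 1);
    // the `then` branch begins right where the conditional ended
    const after_then = walkExpr(nodes, after_cond);
    if (nodes[if_idx].tag == .if_without_else) return after_then;
    // `else` is the one index we wrote down in the `if` node itself
    return walkExpr(nodes, nodes[if_idx].else_idx);
}
```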
this would seem to work pretty well, but unless we return the last node we touched when fetching the "usable struct" that represents the node, we will run into problems with, say, any conditional that's more than a simple expr
say `getExpr` would need to return the Expr union member AND the last node index we touched constructing it
which actually would require building out (or at least traversing) the entire expr tree to find that
which would require pointers since it would be a recursive data structure
or if you want to avoid that, walking the nodes for a complex expression over and over to get the child nodes all the way down
whereas storing the span of the referenced nodes avoids the need for that
For instance, take an `if` expression like this:
```
if something.foo("Hello", 123) {
    Stdout.line!("Branch A")
} else {
    Stdout.line!("Branch B")
}
```
Would result (with the straight-line left-to-right approach you lay out above) in the following node layout:
```
IF_THEN_ELSE, STATIC_DISPATCH(args=2), IDENT, IDENT, STRING(parts=1), STRING_PART, INT,
APPLY(args=1), QUALIFIED_IDENT, STRING(parts=1), STRING_PART,
APPLY(args=1), QUALIFIED_IDENT, STRING(parts=1), STRING_PART
```
To know how to get the else body would require traversing through the conditional and the then body, and would lead to wanting to collect something like
```zig
const If = struct {
    conditional: Node.Idx,
    @"then": Node.Idx,
    @"else": Node.Idx,
};

const this_if = If{ .conditional = 1, .@"then" = 7, .@"else" = 11 };
```
And when you fetch the conditional, you'd have to run through all the nodes again (until we have two args) to get something like
```zig
const StaticDispatch = struct {
    subject: Node.Idx,
    @"callee": Node.Idx,
    args: []const Node.Idx,
};

const this_sd = StaticDispatch{ .subject = 2, .@"callee" = 3, .args = &.{ 4, 6 } };
```
Since we have to skip the STRING_PART at Index 5.
So we've already walked through nodes 1-6 twice. I don't think that's a big deal, but is that what we want to be doing? We'd have to have `walk*` methods as well as `get*` methods so that we can walk through the nodes without building intermediate structs that we will throw away (they would instead return the last node idx touched)
I think the structure of having a Node that stores a primary piece of data, and a span of extra data in a side array allows us to basically "memoize" that node traversal at the cost of moving between two arrays (I would assume the loaded chunks of those arrays could both fit nice and cozy in the L1 cache of most modern processors for most expressions (or even an average decl))
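For concreteness, a stripped-down sketch of that Node-plus-extra_data shape (loosely modeled on the Parse IR; all names here are illustrative, not the actual code):

```zig
const std = @import("std");

// Illustrative only: every node is the same small size, and anything
// that doesn't fit goes into a side array of plain u32 indices.
const Node = struct {
    tag: Tag,
    data_1: u32, // primary data (the main business of the node)
    data_2: u32, // ancillary data, or the start of an extra_data span
    data_3: u32, // ancillary data, or the length of that span

    const Tag = enum(u8) { if_then_else, apply, ident, string };
};

const NodeStore = struct {
    nodes: std.ArrayListUnmanaged(Node) = .{},
    // side array of u32s; for big nodes these are indices back into `nodes`
    extra_data: std.ArrayListUnmanaged(u32) = .{},

    // e.g. an `if` node could store (start, len) into extra_data, and the
    // extra_data slots hold the node indices of cond/then/else
    fn children(self: *const NodeStore, n: Node) []const u32 {
        return self.extra_data.items[n.data_2..][0..n.data_3];
    }
};
```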
So I guess my preference is to lean heavily on the structure of the Parse IR which seems to be incredibly fast and the mechanics of it are easy to understand
looking at it while thinking about TypeVars, I think that should work fine :thumbs_up:
I'd say go for it!
Pushed up the broken rough-draft state of where I'm headed with the IR - again, heavily based on the Parse IR
Anything outside of the NodeStore is probably incorrect or only partially correct. Most likely out of date
So just focus on the high-level API of the NodeStore
I'm done for the day, hope to have some time tomorrow morning to clean up some of the broken stuff and then fill in the minimal details to get my can tests (and any other broken tests and code) working again
I gave it a quick read - looking good so far! :raised_hands:
Reading through the PR now, it seems like we've removed the ownership of the `ModuleEnv` from the `can.IR`. Are we not planning on doing the two-syscall caching of the Can IR, maybe instead opting for a manual (de)serialization? If not, then the `ModuleEnv` should continue to be owned by the `can.IR`
it's a decision i'm going back on :rolling_on_the_floor_laughing:
i still don't understand how we can do it in two syscalls with all of the pointers going on here
but it's just not the kind of thing i have experience with
but module work is forcing my hand
Do you agree with the ModuleWork mechanism?
Its goal is to allow simple parallel compilation that doesn't require complex/non-trivial lookup of compilation artifacts from other compiler phases
it seems fine even if i haven't read through it all
Yeah, okay
I think another mechanism could work
But high level seems good
possibly but i'm hyper focused on getting Can moving at the moment
That's great, thanks for picking up the slack!
I'm gonna try again to read through the new IR shape
I'm feeling like the new IR has further indirection from the underlying concept of a syntax tree
Which makes programming with it harder to get correct
The old IR is basically just a tree, but instead of every node being behind a heap allocation, they're instead indices into arrays of the right node type
This mechanism flattens everything into a single bucket, and the `Node` needs to be interacted with via some unwrapping API for interaction with the syntax tree
Is there a reasonable perf benefit to the parse style IR over the old IR?
Well you are more likely to have cache locality if all of the nodes are in one contiguous block of memory, whereas the different slices we had before could have their backing storage allocated all over the place. But that's more theory (and me watching Matt Lugg's talks about Zig's compiler) than anything else.
And they will be in one contiguous block since we created the nodes in order while running through the tree
The ParseIR design has also been shown to be very fast, and I know how to work with it - and someone that learns one IR can translate that knowledge towards understanding the second
And during unification/type checking we can just add the typevars off row using the same index as the original node
(Richard said he wanted to avoid having TypeVars in the CanIR, unlike what we did with the Rust compiler)
But I want your feedback if you are willing and able @Sam Mohr I trust your opinion and design sense.
Yeah, cache locality is the argument that drives all of this array-backed storage in the first place
In this case, it feels like the parse IR approach (do you have a better name for the pattern?) would have better spatial locality
The bias I have towards the existing pattern is that it lets the compiler check we're doing things correctly at the type level
And that's not really true of the parse IR, its runtime benefits notwithstanding
We have to really get it right in terms of putting in and taking out the right data for the right IR node
However, this is compiler dev
And two rules are important to consider:
So if you're working on it and no one else is, then getting something that works, has the right behavior, and won't need to be made more performant in the future sounds good to me
Last memory-related thought: the single, mother Node approach means that all nodes have the same size, which is the minimum required to represent the largest node type
But with heterogeneous storage of IR nodes, only a few nodes (e.g. `if` and `when`) need a lot of memory, and the rest can just be small pointers to those
So as much as we probably store children closer to their parents, we probably use more memory
No idea how much that trade-off is though :shrug:
But those nodes are made minimal and just point (through extra_data, an array of just u32s) to other nodes which contain the other information
If they are larger than what we consider minimal
Ah, my memory was off
A modern CPU core can fit - I think I counted - a chunk of 1000 nodes and a chunk of 1000 extra_data entries into L1 cache and still have plenty of room for the stack
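(Back of the envelope, assuming a node is a tag plus three u32 data fields, i.e. roughly 16 bytes: 1000 nodes is about 16 KB and 1000 u32 extra_data entries is about 4 KB, which together fit comfortably in a typical 32-48 KB L1 data cache.)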
Something I guess I hadn't done yet was the extraction of all the bigger nodes to their own lists
Basically, you have `data_1`, `data_2`, and `data_3` on every node. How few `data` fields could you get away with if you moved the bigger node types to their own arrays?
I think that being able to just grab numbers off a chunk likely in L1 cache to lookup a node that is probably also in L1 cache is going to be ridiculously fast.
Yeah, I'll admit that I am much more strongly motivated by the "have the code look like the domain model" argument than the perf argument, and that first argument doesn't matter as much
Both should be very fast, and your approach is probably faster
You use three `data` fields. One is for primary data (the main business of the node), and `data_2` and `data_3` are either used for ancillary data or for storing a Span for accessing a chunk of extra_data that points to more nodes
You can look at how we store if_then_else exprs in the Parse IR
https://github.com/roc-lang/roc/blob/main/src/check/parse/IR.zig#L1189 <- Add the node
But if 50% of nodes only need 2 `data` fields, then you could save 4 / 2 = 2 bytes per IR node on average
https://github.com/roc-lang/roc/blob/main/src/check/parse/IR.zig#L1916 <- Get the node
That's the same line number
And then the owner of the IR can lazily grab the other bits as it needs it
I understand this
snark not intended
It's true, but then we would more likely have to create more nodes
It's relatively simple, yes, it just leaves us open to math errors in a way that a more typed approach doesn't
It's probably not an exact science, but we can measure and adapt the size of the nodes as we find things working
Yes, agreed
I think that this is a pretty modular part of the compiler
The nice thing is the ways things can go bad are small, and they're limited to these API functions
So the surface area of type unsafety is very contained for the amount of simplicity we get
You just have to communicate that direct interaction with the underlying data should never be done
This sounds like a job for Captain Encapsulation!
I just want to trust that compiler devs are smart enough to understand that :-). And that thorough snapshot testing will have our backs
Well, better to not rely on trust
I believe in our testing suite
And some asserts
maybe even a lot of asserts at scale
How would our testing suite prevent someone from directly reading/writing to an array?
If you do something wrong - the snapshots will change in a pretty profound and noticeable way
I think you can make things work properly by manually manipulating the data, even if it's more effort than using the provided APIs
Do you think that encapsulation is actively bad here?
No, that's why I provide the API I do
Zig doesn't really believe in encapsulation
Yeah, that's the underlying issue, isn't it
Cause I'd say that API is only half of a solution, like a jar without a lid
Andrew seems to not be the biggest fan of the concept
You can't encapsulate without privacy
It's kind of like, you wouldn't touch the `items` pointer in a slice, would you?
So much of the Rust compiler's development seemed to be done by people inferring how it worked based on what they saw in undocumented code
...
I guess if it's not a problem, it's not a problem
I'd just always look for a low-cost way to rely on zero trust, or less trust if that's what I can get
But Zig doesn't make that easy without awkwardly naming fields itemsDontTouchMePlease
So maybe it's not worth it
I searched for "Andrew Kelley on Zig encapsulation" and AI gave me this, I wonder if Andrew would agree with the slop:
Andrew Kelley, the creator of the Zig programming language, has not specifically addressed encapsulation in the same way as other languages like C++ or Java. However, Zig does offer features that enable encapsulation through other mechanisms. Kelley's focus is on a system-level programming language that emphasizes safety and performance, with encapsulation being a secondary concern.
Here's a breakdown of how encapsulation works in Zig and what Kelley's perspective is:
- Encapsulation through Modules and Structures:
  - Zig uses modules to group related functions and data, providing a basic level of encapsulation.
  - Structures (like `struct`) are used to define data types, and while they don't have the same access modifiers as other languages, they can be used to create data-hiding mechanisms.
- Implicit vs. Explicit Encapsulation:
  - Zig doesn't have explicit access modifiers (like `public`, `private`, `protected`).
  - Encapsulation is achieved through a combination of module organization, structure definition, and coding practices.
- Kelley's Emphasis on Safety and Performance:
  - Kelley's primary focus is on creating a language that is safe by default and provides excellent performance.
  - This means that while encapsulation is important, it's not the central focus in the same way as in languages that prioritize object-oriented programming.
- Alternative Encapsulation Techniques:
  - Zig developers can use techniques like using private functions within modules, using unions for data representation, and employing memory-safe allocation methods to achieve levels of encapsulation.
- Kelley's Views on Encapsulation:
  - While Kelley doesn't have a strong emphasis on traditional object-oriented encapsulation, he does understand its importance in building complex and robust software.
  - He has suggested that Zig's structure and module system allow for a reasonable degree of encapsulation without sacrificing the language's focus on safety and performance.
In essence, Zig's approach to encapsulation is more functional and less object-oriented, with encapsulation being achieved through a combination of modules, structures, and coding practices. Kelley's focus on safety and performance means that while encapsulation is important, it's not the primary driver in the same way as in other languages.
A lot of what's said there mirrors my memory of what he has said on the topic before
And I know that big Zig users like TigerBeetle solve this through a HEAVY use of Asserts and some very sophisticated testing techniques
Here's his reason from the mouth (finger mouth?): https://github.com/ziglang/zig/issues/9909#issuecomment-942686366
the more effective strategy is to provide composable abstraction layers.
The current IR feels like it follows that better by not requiring the half of encapsulation that Zig lets us implement, but again, it should be good enough to do it the way you've outlined, as well as faster and in line with the proven effective approach in the parse code
We could try this approach if you are really concerned: https://github.com/ziglang/zig/issues/9909#issuecomment-2679366674
Lmao let's not
I'm not really concerned. I'm continuing this conversation because it seems like you're more interested in us finding alignment, but I don't strongly hold my fundamentally different opinion
It's pretty simple. Just a bitCast inside every method that needs the private field (NodeStore here)
Cool
I'm just trying to ride the wave of what Zig does, and since I am working on it with very smart people - hoping that we will all be striving for understanding before moving and all row the boat in the right direction
Yeah, that's part of it, I like doing what Zig devs would do, let's work with the language instead of against it
Agreed :handshake:
But now my kids woke up, so I'll have to actually code later ;-)
kk
Anthony Bullard said:
i still don't understand how we can do it in two syscalls with all of the pointers going on here
Just to get back to this, the goal is to save/load cache data at a module level (and maybe for other data) as quickly as possible. We can manually write serializers and deserializers to do this, but then they need to write to a file on the scale of bytes at a time, rather than all of the memory we care about all at once.
Luckily for us, Richard had an idea (coming from discussion with Andrew, I think) that if all of the memory that we want to cache exists in one contiguous array, we can just write that to a file directly. And even better, we can read it directly back to memory on cache load with no deserialization step!
This only works if we have all of the memory we want in a single block with no pointers (pointers will be wrong if the heap is different in different runs, but indices are fine). That's why `ModuleEnv` is owned by `can.IR`: then everything is in one place!
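As a minimal sketch of that idea (hypothetical names, assuming the cache is already one contiguous, pointer-free byte buffer):

```zig
const std = @import("std");

// Sketch: `bytes` is the entire module cache as one contiguous,
// pointer-free block (indices only, so it's position-independent).
fn saveCache(path: []const u8, bytes: []const u8) !void {
    const file = try std.fs.cwd().createFile(path, .{});
    defer file.close();
    try file.writeAll(bytes); // effectively one write
}

// Reads the whole cache straight back into memory: no deserialization
// step, because indices are valid wherever the buffer lands.
fn loadCache(path: []const u8, buffer: []u8) !usize {
    const file = try std.fs.cwd().openFile(path, .{});
    defer file.close();
    return try file.readAll(buffer); // effectively one read
}
```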
just a quick note - the approach @Jared Ramirez used in https://github.com/roc-lang/roc/pull/7772 checks all these boxes!
Hmm, I thought the design of the new compiler's canonicalize pass was already very similar to just 'desugaring'. What's the delta between those?
That's likely what we will get Joshua
But right now I'm dealing with a clean compile that crashes with a SIGBUS error
So that's a lot of fun
Does zig crash or does our (partial) compilation of a roc file crash?
a test crashes
net new to my change
it's some sort of alignment issue, but it's not clear what's the cause
there is no stack trace
Sounds tough, I can take a look if you put up a branch :)
valgrind should help you get additional info
I've got a theory that I'll test, then try valgrind, then I'll phone a friend :wink:
I'm going to have to push the branch anyway to run this on my Linux machine, since valgrind doesn't work on macOS
Even tearing my test down to being pretty minimal (not calling the actual can method), I've gone from SIGBUS to SIGSEGV to now a SIGTRAP
Back to seg fault :-)
It has something to do with deinit'ing an ArrayListUnmanaged
I'll have to figure it out tomorrow
This is a good thing for me to struggle with, since there is obviously something I don't understand
This might be why :rofl: :
```
Deinit levels backing arraylist with 16562106820025843780 items
```
What's weird is that it "just happens all of a sudden"; here's all the logs at every place we handle it until deinit'ing that array:
```
Enter with 0 levels...Returning a scope with 1 levels...can has a scope with 1 levels...Deinit levels...Deinit levels backing arraylist with 7484431091188432957 items...Segmentation fault at address 0x67de0a6c582d003d
```
And you can see the number of items has changed, which tells me the items array has junk in it
That's also not addressable in even 64 bytes (7 quintillion???)
This code:
```zig
var can = Self.init(&can_ir, &parse_ir);
std.debug.print("can has a scope with {d} levels...", .{can.scope.levels.levels.items.len});
can.deinit();

// Deinit for can
pub fn deinit(self: *Self) void {
    std.debug.print("Can has a scope with {d} levels at deinit...", .{self.scope.levels.levels.items.len});
    self.scope.deinit();
}
```
Will produce this:
```
can has a scope with 1 levels...Can has a scope with 4371552663 levels at deinit...
```
It's something with pointers....
It's that the Scope was created internally by Can in an init function, but not heap allocated so it fell off the stack and the pointer did in fact point to garbage.
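A contrived sketch of that bug pattern (not the actual code):

```zig
const Scope = struct {
    levels: u32 = 0,
};

const Can = struct {
    scope: *Scope,

    fn initBuggy() Can {
        var scope = Scope{};
        scope.levels = 1;
        // BUG: `scope` lives in this stack frame, which is dead once
        // initBuggy returns - the stored pointer now points at garbage
        return .{ .scope = &scope };
    }
};

// Fix: store the Scope by value (or heap-allocate it) so it lives
// as long as the Can that owns it
const CanFixed = struct {
    scope: Scope,

    fn init() CanFixed {
        return .{ .scope = Scope{} };
    }
};
```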
Oh interesting
Yeah, the issue I have now is leaks.
There seems to be some contention between wanting the ModuleEnv to be owned by Can IR and not having issues with memory leaks
Or a skill issue on my part (probably the most likely answer)
And...I think I just solved it
Finally:
```
Build Summary: 4/4 steps succeeded; 117/117 tests passed
test success
└─ run test 117 passed 318ms MaxRSS:3M
└─ zig test Debug native success 2s MaxRSS:535M
└─ options cached
```
Updated my PR
And now....to actually do canonicalization stuff
for binops and unaries, two questions:
and specifically for ?? do we want to continue desugaring to match or a function call?
i'm also assuming `return` will need to do a pretty substantial rewrite when desugaring?
normal functions by default, but not if control flow is involved
`return` shouldn't be sugar, it should be a first-class thing
so `and`, `or`, `?`, and `??` all desugar to control flow
that said, I actually kinda wonder if we should skip desugaring until later
like just keep all of those first-class for now
in canonicalization
because we already do `CalledVia` to distinguish between plain function calls and desugared operators, so are we actually saving ourselves anything to desugar at this point, or are we just making `roc check` do unnecessary work?
when that work could be deferred to actual builds instead
Oh that would make can much simpler
If it's easy enough to type check those constructs, that's fine with me
I think as far as unifying types goes, `and`, `or`, `?`, and `??` should be fine – we'll just have to internally give them the same types as the functions they desugar to when generating constraints
it feels like 90% of the work Can is doing COULD be done at parse time
1 is Dispatch
The dispatch type checking should be checking the type of the first arg, so desugaring to a method call is just extra work for later
wait
Okay, so `foo.bar()` doesn't just desugar to `bar(foo)` or `foo->bar()`, since it really means `Foo.bar(foo)`
Meaning that if I write `foo + bar`, that needs to turn into `Foo.add(foo, bar)`
I think there's a benefit to desugaring, because the `canonicalize` stage of the compiler is cached and the `resolve_imports` stage is not
Literally any work that we can do before caching without having to read any files outside of the current file should be done in can
(and maybe a future `solo_typecheck`)
A more complex caching mechanism was discussed at some point to allow for caching of work from later stages (at least `check_types`) to recognize a tree of dependencies, AKA if `Foo.roc` imports `Bar.roc` then `Foo.roc` can be loaded so long as `Bar.roc` hasn't changed
Anyway, for now, I think desugaring should be done at the beginning of or during canonicalize
because it will almost always require less work overall
I dunno, I hear what you're saying Sam - but I'd like to try deferring it and see how it goes
maybe I'm wrong in practice but I'd like to try it that way as a first pass
We can definitely try it!
Anthony Bullard said:
it feels like 90% of the work Can is doing COULD be done at parse time
you mean as in resolving names based on scope?
there's another piece we haven't gotten to, which is building a dependency graph between top level decls so we can sort them before type checking (which is very important for type checking correctness)
I presume you could do that by noting any idents used in the body of a decl and seeing what they point to, without needing to maintain a graph within the decls
I know you're more getting at the macro recipe of can, just throwing that out
right, I just mean conceptually
how do we solve for mutual recursion?
i thought the solve there was just walking the top level decls collecting the idents, and then just assigning each one a type var
and then walking them again to solve them
and then you just have to have cycle checking for conditionless mutual recursion (no base case) like
```
a = |x| b(x)
b = |x| a(x)
```
which is silly and diabolical if done earnestly
You look for SCCs
And think in terms of those, unless the SCC is size 1
Have you seen how we handle those in the Rust code?
I think recursion can be allowed if all the decls are functions, and pretty sure they are not allowed if they are all constants, not sure about a mix
yep, exactly
ok, not familiar with the term SCC
you mean from graph theory?
strongly connected component
yeah
oh ok
i think i've implemented these without knowing they existed
basically you input all the "a depends on b" relationships and it outputs all the groups of cycles
if a group has a size of 1 then it's not involved in any cycles
etc
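For reference, a compact sketch of Tarjan's SCC algorithm over "decl a depends on decl b" edges (hypothetical code, not what the Rust compiler actually does). Components come out dependencies-first, which is the order type checking wants, and any component with more than one decl is a mutually recursive group:

```zig
const std = @import("std");

const Tarjan = struct {
    gpa: std.mem.Allocator,
    adj: []const []const u32, // adj[a] = decls that a depends on
    index: []i32, // discovery order; -1 = unvisited
    lowlink: []u32,
    on_stack: []bool,
    stack: std.ArrayList(u32),
    sccs: std.ArrayList([]u32), // emitted dependencies-first
    next_index: u32 = 0,

    fn strongConnect(self: *Tarjan, v: u32) !void {
        self.index[v] = @intCast(self.next_index);
        self.lowlink[v] = self.next_index;
        self.next_index += 1;
        try self.stack.append(v);
        self.on_stack[v] = true;

        for (self.adj[v]) |w| {
            if (self.index[w] == -1) {
                // unvisited dependency: recurse, then absorb its lowlink
                try self.strongConnect(w);
                self.lowlink[v] = @min(self.lowlink[v], self.lowlink[w]);
            } else if (self.on_stack[w]) {
                // back edge into the current component
                self.lowlink[v] = @min(self.lowlink[v], @as(u32, @intCast(self.index[w])));
            }
        }

        // v is the root of an SCC: pop the whole component off the stack
        if (self.lowlink[v] == @as(u32, @intCast(self.index[v]))) {
            var scc = std.ArrayList(u32).init(self.gpa);
            while (true) {
                const w = self.stack.pop();
                self.on_stack[w] = false;
                try scc.append(w);
                if (w == v) break;
            }
            try self.sccs.append(try scc.toOwnedSlice());
        }
    }
};
```

A group of size 1 with no self-edge isn't involved in any cycle; larger groups are the mutual-recursion cases, which (per the discussion above) can be allowed when all the decls are functions but not when they're all constants.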