For those of you who don't know, I am reworking the canonicalization code in roc_can for a few reasons:
In order to achieve these goals, the plan is to break roc_can into two new crates:
- roc_can_solo: Canonicalize a module pretending that no other modules exist. Imports are treated as things that might or might not exist, stuff like that. This will do the majority of the work in canonicalization. When this is finished, its output will be 100% determined by the source code, independent of where it is in the filesystem. That way, we can cache this output keyed by a hash of the file's contents, very simple and performant.
- roc_can_combine: Take the outputs of the roc_can_solo crate and stitch them together, including: import "path/to/file.json" as data : List U8)

Once roc_can_combine is finished, we can pass the results to the type checker as before. In the future, we can even do partial typechecking on the values in each solo module, which should not even require that many changes to roc_constrain, but I don't know how feasible that is.
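As a rough illustration of the caching idea above (output determined only by source text, cached under a hash of the file's contents), here is a minimal sketch. Everything in it is hypothetical: `SoloModule`, `SoloCache`, `content_hash`, and `canonicalize_solo` are invented names, the "canonicalization" is a toy line scanner, and `DefaultHasher` is used only for brevity, not because the compiler uses it.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the output of a "solo" canonicalization pass.
#[derive(Clone, Debug, PartialEq)]
pub struct SoloModule {
    /// Names the module defines at the top level.
    pub defs: Vec<String>,
    /// Imports are recorded but not resolved: they might or might not exist.
    pub unresolved_imports: Vec<String>,
}

/// Cache keyed only by a hash of the source text, never by file path.
#[derive(Default)]
pub struct SoloCache {
    entries: HashMap<u64, SoloModule>,
    /// How many times canonicalization actually had to run.
    pub misses: usize,
}

/// Hash the file contents (assumption: any stable content hash would do).
pub fn content_hash(source: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    source.hash(&mut hasher);
    hasher.finish()
}

/// Toy "canonicalization" that depends on nothing but the source text.
pub fn canonicalize_solo(source: &str) -> SoloModule {
    let mut module = SoloModule {
        defs: Vec::new(),
        unresolved_imports: Vec::new(),
    };
    for line in source.lines() {
        let line = line.trim();
        if let Some(rest) = line.strip_prefix("import ") {
            module.unresolved_imports.push(rest.to_string());
        } else if let Some((name, _)) = line.split_once('=') {
            module.defs.push(name.trim().to_string());
        }
    }
    module
}

impl SoloCache {
    /// Return the cached result for identical contents, else compute and store it.
    pub fn get_or_canonicalize(&mut self, source: &str) -> SoloModule {
        let key = content_hash(source);
        if let Some(hit) = self.entries.get(&key) {
            return hit.clone();
        }
        self.misses += 1;
        let module = canonicalize_solo(source);
        self.entries.insert(key, module.clone());
        module
    }
}
```

Because the key ignores the path, moving or copying a file would still hit the cache in this sketch, which is the property described above.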
Some nice things that will come out of this change:
- CompilerProblem, and all places where we "need to crash" can just use one of those as a runtime error-generating AST node. This is more threading in the compiler than before, but it's way safer.
- #![no_std], which makes that impossible. By having all canonicalization info under the same bump allocator, we get caching for free, not to mention a good performance improvement!
- load_internal. This should make canonicalization more modular than before.

The plan for implementing this change comes down to me understanding the final outcome's shape well enough to implement it in small, peer-reviewable chunks. To that end, I'm working in this branch of my fork with a machete to roughly shape a copy of roc_can into the right shape. At the same time, I have a markdown document on my machine where I'm writing down what the "plain English" recipe for a solo module canonicalization, and then a combined one, will look like. Once I have those together, I'll start making PRs!
Feel free to ask any questions.
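To make the "runtime error-generating AST node" idea from the message above concrete, here is a small sketch. It assumes a made-up expression type and a made-up shape for CompilerProblem; the variants `UnresolvedIdent` and `Unsupported`, and the helper `report`, are purely illustrative and not taken from the compiler.

```rust
/// Hypothetical problem type; the real CompilerProblem may look quite different.
#[derive(Clone, Debug, PartialEq)]
pub enum CompilerProblem {
    UnresolvedIdent(String),
    Unsupported(&'static str),
}

/// Hypothetical parser output.
#[derive(Clone, Debug, PartialEq)]
pub enum ParsedExpr {
    Int(i64),
    Var(String),
    Add(Box<ParsedExpr>, Box<ParsedExpr>),
    /// Syntax this sketch does not know how to canonicalize.
    Other(&'static str),
}

/// Hypothetical canonical IR.
#[derive(Clone, Debug, PartialEq)]
pub enum CanExpr {
    Int(i64),
    Var(String),
    Add(Box<CanExpr>, Box<CanExpr>),
    /// Evaluating this node would report the problem at runtime.
    RuntimeError(CompilerProblem),
}

/// Record the problem and hand back a node to put in the tree instead of panicking.
fn report(problems: &mut Vec<CompilerProblem>, problem: CompilerProblem) -> CanExpr {
    problems.push(problem.clone());
    CanExpr::RuntimeError(problem)
}

/// Canonicalize without ever crashing: every failure becomes a RuntimeError node.
pub fn canonicalize(
    expr: &ParsedExpr,
    scope: &[&str],
    problems: &mut Vec<CompilerProblem>,
) -> CanExpr {
    match expr {
        ParsedExpr::Int(n) => CanExpr::Int(*n),
        ParsedExpr::Var(name) => {
            if scope.contains(&name.as_str()) {
                CanExpr::Var(name.clone())
            } else {
                report(problems, CompilerProblem::UnresolvedIdent(name.clone()))
            }
        }
        ParsedExpr::Add(left, right) => CanExpr::Add(
            Box::new(canonicalize(left, scope, problems)),
            Box::new(canonicalize(right, scope, problems)),
        ),
        ParsedExpr::Other(what) => report(problems, CompilerProblem::Unsupported(*what)),
    }
}
```

The `problems` vector is the extra "threading" cost: it has to be passed through every call, but the rest of the tree still gets canonicalized when one part fails.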
Awesome!!!
Think I'll take a pause on trying to resolve can panics for the moment then
This is really my main focus when working on Roc at the moment, but I am just one person. If anyone is concerned by the likely slowdown to implementing static dispatch in a big bundle with all of these other changes, I understand and partly share your concern. Since this doesn't immediately slot into the rest of the compiler plan, it might be a couple months for us to have static dispatch available. So if someone really wants it now, I'd be happy to talk about how we can parallelize this work
Otherwise, I'm happy to do this myself. It's really so enriching to get to put blood, sweat, and tears into what will be the best language someday
Ah man, you're taking all the fun jobs :smiling_face:
I volunteer to help find all the bugs you leave behind
Oh yeah, I forgot lmao. When I was writing the snake_case conversion, I wanted to break a test intentionally and write panic!("Luke, you're my only hope") or something.
I'll make sure to do that for this next set of PRs just for you
We have lots of language changes that need to be implemented, and it'll be faster to just implement those all in one go instead of incrementally. I'm talking:
- static dispatch
- removing tracking of lambda sets/captures (this will be done later)
- setting up for the rest of the new compiler pipeline
- and more...
These are not like the syntax changes, it seems like difficult bugs are a real possibility here. For debugging it can help a lot if you only need to consider a small set of changes. Are you sure this will be faster if you include potential debug time for issues that may pop up from the entire Roc ecosystem?
The plan is not to make one giant PR with all changes included, but to make the changes incrementally on a new canonicalization code
I just have no idea what it'll look like yet. What change do I make first?
I don't know how to make these changes in steps. One benefit to this approach is that I get to ignore a lot of features to begin with. I'm starting without tracking lambda sets, without module params, etc.
changes incrementally on a new canonicalization code
Can you explain this in more detail?
If someone would know how to do this without such a nuclear option that wouldn't take 6 months, it'd be great to hear
I'm planning on modelling my changes on the strategy Agus has been taking with the new monomorphization code, more or less
Start by outlining the end shape, and leaving a whole lot of "implement this and TODO" in places where it's obvious what needs to happen
That can help us start with PRs that other people can understand
To help with the robustness of this, I think a very important step will also be defining a testing strategy for all of these features
Makes sense!
I've not dug too deeply into that side of things, but the canonicalization testing today is mostly testing individual warnings here or there, and a lot of desugaring testing.
We'll need to figure that out as well. My hope is that we can do more unit testing on "just canonicalize this alias", not "create a whole module with aliases and check the problems that arise"
That should make it more readable, and more modular
Probably once I figure out the overall plan, I'll try to write it up in more detail and jump on a call with someone. That will give me an opportunity to make sure that there isn't a big hole in it somewhere.
I think the main steps before I can start drafting an outline for PR are:
- var and shadowing works with the new scope plan
- roc_can_solo and roc_can_combine
If someone would know how to do this without such a nuclear option that wouldn't take 6 months, it'd be great to hear
I've been thinking about this.
We will be in a position with two Can stages, the current (legacy) one, and the (new) one being developed. Both of these take the same input, Parser AST... and eventually produce the same output, Mono IR?? right?
Can we wire up a test harness that can feed the same input in, and confirm it's getting the same output?
Starting with the most basic of expressions, but over time as the new Can implementation matures we can add tests and eventually be in a position where we have feature parity.
Or maybe there is a way to use the fuzzer, and incrementally add supported AST nodes
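A parity harness along the lines suggested above could be sketched as follows. This is only an illustration under stated assumptions: both pipelines are modelled as functions from source text to a printed output string, and `legacy_can`, `new_can`, `Parity`, `check_parity`, and `failing_inputs` are all hypothetical names. The new pipeline returns `None` for input it does not support yet, so the corpus can grow as features are added.

```rust
/// Result of running both pipelines on one input.
#[derive(Debug, PartialEq)]
pub enum Parity {
    Same,
    Different { legacy: String, new: String },
    /// The new implementation does not handle this input yet.
    NotYetSupported,
}

/// Hypothetical stand-in for the legacy pipeline: normalizes whitespace.
pub fn legacy_can(source: &str) -> String {
    source.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Hypothetical stand-in for the new pipeline. It refuses anything containing
/// "implements", standing in for a feature that is deliberately not supported yet.
pub fn new_can(source: &str) -> Option<String> {
    if source.contains("implements") {
        None
    } else {
        Some(source.split_whitespace().collect::<Vec<_>>().join(" "))
    }
}

/// Feed the same input to both pipelines and compare the outputs.
pub fn check_parity<L, N>(input: &str, legacy: L, new: N) -> Parity
where
    L: Fn(&str) -> String,
    N: Fn(&str) -> Option<String>,
{
    let expected = legacy(input);
    match new(input) {
        None => Parity::NotYetSupported,
        Some(actual) if actual == expected => Parity::Same,
        Some(actual) => Parity::Different {
            legacy: expected,
            new: actual,
        },
    }
}

/// Run a whole corpus and return the inputs on which the two pipelines disagree.
pub fn failing_inputs<'a, L, N>(corpus: &[&'a str], legacy: L, new: N) -> Vec<&'a str>
where
    L: Fn(&str) -> String,
    N: Fn(&str) -> Option<String>,
{
    corpus
        .iter()
        .copied()
        .filter(|input| {
            matches!(
                check_parity(input, &legacy, &new),
                Parity::Different { .. }
            )
        })
        .collect()
}
```

Treating "not yet supported" as its own outcome rather than a failure is the design choice that lets the harness start with the most basic expressions and tighten over time.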
The same output won't come out because of a number of changes. Static dispatch for instance
This won't catch all the new features... but maybe it helps get us something we can use sooner
And I'm avoiding supporting abilities
But yes
For those things in common, it should output the same Mono IR
Well...
Lambda sets are getting built differently as well
In that they're supposed to be built later in the compiler
Even without Abilities and Lambda sets... we could still cover a lot of the AST though
Probably
Worth a try to make sure we're on the right track
If we had the new Can module (even just stubbed out) @Joshua Warner might be able to help with the test harness
Sure!
I think I'd be able to get something in the next few weeks as a stub
Sam Mohr said:
And I'm avoiding supporting abilities
Is there a way we could rip this out of current Can, and make it another pass to the side, or move it to the end or something? Basically... could we do something now so we can keep the current impl and then it could be compatible with the new Can?
And I guess lambda sets are in the same boat
In my professional experience, it can be very very tempting to do a rewrite _and_ make significant functionality changes at the same time, but it's almost always a terrible idea
Yeah, I'm trying to find ways we can keep everything online and enable an incremental approach.
That's why I think the first step is to try to understand the end state, and then write down a plan that outlines what things should look like at the end state, and then break that into incremental changes as much as is feasible
It's also ok to start and change course along the way. More of a discovery or R&D type approach than an up front engineering effort
Yes, I'd call this the R&D stage for sure
An idea: between roc_can_solo and roc_can_combine, the latter is basically what we do today, but separated. We can maybe start by making a very small roc_can_solo that only does a little bit of work, and then passes everything else to the old roc_can
And eventually we move as much as possible to roc_can_solo until it all works
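One way to picture that incremental migration is a dispatcher that tries the new path first and falls back to the old one. The sketch below is hypothetical throughout: `Node`, `Handled`, `solo_can`, and `legacy_can` are invented for illustration and do not correspond to real compiler types.

```rust
/// Hypothetical input node kinds.
#[derive(Clone, Debug, PartialEq)]
pub enum Node {
    Int(i64),
    Str(String),
    Lambda(String),
}

/// Which implementation produced the output.
#[derive(Debug, PartialEq)]
pub enum Handled {
    BySolo(String),
    ByLegacy(String),
}

/// Hypothetical new path: handles only what has been migrated so far.
fn solo_can(node: &Node) -> Option<String> {
    match node {
        Node::Int(n) => Some(format!("int:{}", n)),
        // Everything else has not been moved over yet.
        _ => None,
    }
}

/// Hypothetical legacy path: handles every node kind.
fn legacy_can(node: &Node) -> String {
    format!("legacy:{:?}", node)
}

/// Try the new implementation first and pass everything else to the old one.
pub fn canonicalize(node: &Node) -> Handled {
    match solo_can(node) {
        Some(output) => Handled::BySolo(output),
        None => Handled::ByLegacy(legacy_can(node)),
    }
}
```

Migrating another node kind then means adding one arm to `solo_can`; once it never returns `None`, the fallback can be deleted.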
So step two would be to figure out the caching mechanism and roughly set that up
And step one is to do the prep work of making roc_can ready for this work
Meaning moving to use arenas as much as possible, changing names of things, using CompilerProblems where possible
What if we did something like:
Yeah, so for anything low technical maturity/R&D I would highly recommend taking a more agile/incremental approach -- keeping everything online and running ops normal.
I think the biggest risk here is the unknown-unknowns (sorry for the cliché).
Yeah, Josh's suggestion is basically what I was expecting. I can try it
The subtle difference is that I think that roc_can_solo and roc_can_combine will use the same IR
There's no reason that can't be the case eventually
Well, one option is for the new desugared IR to be a roc_can_solo::Expr that looks just like desugared IR to start with, but over time we change it bit by bit, and once roc_can_solo::Expr and roc_can_combine::Expr are the same thing, we can use roc_can_solo::Expr
I would try to get there incrementally tho
so now that the Frontend Masters stuff is wrapped up, I have a backlog of things I should be doing...but what I'm fired up to do instead is to write some Zig canonicalization code :grinning_face_with_smiling_eyes:
what's the current status of that? I have no idea how far along things are!
I'm assuming @Sam Mohr might know?
I have some local changes to implement sexprs for the can ir, planning on submitting a pr “soon”
I'd love to see Richard working on it! I don't have that much work done, but I can put it in a branch and see what comes out.
If it's not already obvious, I've been burned out on Roc development for like a month and I don't know how to fix it. I was hoping taking time to play games and not think about it would work, but nothing seems to be working... Life outside has been tough. I'll give an update soon when I have the energy to come back.
So yes, thank you Richard for picking up my slack!
Sam Mohr said:
If it's not already obvious, I've been burned out on Roc development for like a month and I don't know how to fix it. I was hoping taking time to play games and not think about it would work, but nothing seems to be working... Life outside has been tough.
This is normal and something that might just take time or the right break/inspiration. My time investment in Roc varies significantly month by month. Oftentimes, it just takes a while to revive. Generally, certain things re-energize and inspire (like community events and longer vacations, e.g. holidays).
Take the time you need and don't worry about roc. It will keep moving and it will still be here when you get back.
We will miss your presence in the chat, hope to talk to you soon buddy
Thanks Brendan
Yeah, maybe when work chills out
yeah super normal feeling... please don't feel bad about it! You're welcome whenever you're feeling it, just drop in and we'll catch you up on whatever's been happening :heart:
and thanks for all your awesome contributions so far!
also @Sam Mohr I'm happy to start from a blank slate, so no need to push a WIP branch unless you really want to :big_smile:
Life outside has been tough.
It pains me to hear that, I hope things get better :hugging:
Adding sexpr formatting to the can IR: https://github.com/roc-lang/roc/pull/7737
Note that this is largely untested & doesn't get hit (yet)
I’m constantly impressed by how mature and emotionally healthy the Roc community is :heart:
Yeah, it's really nice to see
There's a selection bias in this group for people that are willing to give their free time for no pay to improve the state of programming
So how surprised can we be?
I wonder if anyone has made any progress here?
I'd really love to do enough to get a Hello World program to be able to run (in an interpreter)
I know this PR is moving, but not sure how much else has moved: https://github.com/roc-lang/roc/pull/7772
I'd love to sit with someone as they review this and learn how to even make it through it. It's just too much code in an area I am not an expert in for me to review with any sort of authority at the moment
I've implemented a few simple type checkers, and unification once before (but not to completion). But this is a LOT
Haha, this is an area of the compiler I tend to avoid. I don't feel like it is that complicated, but I have never felt like grokking all the type checking pieces.
yeah I took a bunch of notes about that PR on the plane (no wifi, couldn't comment) - overall looks good, I just want to leave some comments
but yeah I'm planning to merge it this weekend!
I also have some cache serialization stuff that's close but needs some more work
I've been following along with all the commits. But I also don't feel qualified to really comment on it.
Yeah, sorry the PR is so huge. I know it makes it really hard to digest. I debated breaking it up, just didn’t have time. I’ll make sure to chunk things up better in the future for ease of review.
Planning on looking at Richard’s comments in detail later today, but probably will merge as-is then open a follow up PR addressing comments this week.
And I’m happy to find a time with whoever is interested (@Anthony Bullard ?) and talk through it to share the knowledge!
heck yeah! thanks Jared!
I'd love to join that discussion too
I think if no one else is working on it, I'd like to try to set up desugaring. Should I start a new topic on that?
I'm going to assume so :-)
Last updated: Jul 06 2025 at 12:14 UTC