7897: Handle parsing ambiguous nodes
Starting work on updating if-else:
predictNodeIndex
Just a note on the above, I'll be touching huge swaths of check/parse/AST.zig, a good bit of check/parse/NodeStore.zig, and a bit of check/parse/Parser.zig. If you could check in with me about changes there that will be coming in within the next week, it would be helpful to avoid a very painful merge conflict.
I do aim to merge the most impactful stuff to AST and NodeStore as soon as I have tests passing
I'm getting rid of other uses of predictNodeIndex
@Anthony Bullard would you mind looking at https://github.com/roc-lang/roc/pull/7898? Does that impact your parser work?
I'm wanting to finish my crusade to add a single unit test per Can NodeStore add/get variant to verify they roundtrip correctly.
I'm working on SingleQuote
tokenization and possibly parsing @Anthony Bullard it will likely interfere with your changes, although the conflicts shouldn't be very big
Also, I plan to make tokenizer.zig use `Region`s in relevant places in a separate PR
@Kiryl Dziamura I don't think there is work to do here...Maybe I'm wrong. But the main thing that needs to be done is adding a node for it in the Parser and actually parsing it.
The tokenization is happening here: https://github.com/roc-lang/roc/blob/259b290c2a39e45f683d329d16ba2963cec13c68/src/check/parse/tokenize.zig#L980
and it lacks a return after this line:
https://github.com/roc-lang/roc/blob/259b290c2a39e45f683d329d16ba2963cec13c68/src/check/parse/tokenize.zig#L999
Oh, I didn't even see that! Great catch!
Planning on starting type checking on match statements once Luke's draft PR lands
I may get the Can for match (at least with the currently supported syntax) done by this time tomorrow. Somewhat tempted to take a detour and clean up the snapshot mess I feel like I've contributed to
started working on tokenize.zig cleanup related to offsets (the goal is to use regions if they don't introduce any penalty):
https://github.com/roc-lang/roc/pull/7917
General question about the above PR: why are we storing lengths in the tokenizer at all?
Retokenizing should be essentially free.
So why not only store the offset and leave regions to later parts of the stack? Even later parts of the stack likely could just reference two token indices and then request the start and end from the two tokens.
Not saying that is the best setup, but curious about our strategy here
I guess `interned` is the reason? maybe it's redundant tho
Also, whitespace is skipped; I think without end offsets we'd have to collect all the gaps as tokens
Jared Ramirez said:
Planning on starting type checking on match statements once Luke's draft PR lands
Working on canonicalization + type checking for match
@Jared Ramirez are you building on top of my branch? Does it look ok to merge as is?
Yeah, it looks good to me!
I've been mostly reading about exhaustiveness checking so far, haven't written any code yet. But was planning on starting with error messages until your branch is merged!
#7919 -- I've started adding a roc package. I'm thinking about and exploring how we will do multi file snapshots. Also wanting more realistic examples to test our parser/can implementation against to understand what areas need work still.
My current line of thinking is we make a directory that represents the root, with the .roc files in it exactly as they would be.
Until we get multiple module things up and running, I might have the snapshot tool just pick up these .roc files and treat them as independent file-type snapshots and generate a .md for each...
I'm going to start investigating support for Nominal Tag Unions.
PR https://github.com/roc-lang/roc/pull/7922
#7923 - draft of making effectful functions work (and not use a type variable anymore!) along with some type inference fixes and other miscellaneous improvements
I've started refactoring CIR a little... https://github.com/roc-lang/roc/pull/7925. Basically pulling the obvious parts into separate files (Expressions, Statements, Patterns, TypeAnnotations, etc), adding doc comments and examples, etc.
I've been researching how to implement new features like nominal types, and figure I may as well clean up and document everything as I learn more. It's easier for me to understand how things are wired together when it's organised, and hopefully helps the next guy that comes along too.
I can keep this in Draft and rebase it until @Richard Feldman and @Jared Ramirez land those two PRs sometime tomorrow (my time) I assume, I'm not tracking anyone else working in CIR right now.
I've rebased this CIR refactor on main. I'll continue poking at it for a few hours. Please avoid making PRs that touch CIR unless you're basing off this branch.
I'm taking a look at type checking binops. Are all of the following (minus `and`, `or`, `pipe_forward`, and `null_coalesce`) going to use static dispatch?
/// Binary operators available in Roc.
pub const Op = enum {
    add, // +
    sub, // -
    mul, // *
    div, // /
    rem, // %
    lt, // <
    gt, // >
    le, // <=
    ge, // >=
    eq, // ==
    ne, // !=
    pow, // ^
    div_trunc, // //
    @"and", // and
    @"or", // or
    pipe_forward, // |>
    null_coalesce, // ?
};
`pipe_forward` should be `->` instead of `|>` (and prob could use a different name)
it doesn't use static dispatch
it's just sugar for a normal function call
(maybe `arrow_call` might be a better name?)
e.g. `arg1->my_fn(arg2, arg3)` does the same thing as `my_fn(arg1, arg2, arg3)`
`null_coalesce` also doesn't use static dispatch, and should be renamed to something like `return_err`
it works like this:
answer = my_result?
...is equivalent to:
answer = match my_result {
    Ok(val) => val
    Err(err) => return Err(err)
}
or maybe `postfix_question_mark` if we want to name it based on how it looks :smile:
btw I definitely want to make it so that if we get type mismatches with these, we report them using the binop names (since that's what was in the source code) rather than the functions they effectively desugar to!
like if I write `+` or `?` in my code, I should see `+` or `?` in the error message (and not just in the source code snippet, but in the words too!)
okay cool, since no desugaring has happened by type checking, nice error messages should be straightforward here
then `?` shouldn't actually be a binop, since it's really just a suffix? Maybe I should remove it?
true
Maybe this `null_coalesce` was ported from the older `??` or optional record fields?
ohh
yeah `??` is totally a binop
that's like `Result.withDefault` right?
kinda - it's also not static dispatch, but rather a match sugar:
answer = my_result ?? 5
...is equivalent to:
answer = match my_result {
    Ok(val) => val
    Err(_) => 5
}
the distinction matters if you want to use it with things that affect control flow, like `?? return 5` or `?? crash "blah"`
whereas if it just desugared to an actual `.with_default` call, those use cases wouldn't work
I'd like to re-attack Nominal Types.
Before I can do that we need to support parsing multiple `UpperIdent` separated by `Comma` tokens, and also a package prefix, e.g. `json.Core.Utf8.Encoder` might be an example of the `Encoder` type declaration (presumably nominal but could be an alias) in the Core/Utf8.roc module inside the json package.
I've pushed my WIP to https://github.com/roc-lang/roc/pull/7931
I think it's looking ok, but I'm still working through the diffs in the snapshots and picking up minor bugs that I'm fixing. I've run out of time today to finish this... I should be able to pick it up again Sat evening.
@Anthony Bullard there are changes in here in the Parser and AST/NodeStore to support parsing of qualified types and values in packages/module chains.
Hey! I continue reading the codebase. Now I have some questions regarding parsing. can you please take a look https://github.com/roc-lang/roc/pull/7936/files ?
I was looking at switching our build script back to a proper `check` step. On top of that, making `check` cover all executables and tests, but then having smaller steps for faster checking. The most relevant is probably `check-test`, which would likely be a solid default for most work. One thing that still annoys me with how zig/zls currently do checking is that if you have multiple check targets that include the same source, you get a ton of repeated error messages. So for every executable, we get another copy of each error message. Theoretically the fix for this is to use more modules. That way the executables just import our "main" module, and the "main" module runs check once on all the files it contains. I haven't tried factoring this way yet, but I want to make some of this nicer overall (though it would require a refactoring of imports).
Have a draft PR for at least some starter work as I think of better factoring: https://github.com/roc-lang/roc/pull/7942
Not actually sure this is the right way to go, just tinkering and trying to match what zls suggests.
I feel like adding support for qualified Idents, and Nominal Types (including recursive) has been a mighty big yak.
I've gotten to the home stretch a few times now, and then noticed a bug which has taken me down another rabbit hole.
I got type instantiation landed, but it needs a refactor. gonna do that next.
Gonna work on checking nominal types next.
Thinking a next mini-milestone for me might be: after the nominal type checking work, I might create a pared-down version of the bool.roc built-in file. Then canonicalize and type check that, and pass those types into the actual user roc file to use the built-in Bool nominal type in type checking things like if conditions.
that sounds sweet!
if you wanted to get advanced, you could do the same with List.roc and implement it as a Cons list (just for now, of course)
:alert: PSA! :alert: as of https://github.com/roc-lang/roc/pull/7949 we now have a new `EXPECTED` section of snapshots, which lists the `PROBLEMS` we expect to see in the rest of the snapshot (just the ALL CAPS name of the problem and the source region, not the entire error message)
the goal here is just that if we accidentally cause regressions in our snapshots, this will let us know! (If we actually make fixes and reduce the reported `PROBLEMS`, then we can update the snapshot with a revised `EXPECTED` section.) And since it doesn't verify the entire contents of the snapshot, just the errors reported, it shouldn't give us false positives for things like refactors that change ident numbers etc.
Wow, that PR hits more files than I expected...we have way more snapshots than I realized.
in the future we could maybe also do something with like verifying expected types of things
yeah we've accumulated a lot of them already :smile:
many of them have problems though
(as in, unintentional ones, e.g. because we haven't implemented things yet, or in some cases because they're using old syntax)
I've been talking with @Joshua Warner about the snapshots. I've started reviewing all of the old-syntax ones and I'm going through and deleting any that I think are not helpful, and updating the ones I think are useful.
I still plan on implementing Can for `where`, but just taking a quick detour to clean up the snapshots a little. I got a bit carried away converting them all using a script, and didn't take the time to properly review them.
I appreciate that! now that we have `EXPECTED` in there, would also be great to review those
right now I set it so that all the `EXPECTED` fields just line up with the current `PROBLEMS`, but we should review whether they're actually correct
e.g. I've already seen quite a few where the snapshot seems to be trying to verify one thing, but actually it gets derailed by a syntax error (which is reported in `PROBLEMS`, at least!) which causes it to not actually be type-checked (or whatever thing it's actually trying to test)
I'm gonna start working on canonicalizing and type-checking other modules (e.g. imports, exposing, etc.)
cc @Jared Ramirez since I saw you had a commit involving `canonicalizeHeaderExposes` yesterday! :smile:
Tomorrow I'll work on can/type checking for type arguments in nominal tags
If we're happy with the `where` clause for now, I'm thinking of digging into the issues from our realistic example src/snapshots/plume_package/Color.md. It looks like we need to investigate;
I’m currently working on nominal tag payloads!
I'd like to change canonicalize/NodeStore.addExpr to return an error...
// FROM
pub fn addExpr(store: *NodeStore, expr: CIR.Expr) CIR.Expr.Idx
// TO
pub fn addExpr(store: *NodeStore, expr: CIR.Expr) std.mem.Allocator.Error!CIR.Expr.Idx
But I'd like to merge all these PRs that touch Can first to avoid a conflict
Why? I thought OOMs were deemed unrecoverable errors. Mostly curious cause if we want this, we probably should be consistent across the compiler.
Richard has said he would like to use this pattern more.
I'm not sure where. But @Jared Ramirez and I have been slowly converting to use Zig errors like this.
I guess it's more explicit at the callsite that the function is allocating
yeah, that plus also I think there may be an increasing number of cases where we're only using allocation for `Problems`
and in those cases I think it would be interesting to have `Problems` record an "OOM Problem" and stop recording further problems
which would mean that pushing to `Problems` could stop having allocation errors, and then it would be more useful to see which things are allocating and which ones aren't
separately, I also like the idea of actually knowing what will happen in resource-constrained environments, e.g. we could offer a CLI flag for constraining how much memory the interpreter can use when evaluating compile-time constants, we could have a better UX in wasm (e.g. in the playground) if we want to limit total memory usage in the browser and be able to more gracefully handle OOMs etc.
I thought to myself ... it should be nice and easy to implement Can for these guys;
dbg x
crash "msg"
expect 1 == 1
return x
Turns into another massive Yak :melt:
Started looking at de-structuring record sub-patterns, going on another tangent with record update syntax, `as` patterns, and pattern alternatives. :sweat_smile:
#7979 -- WIP implement module caching.
Currently we have a unit test that round-trips a whole module, and we added a verification test for every snapshot that serialises and deserialises from memory and checks that the SExpr matches.
Next steps are to clean up some of the serialisation logic (types.Store looks particularly hairy), and wire up hashes and filesystem parts into `roc check`.
Lmk how I can help with the types store! I'm also working on some changes in there related to aliases and nominal types, should hopefully MR tomorrow or Thursday
I also would like to reduce the number of backing arrays in types.Store too, which may be relevant
I'll see if I can fix the valgrind issue and maybe we merge as is, and fixup the types.Store backing/serialization in a follow up
It's taking a little while, I can't run valgrind on my mac so doing what I can and then throwing at CI to test.
Turns out the cleanup I had in mind is just the remedy I think for our valgrind issues :smiley:
Today I have been down the alignment rabbit hole -- I do not yet know if I will emerge victorious
I'm poking at the module caching again, there are some obvious improvements we should make. I just hacked something together to get it working yesterday, but would like to clean it up and simplify it a little today.
I've started working on unit tests to round-trip the AST https://github.com/roc-lang/roc/pull/8002
I've experimented with using RNG to provide integers (deterministically) which I think significantly reduces boilerplate.
// example using random helpers to reduce visual noise in the tests
try headers.append(AST.Header{
    .app = .{
        .packages = rand_idx(AST.Collection.Idx),
        .platform_idx = rand_idx(AST.RecordField.Idx),
        .provides = rand_idx(AST.Collection.Idx),
        .region = rand_region(),
    },
});
I'm reasonably confident this will help with the Parser NodeStore refactor, as we're not touching the internal Node representation, just using the AST representation.
Up next on my immediate todo list is:
- `Span` instead of `Range`, and flatten backing arrays (eg `tag_args`, `tuple_elems`, etc) into a single array
- `canonicalizeTypeAnno` and `canonicalizeTypeAnnoToTypeVar`: I suspect if we generate type vars correctly in `canonicalizeTypeAnno` we don't need this extra var conversion func, which would remove a step and I think dedup logic
- `tag` patterns to contain qualifiers (to deal with pattern matching on nominal tag unions)

^ FYI taking a detour to refactor Can to ensure that CIR nodes <-> type vars always match 1 to 1
I'm probably going on a crusade tomorrow to eradicate `exitOnOom`, and anything else that is OS specific deeper in our compiler than the cli.
This is in preparation for WASM -- so we can avoid a more painful extraction later.
Hey everyone. From my side, I started chipping away at the parse fuzz crashes that show up in https://roc-lang.github.io/roc-compiler-fuzz. The codebase is a treat to work with :100:
I am also planning on configuring a machine to run the fuzzing myself.
Luke Boswell said:
I'm probably going on a crusade tomorrow to eradicate `exitOnOom`, and anything else that is OS specific deeper in our compiler than the cli. This is in preparation for WASM -- so we can avoid a more painful extraction later.
Exiting is OS specific?
JRI98 said:
I am also planning on configuring a machine to run the fuzzing myself.
Awesome. Just make sure you are actually getting fuzzer coverage info when you do so. I know that afl++ and zig is a bit finicky.
@Brendan Hansknecht yeah currently our impl prints to stderr
std.io.getStdErr().writer().print(format, args) catch unreachable;
if (tracy.enable) {
    tracy.waitForShutdown() catch unreachable;
}
std.process.exit(1);
Ah... Duh
Luke Boswell said:
I'm probably going on a crusade tomorrow to eradicate `exitOnOom`, and anything else that is OS specific deeper in our compiler than the cli. This is in preparation for WASM -- so we can avoid a more painful extraction later.
exitOnOom is invasive... this is going to be a mammoth PR that basically touches everything.
I tried to think a little bit about gradual migration... I'm currently going with the rip-the-bandaid-off approach.
In a way it's easier this way, because I can quickly scan through code and see what needs to be updated. Anything that allocates should return `std.mem.Allocator.Error!...` etc
Thanks, I need moral support... this is a very mechanical change. Thank the Zig gods for making a fast compiler :pray:
Might be a good job for claude or similar
If you aren't already done that is.
I think I'm probably way faster than Claude at this. It's pretty mechanical.
Also -- I wouldn't trust an LLM with this kind of refactor, how would you ever review the changes with any confidence... there are just waaay too many.
hmm. I haven't tested claude on this type of refactor, but it tends to do a great job with mechanical simple error driven things. Especially once you set up a base example.
how would you ever review the changes with any confidence
The same way I would review your code once you make the PR?
going to work on parsing utf8 escape seq in strings and single quotes (`\u(...)`)
nice! I think we ended up using curly braces instead of parens for those in the old compiler (we should keep whatever syntax it uses)
I can't help myself it seems... still tweaking CSS for the playground.
This is why people shouldn't let me touch any CSS -- I have the same problems with PPT slides in case anyone was wondering.
I used to love hacking css so much that my over-engineered nonsense would end up on codepen's front page. but then I looked over the fence and found a healthy lifestyle without alcohol. now I can’t look at any css snippet without feeling pain. I even started thinking html tables were enough. just give me a11y. I don’t care about all your fancy dynamic layouts, animations, or endlessly improvable design systems. argh.
Working on refactoring types store and type snapshots store to do less allocations
Working on consolidating canonicalizeTypeAnno
and canonicalizeTypeAnnoToTypeVar
in Can
Moving single quote contents validation from can to tokenize: https://github.com/roc-lang/roc/pull/8064/files
@Richard Feldman you mentioned you wanted to parse actual values during tokenization, could you please clarify how you thought it would be implemented? create a new collection in the module and put parsed numbers there, similar to the interned-strings logic?
@Kiryl Dziamura yeah like how we do it in canonicalization currently
basically put it in `extra`, like we do with interned strings
(as in, add new entries to the `extra` union for in-memory numbers like `u8`, `i8`, etc.)
`extra` takes 64 bits now, but we would need 128 for numbers?
for bigger ones we can put them on the heap, like we do with interned strings
but almost all number literals should fit in 64 bits
yeah, I don't expect i128 to be written as a literal (I hope it won't be)
we have `dec_small` for fitting float literals in less space
yeah I could actually see an argument for just storing 64-bit and smaller inline, and then if it's bigger than 64 bits, just store the full string
(which we'll need for arbitrary-sized int literals)
actually, do we really need to parse them straight away exactly for that reason? we don't know if the number is custom ahead of time. when we get there, we'd need to have normalized num literals rather than parsed numbers
I mean, we likely need to parse them during tokenization as a normalized num literal (`NaN | +-Infinity | { value: +-digits, exponent: +-digits }`)
we don't have it right now, but I think it means we need to convert the digits to an actual number at canonicalization anyway, right?
the plan is to never support Infinity/-Infinity/NaN literals; they are special constants exposed by the relevant modules
fair point about not knowing if the number is custom ahead of time, but 99.99% of the time they will be not custom and will fit in 64 bits, so I think we should optimize for that case and convert back if it turns out we were wrong :smile:
ok, so we want to get as much as we can from the module file contents while it's in cache, because later we want to touch the heap anyway, and it's better if this heap is optimized for the future cache rather than having random access to the module file contents via regions?
I'm asking because I'm learning along the way. I'm not experienced in system level development so my questions may be... simple :smile:
I'm looking at the build system with the goal to re-use modules for our different targets and also to help our generated docs be useful.
Refactoring to use modules and then setting up the tests to run on those modules has uncovered a heap of lurking bugs :bug:
I've been working on the WASM playground again :smiley:
I'm working on Eval for Lambdas
I have the simplest of expressions working so far, `(|x| x + 1)(5)` :smiley: -- but working on padding that out into something more complete
Taking a slight detour implementing unary_minus (`-x`) ... this Yak was just asking for a shave
https://github.com/roc-lang/roc/pull/8085
I'm still working on interpreter Eval... I'm learning as I go and also getting distracted building debug tooling. :smiley:
You can never have too much debug tooling :p
Goal: more lines of code in debug tools than in the compiler
That honestly would probably actually be a good thing if the debug tooling was good
#8104 DRAFT PR implements closure captures for the interpreter.
Tests are passing but I think there's some hacks in there we need to clean up.
I'll definitely go through it myself again a few times. But if anyone else has time I would appreciate any thoughts.
I've made some progress on the interpreter (I think) ... haven't got all my tests passing yet.
I also need to do a rebase on main now that Richard's mammoth PR removing CIR has landed.
I'd love to have something working to share in the online meetup :smiley:
Ok, I've merged main into https://github.com/roc-lang/roc/pull/8104 so CI should pass now
Last updated: Jul 26 2025 at 12:14 UTC