Joshua Warner said:
Sam Mohr the reason I'd like not to just directly have a header/statements is I want this to be able to directly parse individual expressions both for testing and for repl evaluation
Okay, then what's the best way to:
@Sam Mohr Why do you need to just parse the header? My intent would be to make parsing fast enough that parsing the whole file is plenty fast.
We'd want to parse just the header of the root main.roc of each app/package/platform to get their dep packages, so we can discover the file trees of each package
But if it's fast enough to just parse the whole thing, then no need
Also, if we persist said list of imported packages in the CanIR, then we can just pull it from the cache and keep the CanIR around until later
Cool. We can explore that optimization in the future. That does add a non-trivial amount of complexity though, so I'd like to do the simple thing for now
Yeah, totally agreed
Ok, I think this test should get you going:
test "canonicalize" {
const source =
\\app [name] { pf: platform "main.roc" }
\\
\\name = "Luke"
;
var parse_ir = parse.parse(std.testing.allocator, source);
var can_ir = IR.init(std.testing.allocator);
parse_ir.store.emptyScratch();
canonicalize(&can_ir, &parse_ir, std.testing.allocator);
}
The primary problem with the test you posted is just that module headers are not currently supported... and the particular way error recovery happens right now is looping forever (whoops!), which made that harder than expected to figure out.
(whoops left some debug stuff in; now removed)
Okay, this is a good start!
If I wanted to start looking at intern'ing for symbols in the tokenizer - is there an existing structure you're expecting that data to go into?
The interners in the ModuleEnv
https://github.com/roc-lang/roc/blob/26f9416929aa0cd52ca732fc533b4a94a690de04/src/base/ModuleEnv.zig#L22
Ideally, we'd put text in the "right" bucket:
- idents in Ident.Store
- tags (the List of List a) in TagName.Store
- field names in FieldName.Store
- string literals in StringLiteral.Store
But if that turns out to be difficult/not possible to do consistently during tokenization, then maybe we reduce down to just Ident.Store
I can split lower and upper idents, that's about it.
I figured
At the layer of hashing, do they need to be in separate buckets?
Not sure what you mean by the layer of hashing
I think we should just make 2 buckets, upper_idents and lower_idents
I'm thinking we can do something like:
That works!
We'd want those IDs to be pretty much as granular as possible, within reason
Can you expand on that? Do you mean for the purpose of types? Or is it beneficial to have the id space be compact?
For the purpose of types
Which should mean: using a single bucket makes the most sense
I'm not actually sure how much benefit there is for having these distinct types compared to just LowerIdent.Idx and UpperIdent.Idx
The TagName and FieldName stuff is an artifact from the specialize_types prototype from @Agus Zubiaga
I think we can just start with a single SmallStringInterner that we have UpperIdent.Idx and LowerIdent.Idx both point into
In Ident.Store:
pub fn insert(self: *Store, ident: Ident, region: Region, problems: *std.ArrayList(Problem)) Idx {
... I'm a little worried about the overhead of alloc'ing an ArrayList of Problem for every ident.
I'm also somewhat concerned about doing the interning in the tokenizer based on this Ident type, since these:
/// Attributes of the identifier such as if it is effectful, ignored, or reassignable.
attributes: Attributes,
... are things we don't really know at this point in compiling and I'm hesitant to create a lie; I think that'll lead to issues down the road.
Furthermore:
/// Problems with the identifier
/// e.g. if it has two underscores in a row
/// or if it starts with a lowercase then it shouldn't be `lowerCamelCase`, it must be `snake_case`
problems: Problems,
These are things that IMO should be reported as diagnostics in the tokenizer, and none of the rest of the compiler should really need to track (or care about at all).
I'm thinking instead I'd like to do interning on a lower-level type that _only_ represents the text, giving a simple intern'd string id (u32) and nothing else.
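For illustration, a minimal sketch of what I mean (all names made up, nothing from the repo):
const std = @import("std");

/// An interned string is identified by a bare u32 - no attributes, no problems.
pub const StrIdx = enum(u32) { _ };

/// Sketch of a text-only interner: insert dedups, get resolves the id back to text.
pub const Interner = struct {
    gpa: std.mem.Allocator,
    strings: std.ArrayList([]const u8),
    map: std.StringHashMap(StrIdx),

    pub fn init(gpa: std.mem.Allocator) Interner {
        return .{
            .gpa = gpa,
            .strings = std.ArrayList([]const u8).init(gpa),
            .map = std.StringHashMap(StrIdx).init(gpa),
        };
    }

    pub fn deinit(self: *Interner) void {
        for (self.strings.items) |s| self.gpa.free(s);
        self.strings.deinit();
        self.map.deinit();
    }

    /// Returns the existing id if this text was seen before.
    pub fn insert(self: *Interner, text: []const u8) !StrIdx {
        if (self.map.get(text)) |idx| return idx;
        const owned = try self.gpa.dupe(u8, text);
        const idx: StrIdx = @enumFromInt(self.strings.items.len);
        try self.strings.append(owned);
        try self.map.put(owned, idx);
        return idx;
    }

    pub fn get(self: *const Interner, idx: StrIdx) []const u8 {
        return self.strings.items[@intFromEnum(idx)];
    }
};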
Thoughts @Sam Mohr / @Luke Boswell ?
I assumed the point of passing in problems like that would be to just have one for the whole pass that gets passed around, and each call may or may not push problems onto it
(as opposed to allocating a new one each time)
Possibly
But given what's currently in Problems, I don't see why the design would warrant that
There is no need to dynamically allocate these. And the ones that are there right now probably ought to be just tokenizer diagnostics.
So I guess, the question is what might be in there in the future that would justify this?
I expect there was some discussion about this that I just was not part of and want to make sure I'm not missing anything
I don't think we've really discussed it in any detail. Sam has just been developing these data types and structures as a best effort based on what we know so far, so we have something to start with. We expect them to evolve a lot as we go.
Richard Feldman said:
I assumed the point of passing in problems like that would be to just have one for the whole pass that gets passed around, and each call may or may not push problems onto it
This
The reason I put the problems as a mutable reference was so that you knew as a caller that your problems on the Ident would be reported on interning
It prevents someone from forgetting to intern the problems
We could alternatively pass the problem reference to the Ident.Store, but that gets us closer to pointer jungle
So this seems less tangled from bird's eye view
Minor note, this may be the wanted type for storing problems, as opposed to slices and counts or growing arraylists: https://ziglang.org/documentation/master/std/#std.BoundedArray
Do you think we're just going with UpperString and LowerString for interners, or are we having all the different variants?
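Usage sketch of std.BoundedArray, with the Problem type stubbed out here:
const std = @import("std");

// Fixed capacity, lives on the stack, and append never heap-allocates.
const Problem = struct { tag: u8 };

test "bounded problems" {
    var problems: std.BoundedArray(Problem, 8) = .{};
    try problems.append(.{ .tag = 1 }); // returns error.Overflow past 8 items
    try std.testing.expectEqual(@as(usize, 1), problems.slice().len);
}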
What's the use case for some later compiler stage knowing that an identifier has a subsequent_underscores problem? That should have already been reported in the tokenizer.
So we can report that to the user as a warning?
I'm not sure why we store a list of problems in the Ident though
What's the use case for some later compiler stage knowing that an identifier has a subsequent_underscores problem?
I think the goal is to collect all problems and then dispatch in one place to decide exactly how everything is reported to the end users.
I thought all the Problems would be stored in the ModuleEnv
I'm guessing this field can be removed, because we store all the problems in ModuleEnv
/// Problems with the identifier
/// e.g. if it has two underscores in a row
/// or if it starts with a lowercase then it shouldn't be `lowerCamelCase`, it must be `snake_case`
problems: Problems,
Makes sense
How is this intended to work?
/// Identifier attributes such as if it is effectful, ignored, or reassignable packed into 3-bits.
pub const Attributes = packed struct(u3) {
    effectful: bool,
    ignored: bool,
    reassignable: bool,
};
Those are things that will be different for different instantiations of the same identifier
e.g. the same name used in as different local variables in two different functions
This is a very good point...
Actually
I guess reassignable would be the _ suffix, effectful would be the ! suffix, and ignored would be the _ prefix?
Maybe that is actually a textual property of the identifier
The tokenizer will actually know those already, so we can just pass those in pre-computed
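e.g. something like this (sketch; attributesFromText is a made-up name, and the Attributes struct is copied from above):
const std = @import("std");

const Attributes = packed struct(u3) {
    effectful: bool,
    ignored: bool,
    reassignable: bool,
};

// Derive the flags straight from the identifier's text during tokenization.
fn attributesFromText(text: []const u8) Attributes {
    return .{
        .effectful = std.mem.endsWith(u8, text, "!"),
        .ignored = std.mem.startsWith(u8, text, "_"),
        .reassignable = std.mem.endsWith(u8, text, "_"),
    };
}

test attributesFromText {
    try std.testing.expect(attributesFromText("launch_the_missiles!").effectful);
    try std.testing.expect(attributesFromText("_unused").ignored);
}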
Easy enough
Joshua Warner said:
Maybe that is actually a textual property of the identifier
This was the plan yes
How do those attributes apply to things like uppercase idents, field names, etc?
They don't, which is part of the reason why they used to be separate
But it seems to be cheap enough to just set the attributes to all false, AKA zero, and leave them in there
However, if those attributes are easier to parse unilaterally, then for non-idents, e.g. ignored field names in record builders, they'll just get ignored
Ok cool
Initial PR for integrating interning in the tokenizer: https://github.com/roc-lang/roc/pull/7624
PR for parsing the new match expression and some more patterns: https://github.com/roc-lang/roc/pull/7626
Make sure to keep an eye on this test, as it is the "everything implemented right now" test:
test "Syntax grab bag" {
try moduleFmtsSame(
\\app [main!] { pf: platform "../basic-cli/platform.roc" }
\\
\\import pf.Stdout
\\
\\add_one_oneline = |num| if num 2 else 5
\\
\\add_one = |num| {
\\ other = 1
\\ if num {
\\ dbg some_func()
\\ 0
\\ } else {
\\ other
\\ }
\\}
\\
\\match_time = |a| {
\\ match a {
\\ Blue -> 47,
\\ Green -> 19,
\\ Red -> 12,
\\ lower -> 1,
\\ [1, 2, 3, .. as rest] -> 123,
\\ 3.14 -> 314,
\\ }
\\}
\\
\\main! = |_| {
\\ world = "World"
\\ number = 123
\\ tag = Blue
\\ tag_with_payload = Ok(number)
\\ interpolated = "Hello, ${world}"
\\ list = [add_one(number), 456, 789]
\\ Stdout.line!(interpolated)?
\\ Stdout.line!("How about ${Num.toStr(number)} as a string?")
\\}
);
}
Let me know if anything looks wrong here.
I'll try to get Tags with payloads and record patterns parsing this afternoon (and probably Tuple patterns)
My plan after that is Type decls and Type annotations
And then the hard stuff - Tuples, Records, and BinOps
If someone needs/wants something done sooner (like other headers), let me know - or feel free to put in a PR yourself and send it to me!
The hold-up with Records is I am going to have to refactor Body/Block to be just another type of expression and not its own distinct type of Node
@Joshua Warner remind me, did we decide to implement a NoSpaceColon token that will be required for (expression) Record Fields?
I think Record Fields in type annotations can be more forgiving
I think the only real alternative would be something more drastic like inline type annotations which has been ruled out in the past by Richard
what if we made match consistent with if and have it just use curly braces instead of a special -> and , thing?
e.g. instead of this:
match_time = |a| {
    match a {
        Blue -> 47,
        Green -> 19,
        Red -> 12,
        lower -> 1,
        [1, 2, 3, .. as rest] -> 123,
        3.14 -> 314,
    }
}
...we do this:
match_time = |a| {
    match a {
        Blue { 47 }
        Green { 19 }
        Red { 12 }
        lower { 1 }
        [1, 2, 3, .. as rest] { 123 }
        3.14 { 314 }
    }
}
seems like it's just as easy to read, it's one less piece of syntax to learn, and if you want a multiline branch you're already all set up with the curly braces
it always annoys me in Rust having to switch back and forth between , and { :stuck_out_tongue:
Hmm... I guess it is just a bit less distinct, but multiline blocks are common anyway
Also, it will lead to fake record syntax being more common
@Anthony Bullard I think we do want NoSpaceColon for now. We can always have the parser accept either token in cases where it’s not ambiguous.
@Anthony Bullard how about we try the "no arrows" style above (as in, we just parse a pattern followed by an expression, and the formatter always chooses to put braces around that expression, just like with if) but if someone puts in -> or => or , we treat them like semicolons (warn, and then the formatter drops them)
That syntax is great!
Richard Feldman said:
it always annoys me in Rust having to switch back and forth between , and { :stuck_out_tongue:
I also hate this
Hmm, I'm a little worried about ambiguities without -> or similar
Blue { foo } could either be Blue -> {foo:foo}, or it could be Blue {foo} (i.e. a tag with a record pattern)... and then we need to keep parsing to find out what the branch body is.
Actually no, it's not the first thing. (brain fart)
It's ambiguous between {foo} being the body and {foo} being a record pattern with the body to come later.
You can keep going with that logic indefinitely
Err actually wait nvm ignore me; that'd be Blue{(foo})
EDIT, nope: Blue({foo})
:man_facepalming:
beautifully cursed brackets {(})
I like that the syntax requires the braces so that you never have to decide if they should be included or not
It's a bit inconsistent with lambdas because it's still optional there
but I think I prefer that it remains optional
To remain optional would just be?
match_time = |a| {
    match a {
        Blue 47
        Green 19
        Red 12
        lower 1
        [1, 2, 3, .. as rest] 123
        3.14 314
    }
}
yeah, that's what I think it should be
but then the formatter adds braces for clarity
it's conceptually simple: you just alternate patterns and expressions, that's it
but not in lambdas
(the formatter doesn't add braces)
right, just in match
match { 1 2 1 2 1 2 }
would technically be valid :stuck_out_tongue:
at least from a parsing perspective
with 1s being patterns and 2s being expressions
that might be nice too for typing speed
yeah that's the symmetry with if, where like if True False else True is valid (edit: was originally if True False True - forgot the else!)
but the formatter adds braces for clarity
if True False True
how is this valid?
so if you want to take the shortcut when typing, you can
No else or else if
oops, fixed!
ah
I kinda hate the lack of grouping in all of this, but I guess the formatter readds it, so maybe ok
in general the theme is "braces are never required, but the formatter may add them"
That said, I'm really not a fan of same line things without grouping
we could soft enforce them with a warning
could, but probably unnecessary
Even
if True
    False
and
if True False
feel like they shouldn't be allowed.
I like the conceptual simplicity though - braces are a special case of expressions
I know
so anywhere you're using them as an expression, you could of course omit them
But it feels like conceptual simplicity being traded for allowing messy code
Extra symbols and splitting things up can definitely help with readability
I hear that, but there's all sorts of messy code you can write if you don't use the formatter :laughing:
and I think we could reasonably add a warning if it seems like people are actually doing it in practice
my assumption is that they wouldn't
idk...a lot of people like being terse way too much
but yeah, probably just a limited few
I like the idea of allowing a lot of common mistakes to parse/run, but still discourage them with a warning, and have the formatter fix them automatically
it's the best of both worlds
As long as the formatter never fails and can format all bad code consistently
So basically I should just remove the need for the arrow, and in the formatter force braces?
And remove the need for commas? @Richard Feldman
I personally think we should keep commas between branches
As it is an unbounded list of things
Lists, Tuples, Record Fields, Function Args, Lambda Args, Exposes items, package entries, they all require , between the "items"
Joshua Warner said:
Anthony Bullard I think we do want NoSpaceColon for now. We can always have the parser accept either token in cases where it’s not ambiguous.
How then are we going to distinguish between a record and a block starting with a type annotation - with backtracking?
Should this be spun out into an ideas thread?
I feel like we should put together some larger examples. My concern is the strangeness budget with removing the arrows which seem to be pretty universal in other languages. Not saying I couldn't get used to it, but just not sure discussion in the middle of a parser thread really meets our usual standard for these things.
The braces on a single line look a little strange to me { 4 } ... I like the braces for multiple lines though.
Anthony Bullard said:
Lists, Tuples, Record Fields, Function Args, Lambda Args, Exposes items, package entries, they all require , between the "items"
yeah, but switch statements don't, and when they're all on different lines, it just feels to me like a chore that doesn't add value
this does make me realize the formatter could add missing commas in lists and records though! :smiley:
I'm not sure if these are formative ideas, or decisions from our BDFN
fair point on starting a thread about -> in match, I'll write one up later
mainly I just saw the example code, thought "oh we definitely don't need commas" and then thought I'd mention that idea while I was at it - but it does deserve its own thread
the formatter being able to add missing commas in lists and records (and tuples!) is just an observation about a convenience we could add
not a syntax change, just error recovery
I think we could do the same here
Allow commas, but not require them
And have the formatter add them
This is what is currently implemented:
test "Syntax grab bag" {
try moduleFmtsSame(
\\app [main!] { pf: platform "../basic-cli/platform.roc" }
\\
\\import pf.Stdout
\\
\\add_one_oneline = |num| if num 2 else 5
\\
\\add_one = |num| {
\\ other = 1
\\ if num {
\\ dbg some_func()
\\ 0
\\ } else {
\\ dbg 123
\\ other
\\ }
\\}
\\
\\match_time = |a| match a {
\\ Blue { 47 },
\\ Green { 19 },
\\ Red { 12 },
\\ lower { 1 },
\\ [1, 2, 3, .. as rest] { 123 },
\\ 3.14 { 314 },
\\}
\\
\\main! = |_| {
\\ world = "World"
\\ number = 123
\\ tag = Blue
\\ tag_with_payload = Ok(number)
\\ interpolated = "Hello, ${world}"
\\ list = [add_one(number), 456, 789]
\\ Stdout.line!(interpolated)?
\\ Stdout.line!("How about ${Num.toStr(number)} as a string?")
\\}
);
}
I can't as a human parse the branches without at least the commas
switch
gets away without them because it has case
keyword before each branch
And usually in C-like languages :
between the "pattern" and the block (and also many times a near-mandatory break;
statements at the end of a case block.)
Sometimes syntax that is unnecessary for machine parsing is very much necessary for human visual parsing
To your point though Richard, I think we at a point where for machine parsing there is no need for commas in ANY collection-like syntactic construct
They are there for the humans :-)
Richard Feldman said:
what if we made match consistent with if and have it just use curly braces instead of a special -> and , thing?
I made a PR for realworld to see what this looks like https://github.com/rtfeldman/roc-realworld/pull/1
It's based on @Anthony Bullard's current implementation of match above. I'm not sure if the commas are optional but I included them.
I really like with zig how adding the comma lets the formatter know to make it multi-line or not... if we can borrow that feature I think that would be nice.
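For anyone who hasn't seen it, that zig fmt behavior looks like this:
fn foo(x: i64, y: i64, z: i64) i64 {
    return x + y + z;
}

// zig fmt keeps a call on one line when there is no trailing comma:
const a = foo(1, 2, 3);

// ...and expands it to one argument per line when there is one:
const b = foo(
    1,
    2,
    3,
);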
I think I missed a couple of matches ... I'll add those (done)
From making this PR.. I would say removing -> and just using braces feels good.
match client.query!(cmd) {
    Ok([row]) { Article.fromRow(row).(Ok) },
    Ok([]) { Err(NotFound) },
    Err(db_err) { Err(InternalErr(db_err.inspect())) },
}
Anthony Bullard said:
And have the formatter add them
we could, I just don't see a benefit :sweat_smile:
if we're already adding braces on every branch, it's super clear where each one begins and ends
to put it another way, I don't see why }, is a better delimiter for every branch than just }
That's fair
Pushed a change to make commas optional, and not included in the formatted output
match_time = |a| match a {
    Blue { 47 }
    Green { 19 }
    Red { 12 }
    lower { 1 }
    [1, 2, 3, .. as rest] { 123 }
    3.14 { 314 }
}
I made #ideas > `match` without `->` to discuss the design question!
Tuple patterns are implemented in my PR, thought I'd get records done also in this little 30 minute stretch - but I hit a weird bug trying to fix an obvious tokenizer bug.
@Joshua Warner Currently when we read a (, we have whether we emit a NoSpaceOpenRound or OpenRound token flipped from how it should be. But when I fixed that, we get an infinite loop. These are impossible to debug in Zig with print debugging because the output with std.debug.print just never gets flushed. Could you take a look at that sometime? Not a blocker or high priority, but should be fixed I think
Here's the currently implemented syntax... focusing solely on match here:
test "Syntax grab bag" {
try statementFmtsSame( // This is a made-up function that doesn't actually exist :-)
\\match_time = |a| match a {
\\ Blue { 47 }
\\ Green { 19 }
\\ Red { 12 }
\\ lower { 1 }
\\ [1, 2, 3, .. as rest] { 123 }
\\ 3.14 { 314 }
\\ (1, 2, 3) { 123 }
\\}
);
}
@Anthony Bullard I repro'd one infinite loop at least, and doing this works for me:
diff --git a/src/check/parse/Parser.zig b/src/check/parse/Parser.zig
index 5ba9d25af4..aec25d5179 100644
--- a/src/check/parse/Parser.zig
+++ b/src/check/parse/Parser.zig
@@ -577,7 +577,7 @@ pub fn parseExpr(self: *Parser) IR.NodeStore.ExprIdx {
     if (expr) |e| {
         var expression = e;
         // Check for an apply...
-        if (self.peek() == .OpenRound) {
+        if (self.peek() == .NoSpaceOpenRound) {
             const scratch_top = self.store.scratch_exprs.items.len;
             self.advance();
             while (self.peek() != .CloseRound) {
diff --git a/src/check/parse/tokenize.zig b/src/check/parse/tokenize.zig
index c831a5ef7a..e1e180aab7 100644
--- a/src/check/parse/tokenize.zig
+++ b/src/check/parse/tokenize.zig
@@ -992,7 +992,7 @@ pub const Tokenizer = struct {
             '(' => {
                 self.cursor.pos += 1;
                 self.stack.append(.Round) catch exitOnOom();
-                self.output.pushTokenNormal(if (sp) .NoSpaceOpenRound else .OpenRound, start, 1);
+                self.output.pushTokenNormal(if (sp) .OpenRound else .NoSpaceOpenRound, start, 1);
             },
             '[' => {
                 self.cursor.pos += 1;
The trick I've taken to doing to debug hanging tests is:
- zig build test
- (wait for it to hang in the test)
- hit ctrl+Z
- pgrep -lf test to find the name of the test binary running: something like 5107 /Users/joshw/src/github.com/roc-lang/roc/.zig-cache/o/05e8b24ea2f7133aef9cb6aa47b421e7/test --listen=-
- From there I can either run that binary myself, attaching lldb to the process, etc.
I assume there must be a better way to do this tho. Maybe @Andrew Kelley knows some tricks?
I think you are right about the debugging process. I guess I need to improve my lldb skills. (Or just set up DAP in my Neovim)
And I'm going to add the above patch to my PR
Current syntax supported in my latest PR:
test "Syntax grab bag" {
try moduleFmtsSame(
\\app [main!] { pf: platform "../basic-cli/platform.roc" }
\\
\\import pf.Stdout
\\
\\add_one_oneline = |num| if num 2 else 5
\\
\\add_one = |num| {
\\ other = 1
\\ if num {
\\ dbg some_func()
\\ 0
\\ } else {
\\ dbg 123
\\ other
\\ }
\\}
\\
\\match_time = |a| match a {
\\ Blue | Green | Red -> {
\\ x = 12
\\ x
\\ }
\\ lower -> 1
\\ "foo" -> 100
\\ "foo" | "bar" -> 200
\\ [1, 2, 3, .. as rest] -> 123
\\ [1, 2 | 5, 3, .. as rest] -> 123
\\ 3.14 -> 314
\\ 3.14 | 6.28 -> 314
\\ (1, 2, 3) -> 123
\\ (1, 2 | 5, 3) -> 123
\\ {foo: 1, bar: 2} -> 12
\\ {foo: 1, bar: 2 | 7} -> 12
\\}
\\
\\main! = |_| {
\\ world = "World"
\\ number = 123
\\ tag = Blue
\\ tag_with_payload = Ok(number)
\\ interpolated = "Hello, ${world}"
\\ list = [add_one(number), 456, 789]
\\ Stdout.line!(interpolated)?
\\ Stdout.line!("How about ${Num.toStr(number)} as a string?")
\\}
);
}
^ UPDATED THE ABOVE
Only thing left in patterns is Tags with payloads
Can we move to => for match? Seemed like @Richard Feldman was pushing for that, but I may be mistaken
Oh yeah! Also
....?
.. as rest might need to be allowable, but I think we were trying to prefer ..rest
Oh
Ok, so as is dead?
No, which is a problem...
And are we 100% on fat arrows for match?
No
I would prefer landing the current diff and iterating on it later :)
Good idea
Yeah
I've already implemented 5 syntaxes for match :rofl:
I thought I might spend a couple hours today tinkering with the Parse IR and see if I can make it generate an SExpr for a snapshot.
We actually probably don't want to support ..<pattern> in record destructs because we already have .. as <ident> and there shouldn't be two ways to do the same thing
Wait .. is followed by a pattern?
That doesn't seem right...
Yeah, I agree
I think we should use ..<ident> in both lists and records
Well, if we support that in records, then we can do { foo: 123, ..rest } and also { foo: 123, .. as rest }
Gotta go eat dinner, I'll try to pop back on and take care of that conflict
Maybe we can put a warning on the second
I would rather just remove the second
There is no advantage to it
Agreed, except for the fact that there's now a specific use of as that doesn't work
I actually haven't even implemented Record Pattern rest
Which is?
as is usually
pub const AsPattern = struct {
    pattern: Pattern,
    ident: Ident,
    region: Region,
};
But it can't be just Pattern anymore
If you can make it work though, go for it
Yeah, I removed this in my PR and made as a part of rest
My PR is rebased and ready for review
Next up for me is Type Annotations and Declarations
Sam Mohr said:
Can we move to => for match? Seemed like Richard Feldman was pushing for that, but I may be mistaken
let's do it for now just to address the parsing ambiguity on Ok(a) if a->b->c
No parsing ambiguity as I haven't even implemented record access / static dispatch, let alone -> (which we should have a name for)
:smile:
But I can put it with my current change, which actually I tried to sneak into the last PR but was too late
(Tags with payload patterns)
@Anthony Bullard you may need to rebase any PRs, we just merged a change that changed a few things across the compiler
I have no outstanding PRs, but thanks I'll rebase now!
Did we land interning?
Yes
Holy hell, 8 commits?
Josh landed a primitive version
Ok, I've rebased....I'm holding on to my seat as I rebase
I'm currently trying to figure out how to get the top-level nodes out of the parse AST
test "example s-expr" {
const source =
\\module []
\\
\\foo = "bar"
;
var arena = std.heap.ArenaAllocator.init(testing.allocator);
defer arena.deinit();
var env = base.ModuleEnv.init(&arena);
var parse_ast = parse(&env, testing.allocator, source);
defer parse_ast.deinit();
var iter = parse_ast.store.nodes.iterIndices();
while (iter.next()) |node| {
std.debug.print("{}", .{parse_ast.store.getStatement(node)});
}
}
Use parse_ast.store.getFile(), and then follow on from there
The file will have the header and statements, and if you read the types it should be easy to figure out what store method to use to get children
I can talk later if you need help
Actually, look at fmt.zig, from formatFile on
We need mental institutionalization, not just help
Your code should be similar in how you walk the AST
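Something roughly like this, I'd guess (sketch continuing the test above; only getFile/getStatement are confirmed - the header/statements field names are assumptions):
const file = parse_ast.store.getFile();
// The File node holds the header plus its top-level statements.
std.debug.print("header: {}\n", .{file.header});
for (file.statements) |stmt_idx| {
    std.debug.print("{}\n", .{parse_ast.store.getStatement(stmt_idx)});
}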
BTW I'm currently reworking string tokenizing to make a little more sense
Sam Mohr said:
We need mental institutionalization, not just help
Some more than others :-)
I have a 4-year-old
*You have a 4-year-old and mental problems
thread 12285665 panic: reached unreachable code
--- :man_facepalming:
I shouldn't be allowed to touch these things
Errr
What's the source?
And your code
And the error?
@Luke Boswell
See your DMs
omg I didn’t take a screenshot of it, but Apple Intelligence summarized the notifications in this channel as “Anthony has mental problems; Luke panicked”
Fair
That’s one way to make me read Zulip
More crazy talk, got it
I think it’s a fair summary
nobody else here knows it, but Agus went HAM on the recent Zed release, it was really impressive!
Big W for Zed
people who joined after him probably assume he's been at Zed for years based on that :laughing:
We are thankful for your efforts! It's been powering my collab with Luke for the last week or two
Can't say if this has improved productivity...
test "Syntax grab bag" {
try moduleFmtsSame(
\\app [main!] { pf: platform "../basic-cli/platform.roc" }
\\
\\import pf.Stdout
\\
\\Map a b : List(a), (a -> b) -> List(b)
\\
\\Foo : (Bar, Baz)
\\
\\Some a : { foo : Ok(a), bar : Something }
\\
\\Maybe a : [Some(a), None]
\\
\\SomeFunc a : Maybe(a), a -> Maybe(a)
\\
\\add_one_oneline = |num| if num 2 else 5
\\
\\add_one : (U64 -> U64)
\\add_one = |num| {
\\ other = 1
\\ if num {
\\ dbg some_func()
\\ 0
\\ } else {
\\ dbg 123
\\ other
\\ }
\\}
\\
\\match_time = |a| match a {
\\ Blue | Green | Red -> {
\\ x = 12
\\ x
\\ }
\\ lower -> 1
\\ "foo" -> 100
\\ "foo" | "bar" -> 200
\\ [1, 2, 3, .. as rest] -> 123
\\ [1, 2 | 5, 3, .. as rest] -> 123
\\ 3.14 -> 314
\\ 3.14 | 6.28 -> 314
\\ (1, 2, 3) -> 123
\\ (1, 2 | 5, 3) -> 123
\\ { foo: 1, bar: 2, ..rest } -> 12
\\ { foo: 1, bar: 2 | 7 } -> 12
\\ Ok(123) -> 123
\\ Ok(Some(dude)) -> dude
\\ TwoArgs("hello", Some("world")) -> 1000
\\}
\\
\\main! : List(String) -> Result({}, _)
\\main! = |_| {
\\ world = "World"
\\ number = 123
\\ tag = Blue
\\ tag_with_payload = Ok(number)
\\ interpolated = "Hello, ${world}"
\\ list = [add_one(number), 456, 789]
\\ Stdout.line!(interpolated)?
\\ Stdout.line!("How about ${Num.toStr(number)} as a string?")
\\}
);
}
Syntax supported with my latest parser+formatter PR
Next up: Record literal expressions
After that: BinOps ( :scared: )
for binops, is Pratt Parsing on your radar already?
you've mentioned it
I'll read up on it
https://matklad.github.io/2020/04/13/simple-but-powerful-pratt-parsing.html
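The core of the technique is tiny. A toy sketch of the loop from that article (this is not the Roc parser, just the idea; it evaluates as it parses to stay small, where a real parser would build AST nodes):
const std = @import("std");

// Each binary operator gets left/right binding powers; parseExpr loops while
// the next operator binds at least as tightly as min_bp. Left-associative
// operators have right > left; swap them to get right-associativity.
const Tok = union(enum) { num: i64, plus, minus, star, slash, eof };
const Bp = struct { left: u8, right: u8 };

fn bp(tok: Tok) ?Bp {
    return switch (tok) {
        .plus, .minus => .{ .left = 1, .right = 2 },
        .star, .slash => .{ .left = 3, .right = 4 },
        else => null,
    };
}

const P = struct {
    toks: []const Tok,
    pos: usize = 0,

    fn peek(self: *P) Tok {
        return if (self.pos < self.toks.len) self.toks[self.pos] else .eof;
    }

    fn parseExpr(self: *P, min_bp: u8) i64 {
        var lhs: i64 = switch (self.peek()) {
            .num => |n| blk: {
                self.pos += 1;
                break :blk n;
            },
            else => 0, // a real parser would push a malformed node here
        };
        while (bp(self.peek())) |b| {
            if (b.left < min_bp) break;
            const op = self.peek();
            self.pos += 1;
            const rhs = self.parseExpr(b.right);
            lhs = switch (op) {
                .plus => lhs + rhs,
                .minus => lhs - rhs,
                .star => lhs * rhs,
                .slash => @divTrunc(lhs, rhs),
                else => unreachable,
            };
        }
        return lhs;
    }
};

test "precedence" {
    var p = P{ .toks = &.{ .{ .num = 1 }, .plus, .{ .num = 2 }, .star, .{ .num = 3 } } };
    try std.testing.expectEqual(@as(i64, 7), p.parseExpr(0)); // 1 + (2 * 3)
}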
Joshua Warner said:
The trick I've taken to doing to debug hanging tests is:
- zig build test
- (wait for it to hang in the test)
- hit ctrl+Z
- pgrep -lf test to find the name of the test binary running: something like 5107 /Users/joshw/src/github.com/roc-lang/roc/.zig-cache/o/05e8b24ea2f7133aef9cb6aa47b421e7/test --listen=-
- From there I can either run that binary myself, attaching lldb to the process, etc.
I assume there must be a better way to do this tho. Maybe Andrew Kelley knows some tricks?
That workflow seems reasonable to me. Is there a way you can imagine it being improved with some zig tooling changes?
Couple thoughts: (1) It would be convenient if the binary name was stable so I don’t need to do the ctrl+z switcheroo. Or perhaps it could print the full path of the currently running binary. (2) it would be useful for the runner to do something like print the test name that’s running (perhaps after some minimum timeout). (3) it could also be useful to be able to run only a single test via passing the name of that test on the command line (or perhaps the name of the file, if the test is anonymous)
(1) you can get if you drop down into zig build-exe CLI with -femit-bin=foobar
(2) you can get with --verbose
(3) is --test-filter with the direct CLI, and you can introduce a -D build option into your build script and set the test filter there
if you combine (1) and (2) you can get what you want I think
one more random tip, always delete the --listen=- arg when running manually, unless you want to speak the Build Runner Protocol
so to be clear you can use --verbose to get the zig build-exe or zig test command, then you can copy paste that, delete --listen=-, add -femit-bin=foobar, and then you have yourself the desired workflow
as for --test-filter, note that the filter is applied very early, so you can use this when doing large refactorings to only test a subset of unit tests even if the rest of your app is not compiling
I'm running into a spot where Zig just doesn't enjoy passing functions. I want to have this helper:
fn parseCollection(self: *Parser, comptime T: type, end_token: Token.Tag, scratch: *std.ArrayList(T), parser: fn (*Parser) T) ExpectError!usize {
    const scratch_top = scratch.items.len;
    while (self.peek() != end_token) {
        scratch.append(parser(self)) catch exitOnOom();
        self.expect(.Comma) catch {
            break;
        };
    }
    self.expect(end_token) catch {
        return ExpectError.expected_not_found;
    };
    return scratch_top;
}
Since that exact kind of code happens at least 19 times in the parser. But if one of the parsers takes an argument, I understand I need to create a new top-level function that can be called with zero args here (calling the real function with a specific value for it's arg). But when you have recursion, usually the function is parametric on that argument, so we need to be able to call the correct parser when we recurse. Like this:
const parser: fn (*Parser) IR.NodeStore.PatternIdx = if (alternatives == .alternatives_allowed) parsePatternWithAlts else parsePatternNoAlts;
But no matter what I do, that gives me the following error:
check/parse/Parser.zig:449:58: error: value with comptime-only type 'fn (*check.parse.Parser) check.parse.IR.NodeStore.PatternIdx' depends on runtime control flow
const parser: fn (*Parser) IR.NodeStore.PatternIdx = if (alternatives == .alternatives_allowed) parsePatternWithAlts else parsePatternNoAlts;
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I guess I don't understand the argument here. The type of parser is always the same, and can be verified correctly at comptime. So why is this not allowed?
Now obviously if we went with #ideas > ✔ `or` instead of `|` in `when` branches (or one of the similar proposals) then this problem goes away because there is no longer contention between pattern alternatives and |...| lambda args.
let's just try that
if we don't like or
for alternatives in practice, we can always reconsider, but it seems like a reasonable design in its own right
If you are OK with it, I will do it
go for it
worst-case scenario is that we find out we don't like it, which would justify the more involved implementation
@Anthony Bullard I wonder if instead of a function pointer, you could do a comptime enum for all the collection variants? Or perhaps you can pass in the desired return type as comptime, and you call ::parse() on that?
The issue is that a function is a comptime only type
You want to use a function pointer if the value can change at runtime
Zig does not have first class functions or lambdas or closures
I think the original code just needs *const fn ...
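A self-contained sketch of that fix (stubbed types, not the real parser): a bare fn (...) T is a comptime-only type in Zig, but a *pointer* to a function is a runtime value, so it can be picked by runtime control flow.
const std = @import("std");

const Parser = struct { depth: u32 };
const PatternIdx = enum(u32) { _ };

fn parsePatternWithAlts(p: *Parser) PatternIdx {
    return @enumFromInt(p.depth);
}

fn parsePatternNoAlts(p: *Parser) PatternIdx {
    return @enumFromInt(p.depth + 1);
}

test "runtime-selected parser" {
    var p = Parser{ .depth = 0 };
    var allow_alts = true;
    _ = &allow_alts; // pretend this is decided at runtime
    const parser: *const fn (*Parser) PatternIdx =
        if (allow_alts) parsePatternWithAlts else parsePatternNoAlts;
    try std.testing.expectEqual(@as(u32, 0), @intFromEnum(parser(&p)));
}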
I think you're gonna have a better time if you always pass functions comptime unless you're doing the vtable pattern
the optimizer will definitely have a better time
also I recognize this is more of a functional/imperative preference thing, but I reworked a contributor's code that used that parser function pointer pattern in zig's parser to simply not do that, and felt like the result was better. yeah it technically is less DRY
anyway the comptime suggestion is irrelevant to that though
Brendan Hansknecht said:
I think the original code just needs
*const fn ...
Interesting .. why did I think that Zig did not support function pointers? I think moving towards or for pattern alternation will make a lot of things clearer in OUR grammar anyway, open up more places where it can be used, and also make this code a lot easier to understand.
yeah like in my mind both are totally reasonable options as an end user, just with some different tradeoffs, and since we've only tried one but have an implementation reason to try out the other, seems like a fine excuse to try it out and see how we like it in practice
#7638 Parsing Records and Tuples
Supported syntax:
test "Syntax grab bag" {
try moduleFmtsSame(
\\app [main!] { pf: platform "../basic-cli/platform.roc" }
\\
\\import pf.Stdout
\\
\\Map a b : List(a), (a -> b) -> List(b)
\\
\\Foo : (Bar, Baz)
\\
\\Some a : { foo : Ok(a), bar : Something }
\\
\\Maybe a : [Some(a), None]
\\
\\SomeFunc a : Maybe(a), a -> Maybe(a)
\\
\\add_one_oneline = |num| if num 2 else 5
\\
\\add_one : (U64 -> U64)
\\add_one = |num| {
\\ other = 1
\\ if num {
\\ dbg some_func()
\\ 0
\\ } else {
\\ dbg 123
\\ other
\\ }
\\}
\\
\\match_time = |a| match a {
\\ Blue | Green | Red -> {
\\ x = 12
\\ x
\\ }
\\ lower -> 1
\\ "foo" -> 100
\\ "foo" | "bar" -> 200
\\ [1, 2, 3, .. as rest] -> 123
\\ [1, 2 | 5, 3, .. as rest] -> 123
\\ 3.14 -> 314
\\ 3.14 | 6.28 -> 314
\\ (1, 2, 3) -> 123
\\ (1, 2 | 5, 3) -> 123
\\ { foo: 1, bar: 2, ..rest } -> 12
\\ { foo: 1, bar: 2 | 7 } -> 12
\\ Ok(123) -> 123
\\ Ok(Some(dude)) -> dude
\\ TwoArgs("hello", Some("world")) -> 1000
\\}
\\
\\main! : List(String) -> Result({}, _)
\\main! = |_| {
\\ world = "World"
\\ number = 123
\\ tag = Blue
\\ tag_with_payload = Ok(number)
\\ interpolated = "Hello, ${world}"
\\ list = [add_one(number), 456, 789]
\\ # New!
\\ record = { foo: 123, bar: "Hello", baz: tag, qux: Ok(world), punned }
\\ # New!
\\ tuple = (123, "World", tag, Ok(world), (nested, tuple), [1, 2, 3])
\\ Stdout.line!(interpolated)?
\\ Stdout.line!("How about ${Num.toStr(number)} as a string?")
\\}
);
}
Next up: BinOps (via Pratt Parsing)
(Note: I will come back to replace | with or, I want to push the supported syntax forward right now, and I think that BinOps are just critical to getting that done)
After BinOps, I think you should be able to parse some simple programs
That unit test looks familiar :)
BinOps have landed (in my PR) - using Pratt Parsing which was soooooo easy (thanks @Richard Feldman ):
test "Syntax grab bag" {
try moduleFmtsSame(
\\app [main!] { pf: platform "../basic-cli/platform.roc" }
\\
\\import pf.Stdout
\\
\\Map a b : List(a), (a -> b) -> List(b)
\\
\\Foo : (Bar, Baz)
\\
\\Some a : { foo : Ok(a), bar : Something }
\\
\\Maybe a : [Some(a), None]
\\
\\SomeFunc a : Maybe(a), a -> Maybe(a)
\\
\\add_one_oneline = |num| if num 2 else 5
\\
\\add_one : (U64 -> U64)
\\add_one = |num| {
\\ other = 1
\\ if num {
\\ dbg some_func()
\\ 0
\\ } else {
\\ dbg 123
\\ other
\\ }
\\}
\\
\\match_time = |a| match a {
\\ Blue | Green | Red -> {
\\ x = 12
\\ x
\\ }
\\ lower -> 1
\\ "foo" -> 100
\\ "foo" | "bar" -> 200
\\ [1, 2, 3, .. as rest] -> 123
\\ [1, 2 | 5, 3, .. as rest] -> 123
\\ 3.14 -> 314
\\ 3.14 | 6.28 -> 314
\\ (1, 2, 3) -> 123
\\ (1, 2 | 5, 3) -> 123
\\ { foo: 1, bar: 2, ..rest } -> 12
\\ { foo: 1, bar: 2 | 7 } -> 12
\\ Ok(123) -> 123
\\ Ok(Some(dude)) -> dude
\\ TwoArgs("hello", Some("world")) -> 1000
\\}
\\
\\expect blah == 1
\\
\\main! : List(String) -> Result({}, _)
\\main! = |_| {
\\ world = "World"
\\ number = 123
\\ expect blah == 1
\\ tag = Blue
\\ return tag
\\ crash "Unreachable!"
\\ tag_with_payload = Ok(number)
\\ interpolated = "Hello, ${world}"
\\ list = [add_one(number), 456, 789]
\\ record = { foo: 123, bar: "Hello", baz: tag, qux: Ok(world), punned }
\\ tuple = (123, "World", tag, Ok(world), (nested, tuple), [1, 2, 3])
\\ bin_op_result = Err(foo) ?? 12 > 5 * 5 or 13 + 2 < 5 and 10 - 1 >= 16 or 12 <= 3 / 5
\\ Stdout.line!(interpolated)?
\\ Stdout.line!("How about ${Num.toStr(number)} as a string?")
\\}
\\
\\expect {
\\ foo = 1
\\ blah = 1
\\ blah == foo
\\}
);
}
This test for the binops may also be helpful for people to correct any precedence or associativity errors I may have made:
test "BinOp omnibus" {
const expr = "Err(foo) ?? 12 > 5 * 5 or 13 + 2 < 5 and 10 - 1 >= 16 or 12 <= 3 / 5";
const expr_sloppy = "Err(foo)??12>5*5 or 13+2<5 and 10-1>=16 or 12<=3/5";
const formatted = "((((Err(foo) ?? 12) > (5 * 5)) or (((13 + 2) < 5) and ((10 - 1) >= 16))) or (12 <= (3 / 5)))";
try exprFmtsSame(expr, .no_debug);
try exprFmtsTo(expr_sloppy, expr, .no_debug);
try exprFmtsTo(expr, formatted, .debug_binop);
try exprFmtsTo(expr_sloppy, formatted, .debug_binop);
}
Note that the ()-laden formatted variable is using a debug flag in the formatter to show the boundaries of the individual operations (until we have full SExpr IR support here)
Anthony, I'm gonna have to break your legs
You're going too fast
Drastic actions :p
Sam Mohr said:
You're going too fast
I’m only doing this an hour (or two) a day! I’ve felt like I’m moving at a glacial pace, so thanks for the pick-me-up
But don’t feel bad - adding expect, crash, and return statements is causing a lot of problems
If there's problems from requiring return to not have statements after, then feel free to make the canonicalize code handle it
I presume that's not the main issue, though
No I haven’t figured it out, hit it right before work so I’ll find out tonight or in the morning
oh yeah canonicalize should definitely handle that imo :big_smile:
Updated the syntax for my PR above to include expect, crash, and return
I think import exposing and the rest of the headers should be next
And then we need to talk about static dispatch and record access
I want to treat it similar to binop, but not being a binop
It's <expr><apply args><try suffix><dot access><binop....> in terms of (not real) binding power
Or maybe try suffix goes last...
Do we want to support some_fn(arg1)?.static_dispatch_method()?.next_static_dispatch_method()?.record_field?
That code looks like it should be valid
Brendan Hansknecht said:
That code looks like it should be valid
And it is so....
test "Syntax grab bag" {
try moduleFmtsSame(
\\app [main!] { pf: platform "../basic-cli/platform.roc" }
\\
\\import pf.Stdout
\\
\\Map a b : List(a), (a -> b) -> List(b)
\\
\\Foo : (Bar, Baz)
\\
\\Some a : { foo : Ok(a), bar : Something }
\\
\\Maybe a : [Some(a), None]
\\
\\SomeFunc a : Maybe(a), a -> Maybe(a)
\\
\\add_one_oneline = |num| if num 2 else 5
\\
\\add_one : (U64 -> U64)
\\add_one = |num| {
\\ other = 1
\\ if num {
\\ dbg some_func()
\\ 0
\\ } else {
\\ dbg 123
\\ other
\\ }
\\}
\\
\\match_time = |a| match a {
\\ Blue | Green | Red -> {
\\ x = 12
\\ x
\\ }
\\ lower -> 1
\\ "foo" -> 100
\\ "foo" | "bar" -> 200
\\ [1, 2, 3, .. as rest] -> 123
\\ [1, 2 | 5, 3, .. as rest] -> 123
\\ 3.14 -> 314
\\ 3.14 | 6.28 -> 314
\\ (1, 2, 3) -> 123
\\ (1, 2 | 5, 3) -> 123
\\ { foo: 1, bar: 2, ..rest } -> 12
\\ { foo: 1, bar: 2 | 7 } -> 12
\\ Ok(123) -> 123
\\ Ok(Some(dude)) -> dude
\\ TwoArgs("hello", Some("world")) -> 1000
\\}
\\
\\expect blah == 1
\\
\\main! : List(String) -> Result({}, _)
\\main! = |_| {
\\ world = "World"
\\ number = 123
\\ expect blah == 1
\\ tag = Blue
\\ return tag
\\ crash "Unreachable!"
\\ tag_with_payload = Ok(number)
\\ interpolated = "Hello, ${world}"
\\ list = [add_one(number), 456, 789]
\\ record = { foo: 123, bar: "Hello", baz: tag, qux: Ok(world), punned }
\\ tuple = (123, "World", tag, Ok(world), (nested, tuple), [1, 2, 3])
\\ bin_op_result = Err(foo) ?? 12 > 5 * 5 or 13 + 2 < 5 and 10 - 1 >= 16 or 12 <= 3 / 5
\\ # NEW!!!
\\ static_dispatch_style = some_fn(arg1)?.static_dispatch_method()?.next_static_dispatch_method()?.record_field?
\\ Stdout.line!(interpolated)?
\\ Stdout.line!("How about ${Num.toStr(number)} as a string?")
\\}
\\
\\expect {
\\ foo = 1
\\ blah = 1
\\ blah == foo
\\}
);
}
I forgot I have unary ops (easy), and record updaters and record builders. What's the current status and agreed-upon syntax for them? The same as in the current alpha?
Yeah I think there's no change to Record Builder
Are we doing something with .. for record update?
See https://github.com/roc-lang/roc/issues/7091
Ah...ok, so it's like List rest
Or its parent issue https://github.com/roc-lang/roc/issues/7106 "Syntax Changes"
Oooo, the ellipsis keyword! Almost forgot about that!
Ok, I'll make sure this all gets done
That'll leave me with Custom Types and the replacement syntax for abilities
Do you have the issue handy for that? Or is there one?
https://github.com/roc-lang/roc/issues/7458
I guess I just look at the static dispatch doc...
And this is the doc for Custom Types...
https://docs.google.com/document/d/10OFeNl9KAYAErajE0Wio4AAR66yM2u13bku0mTUawVk/edit?tab=t.0
If there has been any meaningful changes to this proposal (the thread had no concrete takeaways that were scannable), let me know
I'm wondering if we still need the * in module [User.*, SomethingElse, …etc]. Didn't we discuss somewhere that fields in a Custom Record would always be public?
for nominal types, the only one we're going to support is tag unions (not records or tuples after all)
and they do still need the .* syntax for making the tags optionally public
That should be the only use of * then. Because we're going to remove it from types and use a variable instead. https://github.com/roc-lang/roc/issues/7451
Luke Boswell said:
That should be the only use of * then. Because we're going to remove it from types and use a variable instead. https://github.com/roc-lang/roc/issues/7451
PR for this: https://github.com/roc-lang/roc/pull/7642
Anthony Bullard said:
Luke Boswell said:
That should be the only use of * then. Because we're going to remove it from types and use a variable instead. https://github.com/roc-lang/roc/issues/7451
PR for this: https://github.com/roc-lang/roc/pull/7642
Added the ... expression to the above
Update from the above PR:
test "Syntax grab bag" {
try moduleFmtsSame(
\\app [main!] { pf: platform "../basic-cli/platform.roc" }
\\
\\import pf.Stdout
\\
\\Map a b : List(a), (a -> b) -> List(b)
\\
\\Foo : (Bar, Baz)
\\
\\Some a : { foo : Ok(a), bar : Something }
\\
\\Maybe a : [Some(a), None]
\\
\\SomeFunc a : Maybe(a), a -> Maybe(a)
\\
\\add_one_oneline = |num| if num 2 else 5
\\
\\add_one : (U64 -> U64)
\\add_one = |num| {
\\ other = 1
\\ if num {
\\ dbg some_func()
\\ 0
\\ } else {
\\ dbg 123
\\ other
\\ }
\\}
\\
\\match_time = |a| match a {
\\ Blue | Green | Red -> {
\\ x = 12
\\ x
\\ }
\\ lower -> 1
\\ "foo" -> 100
\\ "foo" | "bar" -> 200
\\ [1, 2, 3, .. as rest] -> 123
\\ [1, 2 | 5, 3, .. as rest] -> 123
\\ 3.14 -> 314
\\ 3.14 | 6.28 -> 314
\\ (1, 2, 3) -> 123
\\ (1, 2 | 5, 3) -> 123
\\ { foo: 1, bar: 2, ..rest } -> 12
\\ { foo: 1, bar: 2 | 7 } -> 12
\\ Ok(123) -> 123
\\ Ok(Some(dude)) -> dude
\\ TwoArgs("hello", Some("world")) -> 1000
\\}
\\
\\expect blah == 1
\\
\\main! : List(String) -> Result({}, _)
\\main! = |_| {
\\ world = "World"
\\ number = 123
\\ expect blah == 1
\\ tag = Blue
\\ return tag
\\ # NEW!!!
\\ ...
\\ match_time(...)
\\ crash "Unreachable!"
\\ tag_with_payload = Ok(number)
\\ interpolated = "Hello, ${world}"
\\ list = [add_one(number), 456, 789]
\\ record = { foo: 123, bar: "Hello", baz: tag, qux: Ok(world), punned }
\\ tuple = (123, "World", tag, Ok(world), (nested, tuple), [1, 2, 3])
\\ bin_op_result = Err(foo) ?? 12 > 5 * 5 or 13 + 2 < 5 and 10 - 1 >= 16 or 12 <= 3 / 5
\\ static_dispatch_style = some_fn(arg1)?.static_dispatch_method()?.next_static_dispatch_method()?.record_field?
\\ Stdout.line!(interpolated)?
\\ Stdout.line!("How about ${Num.toStr(number)} as a string?")
\\}
\\
\\expect {
\\ foo = 1
\\ blah = 1
\\ blah == foo
\\}
);
}
Ok... so what should be happening with malformed nodes?
For example say I have a roc file that is literally
modZZZ [foo]
foo = 1
I have a malformed header here and so expect there to be an error pushed into the list of Diagnostics and a malformed node added to the AST.
Now if I want to format this AST ... what should happen when I get to this point?
pub fn getHeader(store: *NodeStore, header: HeaderIdx) Header {
    const node = store.nodes.get(@enumFromInt(header.id));
    switch (node.tag) {
        .app_header => { ... },
        .module_header => { ... },
        else => {
            std.debug.panic("Expected a valid header tag, got {s}", .{@tagName(node.tag)});
        },
    }
}
We call getHeader and then it blows up because we don't have a malformed node.
Maybe we might actually want to add a malformed node here... so I gave that a try. Here's a PR that does that for just the header https://github.com/roc-lang/roc/pull/7672
Am I on the right track?
Side note -- I think I found a memory leak in the error diagnostics here by accident.
$ zig build snapshot
warning: file /Users/luke/Documents/GitHub/roc/src/snapshots/003.txt: contained 1 errors, skipping
info: processed 2 snapshots in 2 ms.
error(gpa): memory address 0x102928700 leaked:
/Users/luke/zig-macos-aarch64-0.13.0/lib/std/array_list.zig:1081:62: 0x102873d1f in ensureTotalCapacityPrecise (snapshot)
const new_memory = try allocator.alignedAlloc(T, alignment, new_capacity);
^
/Users/luke/zig-macos-aarch64-0.13.0/lib/std/array_list.zig:1058:51: 0x1028628e3 in ensureTotalCapacity (snapshot)
return self.ensureTotalCapacityPrecise(allocator, better_capacity);
^
/Users/luke/zig-macos-aarch64-0.13.0/lib/std/array_list.zig:1111:41: 0x102847daf in addOne (snapshot)
try self.ensureTotalCapacity(allocator, newlen);
^
/Users/luke/zig-macos-aarch64-0.13.0/lib/std/array_list.zig:848:49: 0x10281b687 in append (snapshot)
const new_item_ptr = try self.addOne(allocator);
^
/Users/luke/Documents/GitHub/roc/src/check/parse/Parser.zig:121:28: 0x1028462ef in pushMalformed__anon_11980 (snapshot)
self.diagnostics.append(self.gpa, .{
^
/Users/luke/Documents/GitHub/roc/src/check/parse/Parser.zig:943:38: 0x10281a103 in parseExprWithBp (snapshot)
return self.pushMalformed(IR.NodeStore.ExprIdx, .unexpected_token);
^
This is a great question for @Joshua Warner since he designed the malformed node system. Ideally it would just be something aggressive like (MALFORMED <tag>)
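i.e. something shaped like this, maybe (purely a sketch of the idea with stubbed payload types, not the PR's actual diff):
const Region = struct { start: u32, end: u32 };
const AppHeaderIdx = enum(u32) { _ };
const ModuleHeaderIdx = enum(u32) { _ };

/// Header gains a malformed case, so getHeader can return it instead of
/// panicking; the parser already pushed a diagnostic when it created the
/// node, and the formatter renders it as (MALFORMED <tag>).
pub const Header = union(enum) {
    app_header: AppHeaderIdx,
    module_header: ModuleHeaderIdx,
    malformed: Region,
};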
Need some alignment on syntax supported in exposes/exposing lists (both in headers and in imports): as (?), and either * or curlies around a comma separated list of lower and upper idents

import SomeModule exposing [
    lowerIdent!,
    UpperIdent,
    something as something_else, # ???
    Foo.*,
    Bar.{ fn_one, fn_two },
    Baz.{ fn_one as function_one, fn_two as function_two }, #???
]
And are there differences worth worrying about between headers and imports?
(At least for parsing)
@Richard Feldman since you are the keeper of the syntax :-)
Those last two don't exist anymore, right?
also .* should not be allowed in imports
only in the specific case of a module header that's exposing a nominal tag union
You're thinking of glob exposes for custom unions, Anthony
Probably. I think I forgot custom types became a tag union rather than a record
So
import SomeModule exposing [
    lowerIdent!,
    UpperIdent,
    something as something_else,
]
And
module exposing [
    lowerIdent!,
    UpperIdent,
    something as something_else,
    Foo.*,
]
I think I'll parse them the same and let Can throw an error on .* in an import
That all looks right to me, except for module exposing [a as b], not sure if that's planned for support
Anthony Bullard said:
This is a great question for Joshua Warner since he designed the malformed node system. Ideally it would just be something aggressive like (MALFORMED <tag>)
Yep this is exactly what I was thinking
Sam Mohr said:
That all looks right to me, except for module exposing [a as b], not sure if that's planned for support
If it's not, it should :-)
Renaming on expose? Why?
Feels like that is a sign the function was simply named wrong
It's not uncommon to encounter a situation where more than one import exposes a function/type that has the same name that you may want to use without namespacing
But now that I think of it, maybe that mostly disappears with SD
I think of multiple types having a map function for instance
Ok, so the only different thing available here is the .* syntax in headers
It would make SD more complicated than "the module needs a function named xyz" to allow renaming two things to the same name in the same module
Man that seems like such a silly thing to have to create a new node type for
And besides that, there's no value to doing it there besides at the module name
So I vote actively not adding that feature
My specific comment is rename on exposing. Rename on import makes sense to me
Anthony Bullard said:
Man that seems like such a silly thing to have to create a new node type for
copium
Brendan Hansknecht said:
My specific comment is rename on exposing. Rename on import makes sense to me
I agree with you on the header
Sam Mohr said:
Anthony Bullard said:
Man that seems like such a silly thing to have to create a new node type for
copium
:stuck_out_tongue_closed_eyes:
the benefit is to avoid shadowing
let's say I have a module named Parser and I want to expose Parser.str, but I also want to use the name str all over the place in Parser.roc
as in the module header's exposing means I can locally name it inner_str (or whatever) and then expose inner_str as str
so everyone outside the module gets the nice name of Parser.str, but inside Parser.roc I still get to use the name str for argument names etc.
I guess
I feel like that is rare
And if you really need it, you can make a Parser.roc that reexports things from InnerParser.roc
No strong feels, it just feels unnecessary.
I feel like, reading @Richard Feldman 's message, that I should implement as in exposing
I think I can parse a crash at the top level.. but not in a def.
This blows up because it makes that a malformed node
module []

thing =
    crash "something"
But this is fine
module []
crash "something"
Actually I think it's the same for expect and return too
That shouldn't break, yes, but I think in this case it's looking for braces
Which I think is good, because it keeps statement-only behaviors from working as expressions
Though a proper error message would be nice
I thought we decided those wouldn't have braces -- edit: err, parens (?
Interesting... so this parses correctly
module []
foo = { crash "something" }
But the formatter removes the braces {, and then it blows up
Richard Feldman said:
here is a concrete proposal that I'm happy with. If anyone would rather we didn't do this, please say why!
- Change the way ?? return "" formats to have the formatter add braces so it becomes ?? { return "" }, and same with crash. No changes to the semantics of anything involved in this; it's purely to address the concern over it being not visually obvious enough what the code does.
- Introduce if Ok(path) = paths.first() { pattern matching. We explored a bunch of alternatives and this still feels like the best solution to the problem at the top of the thread.
that's it, no other changes. Both of these are addressing specific ergonomics concerns with the status quo, and are not trying to go back to the drawing board and reconsider everything.
Okay, I was changing remembered history in bias of my preferences
Classic
Luke Boswell said:
Interesting... so this parses correctly
module []
foo = { crash "something" }
But the formatter removes the braces {, and then it blows up
This is my fault. Kind of. Crash is a statement but not an expression right now, and statements are only parsed in the top level and inside of blocks
But the formatter removes the curlies from single-statement blocks - assuming they are just an expression
The only way to resolve this in the parser would be to look up what kind of statement is in the single-statement block to ensure it is an Expr
But I can't push anything today due to a day-long outage by Comcast in near-west Chicago
It's alg. I'm just noting things as I find them.
I'm often not sure if it's me, the parser impl, or what the intended syntax is.
It's not urgent or anything.
I've just been looping with the fuzzer and fixing bugs one by one in https://github.com/roc-lang/roc/pull/7672
I'm not 100% my approach is right so I'm just keeping it in draft until I get some feedback or we have a clear direction for handling errors.
I'm just picking the first fuzz crash, fixing that and making a snapshot test for it, then moving on to the next.
I'm expecting there will be changes required and happy to do that later. But I figure it doesn't hurt to just keep going for now.
I just noticed we have two different Region types. One has indexes into the token buffer, so token indexes. The other has indexes into the source code bytes.
Should we rename one of these, e.g. TRegion or something, to reflect that it spans the tokens instead?
The other idea I have is to move the Region that indexes tokens under Token, so it's a Token.Region instead
It would be nice to unify them, yes. I think the other suggestion from @Joshua Warner was to make Region be a Node.Idx. That would allow us to use half the memory (we only need one u32 value instead of a start and end that are both u32), though some diagnostics might need to store more Regions when they refer to complex regions.
At the very least, I agree that renaming one of the two Regions in our codebase would mitigate some confusion.
I think the other suggestion from @Joshua Warner was to make Region be a Node.Idx.
Does this work in general? Won't a region in the parser be a range of tokens rather than a single token?
Like the region of an expression will include many many tokens
Yes, it would hold many tokens
The tradeoff is that we can store everything we need a Region to do, which is display/highlight code for diagnostics, in 32 bits
But at the cost of needing the parse AST to figure out what a Region refers to
If there is a tradeoff here, I lean towards keeping it brain dead simple for now... until we have a working compiler
Agreed
Though they both seem simple to me
Option 1: two Region types. Region = { start: u32, len: u32 } refers to a slice of the source code; TokenRegion is separate for the tokenizer
Okay, looking now at the Parse.IR.Region type, they can definitely be the same thing
/// The first and last token consumed by a Node
pub const Region = struct {
    start: TokenIdx,
    end: TokenIdx,
};
I think we should either use Region = Parse.IR.Region or Region = Node.Idx
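Roughly these two shapes, for reference (sketch, names assumed):
// Two distinct region types, so byte offsets and token indexes can't be
// mixed up by accident:
pub const Region = struct { start: u32, len: u32 }; // byte span in the source
pub const TokenRegion = struct { start: u32, end: u32 }; // token-index span

// Or a single u32 node index; the byte region is recovered later by
// re-tokenizing/re-parsing the (unchanged) file and walking to this node:
pub const NodeIdx = enum(u32) { _ };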
Both of them assume that tokenizing/parsing has happened, but unless we either:
Even the Region = { start: u32, len: u32 } solution is fragile to source-code changes
If we get diagnostics, we could rip out the relevant text before closing the file. Nvm, late stages might have diagnostics... was just thinking about tokenize and parse diagnostics
What about diagnostics (which I'm using interchangeably with errors/warnings) for type errors?
yep
I think we just reopen the file and if it changed, oh well
Yeah, not worth making our compiler more memory-bloated to handle this
Also, keeping original source ranges enables use to just drop the tokenized buffer. So I lean towards that.
The plan was to just tokenize/parse a second time
If we do that, then we can use a single u32
The start offset to tokenize/parse from?
The Parse.IR.Store.Node.Idx
Oh, and just assume that reparsing will generate the exact same node. Then from the node get the start and end token and from those get the region start and end bytes
Yep!
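To make that concrete, here's a minimal self-contained sketch (made-up types, not the real compiler's) of the Region = Node.Idx idea: store a single u32, then map node -> tokens -> source bytes after reparsing.
const TokenIdx = u32;

const Node = struct {
    first_token: TokenIdx, // first token consumed by this node
    last_token: TokenIdx, // last token consumed by this node
};

const Ast = struct {
    nodes: []const Node,
    token_starts: []const u32, // byte offset where each token begins
    token_ends: []const u32, // byte offset just past each token

    // Assumes reparsing produced exactly the same nodes as the first
    // parse, so a stored node index still points at the same node.
    fn regionBytes(self: Ast, node_idx: u32) struct { start: u32, end: u32 } {
        const node = self.nodes[node_idx];
        return .{
            .start = self.token_starts[node.first_token],
            .end = self.token_ends[node.last_token],
        };
    }
};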
I don't believe that would be much harder than the current approach, and neither solution is implemented yet since error rendering is entirely untouched
Just trying to think about the implications of llvm debug info generation (and interpreter single stepping and printing surrounding context).
The interpreter would probably want to keep a parse AST for every file, or even the source of the file at time of interpretation
For LLVM debug info, it means we will reparse every single file (but this reduces the max memory requirement of the compiler, and LLVM is really just for optimized builds, so reparsing should be really cheap).
I was suggesting to @Joshua Warner that we used to parse just the header of files when looking for package dependencies, and he said "I'm hoping parsing is so fast that we don't need to worry about partial parsing"
Sam Mohr said:
The interpreter would probably want to keep a parse AST for every file, or even the source of the file at time of interpretation
Yeah, quite possibly. Though we don't want to ruin memory use due to a single gigantic file, so we might want to reload only if needed. Cause execution flow doesn't need any line info, only repl and debug flows.
I think parsing can be super fast
yeah, all sounds reasonable
I'll be really curious to bench parsing against the rust compiler
Benchmarking of just parsing and also eventually typechecking as well would be awesome to see side-by-side!
Yeah, the full roc check flow once it is working
since error rendering is entirely untouched
That's not entirely true anymore
https://github.com/roc-lang/roc/blob/13c5152cb736d7d46a599fe4624e4ddd58d8a1a5/src/problem.zig#L82
Good to know!
Holy cow
Don't have enough time right now to read all of this, but I have a new PR that gets us down the path towards multiline and commented code formatting: https://github.com/roc-lang/roc/pull/7695
Here's a preview of the supported syntax:
try moduleFmtsSame(
\\app [main!] { pf: platform "../basic-cli/platform.roc" }
\\
\\... # Elided for clarity and terseness - I love that this is parsed as valid code :-)
\\
\\main! : List(String) -> Result({}, _)
\\main! = |_| {
\\ world = "World"
\\ number = 123
\\ expect blah == 1
\\ tag = Blue
\\ return tag
\\ ...
\\ match_time(...)
\\ crash "Unreachable!"
\\ tag_with_payload = Ok(number)
\\ interpolated = "Hello, ${world}"
\\ list = [
\\ add_one(number), # Comment one
\\ 456, # Comment two
\\ 789, # Comment three
\\ ]
\\ record = { foo: 123, bar: "Hello", baz: tag, qux: Ok(world), punned }
\\ tuple = (123, "World", tag, Ok(world), (nested, tuple), [1, 2, 3])
\\ multiline_tuple = (
\\ 123,
\\ "World",
\\ tag1,
\\ Ok(world), # This one has a comment
\\ (nested, tuple),
\\ [1, 2, 3],
\\ )
\\ bin_op_result = Err(foo) ?? 12 > 5 * 5 or 13 + 2 < 5 and 10 - 1 >= 16 or 12 <= 3 / 5
\\ static_dispatch_style = some_fn(arg1)?.static_dispatch_method()?.next_static_dispatch_method()?.record_field?
\\ Stdout.line!(interpolated)?
\\ Stdout.line!("How about ${Num.toStr(number)} as a string?")
\\}
\\
\\expect {
\\ foo = 1 # This should work too
\\ blah = 1
\\ blah == foo
\\}
);
From here it'll be pretty mechanical
I need to add a function to get comments at the start of a multiline construct, and do some massaging of the comment to collapse excessive newlines
And of course adopt my new formatCollection function in about 12 places, adjusting some bad Region calculations along the way
The hardest part will definitely be the headers more than likely
And at the end I'll try to incorporate "trailing comma multiline forcing"
Sam Mohr said:
Benchmarking of just parsing and also eventually typechecking as well would be awesome to see side-by-side!
I'm also interested in this (obviously), but it won't be quite a fair fight since the grammar is in many ways much easier to parse now without WSS
And also, we are tokenizing before parsing now
I don't think it needs to be fair. Part of the rewrite of the compiler was improving the grammar
It is just important to remember that it is a multifaceted comparison, not just rust vs zig or DOD vs not, etc.
Cause at the end of the day, users will care about the compiler perf and won't particularly care about any of those other details.
These are only super preliminary numbers... I'm really hoping I didn't mess anything up. I don't think I did, but the numbers feel too good to be true:
M1 Mac
X86 Linux
Note, the old compiler does have a big disadvantage that it is way larger and loads slower. That said, this is parsing and formatting 100 files with ~1000 lines (24kb each), so I don't think load time should be a big factor.
more stats via poop
Aside: despite using 99% less memory to parse a file, I am still stunned how much memory we use. In an allocation-tracking profiler, I see ~10x memory usage compared to source file size... Actually, that makes a lot of sense. All tokens take up 12 bytes, which is ~10x more than the single byte a token takes up in the original source file.
I guess this is why we may not want to store the tag or length of tokens. Saves a ton of memory, and retokenizing is theoretically super fast.
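For illustration, a hypothetical token layout along those lines (not the actual compiler's structs):
const std = @import("std");

// A 1-byte tag plus two u32s pads out to 12 bytes - roughly 10x the
// ~1 byte an average token occupies in the source text.
const Token = struct {
    tag: Tag, // 1 byte, padded by u32 alignment
    offset: u32, // byte offset into the source
    length: u32, // byte length of the token text
    pub const Tag = enum(u8) { lower_ident, upper_ident, int, string };
};

// The slimmer alternative: keep only the offset, and re-derive the tag
// and length by retokenizing from that offset on demand.
const SlimToken = struct { offset: u32 };

test "token sizes" {
    try std.testing.expectEqual(@as(usize, 12), @sizeOf(Token));
    try std.testing.expectEqual(@as(usize, 4), @sizeOf(SlimToken));
}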
Also cc @Andrew Kelley cause I know you were interested in what perf numbers we get with the new compiler. As said, very preliminary numbers, but the parse-to-format loop looks to be 5 to 10x faster with 500x less memory usage.
Ah, I think I found some of the discrepancy. The old compiler has extra checks to ensure that formatting is stable and not bugged. I need to disable those and see what cost savings it gets.
Ok, disabled all the extra reformatting and validation logic. So both codebases are just doing their parse to format loop. So should be more apples to apples now. We are still 3 to 5x faster with the new compiler (and 200x less memory).
M1 Mac
X86 Linux
And detailed stats with poop still look great (this is with hot cache unlike the commands above).
stats
And we haven't even looked at SIMD yet!
Also, about half of execution time is spent in tokenizing, quarter in parsing, and quarter in formatting.
wowwww, this is great!
I'm personally excited that half is tokenizing - means the simd style might actually have a benefit beyond being fun to try implementing :grinning_face_with_smiling_eyes:
RIIZ :high_voltage:
are we missing any parsing features that might make it apples-to-oranges?
(aside from deprecated things not needing to be parsed of course)
or is parsing feature-complete at this point?
We definitely aren't at parity in terms of edge cases, error handling, and comments, but the syntax grab bag looked featureful enough that it seemed worth testing:
https://github.com/roc-lang/roc/blob/327647a6161b96d06f6524ee393ab675c2fc1335/src/fmt.zig#L938
Extra note: with a tiny bit of allocation tuning (initCapacity instead of starting empty), we can get another ~9% faster. Avoiding allocating and copying during tokenization, by allocating one large array up front, can make tokenization 1.5x faster.
That said, this is with overallocating to ensure space for all tokens... might not be worth it memory-wise. It really depends on the average number of characters per token as to what the default should be. And that depends on how many comments someone has, among other things like variable name length.
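A minimal sketch of that tuning (the bytes-per-token ratio here is a made-up placeholder, not a measured value):
const std = @import("std");

const Token = struct { tag: u8, offset: u32, length: u32 };

// Preallocate the token buffer from the source size so the tokenizer
// never reallocates or copies mid-loop; the cost is overallocation.
fn makeTokenBuffer(gpa: std.mem.Allocator, source_len: usize) !std.ArrayList(Token) {
    // Assumption: ~3 source bytes per token on average.
    const estimated_tokens = source_len / 3 + 1;
    return std.ArrayList(Token).initCapacity(gpa, estimated_tokens);
}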
I didn't realize this thing I'm writing is THAT fast
I haven't even optimized for performance besides the memory-locality stuff
I would love to see us do SIMD tokenization
Let's hope we can get similar speed-ups in the more compute heavy parts of the compiler!
You can really see the advantages of the SoA architecture in the HUGE delta in cache misses. That's three orders of magnitude fewer!
Yeah, simply using less memory and allocating less does a metric ton.
The old compiler does ~30x more allocations (presumably of many tiny IR nodes). I'm actually a bit surprised it is so bad in the old compiler. I thought we put all that stuff in arenas, but apparently tons of stuff is not in the arenas. Then as you mention, DOD + SOA leads to way less memory usage in general which equates to way more cache hits.
And yeah, we have room for many optimizations.
In the old compiler, Defs has a bunch of Vecs that are not in the arenas
In fact, I'm pretty confident that was a memory leak
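For reference, the arena pattern being discussed, as a generic Zig sketch (not code from either compiler): many tiny IR-node allocations become pointer bumps into one block that is freed all at once.
const std = @import("std");

pub fn main() !void {
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit(); // frees every node in one shot

    const alloc = arena.allocator();
    const IrNode = struct { tag: u8, data: u32 };

    // Thousands of tiny allocations, each just a bump of the arena pointer.
    var i: usize = 0;
    while (i < 10_000) : (i += 1) {
        _ = try alloc.create(IrNode);
    }
}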
The new parser currently expects this syntax:
Map a b : List(a), (a -> b) -> List(b)
... but with PNC I would expect that to be:
Map(a, b) : List(a), (a -> b) -> List(b)
... no?
Brendan Hansknecht said:
Also cc Andrew Kelley cause I know you were interested in what perf numbers we get with the new compiler. As said, very preliminary numbers, but the parse-to-format loop looks to be 5 to 10x faster with 500x less memory usage.
very cool, thanks for sharing!
FWIW, part of the difference there is we're actually parsing a different (simpler) language now
I like to think that there's a relationship between syntax being simpler for the computer and also being simpler for the human to parse
Joshua Warner said:
FWIW, part of the difference there is we're actually parsing a different (simpler) language now
Yeah, it is a whole mix of things. I would guess the biggest gain is in terms of memory usage and allocations. The sheer amount of data being generated and processed by the old compiler almost certainly is the bottleneck. Even if we changed the grammar, I would expect the allocations, data movement, and cache misses to dominate.
After #7695 lands, this shows all of the syntax that will be parsed and formatted correctly (in this exact style):
try moduleFmtsSame(
\\# This is a module comment!
\\app [main!] { pf: platform "../basic-cli/platform.roc" }
\\
\\import pf.Stdout exposing [line!, write!]
\\
\\import # Comment after import keyword
\\ pf # Comment after qualifier
\\ .StdoutMultiline # Comment after ident
\\ exposing [ # Comment after exposing open
\\ line!, # Comment after exposed item
\\ write!, # Another after exposed item
\\ ] # Comment after exposing close
\\
\\import pkg.Something exposing [func as function, Type as ValueCategory, Custom.*]
\\
\\import BadName as GoodName
\\import
\\ BadNameMultiline
\\ as
\\ GoodNameMultiline
\\
\\Map a b : List(a), (a -> b) -> List(b)
\\MapML # Comment here
\\ a # And here
\\ b # And after the last arg
\\ : # And after the colon
\\ List( # Inside Tag args
\\ a, # After tag arg
\\ ),
\\ (a -> b) -> # After arrow
\\ List( # Inside tag args
\\ b,
\\ ) # And after the type decl
\\
\\Foo : (Bar, Baz)
\\
\\FooMultiline : ( # Comment after pattern tuple open
\\ Bar, # Comment after pattern tuple item
\\ Baz, # Another after pattern tuple item
\\) # Comment after pattern tuple close
\\
\\Some a : { foo : Ok(a), bar : Something }
\\SomeMl a : { # After record open
\\ foo : Ok(a), # After field
\\ bar : Something, # After last field
\\}
\\
\\SomeMultiline a : { # Comment after pattern record open
\\ foo # After field name
\\ : # Before field anno
\\ Ok(a), # Comment after pattern record field
\\ bar : Something, # Another after pattern record field
\\} # Comment after pattern record close
\\
\\Maybe a : [Some(a), None]
\\
\\MaybeMultiline a : [ # Comment after tag union open
\\ Some(a), # Comment after tag union member
\\ None, # Another after tag union member
\\] # Comment after tag union close
\\
\\SomeFunc a : Maybe(a), a -> Maybe(a)
\\
\\add_one_oneline = |num| if num 2 else 5
\\
\\add_one : (U64 -> U64)
\\add_one = |num| {
\\ other = 1
\\ if num {
\\ dbg # After debug
\\ some_func() # After debug expr
\\ 0
\\ } else {
\\ dbg 123
\\ other
\\ }
\\}
\\
\\match_time = |
\\ a, # After arg
\\ b,
\\| # After args
\\ match a {
\\ Blue | Green | Red -> {
\\ x = 12
\\ x
\\ }
\\ Blue # After pattern in alt
\\ | # Before pattern in alt
\\ Green
\\ | Red # After alt pattern
\\ -> {
\\ x = 12
\\ x
\\ }
\\ lower # After pattern comment
\\ -> 1
\\ "foo" -> # After arrow comment
\\ 100
\\ "foo" | "bar" -> 200
\\ [1, 2, 3, .. as rest] # After pattern comment
\\ -> # After arrow comment
\\ 123 # After branch comment
\\
\\ # Just a random comment
\\
\\ [1, 2 | 5, 3, .. as rest] -> 123
\\ [
\\ 1,
\\ 2 | 5,
\\ 3,
\\ .. # After DoubleDot
\\ as # Before alias
\\ rest, # After last pattern in list
\\ ] -> 123
\\ 3.14 -> 314
\\ 3.14 | 6.28 -> 314
\\ (1, 2, 3) -> 123
\\ (1, 2 | 5, 3) -> 123
\\ { foo: 1, bar: 2, ..rest } -> 12
\\ { # After pattern record open
\\ foo # After pattern record field name
\\ : # Before pattern record field value
\\ 1, # After pattern record field
\\ bar: 2,
\\ .. # After spread operator
\\ rest, # After last field
\\ } -> 12
\\ { foo: 1, bar: 2 | 7 } -> 12
\\ {
\\ foo: 1,
\\ bar: 2 | 7, # After last record field
\\ } -> 12
\\ Ok(123) -> 123
\\ Ok(Some(dude)) -> dude
\\ TwoArgs("hello", Some("world")) -> 1000
\\ }
\\
\\expect # Comment after expect keyword
\\ blah == 1 # Comment after expect statement
\\
\\main! : List(String) -> Result({}, _)
\\main! = |_| { # Yeah I can leave a comment here
\\ world = "World"
\\ number = 123
\\ expect blah == 1
\\ tag = Blue
\\ return # Comment after return keyword
\\ tag # Comment after return statement
\\
\\ # Just a random comment!
\\
\\ ...
\\ match_time(
\\ ..., # Single args with comment
\\ )
\\ some_func(
\\ dbg # After debug
\\ 42, # After debug expr
\\ )
\\ crash # Comment after crash keyword
\\ "Unreachable!" # Comment after crash statement
\\ tag_with_payload = Ok(number)
\\ interpolated = "Hello, ${world}"
\\ list = [
\\ add_one(
\\ dbg # After dbg in list
\\ number, # after dbg expr as arg
\\ ), # Comment one
\\ 456, # Comment two
\\ 789, # Comment three
\\ ]
\\ record = { foo: 123, bar: "Hello", baz: tag, qux: Ok(world), punned }
\\ tuple = (123, "World", tag, Ok(world), (nested, tuple), [1, 2, 3])
\\ multiline_tuple = (
\\ 123,
\\ "World",
\\ tag1,
\\ Ok(world), # This one has a comment
\\ (nested, tuple),
\\ [1, 2, 3],
\\ )
\\ bin_op_result = Err(foo) ?? 12 > 5 * 5 or 13 + 2 < 5 and 10 - 1 >= 16 or 12 <= 3 / 5
\\ static_dispatch_style = some_fn(arg1)?.static_dispatch_method()?.next_static_dispatch_method()?.record_field?
\\ Stdout.line!(interpolated)?
\\ Stdout.line!(
\\ "How about ${ # Comment after string interpolation open
\\ Num.toStr(number) # Comment after string interpolation expr
\\ } as a string?",
\\ )
\\} # Comment after top-level decl
\\
\\expect {
\\ foo = 1 # This should work too
\\ blah = 1
\\ blah == foo
\\}
);
One of the next things that needs to be done is to start transcribing the builtins to the v0.1 syntax. Where should they live? I'll start doing that and finishing the parser by parsing hosted, platform, and package headers
Preferably, we would make the old compiler able to migrate them to the new syntax
That would ensure folks have an easy way to migrate when the zig compiler comes out
It also makes it easier to benchmark against the old compiler if they both support the exact same syntax
I've been working on a syntax migration tool in the old compiler, and it's getting reasonably close. I'd say we should just use that tool to auto-translate the builtins and anything else we want to migrate, with a focus on improving the tool and rerunning it rather than doing hand fix-ups to the migrated code
100% agree
9 messages were moved from this topic to #ideas > Needed Function signature and lambda expr change by Anton.
Reading through the Static Dispatch document for the 27th time, am I right to understand that where clauses are only a part of a function's type?
The where is saying that a function uses variables that have constraints
This is just a parsing question, actually
You can also use them in aliases
Yes, but it would only appear as part of the type annotation of a function type?
We haven't ironed out the links there
Oh yeah, and in aliases
I think I'll just be implementing them as part of function types for now
But if anyone (ahem @Richard Feldman ) has a nice summary of the overall plan here (as well as changes in syntax), please let me know
We want to be able to say sort_list : List(a) -> List(a) where Sortable(a)
Though a.Sortable was also thrown around
So starting with just functions is fine for now
Yeah, I think the big issue is the current syntax - outside of aliases - is likely going to cause some parsing issues
Dict.insert : Dict k v, k, v -> Dict k v
    where k.Hash, k.Eq

Hash a : a
    where
        a.hash(hasher, a -> hasher),
        hasher.Hasher,
I'm actually wondering if I should attach these instead to annotations and type definitions
Because this being used inside of an aggregate type would be .... complicated
Just thinking out loud here...
so today we have
Hasher implements
    add_bytes : a, List U8 -> a where a implements Hasher
    add_u8 : a, U8 -> a where a implements Hasher
    add_u16 : a, U16 -> a where a implements Hasher
    add_u32 : a, U32 -> a where a implements Hasher
    add_u64 : a, U64 -> a where a implements Hasher
    add_u128 : a, U128 -> a where a implements Hasher
    complete : a -> U64 where a implements Hasher
and I think we would have with v0.1
Hasher a : a where
    a.add_bytes(List U8) -> a,
    a.add_u8(U8) -> a,
    a.add_u16(U16) -> a,
    a.add_u32(U32) -> a,
    a.add_u64(U64) -> a,
    a.add_u128(U128) -> a,
    a.complete() -> U64
Does that look correct?
And that allows for us to then have something like:
# In `Dict.roc`
insert : Dict k v, k, v -> Dict k v
    where
        k.Hash,
        k.Eq,

# In `Str.roc`
hash : Str, Hash.Hasher -> Hash.Hasher
hash = {
    ... # Implementation
}

# In `Hash.roc`
Hash a : a
    where
        a.hash(hasher) -> hasher,
        hasher.Hasher,

Hasher a : a
    where
        a.add_bytes(List U8) -> a,
        a.add_u8(U8) -> a,
        a.add_u16(U16) -> a,
        a.add_u32(U32) -> a,
        a.add_u64(U64) -> a,
        a.add_u128(U128) -> a,
        a.complete() -> U64
And then a Dict(Str, Str) is valid (assuming similar is implemented for Eq)
So each clause in a where is one of:
Anthony Bullard said:
Reading through the Static Dispatch document for the 27th time, am I right to understand that
where
are only a part of a function's type?
Yep
Anthony Bullard said:
Just thinking out loud here...
so today we have
Hasher implements
    add_bytes : a, List U8 -> a where a implements Hasher
    add_u8 : a, U8 -> a where a implements Hasher
    add_u16 : a, U16 -> a where a implements Hasher
    add_u32 : a, U32 -> a where a implements Hasher
    add_u64 : a, U64 -> a where a implements Hasher
    add_u128 : a, U128 -> a where a implements Hasher
    complete : a -> U64 where a implements Hasher
and I think we would have with v0.1
Hasher a : a where
    a.add_bytes(List U8) -> a,
    a.add_u8(U8) -> a,
    a.add_u16(U16) -> a,
    a.add_u32(U32) -> a,
    a.add_u64(U64) -> a,
    a.add_u128(U128) -> a,
    a.complete() -> U64
Does that look correct?
Yep
Anthony Bullard said:
Dict.insert : Dict k v, k, v -> Dict k v
    where k.Hash, k.Eq

Hash a : a
    where
        a.hash(hasher, a -> hasher),
        hasher.Hasher,
I'm actually wondering if I should attach these instead to annotations and type definitions
Hash looks to be defined wrong. I think it would be a.hash(hasher) -> hasher. That or it needs to be defined differently to be module-based instead of standard static dispatch.
The other way to define it would be module(a).hash(hasher, a) -> hasher
What does the module(a) buy us over just a?
Is that for when the argument of type a is not the first arg (in the "receiver" position)?
Yes
Need it for decode
Cause decode is List(U8) -> a
It'll probably take a decoding format as well, or something like that
Decode a : a where a.decode(List(U8)) -> a
Encode a : a where a.encode(a) -> List(U8)
How would that definition of decode work? You won't have an a if you are trying to decode an a.
I think it has to be
Decode a : a where module(a).decode(List(U8)) -> a
Sorry
I was just typing out syntax
That wasn’t meant to be real (or actually submitted)
The PR for parsing and formatting is here: https://github.com/roc-lang/roc/pull/7745
My absolute rough draft, first take re-implementation of Hash.roc from the builtins based on all of the syntax for v0.1 that is available today:
module [
    Hash,
    Hasher,
    hash_list,
    hash_unordered,
]

import Num exposing [
    U8,
    U16,
    U32,
    U64,
    U128,
]

## A value that can be hashed.
Hash a : a
    where
        ## Hashes a value into a [Hasher].
        ## Note that [hash] does not produce a hash value itself; the hasher must be
        ## [complete]d in order to extract the hash value.
        a.hash(hasher) -> hasher,
        hasher.Hasher,

## Describes a hashing algorithm that is fed bytes and produces an integer hash.
##
## The [Hasher] ability describes general-purpose hashers. It only allows
## emission of 64-bit unsigned integer hashes. It is not suitable for
## cryptographically-secure hashing.
Hasher a : a
    where
        ## Adds a list of bytes to the hasher.
        a.add_bytes(List(U8)) -> a,
        ## Adds a single U8 to the hasher.
        a.add_u8(U8) -> a,
        ## Adds a single U16 to the hasher.
        a.add_u16(U16) -> a,
        ## Adds a single U32 to the hasher.
        a.add_u32(U32) -> a,
        ## Adds a single U64 to the hasher.
        a.add_u64(U64) -> a,
        ## Adds a single U128 to the hasher.
        a.add_u128(U128) -> a,
        ## Completes the hasher, extracting a hash value from its
        ## accumulated hash state.
        a.complete() -> U64,

## Adds a list of [Hash]able elements to a [Hasher] by hashing each element.
hash_list : hasher, List a -> hasher
    where
        a.Hash,
        hasher.Hasher,
hash_list = |hasher, lst|
    lst.walk(
        hasher,
        |accum_hasher, elem|
            elem.hash(accum_hasher),
    )

HashFunction a : hasher, a -> hasher where a.Hash, hasher.Hasher

HashWalker container elem : hasher, container, HashFunction(elem) -> hasher
    where
        elem.Hash,
        hasher.Hasher,

## Adds a container of [Hash]able elements to a [Hasher] by hashing each element.
## The container is iterated using the walk method passed in.
## The order of the elements does not affect the final hash.
hash_unordered : hasher, container, HashWalker(container, elem) -> hasher
    where
        elem.Hash,
        hasher.Hasher,
hash_unordered = |hasher, container, walk| {
    acc = walk(
        container,
        0,
        |accum, elem| {
            x =
                # Note, we intentionally copy the hasher in every iteration.
                # Having the same base state is required for unordered hashing.
                elem
                    .hash(hasher)
                    .complete()
            next_accum = accum.add_wrap(x)
            if next_accum < accum {
                # we don't want to lose a bit of entropy on overflow, so add it back in.
                next_accum.add_wrap(1)
            } else {
                next_accum
            }
        },
    )
    hasher.add_u64(acc)
}
If that looks correct to everyone, I'll add a snapshot for it
I assume a.add_u8(U8) -> hasher, should be a.add_u8(U8) -> a, same for the other functions in Hasher
Also, is it now Hash(a) and Hasher(a) instead of Hash a and Hasher a?
(hasher, b, (hasher, a -> hasher) -> hasher) could use an alias to be less confusing. That or at least change the name b to container and a to elem.
Yeah, looks roughly correct.
Also, I wonder if we need a better way to apply aliases. hasher.Hasher, doesn't look good.
Like it doesn't feel like it is saying that hasher implements the Hasher ability. I wonder if it would be clearer to just do Hasher(hasher) or something similar... not sure
Hey @Richard Feldman, do we have any plans on fixing byte hashing for static dispatch? As in, a List U8 will by default hash via hash_list instead of add_bytes. As such, it leaves a lot of performance on the table by default.
Really, in a perfect world, all number types would hash via add_bytes, and only complex types would use hash_list, which walks an element at a time.
hm, I'm not familiar with the distinction :sweat_smile:
I assume hash_list is hashing each byte individually, which of course is suboptimal
but what does add_bytes do?
Yeah, hash_list is the one-element-at-a-time thing
And add_bytes leaves it up to the hasher and thus can run way faster algorithms
The issue is that we cannot specialize hash on List U8, only on List a. So we get stuck hashing List U8 with hash_list unless a user manually overrides it in a custom type's hash implementation.
As an aside, theoretically for most structural types the best hashing would be to hash like add_bytes but to mask out the padding, but that is another level of complexity to even orchestrate.
Brendan Hansknecht said:
I assume a.add_u8(U8) -> hasher, should be a.add_u8(U8) -> a, same for other functions in Hasher
Also, is it now Hash(a) and Hasher(a) instead of Hash a and Hasher a
(hasher, b, (hasher, a -> hasher) -> hasher) could use an alias to be less confusing. That or at least change the name b to container and a to elem.
Yeah, looks roughly correct.
I fixed the couple of things that were obviously wrong, and made some type aliases to make hash_unordered's signature readable.
Currently a Type Header still uses space separated args. Did this change @Richard Feldman ?
This is why it would be very nice if there were a Grammar specification - outside of the implementation in the parser - that we could use to track the parser's compliance (and therefore completion)
As good as Zulip is, trying to find the answer to every such question is very difficult
In
we decided to use parens for type arguments in the annotation itself, but no real direction on type alias headers
good question! I think we should match what patterns do
so for example:
Alias(a, b) : ...
Brendan Hansknecht said:
The issue is that we cannot specialize hash on List U8, only on List a. So we get stuck hashing List U8 with hash_list unless a user manually overrides it in a custom type hash implementation.
ah, yeah static dispatch doesn't actually help with this; the List type is still in the List module.
Good point! The hash_list function should be in the list module
And ok on the change to type alias headers
I can do that real quick before work
Brendan Hansknecht said:
As an aside, theoretically for most structural types the best hashing would be to hash like add_bytes but to mask out the padding, but that is another level of complexity to even orchestrate.
:thinking: we could make it so that all the non-pointers in structural types get hashed using add_bytes. They should all end up right next to each other anyway because of alignment, and we can't do better than that because we need to chase the pointers anyway (and not include the addresses in the hash)
And then we need:
- [-2, 0, 2].map(.abs().sub(1)). Tearoffs?
- var statements
- for statements
And then v0.1 parsing will be complete
Obviously the formatter will need quite a bit of iteration to dial in the style over time
@Richard Feldman have we ever finalized the syntax and semantics for a) Tearoffs and b) local function application in a static dispatch chain?
a) let's hold off on that for now. I'd like to see how often people actually want to reach for it in practice once we have static dispatch. Might turn out to not be worth doing.
b) yeah, we settled on a->b in https://roc.zulipchat.com/#narrow/stream/304641-ideas/topic/static.20dispatch.20-.20pass_to.20alternative
more specifically:
a->b(1, 2)->c(3, 4) desugars to c(b(a, 1, 2), 3, 4)
you can drop the () from these when there are no extra args to pass, so for example you can do ->Ok instead of ->Ok(), and 4->hours->ago! instead of 4->hours()->ago!()
the formatter drops ()s that are used after ->
Just to clarify about b - We decided on -> for local dispatch / "Arrow calls", but there was some suggestion (not really touched upon) that match clauses would drop ->. Is that a thing?
Well, this change was a tad more difficult than I expected - it creates some contention with a plain applied Tag expression. Have to make statements differentiate between top-level and inside of a block.
So I'll try to finish it tomorrow morning :-)
Now off to writing AI Agents...
Anthony Bullard said:
Just to clarify about b - We decided on -> for local dispatch / "Arrow calls", but there was some suggestion (not really touched upon) that match clauses would drop ->. Is that a thing?
we decided to use => with match for now
Anthony Bullard said:
Now off to writing AI Agents...
Cool, what are you working on specifically?
Richard Feldman said:
Brendan Hansknecht said:
As an aside, theoretically for most structural types the best hashing would be to hash like add_bytes but to mask out the padding, but that is another level of complexity to even orchestrate.
:thinking: we could make it so that all the non-pointers in structural types get hashed using add_bytes. They should all end up right next to each other anyway because of alignment, and we can't do better than that because we need to chase the pointers anyway (and not include the addresses in the hash)
Ah yeah, structural types without pointers can do what I said when in a list. If you have pointers, you need to do things an element at a time
Anton said:
Anthony Bullard said:
Now off to writing AI Agents...
Cool, what are you working on specifically?
Wish I could be specific, but let's just say it's exploration around UI generation when the UI is not created from source code
Type alias header using parens for args: https://github.com/roc-lang/roc/pull/7749
My next steps:
1. match clauses to using fat arrow =>
2. var statements
3. for construct
I'll probably do 1 & 2 in a single PR, hopefully by EOW.
Then 3 & 4 should be relatively straightforward so I'm hoping to have everything done before the next contributor meet up.
Move match clauses to fat arrow: https://github.com/roc-lang/roc/pull/7751
Parse and format "Local dispatch arrow calls": https://github.com/roc-lang/roc/pull/7780
Parse and format var and for statements: https://github.com/roc-lang/roc/pull/7783
Unless something has been missed, the above should functionally complete the parser, parse IR, and formatter for the v0.1 syntax
yoooooooo :heart_eyes::heart_eyes::heart_eyes:
now to find something difficult to work on :thinking:
(and i want to make it very clear that the formatter style is still a WIP but it works, is stable, and loses no information)
Brendan Hansknecht said:
Ok, disabled all the extra reformatting and validation logic. So both codebases are just doing their parse to format loop. So should be more apples to apples now. We are still 3 to 5x faster with the new compiler (and 200x less memory).
M1 Mac
X86 Linux
I'm going on the Changelog podcast tomorrow, and it occurred to me that it would be cool to be able to talk about redoing these benchmarks now that we have a feature-complete parser and formatter!
would be sweet to be able to have not only a comparison with the old compiler, but also something like "we can now parse the equivalent of _____ lines of code per second"
I won't have time between now and then, but if anyone has time to run those benchmarks, would be great to have them! would be awesome to inspire some people to get involved if they're interested in helping out with a performance-focused compiler :smiley:
i would love to see those numbers as well! i don't have a lot of time at the moment but i'd love to brag about how much better we've made things
if the existing benchmarks hold up it's 3.9M LOC/s
oh and that's parse and formatting across 100 files
That portends wonderful things for the LS as well
Anthony Bullard said:
if the existing benchmarks hold up it's 3.9M LOC/s
is that for parse and format?
or just parse
It says parse and format loop
which means that includes at least a couple hundred sys calls for reading and writing files
correct me if that's wrong @Brendan Hansknecht
Yeah, parse and format. If I recall correctly, only about 20% is formatting....
I probably could pull more updated numbers. Biggest caveat being I don't have real meaningful source code in both old and new parser syntax. Last time I was using a modified syntax grab bag which is not that much like real world code.
Could Claude generate some realistic source code?
I know we wanted to make the old parser auto-update, but features like abilities make the standard library fail to port over.
is the rust compiler v0.1 migrator finished?
@Joshua Warner
I think it is mostly there but has edge cases and clearly can't handle things like abilities.
we could try to just find a number of source files in different projects that succeed
@Richard Feldman when do you need the numbers by?
Richard Feldman said:
I'm going on the Changelog podcast tomorrow...
I think we can give the existing numbers as showing at least the order of magnitude
and caveat it
@Brendan Hansknecht noon Pacific is when the recording starts
Just testing the formatter now to see if I can get more meaningful code migrated (like some of the builtins).
Definitely a few pieces missing for migration:
I still will try to manually convert some things and see how it goes.
hmm...something also definitely goes wrong with closures and braces.
Also @Anthony Bullard, found a parser bug. This definitely should parse:
expect {
    foo : U32
    foo = 1 # This should work too
    blah = 1
    blah == foo
}
Specifically, the foo : U32 line breaks parsing.
surprising that it doesn't
i thought i had a similar test
is it the U32?
I don't think so. I hit it with List(a) first.
For some reason it seems to expect the expect to end right after that line.
ah it must think it's a record literal
that shouldn't happen. i check for this and bail out of a record and parse it as a block
I can take a look in the morning but that's the best i can do
Yeah, I don't need it fixed, just noting it
ok sorry about that
i'll add the above as a snapshot and makes sure it parses
One other question for you: does the new zig compiler attempt to parse things like List a and convert them to List(a)?
I don't think we should try to do that honestly
I think the amount of complexity wouldn't be worth it, considering LLMs can now do that sort of migration well enough :smile:
and I remember a conversation where we discussed that if we knew the new parser didn't need to support whitespace application at all, it would significantly simplify several areas
yeah no white space application at all
love the idea of parsing the stdlib as a point of comparison btw! :smiley:
Yeah, just was curious cause in the half-migrated list.roc file I have, it has some cases where it doesn't seem to care about List a but others where it does care... no idea why.
love the idea of parsing the stdlib as a point of comparison btw!
Yeah, just not sure if I can port the stdlib in time to get meaningful numbers. Theoretically an llm might be able to help, but they definitely don't know roc currently.
Doesn't have to be a fully functional port, but at least needs to parse and format.
ah yeah I guess we don't have an equivalent of https://www.roc-lang.org/builtins/llms.txt for the new syntax
could try giving it some basic examples but maybe not enough
or like tests in the source code maybe?
Oh, one other parser bug found. The new parser doesn't understand doc comments ## ...
that's super easy to add. i thought the tokenizer handled that
we definitely don't have a concept of a doc comment in the parser because the parser doesn't know about comments at all
Ah yeah, fair
either way should be very straight forward
there's a function called chompTrivia that handles it i believe
in the tokenizer
Also, to note: I think it may just be the formatting that needs to be fixed. It outputs # # ...
we may need to start keeping doc comments (although not regular ones) around for parsing so we can parse their markdown but yeah
Hmm, looks like we are missing record destructuring in the new syntax:
{ start, end, step } = rec
expr_unexpected_token, at token OpAssign at ...
And maybe missing guards on match statements.
[x, .. as rest] if x == delimiter =>
expr_unexpected_token, at token OpFatArrow at ...
Ok, so I ported over List.roc. I think it is a reasonably representative file for well documented roc code. It is about ~800 LOC and ~800 lines of comments/blank/bracket only.
The new version is about 100 lines of brackets more than the old version. That said, total byte count is pretty similar. I padded the old roc file with an extra comment to make them the exact same number of bytes.
All numbers on M1 mac
Formatting 1000 files that are all List.roc
Formatting List.roc single file with 1000x the body size
Also, definitely room for more gains like making chomptrivia faster.
@Richard Feldman here are some rough numbers just based on List.roc.
Clearly we leak memory with every new file in the old rust compiler.
Also, big numbers that matter:
- ~3 to 4x faster than the rust compiler (parse and format combined)
- ~2 to 3 million lines of code parsed per second
- ~5 to 6 million lines per second when including comments, bracket-only, and blank lines
making chomptrivia faster
Nice, sounds like a fun project :racecar:
Yeah, chomp trivia and string interning are the two things that stick out most.
The plan is to do some SIMD stuff there? It sounds like it might be a really contained side project. Would it be too early to explore that now?
Someone could definitely do this now if they are interested.
I had an initial stab at it... https://github.com/roc-lang/roc/pull/7815
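For a rough idea of the shape this takes, here's a minimal sketch of SIMD whitespace skipping (not the PR's actual code; the real chompTrivia also has to handle comments and newline tracking):
const std = @import("std");

// Classify 16 bytes at a time: compare the chunk against each trivia
// byte, then use the resulting bitmask to find the first non-trivia byte.
fn skipWhitespace(src: []const u8, start: usize) usize {
    const V = @Vector(16, u8);
    var i = start;
    while (i + 16 <= src.len) : (i += 16) {
        const chunk: V = src[i..][0..16].*;
        const is_trivia = (chunk == @as(V, @splat(' '))) |
            (chunk == @as(V, @splat('\t'))) |
            (chunk == @as(V, @splat('\r'))) |
            (chunk == @as(V, @splat('\n')));
        const mask: u16 = @bitCast(is_trivia);
        // Some byte in this chunk is not trivia: stop at the first one.
        if (mask != 0xFFFF) return i + @ctz(~mask);
    }
    // Scalar fallback for the (< 16 byte) tail.
    while (i < src.len and std.ascii.isWhitespace(src[i])) i += 1;
    return i;
}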
@Brendan Hansknecht can you share your List.roc file so I can test this.
I don't know enough about SIMD to really review your implementation @Luke Boswell but it looks good to me, thank you for adding tests to check correctness!
Brendan Hansknecht said:
Hmm, looks like we are missing record destructuring in the new syntax:
{ start, end, step } = rec
expr_unexpected_token, at token OpAssign at ...
Can you share the overall file and I'll add it as a snapshot so that I can fill in the gaps. To be clear, this is supposed to be supported. I can see even from the syntax grab bag (and verified by looking at Parser.parseStmt) that I must have just forgotten to look for OpenCurly and OpenSquare in the statement position, to check for them being used as a pattern before sending things over to parseExpr.
And the match guards were also, for some reason, just not on my checklist
the Changelog episode recording went great, and I used the numbers in it - thanks for doing those @Brendan Hansknecht and thanks for getting the parser and formatter to the point where we could actually compile a whole builtin module with them @Anthony Bullard! :heart_eyes:
episode should be out in a week or two
@Luke Boswell I ran it on a folder with 1000 copies of this file
file_1.roc
and a version of it that just duplicated the body 1000 times
Can you share the overall file and I'll add it as a snapshot that I can fill in gaps.
I ended up removing everything that caused a problem in order to get it compiling.... I can try to find and re-add all the things that broke the tokenizer/parser. Might not have time to do so tonight though. Should just be the two issues I mentioned above and then many commented out type annotations.
This is good. Thank you. I'm currently rewriting a bunch of things. I've been experimenting with the LLM workflow, and it's amazing how useful but also unhelpful the AI can be. It gives code that looks pretty good but can be totally useless. I'm rebuilding my test cases to ensure it's working correctly.
Might not have time to do so tonight though
Actually, I am confused on what day it is. I should have time tonight, don't have time tomorrow night. I'll try to skim through the file and re-add the various pieces I had to remove.
@Anthony Bullard I think this file should work for your snapshot test. Also, beyond just making it parse, it definitely needs some formatting cleanup.
Edit: newer version of the file with another fix.
List.roc
For example:
walk_backwards_help = |list, state, f, index_plus_one|
    if index_plus_one == 0
        state
    else {
        index = Num.sub_wrap(index_plus_one, 1)
        next_state = f(state, get_unsafe(list, index))
        walk_backwards_help(list, next_state, f, index)
    }
I keep remembering... oh, and one more thing for this... In the new compiler we don't allow _intentionally_named_but_ignored. It has to just be _. Was this intentional? I like named ignored variables, but maybe not everyone does and we decided to remove it?
I think it's meant to be the same, i.e. _named_ignored_thing is still valid. It's probably just been overlooked/not implemented yet.
Luke Boswell said:
The plan is to do some SIMD stuff there? It sounds like it might be a really contained side project. Would it be too early to explore that now?
I followed the updates and PR on this, but I think it's too early for SIMD; I would not add serious complexity to code that is still expected to change and is not a bottleneck.
I agree. It was fun to explore a little, but turned out to be a little more complicated than I thought.
Anton said:
I followed the updates and PR on this, but I think it's too early for SIMD, I would not add serious complexity to code that is still expected to change and not a bottleneck.
Luckily, chompTrivia is no longer expected to change, and it is a bottleneck.
It also is such a small chunk of code that it would be trivial to switch back if we need
I think small localized optimizations like this should be totally fine.
I say this more in terms of, people should feel free to explore optimizations and we should definitely accept them if they are small and self contained. Don't want to block that energy.
I agree, I forgot that this was about chompTrivia
specifically
For the bottleneck I was thinking about parsing not being a bottleneck in compilation but yeah small, self-contained and not expected to change sounds good
Yeah, for dev with an interpreter, will be interesting to see the bottlenecks
Also, I guess formatting a ton of files is another workflow we want to be super fast
But yeah, in general I agree
Richard Feldman said:
the Changelog episode recording went great, and I used the numbers in it - thanks for doing those Brendan Hansknecht and thanks for getting the parser and formatter to the point where we could actually compile a whole builtin module with them Anthony Bullard! :heart_eyes:
Can't wait to see it myself!
I've noticed we don't parse tuple patterns... are these planned but just not implemented yet?
The parser only looks for pattern assignments when it sees a LowerIdent token. For tuple patterns starting with (, it falls through to the default case, where it parses an expression statement instead of a pattern assignment.
To fix this, we would need to:
1. Handle .OpenRound and .NoSpaceOpenRound in parseStmtByType
2. Parse the tuple pattern, then look for OpAssign
3. Create a .decl statement with the pattern
However, this approach has a problem: we'd need to look ahead quite a bit to determine if we have a pattern assignment vs just a tuple expression. A better approach might be to:
1. Parse an expression, then look for OpAssign
2. Convert the expression to a pattern and create a .decl statement
This is likely why the current parser implementation only supports simple identifier patterns in assignments - handling complex patterns requires either significant lookahead or the ability to convert expressions to patterns after parsing.
The snapshot shows this limitation clearly - tuple "patterns" are being parsed as tuple expressions, leading to the "expr_unexpected_token" error when the parser encounters the = sign.
I thought these would be supported, but I think they haven't been implemented in the Parser yet
~~~META
description=Tuple pattern matching tests
type=expr
~~~SOURCE
{
    # Simple tuple destructuring
    (x, y) = (1, 2)
    # Nested tuple patterns
    ((a, b), (c, d)) = ((10, 20), (30, 40))
    # Mixed patterns with literals
    (first, second, third) = (100, 42, 200)
    # Tuple with string and tag patterns
    (name, string, boolean) = ("Alice", "fixed", True)
    # Tuple with list pattern
    (list, hello) = ([1, 2, 3], "hello")
    {}
}
check the syntax grab bag snapshot
that's all of the implemented syntax
if tuples aren't there i think it was an oversight
I think it is specifically for pattern matching that they aren't implemented.
I mentioned this a while ago in this thread when porting List.roc over
should be trivial to fix, but it would be MUCH easier for all of this if decls started with a keyword :rolling_on_the_floor_laughing:
Like let? What are you thinking
yeah
this is how we end up with nice syntax: we do the thing that's nicer for the end user but harder for the compiler authors :grinning_face_with_smiling_eyes:
oh totally! not suggesting we SHOULD
just stating the simple fact that patterns in decls with no keyword, being almost indistinguishable from an expr of the same shape until you find the equals sign, are complicated, specifically for things like lists and tuples
with a record i know after two tokens
ah for sure!
but list and tuple patterns can mean a theoretically unbounded lookahead
What do you think of Claude's suggestion
handling complex patterns requires either significant lookahead or the ability to convert expressions to patterns after parsing.
Could we convert it from an expression to pattern in Can?
possibly
FWIW I’d actually like to have the same underlying ast nodes be reused for exprs and patterns
but i think it would be a pain, but i can look at it
That way we don’t have to convert (only verify)
Joshua Warner said:
FWIW I’d actually like to have the same underlying ast nodes be reused for exprs and patterns
there'd have to be near perfect symmetry between them for me to roll that way. i can take a look at that tomorrow
it would unlock some nice characteristics
really removing the need for lookahead at all
the only issue i can think of off top is alternates
There’s not a perfect symmetry, but it’s close enough to be useful IMO. The only alternative is allowing conversions, which I think will end up being a bit costly from a perf perspective
like Foo(1 | 2)
Alternates can’t occur in expr position anyway (at the start of a statement)
yeah but i wouldn't want to underfit the node to support something for this kind of pattern that's also used for exprs
Oh to be clear I’m taking about the internal node types being shared, not the types exposed to the rest of the compiler
i mean all nodes internally are identical
Right, but expressions and patterns should use the same node type IDs 
We can then translate them into different public enumeration types, depending on the context
it's just the different tags and how those tags affect the interpretation of the data and extra data
Yep, so the same internal tags, but different external enumerations 
so all use expr ids
but you get / set with specific functions for pattern or expr?
I was imagining we would also have a pattern ID that sneaky uses the same ID space as expression IDs
Sneakily
they all have the same address space
they are all just integer offsets into a single array list
Errrr… right, I forgot about that.
so this shouldn't be too hard at all
I guess the only critical thing then is making sure that if you were to take a pattern ID and cast it to an expression ID, then decoded as an expression, it should “just work”
yes
actually that might not be that much work
(famous last words)
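As a minimal sketch of what's being described (hypothetical names and tags, not the actual AST code): one node array and one u32 ID space, with separate Expr and Pattern views that decode the same tags.
const std = @import("std");

// One shared node store; expressions and patterns use the same tags
// and the same u32 ID space.
const Node = struct {
    tag: Tag,
    main_token: u32,
    pub const Tag = enum(u8) { num, list, tuple };
    pub const Idx = enum(u32) { _ };
};

const NodeStore = struct {
    nodes: std.ArrayListUnmanaged(Node) = .{},

    fn get(self: *const NodeStore, idx: Node.Idx) Node {
        return self.nodes.items[@intFromEnum(idx)];
    }
};

// Two public views over the same storage. Casting a pattern ID to an
// expression ID "just works" because both decode the same node.
pub const Expr = union(enum) { num: u32, list: Node.Idx, tuple: Node.Idx };
pub const Pattern = union(enum) { num: u32, list: Node.Idx, tuple: Node.Idx };

fn getExpr(store: *const NodeStore, idx: Node.Idx) Expr {
    return switch (store.get(idx).tag) {
        .num => .{ .num = store.get(idx).main_token },
        .list => .{ .list = idx },
        .tuple => .{ .tuple = idx },
    };
}

fn getPattern(store: *const NodeStore, idx: Node.Idx) Pattern {
    return switch (store.get(idx).tag) {
        .num => .{ .num = store.get(idx).main_token },
        .list => .{ .list = idx },
        .tuple => .{ .tuple = idx },
    };
}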
Naturally, these code paths to convert to the external types will act as correctness assertions on the purse parser algorithms that are deciding what is and is not a valid expression (or pattern)
Yeah hopefully
lol dictating the word parser is impossibrl
wow you are right!
just tried myself
Does the parser currently support multiple patterns in match? i.e. A | B => ...
The type in parse/AST.zig looks like:
pub const MatchBranch = struct {
    pattern: Pattern.Idx,
    body: Expr.Idx,
    region: TokenizedRegion,
};
So I'm guessing either it's not yet implemented, or when we parse we split A and B into their own branches pointing to the same expr?
I did think about that. I haven't implemented it (or at least not deliberately) in my PR
My guess is we'll split that into multiple branches... but the Parser should be where that is done I think. Actually I'm not sure...
We don't want to desugar in this version of the compiler, so we probably need a different approach to handle this.
In CIR, it has a span. Can the parse AST do the same?
pub const Branch = struct {
    patterns: Match.BranchPattern.Span,
    ...
}
I think so, it's a fair change to the parser. We may want to wait until after @Anthony Bullard has done his thing before we change it.
seems like higher-order function type annotations don't parse without parens right now :sweat_smile:
this parses:
foo : ((Str -> Str) -> Str)
...but without the outer parens, this fails to parse:
foo : (Str -> Str) -> Str
that's definitely a bug
if you add a snapshot for it i'll fix it
thanks! I pushed a repro snapshot to higher-order-annotations