Stream: compiler development

Topic: zig compiler - parser


view this post on Zulip Sam Mohr (Feb 18 2025 at 01:49):

Joshua Warner said:

Sam Mohr the reason I'd like not to just directly have a header/statements is I want this to be able to directly parse individual expressions both for testing and for repl evaluation

Okay, then what's the best way to:

view this post on Zulip Notification Bot (Feb 18 2025 at 02:22):

A message was moved here from #compiler development > zig compiler - spike by Joshua Warner.

view this post on Zulip Joshua Warner (Feb 18 2025 at 02:24):

@Sam Mohr Why do you need to just parse the header? My intent would be to make parsing fast enough that parsing the whole file is plenty fast.

view this post on Zulip Sam Mohr (Feb 18 2025 at 02:25):

We'd want to parse just the header of the root main.roc of each app/package/platform to get their dep packages to discover the file trees of each package

view this post on Zulip Sam Mohr (Feb 18 2025 at 02:25):

But if it's fast enough to just parse the whole thing, then no need

view this post on Zulip Sam Mohr (Feb 18 2025 at 02:26):

Also, if we persist said list of imported packages in the CanIR, then we can just pull it from the cache and keep around the CanIR until later

view this post on Zulip Joshua Warner (Feb 18 2025 at 02:29):

Cool. We can explore that optimization in the future. that does add a non-trivial amount of complexity though so I'd like to do the simple thing for now

view this post on Zulip Sam Mohr (Feb 18 2025 at 02:30):

Yeah, totally agreed

view this post on Zulip Joshua Warner (Feb 18 2025 at 03:37):

Ok, I think this test should get you going:

test "canonicalize" {
    const source =
        \\app [name] { pf: platform "main.roc" }
        \\
        \\name = "Luke"
    ;

    var parse_ir = parse.parse(std.testing.allocator, source);
    var can_ir = IR.init(std.testing.allocator);
    parse_ir.store.emptyScratch();
    canonicalize(&can_ir, &parse_ir, std.testing.allocator);
}

view this post on Zulip Joshua Warner (Feb 18 2025 at 03:38):

The primary problem with the test you posted is just that module headers are not currently supported... and the particular way error recovery happens right now is looping forever (whoops!), which made that harder than expected to figure out.

view this post on Zulip Joshua Warner (Feb 18 2025 at 03:39):

(whoops left some debug stuff in; now removed)

view this post on Zulip Sam Mohr (Feb 18 2025 at 03:56):

Okay, this is a good start!

view this post on Zulip Joshua Warner (Feb 18 2025 at 04:32):

If I wanted to start looking at intern'ing for symbols in the tokenizer - is there an existing structure you're expecting that data to go into?

view this post on Zulip Sam Mohr (Feb 18 2025 at 04:34):

The interners in the ModuleEnv https://github.com/roc-lang/roc/blob/26f9416929aa0cd52ca732fc533b4a94a690de04/src/base/ModuleEnv.zig#L22

view this post on Zulip Sam Mohr (Feb 18 2025 at 04:36):

Ideally, we'd put text in the "right" bucket:

view this post on Zulip Sam Mohr (Feb 18 2025 at 04:37):

But if that turns out to be difficult/not possible to do consistently during tokenization, then maybe we reduce down to just Ident.Store

view this post on Zulip Joshua Warner (Feb 18 2025 at 04:39):

I can split lower and upper idents, that's about it.

view this post on Zulip Sam Mohr (Feb 18 2025 at 04:39):

I figured

view this post on Zulip Joshua Warner (Feb 18 2025 at 04:39):

At the layer of hashing, do they need to be in separate buckets?

view this post on Zulip Sam Mohr (Feb 18 2025 at 04:39):

Not sure what you mean by the layer of hashing

view this post on Zulip Sam Mohr (Feb 18 2025 at 04:40):

I think we should just make 2 buckets, upper_idents and lower_idents

view this post on Zulip Joshua Warner (Feb 18 2025 at 04:42):

I'm thinking we can do something like:

view this post on Zulip Sam Mohr (Feb 18 2025 at 04:42):

That works!

view this post on Zulip Sam Mohr (Feb 18 2025 at 04:43):

We'd want those IDs to be pretty much as granular as possible, within reason

view this post on Zulip Joshua Warner (Feb 18 2025 at 04:43):

Can you expand on that? Do you mean for the purpose of types? Or is it beneficial to have the id space be compact?

view this post on Zulip Sam Mohr (Feb 18 2025 at 04:44):

Which should mean:

view this post on Zulip Sam Mohr (Feb 18 2025 at 04:44):

For the purpose of types

view this post on Zulip Sam Mohr (Feb 18 2025 at 04:44):

Using a single bucket makes the most sense

view this post on Zulip Sam Mohr (Feb 18 2025 at 04:50):

I'm not actually sure how much benefit there is for having these distinct types compared to just LowerIdent.Idx and UpperIdent.Idx

view this post on Zulip Sam Mohr (Feb 18 2025 at 04:50):

The TagName and FieldName stuff is an artifact from the specialize_types prototype from @Agus Zubiaga

view this post on Zulip Sam Mohr (Feb 18 2025 at 04:52):

I think we can just start with a single SmallStringInterner that we have UpperIndex.Idx and LowerIdent.Idx both point into

view this post on Zulip Joshua Warner (Feb 19 2025 at 03:27):

In Ident.Store:

pub fn insert(self: *Store, ident: Ident, region: Region, problems: *std.ArrayList(Problem)) Idx {

... I'm a little worried about the overhead of alloc'ing an ArrayList of Problem for every ident.

I'm also somewhat concerned at doing the interning in the tokenizer based on this Ident type, since these:

/// Attributes of the identifier such as if it is effectful, ignored, or reassignable.
attributes: Attributes,

... are things we don't really know at this point in compiling and I'm hesitant to create a lie; I think that'll lead to issues down the road.

Furthermore:

/// Problems with the identifier
/// e.g. if it has two underscores in a row
/// or if it starts with a lowercase then it shouldn't be `lowerCamelCase`, it must be `snake_case`
problems: Problems,

These are things that IMO should be reported as diagnostics in the tokenizer, and none of the rest of the compiler should really need to track (or care about at all).

I'm thinking instead I'd like to do interning on a lower-level type that _only_ represents the text, giving a simple intern'd string id (u32) and nothing else.

Thoughts @Sam Mohr / @Luke Boswell ?

view this post on Zulip Richard Feldman (Feb 19 2025 at 03:47):

I assumed the point of passing in problems like that would be to just have one for the whole pass that gets passed around, and each call may or may not push problems onto it

view this post on Zulip Richard Feldman (Feb 19 2025 at 03:47):

(as opposed to allocating a new one each time)

view this post on Zulip Joshua Warner (Feb 19 2025 at 03:50):

Possibly

view this post on Zulip Joshua Warner (Feb 19 2025 at 03:51):

But given what's currently in Problems, I don't see why the design would warrant that

view this post on Zulip Joshua Warner (Feb 19 2025 at 03:52):

There is no need to dynamically allocate these. and the ones that are there right now probably ought to be just tokenizer diagnostics.

view this post on Zulip Joshua Warner (Feb 19 2025 at 03:52):

So I guess, the question is what might be in there in the future that would justify this?

view this post on Zulip Joshua Warner (Feb 19 2025 at 03:53):

I expect there was some discussion about this that I just was not part of and want to make sure I'm not missing anything

view this post on Zulip Luke Boswell (Feb 19 2025 at 04:13):

I don't think we've really discussed it in any detail. Sam has just been developing these data types and structures as a best effort based on what we know so far, so we have something to start with. We expect them to evolve a lot as we go.

view this post on Zulip Sam Mohr (Feb 19 2025 at 04:18):

Richard Feldman said:

I assumed the point of passing in problems like that would be to just have one for the whole pass that gets passed around, and each call may or may not push problems onto it

This

view this post on Zulip Sam Mohr (Feb 19 2025 at 04:19):

The reason I put the problems as a mutable reference was so that you knew as a caller that your problems on the Ident would be reported on interning

view this post on Zulip Sam Mohr (Feb 19 2025 at 04:19):

It prevents someone from forgetting to intern the problems

view this post on Zulip Sam Mohr (Feb 19 2025 at 04:20):

We could alternatively pass the problem reference to the Ident.Store, but that gets us closer to pointer jungle

view this post on Zulip Sam Mohr (Feb 19 2025 at 04:20):

So this seems less tangled from bird's eye view

view this post on Zulip Brendan Hansknecht (Feb 19 2025 at 04:22):

Minor note, this may be the wanted type for storing problems: https://ziglang.org/documentation/master/std/#std.BoundedArray

view this post on Zulip Luke Boswell (Feb 19 2025 at 04:22):

Do you think we're just going with UpperString and LowerString for interners, or are we having all the different variants?

view this post on Zulip Brendan Hansknecht (Feb 19 2025 at 04:23):

as opposed to slices and counts or growing arraylists

view this post on Zulip Joshua Warner (Feb 19 2025 at 04:42):

What's the use case for some later compiler stage knowing that an identifier has a subsequent_underscores problem? That should have already been reported in the tokenizer.

view this post on Zulip Luke Boswell (Feb 19 2025 at 04:45):

So we can report that to the user as a warning?

view this post on Zulip Luke Boswell (Feb 19 2025 at 04:46):

I'm not sure why we store a list of problems in the Ident though

view this post on Zulip Brendan Hansknecht (Feb 19 2025 at 04:46):

What's the use case for some later compiler stage knowing that an identifier has a subsequent_underscores problem?

I think the goal is to collect all problems and then dispatch in one place to decide exactly how everything is reported to the end users.

view this post on Zulip Luke Boswell (Feb 19 2025 at 04:46):

I thought all the Problems would be stored in the ModuleEnv

view this post on Zulip Luke Boswell (Feb 19 2025 at 04:47):

I'm guessing this field can be removed, because we store all the problems in ModuleEnv

/// Problems with the identifier
/// e.g. if it has two underscores in a row
/// or if it starts with a lowercase then it shouldn't be `lowerCamelCase`, it must be `snake_case`
problems: Problems,

view this post on Zulip Joshua Warner (Feb 19 2025 at 04:53):

Makes sense

view this post on Zulip Joshua Warner (Feb 19 2025 at 04:53):

How is this intended to work?

/// Identifier attributes such as if it is effectful, ignored, or reassignable packed into 3-bits.
pub const Attributes = packed struct(u3) {
    effectful: bool,
    ignored: bool,
    reassignable: bool,
};

view this post on Zulip Joshua Warner (Feb 19 2025 at 04:53):

Those are things that will be different for different instantiations of the same identifier

view this post on Zulip Joshua Warner (Feb 19 2025 at 04:54):

e.g. the same name used in as different local variables in two different functions

view this post on Zulip Luke Boswell (Feb 19 2025 at 04:54):

This is a very good point...

view this post on Zulip Joshua Warner (Feb 19 2025 at 04:54):

Actually

view this post on Zulip Joshua Warner (Feb 19 2025 at 04:55):

I guess reassignable would be the _ suffix, effectful would be the ! suffix, and ignored would be the _ prefix?

view this post on Zulip Joshua Warner (Feb 19 2025 at 04:55):

Maybe that is actually a textual property of the identifier

view this post on Zulip Joshua Warner (Feb 19 2025 at 04:59):

The tokenizer will actually know those already, so we can just pass those in pre-computed

view this post on Zulip Joshua Warner (Feb 19 2025 at 04:59):

Easy enough

view this post on Zulip Sam Mohr (Feb 19 2025 at 05:48):

Joshua Warner said:

Maybe that is actually a textual property of the identifier

This was the plan yes

view this post on Zulip Joshua Warner (Feb 19 2025 at 06:13):

How do those attributes apply to things like uppercase idents, field names, etc?

view this post on Zulip Sam Mohr (Feb 19 2025 at 06:17):

They don't, which is part of the reason why they used to be separate

view this post on Zulip Sam Mohr (Feb 19 2025 at 06:17):

But it seems to be cheap enough to just set the attributes to all false, AKA zero, and leave them in there

view this post on Zulip Sam Mohr (Feb 19 2025 at 06:21):

However, if those attributes are easier to parse unilaterally, then having them for non idents, e.g. for ignored field names in record builders, they'll just get ignored

view this post on Zulip Joshua Warner (Feb 19 2025 at 06:39):

Ok cool

view this post on Zulip Joshua Warner (Feb 19 2025 at 06:39):

Initial PR for integrating interning in the tokenizer: https://github.com/roc-lang/roc/pull/7624

view this post on Zulip Anthony Bullard (Feb 20 2025 at 14:20):

PR for parsing the new match expression and some more patterns: https://github.com/roc-lang/roc/pull/7626

view this post on Zulip Anthony Bullard (Feb 20 2025 at 14:21):

Make sure to keep an eye on this test, as it is the "everything implemented right now" test:

test "Syntax grab bag" {
    try moduleFmtsSame(
        \\app [main!] { pf: platform "../basic-cli/platform.roc" }
        \\
        \\import pf.Stdout
        \\
        \\add_one_oneline = |num| if num 2 else 5
        \\
        \\add_one = |num| {
        \\    other = 1
        \\    if num {
        \\        dbg some_func()
        \\        0
        \\    } else {
        \\        other
        \\    }
        \\}
        \\
        \\match_time = |a| {
        \\    match a {
        \\        Blue -> 47,
        \\        Green -> 19,
        \\        Red -> 12,
        \\        lower -> 1,
        \\        [1, 2, 3, .. as rest] -> 123,
        \\        3.14 -> 314,
        \\    }
        \\}
        \\
        \\main! = |_| {
        \\    world = "World"
        \\    number = 123
        \\    tag = Blue
        \\    tag_with_payload = Ok(number)
        \\    interpolated = "Hello, ${world}"
        \\    list = [add_one(number), 456, 789]
        \\    Stdout.line!(interpolated)?
        \\    Stdout.line!("How about ${Num.toStr(number)} as a string?")
        \\}
    );
}

Let me know if anything looks wrong here.

view this post on Zulip Anthony Bullard (Feb 20 2025 at 14:23):

I'll try to get Tags with payloads and record patterns parsing this afternoon (and probably Tuple patterns)

view this post on Zulip Anthony Bullard (Feb 20 2025 at 14:24):

My plan after that is Type decls and Type annotations

view this post on Zulip Anthony Bullard (Feb 20 2025 at 14:24):

And then the hard stuff - Tuples, Records, and BinOps

view this post on Zulip Anthony Bullard (Feb 20 2025 at 14:25):

If someone needs/wants something done sooner (like other headers), let me know - or feel free to put in a PR yourself and send it to me!

view this post on Zulip Anthony Bullard (Feb 20 2025 at 14:26):

The hold-up with Records is I am going to have to refactor Body/Block to be just another type of expression and not it's own distinct type of Node

view this post on Zulip Anthony Bullard (Feb 20 2025 at 14:28):

@Joshua Warner remind me, did we decide to implement a NoSpaceColon token that will be required for (expression) Record Fields?

view this post on Zulip Anthony Bullard (Feb 20 2025 at 14:28):

I think Record Fields in type annotations can be more forgiving

view this post on Zulip Anthony Bullard (Feb 20 2025 at 14:30):

I think the only real alternative would be something more drastic like inline type annotations which has been ruled out in the past by Richard

view this post on Zulip Richard Feldman (Feb 20 2025 at 15:26):

what if we made match consistent with if and have it just use curly braces instead of a special -> and , thing?

view this post on Zulip Richard Feldman (Feb 20 2025 at 15:26):

e.g. instead of this:

match_time = |a| {
    match a {
        Blue -> 47,
        Green -> 19,
        Red -> 12,
        lower -> 1,
        [1, 2, 3, .. as rest] -> 123,
        3.14 -> 314,
    }
}

view this post on Zulip Richard Feldman (Feb 20 2025 at 15:26):

...we do this:

match_time = |a| {
    match a {
        Blue { 47 }
        Green { 19 }
        Red { 12 }
        lower { 1 }
        [1, 2, 3, .. as rest] { 123 }
        3.14 { 314 }
    }
}

view this post on Zulip Richard Feldman (Feb 20 2025 at 15:27):

seems like it's just as easy to read, it's one less piece of syntax to learn, and if you want a multiline branch you're already all set up with the curly braces

view this post on Zulip Richard Feldman (Feb 20 2025 at 15:33):

it always annoys me in Rust having to switch back and forth between , and { :stuck_out_tongue:

view this post on Zulip Brendan Hansknecht (Feb 20 2025 at 15:45):

Hmm... I guess it is just a bit less distinct, but once you have mutiline blocks is common anyway

view this post on Zulip Brendan Hansknecht (Feb 20 2025 at 15:46):

Also, will lead to more common fake record syntax

view this post on Zulip Joshua Warner (Feb 20 2025 at 16:08):

@Anthony Bullard I think we do want NoSpaceColon for now. We can always have the parser accept either token in cases where it’s not ambiguous.

view this post on Zulip Richard Feldman (Feb 20 2025 at 16:51):

@Anthony Bullard how about we try the "no arrows" style above (as in, we just parse a pattern followed by an expression, and the formatter always chooses to put braces around that expression, just like with if) but if someone puts in -> or => or , we treat them like semicolons (warn, and then the formatter drops them)

view this post on Zulip Agus Zubiaga (Feb 20 2025 at 17:14):

That’s syntax is great!

view this post on Zulip Agus Zubiaga (Feb 20 2025 at 17:14):

Richard Feldman said:

it always annoys me in Rust having to switch back and forth between , and { :stuck_out_tongue:

I also hate this

view this post on Zulip Joshua Warner (Feb 20 2025 at 17:17):

Hmm, I'm a little worried about ambiguities without -> or similar

view this post on Zulip Joshua Warner (Feb 20 2025 at 17:18):

Blue { foo } could either be Blue -> {foo:foo}, or it could be Blue {foo} (i.e. a tag with a record pattern)... and then we need to keep parsing to find out what the branch body is.

view this post on Zulip Joshua Warner (Feb 20 2025 at 17:21):

Actually no, it's not the first thing. (brain fart)

view this post on Zulip Joshua Warner (Feb 20 2025 at 17:22):

It's ambiguous between {foo} being the body and {foo} being a record pattern with the body to come later.

view this post on Zulip Joshua Warner (Feb 20 2025 at 17:22):

You can keep going with that logic indefinitely

view this post on Zulip Joshua Warner (Feb 20 2025 at 17:23):

Err actually wait nvm ignore me; that'd be Blue{(foo})
EDIT, nope: Blue({foo})

view this post on Zulip Joshua Warner (Feb 20 2025 at 17:23):

:man_facepalming:

view this post on Zulip Brendan Hansknecht (Feb 20 2025 at 17:30):

beautifully cursed brackets {(})

view this post on Zulip Isaac Van Doren (Feb 20 2025 at 17:35):

I like that the syntax requires the braces so that you never have to decide if they should be included or not

view this post on Zulip Agus Zubiaga (Feb 20 2025 at 17:37):

It's a bit inconsistent with lambdas because it's still optional there

view this post on Zulip Agus Zubiaga (Feb 20 2025 at 17:37):

but I think I prefer it remains optional

view this post on Zulip Brendan Hansknecht (Feb 20 2025 at 17:48):

To remain optional would just be?

match_time = |a| {
    match a {
        Blue 47
        Green 19
        Red 12
        lower 1
        [1, 2, 3, .. as rest] 123
        3.14 314
    }

view this post on Zulip Richard Feldman (Feb 20 2025 at 17:48):

yeah, that's what I think it should be

view this post on Zulip Richard Feldman (Feb 20 2025 at 17:48):

but then the formatter adds braces for clarity

view this post on Zulip Richard Feldman (Feb 20 2025 at 17:48):

it's conceptually simple: you just alternate patterns and expressions, that's it

view this post on Zulip Agus Zubiaga (Feb 20 2025 at 17:48):

but not in lambdas

view this post on Zulip Agus Zubiaga (Feb 20 2025 at 17:49):

(the formatter doesn't add braces)

view this post on Zulip Richard Feldman (Feb 20 2025 at 17:49):

right, just in match

view this post on Zulip Richard Feldman (Feb 20 2025 at 17:49):

match { 1 2 1 2 1 2 } would technically be valid :stuck_out_tongue:

view this post on Zulip Richard Feldman (Feb 20 2025 at 17:49):

at least from a parsing perspective

view this post on Zulip Richard Feldman (Feb 20 2025 at 17:49):

with 1s being patterns and 2s being expressions

view this post on Zulip Agus Zubiaga (Feb 20 2025 at 17:49):

that might be nice too for typing speed

view this post on Zulip Richard Feldman (Feb 20 2025 at 17:49):

yeah that's the symmetry with if

view this post on Zulip Richard Feldman (Feb 20 2025 at 17:50):

where like if True False else True is valid

edit: was originally if True False True - forgot the else!

view this post on Zulip Richard Feldman (Feb 20 2025 at 17:50):

but the formatter adds braces for clarity

view this post on Zulip Brendan Hansknecht (Feb 20 2025 at 17:50):

if True False True how is this valid?

view this post on Zulip Richard Feldman (Feb 20 2025 at 17:50):

so if you want to take the shortcut when typing, you can

view this post on Zulip Brendan Hansknecht (Feb 20 2025 at 17:50):

No else of else if

view this post on Zulip Richard Feldman (Feb 20 2025 at 17:50):

oops, fixed!

view this post on Zulip Brendan Hansknecht (Feb 20 2025 at 17:50):

ah

view this post on Zulip Brendan Hansknecht (Feb 20 2025 at 17:51):

I kinda hate the lack of grouping in all of this, but I guess the formatter readds it, so maybe ok

view this post on Zulip Richard Feldman (Feb 20 2025 at 17:51):

in general the theme is "braces are never required, but the formatter may add them"

view this post on Zulip Brendan Hansknecht (Feb 20 2025 at 17:51):

That said, I'm really not a fan of same line things without grouping

view this post on Zulip Agus Zubiaga (Feb 20 2025 at 17:51):

we could soft enforce them with a warning

view this post on Zulip Richard Feldman (Feb 20 2025 at 17:52):

could, but probably unnecessary

view this post on Zulip Brendan Hansknecht (Feb 20 2025 at 17:52):

Even

if True
    False

and

if True False

Feel like they shouldn't be allowed.

view this post on Zulip Richard Feldman (Feb 20 2025 at 17:52):

I like the conceptual simplicity though - braces are a special case of expressions

view this post on Zulip Brendan Hansknecht (Feb 20 2025 at 17:52):

I know

view this post on Zulip Richard Feldman (Feb 20 2025 at 17:52):

so anywhere you're using them as an expression, you could of course omit them

view this post on Zulip Brendan Hansknecht (Feb 20 2025 at 17:53):

But feels like conceptional simplicity being traded for allowing messy code

view this post on Zulip Brendan Hansknecht (Feb 20 2025 at 17:53):

Extra symbols and splitting things up can definitely help with readability

view this post on Zulip Richard Feldman (Feb 20 2025 at 17:53):

I hear that, but there's all sorts of messy code you can write if you don't use the formatter :laughing:

view this post on Zulip Richard Feldman (Feb 20 2025 at 17:53):

and I think we could reasonably add a warning if it seems like people are actually doing it in practice

view this post on Zulip Richard Feldman (Feb 20 2025 at 17:54):

my assumption is that they wouldn't

view this post on Zulip Brendan Hansknecht (Feb 20 2025 at 17:54):

idk...a lot of people like being terse way too much

view this post on Zulip Brendan Hansknecht (Feb 20 2025 at 17:54):

but yeah, probably just a limited few

view this post on Zulip Agus Zubiaga (Feb 20 2025 at 17:55):

I like the idea of allowing a lot of common mistakes to parse/run, but still discourage them with a warning, and have the formatter fix them automatically

view this post on Zulip Agus Zubiaga (Feb 20 2025 at 17:56):

it's the best of both worlds

view this post on Zulip Brendan Hansknecht (Feb 20 2025 at 18:46):

As long as the formatter never fails and can format all bad code consistently

view this post on Zulip Anthony Bullard (Feb 20 2025 at 21:39):

So basically I should just remove the need for the arrow, and in the formatter force braces?

view this post on Zulip Anthony Bullard (Feb 20 2025 at 21:39):

And remove the need for commas? @Richard Feldman

view this post on Zulip Anthony Bullard (Feb 20 2025 at 21:39):

I personally think we should keep commas between branches

view this post on Zulip Anthony Bullard (Feb 20 2025 at 21:40):

As it is an unbounded list of things

view this post on Zulip Anthony Bullard (Feb 20 2025 at 21:41):

Lists, Tuples, Record Fields, Function Args, Lambda Args, Exposes items, package entries, they all require , between the "items"

view this post on Zulip Anthony Bullard (Feb 20 2025 at 21:42):

Joshua Warner said:

Anthony Bullard I think we do want NoSpaceColon for now. We can always have the parser accept either token in cases where it’s not ambiguous.

How then are we going to distinguish between a record or a block starting with a type annotation with backtracking?

view this post on Zulip Luke Boswell (Feb 20 2025 at 21:59):

Should this be spun out into an ideas thread? #compiler development > zig compiler - parser @ 💬

I feel like we should put together some larger examples. My concern is the strangeness budget with removing the arrows which seem to be pretty universal in other languages. Not saying I couldn't get used to it, but just not sure discussion in the middle of a parser thread really meets our usual standard for these things.

view this post on Zulip Luke Boswell (Feb 20 2025 at 22:01):

The braces on a single line look a little strange to me { 4 } ... I like the braces for multiple lines though.

view this post on Zulip Richard Feldman (Feb 20 2025 at 22:03):

Anthony Bullard said:

Lists, Tuples, Record Fields, Function Args, Lambda Args, Exposes items, package entries, they all require , between the "items"

yeah, but switch statements don't, and when they're all on different lines, it just feels to me like a chore that doesn't add value

view this post on Zulip Richard Feldman (Feb 20 2025 at 22:04):

this does make me realize the formatter could add missing commas in lists and records though! :smiley:

view this post on Zulip Luke Boswell (Feb 20 2025 at 22:05):

I'm not sure if these are formative ideas, or decisions from our BDFN

view this post on Zulip Richard Feldman (Feb 20 2025 at 22:06):

fair point on starting a thread about -> in match, I'll write one up later

view this post on Zulip Richard Feldman (Feb 20 2025 at 22:07):

mainly I just saw the example code, thought "oh we definitely don't need commas" and then thought I'd mention that idea while I was at it - but it does deserve its own thread

view this post on Zulip Richard Feldman (Feb 20 2025 at 22:08):

the formatter being able to add missing commas in lists and records (and tuples!) is just an observation about a convenience we could add

view this post on Zulip Richard Feldman (Feb 20 2025 at 22:08):

not a syntax change, just error recovery

view this post on Zulip Anthony Bullard (Feb 20 2025 at 22:14):

I think we could do the same here

view this post on Zulip Anthony Bullard (Feb 20 2025 at 22:15):

Allow commas, but not require

view this post on Zulip Anthony Bullard (Feb 20 2025 at 22:15):

And the formatter add them

view this post on Zulip Anthony Bullard (Feb 20 2025 at 22:15):

This is what is currently implemented:

test "Syntax grab bag" {
    try moduleFmtsSame(
        \\app [main!] { pf: platform "../basic-cli/platform.roc" }
        \\
        \\import pf.Stdout
        \\
        \\add_one_oneline = |num| if num 2 else 5
        \\
        \\add_one = |num| {
        \\    other = 1
        \\    if num {
        \\        dbg some_func()
        \\        0
        \\    } else {
        \\        dbg 123
        \\        other
        \\    }
        \\}
        \\
        \\match_time = |a| match a {
        \\    Blue { 47 },
        \\    Green { 19 },
        \\    Red { 12 },
        \\    lower { 1 },
        \\    [1, 2, 3, .. as rest] { 123 },
        \\    3.14 { 314 },
        \\}
        \\
        \\main! = |_| {
        \\    world = "World"
        \\    number = 123
        \\    tag = Blue
        \\    tag_with_payload = Ok(number)
        \\    interpolated = "Hello, ${world}"
        \\    list = [add_one(number), 456, 789]
        \\    Stdout.line!(interpolated)?
        \\    Stdout.line!("How about ${Num.toStr(number)} as a string?")
        \\}
    );
}

view this post on Zulip Anthony Bullard (Feb 20 2025 at 22:15):

I can't as a human parse the branches without at least the commas

view this post on Zulip Anthony Bullard (Feb 20 2025 at 22:16):

switch gets away without them because it has case keyword before each branch

view this post on Zulip Anthony Bullard (Feb 20 2025 at 22:17):

And usually in C-like languages : between the "pattern" and the block (and also many times a near-mandatory break; statements at the end of a case block.)

view this post on Zulip Anthony Bullard (Feb 20 2025 at 22:18):

Sometimes syntax that is unnecessary for machine parsing is very much necessary for human visual parsing

view this post on Zulip Anthony Bullard (Feb 20 2025 at 22:19):

To your point though Richard, I think we at a point where for machine parsing there is no need for commas in ANY collection-like syntactic construct

view this post on Zulip Anthony Bullard (Feb 20 2025 at 22:19):

They are there for the humans :-)

view this post on Zulip Luke Boswell (Feb 20 2025 at 22:33):

Richard Feldman said:

what if we made match consistent with if and have it just use curly braces instead of a special -> and , thing?

I made a PR for realword to see what this looks like https://github.com/rtfeldman/roc-realworld/pull/1

view this post on Zulip Luke Boswell (Feb 20 2025 at 22:33):

It's based on @Anthony Bullard's current implementation of match above. I'm not sure if the commas are optional but I included them.

view this post on Zulip Luke Boswell (Feb 20 2025 at 22:34):

I really like with zig how adding the comma lets the formatter know to make it multi-line or not... if we can borrow that feature I think that would be nice.

view this post on Zulip Luke Boswell (Feb 20 2025 at 22:35):

I think I missed a couple of matches ... I'll add those (done)

view this post on Zulip Luke Boswell (Feb 20 2025 at 22:44):

From making this PR.. I would say removing -> and just using braces feels good.

match client.query!(cmd) {
    Ok([row]) { Article.fromRow(row).(Ok) },
    Ok([]) { Err(NotFound) },
    Err(db_err) { Err(InternalErr(db_err.inspect())) },
}

view this post on Zulip Richard Feldman (Feb 20 2025 at 23:03):

Anthony Bullard said:

And the formatter add them

we could, I just don't see a benefit :sweat_smile:

view this post on Zulip Richard Feldman (Feb 20 2025 at 23:04):

if we're already adding braces on every branch, it's super clear where each one begins and ends

view this post on Zulip Richard Feldman (Feb 20 2025 at 23:04):

to put it another way, I don't see why }, is a better delimiter for every branch than just }

view this post on Zulip Anthony Bullard (Feb 20 2025 at 23:05):

That's fair

view this post on Zulip Anthony Bullard (Feb 20 2025 at 23:08):

Pushed a change to make commas optional, and not included in the formatted output

view this post on Zulip Anthony Bullard (Feb 20 2025 at 23:09):

match_time = |a| match a {
    Blue { 47 }
    Green { 19 }
    Red { 12 }
    lower { 1 }
    [1, 2, 3, .. as rest] { 123 }
    3.14 { 314 }
}

view this post on Zulip Richard Feldman (Feb 20 2025 at 23:16):

I made #ideas > `match` without `->` to discuss the design question!

view this post on Zulip Anthony Bullard (Feb 20 2025 at 23:43):

Tuple patterns are implemented in my PR, thought I'd get records done also in this little 30 minute stretch - but I hit a weird bug trying to fix an obvious tokenizer bug.

@Joshua Warner Currently when we read a (, we have whether we emit a NoSpaceOpenRound or OpenRound token flipped from how it should be. But when I fixed that, we get an infinite loop. These are impossible to debug in Zig with print debugging because the output with std.debug.print just never gets flushed. Could you take a look at that sometime? Not a blocker or high priority, but should be fixed I think

view this post on Zulip Anthony Bullard (Feb 20 2025 at 23:45):

Here's the currently implemented syntax...focusing on solely match here:

test "Syntax grab bag" {
    try statementFmtsSame( // This is a made-up function that doesn't actually exist :-)
        \\match_time = |a| match a {
        \\    Blue { 47 }
        \\    Green { 19 }
        \\    Red { 12 }
        \\    lower { 1 }
        \\    [1, 2, 3, .. as rest] { 123 }
        \\    3.14 { 314 }
        \\    (1, 2, 3) { 123 }
        \\}
    );
}

view this post on Zulip Joshua Warner (Feb 21 2025 at 04:31):

@Anthony Bullard I repro'd one infinite loop at least, and doing this works for me:

diff --git a/src/check/parse/Parser.zig b/src/check/parse/Parser.zig
index 5ba9d25af4..aec25d5179 100644
--- a/src/check/parse/Parser.zig
+++ b/src/check/parse/Parser.zig
@@ -577,7 +577,7 @@ pub fn parseExpr(self: *Parser) IR.NodeStore.ExprIdx {
     if (expr) |e| {
         var expression = e;
         // Check for an apply...
-        if (self.peek() == .OpenRound) {
+        if (self.peek() == .NoSpaceOpenRound) {
             const scratch_top = self.store.scratch_exprs.items.len;
             self.advance();
             while (self.peek() != .CloseRound) {
diff --git a/src/check/parse/tokenize.zig b/src/check/parse/tokenize.zig
index c831a5ef7a..e1e180aab7 100644
--- a/src/check/parse/tokenize.zig
+++ b/src/check/parse/tokenize.zig
@@ -992,7 +992,7 @@ pub const Tokenizer = struct {
                 '(' => {
                     self.cursor.pos += 1;
                     self.stack.append(.Round) catch exitOnOom();
-                    self.output.pushTokenNormal(if (sp) .NoSpaceOpenRound else .OpenRound, start, 1);
+                    self.output.pushTokenNormal(if (sp) .OpenRound else .NoSpaceOpenRound, start, 1);
                 },
                 '[' => {
                     self.cursor.pos += 1;

view this post on Zulip Joshua Warner (Feb 21 2025 at 04:33):

The trick I've taken to doing to debug hanging tests is:

I assume there must be a better way to do this tho. Maybe @Andrew Kelley knows some tricks?

view this post on Zulip Anthony Bullard (Feb 21 2025 at 11:42):

I think you are right about the debugging process. I guess I need to improve my lldb skills. (Or just set up DAP in my Neovim)

view this post on Zulip Anthony Bullard (Feb 21 2025 at 11:43):

And I'm going to add the above patch to my PR

view this post on Zulip Anthony Bullard (Feb 21 2025 at 15:20):

Current syntax supported in my latest PR:

test "Syntax grab bag" {
    try moduleFmtsSame(
        \\app [main!] { pf: platform "../basic-cli/platform.roc" }
        \\
        \\import pf.Stdout
        \\
        \\add_one_oneline = |num| if num 2 else 5
        \\
        \\add_one = |num| {
        \\    other = 1
        \\    if num {
        \\        dbg some_func()
        \\        0
        \\    } else {
        \\        dbg 123
        \\        other
        \\    }
        \\}
        \\
        \\match_time = |a| match a {
        \\    Blue | Green | Red -> {
        \\        x = 12
        \\        x
        \\    }
        \\    lower -> 1
        \\    "foo" -> 100
        \\    "foo" | "bar" -> 200
        \\    [1, 2, 3, .. as rest] -> 123
        \\    [1, 2 | 5, 3, .. as rest] -> 123
        \\    3.14 -> 314
        \\    3.14 | 6.28 -> 314
        \\    (1, 2, 3) -> 123
        \\    (1, 2 | 5, 3) -> 123
        \\    {foo: 1, bar: 2} -> 12
        \\    {foo: 1, bar: 2 | 7} -> 12
        \\}
        \\
        \\main! = |_| {
        \\    world = "World"
        \\    number = 123
        \\    tag = Blue
        \\    tag_with_payload = Ok(number)
        \\    interpolated = "Hello, ${world}"
        \\    list = [add_one(number), 456, 789]
        \\    Stdout.line!(interpolated)?
        \\    Stdout.line!("How about ${Num.toStr(number)} as a string?")
        \\}
    );
}

view this post on Zulip Anthony Bullard (Feb 21 2025 at 23:40):

^ UPDATED THE ABOVE

view this post on Zulip Anthony Bullard (Feb 21 2025 at 23:40):

Only thing left in patterns is Tags with payloads

view this post on Zulip Sam Mohr (Feb 21 2025 at 23:40):

Can we move to => for match? Seemed like @Richard Feldman was pushing for that, but I may be mistaken

view this post on Zulip Sam Mohr (Feb 21 2025 at 23:40):

Oh yeah! Also

view this post on Zulip Anthony Bullard (Feb 21 2025 at 23:41):

....?

view this post on Zulip Sam Mohr (Feb 21 2025 at 23:41):

.. as rest might need to be allowable, but I think we were trying to prefer ..rest

view this post on Zulip Anthony Bullard (Feb 21 2025 at 23:41):

Oh

view this post on Zulip Anthony Bullard (Feb 21 2025 at 23:41):

Ok, so as is dead?

view this post on Zulip Sam Mohr (Feb 21 2025 at 23:41):

No, which is a problem...

view this post on Zulip Anthony Bullard (Feb 21 2025 at 23:41):

And are we 100% on fat arrows for match?

view this post on Zulip Sam Mohr (Feb 21 2025 at 23:41):

No

view this post on Zulip Joshua Warner (Feb 21 2025 at 23:42):

I would prefer landing the current diff and iterating on it later :)

view this post on Zulip Sam Mohr (Feb 21 2025 at 23:42):

Good idea

view this post on Zulip Anthony Bullard (Feb 21 2025 at 23:42):

Yeah

view this post on Zulip Anthony Bullard (Feb 21 2025 at 23:42):

I've already implemented 5 syntaxes for match :rofl:

view this post on Zulip Luke Boswell (Feb 21 2025 at 23:42):

I thought I might spend a couple hours today tinkering with the Parse IR and see if I can make it generate an SExpr for a snapshot.

view this post on Zulip Sam Mohr (Feb 21 2025 at 23:43):

We actually probably don't want to support ..<pattern> in record destructs because we already have .. as <ident> and there shouldn't be two ways to do the same thing

view this post on Zulip Anthony Bullard (Feb 21 2025 at 23:46):

Wait .. is followed by a pattern?

view this post on Zulip Anthony Bullard (Feb 21 2025 at 23:46):

That doesn't seem right...

view this post on Zulip Sam Mohr (Feb 21 2025 at 23:46):

Yeah, I agree

view this post on Zulip Anthony Bullard (Feb 21 2025 at 23:46):

I think we should use ..<ident> in both lists and records

view this post on Zulip Sam Mohr (Feb 21 2025 at 23:46):

Well, if we support that in records, then we can do { foo: 123, ..rest } and also { foo: 123, .. as rest }

view this post on Zulip Anthony Bullard (Feb 21 2025 at 23:47):

Gotta go eat dinner, I'll try to pop back on and take care of that conflict

view this post on Zulip Sam Mohr (Feb 21 2025 at 23:47):

Maybe we can put a warning on the second

view this post on Zulip Anthony Bullard (Feb 21 2025 at 23:47):

I would rather just remove the second

view this post on Zulip Anthony Bullard (Feb 21 2025 at 23:47):

There is no advantage to it

view this post on Zulip Sam Mohr (Feb 21 2025 at 23:47):

Agreed, except for the fact that there's now a specific use of as that doesn't work

view this post on Zulip Anthony Bullard (Feb 21 2025 at 23:48):

I actually haven't even implemented Record Pattern rest

view this post on Zulip Anthony Bullard (Feb 21 2025 at 23:48):

Which is?

view this post on Zulip Sam Mohr (Feb 21 2025 at 23:48):

as is usually

pub const AsPattern = struct {
    pattern: Pattern,
    ident: Ident,
    region: Region,
};

view this post on Zulip Sam Mohr (Feb 21 2025 at 23:48):

But it can't be just Pattern anymore

view this post on Zulip Sam Mohr (Feb 21 2025 at 23:49):

If you can make it work though, go for it

view this post on Zulip Anthony Bullard (Feb 22 2025 at 00:28):

Yeah, I removed this in my PR and made as a part of rest

view this post on Zulip Anthony Bullard (Feb 22 2025 at 00:44):

My PR is rebased and ready for review

view this post on Zulip Anthony Bullard (Feb 22 2025 at 00:44):

Next up for me is Type Annotations and Declarations

view this post on Zulip Richard Feldman (Feb 22 2025 at 01:41):

Sam Mohr said:

Can we move to => for match? Seemed like Richard Feldman was pushing for that, but I may be mistaken

let's do it for now just to address the parsing ambiguity on Ok(a) if a->b->c

view this post on Zulip Anthony Bullard (Feb 22 2025 at 02:14):

No parsing ambiguity as I haven't even implemented record access / static dispatch, let alone -> (which we should have a name for)

view this post on Zulip Anthony Bullard (Feb 22 2025 at 02:14):

:smile:

view this post on Zulip Anthony Bullard (Feb 22 2025 at 02:15):

But I can put it with my current change, which actually I tried to sneak into the last PR but was too late

view this post on Zulip Anthony Bullard (Feb 22 2025 at 02:15):

(Tags with payload patterns)

view this post on Zulip Luke Boswell (Feb 22 2025 at 02:15):

@Anthony Bullard you may need to rebase any PR's, we just merged a change that changed a few things across the compiler

view this post on Zulip Anthony Bullard (Feb 22 2025 at 02:16):

I have no outstanding PRs, but thanks I'll rebase now!

view this post on Zulip Anthony Bullard (Feb 22 2025 at 02:16):

Did we land interning?

view this post on Zulip Sam Mohr (Feb 22 2025 at 02:16):

Yes

view this post on Zulip Anthony Bullard (Feb 22 2025 at 02:16):

Holy hell, 8 commits?

view this post on Zulip Sam Mohr (Feb 22 2025 at 02:16):

Josh landed a primitive version

view this post on Zulip Anthony Bullard (Feb 22 2025 at 02:16):

Ok, I've rebased....I'm holding on to my seat as I rebase

view this post on Zulip Luke Boswell (Feb 22 2025 at 02:16):

I'm currently trying to figure out how to get the top-level nodes out of the parse AST

test "example s-expr" {
    const source =
        \\module []
        \\
        \\foo = "bar"
    ;
    var arena = std.heap.ArenaAllocator.init(testing.allocator);
    defer arena.deinit();
    var env = base.ModuleEnv.init(&arena);
    var parse_ast = parse(&env, testing.allocator, source);
    defer parse_ast.deinit();

    var iter = parse_ast.store.nodes.iterIndices();
    while (iter.next()) |node| {
        std.debug.print("{}", .{parse_ast.store.getStatement(node)});
    }
}

view this post on Zulip Anthony Bullard (Feb 22 2025 at 02:17):

Use parse_ast.store.getFile(), and then follow on from there

view this post on Zulip Anthony Bullard (Feb 22 2025 at 02:17):

The file will have the header and statements, and if you read the types it should be easy to figure out what store method to use to get children

view this post on Zulip Anthony Bullard (Feb 22 2025 at 02:18):

I can talk later if you need help

view this post on Zulip Anthony Bullard (Feb 22 2025 at 02:18):

Actually, look at fmt.zig

view this post on Zulip Anthony Bullard (Feb 22 2025 at 02:18):

From formatFile on

view this post on Zulip Sam Mohr (Feb 22 2025 at 02:18):

We need mental institutionalization, not just help

view this post on Zulip Anthony Bullard (Feb 22 2025 at 02:18):

Your code should be similar in how you walk the AST

view this post on Zulip Joshua Warner (Feb 22 2025 at 02:18):

BTW I'm currently reworking string tokenizing to make a little more sense

view this post on Zulip Anthony Bullard (Feb 22 2025 at 02:18):

Sam Mohr said:

We need mental institutionalization, not just help

Some more than others :-)

view this post on Zulip Anthony Bullard (Feb 22 2025 at 02:19):

I have a 4-year-old

view this post on Zulip Sam Mohr (Feb 22 2025 at 02:19):

*You have a 4-year-old and mental problems

view this post on Zulip Luke Boswell (Feb 22 2025 at 02:22):

thread 12285665 panic: reached unreachable code --- :man_facepalming:

view this post on Zulip Luke Boswell (Feb 22 2025 at 02:23):

I shouldn't be allowed to touch these things

view this post on Zulip Anthony Bullard (Feb 22 2025 at 02:23):

Errr

view this post on Zulip Anthony Bullard (Feb 22 2025 at 02:23):

What’s the sourxe

view this post on Zulip Anthony Bullard (Feb 22 2025 at 02:23):

And your code

view this post on Zulip Anthony Bullard (Feb 22 2025 at 02:23):

And the error?

view this post on Zulip Anthony Bullard (Feb 22 2025 at 02:24):

@Luke Boswell

view this post on Zulip Luke Boswell (Feb 22 2025 at 02:24):

See your DM's

view this post on Zulip Agus Zubiaga (Feb 22 2025 at 02:25):

omg I didn’t take a screenshot of it, but Apple Intelligence summarized the notifications in this channel as “Anthony has mental problems; Luke panicked”

view this post on Zulip Luke Boswell (Feb 22 2025 at 02:26):

Fair

view this post on Zulip Agus Zubiaga (Feb 22 2025 at 02:26):

That’s one way to make me read Zulip

view this post on Zulip Sam Mohr (Feb 22 2025 at 02:27):

More crazy talk, got it

view this post on Zulip Anthony Bullard (Feb 22 2025 at 02:28):

I think it’s a fair summary

view this post on Zulip Richard Feldman (Feb 22 2025 at 02:32):

nobody else here knows it, but Agus went HAM on the recent Zed release, it was really impressive!

view this post on Zulip Luke Boswell (Feb 22 2025 at 02:33):

Big W for Zed

view this post on Zulip Richard Feldman (Feb 22 2025 at 02:33):

people who joined after him probably assume he's been at Zed for years based on that :laughing:

view this post on Zulip Sam Mohr (Feb 22 2025 at 02:33):

We are thankful for your efforts! It's been powering my collab with Luke for the last week or two

view this post on Zulip Luke Boswell (Feb 22 2025 at 02:33):

Cant say if this has improved productivity...

view this post on Zulip Anthony Bullard (Feb 23 2025 at 22:40):

test "Syntax grab bag" {
    try moduleFmtsSame(
        \\app [main!] { pf: platform "../basic-cli/platform.roc" }
        \\
        \\import pf.Stdout
        \\
        \\Map a b : List(a), (a -> b) -> List(b)
        \\
        \\Foo : (Bar, Baz)
        \\
        \\Some a : { foo : Ok(a), bar : Something }
        \\
        \\Maybe a : [Some(a), None]
        \\
        \\SomeFunc a : Maybe(a), a -> Maybe(a)
        \\
        \\add_one_oneline = |num| if num 2 else 5
        \\
        \\add_one : (U64 -> U64)
        \\add_one = |num| {
        \\    other = 1
        \\    if num {
        \\        dbg some_func()
        \\        0
        \\    } else {
        \\        dbg 123
        \\        other
        \\    }
        \\}
        \\
        \\match_time = |a| match a {
        \\    Blue | Green | Red -> {
        \\        x = 12
        \\        x
        \\    }
        \\    lower -> 1
        \\    "foo" -> 100
        \\    "foo" | "bar" -> 200
        \\    [1, 2, 3, .. as rest] -> 123
        \\    [1, 2 | 5, 3, .. as rest] -> 123
        \\    3.14 -> 314
        \\    3.14 | 6.28 -> 314
        \\    (1, 2, 3) -> 123
        \\    (1, 2 | 5, 3) -> 123
        \\    { foo: 1, bar: 2, ..rest } -> 12
        \\    { foo: 1, bar: 2 | 7 } -> 12
        \\    Ok(123) -> 123
        \\    Ok(Some(dude)) -> dude
        \\    TwoArgs("hello", Some("world")) -> 1000
        \\}
        \\
        \\main! : List(String) -> Result({}, _)
        \\main! = |_| {
        \\    world = "World"
        \\    number = 123
        \\    tag = Blue
        \\    tag_with_payload = Ok(number)
        \\    interpolated = "Hello, ${world}"
        \\    list = [add_one(number), 456, 789]
        \\    Stdout.line!(interpolated)?
        \\    Stdout.line!("How about ${Num.toStr(number)} as a string?")
        \\}
    );
}

Syntax supported with my latest parser+formatter PR

view this post on Zulip Anthony Bullard (Feb 23 2025 at 22:41):

Next up: Record literal expressions

view this post on Zulip Anthony Bullard (Feb 23 2025 at 22:42):

After that: BinOps ( :scared: )

view this post on Zulip Richard Feldman (Feb 24 2025 at 00:04):

for binops, is Pratt Parsing on your radar already?

view this post on Zulip Anthony Bullard (Feb 24 2025 at 00:22):

you've mentioned it

view this post on Zulip Anthony Bullard (Feb 24 2025 at 00:22):

I'll read up on it

view this post on Zulip Sam Mohr (Feb 24 2025 at 00:42):

https://matklad.github.io/2020/04/13/simple-but-powerful-pratt-parsing.html

view this post on Zulip Andrew Kelley (Feb 24 2025 at 02:01):

Joshua Warner said:

The trick I've taken to doing to debug hanging tests is:

I assume there must be a better way to do this tho. Maybe Andrew Kelley knows some tricks?

That workflow seems reasonable to me. Is there a way you can imagine it being improved with some zig tooling changes?

view this post on Zulip Joshua Warner (Feb 24 2025 at 02:09):

Couple thoughts: (1) It would be convenient if the binary name was stable so I don’t need to do the ctrl+z switcheroo. Or perhaps it could print the full path of the currently running binary. (2) it would be useful for the runner to do something like print the test name that’s running (perhaps after some minimum timeout). (3) it could also be useful to be able to run only a single test via passing the name of that test on the command line (or perhaps the name of the file, if the test is anonymous)

view this post on Zulip Andrew Kelley (Feb 24 2025 at 05:36):

(1) you can get if you drop down into zig build-exe CLI with -femit-bin=foobar
(2) you can get with --verbose
(3) is --test-filter with the direct CLI, and you can introduce a -D build option into your build script and set the test filter there

if you combine (1) and (2) you can get what you want I think

one more random tip, always delete the --listen=- arg when running manually, unless you want to speak the Build Runner Protocol

view this post on Zulip Andrew Kelley (Feb 24 2025 at 05:37):

so to be clear you can use --verbose to get the zig build-exe or zig test command, then you can copy paste that, delete --listen=- add -femit-bin=foobar and then you have yourself the desired workflow

view this post on Zulip Andrew Kelley (Feb 24 2025 at 05:37):

as for --test-filter note that the filter is applied very early, so you can use this when doing large refactorings to only test a subset of unit tests even if the rest of your app is not compiling

view this post on Zulip Anthony Bullard (Feb 24 2025 at 14:48):

I'm running into a spot where Zig just doesn't enjoy passing functions. I want to have this helper:

fn parseCollection(self: *Parser, comptime T: type, end_token: Token.Tag, scratch: *std.ArrayList(T), parser: fn (*Parser) T) ExpectError!usize {
    const scratch_top = scratch.items.len;
    while (self.peek() != end_token) {
        scratch.append(parser(self)) catch exitOnOom();
        self.expect(.Comma) catch {
            break;
        };
    }
    self.expect(end_token) catch {
        return ExpectError.expected_not_found;
    };
    return scratch_top;
}

Since that exact kind of code happens at least 19 times in the parser. But if one of the parsers takes an argument, I understand I need to create a new top-level function that can be called with zero args here (calling the real function with a specific value for it's arg). But when you have recursion, usually the function is parametric on that argument, so we need to be able to call the correct parser when we recurse. Like this:

const parser: fn (*Parser) IR.NodeStore.PatternIdx = if (alternatives == .alternatives_allowed) parsePatternWithAlts else parsePatternNoAlts;

But no matter what I do, that gives me the following error:

check/parse/Parser.zig:449:58: error: value with comptime-only type 'fn (*check.parse.Parser) check.parse.IR.NodeStore.PatternIdx' depends on runtime control flow
    const parser: fn (*Parser) IR.NodeStore.PatternIdx = if (alternatives == .alternatives_allowed) parsePatternWithAlts else parsePatternNoAlts;
                                                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I guess i don't understand the argument here. The type of parser is always the same, and can be verified correctly at comptime. So why is this not allowed?

view this post on Zulip Anthony Bullard (Feb 24 2025 at 14:49):

Now obviously if we went with #ideas > ✔ `or` instead of `|` in `when` branches (or one of the similar proposals) then this problem goes away because there is no longer contention between pattern alternatives and |...| lambda args.

view this post on Zulip Richard Feldman (Feb 24 2025 at 14:53):

let's just try that

view this post on Zulip Richard Feldman (Feb 24 2025 at 14:53):

if we don't like or for alternatives in practice, we can always reconsider, but it seems like a reasonable design in its own right

view this post on Zulip Anthony Bullard (Feb 24 2025 at 14:57):

If you are OK with it, I will do it

view this post on Zulip Richard Feldman (Feb 24 2025 at 15:05):

go for it

view this post on Zulip Richard Feldman (Feb 24 2025 at 15:06):

worst-case scenario is that we find out we don't like it, which would justify the more involved implementation

view this post on Zulip Joshua Warner (Feb 24 2025 at 15:55):

@Anthony Bullard I wonder if instead of a function pointer, you could do a comptime enum for all the collection variants? Or perhaps you can pass in the desired return type as comptime, and you call ::parse() on that?

view this post on Zulip Brendan Hansknecht (Feb 24 2025 at 16:20):

The issue is that a function is a comptime only type

view this post on Zulip Brendan Hansknecht (Feb 24 2025 at 16:21):

You want to use a function pointer if the value can change at runtime

view this post on Zulip Brendan Hansknecht (Feb 24 2025 at 16:21):

Zig does not have first class functions or lambdas or closures

view this post on Zulip Brendan Hansknecht (Feb 24 2025 at 16:22):

I think the original code just needs *const fn ...

view this post on Zulip Andrew Kelley (Feb 24 2025 at 22:28):

I think you're gonna have a better time if you always pass functions comptime unless you're doing the vtable pattern

view this post on Zulip Andrew Kelley (Feb 24 2025 at 22:28):

the optimizer will definitely have a better time

view this post on Zulip Andrew Kelley (Feb 24 2025 at 22:30):

also I recognize this is more of a functional/imperative preference thing, but I reworked a contributor's code that used that parser function pointer pattern in zig's parser to simply not do that, and felt like the result was better. yeah it technically is less DRY

view this post on Zulip Andrew Kelley (Feb 24 2025 at 22:30):

anyway the comptime suggestion is irrelevant to that though

view this post on Zulip Anthony Bullard (Feb 24 2025 at 23:08):

Brendan Hansknecht said:

I think the original code just needs *const fn ...

Interesting .. why did I think that Zig did not support function pointers? I think moving towards or for pattern alternation will make a lot of things clearer in OUR grammar anyway, open up more places where it can be used, and also make this code a lot easier to understand.

view this post on Zulip Richard Feldman (Feb 24 2025 at 23:15):

yeah like in my mind both are totally reasonable options as an end user, just with some different tradeoffs, and since we've only tried one but have an implementation reason to try out the other, seems like a fine excuse to try it out and see how we like it in practice

view this post on Zulip Anthony Bullard (Feb 25 2025 at 12:12):

#7638 Parsing Records and Tuples

view this post on Zulip Anthony Bullard (Feb 25 2025 at 12:12):

Support syntax:

test "Syntax grab bag" {
    try moduleFmtsSame(
        \\app [main!] { pf: platform "../basic-cli/platform.roc" }
        \\
        \\import pf.Stdout
        \\
        \\Map a b : List(a), (a -> b) -> List(b)
        \\
        \\Foo : (Bar, Baz)
        \\
        \\Some a : { foo : Ok(a), bar : Something }
        \\
        \\Maybe a : [Some(a), None]
        \\
        \\SomeFunc a : Maybe(a), a -> Maybe(a)
        \\
        \\add_one_oneline = |num| if num 2 else 5
        \\
        \\add_one : (U64 -> U64)
        \\add_one = |num| {
        \\    other = 1
        \\    if num {
        \\        dbg some_func()
        \\        0
        \\    } else {
        \\        dbg 123
        \\        other
        \\    }
        \\}
        \\
        \\match_time = |a| match a {
        \\    Blue | Green | Red -> {
        \\        x = 12
        \\        x
        \\    }
        \\    lower -> 1
        \\    "foo" -> 100
        \\    "foo" | "bar" -> 200
        \\    [1, 2, 3, .. as rest] -> 123
        \\    [1, 2 | 5, 3, .. as rest] -> 123
        \\    3.14 -> 314
        \\    3.14 | 6.28 -> 314
        \\    (1, 2, 3) -> 123
        \\    (1, 2 | 5, 3) -> 123
        \\    { foo: 1, bar: 2, ..rest } -> 12
        \\    { foo: 1, bar: 2 | 7 } -> 12
        \\    Ok(123) -> 123
        \\    Ok(Some(dude)) -> dude
        \\    TwoArgs("hello", Some("world")) -> 1000
        \\}
        \\
        \\main! : List(String) -> Result({}, _)
        \\main! = |_| {
        \\    world = "World"
        \\    number = 123
        \\    tag = Blue
        \\    tag_with_payload = Ok(number)
        \\    interpolated = "Hello, ${world}"
        \\    list = [add_one(number), 456, 789]
        \\    # New!
        \\    record = { foo: 123, bar: "Hello", baz: tag, qux: Ok(world), punned }
        \\    # New!
        \\    tuple = (123, "World", tag, Ok(world), (nested, tuple), [1, 2, 3])
        \\    Stdout.line!(interpolated)?
        \\    Stdout.line!("How about ${Num.toStr(number)} as a string?")
        \\}
    );
}

view this post on Zulip Anthony Bullard (Feb 25 2025 at 12:13):

Next up: BinOps (via Pratt Parsing)

view this post on Zulip Anthony Bullard (Feb 25 2025 at 12:14):

(Note: I will come back to replace | with or, I want to push the supported syntax forward right now, and I think that BinOps are just critical to getting that done)

view this post on Zulip Anthony Bullard (Feb 25 2025 at 12:15):

After BinOps, I think you should be able to parse some simple programs

view this post on Zulip Andrew Kelley (Feb 25 2025 at 23:12):

That unit test looks familiar :)

view this post on Zulip Anthony Bullard (Feb 26 2025 at 13:52):

BinOps have landed (in my PR) - using Pratt Parsing which was soooooo easy (thanks @Richard Feldman ):

test "Syntax grab bag" {
    try moduleFmtsSame(
        \\app [main!] { pf: platform "../basic-cli/platform.roc" }
        \\
        \\import pf.Stdout
        \\
        \\Map a b : List(a), (a -> b) -> List(b)
        \\
        \\Foo : (Bar, Baz)
        \\
        \\Some a : { foo : Ok(a), bar : Something }
        \\
        \\Maybe a : [Some(a), None]
        \\
        \\SomeFunc a : Maybe(a), a -> Maybe(a)
        \\
        \\add_one_oneline = |num| if num 2 else 5
        \\
        \\add_one : (U64 -> U64)
        \\add_one = |num| {
        \\    other = 1
        \\    if num {
        \\        dbg some_func()
        \\        0
        \\    } else {
        \\        dbg 123
        \\        other
        \\    }
        \\}
        \\
        \\match_time = |a| match a {
        \\    Blue | Green | Red -> {
        \\        x = 12
        \\        x
        \\    }
        \\    lower -> 1
        \\    "foo" -> 100
        \\    "foo" | "bar" -> 200
        \\    [1, 2, 3, .. as rest] -> 123
        \\    [1, 2 | 5, 3, .. as rest] -> 123
        \\    3.14 -> 314
        \\    3.14 | 6.28 -> 314
        \\    (1, 2, 3) -> 123
        \\    (1, 2 | 5, 3) -> 123
        \\    { foo: 1, bar: 2, ..rest } -> 12
        \\    { foo: 1, bar: 2 | 7 } -> 12
        \\    Ok(123) -> 123
        \\    Ok(Some(dude)) -> dude
        \\    TwoArgs("hello", Some("world")) -> 1000
        \\}
        \\
        \\expect blah == 1
        \\
        \\main! : List(String) -> Result({}, _)
        \\main! = |_| {
        \\    world = "World"
        \\    number = 123
        \\    expect blah == 1
        \\    tag = Blue
        \\    return tag
        \\    crash "Unreachable!"
        \\    tag_with_payload = Ok(number)
        \\    interpolated = "Hello, ${world}"
        \\    list = [add_one(number), 456, 789]
        \\    record = { foo: 123, bar: "Hello", baz: tag, qux: Ok(world), punned }
        \\    tuple = (123, "World", tag, Ok(world), (nested, tuple), [1, 2, 3])
        \\    bin_op_result = Err(foo) ?? 12 > 5 * 5 or 13 + 2 < 5 and 10 - 1 >= 16 or 12 <= 3 / 5
        \\    Stdout.line!(interpolated)?
        \\    Stdout.line!("How about ${Num.toStr(number)} as a string?")
        \\}
        \\
        \\expect {
        \\    foo = 1
        \\    blah = 1
        \\    blah == foo
        \\}
    );
}

view this post on Zulip Anthony Bullard (Feb 26 2025 at 13:56):

This test for the binops may also be helpful for people to correct any precedence or associativity errors I may have made:

test "BinOp omnibus" {
    const expr = "Err(foo) ?? 12 > 5 * 5 or 13 + 2 < 5 and 10 - 1 >= 16 or 12 <= 3 / 5";
    const expr_sloppy = "Err(foo)??12>5*5 or 13+2<5 and 10-1>=16 or 12<=3/5";
    const formatted = "((((Err(foo) ?? 12) > (5 * 5)) or (((13 + 2) < 5) and ((10 - 1) >= 16))) or (12 <= (3 / 5)))";
    try exprFmtsSame(expr, .no_debug);
    try exprFmtsTo(expr_sloppy, expr, .no_debug);
    try exprFmtsTo(expr, formatted, .debug_binop);
    try exprFmtsTo(expr_sloppy, formatted, .debug_binop);
}

view this post on Zulip Anthony Bullard (Feb 26 2025 at 13:57):

Not that the ()-laden formatted variable is using a debug flag in the formatter to show the boundaries of the individual operations (until we have full SExpr IR support here)

view this post on Zulip Sam Mohr (Feb 26 2025 at 17:48):

Anthony, I'm gonna have to break your legs

view this post on Zulip Sam Mohr (Feb 26 2025 at 17:48):

You're going too fast

view this post on Zulip Anton (Feb 26 2025 at 17:52):

Drastic actions :p

view this post on Zulip Anthony Bullard (Feb 26 2025 at 18:26):

Sam Mohr said:

You're going too fast

I’m only doing this an hour (or two) a day! I’ve felt like I’m moving at a glacial pace, so thanks for the pick-me-up

view this post on Zulip Anthony Bullard (Feb 26 2025 at 18:31):

But don’t feel bad - adding expect, crash, and return statements is causing a lot of problems

view this post on Zulip Sam Mohr (Feb 26 2025 at 18:41):

If there's problems from requiring return to not have statements after, then feel free to make the canonicalize code handle it

view this post on Zulip Sam Mohr (Feb 26 2025 at 18:41):

I presume that's not the main issue, though

view this post on Zulip Anthony Bullard (Feb 26 2025 at 18:52):

No I haven’t figured it out, hit it right before work so I’ll find out tonight or in the morning

view this post on Zulip Richard Feldman (Feb 26 2025 at 19:35):

oh yeah canonicalize should definitely handle that imo :big_smile:

view this post on Zulip Anthony Bullard (Feb 26 2025 at 23:56):

Updated the syntax for my PR above to include expect, crash, and return

view this post on Zulip Anthony Bullard (Feb 26 2025 at 23:56):

I think import exposing and the rest of the headers should be next

view this post on Zulip Anthony Bullard (Feb 26 2025 at 23:57):

And then we need to talk about static dispatch and record access

view this post on Zulip Anthony Bullard (Feb 26 2025 at 23:57):

I want to treat it similar to binop, but not being a binop

view this post on Zulip Anthony Bullard (Feb 26 2025 at 23:58):

It's <expr><apply args><try suffix><dot access><binop....> in terms of (not real) binding power

view this post on Zulip Anthony Bullard (Feb 26 2025 at 23:59):

Or maybe try suffix goes last...

view this post on Zulip Anthony Bullard (Feb 26 2025 at 23:59):

Do we want to support some_fn(arg1)?.static_dispatch_method()?.next_static_dispatch_method()?.record_field?

view this post on Zulip Brendan Hansknecht (Feb 27 2025 at 00:00):

That code looks like it should be valid

view this post on Zulip Anthony Bullard (Feb 27 2025 at 03:11):

Brendan Hansknecht said:

That code looks like it should be valid

And it is so....

test "Syntax grab bag" {
    try moduleFmtsSame(
        \\app [main!] { pf: platform "../basic-cli/platform.roc" }
        \\
        \\import pf.Stdout
        \\
        \\Map a b : List(a), (a -> b) -> List(b)
        \\
        \\Foo : (Bar, Baz)
        \\
        \\Some a : { foo : Ok(a), bar : Something }
        \\
        \\Maybe a : [Some(a), None]
        \\
        \\SomeFunc a : Maybe(a), a -> Maybe(a)
        \\
        \\add_one_oneline = |num| if num 2 else 5
        \\
        \\add_one : (U64 -> U64)
        \\add_one = |num| {
        \\    other = 1
        \\    if num {
        \\        dbg some_func()
        \\        0
        \\    } else {
        \\        dbg 123
        \\        other
        \\    }
        \\}
        \\
        \\match_time = |a| match a {
        \\    Blue | Green | Red -> {
        \\        x = 12
        \\        x
        \\    }
        \\    lower -> 1
        \\    "foo" -> 100
        \\    "foo" | "bar" -> 200
        \\    [1, 2, 3, .. as rest] -> 123
        \\    [1, 2 | 5, 3, .. as rest] -> 123
        \\    3.14 -> 314
        \\    3.14 | 6.28 -> 314
        \\    (1, 2, 3) -> 123
        \\    (1, 2 | 5, 3) -> 123
        \\    { foo: 1, bar: 2, ..rest } -> 12
        \\    { foo: 1, bar: 2 | 7 } -> 12
        \\    Ok(123) -> 123
        \\    Ok(Some(dude)) -> dude
        \\    TwoArgs("hello", Some("world")) -> 1000
        \\}
        \\
        \\expect blah == 1
        \\
        \\main! : List(String) -> Result({}, _)
        \\main! = |_| {
        \\    world = "World"
        \\    number = 123
        \\    expect blah == 1
        \\    tag = Blue
        \\    return tag
        \\    crash "Unreachable!"
        \\    tag_with_payload = Ok(number)
        \\    interpolated = "Hello, ${world}"
        \\    list = [add_one(number), 456, 789]
        \\    record = { foo: 123, bar: "Hello", baz: tag, qux: Ok(world), punned }
        \\    tuple = (123, "World", tag, Ok(world), (nested, tuple), [1, 2, 3])
        \\    bin_op_result = Err(foo) ?? 12 > 5 * 5 or 13 + 2 < 5 and 10 - 1 >= 16 or 12 <= 3 / 5
        \\    # NEW!!!
        \\    static_dispatch_style = some_fn(arg1)?.static_dispatch_method()?.next_static_dispatch_method()?.record_field?
        \\    Stdout.line!(interpolated)?
        \\    Stdout.line!("How about ${Num.toStr(number)} as a string?")
        \\}
        \\
        \\expect {
        \\    foo = 1
        \\    blah = 1
        \\    blah == foo
        \\}
    );
}

view this post on Zulip Anthony Bullard (Feb 27 2025 at 03:15):

I forgot I have unary ops (easy), and record updaters and record builders. What are the current status and agreed-upon syntax for them? The same as in the current alpha?

view this post on Zulip Luke Boswell (Feb 27 2025 at 03:15):

Yeah I think there's no change to Record Builder

view this post on Zulip Anthony Bullard (Feb 27 2025 at 03:17):

Are we doing something with .. for record update?

view this post on Zulip Luke Boswell (Feb 27 2025 at 03:17):

See https://github.com/roc-lang/roc/issues/7091

view this post on Zulip Anthony Bullard (Feb 27 2025 at 03:18):

Ah...ok, so it's like List rest

view this post on Zulip Luke Boswell (Feb 27 2025 at 03:18):

Or it's parent Issue https://github.com/roc-lang/roc/issues/7106 "Syntax Changes"

view this post on Zulip Anthony Bullard (Feb 27 2025 at 03:18):

Oooo, the ellipsis keyword! Almost forgot about that!

view this post on Zulip Anthony Bullard (Feb 27 2025 at 03:19):

Ok, I'll make sure this all gets done

view this post on Zulip Anthony Bullard (Feb 27 2025 at 03:19):

That'll leave me with Custom Types and the replacement syntax for abilities

view this post on Zulip Anthony Bullard (Feb 27 2025 at 03:19):

Do you have the issue handy for that? Or is there one?

view this post on Zulip Luke Boswell (Feb 27 2025 at 03:20):

https://github.com/roc-lang/roc/issues/7458

view this post on Zulip Anthony Bullard (Feb 27 2025 at 03:20):

I guess I just look at the static dispatch doc...

view this post on Zulip Anthony Bullard (Feb 27 2025 at 03:24):

And this is the doc for Custom Types...

https://docs.google.com/document/d/10OFeNl9KAYAErajE0Wio4AAR66yM2u13bku0mTUawVk/edit?tab=t.0

view this post on Zulip Anthony Bullard (Feb 27 2025 at 03:26):

If there has been any meaningful changes to this proposal (the thread had no concrete takeaways that were scannable), let me know

view this post on Zulip Luke Boswell (Feb 27 2025 at 03:32):

I'm wondering if we still need the * in module [User.*, SomethingElse, …etc]. Didn't we discuss somewhere that fields in a Custom Record would always be public.

view this post on Zulip Richard Feldman (Feb 27 2025 at 03:39):

for nominal types, the only one we're going to support is tag unions (not records or tuples after all)

view this post on Zulip Richard Feldman (Feb 27 2025 at 03:39):

and they do still need the .* syntax for making the tags optionally public

view this post on Zulip Luke Boswell (Feb 27 2025 at 03:47):

That should be the only use of * then. Because we're going to remove it from types and use a variable instead. https://github.com/roc-lang/roc/issues/7451

view this post on Zulip Anthony Bullard (Feb 27 2025 at 14:09):

Luke Boswell said:

That should be the only use of * then. Because we're going to remove it from types and use a variable instead. https://github.com/roc-lang/roc/issues/7451

PR for this: https://github.com/roc-lang/roc/pull/7642

view this post on Zulip Anthony Bullard (Feb 27 2025 at 14:20):

Anthony Bullard said:

Luke Boswell said:

That should be the only use of * then. Because we're going to remove it from types and use a variable instead. https://github.com/roc-lang/roc/issues/7451

PR for this: https://github.com/roc-lang/roc/pull/7642

Added ... expression to the above

view this post on Zulip Anthony Bullard (Feb 27 2025 at 14:21):

Update from the above PR:

test "Syntax grab bag" {
    try moduleFmtsSame(
        \\app [main!] { pf: platform "../basic-cli/platform.roc" }
        \\
        \\import pf.Stdout
        \\
        \\Map a b : List(a), (a -> b) -> List(b)
        \\
        \\Foo : (Bar, Baz)
        \\
        \\Some a : { foo : Ok(a), bar : Something }
        \\
        \\Maybe a : [Some(a), None]
        \\
        \\SomeFunc a : Maybe(a), a -> Maybe(a)
        \\
        \\add_one_oneline = |num| if num 2 else 5
        \\
        \\add_one : (U64 -> U64)
        \\add_one = |num| {
        \\    other = 1
        \\    if num {
        \\        dbg some_func()
        \\        0
        \\    } else {
        \\        dbg 123
        \\        other
        \\    }
        \\}
        \\
        \\match_time = |a| match a {
        \\    Blue | Green | Red -> {
        \\        x = 12
        \\        x
        \\    }
        \\    lower -> 1
        \\    "foo" -> 100
        \\    "foo" | "bar" -> 200
        \\    [1, 2, 3, .. as rest] -> 123
        \\    [1, 2 | 5, 3, .. as rest] -> 123
        \\    3.14 -> 314
        \\    3.14 | 6.28 -> 314
        \\    (1, 2, 3) -> 123
        \\    (1, 2 | 5, 3) -> 123
        \\    { foo: 1, bar: 2, ..rest } -> 12
        \\    { foo: 1, bar: 2 | 7 } -> 12
        \\    Ok(123) -> 123
        \\    Ok(Some(dude)) -> dude
        \\    TwoArgs("hello", Some("world")) -> 1000
        \\}
        \\
        \\expect blah == 1
        \\
        \\main! : List(String) -> Result({}, _)
        \\main! = |_| {
        \\    world = "World"
        \\    number = 123
        \\    expect blah == 1
        \\    tag = Blue
        \\    return tag
        \\    # NEW!!!
        \\    ...
        \\    match_time(...)
        \\    crash "Unreachable!"
        \\    tag_with_payload = Ok(number)
        \\    interpolated = "Hello, ${world}"
        \\    list = [add_one(number), 456, 789]
        \\    record = { foo: 123, bar: "Hello", baz: tag, qux: Ok(world), punned }
        \\    tuple = (123, "World", tag, Ok(world), (nested, tuple), [1, 2, 3])
        \\    bin_op_result = Err(foo) ?? 12 > 5 * 5 or 13 + 2 < 5 and 10 - 1 >= 16 or 12 <= 3 / 5
        \\    static_dispatch_style = some_fn(arg1)?.static_dispatch_method()?.next_static_dispatch_method()?.record_field?
        \\    Stdout.line!(interpolated)?
        \\    Stdout.line!("How about ${Num.toStr(number)} as a string?")
        \\}
        \\
        \\expect {
        \\    foo = 1
        \\    blah = 1
        \\    blah == foo
        \\}
    );
}

view this post on Zulip Luke Boswell (Mar 07 2025 at 07:13):

Ok... so what should be happening with malformed nodes?

For example say I have a roc file that is literally

modZZZ [foo]

foo = 1

I have a malformed header here and so expect there to be an error pushed into the list of Diagnostics and a malformed node added to the AST.

Now if I want to format this AST ... what should happen when I get to this point?

pub fn getHeader(store: *NodeStore, header: HeaderIdx) Header {
    const node = store.nodes.get(@enumFromInt(header.id));
    switch (node.tag) {
        .app_header => { ... },
        .module_header => { ... },
        else => {
            std.debug.panic("Expected a valid header tag, got {s}", .{@tagName(node.tag)});
        },
    }
}

We call getHeader and then it blows up because we don't have a malformed node.

Maybe we might actually want to add a malformed node here... so I gave that a try. Heres a PR that does that for just the header https://github.com/roc-lang/roc/pull/7672

Am I on the right track?

view this post on Zulip Luke Boswell (Mar 07 2025 at 07:14):

Side note -- I think I found a memory leak in the error diagnostics here by accident.

$ zig build snapshot
warning: file /Users/luke/Documents/GitHub/roc/src/snapshots/003.txt: contained 1 errors, skipping
info: processed 2 snapshots in 2 ms.
error(gpa): memory address 0x102928700 leaked:
/Users/luke/zig-macos-aarch64-0.13.0/lib/std/array_list.zig:1081:62: 0x102873d1f in ensureTotalCapacityPrecise (snapshot)
                const new_memory = try allocator.alignedAlloc(T, alignment, new_capacity);
                                                             ^
/Users/luke/zig-macos-aarch64-0.13.0/lib/std/array_list.zig:1058:51: 0x1028628e3 in ensureTotalCapacity (snapshot)
            return self.ensureTotalCapacityPrecise(allocator, better_capacity);
                                                  ^
/Users/luke/zig-macos-aarch64-0.13.0/lib/std/array_list.zig:1111:41: 0x102847daf in addOne (snapshot)
            try self.ensureTotalCapacity(allocator, newlen);
                                        ^
/Users/luke/zig-macos-aarch64-0.13.0/lib/std/array_list.zig:848:49: 0x10281b687 in append (snapshot)
            const new_item_ptr = try self.addOne(allocator);
                                                ^
/Users/luke/Documents/GitHub/roc/src/check/parse/Parser.zig:121:28: 0x1028462ef in pushMalformed__anon_11980 (snapshot)
    self.diagnostics.append(self.gpa, .{
                           ^
/Users/luke/Documents/GitHub/roc/src/check/parse/Parser.zig:943:38: 0x10281a103 in parseExprWithBp (snapshot)
            return self.pushMalformed(IR.NodeStore.ExprIdx, .unexpected_token);
                                     ^

view this post on Zulip Anthony Bullard (Mar 07 2025 at 15:59):

This is a great question for @Joshua Warner since he designed the malformed node system. Ideally it would just be something aggressive like (MALFORMED <tag>)

view this post on Zulip Anthony Bullard (Mar 07 2025 at 16:04):

Need some alignment on syntax supported in exposes/exposing list(both in headers and in imports)

import SomeModule exposing [
    lowerIdent!,
    UpperIdent,
    something as something_else, # ???
    Foo.*,
    Bar.{ fn_one, fn_two },
    Baz.{ fn_one as function_one, fn_two as function_two }, #???
]

view this post on Zulip Anthony Bullard (Mar 07 2025 at 16:04):

And are there differences worth worrying about between headers and imports?

view this post on Zulip Anthony Bullard (Mar 07 2025 at 16:05):

(At least for parsing)

view this post on Zulip Anthony Bullard (Mar 07 2025 at 16:05):

@Richard Feldman since you are the keeper of the syntax :-)

view this post on Zulip Sam Mohr (Mar 07 2025 at 17:30):

Those last two don't exist anymore, right?

view this post on Zulip Richard Feldman (Mar 07 2025 at 17:42):

also .* should not be allowed in imports

view this post on Zulip Richard Feldman (Mar 07 2025 at 17:43):

only in the specific case of a module header that's exposing a nominal tag union

view this post on Zulip Sam Mohr (Mar 07 2025 at 17:43):

You're thinking of glob exposes for custom unions, Anthony

view this post on Zulip Anthony Bullard (Mar 07 2025 at 18:44):

Probably. I think I forgot custom types became a tag union rather than a record

view this post on Zulip Anthony Bullard (Mar 07 2025 at 18:46):

So

import SomeModule exposing [
    lowerIdent!,
    UpperIdent,
    something as something_else,
]

And

module exposing [
    lowerIdent!,
    UpperIdent,
    something as something_else,
    Foo.*,
]

view this post on Zulip Anthony Bullard (Mar 07 2025 at 18:47):

I think I’ll parse the same and let Can throw an error on .* in an import

view this post on Zulip Sam Mohr (Mar 07 2025 at 18:57):

That all looks right to me, except for module exposing [a as b], not sure if that's planned for support

view this post on Zulip Joshua Warner (Mar 07 2025 at 18:59):

Anthony Bullard said:

This is a great question for Joshua Warner since he designed the malformed node system. Ideally it would just be something aggressive like (MALFORMED <tag>)

Yep this is exactly what I was thinking

view this post on Zulip Anthony Bullard (Mar 08 2025 at 00:53):

Sam Mohr said:

That all looks right to me, except for module exposing [a as b], not sure if that's planned for support

If it's not, it should :-)

view this post on Zulip Brendan Hansknecht (Mar 08 2025 at 00:55):

Renaming on expose? Why?

view this post on Zulip Brendan Hansknecht (Mar 08 2025 at 00:55):

Feels like that is a sign the function was simply named wrong

view this post on Zulip Anthony Bullard (Mar 08 2025 at 00:56):

It's not uncommon to encounter a situation where more than one import exposes a function/type that has the same name that you may want to use without namespacing

view this post on Zulip Anthony Bullard (Mar 08 2025 at 00:57):

But now that I think of it, maybe that mostly disappears with SD

view this post on Zulip Anthony Bullard (Mar 08 2025 at 00:57):

I think of multiple types having a map function for instance

view this post on Zulip Anthony Bullard (Mar 08 2025 at 00:58):

Ok, so the only different thing available here is the .* syntax in headers

view this post on Zulip Sam Mohr (Mar 08 2025 at 00:58):

It would make SD more complicated than "the module needs a function named xyz" to allow renaming two things to the same name in the same module

view this post on Zulip Anthony Bullard (Mar 08 2025 at 00:58):

Man that seems like such a silly thing to have to create a new node type for

view this post on Zulip Sam Mohr (Mar 08 2025 at 00:58):

And besides that, there's no value to doing it there besides at the module name

view this post on Zulip Sam Mohr (Mar 08 2025 at 00:59):

So I vote actively not adding that feature

view this post on Zulip Brendan Hansknecht (Mar 08 2025 at 00:59):

My specific comment is rename on exposing. Rename on import makes sense to me

view this post on Zulip Sam Mohr (Mar 08 2025 at 00:59):

Anthony Bullard said:

Man that seems like such a silly thing to have to create a new node type for

copium

view this post on Zulip Anthony Bullard (Mar 08 2025 at 00:59):

Brendan Hansknecht said:

My specific comment is rename on exposing. Rename on import makes sense to me

I agree with you on the header

view this post on Zulip Anthony Bullard (Mar 08 2025 at 00:59):

Sam Mohr said:

Anthony Bullard said:

Man that seems like such a silly thing to have to create a new node type for

copium

:stuck_out_tongue_closed_eyes:

view this post on Zulip Richard Feldman (Mar 08 2025 at 05:44):

the benefit is to avoid shadowing

view this post on Zulip Richard Feldman (Mar 08 2025 at 05:45):

let's say I have a module named Parser and I want to expose Parser.str but I also want to use the name str all over the place in Parser.roc

view this post on Zulip Richard Feldman (Mar 08 2025 at 05:46):

as in the module header's exposing means I can locally name it inner_str (or whatever) and then expose inner_str as str so everyone outside the module gets the nice name of Parser.str but inside Parser.roc I still get to use the name str for argument names etc.

view this post on Zulip Brendan Hansknecht (Mar 08 2025 at 06:04):

I guess

view this post on Zulip Brendan Hansknecht (Mar 08 2025 at 06:04):

I feel like that is rare

view this post on Zulip Brendan Hansknecht (Mar 08 2025 at 06:05):

And if you really need it, you can make Parser.roc that reexports things form InnerParser.roc

view this post on Zulip Brendan Hansknecht (Mar 08 2025 at 06:05):

No strong feels, just feel unnecessary.

view this post on Zulip Anthony Bullard (Mar 08 2025 at 15:28):

I feel like, reading @Richard Feldman 's message, that I should implement as in exposing

view this post on Zulip Luke Boswell (Mar 08 2025 at 22:07):

I think I can parse a crash at the top level.. but not in a def.

This blows up because it makes that a malformed node

module []

thing =
    crash "something"

But this is fine

module []

crash "something"

view this post on Zulip Luke Boswell (Mar 08 2025 at 22:11):

Actually I think it's the same for expect and return too

view this post on Zulip Sam Mohr (Mar 08 2025 at 22:24):

That shouldn't break, yes, but I think in this case it's looking for braces

view this post on Zulip Sam Mohr (Mar 08 2025 at 22:25):

Which I think is good, because it keeps statement-only behaviors from working as expressions

view this post on Zulip Sam Mohr (Mar 08 2025 at 22:25):

Though a proper error message would be nice

view this post on Zulip Luke Boswell (Mar 08 2025 at 22:26):

I thought we decided those wouldn't have braces -- edit err parens (?

view this post on Zulip Luke Boswell (Mar 08 2025 at 22:27):

Interesting... so this parses correctly

module []

foo = { crash "something" }

But the formatter removes the braces {, and then it blows up

view this post on Zulip Sam Mohr (Mar 08 2025 at 22:28):

Richard Feldman said:

here is a concrete proposal that I'm happy with. If anyone would rather we didn't do this, please say why!

  1. Change the way ?? return "" formats to have the formatter add braces so it becomes ?? { return "" } and same with crash. No changes to the semantics of anything involved in this; it's purely to address the concern over it being not visually obvious enough what the code does.
  2. Introduce if Ok(path) = paths.first() { pattern matching. We explored a bunch of alternatives and this still feels like the best solution to the problem at the top of the thread.

that's it, no other changes. Both of these are addressing specific ergonomics concerns with the status quo, and are not trying to go back to the drawing board and reconsider everything.

view this post on Zulip Sam Mohr (Mar 08 2025 at 22:28):

Okay, I was changing remembered history in bias of my preferences

view this post on Zulip Sam Mohr (Mar 08 2025 at 22:28):

Classic

view this post on Zulip Anthony Bullard (Mar 08 2025 at 23:48):

Luke Boswell said:

Interesting... so this parses correctly

module []

foo = { crash "something" }

But the formatter removes the braces {, and then it blows up

This is my fault. Kind of. Crash is a statement but not an expression right now, and statements are only parsed in the top level and inside of blocks

But the formatter removes the curlies form single statement blocks - assuming they are just an expression

view this post on Zulip Anthony Bullard (Mar 08 2025 at 23:49):

The only way to resolve this in the parser would be to lookup what kind of statement is in the single statement blocks to ensure it is a Expr

view this post on Zulip Anthony Bullard (Mar 08 2025 at 23:50):

But I can’t push anything today due to a day long outage by Comcast in near-west Chicago

view this post on Zulip Luke Boswell (Mar 08 2025 at 23:57):

It's alg. I'm just noting things as I find them.

I'm often not sure if its me, the parser impl or what the intended syntax is.

It's not urgent or anything.

view this post on Zulip Luke Boswell (Mar 09 2025 at 22:13):

I've just been looping with the fuzzer and fixing bugs one by one in https://github.com/roc-lang/roc/pull/7672

I'm not 100% my approach is right so I'm just keeping it in draft until I get some feedback or we have a clear direction for handling errors.

I'm just picking the first fuzz crash, fixing that and making a snapshot test for it, then moving on to the next.

I'm expecting there will be changes required and happy to do that later. But I figure it doesn't hurt to just keep going for now.

view this post on Zulip Luke Boswell (Mar 11 2025 at 22:05):

I just noticed we have two different Region types. One has indexes into the token buffer, so token indexes. The other has indexes into the source code bytes.

Should we rename one of these, e.g. TRegion or something to reflect it's spanning the tokens instead?

view this post on Zulip Luke Boswell (Mar 11 2025 at 22:07):

The other idea I have, is to move the Region that indexes tokens, under Token.. so it's a Token.Region instead

view this post on Zulip Sam Mohr (Mar 11 2025 at 22:21):

It would be nice to unify them, yes. I think the other suggestion from @Joshua Warner was to make Region be a Node.Idx. That would allow us to use half the memory (we only need one u32 value instead of a start and end that are both u32), though some diagnostics might need to store more Regions when they refer to complex regions.

view this post on Zulip Sam Mohr (Mar 11 2025 at 22:22):

At the very least, I agree that renaming one of the two Regions in our codebase would mitigate some confusion.

view this post on Zulip Brendan Hansknecht (Mar 11 2025 at 22:38):

I think the other suggestion from @Joshua Warner was to make Region be a Node.Idx.

Does this work in general? Won't a region in the parser be a range of tokens rather than a single token?

view this post on Zulip Brendan Hansknecht (Mar 11 2025 at 22:39):

Like the region of an expression will include many many tokens

view this post on Zulip Sam Mohr (Mar 11 2025 at 22:43):

Yes, it would hold many tokens

view this post on Zulip Sam Mohr (Mar 11 2025 at 22:44):

The tradeoff is that we can store everything we need a Region to do, which is display/highlight code for diagnostics, in 32 bits

view this post on Zulip Sam Mohr (Mar 11 2025 at 22:44):

But at the cost of needing the parse AST to figure out what a Region refers to

view this post on Zulip Luke Boswell (Mar 11 2025 at 22:45):

If there is a tradeoff here, I lean towards keeping it brain dead simple for now... until we have a working compiler

view this post on Zulip Sam Mohr (Mar 11 2025 at 22:45):

Agreed

view this post on Zulip Sam Mohr (Mar 11 2025 at 22:45):

Though they both seem simple to me

view this post on Zulip Sam Mohr (Mar 11 2025 at 22:47):

Option 1: two Region types, Region = { start: u32, len: u32 } and refers to a slice of the source code, TokenRegion is separate for the tokenizer

view this post on Zulip Sam Mohr (Mar 11 2025 at 22:49):

Okay, looking now at the Parse.IR.Region type, they can definitely be the same thing

view this post on Zulip Sam Mohr (Mar 11 2025 at 22:49):

/// The first and last token consumed by a Node
pub const Region = struct {
    start: TokenIdx,
    end: TokenIdx,
};

defined here

view this post on Zulip Sam Mohr (Mar 11 2025 at 22:51):

I think we should either use Region = Parse.IR.Region or Region = Node.Idx

view this post on Zulip Sam Mohr (Mar 11 2025 at 22:54):

Both of them assume that tokenizing/parsing has happened, but unless we either:

Even the Region = { start: u32, len: u32 } solution is fragile to source-code changes

view this post on Zulip Brendan Hansknecht (Mar 11 2025 at 22:55):

If we get diagnostics, we could rip out the relevant text before closing the file Nvm, late stages might have diagnostics...was just thinking about tokenize and parse diagnostics

view this post on Zulip Sam Mohr (Mar 11 2025 at 22:55):

What about diagnostics (which I'm using interchangeably with errors/warnings) for type errors?

view this post on Zulip Sam Mohr (Mar 11 2025 at 22:55):

yep

view this post on Zulip Brendan Hansknecht (Mar 11 2025 at 22:56):

I think we just reopen the file and if it changed, oh well

view this post on Zulip Sam Mohr (Mar 11 2025 at 22:56):

Yeah, not worth making our compiler more memory-bloated to handle this

view this post on Zulip Brendan Hansknecht (Mar 11 2025 at 22:57):

Also, keeping original source ranges enables use to just drop the tokenized buffer. So I lean towards that.

view this post on Zulip Sam Mohr (Mar 11 2025 at 22:57):

The plan was to just tokenize/parse a second time

view this post on Zulip Sam Mohr (Mar 11 2025 at 22:57):

If we do that, then we can use a single u32

view this post on Zulip Brendan Hansknecht (Mar 11 2025 at 22:58):

The start offset to tokenize/parse from?

view this post on Zulip Sam Mohr (Mar 11 2025 at 22:58):

The Parse.IR.Store.Node.Idx

view this post on Zulip Brendan Hansknecht (Mar 11 2025 at 22:59):

Oh, and just assume that reparsing will generate the exact same node. Then from the node get the start and end token and from those get the region start and end bytes

view this post on Zulip Sam Mohr (Mar 11 2025 at 22:59):

Yep!

view this post on Zulip Sam Mohr (Mar 11 2025 at 23:00):

I don't believe that would be much harder than the current approach, and neither solution is implemented yet since error rendering is entirely untouched

view this post on Zulip Brendan Hansknecht (Mar 11 2025 at 23:00):

Just trying to think about the implications of llvm debug info generation (and interpreter single stepping and printing surrounding context).

view this post on Zulip Sam Mohr (Mar 11 2025 at 23:01):

The interpreter would probably want to keep a parse AST for every file, or even the source of the file at time of interpretation

view this post on Zulip Brendan Hansknecht (Mar 11 2025 at 23:01):

For llvm debug info means that we will reparse every single file (but this reduce max memory requirement of the compiler and llvm is really just for optimized builds so reparsing should be really cheap).

view this post on Zulip Sam Mohr (Mar 11 2025 at 23:02):

I was suggesting to @Joshua Warner that we used to parse just the header of files when looking for package dependencies, and he said "I'm hoping parsing is so fast that we don't need to worry about partial parsing"

view this post on Zulip Brendan Hansknecht (Mar 11 2025 at 23:02):

Sam Mohr said:

The interpreter would probably want to keep a parse AST for every file, or even the source of the file at time of interpretation

Yeah, quite possibly. Though don't want to ruin memory use due to a single gigantic file. So might want to reload only if needed. Cause execution flow doesn't need any line info, only repl and debug flows.

view this post on Zulip Sam Mohr (Mar 11 2025 at 23:02):

I think parsing can be super fast

view this post on Zulip Brendan Hansknecht (Mar 11 2025 at 23:03):

yeah, all sounds reasonable

view this post on Zulip Brendan Hansknecht (Mar 11 2025 at 23:03):

I'll be really curious to bench parsing against the rust compiler

view this post on Zulip Sam Mohr (Mar 11 2025 at 23:03):

Benchmarking of just parsing and also eventually typechecking as well would be awesome to see side-by-side!

view this post on Zulip Brendan Hansknecht (Mar 11 2025 at 23:05):

Yeah, full roc check flow once it is working

view this post on Zulip Luke Boswell (Mar 11 2025 at 23:05):

since error rendering is entirely untouched

That's not entirely true anymore

view this post on Zulip Luke Boswell (Mar 11 2025 at 23:06):

https://github.com/roc-lang/roc/blob/13c5152cb736d7d46a599fe4624e4ddd58d8a1a5/src/problem.zig#L82

view this post on Zulip Sam Mohr (Mar 11 2025 at 23:24):

Good to know!

view this post on Zulip Anthony Bullard (Mar 12 2025 at 14:12):

Holy cow

view this post on Zulip Anthony Bullard (Mar 12 2025 at 14:13):

Don't have enough time right now to read all of this, but I have a new PR that gets us down the path towards multiline and commented code formatting: https://github.com/roc-lang/roc/pull/7695

Here's a preview of the supported syntax:

    try moduleFmtsSame(
        \\app [main!] { pf: platform "../basic-cli/platform.roc" }
        \\
        \\... # Elided for clarity and terseness - I love that this is parsed as valid code :-)
        \\
        \\main! : List(String) -> Result({}, _)
        \\main! = |_| {
        \\    world = "World"
        \\    number = 123
        \\    expect blah == 1
        \\    tag = Blue
        \\    return tag
        \\    ...
        \\    match_time(...)
        \\    crash "Unreachable!"
        \\    tag_with_payload = Ok(number)
        \\    interpolated = "Hello, ${world}"
        \\    list = [
        \\        add_one(number), # Comment one
        \\        456, # Comment two
        \\        789, # Comment three
        \\    ]
        \\    record = { foo: 123, bar: "Hello", baz: tag, qux: Ok(world), punned }
        \\    tuple = (123, "World", tag, Ok(world), (nested, tuple), [1, 2, 3])
        \\    multiline_tuple = (
        \\        123,
        \\        "World",
        \\        tag1,
        \\        Ok(world), # This one has a comment
        \\        (nested, tuple),
        \\        [1, 2, 3],
        \\    )
        \\    bin_op_result = Err(foo) ?? 12 > 5 * 5 or 13 + 2 < 5 and 10 - 1 >= 16 or 12 <= 3 / 5
        \\    static_dispatch_style = some_fn(arg1)?.static_dispatch_method()?.next_static_dispatch_method()?.record_field?
        \\    Stdout.line!(interpolated)?
        \\    Stdout.line!("How about ${Num.toStr(number)} as a string?")
        \\}
        \\
        \\expect {
        \\    foo = 1 # This should work too
        \\    blah = 1
        \\    blah == foo
        \\}
    );

view this post on Zulip Anthony Bullard (Mar 12 2025 at 14:13):

From here it'll be pretty mechanical

view this post on Zulip Anthony Bullard (Mar 12 2025 at 14:14):

I need to add a function to get comments at the start of a multiline construct, and do some massaging of the comment to collapse excessive newlines

view this post on Zulip Anthony Bullard (Mar 12 2025 at 14:15):

And of course adopt my new formatCollection function in about 12 places, adjusting some bad Region calculations along the way

view this post on Zulip Anthony Bullard (Mar 12 2025 at 14:16):

The hardest part will definitely be the headers more than likely

view this post on Zulip Anthony Bullard (Mar 12 2025 at 14:22):

And at the end I'll try to incorporate "trailing comma multiline forcing"

view this post on Zulip Anthony Bullard (Mar 12 2025 at 14:24):

Sam Mohr said:

Benchmarking of just parsing and also eventually typechecking as well would be awesome to see side-by-side!

I'm also interested in this as well (obviously), but it won't be quite a fair fight since the grammar is in many ways much easier to parse now without WSS

view this post on Zulip Anthony Bullard (Mar 12 2025 at 14:24):

And also, we are tokenizing before parsing now

view this post on Zulip Brendan Hansknecht (Mar 12 2025 at 15:48):

I don't think it needs to be fair. Part of the rewrite of the compiler was improving the grammar

view this post on Zulip Brendan Hansknecht (Mar 12 2025 at 15:49):

It is just important to remember that it is a multifaceted comparison, not just rust vs zig or DOD vs not or etc.

view this post on Zulip Brendan Hansknecht (Mar 12 2025 at 15:49):

Cause at the end of the day, users will care about the compiler perf and won't particularly care about any of those other details.

view this post on Zulip Brendan Hansknecht (Mar 15 2025 at 07:57):

These are only super preliminary numbers... I'm really hoping I didn't mess anything up. I don't think I did, but the numbers feel too good to be true:

M1 Mac

X86 Linux

view this post on Zulip Brendan Hansknecht (Mar 15 2025 at 07:59):

Note, the old compiler does have a big disadvantage that it is way larger and loads slower. That said, this is parsing and formatting 100 files with ~1000 lines (24kb each), so I don't think load time should be a big factor.

view this post on Zulip Brendan Hansknecht (Mar 15 2025 at 08:00):

more stats via poop

view this post on Zulip Brendan Hansknecht (Mar 15 2025 at 08:02):

Aside, despite using 99% less memory to parse a file, I am still stunned how much memory we use. In an allocation tracking profiler, I see ~10x memory usage compared to source file size.... Actually, that makes a lot of sense. All tokens take up 12bytes. Which is ~10x more than the single byte a token takes up in the original source file.

view this post on Zulip Brendan Hansknecht (Mar 15 2025 at 08:04):

I guess this is way we may not want to store the tag or length of tokens. Saves a ton of memory and retokenizing is theoretically super fast.

view this post on Zulip Brendan Hansknecht (Mar 15 2025 at 08:05):

Also cc @Andrew Kelley cause I know you were interesting in what perf numbers we get with the new compiler. As said, very preliminary numbers, but the parse to format loop looks to be 5 to 10x faster with 500x less memory usage.

view this post on Zulip Brendan Hansknecht (Mar 15 2025 at 08:15):

Ah, I think I found some of the discrepancy. The old compiler has extra checks to ensure that formatting is stable and not bugged. I need to disable those and see what cost savings it gets.

view this post on Zulip Brendan Hansknecht (Mar 15 2025 at 08:26):

Ok, disabled all the extra reformatting and validation logic. So both codebases are just doing their parse to format loop. So should be more apples to apples now. We are still 3 to 5x faster with the new compiler (and 200x less memory).

M1 Mac

X86 Linux

view this post on Zulip Brendan Hansknecht (Mar 15 2025 at 08:27):

And detailed stats with poop still look great (this is with hot cache unlike the commands above).

stats

view this post on Zulip Sam Mohr (Mar 15 2025 at 08:27):

And we haven't even looked at SIMD yet!

view this post on Zulip Brendan Hansknecht (Mar 15 2025 at 08:28):

Also, about half of execution time is spent in tokenizing, quarter in parsing, and quarter in formatting.

view this post on Zulip Richard Feldman (Mar 15 2025 at 13:06):

wowwww, this is great!

view this post on Zulip Richard Feldman (Mar 15 2025 at 13:07):

I'm personally excited that half is tokenizing - means the simd style might actually have a benefit beyond being fun to try implementing :grinning_face_with_smiling_eyes:

view this post on Zulip Richard Feldman (Mar 15 2025 at 13:11):

RIIZ :high_voltage:

view this post on Zulip Richard Feldman (Mar 15 2025 at 13:14):

are we missing any parsing features that might make it apples-to-oranges?

view this post on Zulip Richard Feldman (Mar 15 2025 at 13:15):

(aside from deprecated things not needing to be parsed of course)

view this post on Zulip Richard Feldman (Mar 15 2025 at 13:15):

or is parsing feature-complete at this point?

view this post on Zulip Brendan Hansknecht (Mar 15 2025 at 15:23):

We definitely aren't at parity in terms of edge cases, error handling, and comments, but the syntax grab bag looked featureful enough that it seemed worth testing:
https://github.com/roc-lang/roc/blob/327647a6161b96d06f6524ee393ab675c2fc1335/src/fmt.zig#L938

view this post on Zulip Brendan Hansknecht (Mar 15 2025 at 19:22):

Extra note, with a tiny bit of allocation tuning (initCapacity instead of starting empty), we can get another ~9% faster. Not allocating and copying during tokenization by allocating a large array can make tokenization 1.5x faster.

view this post on Zulip Brendan Hansknecht (Mar 15 2025 at 19:24):

That said, this is with overallocating to ensure space for all tokens...might not be worth it memory wise. Really depends on average number of characters per token as to what the default should be. And that depends on how many comments someone has among other things like variable name length.

view this post on Zulip Anthony Bullard (Mar 17 2025 at 11:20):

I didn't realize this thing I'm writing is THAT fast

view this post on Zulip Anthony Bullard (Mar 17 2025 at 11:21):

I haven't even optimized for performance besides the memory-locality stuff

view this post on Zulip Anthony Bullard (Mar 17 2025 at 11:21):

I would love to see us do SIMD tokenization

view this post on Zulip Anthony Bullard (Mar 17 2025 at 11:22):

Let's hope we can get similar speed-ups in the more compute heavy parts of the compiler!

view this post on Zulip Anthony Bullard (Mar 17 2025 at 11:30):

You can really see the advantages of the SoA architecture in the HUGE delta in cache misses. That's three orders of magnitude fewer!

view this post on Zulip Brendan Hansknecht (Mar 17 2025 at 16:34):

Yeah, simply using less memory and allocating less does a metric ton.

The old compiler does ~30x more allocations (presumably of many tiny IR nodes). I'm actually a bit surprised it is so bad in the old compiler. I thought we put all that stuff in arenas, but apparently tons of stuff is not in the arenas. Then as you mention, DOD + SOA leads to way less memory usage in general which equates to way more cache hits.

And yeah, we have room for many optimizations.

view this post on Zulip Joshua Warner (Mar 17 2025 at 17:09):

In the old compiler, Defs has a bunch of Vecs that are not in the arenas

view this post on Zulip Joshua Warner (Mar 17 2025 at 17:09):

In fact, I'm pretty confident that was a memory leak

view this post on Zulip Joshua Warner (Mar 20 2025 at 01:59):

The new parser currently expects this syntax:

Map a b : List(a), (a -> b) -> List(b)

... but with PNC I would expect that to be:

Map(a, b) : List(a), (a -> b) -> List(b)

... no?

view this post on Zulip Andrew Kelley (Mar 20 2025 at 20:48):

Brendan Hansknecht said:

Also cc Andrew Kelley cause I know you were interesting in what perf numbers we get with the new compiler. As said, very preliminary numbers, but the parse to format loop looks to be 5 to 10x faster with 500x less memory usage.

very cool, thanks for sharing!

view this post on Zulip Joshua Warner (Mar 20 2025 at 20:49):

FWIW, part of the difference there is we're actually parsing a different (simpler) language now

view this post on Zulip Andrew Kelley (Mar 20 2025 at 20:52):

I like to think that there's a relationship between syntax being simpler for the computer and also being simpler for the human to parse

view this post on Zulip Brendan Hansknecht (Mar 20 2025 at 22:25):

Joshua Warner said:

FWIW, part of the difference there is we're actually parsing a different (simpler) language now

Yeah, it is a whole mix of things. I would guess the biggest gain is in terms of memory usage and allocations. The sheer amount of data being generated and processed by the old compiler almost certainly is the bottleneck. Even if we changed the grammar, I would expect the allocations, data movement, and cache misses to dominate.

view this post on Zulip Anthony Bullard (Mar 27 2025 at 11:17):

After #7695 lands, this shows all of the syntax that will be parsed and formatted correctly (in this exact style):

try moduleFmtsSame(
        \\# This is a module comment!
        \\app [main!] { pf: platform "../basic-cli/platform.roc" }
        \\
        \\import pf.Stdout exposing [line!, write!]
        \\
        \\import # Comment after import keyword
        \\    pf # Comment after qualifier
        \\        .StdoutMultiline # Comment after ident
        \\        exposing [ # Comment after exposing open
        \\            line!, # Comment after exposed item
        \\            write!, # Another after exposed item
        \\        ] # Comment after exposing close
        \\
        \\import pkg.Something exposing [func as function, Type as ValueCategory, Custom.*]
        \\
        \\import BadName as GoodName
        \\import
        \\    BadNameMultiline
        \\        as
        \\        GoodNameMultiline
        \\
        \\Map a b : List(a), (a -> b) -> List(b)
        \\MapML # Comment here
        \\    a # And here
        \\    b # And after the last arg
        \\        : # And after the colon
        \\            List( # Inside Tag args
        \\                a, # After tag arg
        \\            ),
        \\            (a -> b) -> # After arrow
        \\                List( # Inside tag args
        \\                    b,
        \\                ) # And after the type decl
        \\
        \\Foo : (Bar, Baz)
        \\
        \\FooMultiline : ( # Comment after pattern tuple open
        \\    Bar, # Comment after pattern tuple item
        \\    Baz, # Another after pattern tuple item
        \\) # Comment after pattern tuple close
        \\
        \\Some a : { foo : Ok(a), bar : Something }
        \\SomeMl a : { # After record open
        \\    foo : Ok(a), # After field
        \\    bar : Something, # After last field
        \\}
        \\
        \\SomeMultiline a : { # Comment after pattern record open
        \\    foo # After field name
        \\        : # Before field anno
        \\            Ok(a), # Comment after pattern record field
        \\    bar : Something, # Another after pattern record field
        \\} # Comment after pattern record close
        \\
        \\Maybe a : [Some(a), None]
        \\
        \\MaybeMultiline a : [ # Comment after tag union open
        \\    Some(a), # Comment after tag union member
        \\    None, # Another after tag union member
        \\] # Comment after tag union close
        \\
        \\SomeFunc a : Maybe(a), a -> Maybe(a)
        \\
        \\add_one_oneline = |num| if num 2 else 5
        \\
        \\add_one : (U64 -> U64)
        \\add_one = |num| {
        \\    other = 1
        \\    if num {
        \\        dbg # After debug
        \\            some_func() # After debug expr
        \\        0
        \\    } else {
        \\        dbg 123
        \\        other
        \\    }
        \\}
        \\
        \\match_time = |
        \\    a, # After arg
        \\    b,
        \\| # After args
        \\    match a {
        \\        Blue | Green | Red -> {
        \\            x = 12
        \\            x
        \\        }
        \\        Blue # After pattern in alt
        \\        | # Before pattern in alt
        \\            Green
        \\        | Red # After alt pattern
        \\            -> {
        \\                x = 12
        \\                x
        \\            }
        \\        lower # After pattern comment
        \\            -> 1
        \\        "foo" -> # After arrow comment
        \\            100
        \\        "foo" | "bar" -> 200
        \\        [1, 2, 3, .. as rest] # After pattern comment
        \\            -> # After arrow comment
        \\                123 # After branch comment
        \\
        \\        # Just a random comment
        \\
        \\        [1, 2 | 5, 3, .. as rest] -> 123
        \\        [
        \\            1,
        \\            2 | 5,
        \\            3,
        \\            .. # After DoubleDot
        \\                as # Before alias
        \\                    rest, # After last pattern in list
        \\        ] -> 123
        \\        3.14 -> 314
        \\        3.14 | 6.28 -> 314
        \\        (1, 2, 3) -> 123
        \\        (1, 2 | 5, 3) -> 123
        \\        { foo: 1, bar: 2, ..rest } -> 12
        \\        { # After pattern record open
        \\            foo # After pattern record field name
        \\                : # Before pattern record field value
        \\                    1, # After pattern record field
        \\            bar: 2,
        \\            .. # After spread operator
        \\                rest, # After last field
        \\        } -> 12
        \\        { foo: 1, bar: 2 | 7 } -> 12
        \\        {
        \\            foo: 1,
        \\            bar: 2 | 7, # After last record field
        \\        } -> 12
        \\        Ok(123) -> 123
        \\        Ok(Some(dude)) -> dude
        \\        TwoArgs("hello", Some("world")) -> 1000
        \\    }
        \\
        \\expect # Comment after expect keyword
        \\    blah == 1 # Comment after expect statement
        \\
        \\main! : List(String) -> Result({}, _)
        \\main! = |_| { # Yeah I can leave a comment here
        \\    world = "World"
        \\    number = 123
        \\    expect blah == 1
        \\    tag = Blue
        \\    return # Comment after return keyword
        \\        tag # Comment after return statement
        \\
        \\    # Just a random comment!
        \\
        \\    ...
        \\    match_time(
        \\        ..., # Single args with comment
        \\    )
        \\    some_func(
        \\        dbg # After debug
        \\            42, # After debug expr
        \\    )
        \\    crash # Comment after crash keyword
        \\        "Unreachable!" # Comment after crash statement
        \\    tag_with_payload = Ok(number)
        \\    interpolated = "Hello, ${world}"
        \\    list = [
        \\        add_one(
        \\            dbg # After dbg in list
        \\                number, # after dbg expr as arg
        \\        ), # Comment one
        \\        456, # Comment two
        \\        789, # Comment three
        \\    ]
        \\    record = { foo: 123, bar: "Hello", baz: tag, qux: Ok(world), punned }
        \\    tuple = (123, "World", tag, Ok(world), (nested, tuple), [1, 2, 3])
        \\    multiline_tuple = (
        \\        123,
        \\        "World",
        \\        tag1,
        \\        Ok(world), # This one has a comment
        \\        (nested, tuple),
        \\        [1, 2, 3],
        \\    )
        \\    bin_op_result = Err(foo) ?? 12 > 5 * 5 or 13 + 2 < 5 and 10 - 1 >= 16 or 12 <= 3 / 5
        \\    static_dispatch_style = some_fn(arg1)?.static_dispatch_method()?.next_static_dispatch_method()?.record_field?
        \\    Stdout.line!(interpolated)?
        \\    Stdout.line!(
        \\        "How about ${ # Comment after string interpolation open
        \\            Num.toStr(number) # Comment after string interpolation expr
        \\        } as a string?",
        \\    )
        \\} # Comment after top-level decl
        \\
        \\expect {
        \\    foo = 1 # This should work too
        \\    blah = 1
        \\    blah == foo
        \\}
    );

view this post on Zulip Anthony Bullard (Mar 27 2025 at 11:28):

One of the next things that need to be done is to start transcribing the builtins to the v0.1 syntax. Where should they live? I'll start doing that and finishing the parser by parsing hosted, platform, and package headers

view this post on Zulip Brendan Hansknecht (Mar 27 2025 at 13:57):

Preferably, we would make the old compiler able to migrate them to the new syntax

view this post on Zulip Brendan Hansknecht (Mar 27 2025 at 13:57):

That would ensure folks have an easy way to migrate when the zig compiler comes out

view this post on Zulip Brendan Hansknecht (Mar 27 2025 at 13:58):

It also makes it easier to benchmark against the old compiler if they both support the exact same syntax

view this post on Zulip Joshua Warner (Mar 27 2025 at 15:14):

I’be been working on a syntax migration tool in the old compiler, and it’s getting reasonably close. I’d say we should just use that tool to auto-translate the builtins and anything else we want to migrate, with a focus on improving the tool and rerunning it rather than doing hand fix ups to the migrated code

view this post on Zulip Brendan Hansknecht (Mar 27 2025 at 16:15):

100% agree

view this post on Zulip Notification Bot (Apr 08 2025 at 14:00):

9 messages were moved from this topic to #ideas > Needed Function signature and lambda expr change by Anton.

view this post on Zulip Anthony Bullard (Apr 13 2025 at 22:00):

Reading through the Static Dispatch document for the 27th time, am I right to understand that where are only a part of a function's type?

view this post on Zulip Sam Mohr (Apr 13 2025 at 22:01):

The where is saying that a function uses variables that have constraints

view this post on Zulip Sam Mohr (Apr 13 2025 at 22:02):

This is just a parsing question, actually

view this post on Zulip Sam Mohr (Apr 13 2025 at 22:02):

You can also use them in aliases

view this post on Zulip Anthony Bullard (Apr 13 2025 at 22:02):

Yes, but it would only appear as part of the type annotation of a function type?

view this post on Zulip Sam Mohr (Apr 13 2025 at 22:02):

We haven't ironed out the links there

view this post on Zulip Anthony Bullard (Apr 13 2025 at 22:02):

Oh yeah, and in aliases

view this post on Zulip Anthony Bullard (Apr 13 2025 at 22:02):

I think I'll just be implementing them as part of function types for now

view this post on Zulip Anthony Bullard (Apr 13 2025 at 22:03):

But if anyone (ahem @Richard Feldman ) has a nice summary of the overall plan here (as well as changes in syntax), please let me know

view this post on Zulip Sam Mohr (Apr 13 2025 at 22:03):

We want to be able to say sort_list : List(a) -> List(a) where Sortable(a)

view this post on Zulip Sam Mohr (Apr 13 2025 at 22:04):

Though a.Sortable was also thrown around

view this post on Zulip Sam Mohr (Apr 13 2025 at 22:04):

So starting with just functions is fine for now

view this post on Zulip Anthony Bullard (Apr 13 2025 at 22:04):

Yeah, I think the big issue is the current syntax - outside of aliases - is likely going to cause some parsing issues

view this post on Zulip Anthony Bullard (Apr 13 2025 at 22:06):

Dict.insert : Dict k v, k, v -> Dict k v
    where k.Hash, k.Eq

Hash a : a
    where
        a.hash(hasher, a -> hasher),
        hasher.Hasher,

I'm actually wondering if I should attach these instead to annotations and type definitions

view this post on Zulip Anthony Bullard (Apr 13 2025 at 22:07):

Because this being used inside of an aggregate type would be .... complicated

view this post on Zulip Anthony Bullard (Apr 13 2025 at 22:17):

Just thinking out loud here...

so today we have

Hasher implements
    add_bytes : a, List U8 -> a where a implements Hasher
    add_u8 : a, U8 -> a where a implements Hasher
    add_u16 : a, U16 -> a where a implements Hasher
    add_u32 : a, U32 -> a where a implements Hasher
    add_u64 : a, U64 -> a where a implements Hasher
    add_u128 : a, U128 -> a where a implements Hasher
    complete : a -> U64 where a implements Hasher

and I think we would have with v0.1

Hasher a: a where
    a.add_bytes(List U8) -> a ,
    a.add_u8(U8) -> a,
    a.add_u16(U16) -> a,
    a.add_u32(U32) -> a,
    a.add_u64(U64) -> a,
    a.add_u128(U128) -> a,
    a.complete() -> U64

Does that look correct?

view this post on Zulip Anthony Bullard (Apr 13 2025 at 22:19):

And that allows for us to then have something like:

# In `Dict.roc`
insert : Dict k v, k, v -> Dict k v
    where
        k.Hash,
        k.Eq,

# In `Str.roc`
hash :  Str, Hash.Hasher -> Hash.Hasher
hash = {
    ...# Implementation
}

# In `Hash.roc`
Hash a : a
        where
            a.hash(hasher) -> hasher,
            hasher.Hasher,
Hasher a : a
    where
        a.add_bytes(List U8) -> a ,
        a.add_u8(U8) -> a,
        a.add_u16(U16) -> a,
        a.add_u32(U32) -> a,
        a.add_u64(U64) -> a,
        a.add_u128(U128) -> a,
        a.complete() -> U64

view this post on Zulip Anthony Bullard (Apr 13 2025 at 22:23):

And then a Dict(Str, Str) is valid (assuming similar is implemented for Eq)

view this post on Zulip Anthony Bullard (Apr 13 2025 at 22:28):

So each clause in a where is one of:

  1. LowerIdent, NoSpaceDotLowerIdent, NoSpaceOpenParen, <type args>, CloseParens, OpArrow, <type arg>
  2. LowerIdent, NoSpaceDotUpperIdent

view this post on Zulip Brendan Hansknecht (Apr 13 2025 at 22:58):

Anthony Bullard said:

Reading through the Static Dispatch document for the 27th time, am I right to understand that where are only a part of a function's type?

Yep

view this post on Zulip Brendan Hansknecht (Apr 13 2025 at 22:59):

Anthony Bullard said:

Just thinking out loud here...

so today we have

Hasher implements
    add_bytes : a, List U8 -> a where a implements Hasher
    add_u8 : a, U8 -> a where a implements Hasher
    add_u16 : a, U16 -> a where a implements Hasher
    add_u32 : a, U32 -> a where a implements Hasher
    add_u64 : a, U64 -> a where a implements Hasher
    add_u128 : a, U128 -> a where a implements Hasher
    complete : a -> U64 where a implements Hasher

and I think we would have with v0.1

Hasher a: a where
    a.add_bytes(List U8) -> a ,
    a.add_u8(U8) -> a,
    a.add_u16(U16) -> a,
    a.add_u32(U32) -> a,
    a.add_u64(U64) -> a,
    a.add_u128(U128) -> a,
    a.complete() -> U64

Does that look correct?

Yep

view this post on Zulip Brendan Hansknecht (Apr 13 2025 at 23:01):

Anthony Bullard said:

Dict.insert : Dict k v, k, v -> Dict k v
    where k.Hash, k.Eq

Hash a : a
    where
        a.hash(hasher, a -> hasher),
        hasher.Hasher,

I'm actually wondering if I should attach these instead to annotations and type definitions

Hash looks to be defined wrong. I think it would be a.hash(hasher) -> hasher. That or it needs to be defined differently to be module based instead of standard static dispatch.

view this post on Zulip Brendan Hansknecht (Apr 13 2025 at 23:02):

The other way to define it would be module(a).hash(hasher, a) -> hasher

view this post on Zulip Anthony Bullard (Apr 13 2025 at 23:15):

What does the module(a) buy us over just a?

view this post on Zulip Anthony Bullard (Apr 13 2025 at 23:27):

Is that for when the argument of type a is not the first arg (in the "receiver" position)?

view this post on Zulip Brendan Hansknecht (Apr 14 2025 at 00:25):

Yes

view this post on Zulip Brendan Hansknecht (Apr 14 2025 at 00:25):

Need for decode

view this post on Zulip Brendan Hansknecht (Apr 14 2025 at 00:25):

Cause decode is List(U8) -> a

view this post on Zulip Brendan Hansknecht (Apr 14 2025 at 00:26):

Probably will a decoding format as well or something like that

view this post on Zulip Anthony Bullard (Apr 14 2025 at 17:32):

Decode a : a where a.decode(List(U8)) -> a
Encode a : a where a.encode(a) -> List(U8)

view this post on Zulip Brendan Hansknecht (Apr 14 2025 at 19:23):

How would that definition of decode work? You won't have a if you are trying to decode an a.

I think it has to be
Decode a : a where module(a).decode(List(U8)) -> a

view this post on Zulip Anthony Bullard (Apr 14 2025 at 19:31):

Sorry

view this post on Zulip Anthony Bullard (Apr 14 2025 at 19:31):

I was just typing out syntax

view this post on Zulip Anthony Bullard (Apr 14 2025 at 19:31):

That wasn’t meant to be real (or actually submitted)

view this post on Zulip Anthony Bullard (Apr 16 2025 at 01:50):

The PR for parsing and formatting is here: https://github.com/roc-lang/roc/pull/7745

view this post on Zulip Anthony Bullard (Apr 16 2025 at 02:13):

My absolute rough draft, first take re-implementation of Hash.roc from the builtins based on all of the syntax for v0.1 that is available today:

module [
    Hash,
    Hasher,
    hash_list,
    hash_unordered,
]

import Num exposing [
    U8,
    U16,
    U32,
    U64,
    U128,
]

## A value that can be hashed.
Hash a : a
    where
        ## Hashes a value into a [Hasher].
        ## Note that [hash] does not produce a hash value itself; the hasher must be
        ## [complete]d in order to extract the hash value.
        a.hash(hasher) -> hasher,
        hasher.Hasher,

## Describes a hashing algorithm that is fed bytes and produces an integer hash.
##
## The [Hasher] ability describes general-purpose hashers. It only allows
## emission of 64-bit unsigned integer hashes. It is not suitable for
## cryptographically-secure hashing.
Hasher a : a
    where
        ## Adds a list of bytes to the hasher.
        a.add_bytes(List(U8)) -> a,

        ## Adds a single U8 to the hasher.
        a.add_u8(U8) -> a,

        ## Adds a single U16 to the hasher.
        a.add_u16(U16) -> a,

        ## Adds a single U32 to the hasher.
        a.add_u32(U32) -> a,

        ## Adds a single U64 to the hasher.
        a.add_u64(U64) -> a,

        ## Adds a single U128 to the hasher.
        a.add_u128(U128) -> a,

        ## Completes the hasher, extracting a hash value from its
        ## accumulated hash state.
        a.complete() -> U64,

## Adds a list of [Hash]able elements to a [Hasher] by hashing each element.
hash_list : hasher, List a -> hasher
    where
        a.Hash,
        hasher.Hasher,
hash_list = |hasher, lst|
    lst.walk(
        hasher,
        |accum_hasher, elem|
            elem.hash(accum_hasher),
    )

HashFunction a : hasher, a -> hasher where a.Hash, hasher.Hasher
HashWalker container elem -> hasher, container, HashFunction(elem) -> hasher
    where
        elem.Hash,
        hasher.Hasher,

## Adds a container of [Hash]able elements to a [Hasher] by hashing each element.
## The container is iterated using the walk method passed in.
## The order of the elements does not affect the final hash.
hash_unordered : hasher, container, HashWalker(container, elem) -> hasher
    where
        elem.Hash,
        hasher.Hasher,
hash_unordered = |hasher, container, walk| {
    acc = walk(
        container,
        0,
        |accum, elem| {
            x =
                # Note, we intentionally copy the hasher in every iteration.
                # Having the same base state is required for unordered hashing.
                elem
                .hash(hasher)
                .complete()
            next_accum = accum.add_wrap(x)

            if next_accum < accum {
                # we don't want to lose a bit of entropy on overflow, so add it back in.
                next_accum.add_wrap(1)
            } else {
                next_accum
            }
        },
    )
    hasher.add_u64(acc)
}

view this post on Zulip Anthony Bullard (Apr 16 2025 at 02:15):

If that looks correct to everyone, I'll add a snapshot for it

view this post on Zulip Brendan Hansknecht (Apr 16 2025 at 02:25):

I assume a.add_u8(U8) -> hasher, should be a.add_u8(U8) -> a, same for other functions in Hasher

Also, is it now Hash(a) and Hasher(a) instead of Hash a and Hasher a

(hasher, b, (hasher, a -> hasher) -> hasher) could use a anlias to be less confusing. That or at least change the name b to container and a to elem.


Yeah, looks roughly correct.

view this post on Zulip Brendan Hansknecht (Apr 16 2025 at 02:26):

Also, I wonder if we need a better way to apply aliases. hasher.Hasher, doesn't look good.

view this post on Zulip Brendan Hansknecht (Apr 16 2025 at 02:27):

Like it doesn't feel like it is saying that hasher implements the Hasher ability. I wonder if it would be clearer to just do Hasher(hasher) or something similar....not sure

view this post on Zulip Brendan Hansknecht (Apr 16 2025 at 02:28):

Hey @Richard Feldman, do we have any plans on fixing byte hashing for static dispatch? As in, a List U8 will by default hash via hash_list instead add_bytes. As such, it leaves a lot of performance on the table by default.

view this post on Zulip Brendan Hansknecht (Apr 16 2025 at 02:29):

Really in a perfect world, all number types would hash via add_bytes and only complex types would use hash_list that walks an element at a time.

view this post on Zulip Richard Feldman (Apr 16 2025 at 03:02):

hm, I'm not familiar with the distinction :sweat_smile:

view this post on Zulip Richard Feldman (Apr 16 2025 at 03:02):

I assume hash_list is hashing each byte individually, which of course is suboptimal

view this post on Zulip Richard Feldman (Apr 16 2025 at 03:02):

but what does add_bytes do?

view this post on Zulip Brendan Hansknecht (Apr 16 2025 at 03:36):

Yeah, hash_list is the one element at a time thing

view this post on Zulip Brendan Hansknecht (Apr 16 2025 at 03:37):

And add_bytes leaves it up to the hasher and thus can run way faster algorithms

view this post on Zulip Brendan Hansknecht (Apr 16 2025 at 03:38):

The issue is that we cannot specialize hash on List U8 only on List a. So we get stuck hashing List U8 with hash_list unless a user manually overrides it in a custom type hash implementation.

view this post on Zulip Brendan Hansknecht (Apr 16 2025 at 03:40):

As an aside, theoretically for most structural types the best hashing would be to hash like add_bytes but to mask out the padding, but that is another level of complexity to even orchestrate.

view this post on Zulip Anthony Bullard (Apr 16 2025 at 11:08):

Brendan Hansknecht said:

I assume a.add_u8(U8) -> hasher, should be a.add_u8(U8) -> a, same for other functions in Hasher

Also, is it now Hash(a) and Hasher(a) instead of Hash a and Hasher a

(hasher, b, (hasher, a -> hasher) -> hasher) could use a anlias to be less confusing. That or at least change the name b to container and a to elem.


Yeah, looks roughly correct.

I fixed the couple of things that were obviously wrong, and made some type aliases to make list_unordered's signature readable.

Currently a Type Header still uses space separated args. Did this change @Richard Feldman ?

view this post on Zulip Anthony Bullard (Apr 16 2025 at 11:09):

This is why it would be very nice if there were a Grammar specification - outside of the implementation in the parser - that we could use to track the parser's compliance (and therefore completion)

view this post on Zulip Anthony Bullard (Apr 16 2025 at 11:10):

As good as Zulip is, trying to find the answer to every such question is very difficult

view this post on Zulip Anthony Bullard (Apr 16 2025 at 11:13):

In #ideas > ✔ Using parens for types @ 💬 we decided to use parens for type arguments in the annotation itself, but no real direction of type alias headers

view this post on Zulip Richard Feldman (Apr 16 2025 at 12:56):

good question! I think we should match what patterns do

view this post on Zulip Richard Feldman (Apr 16 2025 at 12:57):

so for example:

Alias(a, b) : ...

view this post on Zulip Richard Feldman (Apr 16 2025 at 13:06):

Brendan Hansknecht said:

The issue is that we cannot specialize hash on List U8 only on List a. So we get stuck hashing List U8 with hash_list unless a user manually overrides it in a custom type hash implementation.

ah, yeah static dispatch doesn't actually help with this; the List type is still in the List module.

view this post on Zulip Anthony Bullard (Apr 16 2025 at 13:07):

Good point! The hash_list function should be in the list module

view this post on Zulip Anthony Bullard (Apr 16 2025 at 13:07):

And ok on the change to type alias headers

view this post on Zulip Anthony Bullard (Apr 16 2025 at 13:07):

I can do that real quick before work

view this post on Zulip Richard Feldman (Apr 16 2025 at 13:08):

Brendan Hansknecht said:

As an aside, theoretically for most structural types the best hashing would be to hash like add_bytes but to mask out the padding, but that is another level of complexity to even orchestrate.

:thinking: we could make it so that all the non-pointers in structural types get hashed using add_bytes. They should all end up right next to each other anyway because of alignment, and we can't do better than that because we need to chase the pointers anyway (and not include the addresses in the hash)

view this post on Zulip Anthony Bullard (Apr 16 2025 at 13:10):

And then we need:

  1. Whatever we call [-2, 0, 2].map(.abs().sub(1)). Tearoffs?
  2. var statements
  3. for statements

view this post on Zulip Anthony Bullard (Apr 16 2025 at 13:11):

And then v0.1 parsing will be complete

view this post on Zulip Anthony Bullard (Apr 16 2025 at 13:12):

Obviously the formatter will need quite a bit of iteration to dial in the style over time

view this post on Zulip Anthony Bullard (Apr 16 2025 at 13:13):

@Richard Feldman have we ever finalized the syntax and semantics for a) Tearoffs and b) local function application in a static dispatch chain?

view this post on Zulip Richard Feldman (Apr 16 2025 at 13:29):

a) let's hold off on that for now. I'd like to see how often people actually want to reach for it in practice once we have static dispatch. Might turn out to not be worth doing.
b) yeah, we settled on a->b in https://roc.zulipchat.com/#narrow/stream/304641-ideas/topic/static.20dispatch.20-.20pass_to.20alternative

view this post on Zulip Richard Feldman (Apr 16 2025 at 13:35):

more specifically:

view this post on Zulip Anthony Bullard (Apr 16 2025 at 13:37):

Just to clarify about b - We decided on -> for local dispatch / "Arrow calls", but there was some suggestion (not really touched upon) that match clauses would drop ->. Is that a thing?

view this post on Zulip Anthony Bullard (Apr 16 2025 at 14:03):

Well, this change was a tad more difficult than I expected - it creates some contention with a plain applied Tag expression. Have to make statements differentiate between top-level and inside of a block.

view this post on Zulip Anthony Bullard (Apr 16 2025 at 14:04):

So I'll try to finish it tomorrow morning :-)

view this post on Zulip Anthony Bullard (Apr 16 2025 at 14:04):

Now off to writing AI Agents...

view this post on Zulip Richard Feldman (Apr 16 2025 at 16:16):

Anthony Bullard said:

Just to clarify about b - We decided on -> for local dispatch / "Arrow calls", but there was some suggestion (not really touched upon) that match clauses would drop ->. Is that a thing?

we decided to use => with match for now

view this post on Zulip Anton (Apr 16 2025 at 16:37):

Anthony Bullard said:

Now off to writing AI Agents...

Cool, what are you working on specifically?

view this post on Zulip Brendan Hansknecht (Apr 16 2025 at 18:05):

Richard Feldman said:

Brendan Hansknecht said:

As an aside, theoretically for most structural types the best hashing would be to hash like add_bytes but to mask out the padding, but that is another level of complexity to even orchestrate.

:thinking: we could make it so that all the non-pointers in structural types get hashed using add_bytes. They should all end up right next to each other anyway because of alignment, and we can't do better than that because we need to chase the pointers anyway (and not include the addresses in the hash)

Ah yeah, structural types without pointers can do what I said when in a list. If you have pointers, you need to do things an element at a time

view this post on Zulip Anthony Bullard (Apr 16 2025 at 18:47):

Anton said:

Anthony Bullard said:

Now off to writing AI Agents...

Cool, what are you working on specifically?

Wish I could be specific, but let’s just exploration around UI Generation when the UI is not created from source code

view this post on Zulip Anthony Bullard (Apr 17 2025 at 11:24):

Type alias header using parens for args: https://github.com/roc-lang/roc/pull/7749

view this post on Zulip Anthony Bullard (Apr 17 2025 at 11:32):

My next steps:

  1. Move match clauses to using fat arrow =>
  2. Implement "arrow calls" as described in #compiler development > zig compiler - parser @ 💬
  3. Implement var statements
  4. Implement for construct

I'll probably do 1 & 2 in a single PR, hopefully by EOW.
Then 3 & 4 should be relatively straightforward so I'm hoping to have everything done before the next contributor meet up.

view this post on Zulip Anthony Bullard (Apr 18 2025 at 10:56):

Move match clauses to fat arrow: https://github.com/roc-lang/roc/pull/7751

view this post on Zulip Anthony Bullard (May 11 2025 at 13:10):

Parse and format "Local dispatch arrow calls": https://github.com/roc-lang/roc/pull/7780

view this post on Zulip Anthony Bullard (May 12 2025 at 13:59):

Parse and format var and for statements: https://github.com/roc-lang/roc/pull/7783

view this post on Zulip Anthony Bullard (May 12 2025 at 14:01):

Unless something has been missed, the above should functionally complete the parser, parse IR, and formatter for the v0.1 syntax

view this post on Zulip Richard Feldman (May 12 2025 at 14:08):

yoooooooo :heart_eyes::heart_eyes::heart_eyes:

view this post on Zulip Anthony Bullard (May 12 2025 at 14:11):

now to find something difficult to work on :thinking:

view this post on Zulip Anthony Bullard (May 12 2025 at 14:12):

(and i want to make it very clear that the formatter style is still a WIP but it works, is stable, and loses no information)

view this post on Zulip Richard Feldman (Jun 03 2025 at 21:39):

Brendan Hansknecht said:

Ok, disabled all the extra reformatting and validation logic. So both codebases are just doing their parse to format loop. So should be more apples to apples now. We are still 3 to 5x faster with the new compiler (and 200x less memory).

M1 Mac

X86 Linux


I'm going on the Changelog podcast tomorrow, and it occurred to me that it would be cool to be able to talk about redoing these benchmarks now that we have a feature-complete parser and formatter!

view this post on Zulip Richard Feldman (Jun 03 2025 at 21:39):

would be sweet to be able to have not only a comparison with the old compiler, but also something like "we can now parse the equivalent of _____ lines of code per second"

view this post on Zulip Richard Feldman (Jun 03 2025 at 21:41):

I won't have time between now and then, but if anyone has time to run those benchmarks, would be great to have them! would be awesome to inspire some people to get involved if they're interested in helping out with a performance-focused compiler :smiley:

view this post on Zulip Anthony Bullard (Jun 03 2025 at 22:19):

i would love to see those numbers as well! i don't have a lot of time at the moment but i'd love to brag about how much better we've made things

view this post on Zulip Anthony Bullard (Jun 03 2025 at 22:47):

if the existing benchmarks hold up it's 3.9M LOC/s

view this post on Zulip Anthony Bullard (Jun 03 2025 at 22:50):

oh and that's parse and formatting across 100 files

view this post on Zulip Anthony Bullard (Jun 03 2025 at 22:51):

That portends wonderful things for the LS as well

view this post on Zulip Richard Feldman (Jun 03 2025 at 23:19):

Anthony Bullard said:

if the existing benchmarks hold up it's 3.9M LOC/s

is that for parse and format?

view this post on Zulip Richard Feldman (Jun 03 2025 at 23:19):

or just parse

view this post on Zulip Anthony Bullard (Jun 03 2025 at 23:20):

It says parse and format loop

view this post on Zulip Anthony Bullard (Jun 03 2025 at 23:21):

which means that includes at least a couple hundred sys calls for reading and writing files

view this post on Zulip Anthony Bullard (Jun 03 2025 at 23:21):

correct me if that's wrong @Brendan Hansknecht

view this post on Zulip Brendan Hansknecht (Jun 03 2025 at 23:49):

Yeah, parse and format. If I recall correctly, only about 20% is formatting....

view this post on Zulip Brendan Hansknecht (Jun 03 2025 at 23:50):

I probably could pull more updated numbers. Biggest caveat being I don't have real meaningful source code in both old and new parser syntax. Last time I was using a modified syntax grab bag which is not that much like real world code.

view this post on Zulip Luke Boswell (Jun 03 2025 at 23:51):

Could Claude generate some realistic source code?

view this post on Zulip Brendan Hansknecht (Jun 03 2025 at 23:51):

I know we wanted to make the old parser auto update, but feature like abilities make the standard library fail to port over.

view this post on Zulip Anthony Bullard (Jun 03 2025 at 23:51):

is the rust compiler v0.1 migrator finished?

view this post on Zulip Luke Boswell (Jun 03 2025 at 23:51):

@Joshua Warner

view this post on Zulip Brendan Hansknecht (Jun 03 2025 at 23:52):

I think it is mostly there but has edge cases and clearly can't handle things like abilities.

view this post on Zulip Anthony Bullard (Jun 03 2025 at 23:52):

we could try to just find a number of source files in different projects that succeed

view this post on Zulip Brendan Hansknecht (Jun 03 2025 at 23:52):

@Richard Feldman when do you need the numbers by?

view this post on Zulip Luke Boswell (Jun 03 2025 at 23:53):

Richard Feldman said:

I'm going on the Changelog podcast tomorrow...

view this post on Zulip Anthony Bullard (Jun 03 2025 at 23:54):

I think we can give the existing numbers as showing at least the order of magnitude

view this post on Zulip Anthony Bullard (Jun 03 2025 at 23:54):

and caveat it

view this post on Zulip Richard Feldman (Jun 04 2025 at 00:02):

@Brendan Hansknecht noon Pacific is when the recording starts

view this post on Zulip Brendan Hansknecht (Jun 04 2025 at 01:33):

Just testing the formatter now to see if I can get more meaningful code migrated (like some of the builtins).

Definitely a few pieces missing for migration:

  1. abilities (no fix cause they just don't exist)
  2. wildcard types (those can just become an abitrary letter)
  3. pizza op (that should be able to turn into static dispatch)

I still will try to manually convert some things and see how it goes.

view this post on Zulip Brendan Hansknecht (Jun 04 2025 at 01:38):

hmm...something also definitely goes wrong with closures and braces.

view this post on Zulip Brendan Hansknecht (Jun 04 2025 at 01:48):

Also @Anthony Bullard, found a parser bug. This definitely should parse:

expect {
    foo : U32
    foo = 1 # This should work too
    blah = 1
    blah == foo
}

Specifically the foo : U32 line breaks parsing.

view this post on Zulip Anthony Bullard (Jun 04 2025 at 01:50):

surprising that it doesn't

view this post on Zulip Anthony Bullard (Jun 04 2025 at 01:50):

i thought i had a similar test

view this post on Zulip Anthony Bullard (Jun 04 2025 at 01:50):

is it the U32?

view this post on Zulip Brendan Hansknecht (Jun 04 2025 at 01:53):

I don't think so. I hit it with List(a) first.

view this post on Zulip Brendan Hansknecht (Jun 04 2025 at 01:53):

Some reason it seems to expect the expect to end right after that line.

view this post on Zulip Anthony Bullard (Jun 04 2025 at 01:55):

ah it must think it's a record literal

view this post on Zulip Anthony Bullard (Jun 04 2025 at 01:56):

that shouldn't happen. i check for this and bail out of a record and parse it as a block

view this post on Zulip Anthony Bullard (Jun 04 2025 at 01:56):

I can take a look in the morning but that's the best i can do

view this post on Zulip Brendan Hansknecht (Jun 04 2025 at 01:56):

Yeah, I don't need it fixed, just noting it

view this post on Zulip Anthony Bullard (Jun 04 2025 at 01:57):

ok sorry about that

view this post on Zulip Anthony Bullard (Jun 04 2025 at 01:57):

i'll add the above as a snapshot and makes sure it parses

view this post on Zulip Brendan Hansknecht (Jun 04 2025 at 01:58):

One other question for you, does the new zig compiler attempt to parse thing like List a and convert to List(a)?

view this post on Zulip Richard Feldman (Jun 04 2025 at 02:07):

I don't think we should try to do that honestly

view this post on Zulip Richard Feldman (Jun 04 2025 at 02:07):

I think the amount of complexity wouldn't be worth it, considering LLMs can now do that sort of migration well enough :smile:

view this post on Zulip Richard Feldman (Jun 04 2025 at 02:08):

and I remember a conversation where we discussed that if we knew the new parser didn't need to support whitespace application at all, it would significantly simplify several areas

view this post on Zulip Anthony Bullard (Jun 04 2025 at 02:08):

yeah no white space application at all

view this post on Zulip Richard Feldman (Jun 04 2025 at 02:13):

love the idea of parsing the stdlib as a point of comparison btw! :smiley:

view this post on Zulip Brendan Hansknecht (Jun 04 2025 at 02:25):

Yeah, just was curious cause in the half migrated list.roc file I have it has some cases where it doesn't seem to care about List a but others where it does care....no idea why.

view this post on Zulip Brendan Hansknecht (Jun 04 2025 at 02:30):

love the idea of parsing the stdlib as a point of comparison btw!

Yeah, just not sure if I can port the stdlib in time to get meaningful numbers. Theoretically an llm might be able to help, but they definitely don't know roc currently.

view this post on Zulip Brendan Hansknecht (Jun 04 2025 at 02:31):

Doesn't have to be a fully functional port, but at least needs to parse and format.

view this post on Zulip Richard Feldman (Jun 04 2025 at 02:31):

ah yeah I guess we don't have an equivalent of https://www.roc-lang.org/builtins/llms.txt for the new syntax

view this post on Zulip Richard Feldman (Jun 04 2025 at 02:31):

could try giving it some basic examples but maybe not enough

view this post on Zulip Richard Feldman (Jun 04 2025 at 02:31):

or like tests in the source code maybe?

view this post on Zulip Brendan Hansknecht (Jun 04 2025 at 02:40):

Oh, one other parser bug found. The new parser doesn't understand doc comments ## ...

view this post on Zulip Anthony Bullard (Jun 04 2025 at 02:41):

that's super easy to add. i thought the tokenizer handled that

view this post on Zulip Anthony Bullard (Jun 04 2025 at 02:42):

we definitely don't have a concept of a doc comment in the parser because the parser doesn't know about comments at all

view this post on Zulip Brendan Hansknecht (Jun 04 2025 at 02:43):

Ah yeah, fair

view this post on Zulip Anthony Bullard (Jun 04 2025 at 02:43):

either way should be very straight forward

view this post on Zulip Anthony Bullard (Jun 04 2025 at 02:44):

there's a function called chompTrivia that handles it i believe

view this post on Zulip Anthony Bullard (Jun 04 2025 at 02:44):

in the tokenizer

view this post on Zulip Brendan Hansknecht (Jun 04 2025 at 02:51):

Also, to note, I think it is may just be the formatting that needs to be fixed. Outputs # # ...

view this post on Zulip Richard Feldman (Jun 04 2025 at 02:53):

we may need to start keeping doc comments (although not regular ones) around for parsing so we can parse their markdown but yeah

view this post on Zulip Brendan Hansknecht (Jun 04 2025 at 04:33):

Hmm, looks like we are missing record destructuring in the new syntax:
{ start, end, step } = rec
expr_unexpected_token, at token OpAssign at ...

view this post on Zulip Brendan Hansknecht (Jun 04 2025 at 04:40):

And maybe missing guards on match statements.
[x, .. as rest] if x == delimiter =>
expr_unexpected_token, at token OpFatArrow at ...

view this post on Zulip Brendan Hansknecht (Jun 04 2025 at 05:17):

Ok, so I ported over List.roc. I think it is a reasonably representative file for well documented roc code. It is about ~800 LOC and ~800 lines of comments/blank/bracket only.
The new version is about 100 lines of brackets more than the old version. That said, total byte count is pretty similar. I padded the old roc file with an extra comment to make them the exact same number of bytes.

All numbers on M1 mac

Formatting 1000 files that are all List.roc

Formatting List.roc single file with 1000x the body size

Also, definitely room for more gains like making chomptrivia faster.

view this post on Zulip Brendan Hansknecht (Jun 04 2025 at 05:20):

@Richard Feldman here are some rough numbers just based on List.roc.

Clearly we leak memory with every new file in the old rust compiler.

Also, big numbers that matter:
~3 to 4x faster than rust compiler (parse and format combined)
~2 to 3 millions lines of code parsed per second
~5 to 6 millions lines per second when including comments, bracket only, and blank lines

view this post on Zulip Luke Boswell (Jun 04 2025 at 05:29):

making chomptrivia faster

Nice, sounds like a fun project :racecar:

view this post on Zulip Brendan Hansknecht (Jun 04 2025 at 05:35):

Yeah, chomp trivia and string interning are the two things that stick out most.

view this post on Zulip Luke Boswell (Jun 04 2025 at 05:36):

The plan is to do some SIMD stuff there? It sounds like it might be a really contained side project. Would it be too early to explore that now?

view this post on Zulip Brendan Hansknecht (Jun 04 2025 at 05:44):

Someone could definitely do this now if they are interested.

view this post on Zulip Luke Boswell (Jun 04 2025 at 07:17):

I had an initial stab at it... https://github.com/roc-lang/roc/pull/7815

@Brendan Hansknecht can you share your List.roc file so I can test this.

view this post on Zulip Anthony Bullard (Jun 04 2025 at 10:48):

I don't know enough about SIMD to really review your implementation @Luke Boswell but it looks good to me, thank you for adding tests to check correctness!

view this post on Zulip Anthony Bullard (Jun 04 2025 at 10:57):

Brendan Hansknecht said:

Hmm, looks like we are missing record destructuring in the new syntax:
{ start, end, step } = rec
expr_unexpected_token, at token OpAssign at ...

Can you share the overall file and I'll add it as a snapshot that I can fill in gaps. To be clear, this is supposed to be supported. I can see from even syntax grab bag (and verified by looking at Parser.parseStmt) that I must have just forgotten to look for OpenCurly and OpenSquare in the statement position to check them being used as a pattern before sending it over to parseExpr.

view this post on Zulip Anthony Bullard (Jun 04 2025 at 10:57):

And the match guards were also just for some reason not on my check list

view this post on Zulip Richard Feldman (Jun 04 2025 at 23:11):

the Changelog episode recording went great, and I used the numbers in it - thanks for doing those @Brendan Hansknecht and thanks for getting the parser and formatter to the point where we could actually compile a whole builtin module with them @Anthony Bullard! :heart_eyes:

view this post on Zulip Richard Feldman (Jun 04 2025 at 23:11):

episode should be out in a week or two

view this post on Zulip Brendan Hansknecht (Jun 04 2025 at 23:58):

@Luke Boswell I ran it on a folder with 1000 copies of this file
file_1.roc

view this post on Zulip Brendan Hansknecht (Jun 04 2025 at 23:59):

and a version of it that just duplicated the body 1000 times

view this post on Zulip Brendan Hansknecht (Jun 04 2025 at 23:59):

New-List.roc

view this post on Zulip Brendan Hansknecht (Jun 05 2025 at 00:02):

Can you share the overall file and I'll add it as a snapshot that I can fill in gaps.

I ended up removing everything that caused a problem in order to get it compiling.... I can try to find and re-add all the things that broke the tokenizer/parser. Might not have time to do so tonight though. Should just be the two issues I mentioned above and then many commented out type annotations.

view this post on Zulip Luke Boswell (Jun 05 2025 at 00:04):

This is good. Thank you. I'm currently rewriting a bunch of things. I've been experimenting with the LLM workflow, and it's amazing how useful but also unhelpful the AI can be. It gives code that looks pretty good but can be totally useless. I'm rebuilding my test cases to ensure it's working correctly.

view this post on Zulip Brendan Hansknecht (Jun 05 2025 at 00:13):

Might not have time to do so tonight though

Actually, I am confused on what day it is. I should have time tonight, don't have time tomorrow night. I'll try to skim through the file and re-add the various pieces I had to remove.

view this post on Zulip Brendan Hansknecht (Jun 05 2025 at 01:35):

@Anthony Bullard I think this file should work for your snapshot test. Also, beyond just making parse, definitely needs some formatting cleanup.
Edit: newer version of file with another fix.
List.roc

view this post on Zulip Brendan Hansknecht (Jun 05 2025 at 01:35):

For example:

walk_backwards_help = |list, state, f, index_plus_one|
    if index_plus_one == 0
        state
            else {
                index = Num.sub_wrap(index_plus_one, 1)
                next_state = f(state, get_unsafe(list, index))

                walk_backwards_help(list, next_state, f, index)
            }

view this post on Zulip Brendan Hansknecht (Jun 05 2025 at 03:17):

I keep remembering ...oh, and one more thing for this.... In the new compiler we don't allow _intionally_named_but_ignored. It has to just be _. Was this intentional? I like named ignored variables, but maybe not everyone does and we decided to remove it?

view this post on Zulip Luke Boswell (Jun 05 2025 at 03:23):

I think its meant to be the same, i.e. _named_ignored_thing is still valid. It's probably just been overlooked/not implemented yet.

view this post on Zulip Anton (Jun 06 2025 at 09:44):

Luke Boswell said:

The plan is to do some SIMD stuff there? It sounds like it might be a really contained side project. Would it be too early to explore that now?

I followed the updates and PR on this, but I think it's too early for SIMD, I would not add serious complexity to code that is still expected to change and not a bottleneck.

view this post on Zulip Luke Boswell (Jun 06 2025 at 09:52):

I agree. It was fun to explore a little, but turned out to be a little more complicated than I thought.

view this post on Zulip Brendan Hansknecht (Jun 06 2025 at 14:33):

Anton said:

I followed the updates and PR on this, but I think it's too early for SIMD, I would not add serious complexity to code that is still expected to change and not a bottleneck.

Luckily, chomp trivia is not still expected to change and is a bottleneck.

view this post on Zulip Brendan Hansknecht (Jun 06 2025 at 14:34):

It also is such a small chunk of code that it would be trivial to switch back if we need

view this post on Zulip Brendan Hansknecht (Jun 06 2025 at 14:34):

I think small localized optimizations like this should be totally fine.

view this post on Zulip Brendan Hansknecht (Jun 06 2025 at 14:37):

I say this more in terms of, people should feel free to explore optimizations and we should definitely accept them if they are small and self contained. Don't want to block that energy.

view this post on Zulip Anton (Jun 06 2025 at 15:07):

I agree, I forgot that this was about chompTrivia specifically

view this post on Zulip Anton (Jun 06 2025 at 15:24):

For the bottleneck I was thinking about parsing not being a bottleneck in compilation but yeah small, self-contained and not expected to change sounds good

view this post on Zulip Brendan Hansknecht (Jun 06 2025 at 15:29):

Yeah, for dev with an interpreter, will be interesting to see the bottlenecks

view this post on Zulip Brendan Hansknecht (Jun 06 2025 at 15:30):

Also, I guess formatting a ton of files is another workflow we want to be super fast

view this post on Zulip Brendan Hansknecht (Jun 06 2025 at 15:30):

But yeah, in general I agree

view this post on Zulip Anthony Bullard (Jun 08 2025 at 14:27):

Richard Feldman said:

the Changelog episode recording went great, and I used the numbers in it - thanks for doing those Brendan Hansknecht and thanks for getting the parser and formatter to the point where we could actually compile a whole builtin module with them Anthony Bullard! :heart_eyes:

Can't wait to see it myself!

view this post on Zulip Luke Boswell (Jun 20 2025 at 08:24):

I've noticed we don't parse tuple patterns... are these planned but just not implemented yet?

Analysis from Claude

The parser only looks for pattern assignments when it sees a LowerIdent token. For tuple patterns starting with (, it falls through to the default case where it parses an expression statement instead of a pattern assignment.

To fix this, we would need to:

  1. Add a case for .OpenRound and .NoSpaceOpenRound in parseStmtByType
  2. Try to parse a pattern, then check if it's followed by OpAssign
  3. If so, create a .decl statement with the pattern

However, this approach has a problem: we'd need to look ahead quite a bit to determine if we have a pattern assignment vs just a tuple expression. A better approach might be to:

  1. In the default case (where expressions are parsed), after parsing an expression, check if it's followed by OpAssign
  2. If so, try to convert the expression to a pattern (or report an error if it can't be converted)
  3. Then parse the RHS and create a .decl statement

This is likely why the current parser implementation only supports simple identifier patterns in assignments - handling complex patterns requires either significant lookahead or the ability to convert expressions to patterns after parsing.

The snapshot shows this limitation clearly - tuple "patterns" are being parsed as tuple expressions, leading to the "expr_unexpected_token" error when the parser encounters the = sign.

Snapshot

I thought these would be supported, but I think they haven't been implemented in the Parser yet

~~~META
description=Tuple pattern matching tests
type=expr
~~~SOURCE
{

    # Simple tuple destructuring
    (x, y) = (1, 2)

    # Nested tuple patterns
    ((a, b), (c, d)) = ((10, 20), (30, 40))

    # Mixed patterns with literals
    (first, second, third) = (100, 42, 200)

    # Tuple with string and tag patterns
    (name, string, boolean) = ("Alice", "fixed", True)

    # Tuple with list pattern
    (list, hello) = ([1, 2, 3], "hello")

    {}
}

view this post on Zulip Anthony Bullard (Jun 20 2025 at 20:28):

check the syntax grab bag snapshot

view this post on Zulip Anthony Bullard (Jun 20 2025 at 20:29):

that's all of the implemented syntax

view this post on Zulip Anthony Bullard (Jun 20 2025 at 20:29):

if tuples aren't there i think it was an oversight

view this post on Zulip Brendan Hansknecht (Jun 20 2025 at 21:22):

I think it is specifically for pattern matching that they aren't implemented.

view this post on Zulip Brendan Hansknecht (Jun 20 2025 at 21:22):

I mentioned this a while ago in this thread when porting List.roc over

view this post on Zulip Anthony Bullard (Jun 20 2025 at 21:31):

should be trivial to fix, but it would be MUCH easier for all of this if decls started with a keyword :rolling_on_the_floor_laughing:

view this post on Zulip Luke Boswell (Jun 20 2025 at 21:53):

Like let? What are you thinking

view this post on Zulip Anthony Bullard (Jun 20 2025 at 21:53):

yeah

view this post on Zulip Richard Feldman (Jun 20 2025 at 22:11):

this is how we end up with nice syntax: we do the thing that's nicer for the end user but harder for the compiler authors :grinning_face_with_smiling_eyes:

view this post on Zulip Anthony Bullard (Jun 20 2025 at 22:16):

oh totally! not suggesting we SHOULD

view this post on Zulip Anthony Bullard (Jun 20 2025 at 22:18):

just stating the simple fact that patterns in decls with no keywords that are almost indistinguishable from a expr of the same shape until you find the equal sign is complicated specifically for things like lists and tuples

view this post on Zulip Anthony Bullard (Jun 20 2025 at 22:18):

with a record i know after two tokens

view this post on Zulip Richard Feldman (Jun 20 2025 at 22:19):

ah for sure!

view this post on Zulip Anthony Bullard (Jun 20 2025 at 22:19):

but lists and tuples patterns can mean a theoretically unbounded lookahead

view this post on Zulip Luke Boswell (Jun 20 2025 at 22:24):

What do you think of Claude's suggestion

handling complex patterns requires either significant lookahead or the ability to convert expressions to patterns after parsing.

Could we convert it from an expression to pattern in Can?

view this post on Zulip Anthony Bullard (Jun 20 2025 at 22:41):

possibly

view this post on Zulip Joshua Warner (Jun 20 2025 at 22:42):

FWIW I’d actually like to have the same underlying ast nodes be reused for exprs and patterns

view this post on Zulip Anthony Bullard (Jun 20 2025 at 22:42):

but i think it would be a pain, but i can look at it

view this post on Zulip Joshua Warner (Jun 20 2025 at 22:42):

That way we don’t have to convert (only verify)

view this post on Zulip Anthony Bullard (Jun 20 2025 at 22:43):

Joshua Warner said:

FWIW I’d actually like to have the same underlying ast nodes be reused for exprs and patterns

there'd have to be near perfect symmetry between them for me to roll that way. i can take a look at that tomorrow

view this post on Zulip Anthony Bullard (Jun 20 2025 at 22:43):

it would unlock some nice characteristics

view this post on Zulip Anthony Bullard (Jun 20 2025 at 22:43):

really removing the need for lookahead at all

view this post on Zulip Anthony Bullard (Jun 20 2025 at 22:51):

the only issue i can think of off top is alternates

view this post on Zulip Joshua Warner (Jun 20 2025 at 22:51):

There’s not a perfect symmetry, but it’s close enough to be useful IMO. The only alternative is allowing conversions, which I think will end up being a bit costly from a perf perspective

view this post on Zulip Anthony Bullard (Jun 20 2025 at 22:52):

like Foo(1 | 2)

view this post on Zulip Joshua Warner (Jun 20 2025 at 22:52):

Alternates can’t occur in expr position anyway (at the start of a statement)

view this post on Zulip Anthony Bullard (Jun 20 2025 at 22:53):

yeah but i wouldn't want to underfit the node to support something for this kind of pattern that's also used for exprs

view this post on Zulip Joshua Warner (Jun 20 2025 at 22:54):

Oh to be clear I’m taking about the internal node types being shared, not the types exposed to the rest of the compiler

view this post on Zulip Anthony Bullard (Jun 20 2025 at 22:55):

i mean all nodes internally are identical

view this post on Zulip Joshua Warner (Jun 20 2025 at 22:55):

Right, but expressions and patterns should use the same node type IDs 

view this post on Zulip Joshua Warner (Jun 20 2025 at 22:56):

We can then translate them into different public enumeration types, depending on the context

view this post on Zulip Anthony Bullard (Jun 20 2025 at 22:56):

it's just the different tags and how those tags affect the interpretation of the data and extra data

view this post on Zulip Joshua Warner (Jun 20 2025 at 22:56):

Yep, so the same internal tags, but different external enumerations 

view this post on Zulip Anthony Bullard (Jun 20 2025 at 22:56):

so all use expr ids

view this post on Zulip Anthony Bullard (Jun 20 2025 at 22:56):

but you get / set with specific functions for pattern or expr?

view this post on Zulip Joshua Warner (Jun 20 2025 at 22:57):

I was matching we would also have a pattern ID that sneaky uses the same ID space as expression IDs

view this post on Zulip Joshua Warner (Jun 20 2025 at 22:57):

Sneakily

view this post on Zulip Anthony Bullard (Jun 20 2025 at 22:57):

they all have the same address space

view this post on Zulip Anthony Bullard (Jun 20 2025 at 22:57):

they are all just integer offsets into a single array list

view this post on Zulip Joshua Warner (Jun 20 2025 at 22:58):

Errrr… right, I forgot about that.

view this post on Zulip Anthony Bullard (Jun 20 2025 at 22:58):

so this shouldn't be two hard at all

view this post on Zulip Joshua Warner (Jun 20 2025 at 22:59):

I guess the only critical thing then is making sure that if you were to take a pattern ID and cast it to an expression ID, then decoded as an expression, it should “just work”

view this post on Zulip Anthony Bullard (Jun 20 2025 at 22:59):

yes

view this post on Zulip Anthony Bullard (Jun 20 2025 at 23:00):

actually that might not be that much work

view this post on Zulip Anthony Bullard (Jun 20 2025 at 23:00):

(famous last words)

view this post on Zulip Joshua Warner (Jun 20 2025 at 23:00):

Naturally, these code paths to convert to the external types will act as correctness assertions on the purse parser algorithms that are deciding what is and is not a valid expression (or pattern)

view this post on Zulip Joshua Warner (Jun 20 2025 at 23:00):

Yeah hopefully

view this post on Zulip Joshua Warner (Jun 20 2025 at 23:01):

lol dictating the word parser is impossibrl

view this post on Zulip Anthony Bullard (Jun 20 2025 at 23:02):

wow you are right!

view this post on Zulip Anthony Bullard (Jun 20 2025 at 23:02):

just tried myself

view this post on Zulip Jared Ramirez (Jul 01 2025 at 20:38):

Does the parser currently support multiple patterns in match? ie A | B => ...

view this post on Zulip Jared Ramirez (Jul 01 2025 at 20:40):

The type looks like in parse/AST.zig looks like:

pub const MatchBranch = struct {
    pattern: Pattern.Idx,
    body: Expr.Idx,
    region: TokenizedRegion,
}

So I'm guessing either it's not yet implemented, or when we parse we split A and B into their own branches pointing to the same expr?

view this post on Zulip Luke Boswell (Jul 01 2025 at 21:06):

I did think about that. I haven't implemented it (or at least not deliberately) in my PR

view this post on Zulip Luke Boswell (Jul 01 2025 at 21:07):

My guess is we'll split that into multiple branches... but the Parser should be where that is done I think. Actually I'm not sure...

view this post on Zulip Luke Boswell (Jul 01 2025 at 21:20):

We dont want to desugar in this version of the compiler so we probably need a different approach to ha dle this.

view this post on Zulip Jared Ramirez (Jul 01 2025 at 21:26):

In CIR, it has a span. Can the parse AST do the same?

pub const Branch = struct {
  patterns: Match.BranchPattern.Span,
  ...
}

view this post on Zulip Luke Boswell (Jul 01 2025 at 21:50):

I think so, it's a fair change to the parser. We may want to wait until after @Anthony Bullard has done his thing before we change it.

view this post on Zulip Richard Feldman (Jul 06 2025 at 01:31):

seems like higher-order function type annotations don't parse without parens right now :sweat_smile:

this parses:

foo : ((Str -> Str) -> Str)

...but without the outer parens, this fails to parse:

foo : (Str -> Str) -> Str

view this post on Zulip Anthony Bullard (Jul 06 2025 at 02:13):

that's definitely a bug

view this post on Zulip Anthony Bullard (Jul 06 2025 at 02:13):

if you add a snapshot for it i'll fix it

view this post on Zulip Richard Feldman (Jul 06 2025 at 03:00):

thanks! I pushed a repro snapshot to higher-order-annotations


Last updated: Jul 06 2025 at 12:14 UTC