zig compiler - parsing exprs into patterns · compiler development

Stream: compiler development

Topic: zig compiler - parsing exprs into patterns

Richard Feldman (Feb 02 2025 at 18:40):

this was a parsing idea I had a long time ago, and wanted to write down here because I think Zig will make it a lot easier!

Richard Feldman (Feb 02 2025 at 18:42):

we have this common pattern where the parser encounters something like this:

{ a, b }

in isolation, this looks like an expression. But after parsing some more, it might turn out to be a pattern:

{ a, b } =

{ a, b } ->

Richard Feldman (Feb 02 2025 at 18:42):

there is a ton of overlap between our pattern and expr nodes

Richard Feldman (Feb 02 2025 at 18:43):

not full overlap though - as and | are only allowed in patterns, function application is only allowed in exprs, etc.

Richard Feldman (Feb 02 2025 at 18:44):

but let's suppose we ignored those differences at the parsing stage, and said we had one node type called like ExprOrPattern (ideally with a better name, but let's run with it for now)

Richard Feldman (Feb 02 2025 at 18:44):

and it supported as, |, function application, etc. - just a superset of all the things exprs and patterns can have

Richard Feldman (Feb 02 2025 at 18:45):

and then I believe there's a way in Zig (haven't looked into specifically how this would look) to customize the discriminant of the tag union, such that we can say both:

here's what type of node it is (e.g. "record destructure" in this example)
a single bit for "this is an expr" vs "this is a pattern"

Richard Feldman (Feb 02 2025 at 18:46):

at that point, we can have all of these nodes begin life as an expr, and then later on when we discover it's a pattern (e.g. we encounter an =) we just flip the bit on the node we've been working on

Richard Feldman (Feb 02 2025 at 18:46):

so we don't have to go back and reassemble anything, it just goes from being an ExprOrPattern with the expr bit set to the same node but with the pattern bit set

Richard Feldman (Feb 02 2025 at 18:47):

then, later on in canonicalization, we can report errors like "you can't have | in an expr" or "you can't have function application in a pattern"

Richard Feldman (Feb 02 2025 at 18:47):

this is separately consistent with our desire to have an error-tolerant parser

Richard Feldman (Feb 02 2025 at 18:47):

e.g. just because I have as in an expr where it's not allowed, that shouldn't break the formatter

Richard Feldman (Feb 02 2025 at 18:47):

the formatter should still be able to know what to do there, we just need to report an error during canonicalization of "you can't do that here, maybe you want _________ instead?"

Richard Feldman (Feb 02 2025 at 18:48):

so this seems like it would simplify a bunch of logic we have now around discovering later that what looked like an expr actually turned out to be a pattern

Richard Feldman (Feb 02 2025 at 18:49):

bc we wouldn't have to build a Pattern from an Expr, we just flip a bit

Richard Feldman (Feb 02 2025 at 19:10):

anyway, curious what anyone thinks of this!

cc @Joshua Warner @Anthony Bullard

Joshua Warner (Feb 02 2025 at 19:26):

Including | for patterns in the same parser as |args| body closures will cause problems

Richard Feldman (Feb 02 2025 at 19:38):

hm true

Richard Feldman (Feb 02 2025 at 19:39):

:thinking: what if we changed those to or?

Richard Feldman (Feb 02 2025 at 19:39):

like Foo or Bar(baz) ->

Richard Feldman (Feb 02 2025 at 19:39):

seems very self-descriptive!

Joshua Warner (Feb 02 2025 at 19:42):

I'm not sure if we actually need to include | in the expression parser. I believe that's only used for when branches, where we can just call the expr parser with a flag saying to disallow closures, and then handle the | in the when branch code.

Joshua Warner (Feb 02 2025 at 19:42):

That said, I would be fine with or

Anthony Bullard (Feb 02 2025 at 21:05):

I know you would be against it more than likely Richard, but if the current | were replaced with or, then it could be repurposed to introducing a when branch and then there would be no amibiguity

Anthony Bullard (Feb 02 2025 at 21:48):

Do you have thoughts here @Joshua Warner ?

Joshua Warner (Feb 02 2025 at 21:49):

That does make when parsing easier.

Joshua Warner (Feb 02 2025 at 21:50):

Looks weird to my eyes tho. (Subjective opinion!)

Anthony Bullard (Feb 02 2025 at 21:50):

Pretty common in ML, though

Joshua Warner (Feb 02 2025 at 21:50):

Guess which language family I’ve spent the least time using :wink:

Joshua Warner (Feb 02 2025 at 21:52):

If pipe introduces branches, I’d have that also separate branch patterns

Joshua Warner (Feb 02 2025 at 21:52):

I wouldn’t mix pipe and or for that

Anthony Bullard (Feb 02 2025 at 21:53):

Yeah, that's what other MLs do

Joshua Warner (Feb 02 2025 at 21:53):

I don’t think the pipe ambiguity will be a problem in practice here

Anthony Bullard (Feb 02 2025 at 21:53):

But not Haskell (which is actually Miranda derived)

Joshua Warner (Feb 02 2025 at 21:54):

Huh; what does that use?

Anthony Bullard (Feb 02 2025 at 21:54):

WSS

Anthony Bullard (Feb 02 2025 at 21:54):

Or dear old friend

Anthony Bullard (Feb 02 2025 at 21:54):

And to my memory (it's been a couple of years) it is NOT forgiving

Anthony Bullard (Feb 02 2025 at 21:55):

    case (x, y) of
        (a, b) -> print ("x: " ++ show a ++ ", y: " ++ show b)

But

    case (x, y) of
    (a, b) -> print ("x: " ++ show a ++ ", y: " ++ show b)

WILL NOT parse if I remember correctly

Joshua Warner (Feb 02 2025 at 21:57):

Ahhh for some reason I thought you meant it used something else besides '|' to separate patterns

Joshua Warner (Feb 02 2025 at 21:58):

Yes that much matches my memory, given my (very limited) haskell experience

Last updated: Aug 17 2025 at 12:14 UTC