Stream: compiler development

Topic: zig compiler - parsing exprs into patterns


view this post on Zulip Richard Feldman (Feb 02 2025 at 18:40):

this was a parsing idea I had a long time ago, and wanted to write down here because I think Zig will make it a lot easier!

view this post on Zulip Richard Feldman (Feb 02 2025 at 18:42):

we have this common pattern where the parser encounters something like this:

{ a, b }

in isolation, this looks like an expression. But after parsing some more, it might turn out to be a pattern:

{ a, b } =
{ a, b } ->

view this post on Zulip Richard Feldman (Feb 02 2025 at 18:42):

there is a ton of overlap between our pattern and expr nodes

view this post on Zulip Richard Feldman (Feb 02 2025 at 18:43):

not full overlap though - as and | are only allowed in patterns, function application is only allowed in exprs, etc.

view this post on Zulip Richard Feldman (Feb 02 2025 at 18:44):

but let's suppose we ignored those differences at the parsing stage, and said we had one node type called like ExprOrPattern (ideally with a better name, but let's run with it for now)

view this post on Zulip Richard Feldman (Feb 02 2025 at 18:44):

and it supported as, |, function application, etc. - just a superset of all the things exprs and patterns can have

view this post on Zulip Richard Feldman (Feb 02 2025 at 18:45):

and then I believe there's a way in Zig (haven't looked into specifically how this would look) to customize the discriminant of the tag union, such that we can say both:

view this post on Zulip Richard Feldman (Feb 02 2025 at 18:46):

at that point, we can have all of these nodes begin life as an expr, and then later on when we discover it's a pattern (e.g. we encounter an =) we just flip the bit on the node we've been working on

view this post on Zulip Richard Feldman (Feb 02 2025 at 18:46):

so we don't have to go back and reassemble anything, it just goes from being an ExprOrPattern with the expr bit set to the same node but with the pattern bit set

view this post on Zulip Richard Feldman (Feb 02 2025 at 18:47):

then, later on in canonicalization, we can report errors like "you can't have | in an expr" or "you can't have function application in a pattern"

view this post on Zulip Richard Feldman (Feb 02 2025 at 18:47):

this is separately consistent with our desire to have an error-tolerant parser

view this post on Zulip Richard Feldman (Feb 02 2025 at 18:47):

e.g. just because I have as in an expr where it's not allowed, that shouldn't break the formatter

view this post on Zulip Richard Feldman (Feb 02 2025 at 18:47):

the formatter should still be able to know what to do there, we just need to report an error during canonicalization of "you can't do that here, maybe you want _________ instead?"

view this post on Zulip Richard Feldman (Feb 02 2025 at 18:48):

so this seems like it would simplify a bunch of logic we have now around discovering later that what looked like an expr actually turned out to be a pattern

view this post on Zulip Richard Feldman (Feb 02 2025 at 18:49):

bc we wouldn't have to build a Pattern from an Expr, we just flip a bit

view this post on Zulip Richard Feldman (Feb 02 2025 at 19:10):

anyway, curious what anyone thinks of this!

cc @Joshua Warner @Anthony Bullard

view this post on Zulip Joshua Warner (Feb 02 2025 at 19:26):

Including | for patterns in the same parser as |args| body closures will cause problems

view this post on Zulip Richard Feldman (Feb 02 2025 at 19:38):

hm true

view this post on Zulip Richard Feldman (Feb 02 2025 at 19:39):

:thinking: what if we changed those to or?

view this post on Zulip Richard Feldman (Feb 02 2025 at 19:39):

like Foo or Bar(baz) ->

view this post on Zulip Richard Feldman (Feb 02 2025 at 19:39):

seems very self-descriptive!

view this post on Zulip Joshua Warner (Feb 02 2025 at 19:42):

I'm not sure if we actually need to include | in the expression parser. I believe that's only used for when branches, where we can just call the expr parser with a flag saying to disallow closures, and then handle the | in the when branch code.

view this post on Zulip Joshua Warner (Feb 02 2025 at 19:42):

That said, I would be fine with or

view this post on Zulip Anthony Bullard (Feb 02 2025 at 21:05):

I know you would be against it more than likely Richard, but if the current | were replaced with or, then it could be repurposed to introducing a when branch and then there would be no amibiguity

view this post on Zulip Anthony Bullard (Feb 02 2025 at 21:48):

Do you have thoughts here @Joshua Warner ?

view this post on Zulip Joshua Warner (Feb 02 2025 at 21:49):

That does make when parsing easier.

view this post on Zulip Joshua Warner (Feb 02 2025 at 21:50):

Looks weird to my eyes tho. (Subjective opinion!)

view this post on Zulip Anthony Bullard (Feb 02 2025 at 21:50):

Pretty common in ML, though

view this post on Zulip Joshua Warner (Feb 02 2025 at 21:50):

Guess which language family I’ve spent the least time using :wink:

view this post on Zulip Joshua Warner (Feb 02 2025 at 21:52):

If pipe introduces branches, I’d have that also separate branch patterns

view this post on Zulip Joshua Warner (Feb 02 2025 at 21:52):

I wouldn’t mix pipe and or for that

view this post on Zulip Anthony Bullard (Feb 02 2025 at 21:53):

Yeah, that's what other MLs do

view this post on Zulip Joshua Warner (Feb 02 2025 at 21:53):

I don’t think the pipe ambiguity will be a problem in practice here

view this post on Zulip Anthony Bullard (Feb 02 2025 at 21:53):

But not Haskell (which is actually Miranda derived)

view this post on Zulip Joshua Warner (Feb 02 2025 at 21:54):

Huh; what does that use?

view this post on Zulip Anthony Bullard (Feb 02 2025 at 21:54):

WSS

view this post on Zulip Anthony Bullard (Feb 02 2025 at 21:54):

Or dear old friend

view this post on Zulip Anthony Bullard (Feb 02 2025 at 21:54):

And to my memory (it's been a couple of years) it is NOT forgiving

view this post on Zulip Anthony Bullard (Feb 02 2025 at 21:55):

    case (x, y) of
        (a, b) -> print ("x: " ++ show a ++ ", y: " ++ show b)

But

    case (x, y) of
    (a, b) -> print ("x: " ++ show a ++ ", y: " ++ show b)

WILL NOT parse if I remember correctly

view this post on Zulip Joshua Warner (Feb 02 2025 at 21:57):

Ahhh for some reason I thought you meant it used something else besides '|' to separate patterns

view this post on Zulip Joshua Warner (Feb 02 2025 at 21:58):

Yes that much matches my memory, given my (very limited) haskell experience


Last updated: Jul 06 2025 at 12:14 UTC