Joshua Warner said:
Sam Mohr the reason I'd like not to just directly have a header/statements is I want this to be able to directly parse individual expressions both for testing and for repl evaluation
Okay, then what's the best way to:
@Sam Mohr Why do you need to just parse the header? My intent would be to make parsing fast enough that parsing the whole file is plenty fast.
We'd want to parse just the header of the root main.roc of each app/package/platform to get their dep packages, so we can discover the file trees of each package
But if it's fast enough to just parse the whole thing, then no need
Also, if we persist said list of imported packages in the CanIR, then we can just pull it from the cache and keep the CanIR around until later
Cool. We can explore that optimization in the future. That does add a non-trivial amount of complexity though, so I'd like to do the simple thing for now
Yeah, totally agreed
Ok, I think this test should get you going:
test "canonicalize" {
const source =
\\app [name] { pf: platform "main.roc" }
\\
\\name = "Luke"
;
var parse_ir = parse.parse(std.testing.allocator, source);
var can_ir = IR.init(std.testing.allocator);
parse_ir.store.emptyScratch();
canonicalize(&can_ir, &parse_ir, std.testing.allocator);
}
The primary problem with the test you posted is just that module headers are not currently supported... and the particular way error recovery happens right now is looping forever (whoops!), which made that harder than expected to figure out.
(whoops left some debug stuff in; now removed)
Okay, this is a good start!
If I wanted to start looking at intern'ing for symbols in the tokenizer - is there an existing structure you're expecting that data to go into?
The interners in the ModuleEnv
https://github.com/roc-lang/roc/blob/26f9416929aa0cd52ca732fc533b4a94a690de04/src/base/ModuleEnv.zig#L22
Ideally, we'd put text in the "right" bucket:
- idents in Ident.Store
- tags (the List of List a) in TagName.Store
- field names in FieldName.Store
- string literals in StringLiteral.Store
But if that turns out to be difficult/not possible to do consistently during tokenization, then maybe we reduce down to just Ident.Store
I can split lower and upper idents, that's about it.
I figured
At the layer of hashing, do they need to be in separate buckets?
Not sure what you mean by the layer of hashing
I think we should just make 2 buckets, upper_idents and lower_idents
I'm thinking we can do something like:
That works!
We'd want those IDs to be pretty much as granular as possible, within reason
Can you expand on that? Do you mean for the purpose of types? Or is it beneficial to have the id space be compact?
For the purpose of types
Which should mean: using a single bucket makes the most sense
I'm not actually sure how much benefit there is for having these distinct types compared to just LowerIdent.Idx and UpperIdent.Idx
The TagName and FieldName stuff is an artifact from the specialize_types prototype from @Agus Zubiaga
I think we can just start with a single SmallStringInterner that we have UpperIdent.Idx and LowerIdent.Idx both point into
In Ident.Store:
pub fn insert(self: *Store, ident: Ident, region: Region, problems: *std.ArrayList(Problem)) Idx {
... I'm a little worried about the overhead of alloc'ing an ArrayList of Problem for every ident.
I'm also somewhat concerned about doing the interning in the tokenizer based on this Ident type, since these:
/// Attributes of the identifier such as if it is effectful, ignored, or reassignable.
attributes: Attributes,
... are things we don't really know at this point in compiling and I'm hesitant to create a lie; I think that'll lead to issues down the road.
Furthermore:
/// Problems with the identifier
/// e.g. if it has two underscores in a row
/// or if it starts with a lowercase then it shouldn't be `lowerCamelCase`, it must be `snake_case`
problems: Problems,
These are things that IMO should be reported as diagnostics in the tokenizer, and none of the rest of the compiler should really need to track (or care about at all).
I'm thinking instead I'd like to do interning on a lower-level type that _only_ represents the text, giving a simple intern'd string id (u32) and nothing else.
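For illustration, a minimal sketch of what I mean (all names made up, nothing from the repo):
const std = @import("std");

/// An interned string is identified by a bare u32 - no attributes, no problems.
pub const StrIdx = enum(u32) { _ };

/// Sketch of a text-only interner: insert dedups, get resolves the id back to text.
pub const Interner = struct {
    gpa: std.mem.Allocator,
    strings: std.ArrayList([]const u8),
    map: std.StringHashMap(StrIdx),

    pub fn init(gpa: std.mem.Allocator) Interner {
        return .{
            .gpa = gpa,
            .strings = std.ArrayList([]const u8).init(gpa),
            .map = std.StringHashMap(StrIdx).init(gpa),
        };
    }

    pub fn deinit(self: *Interner) void {
        for (self.strings.items) |s| self.gpa.free(s);
        self.strings.deinit();
        self.map.deinit();
    }

    /// Returns the existing id if this text was seen before.
    pub fn insert(self: *Interner, text: []const u8) !StrIdx {
        if (self.map.get(text)) |idx| return idx;
        const owned = try self.gpa.dupe(u8, text);
        const idx: StrIdx = @enumFromInt(self.strings.items.len);
        try self.strings.append(owned);
        try self.map.put(owned, idx);
        return idx;
    }

    pub fn get(self: *const Interner, idx: StrIdx) []const u8 {
        return self.strings.items[@intFromEnum(idx)];
    }
};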
Thoughts @Sam Mohr / @Luke Boswell ?
I assumed the point of passing in problems like that would be to just have one for the whole pass that gets passed around, and each call may or may not push problems onto it
(as opposed to allocating a new one each time)
Possibly
But given what's currently in Problems, I don't see why the design would warrant that
There is no need to dynamically allocate these. And the ones that are there right now probably ought to be just tokenizer diagnostics.
So I guess, the question is what might be in there in the future that would justify this?
I expect there was some discussion about this that I just was not part of and want to make sure I'm not missing anything
I don't think we've really discussed it in any detail. Sam has just been developing these data types and structures as a best effort based on what we know so far, so we have something to start with. We expect them to evolve a lot as we go.
Richard Feldman said:
I assumed the point of passing in problems like that would be to just have one for the whole pass that gets passed around, and each call may or may not push problems onto it
This
The reason I put the problems as a mutable reference was so that you knew as a caller that your problems on the Ident would be reported on interning
It prevents someone from forgetting to intern the problems
We could alternatively pass the problem reference to the Ident.Store, but that gets us closer to pointer jungle
So this seems less tangled from bird's eye view
Minor note, this may be the wanted type for storing problems, as opposed to slices and counts or growing arraylists: https://ziglang.org/documentation/master/std/#std.BoundedArray
Do you think we're just going with UpperString and LowerString for interners, or are we having all the different variants?
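Usage sketch of std.BoundedArray, with the Problem type stubbed out here:
const std = @import("std");

// Fixed capacity, lives on the stack, and append never heap-allocates.
const Problem = struct { tag: u8 };

test "bounded problems" {
    var problems: std.BoundedArray(Problem, 8) = .{};
    try problems.append(.{ .tag = 1 }); // returns error.Overflow past 8 items
    try std.testing.expectEqual(@as(usize, 1), problems.slice().len);
}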
What's the use case for some later compiler stage knowing that an identifier has a subsequent_underscores problem? That should have already been reported in the tokenizer.
So we can report that to the user as a warning?
I'm not sure why we store a list of problems in the Ident though
What's the use case for some later compiler stage knowing that an identifier has a subsequent_underscores problem?
I think the goal is to collect all problems and then dispatch in one place to decide exactly how everything is reported to the end users.
I thought all the Problems would be stored in the ModuleEnv
I'm guessing this field can be removed, because we store all the problems in ModuleEnv
/// Problems with the identifier
/// e.g. if it has two underscores in a row
/// or if it starts with a lowercase then it shouldn't be `lowerCamelCase`, it must be `snake_case`
problems: Problems,
Makes sense
How is this intended to work?
/// Identifier attributes such as if it is effectful, ignored, or reassignable packed into 3-bits.
pub const Attributes = packed struct(u3) {
    effectful: bool,
    ignored: bool,
    reassignable: bool,
};
Those are things that will be different for different instantiations of the same identifier
e.g. the same name used in as different local variables in two different functions
This is a very good point...
Actually
I guess reassignable would be the _ suffix, effectful would be the ! suffix, and ignored would be the _ prefix?
Maybe that is actually a textual property of the identifier
The tokenizer will actually know those already, so we can just pass those in pre-computed
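e.g. something like this (sketch; attributesFromText is a made-up name, and the Attributes struct is copied from above):
const std = @import("std");

const Attributes = packed struct(u3) {
    effectful: bool,
    ignored: bool,
    reassignable: bool,
};

// Derive the flags straight from the identifier's text during tokenization.
fn attributesFromText(text: []const u8) Attributes {
    return .{
        .effectful = std.mem.endsWith(u8, text, "!"),
        .ignored = std.mem.startsWith(u8, text, "_"),
        .reassignable = std.mem.endsWith(u8, text, "_"),
    };
}

test attributesFromText {
    try std.testing.expect(attributesFromText("launch_the_missiles!").effectful);
    try std.testing.expect(attributesFromText("_unused").ignored);
}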
Easy enough
Joshua Warner said:
Maybe that is actually a textual property of the identifier
This was the plan yes
How do those attributes apply to things like uppercase idents, field names, etc?
They don't, which is part of the reason why they used to be separate
But it seems to be cheap enough to just set the attributes to all false, AKA zero, and leave them in there
However, if those attributes are easier to parse unilaterally, then for non-idents, e.g. ignored field names in record builders, they'll just get ignored
Ok cool
Initial PR for integrating interning in the tokenizer: https://github.com/roc-lang/roc/pull/7624
PR for parsing the new match expression and some more patterns: https://github.com/roc-lang/roc/pull/7626
Make sure to keep an eye on this test, as it is the "everything implemented right now" test:
test "Syntax grab bag" {
try moduleFmtsSame(
\\app [main!] { pf: platform "../basic-cli/platform.roc" }
\\
\\import pf.Stdout
\\
\\add_one_oneline = |num| if num 2 else 5
\\
\\add_one = |num| {
\\ other = 1
\\ if num {
\\ dbg some_func()
\\ 0
\\ } else {
\\ other
\\ }
\\}
\\
\\match_time = |a| {
\\ match a {
\\ Blue -> 47,
\\ Green -> 19,
\\ Red -> 12,
\\ lower -> 1,
\\ [1, 2, 3, .. as rest] -> 123,
\\ 3.14 -> 314,
\\ }
\\}
\\
\\main! = |_| {
\\ world = "World"
\\ number = 123
\\ tag = Blue
\\ tag_with_payload = Ok(number)
\\ interpolated = "Hello, ${world}"
\\ list = [add_one(number), 456, 789]
\\ Stdout.line!(interpolated)?
\\ Stdout.line!("How about ${Num.toStr(number)} as a string?")
\\}
);
}
Let me know if anything looks wrong here.
I'll try to get Tags with payloads and record patterns parsing this afternoon (and probably Tuple patterns)
My plan after that is Type decls and Type annotations
And then the hard stuff - Tuples, Records, and BinOps
If someone needs/wants something done sooner (like other headers), let me know - or feel free to put in a PR yourself and send it to me!
The hold-up with Records is I am going to have to refactor Body/Block to be just another type of expression and not its own distinct type of Node
@Joshua Warner remind me, did we decide to implement a NoSpaceColon token that will be required for (expression) Record Fields?
I think Record Fields in type annotations can be more forgiving
I think the only real alternative would be something more drastic like inline type annotations which has been ruled out in the past by Richard
what if we made match consistent with if and have it just use curly braces instead of a special -> and , thing?
e.g. instead of this:
match_time = |a| {
    match a {
        Blue -> 47,
        Green -> 19,
        Red -> 12,
        lower -> 1,
        [1, 2, 3, .. as rest] -> 123,
        3.14 -> 314,
    }
}
...we do this:
match_time = |a| {
    match a {
        Blue { 47 }
        Green { 19 }
        Red { 12 }
        lower { 1 }
        [1, 2, 3, .. as rest] { 123 }
        3.14 { 314 }
    }
}
seems like it's just as easy to read, it's one less piece of syntax to learn, and if you want a multiline branch you're already all set up with the curly braces
it always annoys me in Rust having to switch back and forth between , and { :stuck_out_tongue:
Hmm... I guess it is just a bit less distinct, but multiline blocks are common anyway
Also, it will lead to fake record syntax being more common
@Anthony Bullard I think we do want NoSpaceColon for now. We can always have the parser accept either token in cases where it’s not ambiguous.
@Anthony Bullard how about we try the "no arrows" style above (as in, we just parse a pattern followed by an expression, and the formatter always chooses to put braces around that expression, just like with if) but if someone puts in -> or => or , we treat them like semicolons (warn, and then the formatter drops them)
That syntax is great!
Richard Feldman said:
it always annoys me in Rust having to switch back and forth between , and { :stuck_out_tongue:
I also hate this
Hmm, I'm a little worried about ambiguities without -> or similar
Blue { foo } could either be Blue -> {foo:foo}, or it could be Blue {foo} (i.e. a tag with a record pattern)... and then we need to keep parsing to find out what the branch body is.
Actually no, it's not the first thing. (brain fart)
It's ambiguous between {foo} being the body and {foo} being a record pattern with the body to come later.
You can keep going with that logic indefinitely
Err actually wait nvm ignore me; that'd be Blue{(foo})
EDIT, nope: Blue({foo})
:man_facepalming:
beautifully cursed brackets {(})
I like that the syntax requires the braces so that you never have to decide if they should be included or not
It's a bit inconsistent with lambdas because it's still optional there
but I think I prefer that it remains optional
To remain optional would just be?
match_time = |a| {
    match a {
        Blue 47
        Green 19
        Red 12
        lower 1
        [1, 2, 3, .. as rest] 123
        3.14 314
    }
}
yeah, that's what I think it should be
but then the formatter adds braces for clarity
it's conceptually simple: you just alternate patterns and expressions, that's it
but not in lambdas
(the formatter doesn't add braces)
right, just in match
match { 1 2 1 2 1 2 }
would technically be valid :stuck_out_tongue:
at least from a parsing perspective
with 1s being patterns and 2s being expressions
that might be nice too for typing speed
yeah that's the symmetry with if, where like if True False else True is valid (edit: was originally if True False True - forgot the else!)
but the formatter adds braces for clarity
if True False True
how is this valid?
so if you want to take the shortcut when typing, you can
No else or else if
oops, fixed!
ah
I kinda hate the lack of grouping in all of this, but I guess the formatter readds it, so maybe ok
in general the theme is "braces are never required, but the formatter may add them"
That said, I'm really not a fan of same line things without grouping
we could soft enforce them with a warning
could, but probably unnecessary
Even
if True
    False
and
if True False
feel like they shouldn't be allowed.
I like the conceptual simplicity though - braces are a special case of expressions
I know
so anywhere you're using them as an expression, you could of course omit them
But it feels like conceptual simplicity being traded for allowing messy code
Extra symbols and splitting things up can definitely help with readability
I hear that, but there's all sorts of messy code you can write if you don't use the formatter :laughing:
and I think we could reasonably add a warning if it seems like people are actually doing it in practice
my assumption is that they wouldn't
idk...a lot of people like being terse way too much
but yeah, probably just a limited few
I like the idea of allowing a lot of common mistakes to parse/run, but still discourage them with a warning, and have the formatter fix them automatically
it's the best of both worlds
As long as the formatter never fails and can format all bad code consistently
So basically I should just remove the need for the arrow, and in the formatter force braces?
And remove the need for commas? @Richard Feldman
I personally think we should keep commas between branches
As it is an unbounded list of things
Lists, Tuples, Record Fields, Function Args, Lambda Args, Exposes items, package entries, they all require , between the "items"
Joshua Warner said:
Anthony Bullard I think we do want NoSpaceColon for now. We can always have the parser accept either token in cases where it’s not ambiguous.
How then are we going to distinguish between a record and a block starting with a type annotation - with backtracking?
Should this be spun out into an ideas thread?
I feel like we should put together some larger examples. My concern is the strangeness budget with removing the arrows which seem to be pretty universal in other languages. Not saying I couldn't get used to it, but just not sure discussion in the middle of a parser thread really meets our usual standard for these things.
The braces on a single line look a little strange to me { 4 } ... I like the braces for multiple lines though.
Anthony Bullard said:
Lists, Tuples, Record Fields, Function Args, Lambda Args, Exposes items, package entries, they all require , between the "items"
yeah, but switch statements don't, and when they're all on different lines, it just feels to me like a chore that doesn't add value
this does make me realize the formatter could add missing commas in lists and records though! :smiley:
I'm not sure if these are formative ideas, or decisions from our BDFN
fair point on starting a thread about -> in match, I'll write one up later
mainly I just saw the example code, thought "oh we definitely don't need commas" and then thought I'd mention that idea while I was at it - but it does deserve its own thread
the formatter being able to add missing commas in lists and records (and tuples!) is just an observation about a convenience we could add
not a syntax change, just error recovery
I think we could do the same here
Allow commas, but not require them
And have the formatter add them
This is what is currently implemented:
test "Syntax grab bag" {
try moduleFmtsSame(
\\app [main!] { pf: platform "../basic-cli/platform.roc" }
\\
\\import pf.Stdout
\\
\\add_one_oneline = |num| if num 2 else 5
\\
\\add_one = |num| {
\\ other = 1
\\ if num {
\\ dbg some_func()
\\ 0
\\ } else {
\\ dbg 123
\\ other
\\ }
\\}
\\
\\match_time = |a| match a {
\\ Blue { 47 },
\\ Green { 19 },
\\ Red { 12 },
\\ lower { 1 },
\\ [1, 2, 3, .. as rest] { 123 },
\\ 3.14 { 314 },
\\}
\\
\\main! = |_| {
\\ world = "World"
\\ number = 123
\\ tag = Blue
\\ tag_with_payload = Ok(number)
\\ interpolated = "Hello, ${world}"
\\ list = [add_one(number), 456, 789]
\\ Stdout.line!(interpolated)?
\\ Stdout.line!("How about ${Num.toStr(number)} as a string?")
\\}
);
}
I can't as a human parse the branches without at least the commas
switch
gets away without them because it has case
keyword before each branch
And usually in C-like languages :
between the "pattern" and the block (and also many times a near-mandatory break;
statements at the end of a case block.)
Sometimes syntax that is unnecessary for machine parsing is very much necessary for human visual parsing
To your point though Richard, I think we at a point where for machine parsing there is no need for commas in ANY collection-like syntactic construct
They are there for the humans :-)
Richard Feldman said:
what if we made match consistent with if and have it just use curly braces instead of a special -> and , thing?
I made a PR for realworld to see what this looks like https://github.com/rtfeldman/roc-realworld/pull/1
It's based on @Anthony Bullard's current implementation of match above. I'm not sure if the commas are optional but I included them.
I really like with zig how adding the comma lets the formatter know to make it multi-line or not... if we can borrow that feature I think that would be nice.
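For anyone who hasn't seen it, that zig fmt behavior looks like this:
fn foo(x: i64, y: i64, z: i64) i64 {
    return x + y + z;
}

// zig fmt keeps a call on one line when there is no trailing comma:
const a = foo(1, 2, 3);

// ...and expands it to one argument per line when there is one:
const b = foo(
    1,
    2,
    3,
);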
I think I missed a couple of matches ... I'll add those (done)
From making this PR.. I would say removing -> and just using braces feels good.
match client.query!(cmd) {
    Ok([row]) { Article.fromRow(row).(Ok) },
    Ok([]) { Err(NotFound) },
    Err(db_err) { Err(InternalErr(db_err.inspect())) },
}
Anthony Bullard said:
And have the formatter add them
we could, I just don't see a benefit :sweat_smile:
if we're already adding braces on every branch, it's super clear where each one begins and ends
to put it another way, I don't see why }, is a better delimiter for every branch than just }
That's fair
Pushed a change to make commas optional, and not included in the formatted output
match_time = |a| match a {
    Blue { 47 }
    Green { 19 }
    Red { 12 }
    lower { 1 }
    [1, 2, 3, .. as rest] { 123 }
    3.14 { 314 }
}
I made #ideas > `match` without `->` to discuss the design question!
Tuple patterns are implemented in my PR, thought I'd get records done also in this little 30 minute stretch - but I hit a weird bug trying to fix an obvious tokenizer bug.
@Joshua Warner Currently when we read a (, we have whether we emit a NoSpaceOpenRound or OpenRound token flipped from how it should be. But when I fixed that, we get an infinite loop. These are impossible to debug in Zig with print debugging because the output with std.debug.print just never gets flushed. Could you take a look at that sometime? Not a blocker or high priority, but should be fixed I think
Here's the currently implemented syntax... focusing solely on match here:
test "Syntax grab bag" {
try statementFmtsSame( // This is a made-up function that doesn't actually exist :-)
\\match_time = |a| match a {
\\ Blue { 47 }
\\ Green { 19 }
\\ Red { 12 }
\\ lower { 1 }
\\ [1, 2, 3, .. as rest] { 123 }
\\ 3.14 { 314 }
\\ (1, 2, 3) { 123 }
\\}
);
}
@Anthony Bullard I repro'd one infinite loop at least, and doing this works for me:
diff --git a/src/check/parse/Parser.zig b/src/check/parse/Parser.zig
index 5ba9d25af4..aec25d5179 100644
--- a/src/check/parse/Parser.zig
+++ b/src/check/parse/Parser.zig
@@ -577,7 +577,7 @@ pub fn parseExpr(self: *Parser) IR.NodeStore.ExprIdx {
     if (expr) |e| {
         var expression = e;
         // Check for an apply...
-        if (self.peek() == .OpenRound) {
+        if (self.peek() == .NoSpaceOpenRound) {
             const scratch_top = self.store.scratch_exprs.items.len;
             self.advance();
             while (self.peek() != .CloseRound) {
diff --git a/src/check/parse/tokenize.zig b/src/check/parse/tokenize.zig
index c831a5ef7a..e1e180aab7 100644
--- a/src/check/parse/tokenize.zig
+++ b/src/check/parse/tokenize.zig
@@ -992,7 +992,7 @@ pub const Tokenizer = struct {
             '(' => {
                 self.cursor.pos += 1;
                 self.stack.append(.Round) catch exitOnOom();
-                self.output.pushTokenNormal(if (sp) .NoSpaceOpenRound else .OpenRound, start, 1);
+                self.output.pushTokenNormal(if (sp) .OpenRound else .NoSpaceOpenRound, start, 1);
             },
             '[' => {
                 self.cursor.pos += 1;
The trick I've taken to doing to debug hanging tests is:
- zig build test
- (wait for it to hang in the test)
- hit ctrl+Z
- pgrep -lf test to find the name of the test binary running: something like 5107 /Users/joshw/src/github.com/roc-lang/roc/.zig-cache/o/05e8b24ea2f7133aef9cb6aa47b421e7/test --listen=-
- From there I can either run that binary myself, attaching lldb to the process, etc.
I assume there must be a better way to do this tho. Maybe @Andrew Kelley knows some tricks?
I think you are right about the debugging process. I guess I need to improve my lldb skills. (Or just set up DAP in my Neovim)
And I'm going to add the above patch to my PR
Current syntax supported in my latest PR:
test "Syntax grab bag" {
try moduleFmtsSame(
\\app [main!] { pf: platform "../basic-cli/platform.roc" }
\\
\\import pf.Stdout
\\
\\add_one_oneline = |num| if num 2 else 5
\\
\\add_one = |num| {
\\ other = 1
\\ if num {
\\ dbg some_func()
\\ 0
\\ } else {
\\ dbg 123
\\ other
\\ }
\\}
\\
\\match_time = |a| match a {
\\ Blue | Green | Red -> {
\\ x = 12
\\ x
\\ }
\\ lower -> 1
\\ "foo" -> 100
\\ "foo" | "bar" -> 200
\\ [1, 2, 3, .. as rest] -> 123
\\ [1, 2 | 5, 3, .. as rest] -> 123
\\ 3.14 -> 314
\\ 3.14 | 6.28 -> 314
\\ (1, 2, 3) -> 123
\\ (1, 2 | 5, 3) -> 123
\\ {foo: 1, bar: 2} -> 12
\\ {foo: 1, bar: 2 | 7} -> 12
\\}
\\
\\main! = |_| {
\\ world = "World"
\\ number = 123
\\ tag = Blue
\\ tag_with_payload = Ok(number)
\\ interpolated = "Hello, ${world}"
\\ list = [add_one(number), 456, 789]
\\ Stdout.line!(interpolated)?
\\ Stdout.line!("How about ${Num.toStr(number)} as a string?")
\\}
);
}
^ UPDATED THE ABOVE
Only thing left in patterns is Tags with payloads
Can we move to => for match? Seemed like @Richard Feldman was pushing for that, but I may be mistaken
Oh yeah! Also
....?
.. as rest might need to be allowable, but I think we were trying to prefer ..rest
Oh
Ok, so as is dead?
No, which is a problem...
And are we 100% on fat arrows for match?
No
I would prefer landing the current diff and iterating on it later :)
Good idea
Yeah
I've already implemented 5 syntaxes for match :rofl:
I thought I might spend a couple hours today tinkering with the Parse IR and see if I can make it generate an SExpr for a snapshot.
We actually probably don't want to support ..<pattern> in record destructs because we already have .. as <ident> and there shouldn't be two ways to do the same thing
Wait .. is followed by a pattern?
That doesn't seem right...
Yeah, I agree
I think we should use ..<ident> in both lists and records
Well, if we support that in records, then we can do { foo: 123, ..rest } and also { foo: 123, .. as rest }
Gotta go eat dinner, I'll try to pop back on and take care of that conflict
Maybe we can put a warning on the second
I would rather just remove the second
There is no advantage to it
Agreed, except for the fact that there's now a specific use of as that doesn't work
I actually haven't even implemented Record Pattern rest
Which is?
as is usually
pub const AsPattern = struct {
    pattern: Pattern,
    ident: Ident,
    region: Region,
};
But it can't be just Pattern anymore
If you can make it work though, go for it
Yeah, I removed this in my PR and made as a part of rest
My PR is rebased and ready for review
Next up for me is Type Annotations and Declarations
Sam Mohr said:
Can we move to => for match? Seemed like Richard Feldman was pushing for that, but I may be mistaken
let's do it for now just to address the parsing ambiguity on Ok(a) if a->b->c
No parsing ambiguity as I haven't even implemented record access / static dispatch, let alone -> (which we should have a name for)
:smile:
But I can put it with my current change, which actually I tried to sneak into the last PR but was too late
(Tags with payload patterns)
@Anthony Bullard you may need to rebase any PRs, we just merged a change that changed a few things across the compiler
I have no outstanding PRs, but thanks I'll rebase now!
Did we land interning?
Yes
Holy hell, 8 commits?
Josh landed a primitive version
Ok, I've rebased....I'm holding on to my seat as I rebase
I'm currently trying to figure out how to get the top-level nodes out of the parse AST
test "example s-expr" {
const source =
\\module []
\\
\\foo = "bar"
;
var arena = std.heap.ArenaAllocator.init(testing.allocator);
defer arena.deinit();
var env = base.ModuleEnv.init(&arena);
var parse_ast = parse(&env, testing.allocator, source);
defer parse_ast.deinit();
var iter = parse_ast.store.nodes.iterIndices();
while (iter.next()) |node| {
std.debug.print("{}", .{parse_ast.store.getStatement(node)});
}
}
Use parse_ast.store.getFile(), and then follow on from there
The file will have the header and statements, and if you read the types it should be easy to figure out what store method to use to get children
I can talk later if you need help
Actually, look at fmt.zig, from formatFile on
We need mental institutionalization, not just help
Your code should be similar in how you walk the AST
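Something roughly like this, I'd guess (sketch continuing the test above; only getFile/getStatement are confirmed - the header/statements field names are assumptions):
const file = parse_ast.store.getFile();
// The File node holds the header plus its top-level statements.
std.debug.print("header: {}\n", .{file.header});
for (file.statements) |stmt_idx| {
    std.debug.print("{}\n", .{parse_ast.store.getStatement(stmt_idx)});
}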
BTW I'm currently reworking string tokenizing to make a little more sense
Sam Mohr said:
We need mental institutionalization, not just help
Some more than others :-)
I have a 4-year-old
*You have a 4-year-old and mental problems
thread 12285665 panic: reached unreachable code
--- :man_facepalming:
I shouldn't be allowed to touch these things
Errr
What's the source?
And your code
And the error?
@Luke Boswell
See your DMs
omg I didn’t take a screenshot of it, but Apple Intelligence summarized the notifications in this channel as “Anthony has mental problems; Luke panicked”
Fair
That’s one way to make me read Zulip
More crazy talk, got it
I think it’s a fair summary
nobody else here knows it, but Agus went HAM on the recent Zed release, it was really impressive!
Big W for Zed
people who joined after him probably assume he's been at Zed for years based on that :laughing:
We are thankful for your efforts! It's been powering my collab with Luke for the last week or two
Can't say if this has improved productivity...
test "Syntax grab bag" {
try moduleFmtsSame(
\\app [main!] { pf: platform "../basic-cli/platform.roc" }
\\
\\import pf.Stdout
\\
\\Map a b : List(a), (a -> b) -> List(b)
\\
\\Foo : (Bar, Baz)
\\
\\Some a : { foo : Ok(a), bar : Something }
\\
\\Maybe a : [Some(a), None]
\\
\\SomeFunc a : Maybe(a), a -> Maybe(a)
\\
\\add_one_oneline = |num| if num 2 else 5
\\
\\add_one : (U64 -> U64)
\\add_one = |num| {
\\ other = 1
\\ if num {
\\ dbg some_func()
\\ 0
\\ } else {
\\ dbg 123
\\ other
\\ }
\\}
\\
\\match_time = |a| match a {
\\ Blue | Green | Red -> {
\\ x = 12
\\ x
\\ }
\\ lower -> 1
\\ "foo" -> 100
\\ "foo" | "bar" -> 200
\\ [1, 2, 3, .. as rest] -> 123
\\ [1, 2 | 5, 3, .. as rest] -> 123
\\ 3.14 -> 314
\\ 3.14 | 6.28 -> 314
\\ (1, 2, 3) -> 123
\\ (1, 2 | 5, 3) -> 123
\\ { foo: 1, bar: 2, ..rest } -> 12
\\ { foo: 1, bar: 2 | 7 } -> 12
\\ Ok(123) -> 123
\\ Ok(Some(dude)) -> dude
\\ TwoArgs("hello", Some("world")) -> 1000
\\}
\\
\\main! : List(String) -> Result({}, _)
\\main! = |_| {
\\ world = "World"
\\ number = 123
\\ tag = Blue
\\ tag_with_payload = Ok(number)
\\ interpolated = "Hello, ${world}"
\\ list = [add_one(number), 456, 789]
\\ Stdout.line!(interpolated)?
\\ Stdout.line!("How about ${Num.toStr(number)} as a string?")
\\}
);
}
Syntax supported with my latest parser+formatter PR
Next up: Record literal expressions
After that: BinOps ( :scared: )
for binops, is Pratt Parsing on your radar already?
you've mentioned it
I'll read up on it
https://matklad.github.io/2020/04/13/simple-but-powerful-pratt-parsing.html
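The core of the technique is tiny. A toy sketch of the loop from that article (this is not the Roc parser, just the idea; it evaluates as it parses to stay small, where a real parser would build AST nodes):
const std = @import("std");

// Each binary operator gets left/right binding powers; parseExpr loops while
// the next operator binds at least as tightly as min_bp. Left-associative
// operators have right > left; swap them to get right-associativity.
const Tok = union(enum) { num: i64, plus, minus, star, slash, eof };
const Bp = struct { left: u8, right: u8 };

fn bp(tok: Tok) ?Bp {
    return switch (tok) {
        .plus, .minus => .{ .left = 1, .right = 2 },
        .star, .slash => .{ .left = 3, .right = 4 },
        else => null,
    };
}

const P = struct {
    toks: []const Tok,
    pos: usize = 0,

    fn peek(self: *P) Tok {
        return if (self.pos < self.toks.len) self.toks[self.pos] else .eof;
    }

    fn parseExpr(self: *P, min_bp: u8) i64 {
        var lhs: i64 = switch (self.peek()) {
            .num => |n| blk: {
                self.pos += 1;
                break :blk n;
            },
            else => 0, // a real parser would push a malformed node here
        };
        while (bp(self.peek())) |b| {
            if (b.left < min_bp) break;
            const op = self.peek();
            self.pos += 1;
            const rhs = self.parseExpr(b.right);
            lhs = switch (op) {
                .plus => lhs + rhs,
                .minus => lhs - rhs,
                .star => lhs * rhs,
                .slash => @divTrunc(lhs, rhs),
                else => unreachable,
            };
        }
        return lhs;
    }
};

test "precedence" {
    var p = P{ .toks = &.{ .{ .num = 1 }, .plus, .{ .num = 2 }, .star, .{ .num = 3 } } };
    try std.testing.expectEqual(@as(i64, 7), p.parseExpr(0)); // 1 + (2 * 3)
}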
Joshua Warner said:
The trick I've taken to doing to debug hanging tests is:
- zig build test
- (wait for it to hang in the test)
- hit ctrl+Z
- pgrep -lf test to find the name of the test binary running: something like 5107 /Users/joshw/src/github.com/roc-lang/roc/.zig-cache/o/05e8b24ea2f7133aef9cb6aa47b421e7/test --listen=-
- From there I can either run that binary myself, attaching lldb to the process, etc.
I assume there must be a better way to do this tho. Maybe Andrew Kelley knows some tricks?
That workflow seems reasonable to me. Is there a way you can imagine it being improved with some zig tooling changes?
Couple thoughts: (1) It would be convenient if the binary name was stable so I don’t need to do the ctrl+z switcheroo. Or perhaps it could print the full path of the currently running binary. (2) it would be useful for the runner to do something like print the test name that’s running (perhaps after some minimum timeout). (3) it could also be useful to be able to run only a single test via passing the name of that test on the command line (or perhaps the name of the file, if the test is anonymous)
(1) you can get if you drop down into zig build-exe CLI with -femit-bin=foobar
(2) you can get with --verbose
(3) is --test-filter with the direct CLI, and you can introduce a -D build option into your build script and set the test filter there
if you combine (1) and (2) you can get what you want I think
one more random tip, always delete the --listen=- arg when running manually, unless you want to speak the Build Runner Protocol
so to be clear you can use --verbose to get the zig build-exe or zig test command, then you can copy paste that, delete --listen=-, add -femit-bin=foobar, and then you have yourself the desired workflow
as for --test-filter, note that the filter is applied very early, so you can use this when doing large refactorings to only test a subset of unit tests even if the rest of your app is not compiling
I'm running into a spot where Zig just doesn't enjoy passing functions. I want to have this helper:
fn parseCollection(self: *Parser, comptime T: type, end_token: Token.Tag, scratch: *std.ArrayList(T), parser: fn (*Parser) T) ExpectError!usize {
    const scratch_top = scratch.items.len;
    while (self.peek() != end_token) {
        scratch.append(parser(self)) catch exitOnOom();
        self.expect(.Comma) catch {
            break;
        };
    }
    self.expect(end_token) catch {
        return ExpectError.expected_not_found;
    };
    return scratch_top;
}
Since that exact kind of code happens at least 19 times in the parser. But if one of the parsers takes an argument, I understand I need to create a new top-level function that can be called with zero args here (calling the real function with a specific value for it's arg). But when you have recursion, usually the function is parametric on that argument, so we need to be able to call the correct parser when we recurse. Like this:
const parser: fn (*Parser) IR.NodeStore.PatternIdx = if (alternatives == .alternatives_allowed) parsePatternWithAlts else parsePatternNoAlts;
But no matter what I do, that gives me the following error:
check/parse/Parser.zig:449:58: error: value with comptime-only type 'fn (*check.parse.Parser) check.parse.IR.NodeStore.PatternIdx' depends on runtime control flow
const parser: fn (*Parser) IR.NodeStore.PatternIdx = if (alternatives == .alternatives_allowed) parsePatternWithAlts else parsePatternNoAlts;
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I guess I don't understand the argument here. The type of parser is always the same, and can be verified correctly at comptime. So why is this not allowed?
Now obviously if we went with #ideas > ✔ `or` instead of `|` in `when` branches (or one of the similar proposals) then this problem goes away because there is no longer contention between pattern alternatives and |...| lambda args.
let's just try that
if we don't like or
for alternatives in practice, we can always reconsider, but it seems like a reasonable design in its own right
If you are OK with it, I will do it
go for it
worst-case scenario is that we find out we don't like it, which would justify the more involved implementation
@Anthony Bullard I wonder if instead of a function pointer, you could do a comptime enum for all the collection variants? Or perhaps you can pass in the desired return type as comptime, and you call ::parse() on that?
The issue is that a function is a comptime only type
You want to use a function pointer if the value can change at runtime
Zig does not have first class functions or lambdas or closures
I think the original code just needs *const fn ...
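A self-contained sketch of that fix (stubbed types, not the real parser): a bare fn (...) T is a comptime-only type in Zig, but a *pointer* to a function is a runtime value, so it can be picked by runtime control flow.
const std = @import("std");

const Parser = struct { depth: u32 };
const PatternIdx = enum(u32) { _ };

fn parsePatternWithAlts(p: *Parser) PatternIdx {
    return @enumFromInt(p.depth);
}

fn parsePatternNoAlts(p: *Parser) PatternIdx {
    return @enumFromInt(p.depth + 1);
}

test "runtime-selected parser" {
    var p = Parser{ .depth = 0 };
    var allow_alts = true;
    _ = &allow_alts; // pretend this is decided at runtime
    const parser: *const fn (*Parser) PatternIdx =
        if (allow_alts) parsePatternWithAlts else parsePatternNoAlts;
    try std.testing.expectEqual(@as(u32, 0), @intFromEnum(parser(&p)));
}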
I think you're gonna have a better time if you always pass functions comptime unless you're doing the vtable pattern
the optimizer will definitely have a better time
also I recognize this is more of a functional/imperative preference thing, but I reworked a contributor's code that used that parser function pointer pattern in zig's parser to simply not do that, and felt like the result was better. yeah it technically is less DRY
anyway the comptime suggestion is irrelevant to that though
Brendan Hansknecht said:
I think the original code just needs
*const fn ...
Interesting .. why did I think that Zig did not support function pointers? I think moving towards or for pattern alternation will make a lot of things clearer in OUR grammar anyway, open up more places where it can be used, and also make this code a lot easier to understand.
yeah like in my mind both are totally reasonable options as an end user, just with some different tradeoffs, and since we've only tried one but have an implementation reason to try out the other, seems like a fine excuse to try it out and see how we like it in practice
#7638 Parsing Records and Tuples
Supported syntax:
test "Syntax grab bag" {
try moduleFmtsSame(
\\app [main!] { pf: platform "../basic-cli/platform.roc" }
\\
\\import pf.Stdout
\\
\\Map a b : List(a), (a -> b) -> List(b)
\\
\\Foo : (Bar, Baz)
\\
\\Some a : { foo : Ok(a), bar : Something }
\\
\\Maybe a : [Some(a), None]
\\
\\SomeFunc a : Maybe(a), a -> Maybe(a)
\\
\\add_one_oneline = |num| if num 2 else 5
\\
\\add_one : (U64 -> U64)
\\add_one = |num| {
\\ other = 1
\\ if num {
\\ dbg some_func()
\\ 0
\\ } else {
\\ dbg 123
\\ other
\\ }
\\}
\\
\\match_time = |a| match a {
\\ Blue | Green | Red -> {
\\ x = 12
\\ x
\\ }
\\ lower -> 1
\\ "foo" -> 100
\\ "foo" | "bar" -> 200
\\ [1, 2, 3, .. as rest] -> 123
\\ [1, 2 | 5, 3, .. as rest] -> 123
\\ 3.14 -> 314
\\ 3.14 | 6.28 -> 314
\\ (1, 2, 3) -> 123
\\ (1, 2 | 5, 3) -> 123
\\ { foo: 1, bar: 2, ..rest } -> 12
\\ { foo: 1, bar: 2 | 7 } -> 12
\\ Ok(123) -> 123
\\ Ok(Some(dude)) -> dude
\\ TwoArgs("hello", Some("world")) -> 1000
\\}
\\
\\main! : List(String) -> Result({}, _)
\\main! = |_| {
\\ world = "World"
\\ number = 123
\\ tag = Blue
\\ tag_with_payload = Ok(number)
\\ interpolated = "Hello, ${world}"
\\ list = [add_one(number), 456, 789]
\\ # New!
\\ record = { foo: 123, bar: "Hello", baz: tag, qux: Ok(world), punned }
\\ # New!
\\ tuple = (123, "World", tag, Ok(world), (nested, tuple), [1, 2, 3])
\\ Stdout.line!(interpolated)?
\\ Stdout.line!("How about ${Num.toStr(number)} as a string?")
\\}
);
}
Next up: BinOps (via Pratt Parsing)
(Note: I will come back to replace | with or, I want to push the supported syntax forward right now, and I think that BinOps are just critical to getting that done)
After BinOps, I think you should be able to parse some simple programs
That unit test looks familiar :)
BinOps have landed (in my PR) - using Pratt Parsing which was soooooo easy (thanks @Richard Feldman ):
test "Syntax grab bag" {
try moduleFmtsSame(
\\app [main!] { pf: platform "../basic-cli/platform.roc" }
\\
\\import pf.Stdout
\\
\\Map a b : List(a), (a -> b) -> List(b)
\\
\\Foo : (Bar, Baz)
\\
\\Some a : { foo : Ok(a), bar : Something }
\\
\\Maybe a : [Some(a), None]
\\
\\SomeFunc a : Maybe(a), a -> Maybe(a)
\\
\\add_one_oneline = |num| if num 2 else 5
\\
\\add_one : (U64 -> U64)
\\add_one = |num| {
\\ other = 1
\\ if num {
\\ dbg some_func()
\\ 0
\\ } else {
\\ dbg 123
\\ other
\\ }
\\}
\\
\\match_time = |a| match a {
\\ Blue | Green | Red -> {
\\ x = 12
\\ x
\\ }
\\ lower -> 1
\\ "foo" -> 100
\\ "foo" | "bar" -> 200
\\ [1, 2, 3, .. as rest] -> 123
\\ [1, 2 | 5, 3, .. as rest] -> 123
\\ 3.14 -> 314
\\ 3.14 | 6.28 -> 314
\\ (1, 2, 3) -> 123
\\ (1, 2 | 5, 3) -> 123
\\ { foo: 1, bar: 2, ..rest } -> 12
\\ { foo: 1, bar: 2 | 7 } -> 12
\\ Ok(123) -> 123
\\ Ok(Some(dude)) -> dude
\\ TwoArgs("hello", Some("world")) -> 1000
\\}
\\
\\expect blah == 1
\\
\\main! : List(String) -> Result({}, _)
\\main! = |_| {
\\ world = "World"
\\ number = 123
\\ expect blah == 1
\\ tag = Blue
\\ return tag
\\ crash "Unreachable!"
\\ tag_with_payload = Ok(number)
\\ interpolated = "Hello, ${world}"
\\ list = [add_one(number), 456, 789]
\\ record = { foo: 123, bar: "Hello", baz: tag, qux: Ok(world), punned }
\\ tuple = (123, "World", tag, Ok(world), (nested, tuple), [1, 2, 3])
\\ bin_op_result = Err(foo) ?? 12 > 5 * 5 or 13 + 2 < 5 and 10 - 1 >= 16 or 12 <= 3 / 5
\\ Stdout.line!(interpolated)?
\\ Stdout.line!("How about ${Num.toStr(number)} as a string?")
\\}
\\
\\expect {
\\ foo = 1
\\ blah = 1
\\ blah == foo
\\}
);
}
This test for the binops may also be helpful for people to correct any precedence or associativity errors I may have made:
test "BinOp omnibus" {
const expr = "Err(foo) ?? 12 > 5 * 5 or 13 + 2 < 5 and 10 - 1 >= 16 or 12 <= 3 / 5";
const expr_sloppy = "Err(foo)??12>5*5 or 13+2<5 and 10-1>=16 or 12<=3/5";
const formatted = "((((Err(foo) ?? 12) > (5 * 5)) or (((13 + 2) < 5) and ((10 - 1) >= 16))) or (12 <= (3 / 5)))";
try exprFmtsSame(expr, .no_debug);
try exprFmtsTo(expr_sloppy, expr, .no_debug);
try exprFmtsTo(expr, formatted, .debug_binop);
try exprFmtsTo(expr_sloppy, formatted, .debug_binop);
}
Note that the ()-laden formatted variable is using a debug flag in the formatter to show the boundaries of the individual operations (until we have full SExpr IR support here)
Anthony, I'm gonna have to break your legs
You're going too fast
Drastic actions :p
Sam Mohr said:
You're going too fast
I’m only doing this an hour (or two) a day! I’ve felt like I’m moving at a glacial pace, so thanks for the pick-me-up
But don’t feel bad - adding expect, crash, and return statements is causing a lot of problems
If there's problems from requiring return to not have statements after, then feel free to make the canonicalize code handle it
I presume that's not the main issue, though
No I haven’t figured it out, hit it right before work so I’ll find out tonight or in the morning
oh yeah canonicalize should definitely handle that imo :big_smile:
Updated the syntax for my PR above to include expect, crash, and return
I think import exposing and the rest of the headers should be next
And then we need to talk about static dispatch and record access
I want to treat it similar to binop, but not being a binop
It's <expr><apply args><try suffix><dot access><binop....> in terms of (not real) binding power
Or maybe try suffix goes last...
Do we want to support some_fn(arg1)?.static_dispatch_method()?.next_static_dispatch_method()?.record_field?
That code looks like it should be valid
Brendan Hansknecht said:
That code looks like it should be valid
And it is so....
test "Syntax grab bag" {
try moduleFmtsSame(
\\app [main!] { pf: platform "../basic-cli/platform.roc" }
\\
\\import pf.Stdout
\\
\\Map a b : List(a), (a -> b) -> List(b)
\\
\\Foo : (Bar, Baz)
\\
\\Some a : { foo : Ok(a), bar : Something }
\\
\\Maybe a : [Some(a), None]
\\
\\SomeFunc a : Maybe(a), a -> Maybe(a)
\\
\\add_one_oneline = |num| if num 2 else 5
\\
\\add_one : (U64 -> U64)
\\add_one = |num| {
\\ other = 1
\\ if num {
\\ dbg some_func()
\\ 0
\\ } else {
\\ dbg 123
\\ other
\\ }
\\}
\\
\\match_time = |a| match a {
\\ Blue | Green | Red -> {
\\ x = 12
\\ x
\\ }
\\ lower -> 1
\\ "foo" -> 100
\\ "foo" | "bar" -> 200
\\ [1, 2, 3, .. as rest] -> 123
\\ [1, 2 | 5, 3, .. as rest] -> 123
\\ 3.14 -> 314
\\ 3.14 | 6.28 -> 314
\\ (1, 2, 3) -> 123
\\ (1, 2 | 5, 3) -> 123
\\ { foo: 1, bar: 2, ..rest } -> 12
\\ { foo: 1, bar: 2 | 7 } -> 12
\\ Ok(123) -> 123
\\ Ok(Some(dude)) -> dude
\\ TwoArgs("hello", Some("world")) -> 1000
\\}
\\
\\expect blah == 1
\\
\\main! : List(String) -> Result({}, _)
\\main! = |_| {
\\ world = "World"
\\ number = 123
\\ expect blah == 1
\\ tag = Blue
\\ return tag
\\ crash "Unreachable!"
\\ tag_with_payload = Ok(number)
\\ interpolated = "Hello, ${world}"
\\ list = [add_one(number), 456, 789]
\\ record = { foo: 123, bar: "Hello", baz: tag, qux: Ok(world), punned }
\\ tuple = (123, "World", tag, Ok(world), (nested, tuple), [1, 2, 3])
\\ bin_op_result = Err(foo) ?? 12 > 5 * 5 or 13 + 2 < 5 and 10 - 1 >= 16 or 12 <= 3 / 5
\\ # NEW!!!
\\ static_dispatch_style = some_fn(arg1)?.static_dispatch_method()?.next_static_dispatch_method()?.record_field?
\\ Stdout.line!(interpolated)?
\\ Stdout.line!("How about ${Num.toStr(number)} as a string?")
\\}
\\
\\expect {
\\ foo = 1
\\ blah = 1
\\ blah == foo
\\}
);
}
I forgot I have unary ops (easy), and record updaters and record builders. What's the current status and agreed-upon syntax for them? The same as in the current alpha?
Yeah I think there's no change to Record Builder
Are we doing something with .. for record update?
See https://github.com/roc-lang/roc/issues/7091
Ah...ok, so it's like List rest
Or its parent issue https://github.com/roc-lang/roc/issues/7106 "Syntax Changes"
Oooo, the ellipsis keyword! Almost forgot about that!
Ok, I'll make sure this all gets done
That'll leave me with Custom Types and the replacement syntax for abilities
Do you have the issue handy for that? Or is there one?
https://github.com/roc-lang/roc/issues/7458
I guess I just look at the static dispatch doc...
And this is the doc for Custom Types...
https://docs.google.com/document/d/10OFeNl9KAYAErajE0Wio4AAR66yM2u13bku0mTUawVk/edit?tab=t.0
If there has been any meaningful changes to this proposal (the thread had no concrete takeaways that were scannable), let me know
I'm wondering if we still need the * in module [User.*, SomethingElse, …etc]. Didn't we discuss somewhere that fields in a Custom Record would always be public?
for nominal types, the only one we're going to support is tag unions (not records or tuples after all)
and they do still need the .* syntax for making the tags optionally public
That should be the only use of * then. Because we're going to remove it from types and use a variable instead. https://github.com/roc-lang/roc/issues/7451
Luke Boswell said:
That should be the only use of * then. Because we're going to remove it from types and use a variable instead. https://github.com/roc-lang/roc/issues/7451
PR for this: https://github.com/roc-lang/roc/pull/7642
Anthony Bullard said:
Luke Boswell said:
That should be the only use of * then. Because we're going to remove it from types and use a variable instead. https://github.com/roc-lang/roc/issues/7451
PR for this: https://github.com/roc-lang/roc/pull/7642
Added the ... expression to the above
Update from the above PR:
test "Syntax grab bag" {
try moduleFmtsSame(
\\app [main!] { pf: platform "../basic-cli/platform.roc" }
\\
\\import pf.Stdout
\\
\\Map a b : List(a), (a -> b) -> List(b)
\\
\\Foo : (Bar, Baz)
\\
\\Some a : { foo : Ok(a), bar : Something }
\\
\\Maybe a : [Some(a), None]
\\
\\SomeFunc a : Maybe(a), a -> Maybe(a)
\\
\\add_one_oneline = |num| if num 2 else 5
\\
\\add_one : (U64 -> U64)
\\add_one = |num| {
\\ other = 1
\\ if num {
\\ dbg some_func()
\\ 0
\\ } else {
\\ dbg 123
\\ other
\\ }
\\}
\\
\\match_time = |a| match a {
\\ Blue | Green | Red -> {
\\ x = 12
\\ x
\\ }
\\ lower -> 1
\\ "foo" -> 100
\\ "foo" | "bar" -> 200
\\ [1, 2, 3, .. as rest] -> 123
\\ [1, 2 | 5, 3, .. as rest] -> 123
\\ 3.14 -> 314
\\ 3.14 | 6.28 -> 314
\\ (1, 2, 3) -> 123
\\ (1, 2 | 5, 3) -> 123
\\ { foo: 1, bar: 2, ..rest } -> 12
\\ { foo: 1, bar: 2 | 7 } -> 12
\\ Ok(123) -> 123
\\ Ok(Some(dude)) -> dude
\\ TwoArgs("hello", Some("world")) -> 1000
\\}
\\
\\expect blah == 1
\\
\\main! : List(String) -> Result({}, _)
\\main! = |_| {
\\ world = "World"
\\ number = 123
\\ expect blah == 1
\\ tag = Blue
\\ return tag
\\ # NEW!!!
\\ ...
\\ match_time(...)
\\ crash "Unreachable!"
\\ tag_with_payload = Ok(number)
\\ interpolated = "Hello, ${world}"
\\ list = [add_one(number), 456, 789]
\\ record = { foo: 123, bar: "Hello", baz: tag, qux: Ok(world), punned }
\\ tuple = (123, "World", tag, Ok(world), (nested, tuple), [1, 2, 3])
\\ bin_op_result = Err(foo) ?? 12 > 5 * 5 or 13 + 2 < 5 and 10 - 1 >= 16 or 12 <= 3 / 5
\\ static_dispatch_style = some_fn(arg1)?.static_dispatch_method()?.next_static_dispatch_method()?.record_field?
\\ Stdout.line!(interpolated)?
\\ Stdout.line!("How about ${Num.toStr(number)} as a string?")
\\}
\\
\\expect {
\\ foo = 1
\\ blah = 1
\\ blah == foo
\\}
);
}
Ok... so what should be happening with malformed nodes?
For example say I have a roc file that is literally
modZZZ [foo]
foo = 1
I have a malformed header here and so expect there to be an error pushed into the list of Diagnostics and a malformed node added to the AST.
Now if I want to format this AST ... what should happen when I get to this point?
pub fn getHeader(store: *NodeStore, header: HeaderIdx) Header {
    const node = store.nodes.get(@enumFromInt(header.id));
    switch (node.tag) {
        .app_header => { ... },
        .module_header => { ... },
        else => {
            std.debug.panic("Expected a valid header tag, got {s}", .{@tagName(node.tag)});
        },
    }
}
We call getHeader and then it blows up because we don't have a malformed node.
Maybe we might actually want to add a malformed node here... so I gave that a try. Here's a PR that does that for just the header https://github.com/roc-lang/roc/pull/7672
Am I on the right track?
Side note -- I think I found a memory leak in the error diagnostics here by accident.
$ zig build snapshot
warning: file /Users/luke/Documents/GitHub/roc/src/snapshots/003.txt: contained 1 errors, skipping
info: processed 2 snapshots in 2 ms.
error(gpa): memory address 0x102928700 leaked:
/Users/luke/zig-macos-aarch64-0.13.0/lib/std/array_list.zig:1081:62: 0x102873d1f in ensureTotalCapacityPrecise (snapshot)
const new_memory = try allocator.alignedAlloc(T, alignment, new_capacity);
^
/Users/luke/zig-macos-aarch64-0.13.0/lib/std/array_list.zig:1058:51: 0x1028628e3 in ensureTotalCapacity (snapshot)
return self.ensureTotalCapacityPrecise(allocator, better_capacity);
^
/Users/luke/zig-macos-aarch64-0.13.0/lib/std/array_list.zig:1111:41: 0x102847daf in addOne (snapshot)
try self.ensureTotalCapacity(allocator, newlen);
^
/Users/luke/zig-macos-aarch64-0.13.0/lib/std/array_list.zig:848:49: 0x10281b687 in append (snapshot)
const new_item_ptr = try self.addOne(allocator);
^
/Users/luke/Documents/GitHub/roc/src/check/parse/Parser.zig:121:28: 0x1028462ef in pushMalformed__anon_11980 (snapshot)
self.diagnostics.append(self.gpa, .{
^
/Users/luke/Documents/GitHub/roc/src/check/parse/Parser.zig:943:38: 0x10281a103 in parseExprWithBp (snapshot)
return self.pushMalformed(IR.NodeStore.ExprIdx, .unexpected_token);
^
This is a great question for @Joshua Warner since he designed the malformed node system. Ideally it would just be something aggressive like (MALFORMED <tag>)
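i.e. something shaped like this, maybe (purely a sketch of the idea with stubbed payload types, not the PR's actual diff):
const Region = struct { start: u32, end: u32 };
const AppHeaderIdx = enum(u32) { _ };
const ModuleHeaderIdx = enum(u32) { _ };

/// Header gains a malformed case, so getHeader can return it instead of
/// panicking; the parser already pushed a diagnostic when it created the
/// node, and the formatter renders it as (MALFORMED <tag>).
pub const Header = union(enum) {
    app_header: AppHeaderIdx,
    module_header: ModuleHeaderIdx,
    malformed: Region,
};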
Need some alignment on syntax supported in exposes/exposing lists (both in headers and in imports): as (?), and either * or curlies around a comma separated list of lower and upper idents

import SomeModule exposing [
    lowerIdent!,
    UpperIdent,
    something as something_else, # ???
    Foo.*,
    Bar.{ fn_one, fn_two },
    Baz.{ fn_one as function_one, fn_two as function_two }, #???
]
And are there differences worth worrying about between headers and imports?
(At least for parsing)
@Richard Feldman since you are the keeper of the syntax :-)
Those last two don't exist anymore, right?
also .* should not be allowed in imports
only in the specific case of a module header that's exposing a nominal tag union
You're thinking of glob exposes for custom unions, Anthony
Probably. I think I forgot custom types became a tag union rather than a record
So
import SomeModule exposing [
    lowerIdent!,
    UpperIdent,
    something as something_else,
]
And
module exposing [
    lowerIdent!,
    UpperIdent,
    something as something_else,
    Foo.*,
]
I think I'll parse them the same and let Can throw an error on .* in an import
That all looks right to me, except for module exposing [a as b], not sure if that's planned for support
Anthony Bullard said:
This is a great question for Joshua Warner since he designed the malformed node system. Ideally it would just be something aggressive like (MALFORMED <tag>)
Yep this is exactly what I was thinking
Sam Mohr said:
That all looks right to me, except for module exposing [a as b], not sure if that's planned for support
If it's not, it should :-)
Renaming on expose? Why?
Feels like that is a sign the function was simply named wrong
It's not uncommon to encounter a situation where more than one import exposes a function/type that has the same name that you may want to use without namespacing
But now that I think of it, maybe that mostly disappears with SD
I think of multiple types having a map function for instance
Ok, so the only different thing available here is the .* syntax in headers
It would make SD more complicated than "the module needs a function named xyz" to allow renaming two things to the same name in the same module
Man that seems like such a silly thing to have to create a new node type for
And besides that, there's no value to doing it there besides at the module name
So I vote actively not adding that feature
My specific comment is rename on exposing. Rename on import makes sense to me
Anthony Bullard said:
Man that seems like such a silly thing to have to create a new node type for
copium
Brendan Hansknecht said:
My specific comment is rename on exposing. Rename on import makes sense to me
I agree with you on the header
Sam Mohr said:
Anthony Bullard said:
Man that seems like such a silly thing to have to create a new node type for
copium
:stuck_out_tongue_closed_eyes:
the benefit is to avoid shadowing
let's say I have a module named Parser and I want to expose Parser.str, but I also want to use the name str all over the place in Parser.roc
as in the module header's exposing means I can locally name it inner_str (or whatever) and then expose inner_str as str
so everyone outside the module gets the nice name of Parser.str, but inside Parser.roc I still get to use the name str for argument names etc.
I guess
I feel like that is rare
And if you really need it, you can make a Parser.roc that reexports things from InnerParser.roc
No strong feels, it just feels unnecessary.
I feel like, reading @Richard Feldman 's message, that I should implement as in exposing
I think I can parse a crash at the top level.. but not in a def.
This blows up because it makes that a malformed node
module []

thing =
    crash "something"
But this is fine
module []
crash "something"
Actually I think it's the same for expect and return too
That shouldn't break, yes, but I think in this case it's looking for braces
Which I think is good, because it keeps statement-only behaviors from working as expressions
Though a proper error message would be nice
I thought we decided those wouldn't have braces -- edit: err, parens (?
Interesting... so this parses correctly
module []
foo = { crash "something" }
But the formatter removes the braces {, and then it blows up
Richard Feldman said:
here is a concrete proposal that I'm happy with. If anyone would rather we didn't do this, please say why!
- Change the way ?? return "" formats to have the formatter add braces so it becomes ?? { return "" }, and same with crash. No changes to the semantics of anything involved in this; it's purely to address the concern over it being not visually obvious enough what the code does.
- Introduce if Ok(path) = paths.first() { pattern matching. We explored a bunch of alternatives and this still feels like the best solution to the problem at the top of the thread.
that's it, no other changes. Both of these are addressing specific ergonomics concerns with the status quo, and are not trying to go back to the drawing board and reconsider everything.
Okay, I was changing remembered history in bias of my preferences
Classic
Luke Boswell said:
Interesting... so this parses correctly
module []
foo = { crash "something" }
But the formatter removes the braces {, and then it blows up
This is my fault. Kind of. Crash is a statement but not an expression right now, and statements are only parsed in the top level and inside of blocks
But the formatter removes the curlies from single-statement blocks - assuming they are just an expression
The only way to resolve this in the parser would be to look up what kind of statement is in the single-statement block to ensure it is an Expr
But I can't push anything today due to a day-long outage by Comcast in near-west Chicago
It's alg. I'm just noting things as I find them.
I'm often not sure if it's me, the parser impl, or what the intended syntax is.
It's not urgent or anything.
I've just been looping with the fuzzer and fixing bugs one by one in https://github.com/roc-lang/roc/pull/7672
I'm not 100% my approach is right so I'm just keeping it in draft until I get some feedback or we have a clear direction for handling errors.
I'm just picking the first fuzz crash, fixing that and making a snapshot test for it, then moving on to the next.
I'm expecting there will be changes required and happy to do that later. But I figure it doesn't hurt to just keep going for now.
I just noticed we have two different Region types. One has indexes into the token buffer, so token indexes. The other has indexes into the source code bytes.
Should we rename one of these, e.g. TRegion or something, to reflect that it spans the tokens instead?
The other idea I have is to move the Region that indexes tokens under Token, so it's a Token.Region instead
It would be nice to unify them, yes. I think the other suggestion from @Joshua Warner was to make Region be a Node.Idx. That would allow us to use half the memory (we only need one u32 value instead of a start and end that are both u32), though some diagnostics might need to store more Regions when they refer to complex regions.
At the very least, I agree that renaming one of the two Regions in our codebase would mitigate some confusion.
I think the other suggestion from @Joshua Warner was to make Region be a Node.Idx.
Does this work in general? Won't a region in the parser be a range of tokens rather than a single token?
Like the region of an expression will include many many tokens
Yes, it would hold many tokens
The tradeoff is that we can store everything we need a Region to do, which is display/highlight code for diagnostics, in 32 bits
But at the cost of needing the parse AST to figure out what a Region refers to
If there is a tradeoff here, I lean towards keeping it brain dead simple for now... until we have a working compiler
Agreed
Though they both seem simple to me
Option 1: two Region types. Region = { start: u32, len: u32 } refers to a slice of the source code; TokenRegion is separate for the tokenizer
Okay, looking now at the Parse.IR.Region type, they can definitely be the same thing
/// The first and last token consumed by a Node
pub const Region = struct {
    start: TokenIdx,
    end: TokenIdx,
};
I think we should either use Region = Parse.IR.Region or Region = Node.Idx
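Roughly these two shapes, for reference (sketch, names assumed):
// Two distinct region types, so byte offsets and token indexes can't be
// mixed up by accident:
pub const Region = struct { start: u32, len: u32 }; // byte span in the source
pub const TokenRegion = struct { start: u32, end: u32 }; // token-index span

// Or a single u32 node index; the byte region is recovered later by
// re-tokenizing/re-parsing the (unchanged) file and walking to this node:
pub const NodeIdx = enum(u32) { _ };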
Both of them assume that tokenizing/parsing has happened, but unless we either:
Even the Region = { start: u32, len: u32 } solution is fragile to source-code changes
If we get diagnostics, we could rip out the relevant text before closing the file. Nvm, late stages might have diagnostics... was just thinking about tokenize and parse diagnostics
What about diagnostics (which I'm using interchangeably with errors/warnings) for type errors?
yep
I think we just reopen the file and if it changed, oh well
Yeah, not worth making our compiler more memory-bloated to handle this
Also, keeping original source ranges enables use to just drop the tokenized buffer. So I lean towards that.
The plan was to just tokenize/parse a second time
If we do that, then we can use a single u32
The start offset to tokenize/parse from?
The Parse.IR.Store.Node.Idx
Oh, and just assume that reparsing will generate the exact same node. Then from the node get the start and end token and from those get the region start and end bytes
Yep!
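To make that concrete, here's a minimal self-contained sketch (made-up types, not the real compiler's) of the Region = Node.Idx idea: store a single u32, then map node -> tokens -> source bytes after reparsing.
const TokenIdx = u32;

const Node = struct {
    first_token: TokenIdx, // first token consumed by this node
    last_token: TokenIdx, // last token consumed by this node
};

const Ast = struct {
    nodes: []const Node,
    token_starts: []const u32, // byte offset where each token begins
    token_ends: []const u32, // byte offset just past each token

    // Assumes reparsing produced exactly the same nodes as the first
    // parse, so a stored node index still points at the same node.
    fn regionBytes(self: Ast, node_idx: u32) struct { start: u32, end: u32 } {
        const node = self.nodes[node_idx];
        return .{
            .start = self.token_starts[node.first_token],
            .end = self.token_ends[node.last_token],
        };
    }
};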
I don't believe that would be much harder than the current approach, and neither solution is implemented yet since error rendering is entirely untouched
Just trying to think about the implications of llvm debug info generation (and interpreter single stepping and printing surrounding context).
The interpreter would probably want to keep a parse AST for every file, or even the source of the file at time of interpretation
For LLVM debug info, it means we will reparse every single file (but this reduces the max memory requirement of the compiler, and LLVM is really just for optimized builds, so reparsing should be really cheap).
I was suggesting to @Joshua Warner that we used to parse just the header of files when looking for package dependencies, and he said "I'm hoping parsing is so fast that we don't need to worry about partial parsing"
Sam Mohr said:
The interpreter would probably want to keep a parse AST for every file, or even the source of the file at time of interpretation
Yeah, quite possibly. Though we don't want to ruin memory use due to a single gigantic file, so we might want to reload only if needed. Cause execution flow doesn't need any line info, only repl and debug flows.
I think parsing can be super fast
yeah, all sounds reasonable
I'll be really curious to bench parsing against the rust compiler
Benchmarking of just parsing and also eventually typechecking as well would be awesome to see side-by-side!
Yeah, the full roc check flow once it is working
since error rendering is entirely untouched
That's not entirely true anymore
https://github.com/roc-lang/roc/blob/13c5152cb736d7d46a599fe4624e4ddd58d8a1a5/src/problem.zig#L82
Good to know!
Holy cow
Don't have enough time right now to read all of this, but I have a new PR that gets us down the path towards multiline and commented code formatting: https://github.com/roc-lang/roc/pull/7695
Here's a preview of the supported syntax:
try moduleFmtsSame(
\\app [main!] { pf: platform "../basic-cli/platform.roc" }
\\
\\... # Elided for clarity and terseness - I love that this is parsed as valid code :-)
\\
\\main! : List(String) -> Result({}, _)
\\main! = |_| {
\\ world = "World"
\\ number = 123
\\ expect blah == 1
\\ tag = Blue
\\ return tag
\\ ...
\\ match_time(...)
\\ crash "Unreachable!"
\\ tag_with_payload = Ok(number)
\\ interpolated = "Hello, ${world}"
\\ list = [
\\ add_one(number), # Comment one
\\ 456, # Comment two
\\ 789, # Comment three
\\ ]
\\ record = { foo: 123, bar: "Hello", baz: tag, qux: Ok(world), punned }
\\ tuple = (123, "World", tag, Ok(world), (nested, tuple), [1, 2, 3])
\\ multiline_tuple = (
\\ 123,
\\ "World",
\\ tag1,
\\ Ok(world), # This one has a comment
\\ (nested, tuple),
\\ [1, 2, 3],
\\ )
\\ bin_op_result = Err(foo) ?? 12 > 5 * 5 or 13 + 2 < 5 and 10 - 1 >= 16 or 12 <= 3 / 5
\\ static_dispatch_style = some_fn(arg1)?.static_dispatch_method()?.next_static_dispatch_method()?.record_field?
\\ Stdout.line!(interpolated)?
\\ Stdout.line!("How about ${Num.toStr(number)} as a string?")
\\}
\\
\\expect {
\\ foo = 1 # This should work too
\\ blah = 1
\\ blah == foo
\\}
);
From here it'll be pretty mechanical
I need to add a function to get comments at the start of a multiline construct, and do some massaging of the comment to collapse excessive newlines
And of course adopt my new formatCollection function in about 12 places, adjusting some bad Region calculations along the way
The hardest part will definitely be the headers more than likely
And at the end I'll try to incorporate "trailing comma multiline forcing"
Sam Mohr said:
Benchmarking of just parsing and also eventually typechecking as well would be awesome to see side-by-side!
I'm also interested in this (obviously), but it won't be quite a fair fight since the grammar is in many ways much easier to parse now without WSS
And also, we are tokenizing before parsing now
I don't think it needs to be fair. Part of the rewrite of the compiler was improving the grammar
It is just important to remember that it is a multifaceted comparison, not just rust vs zig or DOD vs not, etc.
Cause at the end of the day, users will care about the compiler perf and won't particularly care about any of those other details.
These are only super preliminary numbers... I'm really hoping I didn't mess anything up. I don't think I did, but the numbers feel too good to be true:
M1 Mac
X86 Linux
Note, the old compiler does have a big disadvantage that it is way larger and loads slower. That said, this is parsing and formatting 100 files with ~1000 lines (24kb each), so I don't think load time should be a big factor.
more stats via poop
Aside: despite using 99% less memory to parse a file, I am still stunned how much memory we use. In an allocation-tracking profiler, I see ~10x memory usage compared to source file size... Actually, that makes a lot of sense. All tokens take up 12 bytes, which is ~10x more than the single byte a token takes up in the original source file.
I guess this is why we may not want to store the tag or length of tokens. Saves a ton of memory, and retokenizing is theoretically super fast.
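For illustration, a hypothetical token layout along those lines (not the actual compiler's structs):
const std = @import("std");

// A 1-byte tag plus two u32s pads out to 12 bytes - roughly 10x the
// ~1 byte an average token occupies in the source text.
const Token = struct {
    tag: Tag, // 1 byte, padded by u32 alignment
    offset: u32, // byte offset into the source
    length: u32, // byte length of the token text
    pub const Tag = enum(u8) { lower_ident, upper_ident, int, string };
};

// The slimmer alternative: keep only the offset, and re-derive the tag
// and length by retokenizing from that offset on demand.
const SlimToken = struct { offset: u32 };

test "token sizes" {
    try std.testing.expectEqual(@as(usize, 12), @sizeOf(Token));
    try std.testing.expectEqual(@as(usize, 4), @sizeOf(SlimToken));
}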
Also cc @Andrew Kelley cause I know you were interested in what perf numbers we get with the new compiler. As said, very preliminary numbers, but the parse-to-format loop looks to be 5 to 10x faster with 500x less memory usage.
Ah, I think I found some of the discrepancy. The old compiler has extra checks to ensure that formatting is stable and not bugged. I need to disable those and see what cost savings it gets.
Ok, disabled all the extra reformatting and validation logic. So both codebases are just doing their parse to format loop. So should be more apples to apples now. We are still 3 to 5x faster with the new compiler (and 200x less memory).
M1 Mac
X86 Linux
And detailed stats with poop still look great (this is with hot cache unlike the commands above).
stats
And we haven't even looked at SIMD yet!
Also, about half of execution time is spent in tokenizing, quarter in parsing, and quarter in formatting.
wowwww, this is great!
I'm personally excited that half is tokenizing - means the simd style might actually have a benefit beyond being fun to try implementing :grinning_face_with_smiling_eyes:
RIIZ :high_voltage:
are we missing any parsing features that might make it apples-to-oranges?
(aside from deprecated things not needing to be parsed of course)
or is parsing feature-complete at this point?
We definitely aren't at parity in terms of edge cases, error handling, and comments, but the syntax grab bag looked featureful enough that it seemed worth testing:
https://github.com/roc-lang/roc/blob/327647a6161b96d06f6524ee393ab675c2fc1335/src/fmt.zig#L938
Extra note: with a tiny bit of allocation tuning (initCapacity instead of starting empty), we can get another ~9% faster. Avoiding allocating and copying during tokenization, by allocating one large array up front, can make tokenization 1.5x faster.
That said, this is with overallocating to ensure space for all tokens... might not be worth it memory-wise. It really depends on the average number of characters per token as to what the default should be. And that depends on how many comments someone has, among other things like variable name length.
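A minimal sketch of that tuning (the bytes-per-token ratio here is a made-up placeholder, not a measured value):
const std = @import("std");

const Token = struct { tag: u8, offset: u32, length: u32 };

// Preallocate the token buffer from the source size so the tokenizer
// never reallocates or copies mid-loop; the cost is overallocation.
fn makeTokenBuffer(gpa: std.mem.Allocator, source_len: usize) !std.ArrayList(Token) {
    // Assumption: ~3 source bytes per token on average.
    const estimated_tokens = source_len / 3 + 1;
    return std.ArrayList(Token).initCapacity(gpa, estimated_tokens);
}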
I didn't realize this thing I'm writing is THAT fast
I haven't even optimized for performance besides the memory-locality stuff
I would love to see us do SIMD tokenization
Let's hope we can get similar speed-ups in the more compute heavy parts of the compiler!
You can really see the advantages of the SoA architecture in the HUGE delta in cache misses. That's three orders of magnitude fewer!
Yeah, simply using less memory and allocating less does a metric ton.
The old compiler does ~30x more allocations (presumably of many tiny IR nodes). I'm actually a bit surprised it is so bad in the old compiler. I thought we put all that stuff in arenas, but apparently tons of stuff is not in the arenas. Then as you mention, DOD + SOA leads to way less memory usage in general which equates to way more cache hits.
And yeah, we have room for many optimizations.
In the old compiler, Defs has a bunch of Vecs that are not in the arenas
In fact, I'm pretty confident that was a memory leak
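For reference, the arena pattern being discussed, as a generic Zig sketch (not code from either compiler): many tiny IR-node allocations become pointer bumps into one block that is freed all at once.
const std = @import("std");

pub fn main() !void {
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit(); // frees every node in one shot

    const alloc = arena.allocator();
    const IrNode = struct { tag: u8, data: u32 };

    // Thousands of tiny allocations, each just a bump of the arena pointer.
    var i: usize = 0;
    while (i < 10_000) : (i += 1) {
        _ = try alloc.create(IrNode);
    }
}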
The new parser currently expects this syntax:
Map a b : List(a), (a -> b) -> List(b)
... but with PNC I would expect that to be:
Map(a, b) : List(a), (a -> b) -> List(b)
... no?
Brendan Hansknecht said:
Also cc Andrew Kelley cause I know you were interested in what perf numbers we get with the new compiler. As said, very preliminary numbers, but the parse-to-format loop looks to be 5 to 10x faster with 500x less memory usage.
very cool, thanks for sharing!
FWIW, part of the difference there is we're actually parsing a different (simpler) language now
I like to think that there's a relationship between syntax being simpler for the computer and also being simpler for the human to parse
Joshua Warner said:
FWIW, part of the difference there is we're actually parsing a different (simpler) language now
Yeah, it is a whole mix of things. I would guess the biggest gain is in terms of memory usage and allocations. The sheer amount of data being generated and processed by the old compiler almost certainly is the bottleneck. Even if we changed the grammar, I would expect the allocations, data movement, and cache misses to dominate.
After #7695 lands, this shows all of the syntax that will be parsed and formatted correctly (in this exact style):
try moduleFmtsSame(
\\# This is a module comment!
\\app [main!] { pf: platform "../basic-cli/platform.roc" }
\\
\\import pf.Stdout exposing [line!, write!]
\\
\\import # Comment after import keyword
\\ pf # Comment after qualifier
\\ .StdoutMultiline # Comment after ident
\\ exposing [ # Comment after exposing open
\\ line!, # Comment after exposed item
\\ write!, # Another after exposed item
\\ ] # Comment after exposing close
\\
\\import pkg.Something exposing [func as function, Type as ValueCategory, Custom.*]
\\
\\import BadName as GoodName
\\import
\\ BadNameMultiline
\\ as
\\ GoodNameMultiline
\\
\\Map a b : List(a), (a -> b) -> List(b)
\\MapML # Comment here
\\ a # And here
\\ b # And after the last arg
\\ : # And after the colon
\\ List( # Inside Tag args
\\ a, # After tag arg
\\ ),
\\ (a -> b) -> # After arrow
\\ List( # Inside tag args
\\ b,
\\ ) # And after the type decl
\\
\\Foo : (Bar, Baz)
\\
\\FooMultiline : ( # Comment after pattern tuple open
\\ Bar, # Comment after pattern tuple item
\\ Baz, # Another after pattern tuple item
\\) # Comment after pattern tuple close
\\
\\Some a : { foo : Ok(a), bar : Something }
\\SomeMl a : { # After record open
\\ foo : Ok(a), # After field
\\ bar : Something, # After last field
\\}
\\
\\SomeMultiline a : { # Comment after pattern record open
\\ foo # After field name
\\ : # Before field anno
\\ Ok(a), # Comment after pattern record field
\\ bar : Something, # Another after pattern record field
\\} # Comment after pattern record close
\\
\\Maybe a : [Some(a), None]
\\
\\MaybeMultiline a : [ # Comment after tag union open
\\ Some(a), # Comment after tag union member
\\ None, # Another after tag union member
\\] # Comment after tag union close
\\
\\SomeFunc a : Maybe(a), a -> Maybe(a)
\\
\\add_one_oneline = |num| if num 2 else 5
\\
\\add_one : (U64 -> U64)
\\add_one = |num| {
\\ other = 1
\\ if num {
\\ dbg # After debug
\\ some_func() # After debug expr
\\ 0
\\ } else {
\\ dbg 123
\\ other
\\ }
\\}
\\
\\match_time = |
\\ a, # After arg
\\ b,
\\| # After args
\\ match a {
\\ Blue | Green | Red -> {
\\ x = 12
\\ x
\\ }
\\ Blue # After pattern in alt
\\ | # Before pattern in alt
\\ Green
\\ | Red # After alt pattern
\\ -> {
\\ x = 12
\\ x
\\ }
\\ lower # After pattern comment
\\ -> 1
\\ "foo" -> # After arrow comment
\\ 100
\\ "foo" | "bar" -> 200
\\ [1, 2, 3, .. as rest] # After pattern comment
\\ -> # After arrow comment
\\ 123 # After branch comment
\\
\\ # Just a random comment
\\
\\ [1, 2 | 5, 3, .. as rest] -> 123
\\ [
\\ 1,
\\ 2 | 5,
\\ 3,
\\ .. # After DoubleDot
\\ as # Before alias
\\ rest, # After last pattern in list
\\ ] -> 123
\\ 3.14 -> 314
\\ 3.14 | 6.28 -> 314
\\ (1, 2, 3) -> 123
\\ (1, 2 | 5, 3) -> 123
\\ { foo: 1, bar: 2, ..rest } -> 12
\\ { # After pattern record open
\\ foo # After pattern record field name
\\ : # Before pattern record field value
\\ 1, # After pattern record field
\\ bar: 2,
\\ .. # After spread operator
\\ rest, # After last field
\\ } -> 12
\\ { foo: 1, bar: 2 | 7 } -> 12
\\ {
\\ foo: 1,
\\ bar: 2 | 7, # After last record field
\\ } -> 12
\\ Ok(123) -> 123
\\ Ok(Some(dude)) -> dude
\\ TwoArgs("hello", Some("world")) -> 1000
\\ }
\\
\\expect # Comment after expect keyword
\\ blah == 1 # Comment after expect statement
\\
\\main! : List(String) -> Result({}, _)
\\main! = |_| { # Yeah I can leave a comment here
\\ world = "World"
\\ number = 123
\\ expect blah == 1
\\ tag = Blue
\\ return # Comment after return keyword
\\ tag # Comment after return statement
\\
\\ # Just a random comment!
\\
\\ ...
\\ match_time(
\\ ..., # Single args with comment
\\ )
\\ some_func(
\\ dbg # After debug
\\ 42, # After debug expr
\\ )
\\ crash # Comment after crash keyword
\\ "Unreachable!" # Comment after crash statement
\\ tag_with_payload = Ok(number)
\\ interpolated = "Hello, ${world}"
\\ list = [
\\ add_one(
\\ dbg # After dbg in list
\\ number, # after dbg expr as arg
\\ ), # Comment one
\\ 456, # Comment two
\\ 789, # Comment three
\\ ]
\\ record = { foo: 123, bar: "Hello", baz: tag, qux: Ok(world), punned }
\\ tuple = (123, "World", tag, Ok(world), (nested, tuple), [1, 2, 3])
\\ multiline_tuple = (
\\ 123,
\\ "World",
\\ tag1,
\\ Ok(world), # This one has a comment
\\ (nested, tuple),
\\ [1, 2, 3],
\\ )
\\ bin_op_result = Err(foo) ?? 12 > 5 * 5 or 13 + 2 < 5 and 10 - 1 >= 16 or 12 <= 3 / 5
\\ static_dispatch_style = some_fn(arg1)?.static_dispatch_method()?.next_static_dispatch_method()?.record_field?
\\ Stdout.line!(interpolated)?
\\ Stdout.line!(
\\ "How about ${ # Comment after string interpolation open
\\ Num.toStr(number) # Comment after string interpolation expr
\\ } as a string?",
\\ )
\\} # Comment after top-level decl
\\
\\expect {
\\ foo = 1 # This should work too
\\ blah = 1
\\ blah == foo
\\}
);
One of the next things that needs to be done is to start transcribing the builtins to the v0.1 syntax. Where should they live? I'll start doing that and finishing the parser by parsing hosted, platform, and package headers
Preferably, we would make the old compiler able to migrate them to the new syntax
That would ensure folks have an easy way to migrate when the zig compiler comes out
It also makes it easier to benchmark against the old compiler if they both support the exact same syntax
I've been working on a syntax migration tool in the old compiler, and it's getting reasonably close. I'd say we should just use that tool to auto-translate the builtins and anything else we want to migrate, with a focus on improving the tool and rerunning it rather than doing hand fix-ups to the migrated code
100% agree
9 messages were moved from this topic to #ideas > Needed Function signature and lambda expr change by Anton.
Reading through the Static Dispatch document for the 27th time, am I right to understand that where clauses are only a part of a function's type?
The where is saying that a function uses variables that have constraints
This is just a parsing question, actually
You can also use them in aliases
Yes, but it would only appear as part of the type annotation of a function type?
We haven't ironed out the links there
Oh yeah, and in aliases
I think I'll just be implementing them as part of function types for now
But if anyone (ahem @Richard Feldman ) has a nice summary of the overall plan here (as well as changes in syntax), please let me know
We want to be able to say sort_list : List(a) -> List(a) where Sortable(a)
Though a.Sortable was also thrown around
So starting with just functions is fine for now
Yeah, I think the big issue is the current syntax - outside of aliases - is likely going to cause some parsing issues
Dict.insert : Dict k v, k, v -> Dict k v
    where k.Hash, k.Eq

Hash a : a
    where
        a.hash(hasher, a -> hasher),
        hasher.Hasher,
I'm actually wondering if I should attach these instead to annotations and type definitions
Because this being used inside of an aggregate type would be .... complicated
Just thinking out loud here...
so today we have
Hasher implements
    add_bytes : a, List U8 -> a where a implements Hasher
    add_u8 : a, U8 -> a where a implements Hasher
    add_u16 : a, U16 -> a where a implements Hasher
    add_u32 : a, U32 -> a where a implements Hasher
    add_u64 : a, U64 -> a where a implements Hasher
    add_u128 : a, U128 -> a where a implements Hasher
    complete : a -> U64 where a implements Hasher
and I think we would have with v0.1
Hasher a : a where
    a.add_bytes(List U8) -> a,
    a.add_u8(U8) -> a,
    a.add_u16(U16) -> a,
    a.add_u32(U32) -> a,
    a.add_u64(U64) -> a,
    a.add_u128(U128) -> a,
    a.complete() -> U64
Does that look correct?
And that allows for us to then have something like:
# In `Dict.roc`
insert : Dict k v, k, v -> Dict k v
    where
        k.Hash,
        k.Eq,

# In `Str.roc`
hash : Str, Hash.Hasher -> Hash.Hasher
hash = {
    ... # Implementation
}

# In `Hash.roc`
Hash a : a
    where
        a.hash(hasher) -> hasher,
        hasher.Hasher,

Hasher a : a
    where
        a.add_bytes(List U8) -> a,
        a.add_u8(U8) -> a,
        a.add_u16(U16) -> a,
        a.add_u32(U32) -> a,
        a.add_u64(U64) -> a,
        a.add_u128(U128) -> a,
        a.complete() -> U64
And then a Dict(Str, Str) is valid (assuming similar is implemented for Eq)
So each clause in a where is one of:
Anthony Bullard said:
Reading through the Static Dispatch document for the 27th time, am I right to understand that
where
are only a part of a function's type?
Yep
Anthony Bullard said:
Just thinking out loud here...
so today we have
Hasher implements
    add_bytes : a, List U8 -> a where a implements Hasher
    add_u8 : a, U8 -> a where a implements Hasher
    add_u16 : a, U16 -> a where a implements Hasher
    add_u32 : a, U32 -> a where a implements Hasher
    add_u64 : a, U64 -> a where a implements Hasher
    add_u128 : a, U128 -> a where a implements Hasher
    complete : a -> U64 where a implements Hasher
and I think we would have with v0.1
Hasher a : a where
    a.add_bytes(List U8) -> a,
    a.add_u8(U8) -> a,
    a.add_u16(U16) -> a,
    a.add_u32(U32) -> a,
    a.add_u64(U64) -> a,
    a.add_u128(U128) -> a,
    a.complete() -> U64
Does that look correct?
Yep
Anthony Bullard said:
Dict.insert : Dict k v, k, v -> Dict k v
    where k.Hash, k.Eq

Hash a : a
    where
        a.hash(hasher, a -> hasher),
        hasher.Hasher,
I'm actually wondering if I should attach these instead to annotations and type definitions
Hash looks to be defined wrong. I think it would be a.hash(hasher) -> hasher. That or it needs to be defined differently to be module-based instead of standard static dispatch.
The other way to define it would be module(a).hash(hasher, a) -> hasher
What does the module(a) buy us over just a?
Is that for when the argument of type a is not the first arg (in the "receiver" position)?
Yes
Need it for decode
Cause decode is List(U8) -> a
It'll probably take a decoding format as well, or something like that
Decode a : a where a.decode(List(U8)) -> a
Encode a : a where a.encode(a) -> List(U8)
How would that definition of decode work? You won't have an a if you are trying to decode an a.
I think it has to be
Decode a : a where module(a).decode(List(U8)) -> a
Sorry
I was just typing out syntax
That wasn’t meant to be real (or actually submitted)
The PR for parsing and formatting is here: https://github.com/roc-lang/roc/pull/7745
My absolute rough draft, first take re-implementation of Hash.roc from the builtins based on all of the syntax for v0.1 that is available today:
module [
    Hash,
    Hasher,
    hash_list,
    hash_unordered,
]

import Num exposing [
    U8,
    U16,
    U32,
    U64,
    U128,
]

## A value that can be hashed.
Hash a : a
    where
        ## Hashes a value into a [Hasher].
        ## Note that [hash] does not produce a hash value itself; the hasher must be
        ## [complete]d in order to extract the hash value.
        a.hash(hasher) -> hasher,
        hasher.Hasher,

## Describes a hashing algorithm that is fed bytes and produces an integer hash.
##
## The [Hasher] ability describes general-purpose hashers. It only allows
## emission of 64-bit unsigned integer hashes. It is not suitable for
## cryptographically-secure hashing.
Hasher a : a
    where
        ## Adds a list of bytes to the hasher.
        a.add_bytes(List(U8)) -> a,
        ## Adds a single U8 to the hasher.
        a.add_u8(U8) -> a,
        ## Adds a single U16 to the hasher.
        a.add_u16(U16) -> a,
        ## Adds a single U32 to the hasher.
        a.add_u32(U32) -> a,
        ## Adds a single U64 to the hasher.
        a.add_u64(U64) -> a,
        ## Adds a single U128 to the hasher.
        a.add_u128(U128) -> a,
        ## Completes the hasher, extracting a hash value from its
        ## accumulated hash state.
        a.complete() -> U64,

## Adds a list of [Hash]able elements to a [Hasher] by hashing each element.
hash_list : hasher, List a -> hasher
    where
        a.Hash,
        hasher.Hasher,
hash_list = |hasher, lst|
    lst.walk(
        hasher,
        |accum_hasher, elem|
            elem.hash(accum_hasher),
    )

HashFunction a : hasher, a -> hasher where a.Hash, hasher.Hasher

HashWalker container elem : hasher, container, HashFunction(elem) -> hasher
    where
        elem.Hash,
        hasher.Hasher,

## Adds a container of [Hash]able elements to a [Hasher] by hashing each element.
## The container is iterated using the walk method passed in.
## The order of the elements does not affect the final hash.
hash_unordered : hasher, container, HashWalker(container, elem) -> hasher
    where
        elem.Hash,
        hasher.Hasher,
hash_unordered = |hasher, container, walk| {
    acc = walk(
        container,
        0,
        |accum, elem| {
            x =
                # Note, we intentionally copy the hasher in every iteration.
                # Having the same base state is required for unordered hashing.
                elem
                    .hash(hasher)
                    .complete()
            next_accum = accum.add_wrap(x)
            if next_accum < accum {
                # we don't want to lose a bit of entropy on overflow, so add it back in.
                next_accum.add_wrap(1)
            } else {
                next_accum
            }
        },
    )
    hasher.add_u64(acc)
}
If that looks correct to everyone, I'll add a snapshot for it
I assume a.add_u8(U8) -> hasher, should be a.add_u8(U8) -> a, same for the other functions in Hasher
Also, is it now Hash(a) and Hasher(a) instead of Hash a and Hasher a?
(hasher, b, (hasher, a -> hasher) -> hasher) could use an alias to be less confusing. That or at least change the name b to container and a to elem.
Yeah, looks roughly correct.
Also, I wonder if we need a better way to apply aliases. hasher.Hasher, doesn't look good.
Like it doesn't feel like it is saying that hasher implements the Hasher ability. I wonder if it would be clearer to just do Hasher(hasher) or something similar... not sure
Hey @Richard Feldman, do we have any plans on fixing byte hashing for static dispatch? As in, a List U8 will by default hash via hash_list instead of add_bytes. As such, it leaves a lot of performance on the table by default.
Really, in a perfect world, all number types would hash via add_bytes, and only complex types would use hash_list, which walks an element at a time.
hm, I'm not familiar with the distinction :sweat_smile:
I assume hash_list is hashing each byte individually, which of course is suboptimal
but what does add_bytes do?
Yeah, hash_list is the one-element-at-a-time thing
And add_bytes leaves it up to the hasher and thus can run way faster algorithms
The issue is that we cannot specialize hash on List U8, only on List a. So we get stuck hashing List U8 with hash_list unless a user manually overrides it in a custom type's hash implementation.
As an aside, theoretically for most structural types the best hashing would be to hash like add_bytes but to mask out the padding, but that is another level of complexity to even orchestrate.
Brendan Hansknecht said:
I assume a.add_u8(U8) -> hasher, should be a.add_u8(U8) -> a, same for other functions in Hasher
Also, is it now Hash(a) and Hasher(a) instead of Hash a and Hasher a
(hasher, b, (hasher, a -> hasher) -> hasher) could use an alias to be less confusing. That or at least change the name b to container and a to elem.
Yeah, looks roughly correct.
I fixed the couple of things that were obviously wrong, and made some type aliases to make hash_unordered's signature readable.
Currently a Type Header still uses space separated args. Did this change @Richard Feldman ?
This is why it would be very nice if there were a Grammar specification - outside of the implementation in the parser - that we could use to track the parser's compliance (and therefore completion)
As good as Zulip is, trying to find the answer to every such question is very difficult
In
we decided to use parens for type arguments in the annotation itself, but no real direction on type alias headers
good question! I think we should match what patterns do
so for example:
Alias(a, b) : ...
Brendan Hansknecht said:
The issue is that we cannot specialize hash on List U8, only on List a. So we get stuck hashing List U8 with hash_list unless a user manually overrides it in a custom type hash implementation.
ah, yeah static dispatch doesn't actually help with this; the List type is still in the List module.
Good point! The hash_list function should be in the list module
And ok on the change to type alias headers
I can do that real quick before work
Brendan Hansknecht said:
As an aside, theoretically for most structural types the best hashing would be to hash like add_bytes but to mask out the padding, but that is another level of complexity to even orchestrate.
:thinking: we could make it so that all the non-pointers in structural types get hashed using add_bytes. They should all end up right next to each other anyway because of alignment, and we can't do better than that because we need to chase the pointers anyway (and not include the addresses in the hash)
And then we need:
- [-2, 0, 2].map(.abs().sub(1)). Tearoffs?
- var statements
- for statements
And then v0.1 parsing will be complete
Obviously the formatter will need quite a bit of iteration to dial in the style over time
@Richard Feldman have we ever finalized the syntax and semantics for a) Tearoffs and b) local function application in a static dispatch chain?
a) let's hold off on that for now. I'd like to see how often people actually want to reach for it in practice once we have static dispatch. Might turn out to not be worth doing.
b) yeah, we settled on a->b in https://roc.zulipchat.com/#narrow/stream/304641-ideas/topic/static.20dispatch.20-.20pass_to.20alternative
more specifically:
a->b(1, 2)->c(3, 4) desugars to c(b(a, 1, 2), 3, 4)
you can drop the () from these when there are no extra args to pass, so for example you can do ->Ok instead of ->Ok(), and 4->hours->ago! instead of 4->hours()->ago!()
the formatter drops ()s that are used after ->
Just to clarify about b - We decided on -> for local dispatch / "Arrow calls", but there was some suggestion (not really touched upon) that match clauses would drop ->. Is that a thing?
Well, this change was a tad more difficult than I expected - it creates some contention with a plain applied Tag expression. Have to make statements differentiate between top-level and inside of a block.
So I'll try to finish it tomorrow morning :-)
Now off to writing AI Agents...
Anthony Bullard said:
Just to clarify about b - We decided on -> for local dispatch / "Arrow calls", but there was some suggestion (not really touched upon) that match clauses would drop ->. Is that a thing?
we decided to use => with match for now
Anthony Bullard said:
Now off to writing AI Agents...
Cool, what are you working on specifically?
Richard Feldman said:
Brendan Hansknecht said:
As an aside, theoretically for most structural types the best hashing would be to hash like add_bytes but to mask out the padding, but that is another level of complexity to even orchestrate.
:thinking: we could make it so that all the non-pointers in structural types get hashed using add_bytes. They should all end up right next to each other anyway because of alignment, and we can't do better than that because we need to chase the pointers anyway (and not include the addresses in the hash)
Ah yeah, structural types without pointers can do what I said when in a list. If you have pointers, you need to do things an element at a time
Anton said:
Anthony Bullard said:
Now off to writing AI Agents...
Cool, what are you working on specifically?
Wish I could be specific, but let's just say it's exploration around UI generation when the UI is not created from source code
Type alias header using parens for args: https://github.com/roc-lang/roc/pull/7749
My next steps:
1. match clauses to using fat arrow =>
2. var statements
3. for construct
I'll probably do 1 & 2 in a single PR, hopefully by EOW.
Then 3 & 4 should be relatively straightforward so I'm hoping to have everything done before the next contributor meet up.
Move match clauses to fat arrow: https://github.com/roc-lang/roc/pull/7751
Parse and format "Local dispatch arrow calls": https://github.com/roc-lang/roc/pull/7780
Parse and format var and for statements: https://github.com/roc-lang/roc/pull/7783
Unless something has been missed, the above should functionally complete the parser, parse IR, and formatter for the v0.1 syntax
yoooooooo :heart_eyes::heart_eyes::heart_eyes:
now to find something difficult to work on :thinking:
(and i want to make it very clear that the formatter style is still a WIP but it works, is stable, and loses no information)
Brendan Hansknecht said:
Ok, disabled all the extra reformatting and validation logic. So both codebases are just doing their parse to format loop. So should be more apples to apples now. We are still 3 to 5x faster with the new compiler (and 200x less memory).
M1 Mac
X86 Linux
I'm going on the Changelog podcast tomorrow, and it occurred to me that it would be cool to be able to talk about redoing these benchmarks now that we have a feature-complete parser and formatter!
would be sweet to be able to have not only a comparison with the old compiler, but also something like "we can now parse the equivalent of _____ lines of code per second"
I won't have time between now and then, but if anyone has time to run those benchmarks, would be great to have them! would be awesome to inspire some people to get involved if they're interested in helping out with a performance-focused compiler :smiley:
i would love to see those numbers as well! i don't have a lot of time at the moment but i'd love to brag about how much better we've made things
if the existing benchmarks hold up it's 3.9M LOC/s
oh and that's parse and formatting across 100 files
That portends wonderful things for the LS as well
Anthony Bullard said:
if the existing benchmarks hold up it's 3.9M LOC/s
is that for parse and format?
or just parse
It says parse and format loop
which means that includes at least a couple hundred sys calls for reading and writing files
correct me if that's wrong @Brendan Hansknecht
Yeah, parse and format. If I recall correctly, only about 20% is formatting....
I probably could pull more updated numbers. Biggest caveat being I don't have real meaningful source code in both old and new parser syntax. Last time I was using a modified syntax grab bag which is not that much like real world code.
Could Claude generate some realistic source code?
I know we wanted to make the old parser auto-update, but features like abilities make the standard library fail to port over.
is the rust compiler v0.1 migrator finished?
@Joshua Warner
I think it is mostly there but has edge cases and clearly can't handle things like abilities.
we could try to just find a number of source files in different projects that succeed
@Richard Feldman when do you need the numbers by?
Richard Feldman said:
I'm going on the Changelog podcast tomorrow...
I think we can give the existing numbers as showing at least the order of magnitude
and caveat it
@Brendan Hansknecht noon Pacific is when the recording starts
Just testing the formatter now to see if I can get more meaningful code migrated (like some of the builtins).
Definitely a few pieces missing for migration:
I still will try to manually convert some things and see how it goes.
hmm...something also definitely goes wrong with closures and braces.
Also @Anthony Bullard, found a parser bug. This definitely should parse:
expect {
    foo : U32
    foo = 1 # This should work too
    blah = 1
    blah == foo
}
Specifically, the foo : U32 line breaks parsing.
surprising that it doesn't
i thought i had a similar test
is it the U32?
I don't think so. I hit it with List(a) first.
For some reason it seems to expect the expect to end right after that line.
ah it must think it's a record literal
that shouldn't happen. i check for this and bail out of a record and parse it as a block
I can take a look in the morning but that's the best i can do
Yeah, I don't need it fixed, just noting it
ok sorry about that
i'll add the above as a snapshot and makes sure it parses
One other question for you: does the new zig compiler attempt to parse things like List a and convert them to List(a)?
I don't think we should try to do that honestly
I think the amount of complexity wouldn't be worth it, considering LLMs can now do that sort of migration well enough :smile:
and I remember a conversation where we discussed that if we knew the new parser didn't need to support whitespace application at all, it would significantly simplify several areas
yeah no white space application at all
love the idea of parsing the stdlib as a point of comparison btw! :smiley:
Yeah, just was curious cause in the half-migrated list.roc file I have, it has some cases where it doesn't seem to care about List a but others where it does care... no idea why.
love the idea of parsing the stdlib as a point of comparison btw!
Yeah, just not sure if I can port the stdlib in time to get meaningful numbers. Theoretically an llm might be able to help, but they definitely don't know roc currently.
Doesn't have to be a fully functional port, but at least needs to parse and format.
ah yeah I guess we don't have an equivalent of https://www.roc-lang.org/builtins/llms.txt for the new syntax
could try giving it some basic examples but maybe not enough
or like tests in the source code maybe?
Oh, one other parser bug found. The new parser doesn't understand doc comments ## ...
that's super easy to add. i thought the tokenizer handled that
we definitely don't have a concept of a doc comment in the parser because the parser doesn't know about comments at all
Ah yeah, fair
either way should be very straight forward
there's a function called chompTrivia that handles it i believe
in the tokenizer
Also, to note: I think it may just be the formatting that needs to be fixed. It outputs # # ...
we may need to start keeping doc comments (although not regular ones) around for parsing so we can parse their markdown but yeah
Hmm, looks like we are missing record destructuring in the new syntax:
{ start, end, step } = rec
expr_unexpected_token, at token OpAssign at ...
And maybe missing guards on match statements.
[x, .. as rest] if x == delimiter =>
expr_unexpected_token, at token OpFatArrow at ...
Ok, so I ported over List.roc. I think it is a reasonably representative file for well documented roc code. It is about ~800 LOC and ~800 lines of comments/blank/bracket only.
The new version is about 100 lines of brackets more than the old version. That said, total byte count is pretty similar. I padded the old roc file with an extra comment to make them the exact same number of bytes.
All numbers on M1 mac
Formatting 1000 files that are all List.roc
Formatting List.roc single file with 1000x the body size
Also, definitely room for more gains like making chomptrivia faster.
@Richard Feldman here are some rough numbers just based on List.roc.
Clearly we leak memory with every new file in the old rust compiler.
Also, big numbers that matter:
- ~3 to 4x faster than the rust compiler (parse and format combined)
- ~2 to 3 million lines of code parsed per second
- ~5 to 6 million lines per second when including comments, bracket-only, and blank lines
making chomptrivia faster
Nice, sounds like a fun project :racecar:
Yeah, chomp trivia and string interning are the two things that stick out most.
The plan is to do some SIMD stuff there? It sounds like it might be a really contained side project. Would it be too early to explore that now?
Someone could definitely do this now if they are interested.
I had an initial stab at it... https://github.com/roc-lang/roc/pull/7815
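For a rough idea of the shape this takes, here's a minimal sketch of SIMD whitespace skipping (not the PR's actual code; the real chompTrivia also has to handle comments and newline tracking):
const std = @import("std");

// Classify 16 bytes at a time: compare the chunk against each trivia
// byte, then use the resulting bitmask to find the first non-trivia byte.
fn skipWhitespace(src: []const u8, start: usize) usize {
    const V = @Vector(16, u8);
    var i = start;
    while (i + 16 <= src.len) : (i += 16) {
        const chunk: V = src[i..][0..16].*;
        const is_trivia = (chunk == @as(V, @splat(' '))) |
            (chunk == @as(V, @splat('\t'))) |
            (chunk == @as(V, @splat('\r'))) |
            (chunk == @as(V, @splat('\n')));
        const mask: u16 = @bitCast(is_trivia);
        // Some byte in this chunk is not trivia: stop at the first one.
        if (mask != 0xFFFF) return i + @ctz(~mask);
    }
    // Scalar fallback for the (< 16 byte) tail.
    while (i < src.len and std.ascii.isWhitespace(src[i])) i += 1;
    return i;
}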
@Brendan Hansknecht can you share your List.roc file so I can test this.
I don't know enough about SIMD to really review your implementation @Luke Boswell but it looks good to me, thank you for adding tests to check correctness!
Brendan Hansknecht said:
Hmm, looks like we are missing record destructuring in the new syntax:
{ start, end, step } = rec
expr_unexpected_token, at token OpAssign at ...
Can you share the overall file and I'll add it as a snapshot so that I can fill in the gaps. To be clear, this is supposed to be supported. I can see even from the syntax grab bag (and verified by looking at Parser.parseStmt) that I must have just forgotten to look for OpenCurly and OpenSquare in the statement position, to check for them being used as a pattern before sending things over to parseExpr.
And the match guards were also, for some reason, just not on my checklist
the Changelog episode recording went great, and I used the numbers in it - thanks for doing those @Brendan Hansknecht and thanks for getting the parser and formatter to the point where we could actually compile a whole builtin module with them @Anthony Bullard! :heart_eyes:
episode should be out in a week or two
@Luke Boswell I ran it on a folder with 1000 copies of this file
file_1.roc
and a version of it that just duplicated the body 1000 times
Can you share the overall file and I'll add it as a snapshot that I can fill in gaps.
I ended up removing everything that caused a problem in order to get it compiling.... I can try to find and re-add all the things that broke the tokenizer/parser. Might not have time to do so tonight though. Should just be the two issues I mentioned above and then many commented out type annotations.
This is good. Thank you. I'm currently rewriting a bunch of things. I've been experimenting with the LLM workflow, and it's amazing how useful but also unhelpful the AI can be. It gives code that looks pretty good but can be totally useless. I'm rebuilding my test cases to ensure it's working correctly.
Might not have time to do so tonight though
Actually, I am confused on what day it is. I should have time tonight, don't have time tomorrow night. I'll try to skim through the file and re-add the various pieces I had to remove.
@Anthony Bullard I think this file should work for your snapshot test. Also, beyond just making it parse, it definitely needs some formatting cleanup.
Edit: newer version of the file with another fix.
List.roc
For example:
walk_backwards_help = |list, state, f, index_plus_one|
    if index_plus_one == 0
        state
    else {
        index = Num.sub_wrap(index_plus_one, 1)
        next_state = f(state, get_unsafe(list, index))
        walk_backwards_help(list, next_state, f, index)
    }
I keep remembering... oh, and one more thing for this... In the new compiler we don't allow _intentionally_named_but_ignored. It has to just be _. Was this intentional? I like named ignored variables, but maybe not everyone does and we decided to remove it?
I think it's meant to be the same, i.e. _named_ignored_thing is still valid. It's probably just been overlooked/not implemented yet.
Luke Boswell said:
The plan is to do some SIMD stuff there? It sounds like it might be a really contained side project. Would it be too early to explore that now?
I followed the updates and PR on this, but I think it's too early for SIMD; I would not add serious complexity to code that is still expected to change and is not a bottleneck.
I agree. It was fun to explore a little, but turned out to be a little more complicated than I thought.
Anton said:
I followed the updates and PR on this, but I think it's too early for SIMD, I would not add serious complexity to code that is still expected to change and not a bottleneck.
Luckily, chompTrivia is no longer expected to change, and it is a bottleneck.
It also is such a small chunk of code that it would be trivial to switch back if we need
I think small localized optimizations like this should be totally fine.
I say this more in terms of, people should feel free to explore optimizations and we should definitely accept them if they are small and self contained. Don't want to block that energy.
I agree, I forgot that this was about chompTrivia
specifically
For the bottleneck I was thinking about parsing not being a bottleneck in compilation but yeah small, self-contained and not expected to change sounds good
Yeah, for dev with an interpreter, will be interesting to see the bottlenecks
Also, I guess formatting a ton of files is another workflow we want to be super fast
But yeah, in general I agree
Richard Feldman said:
the Changelog episode recording went great, and I used the numbers in it - thanks for doing those Brendan Hansknecht and thanks for getting the parser and formatter to the point where we could actually compile a whole builtin module with them Anthony Bullard! :heart_eyes:
Can't wait to see it myself!
I've noticed we don't parse tuple patterns... are these planned but just not implemented yet?
The parser only looks for pattern assignments when it sees a LowerIdent token. For tuple patterns starting with (, it falls through to the default case, where it parses an expression statement instead of a pattern assignment.
To fix this, we would need to:
1. Handle .OpenRound and .NoSpaceOpenRound in parseStmtByType
2. Parse the tuple pattern, then look for OpAssign
3. Create a .decl statement with the pattern
However, this approach has a problem: we'd need to look ahead quite a bit to determine if we have a pattern assignment vs just a tuple expression. A better approach might be to:
1. Parse an expression, then look for OpAssign
2. Convert the expression to a pattern and create a .decl statement
This is likely why the current parser implementation only supports simple identifier patterns in assignments - handling complex patterns requires either significant lookahead or the ability to convert expressions to patterns after parsing.
The snapshot shows this limitation clearly - tuple "patterns" are being parsed as tuple expressions, leading to the "expr_unexpected_token" error when the parser encounters the = sign.
I thought these would be supported, but I think they haven't been implemented in the Parser yet
~~~META
description=Tuple pattern matching tests
type=expr
~~~SOURCE
{
    # Simple tuple destructuring
    (x, y) = (1, 2)
    # Nested tuple patterns
    ((a, b), (c, d)) = ((10, 20), (30, 40))
    # Mixed patterns with literals
    (first, second, third) = (100, 42, 200)
    # Tuple with string and tag patterns
    (name, string, boolean) = ("Alice", "fixed", True)
    # Tuple with list pattern
    (list, hello) = ([1, 2, 3], "hello")
    {}
}
check the syntax grab bag snapshot
that's all of the implemented syntax
if tuples aren't there i think it was an oversight
I think it is specifically for pattern matching that they aren't implemented.
I mentioned this a while ago in this thread when porting List.roc over
should be trivial to fix, but it would be MUCH easier for all of this if decls started with a keyword :rolling_on_the_floor_laughing:
Like let? What are you thinking
yeah
this is how we end up with nice syntax: we do the thing that's nicer for the end user but harder for the compiler authors :grinning_face_with_smiling_eyes:
oh totally! not suggesting we SHOULD
just stating the simple fact that patterns in decls with no keyword, being almost indistinguishable from an expr of the same shape until you find the equals sign, are complicated, specifically for things like lists and tuples
with a record i know after two tokens
ah for sure!
but list and tuple patterns can mean a theoretically unbounded lookahead
What do you think of Claude's suggestion
handling complex patterns requires either significant lookahead or the ability to convert expressions to patterns after parsing.
Could we convert it from an expression to pattern in Can?
possibly
FWIW I’d actually like to have the same underlying ast nodes be reused for exprs and patterns
but i think it would be a pain, but i can look at it
That way we don’t have to convert (only verify)
Joshua Warner said:
FWIW I’d actually like to have the same underlying ast nodes be reused for exprs and patterns
there'd have to be near perfect symmetry between them for me to roll that way. i can take a look at that tomorrow
it would unlock some nice characteristics
really removing the need for lookahead at all
the only issue i can think of off top is alternates
There’s not a perfect symmetry, but it’s close enough to be useful IMO. The only alternative is allowing conversions, which I think will end up being a bit costly from a perf perspective
like Foo(1 | 2)
Alternates can’t occur in expr position anyway (at the start of a statement)
yeah but i wouldn't want to underfit the node to support something for this kind of pattern that's also used for exprs
Oh to be clear I’m taking about the internal node types being shared, not the types exposed to the rest of the compiler
i mean all nodes internally are identical
Right, but expressions and patterns should use the same node type IDs 
We can then translate them into different public enumeration types, depending on the context
it's just the different tags and how those tags affect the interpretation of the data and extra data
Yep, so the same internal tags, but different external enumerations 
so all use expr ids
but you get / set with specific functions for pattern or expr?
I was imagining we would also have a pattern ID that sneaky uses the same ID space as expression IDs
Sneakily
they all have the same address space
they are all just integer offsets into a single array list
Errrr… right, I forgot about that.
so this shouldn't be too hard at all
I guess the only critical thing then is making sure that if you were to take a pattern ID and cast it to an expression ID, then decoded as an expression, it should “just work”
yes
actually that might not be that much work
(famous last words)
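As a minimal sketch of what's being described (hypothetical names and tags, not the actual AST code): one node array and one u32 ID space, with separate Expr and Pattern views that decode the same tags.
const std = @import("std");

// One shared node store; expressions and patterns use the same tags
// and the same u32 ID space.
const Node = struct {
    tag: Tag,
    main_token: u32,
    pub const Tag = enum(u8) { num, list, tuple };
    pub const Idx = enum(u32) { _ };
};

const NodeStore = struct {
    nodes: std.ArrayListUnmanaged(Node) = .{},

    fn get(self: *const NodeStore, idx: Node.Idx) Node {
        return self.nodes.items[@intFromEnum(idx)];
    }
};

// Two public views over the same storage. Casting a pattern ID to an
// expression ID "just works" because both decode the same node.
pub const Expr = union(enum) { num: u32, list: Node.Idx, tuple: Node.Idx };
pub const Pattern = union(enum) { num: u32, list: Node.Idx, tuple: Node.Idx };

fn getExpr(store: *const NodeStore, idx: Node.Idx) Expr {
    return switch (store.get(idx).tag) {
        .num => .{ .num = store.get(idx).main_token },
        .list => .{ .list = idx },
        .tuple => .{ .tuple = idx },
    };
}

fn getPattern(store: *const NodeStore, idx: Node.Idx) Pattern {
    return switch (store.get(idx).tag) {
        .num => .{ .num = store.get(idx).main_token },
        .list => .{ .list = idx },
        .tuple => .{ .tuple = idx },
    };
}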
Naturally, these code paths to convert to the external types will act as correctness assertions on the purse parser algorithms that are deciding what is and is not a valid expression (or pattern)
Yeah hopefully
lol dictating the word parser is impossibrl
wow you are right!
just tried myself
Does the parser currently support multiple patterns in match? i.e. A | B => ...
The type in parse/AST.zig looks like:
pub const MatchBranch = struct {
    pattern: Pattern.Idx,
    body: Expr.Idx,
    region: TokenizedRegion,
};
So I'm guessing either it's not yet implemented, or when we parse we split A and B into their own branches pointing to the same expr?
I did think about that. I haven't implemented it (or at least not deliberately) in my PR
My guess is we'll split that into multiple branches... but the Parser should be where that is done I think. Actually I'm not sure...
We don't want to desugar in this version of the compiler, so we probably need a different approach to handle this.
In CIR, it has a span. Can the parse AST do the same?
pub const Branch = struct {
    patterns: Match.BranchPattern.Span,
    ...
}
I think so, it's a fair change to the parser. We may want to wait until after @Anthony Bullard has done his thing before we change it.
seems like higher-order function type annotations don't parse without parens right now :sweat_smile:
this parses:
foo : ((Str -> Str) -> Str)
...but without the outer parens, this fails to parse:
foo : (Str -> Str) -> Str
that's definitely a bug
if you add a snapshot for it i'll fix it
thanks! I pushed a repro snapshot to higher-order-annotations