This is not a compiler / language bug, but it does affect the community. I have been seeing a fair number of highlighting errors in Neovim. During AoC I was able to isolate a nice example of the broken parse tree. Here is the example:
Screenshot_20241201_205705.png
Can someone point me to a good explanation of the significant newline / indenting rules, or where this would be implemented in the roc parser so I can patch the tree sitter grammar?
The tree-sitter grammar is very much on hold because of roc's very unstable syntax right now.
(I'm faldor20 the grammar author btw)
Once the purity inference and iterator syntax is in, I'll go in and update it to bring it up to scratch.
I'm super happy for you to submit a PR though and I can lend a hand. I'm just not personally going to put much work on it while roc is a rapidly moving target. :)
If it's interesting, a couple things that cross my mind that we could do to make tree-sitter grammar development easier:
- We could hook that up in our test_syntax crate, to get automated testing for when a test passes on the compiler's parser but gives an error on the tree-sitter one
Can we add tree-sitter as a fuzz target?
We can
I remember a few years ago when I was playing with fuzzing tree-sitter, it immediately fell over
Like, triggering crashes in the C code with relatively benign-looking inputs
Hopefully things are nicer these days?
Also that might have been specific to the grammar I was working with at the time (the rust grammar I think?)
One interesting data point for this particular file is running roc format fixes the TS parse tree.
Broken pre-formatting: https://github.com/r-bar/advent24/blob/908fc5b735b37093a1a3db79a40c9bd872cc988c/day01/p2.roc
Fixed post-formatting: https://github.com/r-bar/advent24/blob/68346809ed5ee4cbdb21ab166061cb37036f2155/day01/p2.roc
In this case breaking up the Ok count -> branch above the error fixes the issue for the top level declarations below.
Broken:
Ok count -> (Num.toI64 count) * item
|> Num.add accum
Ok:
Ok count ->
(Num.toI64 count) * item
|> Num.add accum
Joshua Warner said:
If it's interesting, a couple things that cross my mind that we could do to make tree-sitter grammar development easier:
- We could hook that up in our test_syntax crate, to get automated testing for when a test passes on the compiler's parser but gives an error on the tree-sitter one
- We can even hook up automatic minimization to take a given failing test case, and try to find the minimal similar test that parses without errors on the compiler's parser but gives errors under tree-sitter.
That's a cool idea! I'd love to have that info, but blocking merges would be pretty annoying, so if it could just warn us that would be awesome! Updating the grammar isn't always easy, and it's actually quite tolerant, so having it break a little now and again probably isn't a big deal.
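For anyone curious what the "passes on the compiler's parser but errors on tree-sitter" check plus minimization could look like, here is a rough sketch. It is not how test_syntax works and it does not call the real Roc or tree-sitter parsers: `reference_ok` and `candidate_ok` are hypothetical stand-ins for "the compiler's parser accepts this" and "tree-sitter parses this without ERROR nodes", and the minimizer is a simple greedy line-removal loop rather than anything clever.

```python
from typing import Callable

# Hypothetical parser hook: returns True when the source parses without errors.
Parser = Callable[[str], bool]


def is_divergent(src: str, reference_ok: Parser, candidate_ok: Parser) -> bool:
    """True when the reference parser accepts src but the candidate rejects it."""
    return reference_ok(src) and not candidate_ok(src)


def minimize(src: str, reference_ok: Parser, candidate_ok: Parser) -> str:
    """Greedily drop one line at a time while the divergence is preserved."""
    if not is_divergent(src, reference_ok, candidate_ok):
        raise ValueError("input is not a divergent case")
    lines = src.splitlines()
    changed = True
    while changed:
        changed = False
        for i in range(len(lines)):
            trial = lines[:i] + lines[i + 1:]
            if is_divergent("\n".join(trial), reference_ok, candidate_ok):
                lines = trial  # keep the smaller input and start over
                changed = True
                break
    return "\n".join(lines)
```

Since the result is only a report of divergent inputs, it could be surfaced as a warning in CI instead of failing the build, which fits the "just warn us" preference above.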
Ryan Barth said:
One interesting data point for this particular file is running roc format fixes the TS parse tree.
Hahahah, look indentation based syntax is hard. The amount of time I spent just on that in the grammar is stupid
I have no doubt
Eli Dowling said:
Ryan Barth said:
One interesting data point for this particular file is running roc format fixes the TS parse tree.
Hahahah, look indentation based syntax is hard. The amount of time I spent just on that in the grammar is stupid
You had to create a custom tokenizer, right?
It uses some C code to do part of the parsing for newlines, yeah. I try to keep the C code part as minimal as possible. It's adapted from the Python version, if I remember rightly.
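For context on the "Python version" mentioned above: the usual Python-style approach to significant indentation keeps a stack of indentation widths and emits INDENT / DEDENT / NEWLINE tokens as lines get deeper or shallower. The sketch below illustrates only that general scheme. It is not the Roc grammar's scanner, the token names and the `layout_tokens` helper are made up for the example, and Roc's real layout rules (for instance continuation lines starting with |>) need more than this.

```python
from typing import Iterator, List, Tuple

Token = Tuple[str, str]  # (kind, text); kinds are illustrative only


def layout_tokens(src: str) -> Iterator[Token]:
    """Emit LINE/NEWLINE tokens plus INDENT/DEDENT based on leading spaces."""
    indents: List[int] = [0]  # stack of currently open indentation widths
    for raw in src.splitlines():
        if not raw.strip():
            continue  # blank lines do not affect layout
        width = len(raw) - len(raw.lstrip(" "))
        if width > indents[-1]:
            indents.append(width)
            yield ("INDENT", "")
        else:
            while width < indents[-1]:
                indents.pop()
                yield ("DEDENT", "")
            if width != indents[-1]:
                raise ValueError(f"inconsistent dedent on line: {raw!r}")
        yield ("LINE", raw.strip())
        yield ("NEWLINE", "")
    while len(indents) > 1:  # close any blocks still open at end of input
        indents.pop()
        yield ("DEDENT", "")
```

In a tree-sitter grammar this kind of logic typically lives in the external scanner, because the indentation stack is state that the declarative grammar rules cannot track on their own.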
Last updated: Jul 06 2025 at 12:14 UTC