Treesitter expression body error · bugs

Stream: bugs

Topic: Treesitter expression body error

Ryan Barth (Dec 02 2024 at 05:02):

This is not a compiler / language bug, but it does affect the community. I have been seeing a fair number of highlighting errors in Neovim. During AoC I was able to isolate a nice example of the broken parse tree. Here is the example:

Screenshot_20241201_205705.png

Can someone point me to a good explanation of the significant newline / indentation rules, or to where they are implemented in the Roc parser, so I can patch the tree-sitter grammar?

Eli Dowling (Dec 02 2024 at 09:51):

The tree-sitter grammar is very much on hold because of Roc's very unstable syntax right now.
(I'm faldor20, the grammar author, btw)
Once the purity inference and iterator syntax are in, I'll go in and update it to bring it up to scratch.

I'm super happy for you to submit a PR though, and I can lend a hand. I'm just not personally going to put much work into it while Roc is a rapidly moving target. :)

Joshua Warner (Dec 03 2024 at 01:16):

If it's interesting, a couple things that cross my mind that we could do to make tree-sitter grammar development easier:

We could hook tree-sitter up in our test_syntax crate, to get automated testing for when a test passes on the compiler's parser but gives an error on the tree-sitter one
We can even hook up automatic minimization to take a given failing test case, and try to find the minimal similar test that parses without errors on the compiler's parser but gives errors under tree-sitter.
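
A rough sketch of the two ideas above, for illustration only. It is not code from the Roc repository or from test_syntax. The names `is_divergent` and `minimize_lines` are hypothetical, and the two parsers are passed in as plain callables so that nothing is assumed about either parser's real API. The minimizer is a simple greedy line-removal loop, a stand-in for whatever minimization strategy would actually be used.

```python
from typing import Callable

Check = Callable[[str], bool]


def is_divergent(source: str, reference_ok: Check, candidate_ok: Check) -> bool:
    """True when the reference parser accepts the input but the candidate rejects it."""
    return reference_ok(source) and not candidate_ok(source)


def minimize_lines(source: str, still_interesting: Check) -> str:
    """Greedily drop one line at a time while the input stays interesting."""
    lines = source.splitlines()
    changed = True
    while changed:
        changed = False
        for i in range(len(lines)):
            trial = lines[:i] + lines[i + 1:]
            # Never shrink to an empty input; keep the trial only if the
            # divergence between the two parsers is still reproduced.
            if trial and still_interesting("\n".join(trial)):
                lines = trial
                changed = True
                break
    return "\n".join(lines)
```

In use, `reference_ok` would wrap the compiler's parser and `candidate_ok` would wrap a tree-sitter parse that reports whether the tree contains error nodes; `still_interesting` would then be `lambda s: is_divergent(s, reference_ok, candidate_ok)`.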

Luke Boswell (Dec 03 2024 at 01:17):

Can we add tree-sitter as a fuzz target?

Joshua Warner (Dec 03 2024 at 01:17):

We can

Joshua Warner (Dec 03 2024 at 01:18):

I remember a few years ago when I was playing with fuzzing tree-sitter, it immediately fell over

Joshua Warner (Dec 03 2024 at 01:18):

Like, triggering crashes in the C code with relatively benign-looking inputs

Joshua Warner (Dec 03 2024 at 01:18):

Hopefully things are nicer these days?

Joshua Warner (Dec 03 2024 at 01:18):

Also that might have been specific to the grammar I was working with at the time (the Rust grammar, I think?)

Ryan Barth (Dec 03 2024 at 01:32):

One interesting data point for this particular file is running roc format fixes the TS parse tree.

Broken pre-formatting: https://github.com/r-bar/advent24/blob/908fc5b735b37093a1a3db79a40c9bd872cc988c/day01/p2.roc
Fixed post-formatting: https://github.com/r-bar/advent24/blob/68346809ed5ee4cbdb21ab166061cb37036f2155/day01/p2.roc

In this case breaking up the Ok count -> branch above the error fixes the issue for the top level declarations below.

Broken:

            Ok count -> (Num.toI64 count) * item
                |> Num.add accum

Ok:

            Ok count ->
                (Num.toI64 count) * item
                |> Num.add accum

Eli Dowling (Dec 03 2024 at 01:32):

Joshua Warner said:

If it's interesting, a couple things that cross my mind that we could do to make tree-sitter grammar development easier:

We could hook that up in our test_syntax crate, to get automated testing for when a test passes on the compiler's parser but gives an error on the tree-sitter one

We can even hook up automatic minimization to take a given failing test case, and try to find the minimal similar test that parses without errors on the compiler's parser but gives errors under tree-sitter.

That's a cool idea! I'd love to have that info, but blocking merges would be pretty annoying, so if it could just warn us that would be awesome! Updating the grammar isn't always easy, and it's actually quite tolerant, so having it break a little now and again probably isn't a big deal.

Eli Dowling (Dec 03 2024 at 01:33):

Ryan Barth said:

One interesting data point for this particular file is running roc format fixes the TS parse tree.

Hahahah, look, indentation-based syntax is hard. The amount of time I spent just on that in the grammar is stupid

Ryan Barth (Dec 03 2024 at 01:34):

I have no doubt

Anthony Bullard (Dec 03 2024 at 01:54):

Eli Dowling said:

Ryan Barth said:

One interesting data point for this particular file is running roc format fixes the TS parse tree.

Hahahah, look, indentation-based syntax is hard. The amount of time I spent just on that in the grammar is stupid

You had to create a custom tokenizer, right?

Eli Dowling (Dec 03 2024 at 03:25):

It uses some C code to do part of the parsing for newlines, yeah. I try to keep the C code part as minimal as possible; it's adapted from the Python version, if I remember rightly
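
For readers unfamiliar with the approach, here is a minimal sketch of the general Python-style technique: an indentation stack that turns leading whitespace into INDENT, DEDENT and NEWLINE tokens. It is written in Python for brevity, it is not the actual scanner from the Roc grammar, and the function name `layout_tokens` is made up for this illustration. It assumes spaces only and does not handle a dedent to a column that was never opened.

```python
from typing import List


def layout_tokens(source: str) -> List[str]:
    """Emit layout tokens (INDENT/DEDENT/NEWLINE) interleaved with stripped lines."""
    stack = [0]  # columns of the currently open indentation levels
    tokens: List[str] = []
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines carry no layout information
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            # Deeper than the current level: open a new block.
            stack.append(width)
            tokens.append("INDENT")
        else:
            # Close every block that is deeper than this line.
            while width < stack[-1]:
                stack.pop()
                tokens.append("DEDENT")
            tokens.append("NEWLINE")
        tokens.append(line.strip())
    # Close whatever is still open at end of input.
    tokens.extend("DEDENT" for _ in stack[1:])
    return tokens
```

Feeding the "Broken" and "Ok" snippets from earlier in the thread through a function like this yields different token streams, which gives a feel for why a branch body that starts on the same line as `->` and continues on a deeper line is awkward for an indentation-based grammar.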

Last updated: Jul 26 2025 at 12:14 UTC