Stream: contributing

Topic: Use tree-sitter-roc for GH grammar


view this post on Zulip Sam Mohr (Jan 24 2025 at 21:13):

According to the vendor list for linguist (which is GitHub's syntax highlighting lib), we can use tree sitter for our GitHub syntax, which would keep everything aligned with the tool most users rely on for syntax highlighting. We currently use the VSCode Roc highlighting which I think gets less love.

view this post on Zulip Anthony Bullard (Jan 24 2025 at 21:13):

We definitely need that treesitter grammar to be updated

view this post on Zulip Sam Mohr (Jan 24 2025 at 21:13):

I'm biased for tree-sitter being a Helix user, but also because it seems like the more future-facing technology

view this post on Zulip Anthony Bullard (Jan 24 2025 at 21:13):

I have some experience with TS, but @Eli Dowling owns the repo I think and is on vacation

view this post on Zulip Sam Mohr (Jan 24 2025 at 21:14):

Yes, we'd want to update the tree-sitter-roc grammar to point at all the syntax features we want for the foreseeable future

view this post on Zulip Anthony Bullard (Jan 24 2025 at 21:14):

Yeah TM grammars are not the future

view this post on Zulip Anthony Bullard (Jan 24 2025 at 21:14):

It should become a roc-lang repo

view this post on Zulip Sam Mohr (Jan 24 2025 at 21:14):

And then we would make a PR to GitHub to point at the updated commit hash of tree-sitter-roc

view this post on Zulip Sam Mohr (Jan 24 2025 at 21:15):

GitHub makes releases of linguist every few months, and the last one was at the end of November, so if we update by mid-February we can probably get this done

view this post on Zulip Sam Mohr (Jan 24 2025 at 21:15):

I agree that it should be an official Roc repo

view this post on Zulip Sam Mohr (Jan 24 2025 at 21:16):

Maybe someone with a more holistic vision of the planned syntax than me could write down a list of what we'd need to highlight, even informally

view this post on Zulip Sam Mohr (Jan 24 2025 at 21:16):

And then it wouldn't be so bad to update the tree-sitter grammar given that kind of list

view this post on Zulip Eli Dowling (Jan 24 2025 at 21:18):

I think that's reasonable. And yes, I'm on vacation. I'll be ending that by late February though.
The syntax will need some pretty low-level changes for the PNC syntax. But that will be the hardest part. It's much simpler than the current calling syntax in many ways though.

view this post on Zulip Eli Dowling (Jan 24 2025 at 21:21):

It would be worth making it support the for syntax as well and just putting it behind an ifdef for now.
I agree with @Sam Mohr about getting a list of all planned syntax changes and then making ts ahead of what's actually implemented in the roc compiler.

view this post on Zulip Sam Mohr (Jan 24 2025 at 21:21):

Unfortunately we need to support both whitespace for types and parens for values

view this post on Zulip Sam Mohr (Jan 24 2025 at 21:21):

Yep, for, var, reassignable_ variables, etc.

view this post on Zulip Eli Dowling (Jan 25 2025 at 11:42):

I have some time today to work on the TS grammar. I'll make some corpus entries from the real-world example repo that @Richard Feldman made and then work on getting the syntax right.

One of the big areas we could work on first is starting a new corpus that has nice, simple examples of the new features.

As I said I'll use the realworld repo to try to rough out what the changes should be to the grammar

view this post on Zulip Sam Mohr (Jan 25 2025 at 11:43):

Awesome! I can try to put together some such syntax examples

view this post on Zulip Sam Mohr (Jan 25 2025 at 11:44):

It feels like having an official, tested syntax definition might help here as well

view this post on Zulip Sam Mohr (Jan 25 2025 at 11:44):

I started working on a pest (https://docs.rs/pest/latest/pest/) definition for Roc earlier, but it feels like it's just gonna be another dependency we'd try to remove eventually

view this post on Zulip Sam Mohr (Jan 25 2025 at 11:45):

I wonder if there's some simple specification for a grammar we could implement ourselves

view this post on Zulip Sam Mohr (Jan 25 2025 at 11:45):

Though maybe a dependency like pest that we only use for testing and LSP/tree-sitter coherence wouldn't be such a bad thing to pull in

view this post on Zulip Eli Dowling (Jan 25 2025 at 11:47):

Is a pest definition going to be more understandable than a treesitter definition?

view this post on Zulip Sam Mohr (Jan 25 2025 at 11:47):

I don't think so...

view this post on Zulip Sam Mohr (Jan 25 2025 at 11:48):

I think it's a PEG-adjacent grammar that we could test both our custom parser and tree-sitter against (probably)

view this post on Zulip Anton (Jan 25 2025 at 11:48):

We used pest in the past

view this post on Zulip Sam Mohr (Jan 25 2025 at 11:48):

Whatever tool would fill that role would be great

view this post on Zulip Anthony Bullard (Jan 25 2025 at 11:49):

I think the new tutorial has plenty of small examples of new syntax, as well as the new release of roc-lang/examples

view this post on Zulip Eli Dowling (Jan 25 2025 at 11:49):

OCaml has what I'd consider the best syntax docs. They have these fancy (I assume autogenerated) syntax specs.
take this:
https://ocaml.org/manual/5.3/bindingops.html

view this post on Zulip Sam Mohr (Jan 25 2025 at 11:50):

Here's the source for that page: https://github.com/ocaml/ocaml/blob/trunk/manual/src/refman/extensions/bindingops.etex

view this post on Zulip Eli Dowling (Jan 25 2025 at 11:50):

or their docs on what an expression is:
https://ocaml.org/manual/5.3/expr.html

view this post on Zulip Anton (Jan 25 2025 at 11:51):

Anton said:

We used pest in the past

This is not the latest version of the file but here's the link: https://github.com/roc-lang/roc/pull/2675/files#diff-490af2dd8e5b9915ae07391871ab6575a352e10294bc3d90f4d05e7328aab278

view this post on Zulip Eli Dowling (Jan 25 2025 at 11:51):

Sam Mohr said:

Here's the source for that page: https://github.com/ocaml/ocaml/blob/trunk/manual/src/refman/extensions/bindingops.etex

Oh well, maybe it's not autogenerated.... shame

view this post on Zulip Sam Mohr (Jan 25 2025 at 11:51):

@Eli Dowling if we could generate something like those docs from some PEG-adjacent grammar, that'd be great

view this post on Zulip Eli Dowling (Jan 25 2025 at 12:02):

This isn't a topic I know a whole lot about, but:
https://github.com/matthijsgroen/ebnf2railroad
It seems there are a number of parser generators that take EBNF as input, and it seems to be the most standardised format. It could be a good choice as a way to generate some docs and also a reference implementation that, while maybe slow, could be used to compare against the Rust and TS implementations and provide a spec.

https://www.researchgate.net/publication/376883410_Comparison_of_Leading_Language_Parsers_-_ANTLR_JavaCC_SableCC_Tree-sitter_Yacc_Bison

view this post on Zulip Anthony Bullard (Jan 25 2025 at 12:28):

I think this is a good thing to explore, but syntax highlighting grammar != actual parsing grammar

view this post on Zulip Sam Mohr (Jan 25 2025 at 12:32):

Yep

view this post on Zulip Sam Mohr (Jan 25 2025 at 12:32):

Something it'd be nice to enable for the future is easy, safe codegen

view this post on Zulip Sam Mohr (Jan 25 2025 at 12:33):

Without a macro system, we need codegen for stuff like gRPC and DB access interfacing

view this post on Zulip Eli Dowling (Jan 25 2025 at 12:33):

Anthony Bullard said:

I think this is a good thing to explore, but syntax highlighting grammar != actual parsing grammar

Sure but from a testing standpoint, if the syntax grammar ever rejects anything the parsing grammar accepts we have an issue
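
The check described here could be sketched roughly as below. This is only an illustration: `toy_parser_accepts` and `toy_grammar_accepts` are hypothetical stand-ins for the real Roc parser and the tree-sitter grammar, and the sample snippets are made up for the example.

```python
from typing import Callable, Iterable, List


def find_disagreements(
    snippets: Iterable[str],
    parser_accepts: Callable[[str], bool],
    grammar_accepts: Callable[[str], bool],
) -> List[str]:
    """Return snippets the parser accepts but the highlighting grammar rejects."""
    return [s for s in snippets if parser_accepts(s) and not grammar_accepts(s)]


# Hypothetical stand-in for the compiler's parser: only checks paren balance.
def toy_parser_accepts(src: str) -> bool:
    return src.count("(") == src.count(")")


# Hypothetical stand-in for the highlighting grammar: same check, but it
# (wrongly) rejects anything containing "$", simulating a missing feature.
def toy_grammar_accepts(src: str) -> bool:
    return toy_parser_accepts(src) and "$" not in src


samples = ["foo(bar)", "foo(bar", 'say("${x}")']
disagreements = find_disagreements(samples, toy_parser_accepts, toy_grammar_accepts)
print(disagreements)
```

Any snippet that ends up in `disagreements` is a case where highlighting would break on code the compiler considers valid.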

view this post on Zulip Sam Mohr (Jan 25 2025 at 12:34):

I want something that everyone can point at for "correctness"

view this post on Zulip Sam Mohr (Jan 25 2025 at 12:34):

But doing good error handling in PEG grammars can't compare to what we have today

view this post on Zulip Eli Dowling (Jan 25 2025 at 14:30):

https://github.com/faldor20/tree-sitter-roc/new-syntax
I made a tiny start, but got a bit bogged down in nix stuff trying to update tree-sitter on my terrible train wifi :sweat_smile:

view this post on Zulip Eli Dowling (Jan 25 2025 at 14:32):

I might get some more train time, but otherwise I'll finish it when I'm more free in a month.

view this post on Zulip Sam Mohr (Jan 25 2025 at 14:38):

I'm not very familiar with tree sitter, but I can try to work on it too

view this post on Zulip Sam Mohr (Jan 25 2025 at 14:38):

Would you have time to review on vacation if you didn't have to write it?

view this post on Zulip Sam Mohr (Jan 25 2025 at 14:38):

Please feel free to say no

view this post on Zulip Eli Dowling (Jan 25 2025 at 14:40):

Yeah, for sure. I can definitely carve out the time for that.
Honestly, if you're not familiar with TS, working on the test corpus would be the biggest help.
I can crank out the changes to TS, but I'm not familiar enough with the changes to roc to know exactly what needs doing.
Having a list of changes and a test corpus to make the changes against would be amazing.

view this post on Zulip Sam Mohr (Jan 25 2025 at 14:41):

Any particular format for the test corpus?

view this post on Zulip Sam Mohr (Jan 25 2025 at 14:41):

Presumably just a list of Roc snippets

view this post on Zulip Sam Mohr (Jan 25 2025 at 14:41):

Without their TS S expression translation

view this post on Zulip Eli Dowling (Jan 25 2025 at 14:41):

I suspect you or others a little more dialled into the current roc changes would be much quicker at sorting out the corpus, and I'd probably be quicker at doing the TS changes, so we may as well work together

view this post on Zulip Eli Dowling (Jan 25 2025 at 14:41):

take a look at the existing corpus

view this post on Zulip Sam Mohr (Jan 25 2025 at 14:41):

Yeah, I think I know what everything will look like

view this post on Zulip Eli Dowling (Jan 25 2025 at 14:42):

Don't bother with the s-expressions. I'll just be checking that the generated expressions have no errors and look reasonable and I'll accept whatever TS generates.

view this post on Zulip Sam Mohr (Jan 25 2025 at 14:42):

Okay

view this post on Zulip Eli Dowling (Jan 25 2025 at 14:43):

you can omit the output part of the corpus and just add the samples, then tree-sitter will insert all the s-expressions when we accept the test output as good
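
A minimal sketch of producing such output-less corpus entries, assuming the usual tree-sitter corpus layout (a name between two `=` rules, the source, then a `---` separator before the expected tree). The test names and the Roc snippets are illustrative assumptions, not taken from the actual repo.

```python
def corpus_entry(name: str, source: str) -> str:
    """Format one corpus test with the expected-output section left empty."""
    rule = "=" * 18
    return f"{rule}\n{name}\n{rule}\n\n{source.strip()}\n\n---\n\n"


def build_corpus(samples: dict) -> str:
    """Concatenate entries in insertion order into one corpus file body."""
    return "".join(corpus_entry(name, src) for name, src in samples.items())


# Hypothetical samples; real entries would come from the new-syntax examples.
samples = {
    "call_pnc_basic": "foo(bar)",
    "call_pnc_two_args": "foo(bar, baz)",
}
corpus_text = build_corpus(samples)
print(corpus_text)
```

The resulting text would be saved into the grammar's corpus directory, and the empty sections filled in when the generated trees are accepted (to my knowledge via `tree-sitter test --update`).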

view this post on Zulip Sam Mohr (Jan 25 2025 at 14:43):

I'll ping you if/when I get to this. Should be this weekend, but I've suggested I'll be doing like 4 things so we'll see which of them happen...

view this post on Zulip Eli Dowling (Jan 25 2025 at 14:45):

no worries at all.

My one suggestion is to format the test names as category_less_specific_name_
that makes it easy to filter and run a small subset of tests: tree-sitter test -t func_def
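
To illustrate why the category-first naming helps, here is a small sketch. The filter is modelled as plain substring matching, which is an assumption about how the test runner selects tests, and the names are made up.

```python
def make_name(category: str, *specifics: str) -> str:
    """Join a category and progressively more specific parts with underscores."""
    return "_".join([category, *specifics])


def select(names: list, substring: str) -> list:
    """Keep only the test names containing the given substring."""
    return [n for n in names if substring in n]


# Hypothetical test names following the category-first convention.
names = [
    make_name("func_def", "basic"),
    make_name("func_def", "multiline"),
    make_name("string", "interpolation"),
]
selected = select(names, "func_def")
print(selected)
```

Because the category comes first, one short filter string picks out a whole group of related tests.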

view this post on Zulip Sam Mohr (Jan 25 2025 at 14:46):

Just like we do with the compiler tests, got it

view this post on Zulip Eli Dowling (Jan 25 2025 at 14:48):

I like to build each group of tests starting with a super basic test that's really short and then building up to much longer more complex ones, but I think you know what you're doing :)

view this post on Zulip Eli Dowling (Jan 25 2025 at 14:48):

feel free to reuse the existing tests, I just copied them all to old_corpus so we can be intentional about what is compatible and what isn't

view this post on Zulip Sam Mohr (Jan 25 2025 at 14:48):

I appreciate the reminder!

view this post on Zulip Sam Mohr (Jan 25 2025 at 14:48):

It's good to reinforce shared knowledge

view this post on Zulip Luke Boswell (Jan 25 2025 at 18:28):

Why not just make a platform that uses the actual roc parser as a library?

I was talking with Josh Warner about this and we made a bit of a start on that. He has some ideas he was cooking.

I guess fuzzing has been the priority rn.

view this post on Zulip Eli Dowling (Jan 25 2025 at 20:33):

Update on this:
I added a couple more features (new string interpolation, PNC calling, and static dispatch calling) and added some very basic corpus tests for them.

view this post on Zulip Sam Mohr (Jan 25 2025 at 21:19):

Luke Boswell said:

Why not just make a platform that uses the actual roc parser as a library?

I was talking with Josh Warner about this and we made a bit of a start on that. He has some ideas he was cooking.

I guess fuzzing has been the priority rn.

This generally makes sense, but there are two problems that I'd be interested to find a solution to:

If those problems can be mitigated, then this solution makes sense

view this post on Zulip Luke Boswell (Jan 25 2025 at 21:56):

Yeah I think the idea was to serialise and deserialise the AST.
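
As a rough sketch of what serialising and deserialising an AST could look like: the `Node` shape and the JSON encoding below are purely hypothetical and say nothing about how the Roc compiler represents its AST.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical AST node: a kind, optional source text, and children."""
    kind: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def to_json(node: Node) -> str:
    """Serialise a node (and its subtree) to a JSON string."""
    return json.dumps(asdict(node))


def from_json(payload: str) -> Node:
    """Rebuild a node tree from the JSON produced by to_json."""
    def build(raw: dict) -> Node:
        return Node(raw["kind"], raw["text"], [build(c) for c in raw["children"]])
    return build(json.loads(payload))


# A made-up tree for a call like foo(bar).
tree = Node("call", children=[Node("ident", "foo"), Node("ident", "bar")])
round_tripped = from_json(to_json(tree))
print(round_tripped == tree)
```

A round trip that preserves equality is the basic property a tool on the other side of the boundary would rely on.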

view this post on Zulip Joshua Warner (Jan 31 2025 at 04:25):

What's the connection between tree-sitter / GH grammar and the "roc ast platform" thing (for lack of a better name)?

view this post on Zulip Joshua Warner (Jan 31 2025 at 04:26):

Is that just the ast serialization?

view this post on Zulip Luke Boswell (Jan 31 2025 at 04:27):

Sam Mohr said:

Something it'd be nice to enable for the future is easy, safe codegen
Without a macro system, we need codegen for stuff like gRPC and DB access interfacing

Things like this

view this post on Zulip Luke Boswell (Jan 31 2025 at 04:30):

So I took that to mean, we want a way to parse roc code, to get an AST we can do interesting things with

view this post on Zulip Joshua Warner (Jan 31 2025 at 04:33):

Ahh, so that kinda sounds like the reverse - you'd have a roc platform that is 100% sandboxed, and only has a single main: Ast (or perhaps main: {} -> Ast) it exposes, where it's expected to import any source files it needs as str/byte literals, run, and then spit out the generated code

view this post on Zulip Sam Mohr (Jan 31 2025 at 06:02):

Something like that, yeah


Last updated: Jul 06 2025 at 12:14 UTC