According to the vendor list for linguist (which is GitHub's syntax highlighting lib), we can use tree sitter for our GitHub syntax, which would keep everything aligned with the tool most users rely on for syntax highlighting. We currently use the VSCode Roc highlighting which I think gets less love.
We definitely need that treesitter grammar to be updated
I'm biased for tree-sitter being a Helix user, but also because it seems like the more future-facing technology
I have some experience with TS, but @Eli Dowling owns the repo I think and is on vacation
Yes, we'd want to update the tree-sitter-roc grammar to point at all the syntax features we want for the foreseeable future
Yeah TM grammars are not the future
It should become a roc-lang repo
And then we would make a PR to GitHub to point at the updated commit hash of tree-sitter-roc
GitHub makes releases of linguist every few months, and the last one was at the end of November, so if we update by mid-February we can probably get this done
I agree that it should be an official Roc repo
Maybe someone with a more holistic vision of the planned syntax than I have could write down a set of what we'd need to highlight, even informally
And then it wouldn't be so bad to update the tree-sitter grammar given that kind of list
I think that's reasonable. And yes, I'm on vacation. I'll be ending that by late February though.
The syntax will need some pretty low level changes for the PNC syntax. But that will be the hardest part. It's much simpler than the current calling syntax in many ways though.
It would be worth making it support the for syntax as well and just putting it behind an ifdef for now.
I agree with @Sam Mohr about getting a list of all planned syntax changes and then making ts ahead of what's actually implemented in the roc compiler.
Unfortunately we need to support both whitespace for types and parens for values
Yep, for, var, reassignable_ variables, etc.
I have some time today to work on the TS grammar. I'll make some corpus entries from the real-world example repo that @Richard Feldman made and then work on getting the syntax right.
One of the big areas we could work on first is starting a new corpus that has nice, simple examples of the new features.
As I said I'll use the realworld repo to try to rough out what the changes should be to the grammar
Awesome! I can try to put together some such syntax examples
It feels like having an official, tested syntax definition might help here as well
I started working on a https://docs.rs/pest/latest/pest/ definition for Roc earlier, but it feels like it's just gonna be another dependency we'd try to remove eventually
I wonder if there's some simple specification for a grammar we could implement ourselves
Though maybe a dependency like pest that we only use for testing and LSP/tree-sitter coherence wouldn't be such a bad thing to pull in
Is a pest definition going to be more understandable than a treesitter definition?
I don't think so...
I think it's a PEG-adjacent grammar that we could test both our custom parser and tree-sitter against (probably)
We used pest in the past
Whatever tool would fill the behavior would be great
I think the new tutorial has plenty of small examples of new syntax, as well as the new release of roc-lang/examples
OCaml has what I'd consider the best syntax docs. They have these fancy (I assume autogenerated) syntax specs.
take this:
https://ocaml.org/manual/5.3/bindingops.html
Here's the source for that page: https://github.com/ocaml/ocaml/blob/trunk/manual/src/refman/extensions/bindingops.etex
or their docs on what an expression is:
https://ocaml.org/manual/5.3/expr.html
Anton said:
We used pest in the past
This is not the latest version of the file but here's the link: https://github.com/roc-lang/roc/pull/2675/files#diff-490af2dd8e5b9915ae07391871ab6575a352e10294bc3d90f4d05e7328aab278
Sam Mohr said:
Here's the source for that page: https://github.com/ocaml/ocaml/blob/trunk/manual/src/refman/extensions/bindingops.etex
Oh well, maybe it's not autogenerated.... shame
@Eli Dowling if we could generate something like those docs from some PEG-adjacent grammar, that'd be great
This isn't a topic I know a whole lot about, but:
https://github.com/matthijsgroen/ebnf2railroad
And it seems there are a number of parser generators that take EBNF as input, and EBNF seems to be the most standardised format. It could be a good choice as a way to generate some docs and also a reference implementation that, while maybe slow, could be used to compare against the Rust and TS implementations and provide a spec
I think this is a good thing to explore, but syntax highlighting grammar != actual parsing grammar
Yep
Something it'd be nice to enable for the future is easy, safe codegen
Without a macro system, we need codegen for stuff like gRPC and DB access interfacing
Anthony Bullard said:
I think this is a good thing to explore, but syntax highlighting grammar != actual parsing grammar
Sure but from a testing standpoint, if the syntax grammar ever rejects anything the parsing grammar accepts we have an issue
I want something that everyone can point at for "correctness"
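A minimal sketch of the check described above: given some set of snippets, report any that the compiler's parser accepts but the highlighting grammar rejects. Both acceptor functions here are hypothetical stand-ins (they do not reflect real Roc rules or any real parser API); in practice they would wrap the Roc parser and tree-sitter respectively.

```python
from typing import Callable, Iterable, List


def find_disagreements(
    snippets: Iterable[str],
    compiler_accepts: Callable[[str], bool],
    highlighter_accepts: Callable[[str], bool],
) -> List[str]:
    """Return snippets the compiler parser accepts but the highlighter rejects."""
    return [
        s for s in snippets
        if compiler_accepts(s) and not highlighter_accepts(s)
    ]


# Toy acceptors for illustration only (assumptions, not real Roc behaviour).
def toy_compiler_accepts(src: str) -> bool:
    # Pretend the compiler only requires balanced parentheses.
    return src.count("(") == src.count(")")


def toy_highlighter_accepts(src: str) -> bool:
    # Pretend the highlighting grammar has not learned `var` yet.
    return toy_compiler_accepts(src) and "var " not in src


samples = ["foo(1)", "var x_ = 1", "foo(1"]
disagreements = find_disagreements(
    samples, toy_compiler_accepts, toy_highlighter_accepts
)
print(disagreements)
```

Snippets that both sides reject are not reported, since the concern raised is only the highlighter rejecting valid code.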
But doing good error handling in PEG grammars can't compare to what we have today
https://github.com/faldor20/tree-sitter-roc/new-syntax
I made a tiny start, but got a bit bogged down in nix stuff trying to update tree-sitter on my terrible train wifi :sweat_smile:
I might get some more train time, but otherwise I'll finish it when I'm more free in a month.
I'm not very familiar with tree sitter, but I can try to work on it too
Would you have time to review on vacation if you didn't have to write it?
Please feel free to say no
yeah, for sure. I can definitely carve out the time for that.
Honestly, if you're not familiar with TS, working on the test corpus would be the biggest help.
I can crank out the changes to TS, but I'm not familiar enough with the changes to roc to know exactly what needs doing.
Having a list of changes and a test corpus to make the changes against would be amazing.
Any particular format for the test corpus?
Presumably just a list of Roc snippets
Without their TS S expression translation
I suspect you or others a little more dialled into the current roc changes would be much quicker at sorting out the corpus and I'd probably be quicker at doing the TS changes, so we may as well work together
take a look at the existing corpus
Yeah, I think I know what everything will look like
Don't bother with the s-expressions. I'll just be checking that the generated expressions have no errors and look reasonable and I'll accept whatever TS generates.
Okay
you can omit the output part of the corpus and just add the samples, then tree-sitter will insert all the s-expressions when we accept the test output as good
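As a rough sketch of what "just add the samples" could look like, the helper below writes corpus entries in tree-sitter's test-file layout (a name between two rules of `=`, the source, then a `---` separator) and leaves the expected-tree section empty. The test name and the Roc snippet are placeholders, and the helper names are made up for this example.

```python
from typing import Iterable, Tuple

HEADER_RULE = "=" * 20  # rule of '=' above and below the test name
SEPARATOR = "---"       # divides the source from the expected tree


def corpus_entry(name: str, source: str) -> str:
    """Format one corpus test, leaving the expected s-expression empty."""
    return (
        f"{HEADER_RULE}\n{name}\n{HEADER_RULE}\n\n"
        f"{source.strip()}\n\n{SEPARATOR}\n\n"
    )


def corpus_file(tests: Iterable[Tuple[str, str]]) -> str:
    """Join several (name, source) pairs into one corpus file body."""
    return "\n".join(corpus_entry(name, src) for name, src in tests)


# Placeholder sample; real entries would come from the planned-syntax list.
example = corpus_file([("func_call_pnc_simple", "foo(1, 2)")])
print(example)
```

The empty sections would then be filled in by running the tests in update mode (`tree-sitter test --update`) and reviewing the generated trees before committing them.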
I'll ping you if/when I get to this. Should be this weekend, but I've suggested I'll be doing like 4 things so we'll see which of them happen...
no worries at all.
My one suggestion is to format the test names as category_less_specific_name_, which makes it easy to filter and run a small subset of tests: tree-sitter test -t func_def
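To illustrate why the category-first naming helps, here is a small sketch that selects tests by name. It assumes the filter behaves like a plain substring match, which may not be exactly how the tree-sitter CLI matches names; the test names are invented for the example.

```python
from typing import List


def select_tests(names: List[str], pattern: str) -> List[str]:
    """Keep the test names containing the pattern (assumed substring match)."""
    return [name for name in names if pattern in name]


# Hypothetical test names following category_less_specific_name ordering.
names = [
    "func_def_simple",
    "func_def_multiline",
    "func_call_pnc",
    "string_interpolation_basic",
]
selected = select_tests(names, "func_def")
print(selected)
```

Because the category comes first, a short pattern picks out a whole group, and a longer one narrows to a single test.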
Just like we do with the compiler tests, got it
I like to build each group of tests starting with a super basic test that's really short and then building up to much longer more complex ones, but I think you know what you're doing :)
feel free to reuse the existing tests, I just copied them all to old_corpus so we can be intentional about what is compatible and what isn't
I appreciate the reminder!
It's good to reinforce shared knowledge
Why not just make a platform that uses the actual roc parser as a library?
I was talking with Josh Warner about this and we made a bit of a start on that. He has some ideas he was cooking.
I guess fuzzing has been the priority rn.
Update on this:
I added a couple more features (new string interpolation, PNC calling and static dispatch calling), and added some very basic corpus tests for them.
Luke Boswell said:
Why not just make a platform that uses the actual roc parser as a library?
I was talking with Josh Warner about this and we made a bit of a start on that. He has some ideas he was cooking.
I guess fuzzing has been the priority rn.
This generally makes sense, but there are two problems that I'd be interested to find a solution to:
roc_parse
and it parses an AST node, how do we access that arena-allocated syntax tree if not with a lot of indirection? Would we copy the whole thing into Roc recursively?
If those problems can be mitigated, then this solution makes sense
Yeah I think the idea was to serialise and deserialise the AST.
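A minimal sketch of the serialise/deserialise idea, using JSON as the wire format. The `Node` shape is entirely hypothetical (it is not the Roc compiler's AST), and JSON is only one possible encoding; the point is that a flat byte string crosses the boundary instead of pointers into an arena.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Node:
    """Hypothetical AST node: a kind, optional source text, and children."""
    kind: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def to_dict(node: Node) -> Dict[str, Any]:
    """Recursively convert a node into plain JSON-compatible data."""
    return {
        "kind": node.kind,
        "text": node.text,
        "children": [to_dict(child) for child in node.children],
    }


def from_dict(data: Dict[str, Any]) -> Node:
    """Rebuild a node tree from the plain data produced by to_dict."""
    return Node(
        data["kind"],
        data["text"],
        [from_dict(child) for child in data["children"]],
    )


tree = Node("call", children=[Node("ident", "foo"), Node("int", "1")])
wire = json.dumps(to_dict(tree))        # what the host would hand over
restored = from_dict(json.loads(wire))  # what the receiving side rebuilds
print(restored == tree)
```

This does copy the whole tree once in each direction, which is the cost raised earlier in the thread.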
What's the connection between tree-sitter / GH grammar and the "roc ast platform" thing (for lack of a better name)?
Is that just the ast serialization?
Sam Mohr said:
Something it'd be nice to enable for the future is easy, safe codegen
Without a macro system, we need codegen for stuff like gRPC and DB access interfacing
Things like this
So I took that to mean, we want a way to parse roc code, to get an AST we can do interesting things with
Ahh, so that kinda sounds like the reverse - you'd have a roc platform that is 100% sandboxed, and only has a single main: Ast (or perhaps main: {} -> Ast) it exposes, where it's expected to import any source files it needs as str/byte literals, run, and then spit out the generated code
Something like that, yeah
Last updated: Jul 06 2025 at 12:14 UTC