I would like to contribute to roc in some way. I love what you guys are doing and I think the language holds a lot of promise. My main experience that's relevant to roc is with editor tooling. However, I understand from reading about the current langserver that the roc compiler isn't at a point where implementing proper completion is possible.
One thing I can help with, though, is making a tree-sitter grammar. This allows for better support in neovim, helix and any other editors that make heavy use of tree-sitter. A TS grammar may also be useful when developing a structured editor, which as I understand it is an eventual goal for roc. See https://www.masteringemacs.org/article/combobulate-structured-movement-editing-treesitter.
Basically I just want to make sure someone else isn't already doing this, and if not, I'll get started :)
very cool! As far as I know, nobody else is working on this right now
Great, I'll get on it then!
I would 100% use it. I have debated trying to make one a few times, but never dove into it.
I'm probably going to have a few questions as I go, the first one, is there anywhere I could find a spec for things like what chars can exist in an identifier? Var name, func name etc
I think the parser has tests for things like that
https://github.com/roc-lang/roc/tree/main/crates%2Fcompiler%2Ftest_syntax%2Ftests
Oh, thanks that will be super helpful :)
Well it's far too early in the morning, but here we are, the basics are all working, just need to start implementing the esoteric stuff now :)
image.png
wow, impressive speed.
Amazing @Eli Dowling!
Nice colors too :)
Wonderful! I have a rudimentary integration of the language server with my Emacs and would love the syntax highlighting too. Where can I find your code @Eli Dowling?
Found it https://github.com/faldor20/tree-sitter-roc :D
Oh don't look, it's far too ugly for the world to see.
It's all a mess right now, for my eyes only :sweat_smile:.
Gimme a few days
Too late :eyes:
Honestly, it's very impressive what you got so far. I'm trying to figure out how to build it though. I guess with cargo?
I know nothing about Tree Sitter, but maybe I can help by improving the flake.nix and providing a makefile?
I am so excited about all of this. Can't wait for nice colors in helix!!!
Oh yeah, it's a bit awkward to build tree-sitter definitions using nix.
Well it's a god awful mess because it's based off the elm parser. I decided to start there and slowly replace it with the differences. Unfortunately elm makes some fundamental assumptions about the structure of their code that are incompatible with roc, so there were some issues I just couldn't fix. As far as I can tell, because they enforce purity they never have two independent expressions within the same scope.
So I've restarted based off fsharp instead. Things are coming along nicely, and it's almost back to how it was with elm.
If you want to test it you can just run 'tree-sitter test'
On nixos I can't get helix to build my tree-sitter definitions, they use a special nix flake for building in the normal nix install for helix so you'd have to make an override for helix and add it that way.
My nasty way right now, if I want to test on helix, is: reference the grammar in neovim, run :TSUpdate, which builds it properly, then copy roc.so from the neovim tree-sitter parsers directory into the helix grammars directory.
But I wouldn't recommend that... Because it's stupid :sweat_smile:
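The manual workaround described above (let neovim build the parser, then copy roc.so across) could be scripted roughly like this. It is only a sketch: the function name copy_parser is made up for illustration, and the directories you pass in are whatever your own neovim and helix installs use; only the file name roc.so comes from the message.

```python
import shutil
from pathlib import Path


def copy_parser(src_dir: Path, dst_dir: Path, name: str = "roc.so") -> Path:
    """Copy a compiled tree-sitter parser from one editor's directory to another's.

    src_dir and dst_dir are assumptions: pass the neovim parsers directory
    and the helix grammars directory as they exist on your machine.
    """
    src = src_dir / name
    if not src.is_file():
        raise FileNotFoundError(f"no compiled parser at {src}; build it in neovim first")
    dst_dir.mkdir(parents=True, exist_ok=True)
    # copy2 preserves timestamps, so it is easy to see which build is installed
    return Path(shutil.copy2(src, dst_dir / name))
```

Calling copy_parser(neovim_parsers_dir, helix_grammars_dir) after each :TSUpdate would replace the manual copy step; it does not make the approach any less of a hack.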
Looks like a lot of people here are very excited about your effort. Please take it easy. I'm sure it's nobody's intention to push you.
Appreciate the concern :)
Well, tonight's work has us a working grammar, highlighting and a todo list with the things I'll work on in the upcoming days. The foundation should be all set up now and everything just needs dialing in: edge cases, missing language features, neatening up names etc. But the basic structural bits are all represented and parse correctly.
I know it's possible to integrate a tree-sitter grammar into a VSCode extension, might be something for @Ivan Demchenko to look into. I think tree-sitter sits somewhere between the basic textmate grammar highlighting that VSCode uses and the full blown LSP integration which is still in the early stages.
@Hannes, my understanding is that it's not very normal and support is pretty poor.
There is this https://marketplace.visualstudio.com/items?itemName=ms-vscode.anycode
But I believe the way it currently works in most cases is that you basically fake the semantic tokens response using the data from tree-sitter. That's how this works at least https://github.com/EvgeniyPeshkov/syntax-highlighter
Are you aware of any other methods?
Haha, to be honest I just remember seeing a heading called "integrating tree-sitter" in the VSCode extension documentation, I didn't actually read it :sweat_smile: I'll defer to someone with actual knowledge on the topic
Little update on tree-sitter status: I've been going through all the example code in the roc repo, and the examples pages on the WIP website, and making sure everything works. I've added many many little features and I'm getting to the point where most stuff I paste in mostly just works. Still lots of edge cases to be found though.
If anyone could point me to some good examples that use a lot of language features that'd be great.
The only other really large-scale change that needs making is going through and ensuring the names of everything are good and the structure of the tokens coming out is nice. But I'm mostly just gonna let it be a mess till I have every language feature working.
Currently you should be able to use the highlights.scm and parser in master in the helix editor, I'm not sure if the highlights work in neovim as I don't test there while I'm working.
I'll write a readme with some explanation of development and building soon
Also I'd like to see if I can improve error recovery, currently an error at the top just tends to kind of break everything below, I believe the python tree-sitter does some error recovery stuff so I'll try to investigate how that works
I suspect it's a problem of not escaping the current scope because of the whole indentation situation.
https://github.com/lukewilliamboswell/roc-json/blob/main/package/Core.roc might be a good test case
Pretty sure most of the language's features are in there
Also give https://github.com/roc-lang/examples/blob/main/examples/RecordBuilder/IDCounter.roc a go too, that has record builder in it which is a bit special
Thanks a lot @Luke Boswell, I had seen the record builder syntax and filed that away as probably the scariest bit of syntax left, and something that I'd need to collect some more examples of :sweat_smile:.
Fantastic @Eli Dowling ! I got some success integrating your grammar with Emacs. It's still WIP, but very exciting.
image.png
Hey nice! Well done :)
The work-in-progress code for the major mode is here: https://gitlab.com/tad-lispy/nixos-configuration/-/blob/main/doom-emacs/modules/lang/roc/config.el
I wonder about the granularity of the AST. Mind you, I've never used Tree Sitter before, so maybe I just don't know what is what. Anyway, it seems to me that there is no way to query for the keyword "app", or "packages" or "imports", without the code that "belongs to it". So I made the whole app_header highlighted as a keyword, and then I target individual parts inside. It seems off. Maybe those words inside the header should be their own nodes? What do you think @Eli Dowling?
It's similar with if, then and else.
Take a look at the highlights.scm file in the repo. The way most tree-sitter grammars I've seen do keyword queries is:
[
  "app"
  "if"
  ;; etc.
] @keyword
The grammar doesn't define a special token for each keyword. I can't say why it is that way, maybe it's more efficient, maybe it's just annoying to make a separate named token for every keyword :shrug:
I see. Thanks. I'll have to learn more about the query system.
But I'm not at all happy with the current state of the output tree. The names and shape of it is pretty ugly. But as I mentioned, for now I'm focusing on getting everything working well.
The two big things that need to change are being much more specific about what expressions can go where, and rethinking the current very flat structure. I haven't really decided, but technically roc should have a lot more nesting with backpassing. Currently we actually allow a list of side-by-side expressions, which I've realised isn't really how roc works. Each value should be better thought of as containing nested expressions; value definitions and backpassing just happen to not increase indentation.
Yes! Backpassing is really interesting syntactically, because it encompasses everything below it, I guess up to the drop in indentation.
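One way to picture the nesting described above: every definition or backpass owns everything that follows it in the block, even though the indentation stays flat. Here is a minimal sketch of that folding, where the tuple shape and the name nest are purely illustrative and say nothing about how the grammar actually represents nodes. The Roc lines in the example block are sample input, not taken from any file in this thread.

```python
def nest(lines):
    """Fold a flat block into a right-nested tree.

    Each definition or backpass becomes (line, rest), where rest is
    everything below it; the final line is the block's result expression.
    """
    if not lines:
        raise ValueError("a block needs at least a final expression")
    *heads, last = lines
    tree = last
    # Walk upward from the bottom so earlier lines wrap later ones.
    for line in reversed(heads):
        tree = (line, tree)
    return tree


block = [
    'name <- Task.await Stdin.line',           # backpass: the rest is its continuation
    'greeting = "Hi "',                        # definition: scopes over the rest
    'Stdout.line (Str.concat greeting name)',  # final expression of the block
]
print(nest(block))
```

The flat list is what a "side by side expressions" grammar produces; the nested tuples are closer to the shape a grammar would expose if each definition contained its continuation.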
I think we have a very good starting point and we can go incrementally from here.
An else_expression is a sibling of an if_expression. Is that right?
(if_expression if
  guard:
    (infix_expression
      (long_identifier_or_op
        (long_identifier (identifier)))
      (infix_op ==)
      (long_identifier_or_op
        (long_identifier (identifier))))
  then
  then:
    (application_expression
      caller:
        (long_identifier_or_op
          (long_identifier (module) (identifier)))
      (args
        (const
          (string " ")))))
(else_expression else
  (application_expression
    caller:
      (long_identifier_or_op
        (long_identifier (module) (identifier)))
    (args
      (const
        (string " "))))))
That sounds correct, though an else can only exist within an if, so it could be seen as a sibling of then technically.
Cause the full syntax is something like if expression then block else block
That is the syntax for an if statement.
Hmm, so yeah, I would say sibling to then; else is definitely not its own expression.
I actually just fixed that exact issue. Else is a child of if along with then
Maybe it should be (if_expression "if" (condition_block ...) "then" (then_block ...) "else" (else_block ...)). The things in quotes are keywords that need to match literally, as discussed earlier in this stream.
I hope you don't mind me dropping those questions / suggestions. I'm slowly writing syntax highlighting code. In the process I'm learning a lot about Emacs, Tree Sitter and adjacent stuff.
I would be hesitant to recommend that. I'm super happy to help, but if you base it off how the tokens are now, you will be very disappointed when I rename them all
Go ahead of course, just know that when I have all the syntax working correctly there will be a big pass renaming piles of tokens and adding field names and such
No, I won't be. I expect it to change and will be happy to rework my stuff. Most of the effort goes into my learning, so it's not wasted.
Excellent :)
Hopefully I can provide some feedback too. But if you don't want it, say so and I'll be quiet.
No, for sure, provide away, I'm super happy for input.
You probably know, but tree-sitter test gives:
Error in query file "highlights.scm"
Caused by:
Query error at 125:4. Invalid node type
There is ident (I see it in app_header -> provides) and identifier (everywhere else). Are they supposed to be two different node types?
yeah, the highlights keep going out of sync because I change the names occasionally
Seems like type annotations are broken now. This:
actual : F64
actual = 2.0 + 3.0
Gives this syntax tree:
(value_declaration
  (value_declaration_left
    (identifier_pattern
      (long_identifier (identifier))))
  (ERROR : (ERROR) (ERROR) (ERROR) l)
I think I'm done for today. Once again thanks for your effort. I hope my feedback is helpful. Please don't feel any pressure because of it. Good night!
I'll take a look tomorrow :)
Okay another update.
The grammar is in a good enough state I'd say. I'm sure there is stuff missing and I can guarantee there are bugs with indentation, but it's working on most of the test files I've used.
This has been a very interesting learning experience. Mostly I've learned a ton of things not to do :sweat_smile:.
The grammar does need one more big overhaul to remove all the indentation tokens. The reasons are somewhat complex to explain, but basically indentation tokens can match anywhere. So when two different sections of code diverge and then resolve, and an indentation token appears in the description of only one variant, that variant will always match, which means you end up needing indentation tokens all over the place.
The elm and fsharp grammars both get away with using almost none at all. So I think a migration to an indentation-less grammar would help fix most of the edge cases and make it look way less messy.
However, I fly to Indonesia tomorrow, so I won't be adding to this for a few weeks. But I'll check github in case anyone does a PR
Hope you all have fun with this :)
oh and here is the link to the repo: https://github.com/faldor20/tree-sitter-roc
super awesome, hope you enjoy Indonesia! :smiley:
I had a little time on planes and boats and such, so I added a readme for how to setup the grammar in helix and neovim.
I also knocked out a few more edge cases and rewrote a bunch of the ugly bits of the grammar.
Maybe @Tad Lispy you could PR in some Emacs instructions? I'm not very Emacs familiar.
I added a readme for how to setup the grammar in helix
Looks like I have something awesome to setup later today. Thanks for all your work!
Hey! Great job! Sorry for the late response. I've been distracted the past few days. I'll try to contribute, but Emacs Lisp is also a learning experience for me. I guess the first thing I need to do is publish a package with a major mode for Roc. As soon as I get it, I'll provide the instructions.
I've been using it for the past few days (in Neovim) and it's working great! Thanks for your work! I've made a PR that fixes some issues that I found
I've made a package for Emacs and linked to it from your readme. Here is the PR: https://github.com/faldor20/tree-sitter-roc/pull/2
I have opened a pull request to integrate this tree-sitter support into Neovim: https://github.com/nvim-treesitter/nvim-treesitter/pull/6381
This has been accepted. If anyone notices problems please report them to the issue tracker in this repository: https://github.com/nat-418/tree-sitter-roc .
I started messing with the tree-sitter grammar because I was trying to figure out why things weren't highlighting. One thing led to another and I spent a good chunk of yesterday changing things, and I now have seven reasonably sized commits. I'd like to know if you'd like them in one large PR or a set of smaller ones.
A lot of things are uncontroversial (e.g. tagging doc comments, tagging builtins) but some are more questionable, like tagging match captures as parameters so that tree-sitter's local analysis highlights typos.
Here's a before and after using my theme:
Screenshot-2024-04-01-at-5.06.49-PM.png
Screenshot-2024-04-01-at-5.08.03-PM.png
Some things come from extending the highlight match tags (e.g. separate colorspace for typedefs, highlighting fn defs) so other themes would need to be tweaked for full support.
Sure make some PRs and we can discuss anything questionable :)
Yeah, smaller ones are usually nicer
Hahah I hear you @Anton ;) I swear I do try to make them small :sweat_smile:
Hehe :big_smile: I get it though, if we focus too much on keeping them small there are various improvements that don't happen because you'd need to do a separate PR.
To be clear, I'm completely okay with having the PR rejected and split into smaller parts. I didn't really have a plan so branch naming isn't great and the patches being coherent are mostly a matter of massaging git history
I was hoping to get to it today actually, I just need to download your version, test it out and ensure that the highlights are as standard as they can be. Like obviously helix defines some standard highlighting types and not all themes implement them fully. My memory is that highlights are namespaced, so that if a theme doesn't implement the most specific version it will be highlighted as the less specific version, e.g. @function.recursive or some such
It does look good though :)
Yes, that's how the selectors work.
Also, if you're checking with :tree-sitter-highlight-name in helix you'll only get the correct name if you have it defined in the theme. Otherwise you'll get the more general rule or none at all if it's not within a major match group.
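The fallback behaviour described here can be sketched in a few lines of Python. This is only an illustration of the idea (try the full dotted name, then drop the last segment and retry); it is not helix's actual implementation, and the function name resolve_scope and the theme dictionary are made up for the example.

```python
def resolve_scope(scope, theme):
    """Return the most specific theme key matching a dotted scope, or None.

    "function.recursive" tries "function.recursive" first, then "function".
    """
    parts = scope.split(".")
    while parts:
        key = ".".join(parts)
        if key in theme:
            return key
        parts.pop()  # drop the most specific segment and retry
    return None


# Hypothetical theme: defines "function" and "type" but nothing more specific.
theme = {"function": "blue", "type": "yellow"}
print(resolve_scope("function.recursive", theme))  # function
print(resolve_scope("keyword.control", theme))     # None
```

With this theme, a capture named function.recursive reports as function, and a capture with no matching major group reports nothing, which matches the "more general rule or none at all" behaviour mentioned above.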
cool, thanks for the tip
This is slightly off-topic but it is related and I'd rather not make a whole new thread to mention it: I have a PR open in Vim to add a Roc filetype plugin. This will allow better integration with tree-sitter and generally set the stage for further Roc developer experience improvements in Vim and Neovim.
This has been merged. Future releases of Vim and Neovim will now know what a Roc file is.
Thanks @nat-418 for pushing Roc into neovim ecosystem. After the merges I could clean up my local nvim config :)
Here is the full configuration of Roc in LazyVim distro (should be easy to tailor for other distros/plain nvim): https://github.com/jluzny/nvim/blob/main/lua/plugins/roc.lua
After my PR to nvim-ts-context-commentstring gets merged, I will submit this config to LazyVim for its next release.
we still need to get more filetype defaults for the roc plugin. I am thinking basic indent rules, regex, etc. so the out-of-the-box vim/neovim experience is decent
Hey all, I just updated this to support the new bang syntax, if you were having issues, please grab the new parser and queries :)
@Eli Dowling very cool, I'll update the zed extension.
Here the PR in the zed extension repo, the CI build checks are all green :)
https://github.com/zed-industries/extensions/pull/925
Update PR is merged into zed extensions repo now. :)
Is it ok if I report random bugs in here? These aren't urgent or anything, but I figure it would be helpful to collect them as I see them.
I've been using zed and occasionally notice things and could take a screenshot and include here, or maybe log an issue somewhere.
Like this - the platform Str isn't highlighted.
Screenshot-2024-06-28-at-11.27.42.png
Yeah, no worries. This is actually an issue with the queries in zed and not the tree-sitter grammar itself. I fixed it for neovim and helix, but the zed queries need updating. @Alf Richter could you take a look at this? Just see the last commit on the tree-sitter repo.
Sweet, well I'll just drop a msg in here when I notice something.
New PR for zed extensions repo update is created:
https://github.com/zed-industries/extensions/pull/981
It’s merged in Zed now.
Sometimes the Tags aren't highlighted correctly. See this build script for an example. ErrBuildingAppStub is highlighted in blue correctly, while ErrGeneratingGlue is not.
Screenshot-2024-07-03-at-12.38.42.png
Tree-sitter doesn't love the new syntax for Record Builders :smiley:
Screenshot-2024-07-14-at-18.46.48.png
To be fair, it's a really new change. Just thought I'd mention it.
Also, I've noticed it doesn't like the "platform" in the pf: platform "https://github part of the header.
Screenshot-2024-07-14-at-19.07.27.png
That seems to work fine in Emacs, so I suspect it's just a missing query in Zed, not a tree-sitter grammar issue
@Luke Boswell https://github.com/faldor20/tree-sitter-roc/pull/21
I have a PR to fix tree-sitter-roc for the new builder syntax
Hey @Sam Mohr, I saw your fork and assumed that was your plan. Let me know if you need any help, or a hand.
The PR is already set for merging IMO
I guess I'll ping you on Zulip next time
Ajai Nelson said:
That seems to work fine in Emacs, so I suspect it's just a missing query in Zed, not a tree-sitter grammar issue
I agree with this, I have tested that in helix and neovim.
Oh haha, I just got home after a few days away. I'll check it out now :)
Hey @Luke Boswell I've had a look at the changes needed to the grammar to fix the tags issue.
Basically, with the introduction of the ! operator the structure of roc code can now be very different, and I'd have to rethink a lot of the assumptions the current grammar makes
Thank you. Look forward to hearing how you go :smiley:
Hey folks, quick status update on this. I'm seeing a lot of new syntax coming down the pipe for roc. That's awesome, but also means potentially a lot of tree-sitter churn.
I've pretty much decided to not update this grammar, more than the bare minimum to keep it mostly working, until the syntax settles a bit again.
I'm definitely happy to add little patches that stop new syntax from totally borking the whole document, but I'll hold off for a while on the full overhaul it needs to fix all the new edge cases.
Last updated: Jul 06 2025 at 12:14 UTC