Tree-sitter grammar and parsing question · contributing

Stream: contributing

Topic: Tree-sitter grammar and parsing question

kris (May 26 2025 at 13:19):

Hello! I am looking to write a tree-sitter grammar, and corresponding emacs-ts-mode for the 0.1 syntax. I'm using the snapshots in roc/src/snapshots to validate my grammar.

However, either i misunderstand something, or i've found an error.

In 'app_header__nonempty-multiline.txt' the ~~~PARSE section contains the record-field "pf" twice, once freestanding, and once as part of the 'packages'. What is going on there?

kris (May 26 2025 at 13:19):

https://github.com/roc-lang/roc/blob/main/src/snapshots/app_header__nonempty_multiline__commented.txt#L27

Anthony Bullard (May 26 2025 at 14:09):

i think i forgot to filter the platform out of the record fields in the IR SExpr representation

Anthony Bullard (May 26 2025 at 14:10):

the platform is just an index in the package record fields so that we make sure to format it first

Anthony Bullard (May 26 2025 at 14:10):

and can identify it in Can

kris (May 26 2025 at 14:37):

That makes sense, i'll have a look and see if i can fix it up locally. Still new to zig, but excited about both languages!

My initial approach is to parse the tree-sitter output and the snapshot output and compare for equality in the grammar test suite, how realistic that is i do not yet know!

For example, i would have assumed that a record_field would be part of a record, and not just listed flatly in the parent sexp.

Is this intentional, or is the sexp representation not supposed to include everything?
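
The comparison approach kris describes (parse both S-expression outputs, then check them for equality) could be sketched roughly as below. This is only an illustration: the names `tokenize`, `parse` and `first_difference` are made up for this sketch, and it assumes both outputs are plain parenthesised S-expressions with bare atoms and double-quoted strings, which may not hold for every snapshot.

```python
from typing import List, Optional, Tuple, Union

# A tree is either an atom (str) or a list of sub-trees.
SExpr = Union[str, List["SExpr"]]


def tokenize(text: str) -> List[str]:
    """Split text into '(', ')', quoted strings and bare atoms."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch in "()":
            tokens.append(ch)
            i += 1
        elif ch == '"':
            # Scan to the closing quote, skipping backslash escapes.
            j = i + 1
            while j < len(text) and text[j] != '"':
                j += 2 if text[j] == "\\" else 1
            tokens.append(text[i:j + 1])
            i = j + 1
        else:
            j = i
            while j < len(text) and not text[j].isspace() and text[j] not in "()":
                j += 1
            tokens.append(text[i:j])
            i = j
    return tokens


def parse(text: str) -> SExpr:
    """Parse exactly one S-expression into nested Python lists."""
    tokens = tokenize(text)
    pos = 0

    def read() -> SExpr:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items: List[SExpr] = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(read())
            if pos >= len(tokens):
                raise ValueError("missing closing parenthesis")
            pos += 1  # consume ")"
            return items
        if tok == ")":
            raise ValueError("unexpected closing parenthesis")
        return tok

    tree = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after first expression")
    return tree


def first_difference(a: SExpr, b: SExpr, path: Tuple[int, ...] = ()) -> Optional[Tuple[int, ...]]:
    """Return the index path of the first mismatch, or None if the trees are equal."""
    if isinstance(a, str) or isinstance(b, str):
        return None if a == b else path
    for i, (x, y) in enumerate(zip(a, b)):
        diff = first_difference(x, y, path + (i,))
        if diff is not None:
            return diff
    # All shared children match; any remaining difference is in length.
    return None if len(a) == len(b) else path + (min(len(a), len(b)),)
```

Reporting the path of the first mismatch, rather than a bare true/false, makes it easier to see where the two trees diverge, for example when one side nests a field inside its record and the other lists it flatly in the parent.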

Anthony Bullard (May 26 2025 at 14:57):

i think ideally in the SExpr it would just be an index in the fields array

kris (May 26 2025 at 15:06):

I have way too little context to know what you're referring to here.

If i understand correctly, the 'packages' is not a regular record, but a special record that accepts a keyword 'platform', and the index of this kv-pair is extra information that needs to be kept in App?

Is this not mixing up domains though?

From my naive perspective, the parser Sexp form should map cleanly to the tree sitter parse output, and anything more complex, like keeping track of indices of special variables, should be on another level.

In this case, maybe distinguishing between record_field and record_platform_field or the like?

Anthony Bullard (May 26 2025 at 15:08):

it was never the intention that the Parser IR would somehow match the tree sitter grammar

Anthony Bullard (May 26 2025 at 15:10):

Parser IR is an Abstract syntax tree for parsing into a shape that benefits semantic analysis. Tree Sitter produces a Concrete syntax tree whose sole goal is to identify regions of text for highlighting and providing information for queries

Anthony Bullard (May 26 2025 at 15:10):

obviously there is a lot that a ts grammar author could learn by reading and understanding the Parser IR

kris (May 26 2025 at 15:12):

Ok, i'll update my expectations accordingly!

I'm happy to be able to re-use the snapshots for this in any case, i'll just have to massage them a bit more!
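
One possible first step for that massaging is pulling the PARSE section out of a snapshot file before comparing it to anything. The sketch below assumes every section starts with a line of the form `~~~NAME` and runs until the next `~~~` line; only the `~~~PARSE` marker is mentioned in this thread, so the rest of that layout is an assumption, and `split_sections` is a hypothetical helper name.

```python
from typing import Dict, List, Optional


def split_sections(snapshot_text: str) -> Dict[str, str]:
    """Split snapshot text into {section name: body}.

    Assumes each section begins with a '~~~NAME' line (an assumption
    about the snapshot layout, not a documented format).
    """
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in snapshot_text.splitlines():
        if line.startswith("~~~"):
            # A new section header: everything after '~~~' is its name.
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in sections.items()}
```

With this, `split_sections(text)["PARSE"]` would give just the S-expression text, which can then be filtered (for example, dropping the freestanding duplicate field discussed above) before it is compared with tree-sitter output.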

Anthony Bullard (May 26 2025 at 15:26):

yep and feel free to reach out with any questions!

Last updated: Jul 26 2025 at 12:14 UTC
