The Roc compiler has no scanner? Scanning and Parsing are done in the Parser?
scanner == lexer?
And yeah, Roc uses parser combinators to go from source to a first-level AST.
Yes sir. So that is like the Syntax-Directed Translation pattern?
I'm not familiar with that term, but it's basically just that we have a function that goes directly from bytes in the file to an Abstract Syntax Tree
there's no intermediate structure like a "token" or anything like that
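To make "bytes straight to AST" concrete, here's a minimal sketch of the idea in Rust. It is not Roc's actual parser code: `Expr`, `ParseResult`, `spaces`, `number`, `byte`, and `expr` are all hypothetical names made up for illustration, and the toy grammar only handles sums of integers. The point is that whitespace and digits are handled inside the parsing functions themselves, with no token type anywhere.

```rust
// Hypothetical mini-AST for illustration; not Roc's real AST types.
#[derive(Debug, PartialEq)]
enum Expr {
    Num(u64),
    Add(Box<Expr>, Box<Expr>),
}

// A parser takes the remaining input bytes and, on success, returns the
// parsed value together with the unconsumed rest of the input.
type ParseResult<'a, T> = Option<(T, &'a [u8])>;

// Skip ASCII spaces. In a lexer-based design this would be the lexer's job.
fn spaces(input: &[u8]) -> &[u8] {
    let n = input.iter().take_while(|b| **b == b' ').count();
    &input[n..]
}

// One or more ASCII digits, read directly from the bytes.
// (No overflow handling; this is only a sketch.)
fn number(input: &[u8]) -> ParseResult<'_, Expr> {
    let n = input.iter().take_while(|b| b.is_ascii_digit()).count();
    if n == 0 {
        return None;
    }
    let value = input[..n]
        .iter()
        .fold(0u64, |acc, b| acc * 10 + u64::from(*b - b'0'));
    Some((Expr::Num(value), &input[n..]))
}

// Combinator: build a parser that matches one specific byte.
fn byte<'a>(expected: u8) -> impl Fn(&'a [u8]) -> ParseResult<'a, ()> {
    move |input| match input.split_first() {
        Some((b, rest)) if *b == expected => Some(((), rest)),
        _ => None,
    }
}

// expr := number ('+' number)*, with optional spaces between the parts.
// Builds a left-nested Add tree as it goes.
fn expr(input: &[u8]) -> ParseResult<'_, Expr> {
    let plus = byte(b'+');
    let (mut lhs, mut rest) = number(spaces(input))?;
    while let Some(((), after_plus)) = plus(spaces(rest)) {
        let (rhs, after_rhs) = number(spaces(after_plus))?;
        lhs = Expr::Add(Box::new(lhs), Box::new(rhs));
        rest = after_rhs;
    }
    Some((lhs, rest))
}
```

Calling `expr(b"1 + 22")` hands back an `Expr::Add` node plus the leftover bytes, and at no point does a list of tokens exist.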
although @Joshua Warner and I have talked about wanting to rewrite the parser in a way which uses the more traditional approach, for various reasons :big_smile:
the reason it works the way it does today is that when I started working on the compiler, parser combinators were the only technique I knew for parsing (from my experience using elm/parser) and Bodil Stokke wrote a post about how to do parser combinators in Rust, so I followed that to get started...and now several years later it's still how our parser is written :laughing:
Hah! My first exposure to parser combinators was Bodil’s article.
Yesterday when I was reading the roc parser, I was going like wait, I know this, I know that and this :joy::joy:
Huh, I've never heard of Syntax-Directed Translation before...
If I'm reading the Wikipedia page right, SDT is a fairly extreme compiler construction method, if taken literally.
It's like the lex/yacc examples that show you how to do an expression evaluator in the yacc grammar. i.e. it doesn't build up an AST and evaluate it - it does the evaluation in the parser!
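Here's a rough sketch of that style, written as a hand-rolled recursive-descent parser in Rust rather than an actual yacc grammar. Everything in it (`Evaluator`, `peek`, `eval`, and the tiny `+`/`*` grammar) is a hypothetical example, not taken from any real compiler. Each grammar rule returns the computed number directly, so the "semantic action" runs the moment a rule matches and no tree is ever built.

```rust
// Hypothetical grammar, evaluated while parsing (no AST is ever built):
//   expr := term ('+' term)*
//   term := atom ('*' atom)*
//   atom := digits | '(' expr ')'
struct Evaluator<'a> {
    input: &'a [u8],
    pos: usize,
}

impl<'a> Evaluator<'a> {
    fn new(src: &'a str) -> Self {
        Evaluator { input: src.as_bytes(), pos: 0 }
    }

    // Skip spaces, then look at the next byte without consuming it.
    fn peek(&mut self) -> Option<u8> {
        while self.input.get(self.pos) == Some(&b' ') {
            self.pos += 1;
        }
        self.input.get(self.pos).copied()
    }

    fn expr(&mut self) -> Option<i64> {
        let mut value = self.term()?;
        while self.peek() == Some(b'+') {
            self.pos += 1;
            value += self.term()?; // semantic action: add right away
        }
        Some(value)
    }

    fn term(&mut self) -> Option<i64> {
        let mut value = self.atom()?;
        while self.peek() == Some(b'*') {
            self.pos += 1;
            value *= self.atom()?; // semantic action: multiply right away
        }
        Some(value)
    }

    fn atom(&mut self) -> Option<i64> {
        match self.peek()? {
            b'(' => {
                self.pos += 1;
                let value = self.expr()?;
                if self.peek()? != b')' {
                    return None;
                }
                self.pos += 1;
                Some(value)
            }
            b'0'..=b'9' => {
                let mut value = 0i64;
                while let Some(d @ b'0'..=b'9') = self.input.get(self.pos).copied() {
                    value = value * 10 + i64::from(d - b'0');
                    self.pos += 1;
                }
                Some(value)
            }
            _ => None,
        }
    }
}

// Parse and evaluate in one pass; trailing input is treated as an error.
fn eval(src: &str) -> Option<i64> {
    let mut parser = Evaluator::new(src);
    let value = parser.expr()?;
    if parser.peek().is_some() {
        return None;
    }
    Some(value)
}
```

So `eval("1 + 2 * 3")` comes back with the final number, and there is nothing left over that a formatter or syntax highlighter could inspect afterwards.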
New to this stuff. Just reading around.
I could imagine stretching the definition a bit and just considering "doing extra steps in the parser" to be SDT
e.g. in GCC I think for a long time there was a simple constant evaluator done very very early in the compiler, possibly directly in the parser (I'm like 95% speculating here, don't quote me)
In Roc, one might imagine trying to do desugaring, canonicalization, name resolution, etc - all as part of the parser.
One disadvantage then is that it's hard to have a parser that can accept fragments of code and parse them into something sensible - which can be useful for things like auto-formatting code that doesn't compile yet, or better syntax highlighting, etc.
Or at least, you'd have to have two separate parsers - one for tooling and one for the production compiler.
:thinking: I wonder if the compiler could be faster if you cram everything into as few stages as possible?
Definitely harder to maintain, but maybe that could be overcome with tooling or generics
Looks like a shortcut that will come with a lotta gotchas down the line
Definitely need to embark on that... carefully.
I don't know that I would immediately reject it as "impossible to do well"
But it's also not obvious how one would do it well.
Last updated: Jul 05 2025 at 12:14 UTC