I have spent a lot of time trying to get this snapshot to format in a stable fashion:
8
("""""""")f:C
U
Which should format to:
8
(
"""
"""
"")
f : C
U
But on reformat ALWAYS becomes:
8
(
"""
"""
"")
f : C
U
See that extra newline? The reason for it is complicated and has to do with ParensAround not existing in Patterns and the fact that we first parse annotation headers as an expr and translate to Pattern. But it really only impacts this case.
But what is the value of having a snapshot that tests the formatting behavior of an illegal pattern? Should we have some way to say "This thing we parsed actually doesn't make any sense, so we expect the formatter to fail here and shouldn't test this"? I think such a change would largely impact the fuzzer since it's the thing that introduced these snapshots in the first place.
My larger point is that we have a lot of illegal or invalid Roc syntax in test_syntax snapshots. I think there's value in us being able to parse them, but I think we should be more aggressive in making these not part of the Parse->Format->Reformat cycle - perhaps by making them Malformed sooner?
Lastly, I really think that having test_syntax actually document invariants of what is actual, valid Roc syntax is just so much higher value.
And if we can't find a way to turn a fuzzer failure into a real, valid piece of Roc syntax - we should be doing something in either the fuzzer or the parser to ensure that it is marked as Malformed and therefore the fuzzer will discard such an input in the future without generating noise.
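To make the Parse->Format->Reformat idea concrete, here is a minimal sketch of the check I have in mind. It is not the real fuzzer harness: `Stability`, `check_stability`, and the closure-based `format` parameter are all made-up names for illustration, and "returns `None`" stands in for "the input was marked Malformed".

```rust
/// Result of one Parse -> Format -> Reformat round trip.
/// (Hypothetical type, not part of the Roc codebase.)
#[derive(Debug, PartialEq)]
enum Stability {
    /// The second pass reproduced the first pass exactly.
    Stable,
    /// The second pass changed the output again (e.g. an extra newline).
    Unstable { first: String, second: String },
    /// The formatter's own output was rejected on the second pass.
    ReformatFailed { first: String },
    /// The input was rejected up front (e.g. Malformed), so nothing is asserted.
    Skipped,
}

/// Checks that `format` reaches a fixed point after one pass.
/// `format` returns `None` when it declines to format its input.
fn check_stability(src: &str, format: impl Fn(&str) -> Option<String>) -> Stability {
    let first = match format(src) {
        Some(out) => out,
        None => return Stability::Skipped,
    };
    match format(&first) {
        Some(second) if second == first => Stability::Stable,
        Some(second) => Stability::Unstable { first, second },
        None => Stability::ReformatFailed { first },
    }
}
```

The point of the `Skipped` case is that a Malformed input produces no assertion at all, so it cannot turn into a noisy snapshot.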
I'd like to see:
(
"""
"""
"")
f : C
Be a TypeAnnotation(TypeHeader(Apply(Malformed, Ident("f"))), Tag("C")) and then have the fuzzer bail out at that point
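Roughly, the bail-out could look like the sketch below. The `Node` enum is a toy stand-in that only mirrors the shape written above; the real Roc AST types are different, and `contains_malformed` is a hypothetical helper name.

```rust
/// Toy AST mirroring TypeAnnotation(TypeHeader(Apply(Malformed, Ident)), Tag).
/// These are illustrative types, not the real Roc AST.
#[derive(Debug)]
enum Node {
    Ident(String),
    Tag(String),
    Apply(Box<Node>, Vec<Node>),
    TypeHeader(Box<Node>),
    TypeAnnotation(Box<Node>, Box<Node>),
    /// Carries the source text that could not be given a sensible meaning.
    Malformed(String),
}

/// Returns true if any node in the tree is Malformed.
/// A fuzz harness could call this after parsing and discard the input
/// instead of asserting formatting properties on it.
fn contains_malformed(node: &Node) -> bool {
    match node {
        Node::Malformed(_) => true,
        Node::Ident(_) | Node::Tag(_) => false,
        Node::TypeHeader(inner) => contains_malformed(inner),
        Node::Apply(func, args) => {
            contains_malformed(func) || args.iter().any(contains_malformed)
        }
        Node::TypeAnnotation(header, ann) => {
            contains_malformed(header) || contains_malformed(ann)
        }
    }
}
```

With something like this, the harness only needs one early return before the format/reformat assertions.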
@Joshua Warner @Sam Mohr I think you will have the most thoughts on this.
I guess I'll end my mini-rant on a positive note and give a potential vision of what these snapshots could be:
A visual guide to valid Roc syntax - from the simplest constructs to the most complex. Showing how it can be written quickly, and how it will always look when formatted. Documenting both the syntax of the language, as well as the formatter's style.
That will then help us identify clearly what the style is and the principles we use when maintaining and extending it with new syntax.
And just for a laugh, here's the same example with PNC migration if we did that
8
"""
"""(
"",
)(
f,
) : C
U
And with collapsed whitespace in the PNC applies:
8
"""
"""("")(f) : C
U
Just one more thing before I go to work (and I may move this to #ideas): should this really just be an md document (or documents) with code blocks appropriately annotated? Then it could really be what I envision above - a programmatically checked guide to valid Roc syntax and the canonical formatted style.
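As a rough sketch of the "programmatically checked" part: a test could pull the annotated code blocks out of the markdown and feed each one to the parser and formatter. Everything here is an assumption for illustration, including the `roc` tag on the fence and the `extract_code_blocks` name; it only handles plain backtick fences.

```rust
// Three backticks, written with escapes so this sample can sit inside a fenced block.
const FENCE: &str = "\u{60}\u{60}\u{60}";

/// Collects the bodies of fenced code blocks whose info string equals `tag`.
/// Simplified: no nested fences, no tilde fences.
fn extract_code_blocks(markdown: &str, tag: &str) -> Vec<String> {
    let mut blocks = Vec::new();
    // While inside a fence: (is this a block we want?, body collected so far).
    let mut current: Option<(bool, String)> = None;
    for line in markdown.lines() {
        let trimmed = line.trim_start();
        match current.take() {
            // Outside a fence: an opening fence starts a new block.
            None => {
                if let Some(info) = trimmed.strip_prefix(FENCE) {
                    current = Some((info.trim() == tag, String::new()));
                }
            }
            // Inside a fence: a bare fence closes it, anything else is body text.
            Some((wanted, mut body)) => {
                if trimmed.trim_end() == FENCE {
                    if wanted {
                        blocks.push(body);
                    }
                } else {
                    body.push_str(line);
                    body.push('\n');
                    current = Some((wanted, body));
                }
            }
        }
    }
    blocks
}
```

Each extracted block would then act as a snapshot input: parse it, format it, and compare against the canonical form shown in the same document.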
programmatically checked guide to valid Roc syntax
That sure seems useful for the tree sitter parser and similar tools
I think so, and for people learning Roc
We could even tag the document sections and find a way to link Syntax problems with the relevant section(s) and output it with the report
Someone does something crazy like the above and they get a nice syntax error report like:
Syntax Error @ main.roc 12:2-12:8 -----------------------------------------
("""""""") f: C
^----- | The problem is here
It looks like you are trying to perform function application on a string literal, but that
is not valid Roc. Here's some tips:
Usually, you would apply an Identifier or a Tag, like this:
func(arg)
# Or
Tag(arg)
Both of these would format exactly the same as above.
To get more tips on syntax for function application, use `roc syntax apply`.
--------------------------------------------------------------------------------------
Where `roc syntax` could be a new subcommand in the CLI to allow the user to browse or search the syntax guide
And since this is checked in CI on every commit - this guide would always be correct for that version of the compiler.
We generate an `llms.txt` file and provide it in the tutorial here
I think a generated file would suffice here, something that's reasonably legible for humans and definitely for computers
What I'm talking about is a systematic inventory of valid syntax that can be tested and verified and can replace test_fmt, or at least the _vast_ majority of snapshots. (They would act as snapshots)
Oh, in place of snapshots.
Yes
And since they would be user-facing inside of documentation, there would be context around each example, and they would be valid Roc code rather than a bunch of randomly generated nonsense
I agree that the snapshots are moving further from a set of unit test-like valid examples of Roc code and more like Eldritch horrors that get cleaned up and saved to keep us from crashing when we see them
And this is NOT saying that the fuzzer does not have value. But the way it's currently being used is painful
I'd like to see the fuzzer move to a generator/property-based test
I'm thinking of the Rust reference: https://doc.rust-lang.org/reference/introduction.html
(I know the fuzzer technically is, but I mean generated from a specification, not from a small corpus and otherwise random text)
I think that Rust reference is a good place to start. But I'd really like to show both the canonical form of each bit of syntax as well as the "most terrible way to type this and it still parse right"
And that can be our version of `*_formats_to`/`*_formats_same`
That is a noble goal, but I don't know how achievable it is to make something that is a good reference resource AND good for testing
Unless it's not actually your goal to do both at once
I have a design that would make it possible to do both at once. It's called: the documentation links the valid code samples into the reference
And if we do make a CLI subcommand for it you could have a --extended flag or something and see everything that matches the search term
I feel like we could just delete any snapshots that are not helpful when making a change. They're not sacred or anything. Is this the core of your issue? Trying to save or fix snapshots that are super random and strange?
We've been on a mission to get fuzz clean.... parsing and canonicalisation of all the things and not crashing.
Returning Malformed for something really strange sounds like a good strategy to me.
I'm concerned about changing the current setup dramatically, Josh has used it to good effect finding and smashing a lot of bugs.
Yeah I just want a fuzzer that's only crashing on legit bugs
And not crazy ass syntax it got by throwing paint on the wall
So if we just make things malformed earlier (and / or canonicalize them) I think it would be better
Agree a lot of them are on the funky side. Not sure I agree they aren't legitimate bugs.
I strongly value _reliable_ software
I want to provide a 100% guarantee that using the formatter is "safe" - i.e. it won't change the meaning of your code or change again once formatted again, etc.
I will say, not that we'll feel the benefit for the next month or so, but the `roc_can` rewrite aims to never crash for this stuff. The parser might crash, but there will be literally zero unwraps or expects in the new canonicalization code
So if we're putting a lot of effort into fixing current `roc_can`, that may not be necessary
Ah good to know
Do you have more detail on this roc_can rewrite? What's the goal/scope/etc?
(maybe in another thread...)
Sure
I can also probably outline this at the next meetup
I'll make another thread for now
Anyway to finish my earlier thought, I want to provide that 100% guarantee, but I'd be open to alternative ways of accomplishing that
For example, we could do things like detect some of these more niche cases and just refuse to format in that case (maybe that's what you're getting at)
Ideally, that only introduces a "local" problem, so if you have one tiny problem in a giant file, most of the file can still be formatted properly, and it's only the top-level def with the problem that is copied verbatim from the input
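A minimal sketch of that "local" fallback, assuming the file has already been split into top-level defs. `format_top_level_defs` and the `format_def` callback are hypothetical names; returning `None` means the formatter refuses that one def.

```rust
/// Formats each top-level def independently.
/// When `format_def` declines a def (returns `None`), that def alone is
/// copied verbatim from the input and the rest of the file is still formatted.
fn format_top_level_defs(
    defs: &[&str],
    format_def: impl Fn(&str) -> Option<String>,
) -> Vec<String> {
    defs.iter()
        .map(|&def| format_def(def).unwrap_or_else(|| def.to_string()))
        .collect()
}
```

The verbatim copy keeps the safety guarantee trivially for the refused def: text that is not touched cannot change meaning and cannot change again on reformat.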
Roc needs to be 100% reliable
Mental security is like, the whole point of this language
Yes it will be safe and only introduce a local issue where the illegal syntax does not get formatted
Cool, makes sense
This is syntax that will NOT be accepted by later stages of the compiler anyway
Yeah, that's true
Like trying to apply a string literal :joy:
I want all valid roc syntax to be Roc solid ducks
hey man, like, strings are functions too
Just because the minimal example that currently hits this case is silly, doesn't mean all such examples that hit this case are silly
The term "stringly-typed" should not need to exist
Maybe but I’d like to focus on the actual examples that are
I've thought about an April fools joke announcement of like introducing truthiness or unchecked null
or something like that
A fuzzer bug should be able to be coerced into a real working code sample and still reproduce
In my experience, that quickly devolves into either:
Yes, #2 but only bailing out at format
That could be done in format itself
For your example with multiline strings, I 100% agree applying a function like that is not valid - but take this as an example then:
"""abc""".foo(1)(2)
... where foo is a curried function of some kind
I think that'll end up hitting similar problems
Now or when we have static dispatch?
That example is obviously using static dispatch
Anyway, my point is that I've found it's better to just give in and fix the problem rather than avoiding it
Avoiding it completely ends up with very complicated conditions, or very "blunt" / annoying conditions
I just don’t think it’s a problem. It’s invalid syntax, no?
Maybe I’m just being dull
No, it's perfectly valid syntax
Sorry not what you just put
The motivating example above
Ahh
Yeah
In that particular case it is kinda-but-not-really invalid right now
That’s what I’m talking about
That'll bail early in can
I actually have a PR locally to refactor a bit and make that `malformed` syntax, which right now the fuzzer won't try to assert formatting conditions on
Yes and if it bails early in can, I think we can kind of punt on it in formatting
I think we can kind of punt on it in formatting
Disagree
Joshua Warner said:
I actually have a PR locally to refactor a bit and make that `malformed` syntax, which right now the fuzzer won't try to assert formatting conditions on
This is exactly what I’m advocating for
Ok cool
Somehow we are disagreeing and agreeing at the same time. It’s probably my poor communication
Haha np. Takes two to (mis)communicate
To be clear the fuzzer is an awesome tool.
I've been pushing hard on the angle of "just make it work", since I've been seeing progress there recently
I just think that we need tests that give context on what they are testing, why we care, and what we want things to look like
Like 2-ish years ago I ran into a period where I got very frustrated with that approach and basically gave up for a while
I’ve read more gobbledygook fuzzer Roc than real Roc the past two weeks and I think I have PTSD
Fo real
100% on board with taking tests and changing them to make them more realistic, so long as they're still covering the same conditions
(And totally fine if that means they are now marked as 'malformed')
That makes me happy
And then we can use the best of the best in the syntax reference I’m talking about (which could be very selective and part of the tutorial)
I'm extremely excited to see all the progress on fixing these things the fuzzer is turning up, because compiler bugs are one of the biggest things holding Roc back from reaching its potential
and fuzzers that run for a long time without turning up anything give me way more confidence than anything like what we've ever had in the past!
so I really appreciate all your efforts on wading through the gibberish to get us there! :hearts:
FWIW I don't think the fuzzer is covering any of the really "interesting" parts of the compiler yet (say, the solver) - where I'd define "interesting" as "users often hitting compiler crashes in this area"
But that would be my eventual goal here
Baby steps
I think my open PR merging spaces within spaces will help with some fuzzer crashes
There are some peculiarities of roc syntax that make it particularly hard to parse+format consistently
For example, multiline strings very often cause problems if they're used outside of very specific situations
Like, they're fine if you're just assigning that to a local, but if you try to do anything else with them, that requires a lot of persnickety condition checking in the formatter
`when` branches are also tough
With backpassing gone, function types are [almost] the last instance where we have "naked" parens inside a syntax element (i.e. where there's not a starting + finishing delimiter to branch on, so we either have to do excessive backtracking or we have to de-normalize the function type parser in the context of tuple types and tag unions)
The other case, I believe, is just comma-separated `where` clauses
If `where ...` is the last place, is there a way to change how they look to make that not the case?
Function types are still causing problems, so `where ...` is definitely not the last place, but anyway...
The solution for function types would be to have some sort of "introduction" delimiter
e.g. could prefix them with `\` or `fn`
Or use PNC for types right?
Yeah, would #ideas > Using parens for types help?
I don't think PNC helps with types
Why not? That makes all type expressions bounded
PNC for types only changes type application, e.g. `List(foo)` instead of `List foo`. That's not the issue here.
No, it also means () around params
Ahhh yes that would help
Sorry I misunderstood
No worries
I think Sam brought that up this morning or last night
e.g. `(Str, Str) -> Str` instead of `Str, Str -> Str`
I suggested that because it would make parsing code for devs and the compiler all very consistent
And would slot in well with zero-arg functions
Oh yeah that works nicely
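To illustrate why the parens help: once the params have an opening and closing delimiter, the parser can branch on the first character and never has to backtrack. This is only a toy sketch over strings (no nested parens, no real tokenizer), and `parse_fn_type` is a made-up name, not anything from the Roc parser.

```rust
/// Parses `(A, B) -> R` into (params, return type) for a toy grammar.
/// Simplified on purpose: parameter types may not themselves contain parens.
fn parse_fn_type(src: &str) -> Option<(Vec<String>, String)> {
    // The opening paren is the "introduction" delimiter we can branch on.
    let rest = src.trim().strip_prefix('(')?;
    // The closing paren bounds the parameter list.
    let (params, after) = rest.split_once(')')?;
    let ret = after.trim().strip_prefix("->")?.trim();
    if ret.is_empty() {
        return None;
    }
    let params = params
        .split(',')
        .map(|p| p.trim().to_string())
        .filter(|p| !p.is_empty())
        .collect();
    Some((params, ret.to_string()))
}
```

Note that `() -> Str` falls out for free as a zero-arg function, while the unbounded form `Str, Str -> Str` is simply not recognized by this sketch.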
Another point for Sam!
For `where ...`, I think the solution would look something like allowing parens around the `...` part, and furthermore _requiring_ cases where there are multiple implements clauses to use that parens syntax, at least if it's in a context where `,` would separate elements (e.g. in a tuple type)
Love you too
That would almost never come up in practice, so probably not much of an actual change
Or could just disallow `where` except at the top level
That seems even better actually. Not sure why you'd ever want `(List a where a implements Foo, List b where b implements Foo)` instead of just `(List a, List b) where a implements Foo, b implements Foo`
The latter is my current thought for what Roc's type syntax would be. That's not a problem, right?
Technically speaking I guess there are very niche cases where that could come up, if there's a list at a higher level
e.g. a tuple of expressions, where one of the expressions is a Defs node with a type annotation
Distinguishing whether that comma means we should parse the next `implements` clause, or go up and parse the next top-level expr in the tuple is non-trivial
Wouldn’t that be bounded by the arrow?
There's not necessarily any arrow after
I think we should have implements at the tail end of an annotation always
(
a = 1
b = 2
foo: List a where a implements Foo,
bar
)
That's not fully valid syntax, but at the point where we see `bar`, we don't know that yet
And in particular we don't know whether we should start parsing `bar` as a type (to be followed by `implements`) or an expr (i.e. the next element of the tuple).
Actually I take that back
So long as we require a final expr in a Defs, this is fine
That’s interesting, I hope we do
That feels like an increasingly fragile condition with statements tho
Yeah
I would actually like to not require that, syntactically
(and only validate that in `can`)
I think if you have more than a single implements, you must have parens
Here's that PR to introduce a proper TypeVar type (used in TypeHeader), and mark anything that's not a lowercase ident as Malformed in the AST. (Such things would already generate `can` errors) https://github.com/roc-lang/roc/pull/7511
I can review that at lunch
@Anthony Bullard I hit approve... but feel free to also review
Last updated: Jul 06 2025 at 12:14 UTC