I'd like a second opinion on the decision to no longer format malformed input in snapshots.
On one hand, we'll have less noise in our snapshots as we fix things; on the other, we lose the confidence that formatting is stable, right?
@Joshua Warner do you have any thoughts?
I'd like to format malformed input because I want to make sure it's not discarding information
I think for now we should just bail and refuse to format malformed input. As a second phase, we can format the top level defs that are not malformed.
In my experience, the general problem of formatting malformed input is just a really hard problem.
There are just too many ways the input can be malformed, and it’s really hard to make sure that our rendition of it doesn’t interact strangely with some property of the rest of the input (and make things worse)
(this was a big source of fuzzing problems in the old compiler, and ultimately I’m not sure if it was worth it)
I think it's fine to refuse to format malformed nodes, but what I want to avoid is refusing to format the entire file if any node is malformed
Agree, but refusing to format at all is better than having a buggy formatter in the presence of malformed nodes - and that’s likely unless we have a very concerted effort.
I don't understand what's buggy about just like - take the Region of the malformed node and literally @memcpy those bytes into the output buffer and continue
like don't try to fix the indentation or anything like that, just if it's malformed we copy it over as-is, and it probably looks totally wrong, but that's fine because it's malformed
and maybe it messes up the stuff around it, and that's fine too because it's malformed
the important part being that if you have a malformed node somewhere earlier in your source code - e.g. the classic culprit of <<<<<<<<< from a merge conflict marker - it doesn't destroy your ability to have a nice experience editing the rest of your file
the blast radius of brokenness is isolated to just the area right around the malformed node
maybe I'm missing something about why that would be error-prone but it sounds very straightforward to me! :smile:
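A minimal sketch of the copy-it-over-as-is idea, written in Python purely for illustration. It assumes the parser yields a flat list of top-level nodes whose regions tile the whole source. `Node`, `format_well_formed`, and `format_source` are hypothetical names, and the whitespace-collapsing "formatter" is only a stand-in for real formatting logic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    start: int       # offset where the node's region begins
    end: int         # offset one past the end of the region
    malformed: bool  # True if the parser could not make sense of this span


def format_well_formed(text: str) -> str:
    # Stand-in for a real formatter: collapse runs of whitespace on each line.
    return "\n".join(" ".join(line.split()) for line in text.split("\n"))


def format_source(source: str, nodes: List[Node]) -> str:
    out = []
    for node in nodes:
        chunk = source[node.start:node.end]
        # Malformed regions are copied through untouched (no re-indenting,
        # no cleanup); only well-formed regions are actually formatted.
        out.append(chunk if node.malformed else format_well_formed(chunk))
    return "".join(out)
```

Given a source with `x  =  1`, a `<<<<<<< HEAD` line, and `y   =   2`, the marker line passes through unchanged while its neighbours are normalized. The sketch takes the region boundaries as given and says nothing about whether they stay meaningful once the surrounding text changes.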
With PNC I think the potential malformed weirdness is generally more bounded than before PNC (and braces). The general shape of the problem is that the formatter can change something immediately outside the malformed block that causes the malformed block itself to be parsed differently, perhaps as a different malformed thing or even as well-formed, pushing the malformed section elsewhere.
The git merge markers are a great example for another reason: if we format those badly, we might invalidate them and confuse other tools (like editors)
For example, it would be bad if one of those got indented or the newline before/after it were removed.
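To make the merge-marker concern concrete, here is a small Python sketch of a check that conflict-marker lines survive formatting unchanged. It assumes the three standard Git marker prefixes and counts a marker only when it starts at column 0 on its own line. `marker_lines` and `markers_preserved` are hypothetical helper names, not part of any existing tool.

```python
from typing import List

# Standard Git conflict-marker prefixes (seven characters each).
MARKER_PREFIXES = ("<<<<<<<", "=======", ">>>>>>>")


def marker_lines(text: str) -> List[str]:
    # A marker only counts if the line begins with it; an indented marker,
    # or one glued onto the end of the previous line, is not recognized.
    return [line for line in text.split("\n") if line.startswith(MARKER_PREFIXES)]


def markers_preserved(before: str, after: str) -> bool:
    # The formatter's output must contain exactly the same marker lines,
    # in the same order, as its input.
    return marker_lines(before) == marker_lines(after)
```

A formatter that indents a marker or removes the newline in front of it makes `markers_preserved` return `False`, which is the failure mode described above.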
It’s a bit like undefined behavior: anything can be going on in the malformed node, and there’s no guarantee that, after formatting, the “undefined” behavior will still be contained to that same node.
Obviously, the behavior stays the same if literally nothing else about the code changes, but if that’s the case, there’s no point in formatting anyway
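One way to test the containment worry is an idempotence check: format once, format the result again, and require the two outputs to match. The sketch below is Python with toy formatters. `is_stable`, `strip_trailing`, and `indent_markers` are made-up names, and the deliberately bad formatter only illustrates drift, not any real formatter's behavior.

```python
from typing import Callable


def is_stable(fmt: Callable[[str], str], source: str) -> bool:
    # Formatting is stable if a second pass leaves the first pass's output alone.
    once = fmt(source)
    return fmt(once) == once


def strip_trailing(text: str) -> str:
    # Toy well-behaved formatter: remove trailing whitespace from every line.
    return "\n".join(line.rstrip() for line in text.split("\n"))


def indent_markers(text: str) -> str:
    # Toy badly-behaved formatter: it "handles" marker-like lines by indenting
    # them, so every pass pushes them two columns further to the right.
    return "\n".join(
        ("  " + line) if "<<<" in line else line for line in text.split("\n")
    )
```

Here `is_stable(strip_trailing, ...)` holds for any input, while `is_stable(indent_markers, "<<<<<<< HEAD\n")` fails because the second pass changes the output again.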
19 messages were moved here from #contributing > Pull Request for Review by Luke Boswell.
One thing I intended for the snapshots was to also validate behaviour for unhappy paths like <<<<< merge markers. That was what prompted me to ask this question, because I feel like it is helpful to see the formatter behaviour in our snapshots.
How we do that I'm not sure, but it sounds like either way we don't want a "MALFORMED INPUT" @JRI98... is that right?
yeah, fwiw I have a branch that's changing a bunch of stuff at once, so I'll roll this into that
Now that I think of it, I shouldn't have sent that change in the same PR. But it came after this short discussion
My rationale is that in the normal formatting flow, outside of snapshots, the current behavior is to bail out when there are parser errors. So, the FORMATTED section of a snapshot being broken doesn't mean that the same will happen when formatting a file via the CLI. Now, if the behavior of the CLI changed, that would be different, but as it is right now, my change just mimics the formatting behavior in other places.
Last updated: Sep 08 2025 at 12:16 UTC