Stream: contributing

Topic: formatting malformed nodes


view this post on Zulip Luke Boswell (Aug 29 2025 at 00:33):

I'd like a second opinion on the decision to no longer format malformed input in snapshots.

On the one hand we'll have less noise in our snapshots as we fix things; on the other, we lose the confidence that formatting is stable, right?

@Joshua Warner do you have any thoughts?

view this post on Zulip Richard Feldman (Aug 29 2025 at 00:34):

I'd like to format malformed input because I want to make sure it's not discarding information

view this post on Zulip Joshua Warner (Aug 29 2025 at 00:58):

I think for now we should just bail and refuse to format malformed input. As a second phase, we can format the top level defs that are not malformed.

view this post on Zulip Joshua Warner (Aug 29 2025 at 00:59):

In my experience, the general problem of formatting malformed input is just a really hard problem.

view this post on Zulip Joshua Warner (Aug 29 2025 at 01:00):

There are just too many ways the input can be malformed, and it’s really hard to make sure that our rendition of it doesn’t interact strangely with some property of the rest of the input (and make things worse)

view this post on Zulip Joshua Warner (Aug 29 2025 at 01:51):

(this was a big source of fuzzing problems in the old compiler, and ultimately I’m not sure if it was worth it)

view this post on Zulip Richard Feldman (Aug 29 2025 at 02:06):

I think it's fine to refuse to format malformed nodes, but what I want to avoid is refusing to format the entire file if any node is malformed

view this post on Zulip Joshua Warner (Aug 29 2025 at 02:11):

Agree, but refusing to format at all is better than having a buggy formatter in the presence of malformed nodes - and that’s likely unless we have a very concerted effort.

view this post on Zulip Richard Feldman (Aug 29 2025 at 02:12):

I don't understand what's buggy about just like - take the Region of the malformed node and literally @memcpy those bytes into the output buffer and continue

view this post on Zulip Richard Feldman (Aug 29 2025 at 02:12):

like don't try to fix the indentation or anything like that, just if it's malformed we copy it over as-is, and it probably looks totally wrong, but that's fine because it's malformed

view this post on Zulip Richard Feldman (Aug 29 2025 at 02:13):

and maybe it messes up the stuff around it, and that's fine too because it's malformed

view this post on Zulip Richard Feldman (Aug 29 2025 at 02:15):

the important part being that if you have a malformed node somewhere earlier in your source code - e.g. the classic culprit of <<<<<<<<< from a merge conflict marker - it doesn't destroy your ability to have a nice experience editing the rest of your file

view this post on Zulip Richard Feldman (Aug 29 2025 at 02:16):

the blast radius of brokenness is isolated to just the area right around the malformed node
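A minimal sketch of the copy-as-is idea described above, written in Python rather than the compiler's own language. Everything here is hypothetical: `Node`, `toy_parse`, `format_well_formed`, and `format_source` are illustrative names, and the line-based "parser" only stands in for a real one. It shows the shape of the approach: well-formed nodes get formatted, and the bytes of a malformed node's region are copied through untouched.

```python
import re
from dataclasses import dataclass


@dataclass
class Node:
    start: int       # byte offset where the node's region begins
    end: int         # byte offset one past the region's last byte
    malformed: bool  # True if the (toy) parser could not make sense of it


# Toy grammar (an assumption for this sketch): a well-formed node is `name = value`.
DEF = re.compile(rb"\s*\w+\s*=\s*\S.*")


def toy_parse(source: bytes) -> list[Node]:
    """Produce one node per non-blank line, flagging lines that are not defs."""
    nodes, pos = [], 0
    for line in source.split(b"\n"):
        end = pos + len(line)
        if line.strip():
            nodes.append(Node(pos, end, DEF.fullmatch(line) is None))
        pos = end + 1  # skip the newline separator
    return nodes


def format_well_formed(text: bytes) -> bytes:
    """Stand-in for real formatting: collapse runs of whitespace."""
    return b" ".join(text.split())


def format_source(source: bytes, nodes: list[Node]) -> bytes:
    out = bytearray()
    for node in nodes:
        region = source[node.start:node.end]
        if node.malformed:
            out += region  # copy the region's bytes over as-is
        else:
            out += format_well_formed(region)
        out += b"\n"
    return bytes(out)
```

With a source such as `b"x   =  1\n<<<<<<< HEAD\ny =    2\n"`, the two defs are normalized and the marker line passes through byte for byte, so the damage stays local to the malformed node in this toy setting.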

view this post on Zulip Richard Feldman (Aug 29 2025 at 02:16):

maybe I'm missing something about why that would be error-prone but it sounds very straightforward to me! :smile:

view this post on Zulip Joshua Warner (Aug 29 2025 at 02:23):

With PNC I think the potential malformed weirdness is generally more bounded than before PNC (and braces). The general shape of the problem is that the formatter can change something immediately outside of the malformed block that causes the malformed block itself to be parsed differently, perhaps as a different malformed thing or even as well formed now, pushing the malformed section elsewhere.

view this post on Zulip Joshua Warner (Aug 29 2025 at 02:24):

The git merge markers are a great example for another reason: if we format those badly, we might invalidate them and confuse other tools (like editors)

view this post on Zulip Joshua Warner (Aug 29 2025 at 02:25):

For example, it would be bad if one of those got indented or the newline before/after it got removed.
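To make the merge-marker concern concrete, here is a small Python sketch of a check one could run on formatter output. It assumes tools look for the standard 7-character markers at the very start of a line; `conflict_markers` and `markers_preserved` are hypothetical helper names, not part of any existing tool.

```python
# The standard conflict marker prefixes, assumed to be recognized only at column 0.
MARKERS = ("<<<<<<<", "=======", ">>>>>>>")


def conflict_markers(text: str) -> list[str]:
    """Return, in order, every line that begins with a conflict marker."""
    return [line for line in text.split("\n") if line.startswith(MARKERS)]


def markers_preserved(before: str, after: str) -> bool:
    """True if formatting left every marker line intact, unindented, and in order."""
    return conflict_markers(before) == conflict_markers(after)
```

Indenting a marker line, or joining it onto the previous line by dropping the newline, both make `markers_preserved` return `False`, which are the two failure cases mentioned above.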

view this post on Zulip Joshua Warner (Aug 29 2025 at 02:27):

It’s a bit like undefined behavior: anything can be going on in the malformed node, and there’s no guarantee that after formatting the “undefined” behavior will still be contained to that same node.

view this post on Zulip Joshua Warner (Aug 29 2025 at 02:28):

Obviously, the behavior stays the same if literally nothing else about the code changes, but if that’s the case, there’s no point in formatting anyway

view this post on Zulip Notification Bot (Aug 29 2025 at 02:29):

19 messages were moved here from #contributing > Pull Request for Review by Luke Boswell.

view this post on Zulip Luke Boswell (Aug 29 2025 at 02:33):

One thing I intended for the snapshots was to also validate behaviour for unhappy paths like <<<<< merge markers. That was what prompted me to ask this question, because I feel like it is helpful to see the formatter behaviour in our snapshots.

How we do that I'm not sure, but it sounds like either way we don't want a "MALFORMED INPUT" @JRI98... is that right?

view this post on Zulip Richard Feldman (Aug 29 2025 at 02:37):

yeah, fwiw I have a branch that's changing a bunch of stuff at once, so I'll roll this into that

view this post on Zulip JRI98 (Aug 29 2025 at 06:43):

Now that I think of it, I shouldn't have sent that change in the same PR. But it came after this short discussion #compiler development > zig-compiler - Formatter and style @ 💬
My rationale is that in the normal formatting flow, outside of snapshots, the current behavior is to bail out when there are parser errors. So, the FORMATTED section of a snapshot being broken doesn't mean that the same will happen when formatting a file via the CLI. Now, if the behavior of the CLI were to change, that would be different, but as it is right now, my change would just be mimicking the formatting behavior in other places.
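For comparison, the bail-out behavior described here might look roughly like the following Python sketch. The toy grammar (`name = value` per line) and the names `parse_errors` and `format_or_bail` are assumptions made for illustration; this is not the actual CLI code.

```python
import re

# Toy grammar (an assumption for this sketch): each non-blank line is `name = value`.
DEF = re.compile(r"\s*\w+\s*=\s*\S.*")


def parse_errors(source: str) -> list[int]:
    """Return the 1-based numbers of non-blank lines the toy parser rejects."""
    return [
        number
        for number, line in enumerate(source.split("\n"), start=1)
        if line.strip() and DEF.fullmatch(line) is None
    ]


def format_or_bail(source: str) -> tuple[str, list[int]]:
    """Format only when the whole input parses; otherwise return it unchanged."""
    errors = parse_errors(source)
    if errors:
        return source, errors  # bail: hand back the input untouched
    formatted = "\n".join(" ".join(line.split()) for line in source.split("\n"))
    return formatted, []
```

Under this policy a single malformed line anywhere means nothing in the file is formatted, which is the trade-off debated earlier in the thread.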


Last updated: Sep 08 2025 at 12:16 UTC