in the formatter we take care to make multiline things be multiline and single-line things stay single-line. I like this behavior and want to keep it; it's something I always miss with tools like rustfmt that make all newline decisions for me
but I don't like how the Rust compiler represents newline info using parse IR nodes
it didn't turn out to be nice to work with
I think a simpler design would be to scan source ranges for newlines
My thinking has always been to not have newlines/comments as part of the AST (and maybe not even in the token stream, now).
The formatter looks at the AST, sees a token id for the thing it's trying to format, then goes and looks in the source for the newlines/comments (if any) that came between that token and the one before.
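A minimal sketch of that lookup, assuming tokens carry a start byte and a length, and that comments start with `#` and run to the end of the line. `Token` and `trivia_before` are hypothetical names for illustration, not the compiler's actual API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Token:
    start: int   # byte offset of the token's first byte
    length: int  # token length in bytes

    @property
    def end(self) -> int:
        return self.start + self.length


def trivia_before(source: bytes, prev: Optional[Token], tok: Token) -> Tuple[int, List[str]]:
    """Return (newline count, comments) found between `prev` and `tok`."""
    gap_start = prev.end if prev is not None else 0
    # Only whitespace and comments can live between two adjacent tokens.
    gap = source[gap_start:tok.start]
    newlines = gap.count(b"\n")
    comments = [
        piece.strip().decode("utf-8")
        for piece in gap.split(b"\n")
        if piece.strip().startswith(b"#")  # assumption: `#` line comments
    ]
    return newlines, comments


source = b"x = 1 # note\n\ny = 2\n"
one = Token(start=4, length=1)   # the `1`
y = Token(start=14, length=1)    # the `y`
print(trivia_before(source, one, y))  # (2, ['# note'])
```

With this shape, neither the AST nor the token stream has to store newlines or comments; the formatter recovers them from the source on demand.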
sounds good!
also I think we can do the same in the parser for checking to see if tokens have a whitespace gap between them or not
My thinking is to have the parser never need to look at the underlying source
and that strategy works with source ranges being either in line/col or start byte/length
yeah it shouldn't need to - the ranges should be enough
Ahh I see - comparing the end byte of the last token to the start byte of this token
like if I see a `?` token and the preceding token has a source range that ends right in front of it
:thinking:
then we know there's no gap
My current approach has been to explicitly put that data into the token stream, where it's needed
and we don't even need to keep around all token source ranges to do that
just the previous one and the current one
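A rough sketch of that gap check, assuming each token range is a `(start_byte, length)` pair. `glued_to_previous` is a made-up name; the only state it keeps is the end of the previous token.

```python
from typing import Iterable, Iterator, Optional, Tuple


def glued_to_previous(ranges: Iterable[Tuple[int, int]]) -> Iterator[bool]:
    """For each token, yield True if it starts exactly where the previous one ended."""
    prev_end: Optional[int] = None  # the first token has nothing before it
    for start, length in ranges:
        yield prev_end is not None and prev_end == start
        prev_end = start + length


# Ranges for the source text "foo? bar": `foo`, `?`, `bar`
ranges = [(0, 3), (3, 1), (5, 3)]
print(list(glued_to_previous(ranges)))  # [False, True, False]
```

The parser never touches the source text here: the `?` is known to be attached to `foo` purely from the two ranges.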
So e.g. there's OpenRound and NoSpaceOpenRound
ah so lookahead 1 byte?
For that case we do lookbehind, but same idea
gotcha, that works too!
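For comparison, a sketch of the tokenizer-side lookbehind, using the `OpenRound` and `NoSpaceOpenRound` names from above. The exact rule is an assumption here (any non-whitespace character directly before the paren counts as "no space"); the real tokenizer may decide differently.

```python
def classify_open_round(source: str, index: int) -> str:
    """Pick the token kind for the `(` at `index` by looking one character behind."""
    if source[index] != "(":
        raise ValueError("expected '(' at index")
    # Assumption: a paren at the very start of the source counts as a plain OpenRound.
    if index > 0 and not source[index - 1].isspace():
        return "NoSpaceOpenRound"
    return "OpenRound"


print(classify_open_round("f(x)", 1))   # NoSpaceOpenRound
print(classify_open_round("f (x)", 2))  # OpenRound
```

The trade-off against the range-comparison approach is that the whitespace fact is baked into the token kind, so the parser only has to match on it.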
do you think the formatter can get away with just looking at tokens?
I'd assume it would need parse IR but maybe not
We're a lot closer with braces syntax than without
Ultimately I think there are going to be cases that are annoying to handle without looking at the parse IR
(which, side note, it's an AST; let's just call it that - we have too many IRs for IR to mean something useful)
Anyway
It would be interesting to _explore_ whether the formatter could look just at the token stream. I'm betting there will be cases that make that difficult, where we'd essentially be re-implementing significant parts of the parser to run during formatting.
Joshua Warner said:
> It would be interesting to _explore_ whether the formatter could look just at the token stream. I'm betting there will be cases that make that difficult, where we'd essentially be re-implementing significant parts of the parser to run during formatting.
This is my actual plan...
for what it's worth, I think we can simplify how the formatter thinks about newlines to just:
I think this because of the "trailing commas mean render multiline, and no trailing comma means don't" design in conjunction with parens-and-commas
if we want a multiline function application, we can do a trailing comma to indicate multiline mode (and unlike whitespace application, the case of "have the first arg on the same line but everything else is on a different line" would look weird and I don't think should be supported anymore)
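A small sketch of the trailing-comma rule for a parens-and-commas call. `render_call` and the four-space indent are illustrative assumptions, not the formatter's real interface.

```python
from typing import List


def render_call(name: str, args: List[str], trailing_comma: bool, indent: str = "    ") -> str:
    """Render single-line unless the source had a trailing comma."""
    if not trailing_comma or not args:
        return f"{name}({', '.join(args)})"
    # Multiline mode: one argument per line, each followed by a comma.
    lines = [f"{name}("]
    lines += [f"{indent}{arg}," for arg in args]
    lines.append(")")
    return "\n".join(lines)


print(render_call("foo", ["a", "b"], trailing_comma=False))  # foo(a, b)
print(render_call("foo", ["a", "b"], trailing_comma=True))
```

Note there is no "first arg on the same line, the rest below" mode in this sketch, matching the suggestion that it not be supported.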
this makes me wonder if we should try just having the formatter manage blank lines automatically
like it just has a rule for where they do and don't go, and it puts them in accordingly
it might be annoying (I'm not sure), but at that point the only user-configurable aspect of whitespace is "does this comma-separated thing render in single-line or multiline mode?" and that is determined entirely by whether it has a trailing comma or not
kinda seems worth trying to me, just to see how it feels in practice?
Richard Feldman said:
> this makes me wonder if we should try just having the formatter manage blank lines automatically
I feel very strongly about my blank lines and use them significantly more than the average dev :p
I also feel as Anton does, and I believe so does @Luke Boswell
Though there are some places that I think we should consider removing them that I mentioned to Luke but never firmed up enough to make a discussion about
The main one being newlines between a function's args and the first line of its body
Which now could be simplified to the rule "no blank newlines between an opening curly brace and its body's first line"
And maybe "always add a newline above the return expression of a function unless the body has only one expression"
But that one is tricky, like what about if your function ends with some `Stdout.line` calls? Those aren't normal "I'm returning something useful" lines
I feel like if you have more than 2 blank lines they definitely should be collapsed
If you really want them... add comments
I totally see the general argument for 2 blank lines between important things for visual separation
That said, I think the formatter should manage blank lines. (Again, comments are the way to complete freedom)
If I were to write the rule it would probably be 1 or 2 blank lines allowed between top levels.
1 or 0 blank lines between other things.
No blank lines after open brackets or before closing brackets.
Hmm... though for top-level single-line expressions zero blank lines can be nice (like a block of constants)
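One way to write those rules down, as a sketch only. `allowed_blank_lines` and its parameters are hypothetical, and `top_level_min` is there to cover the open question of whether zero blank lines should be allowed between top-level items.

```python
def allowed_blank_lines(
    requested: int,
    *,
    top_level: bool,
    after_open: bool = False,
    before_close: bool = False,
    top_level_min: int = 1,
) -> int:
    """Clamp the number of blank lines the user wrote to what the rules allow."""
    # No blank lines right after an opening bracket or right before a closing one.
    if after_open or before_close:
        return 0
    # Between top-level items: at least `top_level_min`, at most 2.
    if top_level:
        return max(top_level_min, min(requested, 2))
    # Between other things: 0 or 1.
    return min(requested, 1)


print(allowed_blank_lines(5, top_level=True))                   # 2
print(allowed_blank_lines(3, top_level=False))                  # 1
print(allowed_blank_lines(0, top_level=True, top_level_min=0))  # 0
```

Setting `top_level_min=0` would permit the tightly packed block of constants; leaving it at 1 enforces the stricter version of the rule.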
What Richard described was basically what I was going to do
Last updated: Jul 06 2025 at 12:14 UTC