Whitespace. · ideas · Zulip Chat Archive

Stream: ideas

Topic: Whitespace.

Peter Marreck (Sep 12 2024 at 12:47):

I'm willing to accept significant whitespace (despite not being a Python fan). That said, I can't believe I've slowly become a tab person in the ancient tabs vs spaces debate, but hear me out.
There's an accessibility argument (example: https://alexandersandberg.com/articles/default-to-tabs-instead-of-spaces-for-an-accessible-first-environment/) but I also noticed a couple things:
1) one of the main arguments for spaces over the years has been blaming inadequate tooling for not handling tabs well, but that's a tooling issue (and most of it has been resolved by now, such as github's in-browser tooling etc.)
2) I was copying code from an LLM the other day (hey, it was Bash, who likes to write Bash?) and it kept defaulting to 4-space "tab stops" while I prefer 2, and there was no way to tell it my pref, and I ended up having to manually backdent each block and sub-block, and I realized that the entire problem would be nonexistent if we just used tabs
3) regarding other standard tooling, tabs -n will set your terminal tab stops every n columns (which unfortunately defaults to 8!). You can set that in your dotfiles, and for git diffs, in .gitconfig (assuming you use delta) you can set pager = delta --tabs=2 under [core] and diffFilter = delta --tabs=2 under [interactive] (or whatever values you prefer) to get your preferred tab stops in terminal diffs.
4) code indentation can adhere to individual developer preference across all tooling (who doesn't like that?) and does not have to be forced to be the same across the entire community (some consider this an argument against it, but I don't get that... in a remote situation, how often are you staring at someone else's screen and feeling irritated about their super-wide tab stops? In presentations?)
5) I'm no Go fan, but credit where it is due, gofmt inspired a lot of other formatters, and Go seems to have survived on a strictly-tab diet just fine.
6) Semantically, using actual tabs for "a tab" is simply the correct thing to do. We like correctness, no?

At least consider a whitespace-agnostic position on this with regards to the compiler while things are not set in stone.

Richard Feldman (Sep 12 2024 at 13:05):

we've talked about this in the past, and I think a design like this could be an improvement over status quo:

\n followed by 1+ tab characters is an indentation
tab characters anywhere else are a syntax error
\n followed by 1+ space characters is a syntax error
space characters anywhere else are fine

Richard Feldman (Sep 12 2024 at 13:06):

so in other words, tabs are for indentation, and they are only for indentation, and only tabs are for indentation

Richard Feldman (Sep 12 2024 at 13:06):

this would be a big parser change, but it seems like it would be a positive one

Peter Marreck (Sep 12 2024 at 13:12):

That would be perfect IMHO. I've been experimenting with using tabs (after using spaces, like most everyone else, for years) and I'm frankly liking it. The tooling (perhaps thanks to Go) is just there, now.
I guess the only issue with disallowing tabs anywhere else is code formatted in a tabular (ahem) style for self-documentation reasons mainly, but that would assume specific tab stops that we don't store anywhere and could screw up assumptions and thus formatting (and in a monospaced font context can be spaces anyway), which is why you're probably suggesting it only be used for initial indentation.
I like it! /shrug

Kevin Gillette (Sep 12 2024 at 14:12):

Does that mean that space characters following a tab are fine?

Richard Feldman (Sep 12 2024 at 14:43):

no, I should have phrased that differently :big_smile:

Richard Feldman (Sep 12 2024 at 14:44):

"tabs are only for indentation and only tabs are for indentation" might be the more concise way to say the rules

Richard Feldman (Sep 12 2024 at 14:44):

so \n followed by tabs followed by a space would be an error

Richard Feldman (Sep 12 2024 at 14:44):

bc space can't go there
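
[Editor's sketch] The rule as clarified above ("tabs are only for indentation and only tabs are for indentation") could be checked with something like the following. This is a minimal illustration, not Roc's parser: the function name `check_whitespace` and the error messages are made up here, and the sketch treats every line the same way, so it knows nothing about string literals.

```python
def check_whitespace(source: str) -> list[tuple[int, int, str]]:
    """Return (line, column, message) for each violation; both are 1-based.

    Hypothetical helper illustrating the proposed rule only.
    """
    errors = []
    for line_no, line in enumerate(source.split("\n"), start=1):
        # The run of tabs at the start of the line is the indentation.
        indent = len(line) - len(line.lstrip("\t"))
        rest = line[indent:]
        # A space right at the start, or right after the tabs, is an error.
        if rest.startswith(" "):
            errors.append((line_no, indent + 1, "space used for indentation"))
        # Any tab after the indentation is an error.
        tab_at = rest.find("\t")
        if tab_at != -1:
            errors.append((line_no, indent + tab_at + 1, "tab outside indentation"))
    return errors
```

For example, `check_whitespace("a =\n\t b")` reports the space after the tab on line 2, which matches the "\n followed by tabs followed by a space" case above.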

Joshua Warner (Sep 14 2024 at 17:28):

Especially at first, the parser needs to accept both in order to facilitate auto-upgrading things via formatting

Joshua Warner (Sep 14 2024 at 17:30):

The way this works in Python is that indentation is tracked as the tuple (num_spaces, num_tabs). Indented lines must have exactly one of those numbers be greater, relative to the parent scope, and when dedenting again, we must go back to a tuple value exactly equal to one of the parent scopes.

Joshua Warner (Sep 14 2024 at 17:31):

This is actually the way I have things implemented in a couple of my experiments with changing the parser to tokenize things first - in which case this transformation is handled in the tokenizer.

Joshua Warner (Sep 14 2024 at 17:32):

(these experiments are not ready for prime time!)
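
[Editor's sketch] The (num_spaces, num_tabs) scheme described above might look roughly like this. It follows the description in this thread rather than CPython's actual tokenizer or the experimental Roc tokenizer, and it reads "exactly one of those numbers be greater" as "one grows while the other stays equal", which is an assumption. The names `indent_key` and `indent_depths` are hypothetical.

```python
def indent_key(line: str) -> tuple[int, int]:
    """Count the leading whitespace of a line as (num_spaces, num_tabs)."""
    lead = line[: len(line) - len(line.lstrip(" \t"))]
    return (lead.count(" "), lead.count("\t"))


def indent_depths(lines: list[str]) -> list[int]:
    """Return the nesting depth of each non-blank line, or raise ValueError."""
    stack = [(0, 0)]  # one (num_spaces, num_tabs) entry per open scope
    depths = []
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines do not open or close scopes
        key = indent_key(line)
        top = stack[-1]
        more_spaces = key[0] > top[0] and key[1] == top[1]
        more_tabs = key[1] > top[1] and key[0] == top[0]
        if key == top:
            pass  # same scope as the previous line
        elif more_spaces or more_tabs:
            stack.append(key)  # indent: exactly one count grew
        elif key in stack:
            while stack[-1] != key:
                stack.pop()  # dedent back to an enclosing scope
        else:
            raise ValueError(f"line {line_no}: inconsistent indentation {key}")
        depths.append(len(stack) - 1)
    return depths
```

With this reading, a block indented with a tab can contain a deeper block indented with a tab plus two spaces, but a line indented with two spaces alone directly under a tab-indented line is rejected, because it matches no enclosing scope.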

Trevor Settles (Sep 14 2024 at 20:08):

I don't remember if Roc has multi-line strings, but that could add a possible extra little complication. There have been times, in other programming languages, where I've created a game board in ASCII-art style. In those situations, it's common to have spaces at the start of an indented line.

Roc has other ways around this, for example importing a text file as a string, though.

Richard Feldman (Sep 14 2024 at 21:17):

oh good point

Richard Feldman (Sep 14 2024 at 21:19):

I guess we'd need to allow tabs followed by spaces specifically in multiline strings

Richard Feldman (Sep 14 2024 at 21:19):

in that case the tabs are still unambiguously for indentation though

Aurélien Geron (Sep 14 2024 at 21:30):

It's not often I change my mind so fast: I was rather strongly anti-tabs, and this great discussion made me pro-tabs. :+1:

Kasper Møller Andersen (Sep 15 2024 at 05:53):

Can the formatter detect spaces used for indentation and replace them with tabs in this case?

Writing Elm, it's at least a fairly regular thing that elm-format has been stumped by the indentation, typically if my IDE has inserted some code at some indentation level that was already "taken". In that case, I have to manually disambiguate the indentation, but if it's a case where the next indentation levels are already taken as well, then me indenting the new code with a tab will just indent the code to a new level that's also taken, and I'll have to disambiguate more parts of the code. Whereas if I can use a space to disambiguate, the formatter can tell where my new code fits in the hierarchy between the existing tabs, and make everything line up after that.

Basically, I don't care about spaces or tabs, as long as it gets out of my way :big_smile:

Joshua Warner (Sep 15 2024 at 17:03):

I would want to try reasonably hard to accept such mis-indented input

Joshua Warner (Sep 15 2024 at 17:05):

One possible way to do that fairly cheaply would be to try parsing the input unmodified, and if we hit an error that seems to be indicative of misapplied mixed tabs+spaces, we could convert tabs to spaces (trying 4 sp and 8 sp tabs), and see if either of those results in a successful parse.

Joshua Warner (Sep 15 2024 at 17:06):

That does have some potential caveats, in particular the fact that we'd no longer be using the unmodified input, and so we'd either need to accept that error offsets may be incorrect, or put some effort into mapping the offsets back to the original file
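
[Editor's sketch] The retry idea above could be structured roughly as follows. The `parse` callable and the `ParseError` type are placeholders for whatever the real parser exposes, and the sketch retries on any parse error in a file that contains tabs, which is cruder than detecting errors "indicative of misapplied mixed tabs+spaces". It also returns the tab width that worked, since that is the minimum a caller would need to start mapping offsets back to the original file.

```python
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


class ParseError(Exception):
    """Placeholder for the parser's real error type (hypothetical)."""


def parse_with_tab_fallback(
    source: str,
    parse: Callable[[str], T],
    tab_widths: Tuple[int, ...] = (4, 8),
) -> Tuple[T, Optional[int]]:
    """Parse `source`; on failure, retry with tabs expanded to spaces.

    Returns (result, width), where width is None if the unmodified input
    parsed, or the tab width that made the parse succeed.
    """
    try:
        return parse(source), None
    except ParseError:
        if "\t" not in source:
            raise  # nothing to convert, so report the original error
        for width in tab_widths:
            # str.expandtabs is column-aware and restarts at each newline.
            # Note that it also expands tabs inside string literals.
            candidate = source.expandtabs(width)
            try:
                return parse(candidate), width
            except ParseError:
                continue
        raise  # every attempt failed: re-raise the original error
```

Because the successful parse ran on modified text, any offsets in the result refer to `candidate`, not `source`, which is exactly the caveat raised above.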

Peter Marreck (Sep 19 2024 at 01:48):

I once wrote a spaces-to-tabs formatter (mainly in Bash and Awk) and it used a bit of statistical analysis to figure out what the most common indentations were in spaces in that document, and what their greatest common factor was, in order to deduce the number of spaces per tab.
It then replaced those leading spaces (as a multiple of the number of spaces per tab deduced from the content) with tabs. If there were any spaces left over afterwards, it left them in after the tabs and left them alone.
This seemed to work well with few downsides. Any hypothetical roc format formatter might do this as a first stab at the problem, and the parser would only have to look at the number of leading tabs to determine significant indentation; any spaces after that would be assumed to be "visual adjustment" for things like the aforementioned ASCII art gameboard data and would be considered "insignificant indentation".
You could even test this out without making anyone change their codebases by installing an extra pass before the parser that transformed the code in this way before parsing it (overhead aside), in case you wanted an "overlap period" during which both indentations would be acceptable, prior to deprecation. (The downside to that is that errors might end up being confusing.)
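
[Editor's sketch] A rough Python rendering of the approach described above (the original was Bash and Awk and is not shown in this thread, so this is a reconstruction under assumptions). It takes the most common leading-space widths, uses their greatest common divisor as the spaces-per-tab guess, and leaves any leftover spaces after the tabs. The function names and the `top_n` cutoff are invented for the illustration.

```python
from collections import Counter
from functools import reduce
from math import gcd


def leading_spaces(line: str) -> int:
    """Number of space characters at the start of the line."""
    return len(line) - len(line.lstrip(" "))


def guess_indent_width(lines: list[str], top_n: int = 2) -> int:
    """Guess spaces-per-tab as the GCD of the most common indent widths."""
    counts = Counter(leading_spaces(l) for l in lines if l.strip())
    counts.pop(0, None)  # unindented lines carry no information
    if not counts:
        return 0
    common = [width for width, _ in counts.most_common(top_n)]
    return reduce(gcd, common)


def spaces_to_tabs(source: str) -> str:
    """Replace leading spaces with tabs, keeping leftover spaces after them."""
    lines = source.split("\n")
    width = guess_indent_width(lines)
    if width == 0:
        return source  # no space indentation found
    out = []
    for line in lines:
        n = leading_spaces(line)
        tabs, leftover = divmod(n, width)
        out.append("\t" * tabs + " " * leftover + line[n:])
    return "\n".join(out)
```

Restricting the GCD to the most common widths is what keeps a single oddly indented line (ASCII art, say) from dragging the guess down to 1. That line ends up as tabs plus a few trailing spaces, the "visual adjustment" case described above.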

Last updated: Jul 23 2026 at 13:15 UTC