Design: Indents and Blocks · ideas

Joshua Warner (Feb 23 2023 at 02:44):

Hi! I've been thinking about how we could make some small adjustments to how Roc interprets newlines and indentation, to make the syntax more consistent, more understandable, and easier to parse.

Pending some questions at the bottom, I believe this should be a no-op change for almost all code in the wild. I expect few if any people will need to change code to get it to continue to parse and mean exactly the same thing it did before.

Richard Feldman (Feb 23 2023 at 02:48):

this was the rule in CoffeeScript and it seemed to work out fine from what I can remember

Richard Feldman (Feb 23 2023 at 02:49):

I've historically shied away from it because it makes it more obvious that the language is One Of Those Indentation-Sensitive Languages, but maybe we should just embrace that even though it doesn't (currently) come up as often as it does in, say, Python or CoffeeScript

Richard Feldman (Feb 23 2023 at 02:50):

like I don't think people lump Haskell or Elm in with Python and CoffeeScript even though they have just as much indentation-sensitivity as Roc does today

Richard Feldman (Feb 23 2023 at 02:51):

although I guess defs in Roc not having let and in as delimiters does make the indentation considerations mentioned in the writeup more prominent

Richard Feldman (Feb 23 2023 at 03:00):

Joshua Warner (Feb 23 2023 at 03:02):

Joshua Warner (Feb 23 2023 at 03:03):

What I'm suggesting is that we enforce that constraint (effectively, that you can't end a block in a statement nor have an expression in the middle of a block) in canonicalization rather than enforcing it in the parser.
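
A minimal Rust sketch of what that later check could look like, assuming a made-up `Item` type and `check_block` function (neither is the real Roc AST or canonicalization API). The parser would accept any sequence of defs and expressions, and this pass would reject the shapes that aren't allowed:

```rust
// Hypothetical, simplified block items; not the real Roc AST.
#[derive(Debug, Clone, PartialEq)]
enum Item {
    Def(String),  // a statement such as `x = 1`
    Expr(String), // an expression such as `x + 1`
}

#[derive(Debug, PartialEq)]
enum BlockError {
    Empty,
    ExprInMiddle { index: usize },
    MissingFinalExpr,
}

/// Enforce "zero or more defs, then exactly one final expression"
/// after parsing, instead of baking the rule into the grammar.
fn check_block(items: &[Item]) -> Result<(), BlockError> {
    let (last, init) = match items.split_last() {
        Some(pair) => pair,
        None => return Err(BlockError::Empty),
    };

    // Everything before the last item must be a def.
    for (index, item) in init.iter().enumerate() {
        if matches!(item, Item::Expr(_)) {
            return Err(BlockError::ExprInMiddle { index });
        }
    }

    // The last item must be an expression.
    match last {
        Item::Expr(_) => Ok(()),
        Item::Def(_) => Err(BlockError::MissingFinalExpr),
    }
}
```

Because the check sees the whole block at once, it can report which item is in the wrong place rather than failing at the first unexpected token.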

Richard Feldman (Feb 23 2023 at 03:05):

Joshua Warner (Feb 23 2023 at 03:06):

Richard Feldman (Feb 23 2023 at 03:06):

Joshua Warner (Feb 23 2023 at 03:07):

Richard Feldman (Feb 23 2023 at 03:07):

I vaguely remember thinking there was an interesting potential design for something if we had the "indentation is required to continue an expression or statement" design, but I don't recall exactly what it was at the moment :thinking:

Joshua Warner (Feb 23 2023 at 03:08):

Joshua Warner (Feb 23 2023 at 03:10):

... and, even more confusingly, what does this look like in terms of where the ',' is required to be, separating the values?

Joshua Warner (Feb 23 2023 at 03:11):

One option would be to allow statements inside parens - but to specifically require that tuples must be on a single line.

Richard Feldman (Feb 23 2023 at 03:12):

from a teaching perspective I think it's better if we allow them, in the sense that otherwise the rules have to state "...except if the expression has parens around it, in which case you can't, also inside certain collection literals it's not allowed either"

Richard Feldman (Feb 23 2023 at 03:13):

Joshua Warner (Feb 23 2023 at 03:14):

That throws a bit of a wrench in the works of the idea to do indent checking purely in the lexer

Richard Feldman (Feb 23 2023 at 03:15):

Richard Feldman (Feb 23 2023 at 03:16):

Joshua Warner (Feb 23 2023 at 03:16):

The question is - how does the lexer know that it should emit an INDENT token and check the indentation of the following statements/expressions as part of a block?
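
For reference, here is a rough Rust sketch of the usual indentation-stack approach (the Python-style one), not anything Roc does today. `Token`, `lex_indentation`, and the choice to treat each line as one opaque `Line` token are all assumptions made for illustration. Note that it emits INDENT for every deeper line, including plain continuation lines, which is exactly the ambiguity in question:

```rust
// Hypothetical token type; `Line` stands in for the real tokens on a line.
#[derive(Debug, PartialEq)]
enum Token {
    Indent,
    Dedent,
    Line(String),
}

/// Naive indentation lexer: keeps a stack of open indentation widths
/// (counting leading spaces only) and emits INDENT/DEDENT on changes.
fn lex_indentation(source: &str) -> Result<Vec<Token>, String> {
    let mut stack: Vec<usize> = vec![0];
    let mut tokens = Vec::new();

    for (line_no, line) in source.lines().enumerate() {
        let trimmed = line.trim_start_matches(' ');
        if trimmed.is_empty() {
            continue; // blank lines never change the block structure
        }
        let width = line.len() - trimmed.len();
        let current = *stack.last().unwrap();

        if width > current {
            // Deeper than the enclosing block: open a new one.
            stack.push(width);
            tokens.push(Token::Indent);
        } else {
            // Close every block that is deeper than this line.
            while width < *stack.last().unwrap() {
                stack.pop();
                tokens.push(Token::Dedent);
            }
            if width != *stack.last().unwrap() {
                return Err(format!("line {}: inconsistent dedent", line_no + 1));
            }
        }
        tokens.push(Token::Line(trimmed.to_string()));
    }

    // Close any blocks still open at the end of the input.
    while stack.len() > 1 {
        stack.pop();
        tokens.push(Token::Dedent);
    }
    Ok(tokens)
}
```

A lexer like this has no way to tell a block from a multi-line expression on its own; that decision needs either parser context or a rule about which tokens can open a block.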

Richard Feldman (Feb 23 2023 at 03:17):

Richard Feldman (Feb 23 2023 at 03:18):

I assumed the idea is that it just produces a stream of tokens and then it's totally up to the parser where expressions and statements begin and end, based on the tokens it encounters

Richard Feldman (Feb 23 2023 at 03:19):

do you mean that the algorithm for deciding whether to emit an INDENT or DEDENT (as opposed to just like "there are some spaces here") becomes tricky?

Joshua Warner (Feb 23 2023 at 03:19):

Richard Feldman (Feb 23 2023 at 03:20):

Joshua Warner (Feb 23 2023 at 03:20):

Also, we need to be careful about the distinction between 'blocks' (as I've been calling them) and exprs that just happen to be on multiple lines, but that don't contain '=' / etc (maybe a long binary op)

Joshua Warner (Feb 23 2023 at 03:21):

In the latter case, it feels weird to me to require all the lines of the expr to be exactly aligned

Joshua Warner (Feb 23 2023 at 03:21):

Richard Feldman (Feb 23 2023 at 03:24):

oh yeah, something I hadn't considered - does this mean that |> lines have to be indented? :thinking:

Joshua Warner (Feb 23 2023 at 03:24):

Richard Feldman (Feb 23 2023 at 03:24):

not anymore, they're at the same level of indentation as the preceding expression now

Joshua Warner (Feb 23 2023 at 03:24):

a
  |> foo
  |> bar
  |> etc

Joshua Warner (Feb 23 2023 at 03:25):

Richard Feldman (Feb 23 2023 at 03:25):

Joshua Warner (Feb 23 2023 at 03:25):

Richard Feldman (Feb 23 2023 at 03:25):

Joshua Warner (Feb 23 2023 at 03:25):

Joshua Warner (Feb 23 2023 at 03:27):

Ok, cool - I think under that rule we can make statements inside parens 'just work', along with any other collection.

Richard Feldman (Feb 23 2023 at 03:27):

Richard Feldman (Feb 23 2023 at 03:28):

Joshua Warner (Feb 23 2023 at 03:28):

Richard Feldman (Feb 23 2023 at 03:28):

like if we want to do a good job providing helpful errors, does splitting out the lexer help us, hurt us, or make basically no difference?

Joshua Warner (Feb 23 2023 at 03:31):

Outdents inside parens, square brackets, etc. - I think that's 100% solvable, since we can use the matching delimiters to work out what you really meant there.

Richard Feldman (Feb 23 2023 at 03:33):

Joshua Warner (Feb 23 2023 at 03:33):

foo = \a, b ->
    x = baz a b
y = bar x
    y + 1

Joshua Warner (Feb 23 2023 at 03:34):

The first def there will not end in an expression, so we'll get a canonicalization error

Joshua Warner (Feb 23 2023 at 03:35):

I think in that case the best way to disambiguate the user intent might be to look at the names involved in the second def.

Joshua Warner (Feb 23 2023 at 03:36):

If that references names not defined in the outer scope, but that _are_ defined as locals inside the broken def, we can probably infer the user intent here and give a helpful error.

Joshua Warner (Feb 23 2023 at 03:36):

(conveniently, canonicalization has exactly that information available, I think)
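
As a rough Rust sketch of that heuristic: the function name and the idea of passing scopes in as plain sets of names are assumptions for illustration, not how canonicalization actually represents scopes.

```rust
use std::collections::HashSet;

/// Hypothetical helper: given the names a top-level def refers to, the names
/// visible in the outer scope, and the locals of the preceding def that is
/// missing its final expression, guess whether the def was outdented by
/// mistake and really belongs inside the broken def.
fn looks_accidentally_outdented(
    referenced: &[&str],
    outer_scope: &HashSet<&str>,
    broken_def_locals: &HashSet<&str>,
) -> bool {
    referenced
        .iter()
        .any(|name| !outer_scope.contains(*name) && broken_def_locals.contains(*name))
}
```

In the example above, `y = bar x` refers to `x`, which is only defined inside `foo`, so the check would flag it; a def that refers only to `bar` would not be flagged.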

Richard Feldman (Feb 23 2023 at 03:38):

ok another question is performance, although that's easier to measure. We're introducing a new IR, and potentially a lot of memory to traverse; are we signing up for a bunch of cache misses?

Richard Feldman (Feb 23 2023 at 03:38):

I mean maybe it's fine because the whole parsing step is already so fast, maybe slowing it down a bit isn't noticeable

Joshua Warner (Feb 23 2023 at 03:42):

There would be some wins here. Specifically, there are some cases where the parser currently has to backtrack that could (probably?) be disambiguated by peeking a fixed number of tokens ahead in the input
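
To make the lookahead idea concrete, a small Rust sketch, assuming made-up `Tok`, `Cursor`, and `classify` names and a deliberately tiny grammar. Real Roc defs can have patterns on the left-hand side, so two tokens would not always be enough; this only shows the shape of the technique:

```rust
// Hypothetical token kinds for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tok {
    Ident,
    Equals,
    Colon,
    Other,
    Eof,
}

/// A read-only view into a pre-lexed token slice.
struct Cursor<'a> {
    tokens: &'a [Tok],
    pos: usize,
}

impl<'a> Cursor<'a> {
    /// Look `n` tokens ahead without consuming anything.
    fn peek(&self, n: usize) -> Tok {
        self.tokens.get(self.pos + n).copied().unwrap_or(Tok::Eof)
    }
}

#[derive(Debug, PartialEq)]
enum StmtKind {
    Def,
    Annotation,
    Expr,
}

/// Decide what kind of statement starts here by looking at two tokens,
/// instead of trying one parser and backtracking on failure.
fn classify(cursor: &Cursor<'_>) -> StmtKind {
    match (cursor.peek(0), cursor.peek(1)) {
        (Tok::Ident, Tok::Equals) => StmtKind::Def,
        (Tok::Ident, Tok::Colon) => StmtKind::Annotation,
        _ => StmtKind::Expr,
    }
}
```

With the tokens already in a flat slice, each `peek` is an index lookup, so deciding between the alternatives never re-reads the source text.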

Richard Feldman (Feb 23 2023 at 03:44):

:thinking: do you think that would be enough to compensate for the extra work of creating and then traversing the token stream?

Joshua Warner (Feb 23 2023 at 03:47):

Joshua Warner (Feb 23 2023 at 03:48):

const Parser = struct {
    gpa: Allocator,
    source: []const u8,

    token_tags: []const Token.Tag,
    token_starts: []const Ast.ByteOffset,
    tok_i: TokenIndex,

    // ...
};

Richard Feldman (Feb 23 2023 at 03:48):

sure, I just want to make sure we're considering the potential risks as well as the potential benefits :big_smile:

Richard Feldman (Feb 23 2023 at 03:48):

certainly it's possible to lex+parse quickly overall, but that doesn't mean it would necessarily be faster in our case - might be slower, even if still fast in the grand scheme of things

Joshua Warner (Feb 23 2023 at 03:48):

Richard Feldman (Feb 23 2023 at 03:50):

so the rules change seems reasonable to me, and given that, the implementation seems worth a try!

Richard Feldman (Feb 23 2023 at 03:50):

I'd just want to keep an eye on error message quality and performance, make sure they're both still good afterwards

Joshua Warner (Feb 23 2023 at 03:50):

Richard Feldman (Feb 23 2023 at 03:51):

Joshua Warner (Feb 23 2023 at 03:53):

FWIW, I think there will be some immediate wins in error messages - since right now accidentally putting an expr in the middle of a sequence of defs screws up the parse for the rest of the defs.

Joshua Warner (Feb 23 2023 at 03:57):

And on the perf side, we can actually implement these rules _without_ doing the separate lexer adjustment

Joshua Warner (Feb 23 2023 at 03:58):

And also, these rules make it easier to recover parsing after an error by looking for the next statement/expression in the nearest surrounding block (emitting a malformed statement / expression in the syntax tree). That means compilation can continue despite the broken code, and you can still run tests as long as they don't touch that code.
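
A toy Rust sketch of that recovery strategy, under heavy assumptions: statements start at column 0, indented lines continue the previous statement, and `Node`, `split_statements`, `parse_statement`, and `parse_block` are all invented for illustration. The point is only that a failed statement becomes a `Malformed` node and parsing resumes at the next statement boundary:

```rust
#[derive(Debug, PartialEq)]
enum Node {
    Def { name: String, body: String },
    Expr(String),
    Malformed(String), // keeps the bad source so later stages can report it
}

/// Split a block into statements: a line at column 0 starts a new statement,
/// an indented line continues the previous one.
fn split_statements(source: &str) -> Vec<String> {
    let mut statements: Vec<String> = Vec::new();
    for line in source.lines().filter(|l| !l.trim().is_empty()) {
        let indented = line.starts_with(' ');
        if indented && !statements.is_empty() {
            let prev = statements.last_mut().unwrap();
            prev.push(' ');
            prev.push_str(line.trim());
        } else {
            statements.push(line.trim().to_string());
        }
    }
    statements
}

/// Toy statement parser: `name = body` is a def, anything without `=`
/// is an expression, and a def with a missing side is an error.
fn parse_statement(stmt: &str) -> Result<Node, String> {
    match stmt.split_once('=') {
        Some((name, body)) if !name.trim().is_empty() && !body.trim().is_empty() => {
            Ok(Node::Def {
                name: name.trim().to_string(),
                body: body.trim().to_string(),
            })
        }
        Some(_) => Err(format!("incomplete definition: {}", stmt)),
        None => Ok(Node::Expr(stmt.to_string())),
    }
}

/// Parse every statement; a failure yields a Malformed node instead of
/// aborting, so the statements after it are still parsed.
fn parse_block(source: &str) -> Vec<Node> {
    split_statements(source)
        .into_iter()
        .map(|stmt| parse_statement(&stmt).unwrap_or_else(|_| Node::Malformed(stmt.clone())))
        .collect()
}
```

Since the indentation rule fixes where each statement begins, the parser always knows where to resume, and the broken statement stays in the tree for later stages to report.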

Kesanov (Feb 26 2023 at 08:54):

Anton (Feb 26 2023 at 09:11):

Indentation sensitivity has rarely bothered me but there are some issues that can come up:

dank (Feb 26 2023 at 11:27):

isn't it true to say though that structural editors can basically eliminate this whole class of problems?

dank (Feb 26 2023 at 11:28):

so that in roc building upon the roc editor we wouldn't have that much of an issue introducing this as a syntax constraint

Anton (Feb 26 2023 at 12:23):

Kiryl Dziamura (Jun 02 2024 at 11:26):

What’s the expected amount of work? Can we split it? I don't mind starting on it. However, the current parsing implementation has not yet settled in my head.

Kiryl Dziamura (Jun 04 2024 at 08:13):

The reason I'm bringing it up again is outlined here and it's also kind of a blocker for this problem

Richard Feldman (Jun 04 2024 at 10:45):

I suspect it’s a pretty big project, although I haven’t really thought about how it would work to modify the current parser to do it

Richard Feldman (Jun 04 2024 at 10:46):

as opposed to doing it as part of a larger change to a non-parser-combinator design

Joshua Warner (Jul 10 2024 at 03:40):

Joshua Warner (Jul 10 2024 at 03:41):

The last thing (I think / I hope) is squashing some bugs / assessing some changes in test_reporting

Joshua Warner (Jul 10 2024 at 03:41):

Snapshot: dbg_without_final_expression
Source: crates/compiler/load/tests/test_reporting.rs:5740
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Expression: golden
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
-old snapshot
+new results
────────────┬───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
    0       │-── INDENT ENDS AFTER EXPRESSION in tmp/dbg_without_final_expression/Test.roc ───
          0 │+── MISSING FINAL EXPRESSION in tmp/dbg_without_final_expression/Test.roc ───────
    1     1 │
    2       │-I am partway through parsing a dbg statement, but I got stuck here:
          2 │+I am partway through parsing a definition, but I got stuck here:
    3     3 │
          4 │+1│  app "test" provides [main] to "./platform"
          5 │+2│
          6 │+3│  main =
    4     7 │ 4│      dbg 42
    5     8 │               ^
    6     9 │
    7       │-I was expecting a final expression, like so
         10 │+This definition is missing a final expression. A nested definition
         11 │+must be followed by either another definition, or an expression
    8    12 │
    9       │-    dbg 42
   10       │-    "done"
         13 │+    x = 4
         14 │+    y = 2
         15 │+
         16 │+    x + y

Joshua Warner (Jul 10 2024 at 03:44):

The TL;DR there is that dbg expression parsing now only parses the dbg 42 part itself and then pops back up to a higher level to parse the rest of the block, so the natural error message is just a "definition missing final expr" error, rather than something specific to dbg.

Joshua Warner (Jul 10 2024 at 03:45):

Should be possible to recover that original dbg-specific error; it just needs some attention on each of the failures

Joshua Warner (Jul 10 2024 at 03:45):

If someone would be interested in helping out, assessing each of these test failures is probably fairly parallelizable

Luke Boswell (Jul 10 2024 at 03:56):

Hopefully we can have a new release of basic-ssg that supports Windows. It's been slow progress finding a compatible set of deps that Rust is happy with, and also finishing the removal of host rebuilding from roc.
