outdent parsing · ideas · Zulip Chat Archive

a = {
    b: c,
    d: {
        e: f,
    },
}

Richard Feldman (Feb 09 2022 at 12:54):

so the closing } at the end of this code is a parse error, because it's outdented

Richard Feldman (Feb 09 2022 at 12:54):

Richard Feldman (Feb 09 2022 at 12:55):

everything within the a definition is indented more than the letter a itself, unlike in the first example where the } at the end is at the same indentation level as a, indicating that the definition has ended

Richard Feldman (Feb 09 2022 at 12:56):

a = foo
bar baz

Richard Feldman (Feb 09 2022 at 12:56):

a = foo

bar baz

Richard Feldman (Feb 09 2022 at 12:57):

but if we don't recognize outdents as the end of expressions, then it becomes ambiguous
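
A rough way to picture the current rule, as a Python sketch rather than anything taken from the actual Roc parser (the names `indent_of` and `def_end_strict` are made up for illustration, and it assumes space-only indentation): a def keeps going until the first non-blank line that is indented no deeper than the def's first line.

```python
def indent_of(line: str) -> int:
    """Number of leading spaces on a line."""
    return len(line) - len(line.lstrip(" "))


def def_end_strict(lines: list[str], start: int) -> int:
    """Index one past the last line of the def that begins at `start`.

    Strict rule: the first non-blank line indented no deeper than the
    def's own first line ends the def, whatever delimiters are open.
    """
    base = indent_of(lines[start])
    i = start + 1
    while i < len(lines):
        if lines[i].strip() and indent_of(lines[i]) <= base:
            break
        i += 1
    return i


indented = ["a = foo", "    bar baz"]
outdented = ["a = foo", "bar baz"]
record = ["a = {", "    b: c,", "}"]

print(def_end_strict(indented, 0))   # 2: `bar baz` belongs to a's definition
print(def_end_strict(outdented, 0))  # 1: `bar baz` starts something new
print(def_end_strict(record, 0))     # 2: the `}` falls outside the def
```

In the last case the def ends with its `{` still open, which is why the closing `}` at column 0 shows up as a parse error.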

Richard Feldman (Feb 09 2022 at 12:58):

so the upside of the current rule is that it's simple, but the downside is that with delimiters like {, [, and (, formatting gets pretty widely spaced out (like the second example above, which is what the formatter does today)

Richard Feldman (Feb 09 2022 at 12:58):

it's also surprising to people that the first example doesn't work, because that's how it looks in most languages

Richard Feldman (Feb 09 2022 at 12:58):

so I want to explore an idea here of making the rule more complex, for the sake of allowing that

Richard Feldman (Feb 09 2022 at 12:59):

a = {
    b: c,
    d: {
        e: f,
    },
}

Richard Feldman (Feb 09 2022 at 12:59):

well, one idea is that we assume that if you outdent but are missing a closing delimiter, it means you're not done yet

Richard Feldman (Feb 09 2022 at 13:00):

in other words, "outdent with unclosed delimiter" is no longer an automatic parse error, but rather means "this expression must not be done yet, so continue parsing even though there was an outdent"
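
One way this modified rule could look, again as a hedged Python sketch and not the Roc parser's real logic (`def_end_relaxed` is a hypothetical name; the delimiter counting ignores string literals and comments, and indentation is assumed to be spaces only): an outdent only ends the def when no delimiter is still open.

```python
OPENERS = "{[("
CLOSERS = "}])"


def indent_of(line: str) -> int:
    """Number of leading spaces on a line."""
    return len(line) - len(line.lstrip(" "))


def def_end_relaxed(lines: list[str], start: int) -> int:
    """Index one past the last line of the def that begins at `start`.

    Relaxed rule: an outdented line ends the def only if every
    delimiter opened so far has already been closed.
    """
    base = indent_of(lines[start])
    depth = 0  # count of currently unclosed delimiters
    i = start
    while i < len(lines):
        line = lines[i]
        outdented = bool(line.strip()) and indent_of(line) <= base
        if i > start and outdented and depth == 0:
            break
        for ch in line:
            if ch in OPENERS:
                depth += 1
            elif ch in CLOSERS:
                depth -= 1
        i += 1
    return i


record = ["a = {", "    b: c,", "    d: {", "        e: f,", "    },", "}", "b = 1"]
print(def_end_relaxed(record, 0))    # 6: the closing `}` is now part of a

unclosed = ["a = {", "    b: c,", "b = 1"]
print(def_end_relaxed(unclosed, 0))  # 3: the next def gets swallowed
```

The second call shows the cost of the relaxed rule in this sketch: with a `{` that is never closed, nothing stops the def at `b = 1`.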

Richard Feldman (Feb 09 2022 at 13:00):

a = {
    b: c,
    d: {
        e: f,
    },
} foo bar

Richard Feldman (Feb 09 2022 at 13:01):

that one is sort of "obviously wrong" since records aren't functions, and you can't pass them arguments, but here's another example that actually could make sense:

a = foo {
    b: c,
    d: {
        e: f,
    },
} bar baz

Richard Feldman (Feb 09 2022 at 13:01):

should that be allowed? with the modified "keep parsing if there's an unclosed delimiter" rule, it would be allowed. Is that okay?

Richard Feldman (Feb 09 2022 at 13:02):

another argument is that the parser should accept it, and then the formatter should rewrite it to something that looks nicer - but then there's a reasonable question: specifically what would the formatter format that to that looks better?

Richard Feldman (Feb 09 2022 at 13:03):

a = foo {
    b: c,
    d: {
        e: f,
    },
} bar
baz

Richard Feldman (Feb 09 2022 at 13:04):

according to the "unclosed delimiter means keep parsing" rule, this should be accepted, but should be different from the previous one in that baz is no longer an argument to foo

Richard Feldman (Feb 09 2022 at 13:04):

because we had an outdent (with respect to a) and there was no unclosed delimiter to keep it part of a's definition

Richard Feldman (Feb 09 2022 at 13:04):

Richard Feldman (Feb 09 2022 at 13:05):

right now if we have an unclosed delimiter, we know pretty soon where it happened - the end of the def, which we detect as an outdent

Richard Feldman (Feb 09 2022 at 13:06):

if the rule changed, then we'd potentially not be able to detect it until later on

Richard Feldman (Feb 09 2022 at 13:06):

although, granted, if you had another def next (e.g. b = ... after a =), or an outdent to even further than a (indicating that the original def expression is now done) then maybe that wouldn't be too bad

Richard Feldman (Feb 09 2022 at 13:07):

Richard Feldman (Feb 09 2022 at 13:08):

a = {
    b: c,
    d: {
        e: f,
    },
} foo bar

a =
    {
        b: c,
        d:
            {
                e: f,
            },
    } foo bar

Richard Feldman (Feb 09 2022 at 13:08):

jan kili (Feb 09 2022 at 13:09):

Is it worth taking a step back and asking what we want Roc's formatting "culture"/"vibe"/"approach"/"system" to be? Is flexibility prioritized over consistency? Is formatting a task for bots? What are the hard and fast rules that newbies can learn on day one that will guide their expectations?

Richard Feldman (Feb 09 2022 at 13:09):

    a = {
        b: c,
        d: {
            e: f,
    },
} foo bar

does unclosed delimiter mean outdents get ignored completely? or just that you're allowed specifically to outdent to the same level as a but no further?

Folkert de Vries (Feb 09 2022 at 13:09):

jan kili (Feb 09 2022 at 13:09):

(maybe those questions are already answered or out of scope, and this is just a spot fix)

Folkert de Vries (Feb 09 2022 at 13:10):

it should recognize both that it is parsing a record, and something is wrong with the indentation

Folkert de Vries (Feb 09 2022 at 13:10):

also given the editor, I'd prefer the parser to be simple and a bit on the strict side

Richard Feldman (Feb 09 2022 at 13:10):

Anton (Feb 09 2022 at 13:12):

Some types of flexibility greatly complicate the parser, so that's also something to consider.

jan kili (Feb 09 2022 at 13:12):

(side question, does the Roc CLI provide Prettier-style file formatting today? If not, is that on the way?)

Folkert de Vries (Feb 09 2022 at 13:12):

Folkert de Vries (Feb 09 2022 at 13:13):

Anton (Feb 09 2022 at 13:15):

We have cargo run format dir/file.roc working, or do you have something else in mind?

jan kili (Feb 09 2022 at 13:16):

Ooh! Does that get bundled into the build as roc format dir/file.roc? I almost exclusively use the built CLI executable.

Anton (Feb 09 2022 at 13:17):

jan kili (Feb 09 2022 at 13:18):

Great. Recursive directory-wide formatting would be nice, but this will help a lot. Thank you!

Richard Feldman (Feb 09 2022 at 13:20):

Richard Feldman (Feb 09 2022 at 13:21):

Richard Feldman (Feb 09 2022 at 13:22):

this is the one case I can think of in the current parser where the rule is surprising to people in practice

jan kili (Feb 09 2022 at 13:23):

Should the formatter accept input that the parser can't, in order to clean it up? Should the formatter have warning/error messages of its own?

Richard Feldman (Feb 09 2022 at 13:23):

oh, I forgot to mention: the way other languages with ML syntax (e.g. Elm, Haskell) deal with this is to format it like the following:

a =
    { b: c
    , d:
        { e: f
        }
    } foo bar

Richard Feldman (Feb 09 2022 at 13:23):

so you have leading commas instead of trailing commas, and never outdent to the level of the initial def
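
To make the leading-comma layout concrete, here is a small Python sketch that prints a nested record in that style. It only illustrates the shape and is not how elm-format or the Roc formatter is implemented; `record_lines` is a hypothetical helper, records are modelled as non-empty Python dicts, and the indent step of 4 is an assumption.

```python
def record_lines(record: dict, indent: int) -> list[str]:
    """Lay out a non-empty record with leading commas.

    The first field is introduced by "{ ", every later field by ", ",
    and the closing "}" sits in the same column as the opening "{".
    """
    pad = " " * indent
    lines = []
    for n, (key, value) in enumerate(record.items()):
        lead = "{ " if n == 0 else ", "
        if isinstance(value, dict):
            # A nested record starts on its own line, one level deeper.
            lines.append(f"{pad}{lead}{key}:")
            lines.extend(record_lines(value, indent + 4))
        else:
            lines.append(f"{pad}{lead}{key}: {value}")
    lines.append(pad + "}")
    return lines


print("a =")
print("\n".join(record_lines({"b": "c", "d": {"e": "f"}}, 4)))
```

Every line it emits is indented at least one level past `a`, so nothing ever outdents to the level of the initial def.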

Richard Feldman (Feb 09 2022 at 13:23):

jan kili (Feb 09 2022 at 13:24):

This could be alleviated with a clear order-of-operations lesson in a tutorial - the reason it's surprising is that in most languages delimiters have higher precedence than indentation

Richard Feldman (Feb 09 2022 at 13:24):

because I have seen people literally lose interest in learning Elm just because this looks so alien

Richard Feldman (Feb 09 2022 at 13:25):

and although I don't think that's a good reason to walk away from a language, as many Lisp people will report - people not using a language because the syntax looks too aesthetically displeasing is a very real thing

Richard Feldman (Feb 09 2022 at 13:25):

Richard Feldman (Feb 09 2022 at 13:26):

huh, interesting! I actually hadn't thought about this - the only languages I know of where indentation matters are Python, CoffeeScript, Elm, and Haskell

Richard Feldman (Feb 09 2022 at 13:26):

jan kili (Feb 09 2022 at 13:27):

Richard Feldman (Feb 09 2022 at 13:31):

I have to admit - personally, having tried the "outdent is the only rule, ignore unclosed delimiters" rule for a couple of years now, I still aesthetically prefer how the more typical formatting looks :big_smile:

Richard Feldman (Feb 09 2022 at 13:31):

Richard Feldman (Feb 09 2022 at 13:32):

Anton (Feb 09 2022 at 13:39):

a = {
    b: c,
    d: {
        e: f,
    },
}

Those parsing rules are not very complicated but I'm not sure how well it would work for errors.

Richard Feldman (Feb 09 2022 at 13:41):

Richard Feldman (Feb 09 2022 at 13:42):

the current parser on trunk won't accept that, but maybe it's a quick change to accept it? :thinking:

Anton (Feb 09 2022 at 14:27):

Well, I'm not sure :p I've looked at the current parser quite a bit but I don't feel I understand it well. For the grammar I also rely on the tokenizer; adding the tokenizer to the current parser would take some work.

Richard Feldman (Feb 09 2022 at 14:57):

button : {
    label : Elem state,
    onPress : state, PressEvent -> Action state,
}
-> Elem state

Richard Feldman (Feb 09 2022 at 14:58):

if so, I think that requires backtracking (could be wrong), which is bad for parsing performance

Richard Feldman (Feb 09 2022 at 14:58):

button : {
    label : Elem state,
    onPress : state, PressEvent -> Action state,
} -> Elem state

Richard Feldman (Feb 09 2022 at 14:58):

button : {
    label : Elem state,
    onPress : state, PressEvent -> Action state,
} -> {
    blah : Str,
    thing : Etc,
}

Richard Feldman (Feb 09 2022 at 14:58):

Richard Feldman (Feb 09 2022 at 14:59):

maybe the formatter could reformat it to something else, but I'm not sure what that would be!

jan kili (Feb 09 2022 at 15:36):

a = {
    ...
}

a =
    {
        ...
    }

a =
    b = foo
    c = bar
    {
        d: b + c,
        ...
    }

jan kili (Feb 09 2022 at 15:39):

button :
    {
        label : Elem state,
        onPress : state, PressEvent -> Action state,
    } -> {
        blah : Str,
        thing : Etc,
    }

Its consistency with value defs is nice, but it doesn't benefit from the same "extensible with pre-return defs" pattern... unless Roc has some crazy type def block feature like

button :
    a :
        {
            label : Elem state,
            onPress : state, PressEvent -> Action state,
        }
    b :
        {
            blah : Str,
            thing : Etc,
        }
    a -> b

Richard Feldman (Feb 09 2022 at 18:14):

Richard Feldman (Feb 09 2022 at 18:15):

-> Blah

} -> Blah

actually read pretty similarly even though the -> isn't quite at the beginning of the line in the latter

Joshua Warner (Feb 09 2022 at 18:27):

My general thoughts: I'm fairly ambivalent on exactly what the "correct according to the formatter" way to indent things is - but I think it's pretty important for beginners that the parser is as forgiving as it can reasonably be. I've put down more than one language because I spent hours fighting with nuances of the syntax and got frustrated.

For this reason, I'd personally be willing to bend over backwards in the parser to accept as many of these indentation/newline combinations as possible - optionally issuing warnings and/or autocorrects in case things might be ambiguous.

Ju Liu (Feb 09 2022 at 19:25):

is there a chance of someone calling a function and passing a list of arguments over multiple lines? such as:

myRealSweetFunction {
  foo: 1,
  bar: 2
} secondArg thirdArg

foobar : { a : Int } -> Int -> Int -> Int
foobar _ _ _ = 0

magicValue : Int
magicValue =
    foobar {
      a = 99
    } 1 2

Richard Feldman (Feb 09 2022 at 21:02):

that should parse, although I think I'd want the formatter to put the second and third args on their own lines

Pit Capitain (Feb 10 2022 at 06:49):

I would second this, IF we would write Roc code in a normal text editor and then hand it off to the parser/compiler. But I expect that our editor will be smart enough to detect and correct wrong indents on the fly and even explain what was wrong and how to avoid such mistakes in the future.

Joshua Warner (Feb 10 2022 at 16:35):

Yep, agree that the editor can be smarter here. However, I think making a normal text editor "inviting" will be a critical path for onboarding new users. IMO, the "normal text editor" experience of roc needs to be on par with other languages - and the roc-editor experience needs to be even better.

I don't think having the roc-editor is an acceptable excuse for making the normal-text-editor experience painful.

Chad Stearns (Feb 14 2022 at 03:13):

a = {
    b: c,
    d: {
        e: f,
    },
} foo bar

And maybe illegal for the closing brace to have an indent level lower than the line of the opening brace.
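
That closing-brace restriction could be checked along these lines. This is a Python sketch under assumptions, not a proposal for the real parser: `misindented_closers` is a made-up name, indentation is spaces only, and delimiters inside strings or comments are not handled.

```python
def misindented_closers(source: str) -> list[int]:
    """1-based line numbers where a closing delimiter sits on a line
    indented less than the line that holds its opening delimiter."""
    open_indents = []  # indentation of the line of each unclosed opener
    bad_lines = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        indent = len(line) - len(line.lstrip(" "))
        for ch in line:
            if ch in "{[(":
                open_indents.append(indent)
            elif ch in "}])" and open_indents:
                if indent < open_indents.pop():
                    bad_lines.append(lineno)
    return bad_lines


ok = "a = {\n    b: c,\n    d: {\n        e: f,\n    },\n} foo bar"
odd = "    a = {\n        b: c,\n        d: {\n            e: f,\n    },\n} foo bar"

print(misindented_closers(ok))   # []
print(misindented_closers(odd))  # [5, 6]
```

The first input is accepted because each `}` lines up with the line that opened it; in the second, both closers are outdented past their openers and get flagged.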

a = foo {
    b: c,
    d: {
        e: f,
    },
} bar
baz

a = foo
    {
        b: c,
        d: {
            e: f,
        },
    }
    bar
    baz

.. because some kind of multi-line rule will kick in, that requires "if any part of this expression is multiline, then the whole thing should be"

Chad Stearns (Feb 14 2022 at 03:17):

I accidentally violate this same line syntax rule for brackets every time I approach some Roc code - just because of what I am familiar with in other languages - so I definitely see the value in making a syntax rule exception for opening and closing braces like this in order to make the language more accessible to new people. And given that it's more accessible, why not just make that the default all the time?

Richard Feldman (Feb 14 2022 at 04:23):

a = foo {
    b: c,
    d: {
        e: f,
    },
} bar
baz

I think this one has to parse differently than the first one, because the baz at the end would be treated as the expression at the end of a def
