Stream: ideas

Topic: outdent parsing


view this post on Zulip Richard Feldman (Feb 09 2022 at 12:53):

so today, this doesn't parse:

a = {
    b: c,
    d: {
        e: f,
    },
}

this is for a pretty simple reason: defs are defined to end when you outdent

view this post on Zulip Richard Feldman (Feb 09 2022 at 12:54):

so the closing } at the end of this code is a parse error, because it's outdented

view this post on Zulip Richard Feldman (Feb 09 2022 at 12:54):

if you instead wrote it like this:

a =
    {
        b: c,
        d:
            {
                e: f,
            },
    }

view this post on Zulip Richard Feldman (Feb 09 2022 at 12:55):

...then it parses successfully, because there's no outdent at the end

view this post on Zulip Richard Feldman (Feb 09 2022 at 12:55):

everything within the a definition is indented more than the letter a itself, unlike in the first example where the } at the end is at the same indentation level as a, indicating that the definition has ended
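
A rough way to picture the current rule: a def keeps absorbing lines until it reaches one that is not indented further than the line that started it. The Python sketch below only illustrates that idea. It is not Roc's actual parser, the names `split_defs_strict` and `delimiter_balance` are made up for this example, and it counts delimiters naively (ignoring strings and comments).

```python
from typing import List

OPENERS = "{[("
CLOSERS = "}])"


def indent_of(line: str) -> int:
    """Number of leading spaces on a line."""
    return len(line) - len(line.lstrip(" "))


def split_defs_strict(source: str) -> List[List[str]]:
    """Group lines into defs; any outdent back to the def's column ends it."""
    defs: List[List[str]] = []
    start_indent = None
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines never end a def
        if start_indent is None or indent_of(line) <= start_indent:
            defs.append([line])  # outdent: a new group starts here
            start_indent = indent_of(line)
        else:
            defs[-1].append(line)  # still inside the current def
    return defs


def delimiter_balance(lines: List[str]) -> int:
    """Openers minus closers; a positive result means something is unclosed."""
    text = "\n".join(lines)
    return sum(text.count(c) for c in OPENERS) - sum(text.count(c) for c in CLOSERS)


compact = "a = {\n    b: c,\n    d: {\n        e: f,\n    },\n}"
groups = split_defs_strict(compact)
# The final "}" sits in a's column, so it is cut off from the def,
# and the first group is left with an unclosed "{".
print(len(groups), delimiter_balance(groups[0]))
```

With the widely spaced version of the same record, every line after `a =` is indented further than `a`, so the sketch keeps it as a single group with balanced delimiters.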

view this post on Zulip Richard Feldman (Feb 09 2022 at 12:56):

so why is this rule important?

view this post on Zulip Richard Feldman (Feb 09 2022 at 12:56):

consider this example:

view this post on Zulip Richard Feldman (Feb 09 2022 at 12:56):

a = foo
bar baz

view this post on Zulip Richard Feldman (Feb 09 2022 at 12:56):

is this a = foo bar baz? or is it this expression:

a = foo

bar baz

view this post on Zulip Richard Feldman (Feb 09 2022 at 12:57):

today, we know it's the latter, because of the outdent

view this post on Zulip Richard Feldman (Feb 09 2022 at 12:57):

but if we don't recognize outdents as the end of expressions, then it becomes ambiguous

view this post on Zulip Richard Feldman (Feb 09 2022 at 12:58):

so the upside of the current rule is that it's simple, but the downside is that with delimiters like {, [, and (, formatting gets pretty widely spaced out (like the second example above, which is what the formatter does today)

view this post on Zulip Richard Feldman (Feb 09 2022 at 12:58):

it's also surprising to people that the first example doesn't work, because that's how it looks in most languages

view this post on Zulip Richard Feldman (Feb 09 2022 at 12:58):

so I want to explore an idea here of making the rule more complex, for the sake of allowing that

view this post on Zulip Richard Feldman (Feb 09 2022 at 12:59):

so let's say this parsed:

a = {
    b: c,
    d: {
        e: f,
    },
}

view this post on Zulip Richard Feldman (Feb 09 2022 at 12:59):

what would that require?

view this post on Zulip Richard Feldman (Feb 09 2022 at 12:59):

well, one idea is that we assume that if you outdent but are missing a closing delimiter, it means you're not done yet

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:00):

in other words, "outdent with unclosed delimiter" is no longer an automatic parse error, but rather means "this expression must not be done yet, so continue parsing even though there was an outdent"
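
As a hedged illustration of this modified rule, the Python sketch below ends a def at an outdent only when no delimiter is still open. It is a toy model rather than the real parser: `split_defs_lenient` is a hypothetical name, and the delimiter counting ignores strings and comments.

```python
from typing import List

OPENERS = "{[("
CLOSERS = "}])"


def split_defs_lenient(source: str) -> List[List[str]]:
    """Group lines into defs; an outdent ends a def only if nothing is unclosed."""
    defs: List[List[str]] = []
    start_indent = None
    depth = 0  # number of delimiters currently open
    for line in source.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        outdented = start_indent is None or indent <= start_indent
        if outdented and depth == 0:
            defs.append([line])  # nothing open: the outdent really ends the def
            start_indent = indent
        else:
            defs[-1].append(line)  # indented, or an opener is still waiting
        depth += sum(line.count(c) for c in OPENERS)
        depth -= sum(line.count(c) for c in CLOSERS)
    return defs


record = "a = {\n    b: c,\n    d: {\n        e: f,\n    },\n}"
print(len(split_defs_lenient(record)))           # the closing "}" stays with a
print(len(split_defs_lenient("a = foo\nbar baz")))  # no open delimiter: two groups
```

Under this sketch the `a = foo` / `bar baz` case from earlier still splits at the outdent, because nothing is open when `bar baz` is reached.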

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:00):

that would solve the above case, but would also allow this:

a = {
    b: c,
    d: {
        e: f,
    },
} foo bar

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:01):

that one is sort of "obviously wrong" since records aren't functions, and you can't pass them arguments, but here's another example that actually could make sense:

a = foo {
    b: c,
    d: {
        e: f,
    },
} bar baz

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:01):

should that be allowed? with the modified "keep parsing if there's an unclosed delimiter" rule, it would be allowed. Is that okay?

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:02):

one argument is that it's fine, even if it looks a bit weird

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:02):

another argument is that the parser should accept it, and then the formatter should rewrite it to something that looks nicer - but then there's a reasonable question: specifically what would the formatter format that to that looks better?

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:03):

another question here is: what about this one?

a = foo {
    b: c,
    d: {
        e: f,
    },
} bar
baz

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:04):

according to the "unclosed delimiter means keep parsing" rule, this should be accepted, but should be different from the previous one in that baz is no longer an argument to foo

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:04):

because we had an outdent (with respect to a) and, by that point, there was no unclosed delimiter left to keep baz part of a's definition

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:04):

so that's unambiguous, but perhaps surprising

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:04):

one more concern here: what does it do to parsing errors?

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:04):

an unclosed delimiter is a common mistake to make

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:05):

right now if we have an unclosed delimiter, we know pretty soon where it happened - the end of the def, which we detect as an outdent

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:06):

if the rule changed, then we'd potentially not be able to detect it until later on

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:06):

although, granted, if you had another def next (e.g. b = ... after a =), or an outdent to even further than a (indicating that the original def expression is now done) then maybe that wouldn't be too bad
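
To make the error-reporting tradeoff concrete, here is a small Python sketch that returns the line where an unclosed delimiter would first be noticed under each rule. Everything in it is an assumption made for illustration: `unclosed_reported_at` is a made-up name, "looks like a new def" is approximated with a simple regex, and delimiters are counted naively.

```python
import re
from typing import Optional

OPENERS = "{[("
CLOSERS = "}])"
# Assumption: a line such as "b = ..." is treated as the start of a new def.
DEF_START = re.compile(r"\s*\w+\s*=(\s|$)")


def unclosed_reported_at(source: str, lenient: bool) -> Optional[int]:
    """1-based line where an unclosed delimiter is noticed, or None if all is fine.

    End of input is reported as one past the last line.
    """
    lines = source.splitlines()
    start_indent = None
    depth = 0
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        if start_indent is None:
            start_indent = indent
        elif indent <= start_indent:
            if depth > 0:
                # Under the lenient rule, only an "obvious" end is an error:
                # a further outdent, or a line that starts another def.
                obvious_end = indent < start_indent or DEF_START.match(line)
                if not lenient or obvious_end:
                    return number
            else:
                start_indent = indent  # a new def starts with nothing open
        depth += sum(line.count(c) for c in OPENERS)
        depth -= sum(line.count(c) for c in CLOSERS)
    return len(lines) + 1 if depth > 0 else None


broken = "a = {\n    b: c,\nfoo bar\nbaz\nb = 1"
print(unclosed_reported_at(broken, lenient=False))  # strict: at "foo bar"
print(unclosed_reported_at(broken, lenient=True))   # lenient: not until "b = 1"
```

In this toy input the strict rule points at line 3, while the lenient rule only notices the problem at line 5, which is the "detected later" cost described above.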

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:07):

also using the editor should prevent unclosed delimiter parse errors

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:07):

okay, those are all the tradeoffs i can think of here!

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:08):

in summary, the question boils down to: should the parser allow this?

a = {
    b: c,
    d: {
        e: f,
    },
} foo bar

...or require that it be something more like this?

a =
    {
        b: c,
        d:
            {
                e: f,
            },
    } foo bar

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:08):

...given all the considerations above :big_smile:

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:08):

any thoughts on this welcome!

view this post on Zulip jan kili (Feb 09 2022 at 13:09):

Is it worth taking a step back and asking what we want Roc's formatting "culture"/"vibe"/"approach"/"system" to be? Is flexibility prioritized over consistency? Is formatting a task for bots? What are the hard and fast rules that newbies can learn on day one that will guide their expectations?

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:09):

oh, one other scenario I just thought of:

    a = {
        b: c,
        d: {
            e: f,
    },
} foo bar

does unclosed delimiter mean outdents get ignored completely? or just that you're allowed specifically to outdent to the same level as a but no further?
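
The difference between those two options can be sketched as a check for lines that dip below the def's own column while a delimiter is still open. This is only an illustrative Python model with a hypothetical name (`lines_below_def`) and naive delimiter counting. The first option would accept the lines it flags, and the second would reject them.

```python
from typing import List

OPENERS = "{[("
CLOSERS = "}])"


def lines_below_def(source: str) -> List[int]:
    """1-based numbers of lines that, while a delimiter is open, are
    indented less than the line that started the current def."""
    start_indent = None
    depth = 0
    below: List[int] = []
    for number, line in enumerate(source.splitlines(), start=1):
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        if start_indent is None or (depth == 0 and indent <= start_indent):
            start_indent = indent  # a def starts here
        elif depth > 0 and indent < start_indent:
            below.append(number)  # outdented past the def itself
        depth += sum(line.count(c) for c in OPENERS)
        depth -= sum(line.count(c) for c in CLOSERS)
    return below


example = (
    "    a = {\n"
    "        b: c,\n"
    "        d: {\n"
    "            e: f,\n"
    "    },\n"
    "} foo bar"
)
print(lines_below_def(example))  # flags only the final "} foo bar" line
```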

view this post on Zulip Folkert de Vries (Feb 09 2022 at 13:09):

the parser should be able to give a good error here

view this post on Zulip jan kili (Feb 09 2022 at 13:09):

(maybe those questions are already answered or out of scope, and this is just a spot fix)

view this post on Zulip Folkert de Vries (Feb 09 2022 at 13:10):

it should recognize both that it is parsing a record, and something is wrong with the indentation

view this post on Zulip Folkert de Vries (Feb 09 2022 at 13:10):

also given the editor, I'd prefer the parser to be simple and a bit on the strict side

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:10):

Is it worth taking a step back and asking what we want Roc's formatting "culture"/"vibe"/"approach"/"system" to be? Is flexibility prioritized over consistency? Is formatting a task for bots? What are the hard and fast rules that newbies can learn on day one that will guide their expectations?

I think formatting should be done by the formatter, not by humans

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:10):

and the formatter should never have any configuration options

view this post on Zulip Anton (Feb 09 2022 at 13:12):

Some types of flexibility greatly complicate the parser, so that's also something to consider.

view this post on Zulip jan kili (Feb 09 2022 at 13:12):

(side question, does the Roc CLI provide Prettier-style file formatting today? If not, is that on the way?)

view this post on Zulip Folkert de Vries (Feb 09 2022 at 13:12):

I would really like that

view this post on Zulip Folkert de Vries (Feb 09 2022 at 13:13):

don't think anything is blocking that? might just try it

view this post on Zulip Anton (Feb 09 2022 at 13:15):

We have cargo run format dir/file.roc working, or do you have something else in mind?

view this post on Zulip jan kili (Feb 09 2022 at 13:16):

Ooh! Does that get bundled into the build as roc format dir/file.roc? I almost exclusively use the built CLI executable.

view this post on Zulip Anton (Feb 09 2022 at 13:17):

I think so

view this post on Zulip jan kili (Feb 09 2022 at 13:18):

Great. Recursive directory-wide formatting would be nice, but this will help a lot. Thank you!

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:20):

cc @Chad Stearns @Joshua Warner - this discussion may be of interest!

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:21):

Some types of flexibility greatly complicate the parser, so that's also something to consider.

definitely, which also makes it harder for humans to understand the rules.

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:22):

this is the one case I can think of in the current parser where the rule is surprising to people in practice

view this post on Zulip jan kili (Feb 09 2022 at 13:23):

I think formatting should be done by the formatter, not by humans

Should the formatter accept input that the parser can't, in order to clean it up? Should the formatter have warning/error messages of its own?

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:23):

oh, I forgot to mention: the way other languages with ML syntax (e.g. Elm, Haskell) deal with this is to format it like the following:

a =
    { b: c
    , d:
        { e: f
        }
    } foo bar

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:23):

so you have leading commas instead of trailing commas, and never outdent to the level of the initial def

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:23):

however, I ruled this out early on

view this post on Zulip jan kili (Feb 09 2022 at 13:24):

surprising to people

This could be alleviated with a clear order-of-operations lesson in a tutorial - the reason it's surprising is that in most languages delimiters have higher precedence than indentation

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:24):

because I have seen people literally lose interest in learning Elm just because this looks so alien

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:25):

and although I don't think that's a good reason to walk away from a language, as many Lisp people will report - people not using a language because the syntax looks too aesthetically displeasing is a very real thing

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:25):

and I don't think this is worth that cost

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:25):

so I don't think leading commas are the way to go, for that reason alone

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:26):

in most languages delimiters have higher precedence than indentation

huh, interesting! I actually hadn't thought about this - the only languages I know of where indentation matters are Python, CoffeeScript, Elm, and Haskell

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:26):

I guess in Python and CoffeeScript that's the rule? :thinking:

view this post on Zulip jan kili (Feb 09 2022 at 13:27):

"indentation matters" =?= "indentation overrules delimiters"

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:31):

I have to admit - personally, having tried the "outdent is the only rule, ignore unclosed delimiters" rule for a couple of years now, I still aesthetically prefer how the more typical formatting looks :big_smile:

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:31):

where the final } is at the same indentation level as a =

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:32):

so I'm genuinely open to changing the rule here!

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:32):

but I'd like to get some other perspectives on it

view this post on Zulip Anton (Feb 09 2022 at 13:39):

I have the grammar set up to accept the following def:

a = {
    b: c,
    d: {
        e: f,
    },
}

Those parsing rules are not very complicated but I'm not sure how well it would work for errors.

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:41):

whoa, nice!

view this post on Zulip Richard Feldman (Feb 09 2022 at 13:42):

the current parser on trunk won't accept that, but maybe it's a quick change to accept it? :thinking:

view this post on Zulip Anton (Feb 09 2022 at 14:27):

Well, I'm not sure :p I've looked at the current parser quite a bit but I don't feel I understand it well. For the grammar I also rely on the tokenizer, and adding the tokenizer to the current parser would take some work.

view this post on Zulip Richard Feldman (Feb 09 2022 at 14:57):

here's another case to consider: should this parse?

button : {
    label : Elem state,
    onPress : state, PressEvent -> Action state,
}
-> Elem state

view this post on Zulip Richard Feldman (Feb 09 2022 at 14:58):

if so, I think that requires backtracking (could be wrong), which is bad for parsing performance

view this post on Zulip Richard Feldman (Feb 09 2022 at 14:58):

alternatively, could require that it be:

button : {
    label : Elem state,
    onPress : state, PressEvent -> Action state,
} -> Elem state

view this post on Zulip Richard Feldman (Feb 09 2022 at 14:58):

but then what if the return value is multiline?

view this post on Zulip Richard Feldman (Feb 09 2022 at 14:58):

it would have to be:

button : {
    label : Elem state,
    onPress : state, PressEvent -> Action state,
} -> {
    blah : Str,
    thing : Etc,
}

view this post on Zulip Richard Feldman (Feb 09 2022 at 14:58):

which I don't think looks great :sweat_smile:

view this post on Zulip Richard Feldman (Feb 09 2022 at 14:59):

maybe the formatter could reformat it to something else, but I'm not sure what that would be!

view this post on Zulip jan kili (Feb 09 2022 at 15:36):

I also have to admit: Though I default to writing multilines like

a = {
    ...
}

I don't think it's crazy or ugly for Roc to require

a =
    {
        ...
    }

since it's clear and extensible with pre-record defs like

a =
    b = foo
    c = bar
    {
        d: b + c,
        ...
    }

view this post on Zulip jan kili (Feb 09 2022 at 15:39):

However, would the same pattern extend to type defs?

button :
    {
        label : Elem state,
        onPress : state, PressEvent -> Action state,
    } -> {
        blah : Str,
        thing : Etc,
    }

Its consistency with value defs is nice, but it doesn't benefit from the same "extensible with pre-return defs" pattern... unless Roc has some crazy type def block feature like

button :
    a :
        {
            label : Elem state,
            onPress : state, PressEvent -> Action state,
        }
    b :
        {
            blah : Str,
            thing : Etc,
        }
    a -> b

view this post on Zulip Richard Feldman (Feb 09 2022 at 18:14):

the more I look at it, the more I'm fine with "values and types are consistent"

view this post on Zulip Richard Feldman (Feb 09 2022 at 18:15):

e.g.

-> Blah

vs.

} -> Blah

actually read pretty similarly even though the -> isn't quite at the beginning of the line in the latter

view this post on Zulip Joshua Warner (Feb 09 2022 at 18:27):

My general thoughts: I'm fairly ambivalent on exactly what the "correct according to the formatter" way to indent things is - but I think it's pretty important for beginners that the parser is as forgiving as it can reasonably be. I've put down more than one language because I spent hours fighting with nuances of the syntax and got frustrated.

For this reason, I'd personally be willing to bend over backwards in the parser to accept as many of these indentation/newline combinations as possible - optionally issuing warnings and/or autocorrects in case things might be ambiguous.

view this post on Zulip Ju Liu (Feb 09 2022 at 19:25):

is there a chance of someone calling a function and passing a list of arguments over multiple lines? such as:

myRealSweetFunction {
  foo: 1,
  bar: 2
} secondArg thirdArg

this Elm snippet compiles:

foobar : { a : Int } -> Int -> Int -> Int
foobar _ _ _ = 0

magicValue : Int
magicValue =
    foobar {
      a = 99
    } 1 2

view this post on Zulip Richard Feldman (Feb 09 2022 at 21:02):

that should parse, although I think I'd want the formatter to put the second and third args on their own lines

view this post on Zulip Pit Capitain (Feb 10 2022 at 06:49):

Joshua Warner said:

I think it's pretty important for beginners that the parser is as forgiving as it can reasonably be. I've put down more than one language because I spent hours fighting with nuances of the syntax and got frustrated.

I would second this, IF we would write Roc code in a normal text editor and then hand it off to the parser/compiler. But I expect that our editor will be smart enough to detect and correct wrong indents on the fly and even explain what was wrong and how to avoid such mistakes in the future.

view this post on Zulip Joshua Warner (Feb 10 2022 at 16:35):

Yep, agree that the editor can be smarter here. However, I think making a normal text editor "inviting" will be a critical path for onboarding new users. IMO, the "normal text editor" experience of roc needs to be on par with other languages - and the roc-editor experience needs to be even better.

I don't think having the roc-editor is an acceptable excuse for making the normal-text-editor experience painful.

view this post on Zulip Chad Stearns (Feb 14 2022 at 03:13):

I think parsing and formatting to..

a = {
    b: c,
    d: {
        e: f,
    },
} foo bar

..sounds good.

And maybe illegal for the closing brace to have an indent level lower than the line of the opening brace.

And then maybe

a = foo {
    b: c,
    d: {
        e: f,
    },
} bar
baz

should format to..

a = foo
    {
        b: c,
        d: {
            e: f,
        },
    }
    bar
    baz

.. because some kind of multi-line rule will kick in, that requires "if any part of this expression is multiline, then the whole thing should be"
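
One possible reading of that "all or nothing" multi-line rule, sketched in Python: if any argument of a call spans several lines, every argument goes on its own line, indented one level under the function. The helper `format_call` and its signature are hypothetical and are not part of Roc's formatter. It also assumes the call has already been parsed into a function name plus argument strings.

```python
from typing import List


def format_call(func: str, args: List[str], indent: str = "    ") -> str:
    """Format a function application from already-parsed argument strings."""
    if not any("\n" in arg for arg in args):
        # Nothing is multiline, so the whole call stays on one line.
        return " ".join([func, *args])
    out = [func]
    for arg in args:
        # Each argument starts on its own line; multiline arguments keep
        # their internal layout, shifted right by one indent level.
        out.extend(indent + part for part in arg.splitlines())
    return "\n".join(out)


record = "{\n    b: c,\n    d: {\n        e: f,\n    },\n}"
print(format_call("foo", ["bar", "baz"]))
print(format_call("foo", [record, "bar"]))
```

The second call prints `foo` on its own line followed by the record and `bar`, each indented one level, which matches the shape of the formatted example above.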

view this post on Zulip Chad Stearns (Feb 14 2022 at 03:17):

Just my impression. Does that all make sense?

I accidentally violate this same-line syntax rule for brackets every time I approach some Roc code, just because of what I am familiar with in other languages, so I definitely see the value in making a syntax rule exception for opening and closing braces like this in order to make the language more accessible to new people. And given that it's more accessible, why not just make that the default all the time?

view this post on Zulip Richard Feldman (Feb 14 2022 at 04:23):

a = foo {
    b: c,
    d: {
        e: f,
    },
} bar
baz

I think this one has to parse differently than the first one, because the baz at the end would be treated as the expression at the end of a def

view this post on Zulip Chad Stearns (Feb 14 2022 at 07:07):

Oh yeah. That makes sense.


Last updated: Jun 16 2026 at 16:19 UTC