shadowing and redeclaration except at top level · ideas

Stream: ideas

Topic: shadowing and redeclaration except at top level

Richard Feldman (Feb 05 2024 at 02:03):

I'm working on canonicalization that allows shadowing, and I'm finding some tricky edge cases in even just defining (and trying to explain in a way where people can understand them) what the rules are

Richard Feldman (Feb 05 2024 at 02:07):

here's an idea I currently like for what the rules would be:

At the top level, redeclaration is not allowed. You can say foo = at the top level, but then later on at the top level you can't say foo = again; that's an error (just like today).
Anything can always refer to anything in the top level regardless of order. So if blah is defined at the top level, you can refer to blah before it has been defined. (Just like today.) You can define mutually recursive functions this way, but of course you can't make mutually recursive non-functions (also like today).
When not at the top level, you can shadow and redeclare things. (So in a nested scope, you can write foo = to shadow the top-level one if you like, and in fact you can also write foo = again in that same scope to redeclare it.)
When not at the top level, ordering is strictly enforced. You can still reference top-level things that haven't been declared yet, but you cannot reference any non-top-level thing unless it has already been declared earlier in scope. Otherwise, you get an error.

Richard Feldman (Feb 05 2024 at 02:08):

these rules seem hopefully pretty straightforward to learn and to apply

Richard Feldman (Feb 05 2024 at 02:09):

an implication of these rules is that mutually recursive functions can only be defined at the top level

Richard Feldman (Feb 05 2024 at 02:10):

that seems fine to me since they come up very rarely, even when they do come up they're usually written at the top level anyway, and of course it wouldn't block anything because you can always rewrite any lambda to be a top-level function by having it take explicit parameters for whatever it would have been closing over
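The rewrite mentioned here (moving a lambda to the top level and passing in whatever it used to close over) can be sketched as follows. The sketch is in Rust rather than Roc, purely so it compiles with today's tooling, and the names `scale_all_nested`, `scale_all_lifted`, and `scale` are hypothetical.

```rust
// Nested version: the closure captures `factor` from the enclosing scope.
fn scale_all_nested(values: &[i64], factor: i64) -> Vec<i64> {
    let scale = |x: i64| x * factor;
    values.iter().map(|&v| scale(v)).collect()
}

// Lifted version: `scale` lives at the top level and receives `factor`
// as an explicit parameter instead of closing over it.
fn scale(x: i64, factor: i64) -> i64 {
    x * factor
}

fn scale_all_lifted(values: &[i64], factor: i64) -> Vec<i64> {
    values.iter().map(|&v| scale(v, factor)).collect()
}
```

Both versions compute the same result; the lifted one only makes the captured value visible in the signature, which is the cost being accepted for any function that needs to take part in mutual recursion.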

Richard Feldman (Feb 05 2024 at 02:12):

it might seem needlessly restrictive, but consider (as I just got through doing!) what happens if you lift the "When not at the top level, ordering is strictly enforced" restriction: what happens if I want to write mutually recursive closures? One must necessarily refer to the other one before it has been declared, and the other one might be declared later on in an enclosing scope rather than in the current block of defs

Richard Feldman (Feb 05 2024 at 02:13):

then think about how that interacts with shadowing: what if one of the closures in the mutual recursion is named foo, but foo is redeclaring (or shadowing) something that was previously named foo, but that one wasn't a function

Richard Feldman (Feb 05 2024 at 02:13):

so theoretically we have enough information to infer that foo and some other function are mutually recursive, by ignoring the non-function foo that's declared in between the function foo and the thing that mutually recurses with it...

Richard Feldman (Feb 05 2024 at 02:14):

...but at this point it might be very confusing, to say the least, to figure out what is referring to what

Richard Feldman (Feb 05 2024 at 02:15):

the nice thing about the proposed rule set is that:

you can still define your top-level declarations in any order, like today
outside the top level, order always matters and is strictly enforced, so it's trivial to understand what a variable lookup of foo refers to: "either a foo that's declared earlier in the source file, or if there is none, then a foo top-level declaration later in the file" - otherwise it's definitely a naming error!
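The lookup rule quoted above is small enough to write down as a function. This is a minimal sketch in Rust, not the Roc compiler's canonicalization code: `Resolution`, `resolve`, and the slice-based inputs are all hypothetical, and it assumes the caller has already dropped declarations whose scope has ended.

```rust
/// Where a name reference ends up pointing under the proposed lookup rule.
#[derive(Debug, PartialEq)]
enum Resolution {
    /// Index (into `earlier_locals`) of the most recent earlier
    /// non-top-level declaration with that name.
    Local(usize),
    /// A top-level declaration, wherever it appears in the file.
    TopLevel,
    /// Neither exists: a naming error.
    NamingError,
}

/// `earlier_locals`: non-top-level names declared before the reference,
/// in source order (later entries shadow or redeclare earlier ones).
/// `top_level`: every top-level name in the file, regardless of position.
fn resolve(name: &str, earlier_locals: &[&str], top_level: &[&str]) -> Resolution {
    // `rposition` scans from the end, so the most recent declaration wins.
    if let Some(index) = earlier_locals.iter().rposition(|local| *local == name) {
        return Resolution::Local(index);
    }
    if top_level.contains(&name) {
        return Resolution::TopLevel;
    }
    Resolution::NamingError
}
```

For example, with locals `["foo", "bar", "foo"]` a lookup of `foo` yields `Local(2)`, a name that exists only at the top level yields `TopLevel` even if it is declared later in the file, and anything else is a `NamingError`.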

Richard Feldman (Feb 05 2024 at 02:20):

thoughts?

Kevin Gillette (Feb 05 2024 at 07:17):

When not at the top level, should we disallow redeclaration of same-scope functions? It seems it would be confusing to have:

foo = -1
foo = \x -> x * foo
foo = \x ->
  if x >= 5 then
    x
  else
    foo (x * 3)

foo 2

Does this example (which iiuc may be permitted by the proposed rules) result in:

-6, in case the last foo uses the middle foo instead of self-recursion?
A compilation error because the middle foo is self-referential and thus fails due to multiplication involving a function value?

In any case, at least for non-dev builds, we should perhaps disallow unused declarations that are not top-level?
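The readings in question can be made concrete with a sketch in Rust rather than Roc (an assumption made only so the code compiles today; it says nothing about what Roc's canonicalizer will do). Rust's `let` shadowing happens to implement the "previous binding" reading. The nested `fn` stands in for the self-recursion reading of the last definition alone; the first two definitions are left out there because, under that reading, the middle one would be the type error described above. Both function names are hypothetical.

```rust
// Reading 1: each `foo` on a right-hand side refers to the previous binding,
// which is how Rust's `let` shadowing behaves.
fn previous_binding_reading() -> i64 {
    let foo: i64 = -1;
    let foo = move |x: i64| x * foo; // `foo` on the right is the integer -1
    let foo = move |x: i64| if x >= 5 { x } else { foo(x * 3) }; // calls the closure above
    foo(2)
}

// Reading 2: the last definition calls itself. A nested `fn` item is used
// because, unlike a Rust closure, it can refer to itself by name.
fn self_recursion_reading() -> i64 {
    fn foo(x: i64) -> i64 {
        if x >= 5 {
            x
        } else {
            foo(x * 3)
        }
    }
    foo(2)
}
```

Under the first reading `foo 2` evaluates to -6 (2 is below 5, so the middle `foo` is called with 6 and multiplies it by -1); under the second it evaluates to 6.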

Anton (Feb 05 2024 at 09:54):

in a nested scope, you can write foo = to shadow the top-level one if you like

Can you give an example when this is useful?

Richard Feldman (Feb 05 2024 at 11:38):

sure, I'm writing a parser and I want a top level value named str so I can expose it as Parser.str but I also want to name something str locally as an intermediate value when implementing a different parser

Richard Feldman (Feb 05 2024 at 11:41):

that said, we can also address that by letting you name the top level one as like strInner and then expose it as strInner as str

Richard Feldman (Feb 05 2024 at 11:42):

so I could see an argument for "top level values can't be shadowed/redeclared at all" as opposed to just "they can't do it themselves"

Richard Feldman (Feb 05 2024 at 11:50):

Kevin Gillette said:

In any case, at least for non-dev builds, we should perhaps disallow unused declarations that are not top-level?

I don't think that's necessary; you get a warning for unused declarations regardless, so if you ship it it's because you were informed about it and decided to ignore it. I don't think we need to disallow shipping under those circumstances!

Richard Feldman (Feb 05 2024 at 11:52):

Kevin Gillette said:

When not at the top level, should we disallow redeclaration of same-scope functions? It seems it would be confusing to have:
foo = -1
foo = \x -> x * foo
foo = \x ->
  if x >= 5 then
    x
  else
    foo (x * 3)

foo 2

self-recursion should always be allowed (like today), so in this example:

foo = \x -> x * foo

here, foo refers to itself (self-recursion) because itself is always the "most recent declaration"

Richard Feldman (Feb 05 2024 at 11:52):

same here: foo refers to itself

foo = \x ->
  if x >= 5 then
    x
  else
    foo (x * 3)

Richard Feldman (Feb 05 2024 at 11:53):

foo 2

this refers to the most recent foo, also as normal

Richard Feldman (Feb 05 2024 at 11:53):

so I don't think that example needs to be disallowed

Richard Feldman (Feb 05 2024 at 11:54):

worth noting: another factor which led me to start thinking about this is that allowing things like nested mutual recursion both complicates and slows down the compiler significantly (at least percentage-wise) compared to this rule set

Richard Feldman (Feb 05 2024 at 11:55):

which led me to start wondering "why pay such a high runtime and implementation complexity cost to support writing confusing code?"

Richard Feldman (Feb 05 2024 at 11:56):

I don't know if I'd personally choose to redefine a function with foo = several times in a row, but I don't think it's hard to figure out what it's doing :big_smile:

Isaac Van Doren (Feb 05 2024 at 12:24):

This sounds great to me!

Isaac Van Doren (Feb 05 2024 at 12:26):

Can you shadow with lambda params? I.e.

foo = 10

bar = \foo ->
    Str.concat foo "bar"

Isaac Van Doren (Feb 05 2024 at 12:26):

This is something I’ve run into a few times in practice

Richard Feldman (Feb 05 2024 at 12:35):

yeah that would be allowed in this idea

Isaac Van Doren (Feb 05 2024 at 12:37):

Okay perfect

Pearce Keesling (Feb 05 2024 at 12:47):

I agree, same-level redeclaration is still a lot better than mutation because at any point in the code there is still only one declaration that matters and the editor can point me right to it

Brendan Hansknecht (Feb 05 2024 at 16:10):

foo = -1
foo = \x -> x * foo

self-recursion should always be allowed (like today), so in this example:
foo = \x -> x * foo
here, foo refers to itself (self-recursion) because itself is always the "most recent declaration"

I think this should be reconsidered. I think this code would confuse most users.
I think that most users would read this as:

-- Foo is -1
foo = -1
-- Foo is a function that takes in x, captures foo (which is -1), and returns x * foo
foo = \x -> x * foo
-- This returns -7
foo 7

If instead the first declaration of foo is unused and the second is self-recursive,
then this code would return a compile-time error, because in x * foo, foo is being used as a Num a instead of a Num a -> Num a (no args are being passed to the self-recursive foo function). That error alone would probably be quite confusing to users, given that just above that line is a version of foo that is defined as a Num a

Richard Feldman (Feb 05 2024 at 16:27):

interesting, so basically recursive closures would be disallowed in that design

Richard Feldman (Feb 05 2024 at 16:27):

not just mutually recursive, but recursive in general

Richard Feldman (Feb 05 2024 at 16:28):

so if you wanted to do any type of recursion, it would have to be at the top level

Richard Feldman (Feb 05 2024 at 16:34):

I guess that's how it works in a lot of languages, to be fair :thinking:

Brendan Hansknecht (Feb 05 2024 at 17:11):

I think if no shadowing was involved, a recursive closure would make sense. That said, once shadowing is involved, I think referencing the previous value makes more sense.

Brendan Hansknecht (Feb 05 2024 at 17:11):

At least from the perspective of what I would expect most users to immediately assume the code does when just quickly looking at it.

Richard Feldman (Feb 05 2024 at 17:14):

I dunno, honestly I prefer the simpler rule at that point: "if you want to recurse, do it at the top level"

Richard Feldman (Feb 05 2024 at 17:15):

as opposed to "you can recurse outside the top level, but only if you are specifically recursing on something that has not been defined earlier in scope"

Brendan Hansknecht (Feb 05 2024 at 17:16):

I think I would actually prefer "shadowing of closures isn't allowed". This allows for closures to be recursive.

Brendan Hansknecht (Feb 05 2024 at 17:16):

But yeah, should pick that or your other simple rule

Brendan Hansknecht (Feb 05 2024 at 17:18):

Also, the reason for my preference is that it can often be nice to hide recursive helper functions and give them simple names

Instead of:

thisIsAlreadyALongName = \... ->
    ...

thisIsAlreadyALongNameHelper = \... ->
    ...

You can write:

thisIsAlreadyALongName = \... ->
    helper = \... ->
        ...
    ...
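The shape of the second layout can be sketched with a concrete function. The sketch is in Rust, not Roc, and `digit_sum` is a hypothetical example; the point is only that the recursive helper is hidden inside its one caller and can keep a short name.

```rust
/// Sums the decimal digits of `n`. The recursive helper is a nested `fn`,
/// so it is invisible outside `digit_sum` and can be called just `helper`.
fn digit_sum(n: u64) -> u64 {
    fn helper(remaining: u64, acc: u64) -> u64 {
        if remaining == 0 {
            acc
        } else {
            helper(remaining / 10, acc + remaining % 10)
        }
    }
    helper(n, 0)
}
```

One difference worth keeping in mind: a nested Rust `fn` cannot capture locals from the enclosing function, whereas the nested Roc definitions discussed in this thread are closures, so the analogy covers name hiding and self-recursion but not capturing.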

Richard Feldman (Feb 05 2024 at 17:18):

so to summarize, the concrete idea I'm now thinking of is:

top-level defs cannot be shadowed or redeclared, but they can be declared in any order, and they can be recursive
non-top-level defs can be shadowed and redeclared, but order matters and they can't be recursive

...and if you need them to be recursive, move them to the top level!

Brendan Hansknecht (Feb 05 2024 at 17:20):

Given only top level functions are free from ordering, I think it makes a lot of logical sense.

Richard Feldman (Feb 05 2024 at 17:21):

oh yeah, good point - I edited it to note that :big_smile:

Richard Feldman (Feb 05 2024 at 17:23):

interestingly, in conjunction with compile-time evaluation, these can be taught to Rust programmers as:

top-level defs in Roc are automatically either fn or const depending on whether you're assigning to a lambda
non-top-level defs are always let

Richard Feldman (Feb 05 2024 at 17:23):

with the one additional rule that top-level fn and const in Rust are allowed to be shadowed

Richard Feldman (Feb 05 2024 at 17:23):

but not in Roc
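For readers who want the Rust side of this analogy spelled out, here is a small sketch; all names in it are hypothetical. One caveat: rustc reads a `let` whose name matches a `const` in scope as a constant pattern rather than a fresh binding, so the sketch only shadows a function name.

```rust
// Top-level items: order does not matter, and a name cannot be declared twice.
fn area(radius: f64) -> f64 {
    PI_ISH * square(radius) // both are declared further down
}

fn square(x: f64) -> f64 {
    x * x
}

const PI_ISH: f64 = 3.0;

// Inside a function body, `let` bindings are order-sensitive and may be
// shadowed or redeclared.
fn describe() -> String {
    let square = "a shape"; // shadows the top-level `square` fn locally
    let label = square.to_uppercase();
    let label = format!("{}!", label); // redeclares `label` in the same scope
    label
}
```

Swapping the two `let label` lines would be a compile error in Rust, which mirrors the "order always matters outside the top level" half of the proposal.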

Richard Feldman (Feb 05 2024 at 17:24):

a thing I like about this is that the rules only come up when you're writing code, but when you're reading it you probably don't need to be aware of them at all

Richard Feldman (Feb 05 2024 at 18:15):

Brendan Hansknecht said:

Also, the reason for my preference is that it can often be nice to hide recursive helper functions and give them simple names

yeah thinking about it more, and talking to @Folkert de Vries about it, I think this is worth preserving

Isaac Van Doren (Feb 05 2024 at 18:43):

One nice thing about being able to define a recursive helper as a sub definition is that then you can close over values that don’t change during the recursion which can make the code a lot cleaner.

Eli Dowling (Feb 05 2024 at 21:55):

A note on the recursion discussion. Coming from any other language I would immediately assume that foo is recursive and the first definition is unused.

foo = -1
foo = \x -> x * foo

I would think the other behaviour would just cause regular frustration. I think the more weird edge cases you have the less pleasant a language is to use. Like "you can recurse, but not within shadowing within local functions" that just feels like another silly thing to remember.

Brendan Hansknecht (Feb 05 2024 at 21:58):

Coming from any other language?

Norbert Hajagos (Feb 06 2024 at 09:02):

foo = -1
foo = \x -> x * foo

I thought it was using the previous foo in the function body and wasn't recursing. The recursive case makes more sense if I do some reasoning from a language designer's point of view, but I wouldn't want to think about such things while coding. I don't think this kind of code should be allowed.

Fabian Schmalzried (Feb 06 2024 at 09:49):

foo = -1
foo = \x -> x * foo

I don't even know what I would expect here. I think in practice, the foo = -1 would be somewhere else most of the time, and you would only see foo = \x -> x * foo, and logically would assume recursion here.
But another idea: Should this even be allowed? I want shadowing for stuff like model = doSomethingWith model, not for functions. Would it be possible to just not allow reassigning a variable with a function?

Richard Feldman (Feb 06 2024 at 15:21):

in general I'm not a fan of the idea of more complicated shadowing rules than "is allowed everywhere except at the top level"

Richard Feldman (Feb 06 2024 at 15:21):

like shadowing/redeclaration rules varying by type sounds more complicated than it's worth to me

Richard Feldman (Feb 06 2024 at 15:21):

as opposed to (for example) allowing it but culturally discouraging it

Richard Feldman (Feb 06 2024 at 15:24):

another thing that just occurred to me: all the reasons that non-function defs should not be allowed to be out-of-order are just as applicable in the top level as they are in function bodies

Richard Feldman (Feb 06 2024 at 15:25):

e.g. if any of them has a dbg, or a failed expect, or a crash, and they're out of order, then those will print in a surprising order

Richard Feldman (Feb 06 2024 at 15:25):

because they got silently reordered by the compiler

Richard Feldman (Feb 06 2024 at 15:26):

so I think the actual rule we want here is "top-level functions can be declared in any order"

Richard Feldman (Feb 06 2024 at 15:26):

(and of course type aliases and opaque types and abilities, since those can be declared in any order anywhere)

Richard Feldman (Feb 06 2024 at 15:26):

but top-level constants that aren't functions still need to be in order

Richard Feldman (Feb 06 2024 at 15:27):

that becomes especially true when we evaluate those at compile time, because then they will always be getting evaluated in exactly the order they appear in the source file, so dbg/crash/expect output appearing in a different order will be even more surprising

Richard Feldman (Feb 06 2024 at 15:32):

so then in that world, the overall proposed rules would be:

Top-level values cannot be shadowed or redeclared, but others can
Top-level functions can be declared in any order, but other functions must be declared in order (and therefore can only self-recurse)
All non-function constants must be declared in order (including at the top level)
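The top-level half of these rules can be sketched as a small checker. This is written in Rust under several assumptions: `TopLevelDef`, `RuleError`, and `check_top_level` are hypothetical names, each definition is reduced to a name, a function/constant flag, and the names its body mentions, and indirect ordering problems (a constant calling a function that reads a later constant) are deliberately ignored.

```rust
/// A top-level definition, reduced to what the ordering rules need.
struct TopLevelDef<'a> {
    name: &'a str,
    is_function: bool,
    /// Names this definition's body refers to.
    refs: Vec<&'a str>,
}

#[derive(Debug, PartialEq)]
enum RuleError<'a> {
    /// The name was already declared earlier at the top level.
    Redeclared(&'a str),
    /// (constant, the later-declared constant it refers to)
    ConstantOutOfOrder(&'a str, &'a str),
}

fn check_top_level<'a>(defs: &[TopLevelDef<'a>]) -> Vec<RuleError<'a>> {
    let mut errors = Vec::new();
    for (i, def) in defs.iter().enumerate() {
        // A top-level name may be declared only once.
        if defs[..i].iter().any(|earlier| earlier.name == def.name) {
            errors.push(RuleError::Redeclared(def.name));
        }
        // Top-level functions may refer to anything, in any order.
        if def.is_function {
            continue;
        }
        // A constant may not refer to a constant declared later.
        for &r in &def.refs {
            let declared_later = defs[i + 1..]
                .iter()
                .any(|later| !later.is_function && later.name == r);
            if declared_later {
                errors.push(RuleError::ConstantOutOfOrder(def.name, r));
            }
        }
    }
    errors
}
```

A function that mentions a constant declared further down passes this check, while a constant that mentions a later constant is reported, which matches the distinction the three rules draw between functions and non-function constants.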

Brendan Hansknecht (Feb 06 2024 at 15:42):

It is kinda interesting. In most languages functions aren't values (at least the standard written way, they may still have lambdas separately) so they wouldn't hit this issue. Functions and values are just a separate class.

Only since functions are values does roc even have to consider this issue as needing special rules

Brendan Hansknecht (Feb 06 2024 at 15:44):

e.g. if any of them has a dbg, or a failed expect, or a crash, and they're out of order, then those will print in a surprising order

I honestly wouldn't worry about that. Those messages and such will be part of compile time. Or the top level constants will be secretly lambdas like today (which means order really doesn't matter).

Brendan Hansknecht (Feb 06 2024 at 15:46):

I think forcing top level constants to be declared in order would be non-ergonomic. All top levels being out of order is very nice for code organization.

Brendan Hansknecht (Feb 06 2024 at 15:48):

Roc isn't python. I don't think anyone expects the top level declarations to run in order like an imperative scripting language. So I would really push against that assumption.

Brendan Hansknecht (Feb 06 2024 at 15:48):

I think it would just make the language semantics worse for no reason.

Brendan Hansknecht (Feb 06 2024 at 15:49):

Once inside a function definition or a nested scope, that is different. Now you are in the world of a list of steps to do something or build something.

Richard Feldman (Feb 06 2024 at 16:09):

Brendan Hansknecht said:

I think forcing top level constants to be declared in order would be non-ergonomic. All top levels being out of order is very nice for code organization.

I think the thing that's nice ergonomically is that I can refer to any top-level constant from any function

Richard Feldman (Feb 06 2024 at 16:10):

like I don't think it really impacts my ergonomics whether this is allowed or not:

bar = foo + 1
foo = 2

Brendan Hansknecht (Feb 06 2024 at 16:10):

Can I put constants used by a function after the function is defined?

Richard Feldman (Feb 06 2024 at 16:10):

sure

Richard Feldman (Feb 06 2024 at 16:10):

the constants would just need to be ordered with respect to each other

Brendan Hansknecht (Feb 06 2024 at 16:10):

Ah. Then either is fine to me

Richard Feldman (Feb 06 2024 at 16:10):

so that they're ordered in the same order they'll be evaluated

Richard Feldman (Feb 06 2024 at 16:10):

cool!

Richard Feldman (Feb 07 2024 at 04:45):

Richard Feldman said:

so I could see an argument for "top level values can't be shadowed/redeclared at all" as opposed to just "they can't do it themselves"

I want to revisit this, actually - as I'm going through the implementation, I realize that enforcing this has a nontrivial performance cost; it either requires a second pass over the AST, or else much higher memory usage

Richard Feldman (Feb 07 2024 at 04:46):

what are people's thoughts about allowing top-level constants to be shadowed in nested scopes?

Richard Feldman (Feb 07 2024 at 04:51):

(given that the performance cost is nontrivial, I think the strength of the preference for this over the faster alternative design should also be nontrivial!)

Brendan Hansknecht (Feb 07 2024 at 04:55):

Would this mean they can be shadowed anywhere, even at the top level? I assume not, I assume it is just that they can be shadowed within a function/ body of another top level?

Richard Feldman (Feb 07 2024 at 05:03):

yeah not at the top level

Richard Feldman (Feb 07 2024 at 05:03):

so the rule would be "top-level can't shadow" rather than "top-level can't be shadowed"

Brendan Hansknecht (Feb 07 2024 at 05:13):

Sounds fine

Isaac Van Doren (Feb 07 2024 at 13:00):

Yeah, that sounds desirable to me

Norbert Hajagos (Feb 07 2024 at 19:01):

Top level defs can't be shadowed seems arbitrary to me anyways. As a language user, knowing that top level definitions are constants doesn't change the fact that I may want to shadow them, so I actually prefer the more performant one from a design point as well.

Brian Carroll (Feb 07 2024 at 21:00):

Yeah this makes sense to me too.
The way I would say this is: top level values can be shadowed but not redeclared.
And that's great, I prefer it. Fine to reuse a name as long as you have scope to distinguish it.
And redeclaring in the module scope would feel weird to me, so I'm happy not to have it.

Andrew C (Feb 09 2024 at 01:08):

Please reconsider the decision to allow shadowing. It allows non-local edits to code to change meaning. You're making confusing rules about where recursion can be used in a pure functional language just in order to be able to footgunly reuse variable names and confuse the poor maintainer of your code six months down the line.

How hard is it to add an extra character or two versus how hard it is to fix a bug that you couldn't see because you read the foo, parsed the foo, transformed the foo, updated the foo, used the foo and sent the foo, but we found out last week that somehow since we started logging the foo a month ago, it isn't getting transformed at all and all the logged foos are out of date because they weren't updated. If I'd been forced by the super fast, super helpful compiler to call it something ugly like updatedFoo two years ago, none of this would have happened.

There's a function called str already. Don't let me call my variable str. There's an x in scope already. Don't let me make a nested lambda with x as the new parameter. I'll get this x sometimes when I meant the other one. I won't be able to immediately see what I did wrong by reading the code.

Brendan Hansknecht (Feb 09 2024 at 03:01):

Yes, shadowing can be abused. Code style/review is important. Many languages have mutable variables. Many languages have shadowing. These languages are able to function just fine even with these features. This doesn't mean we should or shouldn't have shadowing, but it is an important piece of context.

Using many numeric suffixes on a variable in roc has already led to bugs and friction. So this isn't a clean tradeoff on one being buggy and messy while the other is clean and less buggy. This is a more complex tradeoff that needs more nuance in discussion. It isn't simply about adding an extra character to a variable name.
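The tradeoff between numeric suffixes and shadowing can be seen side by side in a short sketch. It is written in Rust, whose `let` shadowing is only an approximation of what is proposed for Roc, and the function names are hypothetical. It illustrates the suffix-related friction mentioned above without settling the objection about non-local edits.

```rust
// Pipeline written with numeric suffixes: every intermediate stays in scope,
// so a stale one (say `text1` instead of `text2`) can be used by mistake
// without any complaint from the compiler.
fn normalize_with_suffixes(input: &str) -> String {
    let text1 = input.trim();
    let text2 = text1.to_lowercase();
    let text3 = text2.replace(' ', "-");
    text3
}

// Same pipeline with shadowing: each `let text` replaces the previous binding,
// so only the latest value can be referred to afterwards.
fn normalize_with_shadowing(input: &str) -> String {
    let text = input.trim();
    let text = text.to_lowercase();
    let text = text.replace(' ', "-");
    text
}
```

Both functions return the same string for the same input; the difference is only in which earlier values remain nameable at each step.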

Anton (Feb 09 2024 at 10:00):

@Andrew C we plan on trying out shadowing, if it doesn't turn out good we'll change it.
