Shadowing benefits via code action and/or formatter · ideas

Matthias Toepp (Aug 14 2024 at 18:40):

I really appreciate the benefits of not having shadowing, as detailed here: https://www.roc-lang.org/functional#no-reassignment. It makes code easier to refactor (as described in the link) and easier to process mentally when reading, since we have the guarantee that the meaning of a name hasn't changed in a given context.

Perhaps these benefits (of not having shadowing) and most of the benefits of shadowing could both be achieved if the code formatter and/or language server took responsibility for renaming duplicated variables, i.e.

    {state:seed, value:first} = generator seed
    {state:seed, value:second} = generator seed

    {state:seed2, value:first} = generator seed1
    {state:seed3, value:second} = generator seed2

With the 1, 2, ... suffixes (above) automatically inserted by the code formatter/code action.
(By the way, I actually find the version with different names for different values clearer and easier to understand than the example with everything called seed.)

    {state:seed2, value:first} = generator seed1
    {state:seed3, value:second} = generator seed2
    {state:seed3, value:third} = generator seed2         # duplicated line with "second" changed to "third"

    {state:seed2, value:first} = generator seed1
    {state:seed3, value:second} = generator seed2
    {state:seed4, value:third} = generator seed3         # duplicated line with "second" changed to "third"
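
A minimal sketch of the renumbering described above, written in Python as a stand-in and restricted to straight-line code. The function name `renumber` and the "PATTERN = EXPR" line format are assumptions for illustration, not part of any Roc tooling. It treats every use of the name as a reference to the most recent value, and it does not attempt branches, nested scopes, or lambda arguments.

```python
import re

def renumber(lines, base):
    """Give each rebinding of `base` a fresh numeric suffix.

    Hypothetical helper: each line must look like 'PATTERN = EXPR'.
    Uses of `base` (bare or already numbered) in EXPR are rewritten to the
    latest version; a binding in PATTERN creates the next version.
    """
    name = re.compile(rf"\b{re.escape(base)}\d*\b")
    version = 1  # the incoming value is called base1
    out = []
    for line in lines:
        pattern, expr = line.split("=", 1)
        # Right-hand side reads the most recent version.
        expr = name.sub(f"{base}{version}", expr)
        # Left-hand side introduces a new version if it binds the name.
        if name.search(pattern):
            version += 1
            pattern = name.sub(f"{base}{version}", pattern)
        out.append(f"{pattern}={expr}")
    return out
```

Run on the two "shadowed" lines from the first example, this produces the seed1/seed2/seed3 numbering shown there; run on the duplicated-line example, it turns the repeated seed3 binding into seed4.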

Sven van Caem (Aug 14 2024 at 19:15):

I believe the current plan is to introduce shadowing into the language! I think this is an idea worth considering, though.

Sven van Caem (Aug 14 2024 at 19:16):

My worry would be that this gives the formatter license to make semantic alterations to your code, and it might be difficult to discover without being told this feature exists

Sven van Caem (Aug 14 2024 at 19:17):

But provided you understand why and are okay with the formatter doing this, it does kind of offer the best of both worlds

Matthias Toepp (Aug 14 2024 at 19:50):

If the formatter is not acceptable, a language server code action is another possibility.

Luke Boswell (Aug 14 2024 at 20:40):

Luke Boswell (Aug 14 2024 at 20:44):

I had a skim through the conversations, but it's not clear what the latest thinking is.

My understanding is that we want to rewrite Can (the canonicalization pass) at some point to enable incremental compilation, and to include support for shadowing at the same time.

Richard Feldman (Aug 14 2024 at 21:00):

Richard Feldman (Aug 14 2024 at 21:01):

if we don't like it, then having it implemented means we can make it be a non-blocking warning instead of an error like it is today, which is an improvement over status quo regardless

Sam Mohr (Aug 14 2024 at 21:02):

Yes, all language parsers need to handle supersets of the real language to give useful errors!

Luke Boswell (Aug 14 2024 at 21:09):

### [Shadowing](#shadowing) {#shadowing}

Shadowing is [currently disallowed](https://www.roc-lang.org/functional#no-reassignment), which means that once a name has been assigned to a value, nothing in the same scope can assign it again.

The plan is to enable shadowing in a future re-write of the Canonicalisation pass as a trial to see if it's a good idea. If it turns out that shadowing isn't the best fit for roc, at least we will be able to provide a good warning for users.

Luke Boswell (Aug 14 2024 at 21:10):

Sam Mohr (Aug 14 2024 at 21:13):

Good overall, but two notes:
1) Can you link to the wiki for https://en.wikipedia.org/wiki/Canonicalization in the second sentence?
2) "If it turns out that shadowing isn't the best fit for Roc, we'll remove it as we've done for other experiments, e.g. backpassing." I think explaining the value of having implemented shadowing is not helpful

Sam Mohr (Aug 14 2024 at 21:39):

Matthias Toepp (Aug 14 2024 at 23:08):

I was hoping to suggest that the idea proposed here might be a better approach than trying shadowing. :grinning_face_with_smiling_eyes:

Richard Feldman (Aug 14 2024 at 23:31):

I appreciate that, but I think based on the mountain of discussion on the topic, at this point "don't even try it once" would be a pretty clear mistake

Matthias Toepp (Aug 14 2024 at 23:49):

If Roc could retain the clarity and refactoring benefits of no shadowing, and also have a language server or code formatter that could manage naming so that you could write code with the benefits of shadowing, isn't that better than trying only one set of benefits?

Richard Feldman (Aug 15 2024 at 00:01):

part of the thing that's been discussed a ton is that there are examples of code being clearer and less error-prone when shadowing is permitted

Richard Feldman (Aug 15 2024 at 00:02):

there are examples on both sides, of cases where the code in question is best off if shadowing is allowed, and other cases where the code in question is best off if shadowing is disallowed

Richard Feldman (Aug 15 2024 at 00:03):

so in general I always appreciate trying to find a way of "what if we could get the benefit without sacrificing the guarantee" but unfortunately in this case the benefit is how the code looks

Richard Feldman (Aug 15 2024 at 00:04):

so making the code look un-shadowed doesn't give us data on what it feels like to have shadowing

Matthias Toepp (Aug 15 2024 at 00:25):

It just seems to me that this is one of those things where you have a clear disadvantage in giving up shadowing, and it doesn't seem mysterious how things will be.

I've included one of the key examples above and to me it looks uglier perhaps but it's still more understandable without shadowing. But thanks for considering.

Matthias Toepp (Aug 15 2024 at 06:08):

I'll copy and paste the following here from the website for posterity as presumably this will be deleted from the website, but it gives a nice description of the advantages and disadvantages of shadowing.

No reassignment or shadowing

In Roc, this will give a compile-time error. Once a name has been assigned to a value, nothing in the same scope can assign it again. (This includes shadowing, which is disallowed.)

This can make Roc code easier to read, because the answer to the question "might this have a different value later on in the scope?" is always "no."

That said, this can also make Roc code take longer to write, due to needing to come up with unique names to avoid shadowing—although pipelining (as shown in the previous section) reduces how often intermediate values need names.

Avoiding regressions

A benefit of this design is that it makes Roc code easier to rearrange without causing regressions. Consider this code:

    # …

    message = welcome "friend"

    # …

Suppose I decide to extract the welcome function to the top level, so I can reuse it elsewhere:

    message = welcome "Hello" "friend"

    # …

Even without knowing the rest of func, we can be confident this change will not alter the code's behavior.

In contrast, suppose Roc allowed reassignment. Then it's possible something in the # … parts of the code could have modified greeting before it was used in the message = declaration. For example:

    # …

    if someCondition then
        greeting = "Hi"
        # …
    else
        # …

    # …
    message = welcome "friend"
    # …

If we didn't read the whole function and notice that greeting was sometimes (but not always) reassigned from "Hello" to "Hi", we might not have known that changing it to message = welcome "Hello" "friend" would cause a regression due to having the greeting always be "Hello".

Even if Roc disallowed reassignment but allowed shadowing, a similar regression could happen if the welcome function were shadowed between when it was defined here and when message later called it in the same scope. Because Roc allows neither shadowing nor reassignment, these regressions can't happen, and rearranging code can be done with more confidence.

In fairness, reassignment has benefits too. For example, using it with early-exit control flow operations such as a break keyword can be a nice way to represent certain types of logic without incurring extra runtime overhead.

Roc does not have early-exits or loop syntax; looping is done either with convenience functions like List.walkUntil or with recursion (Roc implements tail-call optimization, including modulo cons), but early-exit operators can potentially make some code easier to follow (and potentially even slightly more efficient) when used in scenarios where breaking out of nested loops with a single instruction is desirable.

Richard Feldman (Aug 15 2024 at 11:42):

Matthias Toepp (Aug 16 2024 at 12:53):

As far as has been discussed, if Roc had no support for shadowing but did support automatic renaming as suggested in this idea, then the only remaining disadvantage is aesthetic, correct? I.e. that it will look "cleaner" with shadowing?

Richard Feldman (Aug 16 2024 at 14:32):

no, there are also cases where you can intentionally choose to shadow something in order to prevent accidentally reusing a stale value that would result in bugs

Richard Feldman (Aug 16 2024 at 14:34):

for example, instead of the pattern where you have foo and then later updatedFoo and then later finalFoo, you can use foo and then shadow foo in the other two places, which prevents accidentally referencing the stale updatedFoo when you should have referenced finalFoo, or (more likely) the very stale foo when you should have said updatedFoo or finalFoo
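
To make the stale-value point concrete, here is a small sketch in Python, where rebinding a name stands in for shadowing (Python has reassignment rather than Roc-style shadowing, so this is only an analogy). The helpers `normalize` and `dedupe` and both function names are hypothetical.

```python
def normalize(items):
    # Hypothetical first processing step.
    return [s.strip().lower() for s in items]

def dedupe(items):
    # Hypothetical second processing step.
    return sorted(set(items))

def with_distinct_names(raw):
    foo = raw
    updated_foo = normalize(foo)
    final_foo = dedupe(updated_foo)
    # All three names are still in scope, so this slip runs without complaint:
    return len(foo)  # bug: should have been final_foo

def with_one_name(raw):
    foo = raw
    foo = normalize(foo)
    foo = dedupe(foo)
    # Only the latest value is reachable by name; there is no stale one to grab.
    return len(foo)
```

With the input `[" A", "a ", "b"]`, the first function counts the three raw items while the second counts the two deduplicated ones.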

Matthias Toepp (Aug 16 2024 at 15:19):

Richard Feldman (Aug 16 2024 at 15:40):

in that if you come back and make changes later after the transformation has been done, you're back in status quo land

Richard Feldman (Aug 16 2024 at 15:41):

just to be super clear by the way, I have probably spent over 100 hours talking and thinking about shadowing in Roc, and at this point I am not open to the idea of not trying it.

Richard Feldman (Aug 16 2024 at 15:42):

I'm completely open to the possibility that we try it and don't like it and go back

Richard Feldman (Aug 16 2024 at 15:43):

but I don't want to give you the misimpression that maybe if this idea is good enough we won't even try shadowing after all :big_smile:

Matthias Toepp (Aug 16 2024 at 16:41):

I don't think that's true with this proposal. The idea is that you could keep using the base name, or even a mis-numbered name, and the numbers are managed for you. It's similar to how a Markdown editor can manage the numbers of a list for you even if you add an item in the middle (except that the numbers are on the right end of a name instead of the left of a list item).

It's fine if you want to give up the advantages of no shadowing for basically just the aesthetic aspect of it; it's your choice. I just wanted it to be clear that that is what seems to be happening, since it seems, at this point, that all other concerns could be handled with this proposal.

Matthias Toepp (Aug 16 2024 at 17:55):

I expect shadowing to fail silently. I think people will be delighted with shadowing and sing its praises and the loss will not be felt but still be real.

Anton (Aug 16 2024 at 18:04):

That's possible. Given that we both make and use Roc, I do think we're more likely to be attentive to such losses.

Matthias Toepp (Aug 16 2024 at 18:09):

It sure seems like you guys have got an awful lot right so far. Congratulations on that!!!! :tada:

Brendan Hansknecht (Aug 16 2024 at 18:12):

One thing to add extra color here. I think this formatter idea would actually be super complex.

    {state:seed2, value:first} = generator seed1
    {state:seed3, value:second} = generator seed2
    {state:seed3, value:third} = generator seed2

This sort of error is the simple case. In reality, it gets a lot more complex, and it might actually be an optimization that is pretty complex to apply to the parse AST that the formatter is using.

You have to deal with control flow, intentional reuse of old values, and lambda argument names at a minimum:

    (seed2, first) = genTag seed1
    (seed6, computed1) =
        when first is
            Up ->
                (seed3, out) = genInt seed2
                (seed3, someData + out)
            Down ->
                if safe a b then
                    (seed3, x) = genInt seed2
                    (seed4, y) = genInt seed3
                    (seed5, z) = genInt seed4
                    (seed5, someData + x*y + z)
                else
                    (seed2, someData)
            Left ->
                # more complex logic

    (seed17, computed2) =
        # More complex generation
    runSomethingComplex seed17 computed1 computed2 \seed18 -> ...

    # Note, I have seen cases where here you may reset the seed back to an old state on failure.
    # So this might intentionally do more compute with seed2.

Tracking these numbers becomes really complex. You will also likely make mistakes if a user wants input1 and input2, which is really common. I think the above is at a minimum very noisy, though I would label it as much harder to follow than the shadowed equivalent.

Matthias Toepp (Aug 16 2024 at 18:28):

Why is it harder to follow? Having matching names for things that are the same and different names for different things seems to make it clear what is being referred to.

Sam Mohr (Aug 16 2024 at 18:32):

I'd say that humans are much better at understanding semantically different things than numerically different things, which is why the meme correct horse battery staple floats around. When we redefine (_, seed) multiple times, seed is semantically the same thing, and the downwards flow of the program intuitively tells us which seed depends on which. However, if we have seed1, seed2, and seed3, we now need to carry in our head which seed is which.

Sam Mohr (Aug 16 2024 at 18:33):

It seems minor (because it is), but communicating via variable names can be over-complicated by imbuing the lifetime of the variable in its name, which is what the seed1, seed2, etc. approach does

Matthias Toepp (Aug 16 2024 at 18:44):

Honestly, for me, when I read that code I kind of feel reassured that I'm correctly finding the matching variables by using the numbers. I.e. the numbers are actually not just noise for me. They are literally making it possible to be quickly confident about what things are referring to.

Brendan Hansknecht (Aug 16 2024 at 18:55):

I think it is a case where, if you scan all of the code linearly, manual tracking is possible. That said, there are still really inconvenient exceptions. One simple example is that to ensure seed6 is the correct name, you need to scan every single branch of the when ... is. Also, it gets really confusing to distinguish intentional from accidental reuse. Did the author mean to have computed1 computed2, or should that be computed2 computed2? In general, are computed1 and computed2 supposed to be a reuse, or are they two distinct values? Did the author mean to jump back and use seed7, or should that be seed12?

I think this becomes especially complex when jumping into the middle of a chunk of code. You might see an error due to a bug in the middle of a block of code written like this. It is very hard to start reasoning if you start with seed13. So you might need to go back to the top of the code, linearly scan it and understand it much more deeply before you are able to interact with it.

If, on the other hand, you just have seed and oldSeed, where oldSeed is explicitly reused later on during an error case, you don't need to question it at all.

Brendan Hansknecht (Aug 16 2024 at 18:57):

Brendan Hansknecht (Aug 16 2024 at 18:58):

    oldSeed = seed
    (seed, first) = genTag seed
    (seed, second) = genTag seed
    (seed, third) = genTag seed

Here, the author is explicitly letting me know that an old version of seed will be reused. They are also letting me know that none of the intermediate values of seed matter.

Brendan Hansknecht (Aug 16 2024 at 18:59):

I do think that accidental shadowing can lead to bugs (most often happens with nested scopes and far away code), but shadowing is not simply a loss of information for aesthetics. It also is an explicit communication of certain information.

Matthias Toepp (Aug 16 2024 at 19:13):

Yes, thank you. I see now that there is something that shadowing offers over the proposal beyond a supposedly cleaner look. With real shadowing, old values become unreachable, and there is a simplification in that as well, i.e. there are fewer names and fewer values in scope at a certain point with shadowing. Well darn it all! :smile:

Matthias Toepp (Aug 16 2024 at 19:29):

Matthias Toepp (Aug 16 2024 at 19:36):

Matthias Toepp (Aug 17 2024 at 09:19):

I'm trying to process what the consequence of this limitation is for the proposed idea.... Optimistically it seems to mean that with this proposal you retain the benefits of no shadowing, and you can write code into an editor as if roc had shadowing, and assistance is provided to add numbering to make names unique and the numbering is managed for you. (So far so good).

Assuming that this works and could be implemented... it still seems that when you write code as if Roc had shadowing, and when you know that this auto-numbering system has operated successfully AND the code compiles, you would have all the benefits of shadowing and no shadowing (except arguably the aesthetic aspect).

It seems that the second-rate aspect of this, compared to actually having the best of both shadowing and no shadowing, is the word AND (in the previous paragraph)... To have confidence that you are reading the equivalent of shadowing, you must know that both the compiler AND the auto-numbering system have completed successfully. Outside of an editor you wouldn't necessarily know whether that was the case or not! In this scenario, you may well only know that the code compiles, in which case you could not read shadow-simulated code with the confidence of knowing that it is the equivalent of actual shadowing, and you would be left to your own devices to track the names across the full code, as one is currently left to do in unshadowed code.

Inside an editor, however, it seems that the programmer could still have all the advantages of shadowing and no shadowing as there they could be made aware if the code doesn't compile or conform to the auto-numbering system.

I know I haven't specifically addressed everything that has been said, but that seems to sum up my conclusion at this point. And boy am I ever looking forward to finding out where I'm wrong! :upside_down:

Matthias Toepp (Aug 18 2024 at 08:59):

@Richard Feldman @Brendan Weibrecht I promise this is my last word on this, at this stage.

The only thing I'd like people to take away from this is that it's one thing to explore the addition of shadowing, a new feature which also undoes some of the qualities of Roc, a tradeoff that may be worth the cost. But please also keep in mind that exploring the possibilities of tooling without shadowing, to address these pain points and provide the desired features, may be surprisingly fruitful and may not yet have been extensively explored. It should potentially be a part of the final analysis of how things can be without shadowing before the tradeoffs of shadowing are fully accepted.

Thanks so much for building something beautiful! There is much more to be said in defense of this proposal but I want to be respectful and not take up more bandwidth.

Joshua Warner (Aug 18 2024 at 15:41):

If we're discussing tooling changes, I think that can be equivalently done by having the editor automatically re-number shadowed variables when loading a file, and "collapse" that numbering when saving (when possible, anyway). Or this can even be a "display-only" renumbering, where your IDE adds non-editable suffixes to variable names, kinda like the type annotation hints that rust-analyzer / VS Code add in Rust.

I point this out not because I think it's actually the right direction (I don't know!) - but just to make it clear that tooling can help improve code understandability on either side of the decision boundary.
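
The "collapse on save" direction could be sketched roughly as follows, again in Python and only for straight-line code. The function name `collapse` and the "PATTERN = EXPR" line format are assumptions for illustration. It gives up (returns None) when a line reads an older numbered version, since a single shadowed name cannot express that reuse, which is one reading of "when possible".

```python
import re

def collapse(lines, base):
    """Strip numeric suffixes from `base` when every use reads the latest version.

    Hypothetical helper: returns the collapsed lines, or None when some line
    reads an older version (intentional reuse of a stale value).
    """
    name = re.compile(rf"\b{re.escape(base)}(\d+)\b")
    latest = None  # suffix of the most recent binding seen so far
    out = []
    for line in lines:
        pattern, expr = line.split("=", 1)
        used = {m.group(1) for m in name.finditer(expr)}
        if latest is not None and used - {latest}:
            return None  # reads a stale version: cannot collapse
        bound = name.search(pattern)
        if bound:
            latest = bound.group(1)
        out.append(name.sub(base, pattern) + "=" + name.sub(base, expr))
    return out
```

A chain like seed1 -> seed2 -> seed3 collapses to plain seed on every line, while a block whose second line goes back to seed1 is left numbered.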
