Scope behaviour · compiler development

Luke Boswell (Jun 18 2025 at 10:51):

I haven't touched any code today. Instead I spent my time re-reading the design documents and trying to scope out the intended behaviour of Scopes specifically, and their interaction with lambdas, var reassignment, and for/while loops.

I've put it all together into the following design document:
https://gist.github.com/lukewilliamboswell/9175249d51fa89b26d7a32bd308fc531

I've been through the scenarios a few times, but I'm not super confident I've got everything correct. I feel like I'm close, but would appreciate a second pair of eyes to sense check the behaviour.

It will be much easier for me to implement (and verify correctness) once I know for sure how it's meant to behave.

Anthony Bullard (Jun 18 2025 at 11:38):

I read this today and discussed it in detail with Luke. In summary, this is my feedback:

- Top-level vars are illegal, but their ident should still be introduced
- for isn't special
- Levels should be renamed Scope
- Scopes are owned by the canonicalizer and don't need a separate struct to own/manage them. So get rid of the current Scope struct and move the functions for managing the Levels, if they are useful, to the Canonicalizer.
- We need to align with Richard on how var can be used. Is it a special type of statement, as it is now in the ParseIR, or is it part of a pattern? I think the latter has some issues.
- I also would prefer if we always treat x and x_ as different identifiers.

Luke Boswell (Jun 18 2025 at 11:44):

I thought we needed Scope in the parser to avoid an extra pass through the AST. Anthony has an idea for how we avoid that. It sounds like a good idea to try. If we can keep the Scope functionality in Can, then we should rename Level to Scope, rename the field levels: std.ArrayListUnmanaged(Level) = .{}, to scopes: std.ArrayListUnmanaged(Scope) = .{}, and move it into src/check/canonicalize.zig.

Anthony Bullard (Jun 18 2025 at 11:46):

I don't think a single iteration through the top-level decls in a module is expensive enough to justify Scope in parsing

Anthony Bullard (Jun 18 2025 at 11:47):

And we can toss levels away after Can, as we only need CanIR and the exposed members after that

Luke Boswell (Jun 18 2025 at 11:47):

It wouldn't be that hard to change it later if we decided we really needed to save a pass through the AST (in the worst case)

Anthony Bullard (Jun 18 2025 at 11:49):

To be clear, it's not a pass through the AST. The module has the top-level decl StatementIds in a slice, so iteration should be _very_ straightforward: we only need to find decls and introduce the idents in their patterns.
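
A minimal sketch of that idea, in Python rather than the compiler's Zig: one loop over the top-level declarations introduces every ident before any body is canonicalized. The names Decl, Canonicalizer, introduce_top_level and the problem tag are hypothetical and only illustrate the shape; they are not the compiler's actual API. It also follows the earlier feedback that a top-level var is reported but its ident is still introduced.

```python
from dataclasses import dataclass, field


@dataclass
class Decl:
    """Hypothetical stand-in for a top-level statement that binds one ident."""
    name: str
    is_var: bool = False


@dataclass
class Canonicalizer:
    # ident text -> index of the declaring statement (assumed representation)
    top_level: dict = field(default_factory=dict)
    problems: list = field(default_factory=list)

    def introduce_top_level(self, decls):
        # A single loop over the module's top-level statements, not an AST walk.
        for idx, decl in enumerate(decls):
            if decl.is_var:
                # Illegal at the top level, but the ident is still introduced
                # so later references don't also report "not in scope".
                self.problems.append(("var_not_allowed_at_top_level", decl.name))
            self.top_level[decl.name] = idx


can = Canonicalizer()
can.introduce_top_level([Decl("foo"), Decl("count_", is_var=True), Decl("bar")])
```

After this loop every top-level ident resolves, so bodies can reference declarations that appear later in the file.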

Luke Boswell (Jun 18 2025 at 11:49):

Anthony Bullard said:

I also would prefer if we always treat x and x_ as different identifiers.

Does that mean I can have both of these next to each other?

x = 2
var x_ = 3

Anthony Bullard (Jun 18 2025 at 11:49):

Luke Boswell said:

I also would prefer if we always treat x and x_ as different identifiers.

Does that mean I can have both of these next to each other?
x = 2
var x_ = 3

Yes. Though I doubt that exact scenario would often crop up.

You might want to do something like:

var count_ = 0
for something in some_list {
    count = something.get_count()
    count_ = if count > 5 count_ + count else count_
}

Which looks dumb, but in a realistic scenario like this, not being able to name that variable count seems like an unnecessary inconvenience.

Luke Boswell (Jun 18 2025 at 11:56):

My thought process is a bit silly, but consider these scenarios for the _:

- in 100_000_000 it is a separator for thousands
- in _foo it means this ident foo is not used
- in foo_ it means this ident foo is re-used

So it feels to me more like a semantic meaning thing that modifies an ident, and not a syntax thing.

Anthony Bullard (Jun 18 2025 at 11:59):

So, do you think _foo and foo are different?

Luke Boswell (Jun 18 2025 at 12:01):

They feel like the same thing.

Anthony Bullard (Jun 18 2025 at 12:05):

Interesting. I disagree, but we should get a lot of feedback and thoughts on it.

Anthony Bullard (Jun 18 2025 at 12:06):

I also don't mind (and actually kind of love) being able to use ' as much as you want at the end of an ident like you can in OCaml and F#

Anthony Bullard (Jun 18 2025 at 12:06):

And they are all unique

Anton (Jun 18 2025 at 12:29):

_foo and foo are different

This seems to be the simplest approach

Richard Feldman (Jun 18 2025 at 12:56):

yeah

Richard Feldman (Jun 18 2025 at 12:56):

I appreciate the concerns about it, but in literally every language I've ever heard of, identifiers are different if their names are different, full stop

Richard Feldman (Jun 18 2025 at 12:57):

I don't think this is worth the strangeness cost

Richard Feldman (Jun 18 2025 at 13:04):

regarding reassignment being disallowed across function boundaries, here's what I was thinking:

1. when we're canonicalizing and we encounter a lambda, push its Region onto a list
2. when we leave that lambda, pop the list
3. now the "current function" is always the last entry in that list (only its Region; I'll explain why later)
4. when we see a var, we record the Region of the current function along with it, so later we can ask "what function was this var declared in?"
5. when we've just decided something is a reassignment, based on it resolving to a var, go ask that var what function it was declared in (which we just wrote down in the previous step)
6. if that's different from the current function, give a hard error and emit a crash instead of a Reassign (the rest of the compiler will not be capable of making reassignment across function boundaries Just Work, so we have to crash if you try to do that)
7. the hard error needs the Region in order to report where the current function is, and that's all it needs - so that's why we only bother writing that down. (We don't yet have a Can Idx since we just started working on the function, the region uniquely identifies it, and we need the region anyway for error reporting)
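
The steps above can be sketched roughly as follows. This is an illustration in Python, not the compiler's Zig code: Region, FunctionTracker and the tuple results are hypothetical names, and vars are keyed by name here only to keep the sketch short (a real canonicalizer would key them by whatever the var resolves to).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Region:
    start: int
    end: int


@dataclass
class FunctionTracker:
    # Stack of the Regions of the lambdas we are currently inside.
    function_regions: list = field(default_factory=list)
    # var name -> Region of the function it was declared in (None = no function).
    var_functions: dict = field(default_factory=dict)

    def current_function(self) -> Optional[Region]:
        return self.function_regions[-1] if self.function_regions else None

    def enter_lambda(self, region: Region) -> None:
        self.function_regions.append(region)

    def leave_lambda(self) -> None:
        self.function_regions.pop()

    def declare_var(self, name: str) -> None:
        # Record which function the var was declared in.
        self.var_functions[name] = self.current_function()

    def reassign(self, name: str):
        declared_in = self.var_functions[name]
        if declared_in != self.current_function():
            # Crossing a function boundary: report an error and emit a crash,
            # carrying the current function's Region for the report.
            return ("crash", name, self.current_function())
        return ("reassign", name)


tracker = FunctionTracker()
tracker.enter_lambda(Region(0, 100))
tracker.declare_var("sum_")
same_function = tracker.reassign("sum_")
tracker.enter_lambda(Region(20, 60))      # a nested lambda
crossed_boundary = tracker.reassign("sum_")
tracker.leave_lambda()
tracker.leave_lambda()
```

Here same_function is a plain reassignment, while crossed_boundary is the crash case, because the var was declared in the outer lambda and reassigned from the inner one.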

Richard Feldman (Jun 18 2025 at 13:18):

happy to elaborate/clarify/restate any of that :smile:

Brendan Hansknecht (Jun 18 2025 at 16:39):

Richard Feldman said:

I appreciate the concerns about it, but in literally every language I've ever heard of, identifiers are different if their names are different, full stop

I can think of a few exceptions, but they are very niche.

Brendan Hansknecht (Jun 18 2025 at 16:39):

What does "for is not special" mean?

Anthony Bullard (Jun 18 2025 at 17:27):

Brendan Hansknecht said:

What does "for is not special" mean?

just means that for introduces a normal scope just like a when branch, not anything special like a function scope
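
A small sketch of what "a normal scope" means here, assuming a plain stack of scopes where lookup walks from the innermost scope outwards. The Scopes class and its method names are hypothetical, and Python is used only for illustration.

```python
class Scopes:
    """Hypothetical scope stack; the innermost scope is the last entry."""

    def __init__(self):
        self.scopes = [{}]

    def push(self):
        self.scopes.append({})

    def pop(self):
        self.scopes.pop()

    def introduce(self, name, kind):
        self.scopes[-1][name] = kind

    def lookup(self, name):
        # Walk from the innermost scope outwards.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None


scopes = Scopes()
scopes.introduce("count_", "var")
scopes.push()                       # entering the body of a for loop
scopes.introduce("count", "decl")   # only visible inside the loop body
inside = (scopes.lookup("count_"), scopes.lookup("count"))
scopes.pop()                        # leaving the loop body
after = scopes.lookup("count")
```

Inside the loop body both idents resolve, and the outer var is still reachable for reassignment because no function boundary was crossed; after the pop, count is gone.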

Luke Boswell (Jun 18 2025 at 23:13):

I've updated my scope analysis above. I've also started implementing this refactor.

There are a few things that need implementing/fixing along the way before I can properly validate the behaviour is correct :sweat_smile:

Anthony Bullard (Jun 18 2025 at 23:40):

such as...?

Luke Boswell (Jun 18 2025 at 23:41):

Parsing statements in lambdas

Anthony Bullard (Jun 18 2025 at 23:42):

that should be relatively easy if we constrain it at first to decls and a limited set of exprs

Luke Boswell (Jun 18 2025 at 23:42):

My methodology is basically to write out a snapshot test that I want to work.

Then step through the tokens, parser, problems etc and work my way down the compiler stages ensuring everything is behaving the way I think it should.

Anthony Bullard (Jun 18 2025 at 23:42):

and then we can move to ifs, whens, crash, expect, etc

Anthony Bullard (Jun 18 2025 at 23:43):

so is that what you would like me to help with?

Luke Boswell (Jun 18 2025 at 23:46):

No thank you. I'm chipping away at that. I'm posting things here just to keep everyone informed of what I'm doing so we avoid duplicating efforts.

Luke Boswell (Jun 19 2025 at 00:20):

We talked about a top-level var being an error, but we thought we should introduce it to our scope anyway. It's currently a parser error so we have a malformed node and therefore we don't have any information to handle it in Can.

~~~SOURCE
module []

# This should cause an error - var not allowed at top level
var topLevelVar_ = 0

(file (1:1-39:33)
    (module (1:1-1:10) (exposes (1:8-1:10)))
    (statements
        (malformed_stmt (4:1-4:4) "var_only_allowed_in_a_body")

If we want to continue, we could potentially make this a diagnostic instead of a malformed AST node, then convert it to an assignment without the var or trailing underscore. This would be a change to the Parser.
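
A rough sketch of that recovery, in Python for illustration only. The function name and the tuple node shape are hypothetical; the diagnostic tag is the one shown in the snapshot above. It records a diagnostic and returns an ordinary assignment node instead of a malformed one.

```python
def parse_top_level_var(name, diagnostics):
    """Hypothetical recovery for `var name_ = ...` at the top level."""
    # Report the problem as a diagnostic instead of producing a malformed node.
    diagnostics.append("var_only_allowed_in_a_body")
    # Drop the trailing underscore, as proposed, and continue as a plain assignment.
    plain_name = name[:-1] if name.endswith("_") else name
    return ("assign", plain_name)


diagnostics = []
node = parse_top_level_var("topLevelVar_", diagnostics)
```

With this shape Can still receives a node carrying the ident, so it has something to introduce into scope.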

Luke Boswell (Jun 19 2025 at 08:38):

#7842 -- DRAFT

Implements some of what I described in the Scope behaviour above. I'd like to re-read with fresh eyes tomorrow before marking as ready for review.

Here is a demo of it so far...

module []

my_long_ident = "global"

foo = |_| {
    my_long_ident = "shadowing here"

    var sum_ = 0

    sum_ = sum_ + 1
    sum_ = sum_ + 1
    sum_ = sum_ + 1

    sum_
}

[image: Screenshot 2025-06-19 at 18.38.10.png]

Luke Boswell (Jun 19 2025 at 08:42):

I am reasonably sure the CI failures are from snapshots we have that include invalid things in them; when a PROBLEM includes a slice of the original source, this causes issues across OSes. I need to investigate further if we want to have pretty-rendered problem reports included in snapshot files.

One hacky solution might be to add a flag in the META section to include pretty-rendered problems; otherwise, by default, it just prints out the tag name, for example PARSER .not_implemented.

Anthony Bullard (Jun 19 2025 at 10:11):

I know I would prefer to have the pretty-printed errors, so that the snapshots also act as tests for the problem reports.

Luke Boswell (Jun 19 2025 at 11:23):

We can have them. Maybe the flag behaviour is to turn the pretty rendering off instead. So we could flag snapshots that are deliberately testing "misuse" or bad utf8 etc.
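
A minimal sketch of that opt-out, assuming each problem is a (stage, tag, pretty report) triple and the flag comes from the snapshot's META section. The function name and the tuple shape are hypothetical, and the report text is made up for the example.

```python
def render_problems(problems, pretty=True):
    """Render the PROBLEMS section of a snapshot.

    `problems` is assumed to be a list of (stage, tag, report) tuples.
    """
    lines = []
    for stage, tag, report in problems:
        if pretty:
            # Full report, which may embed a slice of the original source.
            lines.append(report)
        else:
            # Tag name only, which stays stable for deliberately invalid input.
            lines.append(f"{stage} .{tag}")
    return "\n".join(lines)


problems = [("PARSER", "not_implemented", "This feature is not yet implemented.")]
pretty_output = render_problems(problems)
plain_output = render_problems(problems, pretty=False)
```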

Last updated: Jul 26 2025 at 12:14 UTC
