I haven't touched any code today. Instead I spent my time re-reading the design documents and trying to understand/scope out the intended behaviour of Scopes specifically... and their interaction with lambdas, var re-assignment, and the for/while loops.
I've put it all together into the following design document
https://gist.github.com/lukewilliamboswell/9175249d51fa89b26d7a32bd308fc531
I've been through the scenarios a few times, but I'm not super confident I've got everything correct. I feel like I'm close, but would appreciate a second pair of eyes to sense check the behaviour.
It will be much easier for me to implement (and verify correctness) once I know for sure how it's meant to behave.
I read this today and discussed in detail with Luke, but in summary this is my feedback:
- Top-level var's are illegal, but their ident should still be introduced
- for isn't special
- Levels should be renamed Scope
- Scopes are owned by the canonicalizer and don't need a separate struct to own/manage them. So get rid of the current Scope struct and move the functions for managing the Levels if they are useful to the Canonicalizer.
- Where var can be used. Is it a special type of statement as it is now in the ParseIR, or is it a part of a pattern? I think the latter has some issues.
- I also would prefer if we always treat x and x_ as different identifiers.
I thought we needed Scope in the parser to avoid an extra pass through the AST. Anthony has an idea for how we avoid that. It sounds like a good idea to try. If we can keep the Scope functionality in Can, then we should rename Level to Scope and move the levels: std.ArrayListUnmanaged(Level) = .{}, -> scopes: std.ArrayListUnmanaged(Level) = .{}, into src/check/canonicalize.zig.
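As a rough illustration of the ownership change discussed above, here is a minimal Python sketch (not the actual Zig code) of a canonicalizer that owns its scope stack directly. The names Canonicalizer, enter_scope, exit_scope, introduce, and lookup are assumptions made for this sketch.

```python
class Canonicalizer:
    """Hypothetical sketch: the canonicalizer owns the scope stack itself."""

    def __init__(self):
        # Innermost scope is last; index 0 is the module's top level.
        self.scopes = [{}]

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        # Never pop the top-level scope.
        if len(self.scopes) > 1:
            self.scopes.pop()

    def introduce(self, ident, info):
        """Add ident to the innermost scope.

        Returns True if the name was already visible before this call.
        """
        already_visible = self.lookup(ident) is not None
        self.scopes[-1][ident] = info
        return already_visible

    def lookup(self, ident):
        # Search from the innermost scope outwards.
        for scope in reversed(self.scopes):
            if ident in scope:
                return scope[ident]
        return None


can = Canonicalizer()
can.introduce("my_long_ident", "global")
can.enter_scope()
shadowed = can.introduce("my_long_ident", "shadowing here")
inner = can.lookup("my_long_ident")
can.exit_scope()
outer = can.lookup("my_long_ident")
```

Once the inner scope is popped, the outer binding is visible again, and the whole stack can be discarded after canonicalization.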
I don't think a single iteration through the top-level decls in a module is expensive enough to justify Scope in parsing
And we can toss levels away after Can, as we only need CanIR and the exposed members after that
It wouldn't be that hard to change it later if we decided we really needed to save a pass through the AST (in the worst case)
To be clear, it's not a pass through the AST. The module has the top-level decl StatementIds in a slice, iteration should be _very_ straightforward as we only need to find decls and introduce idents in the patterns
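The single iteration over top-level declarations might look something like the sketch below. It is written in Python rather than Zig, and the statement representation (dicts with "kind" and "pattern_idents") is purely an assumption for illustration; the real StatementIds and patterns will look different.

```python
def introduce_top_level_idents(statements):
    """Collect the idents bound by top-level decl patterns in one linear pass.

    `statements` is a hypothetical stand-in for the module's slice of
    top-level statements; only decls contribute idents in this sketch.
    """
    scope = {}
    duplicates = []
    for index, stmt in enumerate(statements):
        if stmt["kind"] != "decl":
            continue  # other statement kinds are ignored in this sketch
        for ident in stmt["pattern_idents"]:
            if ident in scope:
                duplicates.append(ident)
            else:
                scope[ident] = index
    return scope, duplicates


module = [
    {"kind": "import", "pattern_idents": []},
    {"kind": "decl", "pattern_idents": ["my_long_ident"]},
    {"kind": "decl", "pattern_idents": ["foo"]},
]
top_scope, dupes = introduce_top_level_idents(module)
```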
- I also would prefer if we always treat x and x_ as different identifiers.
Does that mean I can have both of these next to each other?
x = 2
var x_ = 3
Luke Boswell said:
- I also would prefer if we always treat x and x_ as different identifiers.
Does that mean I can have both of these next to each other?
x = 2
var x_ = 3
Yes. Though I doubt that exact scenario would often creep up.
You might want to do something like
var count_ = 0
for something in some_list {
    count = something.get_count()
    count_ = if count > 5 count_ + count else count_
}
Which looks dumb, but in a realistic scenario like this, not being able to name it count seems like an unnecessary inconvenience
My thought process is a bit silly, consider these scenarios for the _:
- 100_000_000 is a separator for thousands
- _foo means this ident foo is not used
- foo_ means this ident foo is re-used
So it feels to me more like a semantic meaning thing that modifies an ident, and not a syntax thing.
So, do you think _foo and foo are different?
They feel like the same thing.
Interesting. I disagree, but we should get a lot of feedback and thoughts on it.
I also don't mind (and actually kind of love) being able to use ' as much as you want at the end of an ident like you can in OCaml and F#
And they are all unique
_foo and foo are different
This seems to be the simplest approach
yeah
I appreciate the concerns about it, but in literally every language I've ever heard of, identifiers are different if their names are different, full stop
I don't think this is worth the strangeness cost
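One way to read the conclusion above: a leading or trailing underscore is a naming convention that tells the compiler something about the ident, but the lookup key is still the full name. A small Python sketch, where describe_ident and its labels are made up for illustration:

```python
def describe_ident(name):
    """Return (lookup_key, convention) for an identifier.

    The key is always the full text, so x, x_, and _x never collide.
    """
    if name.startswith("_"):
        convention = "unused"
    elif name.endswith("_"):
        convention = "reassignable"
    else:
        convention = "plain"
    return name, convention


scope = {}
for ident in ["x", "x_", "_x"]:
    key, convention = describe_ident(ident)
    scope[key] = convention
```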
regarding reassignment being disallowed across function boundaries, here's what I was thinking:
- When we declare a var, we record the Region of the current function along with it, so later we can ask "what function was this var declared in?"
- When we reassign a var, go ask that var what function it was declared in (which we just wrote down in the previous step)
happy to elaborate/clarify/restate any of that :smile:
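The check described above can be sketched roughly as follows. This is Python, not the compiler's Zig, and it uses a plain integer id where the message suggests recording the function's Region. The class and method names are assumptions, and block scoping of names is left out to keep the sketch short.

```python
class VarTracker:
    """Hypothetical sketch of the 'which function declared this var?' check."""

    def __init__(self):
        self.function_stack = [0]   # 0 stands for "not inside any lambda"
        self.next_function_id = 1
        self.declared_in = {}       # var name -> id of the declaring function

    def enter_function(self):
        self.function_stack.append(self.next_function_id)
        self.next_function_id += 1

    def exit_function(self):
        self.function_stack.pop()

    def declare_var(self, name):
        # Write down the current function alongside the var.
        self.declared_in[name] = self.function_stack[-1]

    def can_reassign(self, name):
        # Ask the var which function declared it and compare with the
        # function we are in now. Unknown names are never reassignable.
        return self.declared_in.get(name) == self.function_stack[-1]


tracker = VarTracker()
tracker.enter_function()             # foo = |_| { ...
tracker.declare_var("sum_")
same_function = tracker.can_reassign("sum_")
tracker.enter_function()             # a nested lambda inside foo
nested_lambda = tracker.can_reassign("sum_")
tracker.exit_function()
back_in_foo = tracker.can_reassign("sum_")
```

Blocks that are not functions never call enter_function in this sketch, so they do not affect the check.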
Richard Feldman said:
I appreciate the concerns about it, but in literally every language I've ever heard of, identifiers are different if their names are different, full stop
I can think of a few exceptions, but they are very niche.
What does "for is not special" mean?
Brendan Hansknecht said:
What does "for is not special" mean?
just means that for introduces a normal scope just like a when branch, not anything special like a function scope
I've updated my scope analysis above. I've also started implementing this refactor.
There's a few things that need implementing/fixing along the way before I can properly validate the behaviour is correct :sweat_smile:
such as...?
Parsing statements in lambdas
Ok
that should be relatively easy if we constrain it at first to decls and a limited set of exprs
My methodology is basically write out a snapshot test that I want to work.
Then step through the tokens, parser, problems etc and work my way down the compiler stages ensuring everything is behaving the way I think it should.
and then we can move to ifs, when's, crash, expect, etc
so is that what you would like me to help with?
No thank you. I'm chipping away at that. I'm posting things here just to keep everyone informed of what I'm doing so we avoid duplicating efforts.
We talked about a top-level var being an error, but we thought we should introduce it to our scope anyway. It's currently a parser error so we have a malformed node and therefore we don't have any information to handle it in Can.
~~~SOURCE
module []
# This should cause an error - var not allowed at top level
var topLevelVar_ = 0
(file (1:1-39:33)
    (module (1:1-1:10) (exposes (1:8-1:10)))
    (statements
        (malformed_stmt (4:1-4:4) "var_only_allowed_in_a_body")
If we want to continue, we could potentially make this a diagnostic instead of a malformed AST node, then convert it to an assignment without the var or trailing underscore. This would be a change to the Parser.
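A sketch of that recovery idea, in Python for brevity: instead of producing a malformed node, the parser reports a diagnostic and hands a plain assignment to later stages. The tuple-based node shape and the function name are assumptions for illustration; only the var_only_allowed_in_a_body tag comes from the snapshot above.

```python
def recover_top_level_var(name, value_src, diagnostics):
    """Turn an illegal top-level `var name_ = value` into a plain decl.

    Appends a diagnostic and returns a hypothetical ("decl", ident, value)
    node so canonicalization can still introduce the ident.
    """
    diagnostics.append(("var_only_allowed_in_a_body", name))
    # Drop a single trailing underscore, if present.
    ident = name[:-1] if name.endswith("_") and len(name) > 1 else name
    return ("decl", ident, value_src)


problems = []
node = recover_top_level_var("topLevelVar_", "0", problems)
```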
#7842 -- DRAFT
Implements some of what I described in the Scope behaviour above. I'd like to re-read with fresh eyes tomorrow before marking as ready for review.
Here is a demo of it so far...
module []
my_long_ident = "global"
foo = |_| {
    my_long_ident = "shadowing here"
    var sum_ = 0
    sum_ = sum_ + 1
    sum_ = sum_ + 1
    sum_ = sum_ + 1
    sum_
}
Screenshot 2025-06-19 at 18.38.10.png
I am reasonably sure the CI failures are from snapshots we have that include invalid things in them, and then when a PROBLEM includes a slice of the original source this causes issues across OS's. I need to investigate further if we want to have pretty rendered problem reports included in snapshot files.
One hack solution might be to add a flag in the META section to include pretty rendered problems, otherwise by default it just prints out the tag name, for example PARSER .not_implemented.
I know I would prefer to have the pretty printed errors so that the snapshots also act as tests for the problem reports.
We can have them. Maybe the flag behaviour is to turn the pretty off instead. So we could flag snapshots that are deliberately testing "misuse" or bad utf8 etc.
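Either default could be sketched like this. The flag name pretty_problems, the problem dict shape, and the output format are all invented for illustration; the snapshot tool's real META handling may differ.

```python
def render_problems(problems, pretty_problems=True):
    """Render a snapshot's PROBLEMS section.

    With pretty_problems=False only the stage and tag are printed, so the
    output cannot contain slices of the (possibly invalid) source.
    """
    lines = []
    for problem in problems:
        header = f"{problem['stage']} .{problem['tag']}"
        if pretty_problems:
            lines.append(f"{header}: {problem['message']}")
            lines.append(f"    {problem['source_slice']}")
        else:
            lines.append(header)
    return "\n".join(lines)


example = [{
    "stage": "PARSER",
    "tag": "not_implemented",
    "message": "this syntax is not implemented yet",
    "source_slice": "var topLevelVar_ = 0",
}]
tag_only = render_problems(example, pretty_problems=False)
pretty = render_problems(example)
```

Snapshots that deliberately test misuse or bad UTF-8 would set the flag to false and fall back to the tag-only form.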
Last updated: Jul 06 2025 at 12:14 UTC