You should probably scroll near the bottom for the actual meat of this convo, which started off talking about parser issues and a different solution
A similar issue has reared its head, but now it is with type annos as the first statement. I even left a comment about it in the parser
One fix is to require that a (new) token like NoSpaceOpColon be used for record fields (and to forbid it in type annotation statements). Without doing that, the only way to tell that an Expr introduced by an OpenCurly is a block (versus a record) is to have a backtracking function that tries to parse a record to completion (without saving the nodes) if we see a LowerIdent followed by an OpColon
Because today, this:
foo = |x| {
    something : a
}
Should be valid, parseable code - but it is unclear _what_ it should be. Do we have a function that returns a record with a something field with the value of a? Or a block with a single, useless type annotation statement in it?
But with my suggestion the above would always be the latter and:
foo = |x| {
    something: a
}
Would always be the former
But it does come at a cost of flexibility
Another, worse idea would be that Records require a sigil before the OpenCurly (like Zig). I don't condone or promote that idea at all, but it is a valid solution.
So TL;DR:
I recall, we had a subtle thing with formatting where the types would have a space, but the values wouldn't.
my_record : {
    name : Str,
    age : U64,
}
my_record = {
    name: "foo",
    age: 42,
}
Yep, and inside of an Anno or Pattern Record Field it's not really a problem to parse. It is only the one that separates the header from the anno
So we would need to throw an error on:
my_record: {
    name : Str,
    age : U64,
}
But
my_record : {
    name: Str,
    age: U64,
}
Is fine to parse
But I know that doesn't have design consistency with say = in decls
Which need not worry about whitespace on either side
It also disallows comments after the header and before the OpColon
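But I think it's better than
- backtracking
- requiring a sigil for record
- changing the symbol used for separating the header and anno
- a symbol or keyword needed to begin a type anno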
I'm not opposed to the sigil idea tbh
There is plenty of prior art for that, but that's a big change for an important data structure
I don't like the idea of introducing whitespace significance (after we just removed it)
Dude
Also there is now a very subtle difference between OpColon and NoSpaceOpColon
The most obvious, least offensive move would be to have type annos use ::
Like Haskell
OpDoubleColon
I haven't spent a lot of time writing the new syntax but while writing these snapshots for records... I have been finding the "is this a block or a record" a little confusing
Anthony Bullard said:
Because today, this:
foo = |x| { something : a }
Should be valid, parseable code - but it is unclear _what_ it should be. Do we have a function that returns a record with a something field with the value of a? Or a block with a single, useless type annotation statement in it?
Maybe we should just make a rule that this parses one way or the other. How big of an issue would it be I wonder
Changing the syntax would be a pretty major downside. Making the parser slightly less performant or tolerant might be acceptable.
the only way to tell that a Expr introduced by an OpenCurly is a block (versus a record) is to have a backtracking function that tries to parse a record to completion (without saving the nodes) if we see a LowerIdent followed by a OpColon
How bad is backtracking in this situation?
it depends on the annotation
could be very bad in situations with a big annotation
but i think the confusion is a bigger deal
we started using : in annos when we didn't have {} delimited blocks
this would be very clear :
foo = |x| {
    something :: a
}
that's a block and either
foo = |x| {
    something : a
}
or
foo = |x| {
    something: a
}
would be a record and there is NO room for confusion by the actual human
and the latter would be the canonical formatting for records
And there is no change needed in a TypeRecordField
so
my_record : {
    name: Str,
    age: U64,
}
would just be
my_record :: {
    name: Str,
    age: U64,
}
It shifts the annotation by one character in single line form
my_record :: { name : Str, age : U64 }
my_record = { name: "john", age: 64 }
my_list :: List(Str)
my_list = [ "one", "two", "three"]
my_int :: U64
my_int = 42
that's true, but hasn't bothered Haskell users for the past 27 years :rolling_on_the_floor_laughing:
We're not using :: anywhere else right?
Wait... does the :: only go in the declaration part, not inside the record??
yes
the record could use the same as in pattern or expr record
see above sample
here's some haskell for example
main = do
    str <- getContents
    let rna :: [RNA]
        rna = map (\c -> read [c]) str
    let aminoAcids :: [AminoAcid]
        aminoAcids = decodeAll rna
    putStrLn (concatMap show aminoAcids)
I'm trying to think of a valid example where we have a lower ident after the colon. Is this the only way?
foo : List(a) -> U64
foo = |x| {
    something : a # refers to the `a` from the foo annotation
    something = x
    x.len()
}
if you are on desktop and can do it, could you move all of this from https://roc.zulipchat.com/#narrow/stream/304641-ideas/topic/Needed.20Function.20signature.20and.20lambda.20expr.20change/near/525987327 to a new topic here in ideas?
something like "Move type annotations to use :: between header and type"
45 messages were moved here from #ideas > Needed Function signature and lambda expr change by Luke Boswell.
thanks @Luke Boswell
the problem isn't only lower ident after the colon, it's any token that could start an expr
and you could go quite a ways before you realized this isn't a record field
At the end of the day, this is Richard's decision but i'm glad we laid out the scenario for everyone and him
I could change the formatter to format code like this and format a big module with lots of annotations
to get a good sample
(deleted) to avoid confusion
i'm not sure about the last one
I guess the proposal here is only changing the type anno.
# Type Annotation
foo :: List(a) -> U64
foo = ...
# Nominal Type Declaration
Foo(a) := [Good(a), Bad]
# Alias Type Declaration
Foo(a) : [Good(a), Bad]
My line of thinking was that these all look nice
(edited to clarify wording of statements)
i prefer this
type aliases don't really cause confusion
createUser :: UserId, UserName, UserAge -> User
createUser = |id, name, age| { id, name, age }
getUserName :: User -> UserName
getUserName = |user| user.name
main! = |_| {
    user :: User
    user = createUser(123, "Alice", 25)
    getUserName(user)
}
Yeah, I've been working through examples... and it's definitely growing on me too.
It feels much clearer and visually distinct where we add type annotations.
i'd love to see what someone like @Niclas Ahden thinks about it, as an actual Roc practitioner
I've gone through most of our snapshot examples and in every case I think the :: is an improvement.
i hope Richard feels the same
I like that it visually distinguishes the annotations from the aliases (further than just upper/lower case idents)
Here's my attempt at a summary
So the only downsides I can think of are:
- :: as new operator
foo :: U64
foo = 42
bar :: { age : U8 }
bar = { age: 42 }
The upsides I can think of are:
- :: vs := vs :
Also type annotations are always optional, so the :: isn't required on everything, so when it is included it stands out. This feels appropriate because the author has deliberately inserted an annotation.
i love this summary and i'll let it stand. hopefully someone else will step in and speak up
Anthony Bullard said:
Because today, this:
foo = |x| { something : a }
Should be valid, parseable code - but it is unclear _what_ it should be. Do we have a function that returns a record with a something field with the value of a? Or a block with a single, useless type annotation statement in it?
that would be a block with no expression at the end, which isn't allowed, right?
so I think a record would be the only valid way to parse it
a block can have one or more statements in parsing
in Can a block needs a trailing expr
Isn't the issue that there is an unbounded amount you would have to parse ahead before you could know it's not a record.
ah I see
Luke Boswell said:
Isn't the issue that there is an unbounded amount you would have to parse ahead before you could know it's not a record.
this is the issue for the Parser. the thing Richard is picking on is more about how a human seeing code would parse it
so there are two distinct issues and i think both are solved with annos using ::
I strongly don't want to do :: so I'd very much prefer to explore alternatives :smile:
i was worried you'd say that
is there a reason, or just aesthetics
Or Haskell PTSD? :rolling_on_the_floor_laughing:
I guess we can go on a brief tangent about that :laughing:
so basically every language except the Haskell family uses : over :: for types (assuming they use one or the other)
the other ML family languages used :: for cons
Anthony Bullard said:
But I think it's better than
- backtracking
- requiring a sigil for record
- changing the symbol used for separating the header and anno
- a symbol or keyword needed to begin a type anno
I think these were the options i found
we don't have to backtrack, there's another fix
the Haskell committee reversed it and used : for cons and :: for types because they thought that people were going to be using cons way more often than type annotations so it would be a nice ergonomics improvement
obviously that decision did not age well
every other language except for Haskell and direct descendants of Haskell (PureScript comes to mind) either always used : for types or went back to it (e.g. Idris and Elm came after Haskell and went with : for types)
separately, it's super common in modern languages to use : without a space for types, e.g. foo: bar
we use a space, which is already a little bit weird; adding a second : just does not seem like a justifiable use of weirdness budget to solve a parsing edge case that can be solved in another way
yeah so the other option i found which i know you want is inline annos
but what's this alternative to backtracking
so this is an exploratory idea for sure, but I've been thinking about it for awhile and I haven't been able to come up with a reason that it wouldn't work
i know we could share nodes between exprs and patterns
exactly
they have total overlap in terms of syntax, and the only things that are invalid in one but not the other can be checked during canonicalization
yeah but do we just bail on the pattern parsing as soon as we find something that couldn't be a pattern?
nh
*nah
well Alternatives and `as` are not part of expr
the idea would be that the parser is just concerned with the structure
right, but canonicalization could give an error for that just as easily
one of the reasons to do the design would be that it could make formatting faster because the parser can do less work
by deferring some of the checks that normally happen during parsing to canonicalization
so parsing becomes about turning tokens into a valid "shape" but not about deciding which things are patterns vs expressions vs record fields vs type annotations etc.
that becomes canonicalization's job, and the job becomes easier because canonicalization has a more complete picture to work with
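A minimal sketch of that split, in Python rather than the compiler's Zig. Everything here (Shape, Context, MEANINGS, canonicalize) is a hypothetical name made up for illustration, not something from the Roc codebase. The parser stand-in only records structure; the canonicalization stand-in decides what the structure means in context and rejects shapes that are invalid there, such as alternatives in expr position.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Context(Enum):
    EXPR = auto()
    PATTERN = auto()
    TYPE = auto()


@dataclass(frozen=True)
class Shape:
    """What the parser emits: structure only, no expr/pattern/type decision."""
    kind: str               # e.g. "square_list", "alternatives", "upper_ident"
    children: tuple = ()


# Which meanings each shape may take, per context (illustrative, not exhaustive).
MEANINGS = {
    "square_list": {
        Context.EXPR: "list expression",
        Context.PATTERN: "list pattern",
        Context.TYPE: "tag union type",
    },
    "alternatives": {
        Context.PATTERN: "alternatives pattern",
    },
}


def canonicalize(shape: Shape, ctx: Context) -> str:
    """Decide what a shape means; reject shapes that are invalid in this context."""
    meaning = MEANINGS.get(shape.kind, {}).get(ctx)
    if meaning is None:
        raise ValueError(f"{shape.kind} is not valid in {ctx.name} position")
    return meaning


# The same parsed shape for [A, B, C], interpreted three different ways.
abc = Shape("square_list", (Shape("upper_ident"),) * 3)
```

The design point is that the check for "alternatives in an expression" still exists; it has only moved out of the parser and into the later pass.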
ok
Does that mean we need to do Can before we format then?
I don't think so
I can't think of a situation where it would matter :thinking:
Or am I going to just blow up the formatter when I get a pattern in the middle of formatting an expr?
Ok, maybe that's true
I'm down to try
do we format patterns and exprs differently?
I don't think we do, but I could be missing something :smile:
pretty sure they just follow the same rules
They are different today
interesting!
Though this is a type
Different functions
ah so they're separate just because the types are different
I think they are very similar
Yes
gotcha
But the thing we are talking about in this topic is about a TYPE
if we're going to try combining them, there's another cool Zig thing we can try - a technique I liked in Layout
I can only tell this is NOT a record by trying to parse at least one record field
Which depending on the type of annotation, could be a large number of tokens
I can talk more in like an hour, putting kids to bed
so the basic idea is that this is similar to a Zig tagged union, except that you can store all the tags, and then separately store all the unions - like what we do with lhs and rhs right now, except that you get to specify that lhs and rhs are unions
code example: https://github.com/roc-lang/roc/blob/main/src/layout/store.zig#L209-L220
return switch (layout.tag) {
    .scalar => switch (layout.data.scalar.tag) {
        .int => layout.data.scalar.data.int.size(),
        .frac => layout.data.scalar.data.frac.size(),
        .bool => 1, // bool is 1 byte
        .str, .opaque_ptr => target_usize.size(), // str and opaque_ptr are pointer-sized
    },
    .box, .box_of_zst => target_usize.size(), // a Box is just a pointer to refcounted memory
    .list, .list_of_zst => target_usize.size(), // TODO: get this from RocStr.zig and RocList.zig
    .record => self.record_data.get(@enumFromInt(layout.data.record.idx.int_idx)).size,
    .tuple => self.tuple_data.get(@enumFromInt(layout.data.tuple.idx.int_idx)).size,
};
the relevant technique there is
.int => layout.data.scalar.data.int.size()
so we know that since the tag was int we can use the data.int union variant
and what's cool about this is that in debug builds, Zig actually secretly tracks at runtime which union variant you instantiated
so if in this code I wrote data.frac instead of data.int there, I'd get a runtime panic in debug builds
saying that I'd put an int in that union originally, but now I'm trying to read it as a frac
of course in release builds it doesn't do this
this is the "untagged unions" feature
https://ziglang.org/documentation/master/#toc-Anonymous-Union-Literals
so Data could use union for its lhs and rhs in this way
and then we'd get that extra runtime safety, plus the code could be more self-documenting in various places
So related to the original problem above.. we'd parse a block/record shaped thing with statement shaped things in them. Then in Can if we have valid statements followed by a final expression we have a block, otherwise it's a record?
isn't the point of lhs and rhs that we have a low-byte fixed layout for all nodes, and any extra data is referenced in other nodes via indexes stored in extra data?
maybe i should read more about this but it seems like this could lead to much fatter nodes where the data list has items the size of the largest union variant
yeah so a union in Zig (without an enum in there) - e.g. const Foo = union { ... } is just saying "Foo could be any one of these types at runtime, and I'm not storing any metadata about which it would be"
so yes it's taking up the space of whatever its biggest variant is, but none of its variants would be bigger than u32 anyway
another way to say it is that union is just a way to be more formal than u32 about what different types that u32 could be referring to
but it doesn't change the runtime representation in any way - at least not in a release build
but in a debug build Zig keeps extra info (I guess in a side table somewhere or something?) so you can also get at least a runtime type mismatch if you think you've got one type in there, but actually that's not the type that was put in there in practice when you set the value of lhs or rhs
i feel like i must be misunderstanding. you are saying data is a union, but nothing in the union would be larger than a u32, but also that it has several nested structs in it?
.int => layout.data.scalar.data.int.size()
what's the union here?
oh no, no nested structs
let me give a concrete example, 1 sec
ok so this code is currently:
.module => |mod| {
    node.tag = .module_header;
    node.data.lhs = @intFromEnum(mod.exposes);
    node.region = mod.region;
},
.hosted => |hosted| {
    node.tag = .hosted_header;
    node.data.lhs = @intFromEnum(hosted.exposes);
    node.region = hosted.region;
},
.package => |package| {
    node.tag = .package_header;
    node.data.lhs = @intFromEnum(package.exposes);
    node.data.rhs = @intFromEnum(package.packages);
    node.region = package.region;
},
...but it could be:
.module => |mod| {
    node.tag = .module_header;
    node.data = .{ .mod = .{ .exposes = mod.exposes } };
    node.region = mod.region;
},
.hosted => |hosted| {
    node.tag = .hosted_header;
    node.data = .{ .hosted = .{ .exposes = hosted.exposes } };
    node.region = hosted.region;
},
.package => |package| {
    node.tag = .package_header;
    node.data = .{
        .package = .{
            .packages = package.packages,
            .exposes = package.exposes,
        }
    };
    node.region = package.region;
},
...and then Data would be something like:
const Data = union {
    mod: struct {
        exposes: Collection.Idx,
    },
    hosted: struct {
        exposes: Collection.Idx,
    },
    package: struct {
        exposes: Collection.Idx,
        packages: Collection.Idx,
    },
};
memory would be exactly the same ones and zeros as today (at least in release builds)
but now we've documented what the different possibilities are for what could be in Data, and the Zig compiler can use that so if I set node.data = .{ .hosted = ... }; in a particular node, and then later access node.data.package instead of node.data.hosted in that node, I get a runtime exception because I put a .hosted in that node, not a .package
(in debug builds only)
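A rough way to see the "same ones and zeros" point without a Zig toolchain is Python's ctypes, used here only as a stand-in. The class names mirror the Data example above, LhsRhs is a hypothetical name for today's pair of u32 fields, and Collection.Idx is assumed to be a 32-bit index. Unlike a Zig debug build, ctypes does not track which variant was written, so the mismatch check described above is not reproduced here.

```python
import ctypes


class Mod(ctypes.Structure):
    _fields_ = [("exposes", ctypes.c_uint32)]


class Hosted(ctypes.Structure):
    _fields_ = [("exposes", ctypes.c_uint32)]


class Package(ctypes.Structure):
    _fields_ = [("exposes", ctypes.c_uint32), ("packages", ctypes.c_uint32)]


class Data(ctypes.Union):
    # All variants share the same storage; the union is as big as the largest one.
    _fields_ = [("mod", Mod), ("hosted", Hosted), ("package", Package)]


class LhsRhs(ctypes.Structure):
    # Stand-in for the current untyped representation: two raw u32 slots.
    _fields_ = [("lhs", ctypes.c_uint32), ("rhs", ctypes.c_uint32)]


data = Data()
data.package.exposes = 7
data.package.packages = 9

# Reinterpret the union's bytes as the plain lhs/rhs pair.
raw = LhsRhs.from_buffer_copy(bytes(data))

# ctypes happily reads the "wrong" variant; a Zig debug build would panic here.
wrong_variant_read = data.hosted.exposes
```

Here the union occupies the same 8 bytes as the lhs/rhs pair, and the package variant's exposes and packages land in the lhs and rhs slots respectively.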
I see
That's cool, but I guess I'm at a loss for how this helps resolve the issue that's the root of this particular topic
oh it's separate, sorry
Were you trying to say earlier that we should share the same nodes for Exprs, Patterns, AND Type Annotations? Because even that doesn't help
Unless we go SUPER abstract with the syntax tree to the point of barely doing more than tokenization
Which is basically an entire re-rewrite of the parser at that point
So taking out backtracking for the moment the other options (besides :: which I still didn't feel there was a compelling argument against):
- [...] :: if that is out)
- [...] let as found in many ML languages)
hm, so I just thought of a potentially easier fix:
foo = |x| {
    something : a
}
let's suppose that when we start parsing, we assume we're building up a record
as soon as we hit a , after the expr, we know that's confirmed and we're all set. so for example this comma after a:
foo = |x| {
    something : a,
    other : b
}
conversely, if we later hit something that tells us we're not a record, such as the above without a comma...
foo = |x| {
    something : a
    other : b
}
(which is unambiguously two consecutive type annotations)
then we know we've actually been parsing a block
but an important observation here is that at the point where we make this realization, it is for sure the case that we have parsed exactly:
- exactly 1 record field, namely something - which we have interned
- the expr that goes after it
But what if you just get a curly after?
then it's definitely a record
(for the reasons discussed earlier)
So I'll get a LSP error about an undeclared variable when the code is in this state?
no, because we assume it's a record
until proven otherwise
couldn't it be another record type annotation?
Since there is a type variable (above the function is a top-level annotation, which is unambiguous, that introduces it). But here a is not a defined variable
@Luke Boswell not in that position
That could only be an expr
And there are two Exprs that start with OpenCurly: Record and Block
Richard Feldman said:
but an important observation here is that at the point where we make this realization, it is for sure the case that we have parsed exactly:
- exactly 1 record field, namely something - which we have interned
- the expr that goes after it
to finish this thought:
at this point, if we want to change our mind about what we've been building up, we don't have to backtrack and redo work; we can just reach back and swap the node type in constant time
just say "instead of a record with exactly 1 field, whose name we have already interned, this is now a type annotation where the pattern ident is the record field name we interned, and the type is the thing we thought was an expr after a record field"
but that relies on being able to have one node type for types and exprs
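A minimal sketch of the "swap the node type in constant time" idea, in Python rather than the compiler's Zig. NodeStore, Tag, retag, and parse_first_entry are hypothetical names, the interned ids are made up, and the real node layout may differ. The sketch assumes, as the idea requires, that record fields and type annotations share one node shape, so changing the tag is the only work needed.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Tag(Enum):
    IDENT = auto()
    RECORD_FIELD = auto()
    TYPE_ANNO = auto()
    RECORD = auto()
    BLOCK = auto()


@dataclass
class Node:
    tag: Tag
    lhs: int = 0  # index of another node, or an interned-name id
    rhs: int = 0


class NodeStore:
    def __init__(self):
        self.nodes = []

    def add(self, tag, lhs=0, rhs=0):
        self.nodes.append(Node(tag, lhs, rhs))
        return len(self.nodes) - 1

    def retag(self, idx, tag):
        # Constant time: only the tag changes; lhs/rhs and children stay put.
        self.nodes[idx].tag = tag


def parse_first_entry(next_token):
    """Simulate having parsed `{ something : a` and then seeing next_token."""
    store = NodeStore()
    value = store.add(Tag.IDENT, lhs=2)                    # `a` (interned id 2)
    field = store.add(Tag.RECORD_FIELD, lhs=1, rhs=value)  # `something` (id 1)
    container = store.add(Tag.RECORD, lhs=field)           # assume record at first

    # A comma or closing curly keeps the record reading. Anything else means
    # we were in a block all along, so flip the tags instead of re-parsing.
    if next_token not in (",", "}"):
        store.retag(field, Tag.TYPE_ANNO)
        store.retag(container, Tag.BLOCK)
    return store, field, container
```

No node is re-parsed or re-allocated on the block path: the interned name and the already-parsed right-hand side are reused as the annotation's ident and type.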
The big thing here is that if we have (the actual motivating snapshot):
identity : a -> a
identity = |x| {
    thing : a # refers to the type var introduced in function type annotation
    thing = x # refers to the value from the function parameter
    thing
}
What is thing : a stored in the NodeStore as?
Richard Feldman said:
but that relies on being able to have one node type for types and exprs
Yeah, and therein lies the problem
We've now collapsed three node types into one node type
at the parsing stage, but they all have the same structure right?
Which I guess for Nodes maybe isn't the biggest deal
yeah
there are cases where it would be really confusing but I don't actually think this is one of them
like ok [A, B, C] could be a pattern, or a tag union type, or a list expression
I have to constantly remind myself that Nodes are just a stored representation, whereas the Typed values are what's important to downstream consumers
like the canonicalization logic would be almost the same
it's still "lhs is pattern, rhs is expression"
just the arguments to the function you pass lhs and rhs have different types
The can logic wouldn't have to change at all
but they still have the same conditionals and the same branches
yeah exactly
Except catch things that don't make sense
aside from whatever logic we do or don't decide to move to canonicalization
right!
Ok, so luckily type annotations come pretty far down the tutorial - and at that point we'll just have to let people know (because of auto-bracketing editors) that a type annotation by itself in a block will be treated as a record
In case they see weird errors
That would have to be exceedingly rare
Like with the motivating example, if at some point of entering it they end up in this state:
identity : a -> a
identity = |x| {
    thing : a # refers to the type var introduced in function type annotation
}
The LSP will report that a is undefined and that the return type of the function doesn't match the annotation of it
Because when they typed { their editor gave them the final }
You could special case this one scenario
It'll be a transitory error, but still confusing
hm, why wouldn't it see that as a record? :thinking:
It would -- that's why it would be confusing
It does see it as a record
oh I see
But a is not defined in the scope
yeah
I gotcha
This kind of thing is why with arrow function in JS you have to wrap returning an object as a bare expression in ()s
yeah I'm not worried about that being a problem in practice haha
especially with AI autocomplete probably suggesting a thing = right below
you'd probably tab-complete before the LSP even had a chance to complain haha
I think it'll be surprising still to many
Because I know you work for Zed, but there are those of us out there not using AI - even for completions
But yeah, it's not likely a big issue. But you seem really concerned about confusing new users
fair, but also inline type annotations are super rare in practice
Which is laudable
Yeah, to the point of my wondering if they are even necessary outside of the top-level
they are occasionally
sometimes it's really nice for clarifying what something is, because it's nonobvious from other context
I love using them in blocks
Well, never necessary technically, but convenient
but come to think of it, in those scenarios I almost always find myself adding them after the fact
rather than up front
which also doesn't run into that LSP scenario
It's documentation
Yeah, it's a special kind of user who knows exactly what type something is, feels the need to have the annotation, and then writes it before even writing the declaration for it
Anthony Bullard said:
Like with the motivating example, if at some point of entering it they end up in this state:
identity : a -> a
identity = |x| {
    thing : a # refers to the type var introduced in function type annotation
}
The LSP will report that a is undefined and that the return type of the function doesn't match the annotation of it
By special case, I mean if we are not expecting a record return type, but we have a record with exactly one field then give a different warning that also suggests this might be a block expression or something.
And also likes a language with total type inference
unfortunately I'd be that guy
Luke Boswell said:
Anthony Bullard said:
Like with the motivating example, if at some point of entering it they end up in this state:
identity : a -> a
identity = |x| {
    thing : a # refers to the type var introduced in function type annotation
}
The LSP will report that a is undefined and that the return type of the function doesn't match the annotation of it
By special case, I mean if we are not expecting a record return type, but we have a record with exactly one field then give a different warning that also suggests this might be a block expression or something.
That could be something for error reporting for sure if Can can give us that info
Luke Boswell said:
unfortunately I'd be that guy
I thought Claude wrote all of your code :stuck_out_tongue_wink:
So it is resolved: We are not moving to :: for type annotations (sorry @Luke Boswell who I sold it to, and actually came to really like it).
My action plan going forward: align the node structure for Expr, TypeAnno, and Pattern to be the same, then adopt this strategy: if I see a statement start with LowerIdent, UpperIdent, OpenRound, OpenSquare, or OpenCurly, just start parsing it, and once I know what I'm working with, convert the node id to the appropriate typed id
To be fair, @Anthony Bullard you could sell ice to an eskimo
Hahahahaaha. Try telling that to my Director
But maybe I don't know how to sell AI solutions yet since I still hate it :stuck_out_tongue:
The selling, the AI, or the solutions?
At least the first two
Nils Hjelte has marked this topic as resolved.
Last updated: Jun 16 2026 at 16:19 UTC