I've been working on efficiently calculating attributes for idents, including legal attributes like ! suffix, _ suffix, etc. as well as problems like double underscores. I think this information is useful throughout the rest of the compiler, so it feels appropriate to store these in a byte as bitflags and pass them around the parsing code that way. It would require slightly more space for every string we keep in the AST, but it would ensure we don't have to repeat parsing work for every single ident in the compiler.
Does anyone think this is a bad idea? I'm gonna try this out post implementing string interpolation, but will avoid that work if people think it's better to duplicate work rather than store more info in the AST
The reason why this is tempting to me is because I'm trying to report all problems with an ident in a single message for every string we see in Roc
And it seems like we calculate a lot of these attributes later during canonicalization, but we can actually just do it while chomping characters. That way, we don't have to worry about handling non-ASCII or whatever when calculating this in roc_can
because we know exactly what we parsed earlier.
This aligns better with "always inform, never block"
Currently, we give a runtime error if a function has two underscores. I think we should just parse it, give a warning, and run anyway
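To sketch the one-pass idea (this is just a minimal illustration; the bit name is a placeholder, not a real flag in the compiler):
```rust
// Sketch: accumulate problem bits while chomping bytes, so nothing
// downstream has to re-scan the ident. Bit name is a placeholder.
const PROBLEM_DOUBLE_UNDERSCORE: u8 = 1 << 0;

fn scan_ident(bytes: &[u8]) -> u8 {
    let mut problems = 0u8;
    let mut prev_was_underscore = false;
    for &byte in bytes {
        if byte == b'_' {
            if prev_was_underscore {
                problems |= PROBLEM_DOUBLE_UNDERSCORE;
            }
            prev_was_underscore = true;
        } else {
            prev_was_underscore = false;
        }
    }
    problems
}
```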
i don't have a preference for the specific mechanism but I think these should definitely be consolidated behind an API and the specific API implementation should be opaque to the caller. Ideally these are methods like `is_effectful` etc. on the ident or symbol itself. If that's not possible, they should be methods on whatever object manages the symbol.
That's what I was thinking.
After simultaneously parsing an ident and its attributes, we'd return a
```rust
struct Ident<'a> {
    text: &'a str,
    attributes: u8,
    problems: u8,
}
```
And you'd access those values via methods
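Roughly like this, as a minimal sketch (flag names and bit positions are placeholders, not final):
```rust
// Sketch only; flag names and bit positions are placeholders.
const EFFECTFUL: u8 = 1 << 0; // `!` suffix
const IGNORED: u8 = 1 << 1; // leading `_`

impl<'a> Ident<'a> {
    pub fn is_effectful(&self) -> bool {
        self.attributes & EFFECTFUL != 0
    }

    pub fn is_ignored(&self) -> bool {
        self.attributes & IGNORED != 0
    }
}
```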
I've already written these methods for a post-parsing IdentAttributes
But I think they belong at parsing and later instead
seems fine. as long as the fields are always private and immutable
Yep, sounds good
what attributes specifically did you have in mind?
(out of curiosity)
These are the current valid flags I'm parsing:
And these are the current ident errors (though some may go away without needing to verify the ident parsed):
`_!` at the end instead of `!_`?
My plan for these errors is to combine them into a single warning per ident with a list of errors, to maximize info without clustering multiple warnings on the same ident
For the flags, not all of them are needed the whole time. The uppercase thing isn't needed once we figure out if a thing is a tag/alias or a var/function/value/typevar
Underscore at the start gets the ident ignored, but we probably still want to track the ident in case someone tries to run code that looks like
```
_val = 123
do_something_else!()?
_val = 456
```
The current technique that Agus is employing is to use the first bit of the 32-bit ident ID as a bitflag for whether the var is `!`-suffixed
That gives us about 2 billion idents available instead of 4 billion per module
Using 4 bits brings us down to a quarter of a billion
We may not even need 4 bits, but if we use 4 bits that doesn't feel like much of a cost
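For illustration, the packing could look something like this (the exact bit layout here is hypothetical):
```rust
// Hypothetical layout: 4 attribute bits at the top of a 32-bit ident ID,
// leaving 28 bits (~268 million values) for the index itself.
const ATTR_BITS: u32 = 4;
const INDEX_BITS: u32 = 32 - ATTR_BITS;
const INDEX_MASK: u32 = (1 << INDEX_BITS) - 1;

fn pack_ident_id(index: u32, attrs: u8) -> u32 {
    debug_assert!(index <= INDEX_MASK);
    ((attrs as u32) << INDEX_BITS) | index
}

fn ident_attrs(id: u32) -> u8 {
    (id >> INDEX_BITS) as u8
}

fn ident_index(id: u32) -> u32 {
    id & INDEX_MASK
}
```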
This change also means that we probably can get away with not storing string values during parsing. We can just store a 64-bit rolling hash for each string and check for equality based on that value.
If you need the original string, you use the region of the string to get it back
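As a sketch of the hashing side (FNV-1a here is just a stand-in for whatever 64-bit incremental hash we'd actually pick; the constants are the standard FNV-1a ones):
```rust
// Minimal sketch: a 64-bit FNV-1a hash updated byte-by-byte as we chomp,
// so the hash is ready the moment the string ends, with no second pass.
const FNV_OFFSET: u64 = 0xcbf29ce484222325;
const FNV_PRIME: u64 = 0x100000001b3;

#[derive(Clone, Copy)]
struct RollingHash(u64);

impl RollingHash {
    fn new() -> Self {
        RollingHash(FNV_OFFSET)
    }

    fn chomp(&mut self, byte: u8) {
        self.0 ^= byte as u64;
        self.0 = self.0.wrapping_mul(FNV_PRIME);
    }
}
```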
the string is just a pointer right? so there should be no size difference
If it's always a pointer, then yes
But string equality checking in can right now is O(n)
yeah but the strings are small and that only matters before you turn the string into a symbol. it should be benchmarked but i would bet a lot of money that is very low on the flamegraph
Yeah, fair
My thought is that we use a lot of `VecMap`s
And that's because we want to avoid the cost of hashing
for small collections yes
debuggability is probably more important than perf in the current implementation though
But if strings are always hashed already, then you can use them as keys in a proper hashmap
Okay, for now I'll keep strings as strings
you still have to deal with collisions right? or write your own string hasher
You'd make your object a `Vec<(string hash, Vec<item>)>`
Writing your own string hasher would just mean a rolling hash
you would have to do something to ensure there are no hash conflicts right?
at which point it's an interner b/c 1-to-1 mapping
Yeah, true
This is where we walk the entire string interner every time we look for a symbol that matches what we have in scope
It'd be nice for the interner to get closer to O(1) than O(n) by hashing the strings for insert
The way to avoid that is to have your string interner's keys be the hashes of the strings, and the values be `Vec<(&'a str, other, data)>`
I'll be improving this performance by maintaining a separate data structure in the can rewrite for tracking ident IDs in scope
But I'm very conscious of the time we put into interning strings
As you point out, this doesn't matter right now
But O(n) searches of all idents interned so far every time we want to deduplicate ident IDs by string search seems pretty sus
the way i would do it is to keep a hash map instead. then look up by hash and check for any collisions. if there are no collisions insert
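i.e. something along these lines (a sketch; all names invented):
```rust
use std::collections::HashMap;

// Sketch: buckets are keyed by the precomputed hash; a string compare
// only happens within the (almost always tiny) bucket for that hash.
struct Interner<'a> {
    buckets: HashMap<u64, Vec<(&'a str, u32)>>,
    next_id: u32,
}

impl<'a> Interner<'a> {
    fn intern(&mut self, hash: u64, text: &'a str) -> u32 {
        let bucket = self.buckets.entry(hash).or_default();
        if let Some(&(_, id)) = bucket.iter().find(|&&(s, _)| s == text) {
            return id; // same hash *and* same string: already interned
        }
        let id = self.next_id;
        self.next_id += 1;
        bucket.push((text, id));
        id
    }
}
```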
i wouldn't get too nerdsniped by this though. the really expensive parts are typechecking and monomorphization. everything else is kind of a drop in the bucket comparatively
yeah and +1 on "correctness first"
Ok. An aside: is just mono that expensive?
like architecturally we want to be set up for caching
but on the level of an individual data structure, we can optimize those later as desired
yes
the reason c++/rust style languages have slow compilers is because of specialization
specialization is the #1 cost in any compiler
everything else doesn't matter (comparatively). specialization forces a potentially-exponential growth in program size, and hence compute during compilation
Wow, good to know
That makes me wonder if long-term, Roc would have a faster dev backend startup time if we did dynamic dispatch for everything
That would be very long-term, though
I'm not sure about that...seems like Zig and JAI are way faster than Rust or C++ even though they both specialize, and they specialize via compile-time interpreting of source code, which is likely to be slower than parametric polymorphism type inference
"specialization is costly" is definitely true, but "specializing compilers are all slow" is empirically false thanks to those counterexamples, so there is obviously more to the story
i'm talking about relative speeds, not in absolute terms. zig is faster than c++ and rust but that doesn't mean that zig without specialization would be just as fast as zig with specialization. it would certainly be faster
oh yeah for sure!
c++ and rust have a lot of other stuff that doesn't help. auto derefs are a good example
yep haha
but what i mean is, in terms of what is the bottleneck for any given compiler, specialization is definitely at the top
totally
specialization is the only thing that cannot be O(N) where N is program size
well, after linking and code gen in the case of LLVM - although specialization does create more work for both of those steps
Can typechecking be O(N)-ish with our type system?
It feels very graph traversal-y at the moment
yes typechecking is o(n)
in practice
Ah
just don't write a giant pyramid of nested defs please :stuck_out_tongue:
I can see in practice
even just for generating the specializations, it's bigger than o(n)
because you emit equal or more code than is fed in
so just producing the specializations is expensive
I get that specialization gen is bigger than O(n), but just making concretely-typed copies seems cheap even if you do a lot of it
and it's compounded by the fact that you have to duplicate data, etc (and there is little getting around that)
But the part that would suck is everything after it working on more than the original N functions/values
yeah the reason is that specialization is more than making trivial copies
you are basically chopping up graphs of the program and rewriting them
I figured
Oh, even just the naive graph expansion is bottleneck number one for relative time spent by compiler passes??
there are things that you can do to make the constant cost better, e.g. avoid over-allocating, have a dense union-find representation of types, compile to a good representation (e.g. passing large objects by pointer instead of by value everywhere). but the non-constant cost can't change
maybe a better way to put it is anything that is rewriting the program to a non-constant equivalent is going to be an expensive part. optimizations are another good example of this
another kind of interesting thing about specialization is that it is inherently at odds with separate compilation
this is why you don't really have separate compilation in Rust, and in C++ you have to include generic functions in header files (so you have separate compilation until you need a generic function)
which adds to the feeling of compilation time overall
there are ways to get around this though. for example boxing values at compilation unit boundaries so there is a uniform representation. This is what Swift does to support separate compilation/dynamic linking with generics IIRC
yeah that route is tricky for us because of host ABIs unfortunately
like if we give a `List Str` to the host, it needs to look the same way in memory with or without optimizations, and we definitely want specialization in optimized builds even if we wanted dev builds to do something else
i honestly think the long term answer for dev builds if you care about compiler perf (for any specializing compiler) is an interpreter
in roc's case the interpreter would have some c ffi and the rest is just interpreted, no specialization
you also don't need to worry about it for the host ABI because the host ABI has to be concretized - there is no polymorphism over it
it only matters within same-language (Roc<>Roc in this case) calls
so like convert at runtime right at the boundary?
> long term answer for dev builds if you care about compiler perf (for any specializing compiler) is an interpreter
Yeah, this is probably correct. Probably a bytecode interpreter like python. (Though this does leave weird holes where dev and optimized differ even more).
Like host calls into roc (which just launches an interpreter over the roc bytecode). When calling back into the host, just use libffi to deal with all the types and calling the host pointers.
This should be very doable with lots of gains.
Likely we'd still want tail call optimization at a minimum, but most things could remain boxed and such and still run fast enough.
The problem with dev backend compilation time, long term, isn't the dev backend. It is all the parts above the dev backend that deal with type checking and specialization.
Those are the slow parts that benefit significantly from a boxed representation and such.
Also, I'm not sure which would be faster, dev assembly or a bytecode interpreter. Probably still the dev assembly, but less clear especially if you do some smart bytecode assembler tricks (not to mention some lambda chaining tricks that get pretty optimized code without writing any assembly).
Obviously, we could box everything and do assembly gen fast just change the assembly to understand boxed types.
> so like convert at runtime right at the boundary?
yes
yeah to Brendan's point, interpreter vs fast machine code generator is orthogonal to specialization
the real benefit would be if we could have the dev backend do something other than specialization at the cost of (much) more boxing at runtime
and yet at the boundary convert things over to the representation the host needs
which in some cases might be very expensive, unfortunately :sweat_smile:
well the great thing is you don't need the boundary conversion at the host, that's what im saying
What about something like `List Str`? In the boxed non-specialized world, wouldn't it be `List GenericRocType`?
hm, but unless I'm missing something, `List Str` would be different in memory depending on whether it's been monomorphized or not
i don't think it's that expensive. it's only a pointer dereference and indirect jumps for abilities
right, there would be some runtime metadata there I'd think
at the host there's no polymorphism
at the host boundary*
but you can give a `List Str` to a host
there are no polymorphic variables
they're all concrete
sure, but you don't need to box it because you know that the function will always have the same concrete inputs and outputs
but `List Str` and `List U8` have different memory layouts if they've been specialized
right, but no host-exposed function has more than one specialization
Other languages are more free to do this because the language runtime and the "platform" are not separated and are cohesive
but let's say I have a call to `List.map` which returns a `List Str` which I hand off to the platform
how can `List.map` return a `List Str` sometimes and a `List U8` other times I call it, without either specialization or some runtime metadata on `List`?
okay, i see what you're getting at now
I'm not sure that is supported though
> but you don't need to box it because you know that the function will always have the same concrete inputs and outputs
Wait, your suggestion is more meta than I realized. You don't want to box everything, you just want to box anything that has more than one specialization. That said, you want to box only over the possible specializations. So for a function that in practice has two specializations `List Str -> ...` or `List U8 -> ...`, you would essentially convert it into a `[ListStr (List Str), ListU8 (List U8)] -> ...`
So very different from the standard interpreter that boxes everything and does so in a recursive manner.
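In Rust terms, the shape would be roughly this (a sketch; the layouts are invented stand-ins):
```rust
// Rough analogy of `[ListStr (List Str), ListU8 (List U8)] -> ...`:
// the callee takes a tag union over exactly the specializations that
// exist, rather than a universally boxed value.
enum ListStrOrListU8 {
    ListStr(Vec<String>), // stands in for Roc's `List Str` layout
    ListU8(Vec<u8>),      // stands in for Roc's `List U8` layout
}

fn callee(arg: ListStrOrListU8) -> usize {
    // one dispatch at the call boundary, then each arm is fully concrete
    match arg {
        ListStrOrListU8::ListStr(strs) => strs.len(),
        ListStrOrListU8::ListU8(bytes) => bytes.len(),
    }
}
```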
i don't think you can pass a function to the host that is polymorphic for exactly that reason - the host has to have a concrete (non-generic) interface so i'm not sure how you would pass such a function through?
like I can't say on the host side "i expose an effect `f: Str -> (a -> a) -> Str`"
you have to say `f: Str -> (Str -> Str) -> Str`
at which point you know exactly what specialization you need and you don't need anything more
> Wait, your suggestion is more meta than I realized. You don't want to box everything, you just want to box anything that has more than one specialization. That said, you want to box only over the possible specializations. So for a function that in practice has two specializations `List Str -> ...` or `List U8 -> ...`, you would essentially convert it into a `[ListStr (List Str), ListU8 (List U8)] -> ...`
Yes, that could work too. But i'm not quite following how the host can take any function that is generic. I don't think it's possible but maybe i'm missing something
I'm not worried about directly interacting with the host. I am worried about indirectly interacting with the host.
what is the difference?
I thought part of the goal of this work was to avoid specialization. As such, even though the host has concrete types, it has to be able to call functions that have multiple specializations and thus take generic boxed types. So there has to be a way to convert concrete types to/from generic types.
Aside, I think an interpreter could theoretically avoid boxing and lazily specialize when a function with a specific specialization is called.
My argument is that the host can never interact with any Roc function that is polymorphic, hence you don't need to worry about generics over the host boundary. Because how could that work if it did? You would need to re-compile the host every time the app uses another specialization
yes, i think you're right
sorry for devolving your thread Sam lmao
This was educational
I got what I wanted, which was approval for me doing code cleanup
i'm still confused on if i'm missing something about the app<>host boundary. am i? or is it correct that the host actually has no way to call a polymorphic function
Host could call something with a Box model
But that is kinda different
but that has to be equivalent to a single function, it can't dispatch to one of N functions right
yeah
Yeah
but the host can't actually call `List.map` polymorphically
there's no way to do that and be able to compile the host separately i think
without boxed polymorphic types obviously
Yeah
okay, so then this works just fine at host boundaries. you don't need any boxing at all.
I just don't understand how, without specializing `List.map`, we go from the concrete list passed in from the host to calling a polymorphic function.
Like if main is simply:
```
main : List Str -> List U64
main =
    if #some arbitrary conditional then
        List.map Str.toU64
    else
        List.map Str.toI64
    |> List.map Num.abs
```
sorry i think i wasn't clear
`List.map` would be specialized
I was talking about 3p functions at compilation boundaries (let's say libraries for now)
but the case is similar. in this example you would simply unwrap the boxed type at the end and return that.
yeah the problem with that is what if you have like `List (List Str)` or something
why is that a problem?
now you gotta traverse the whole thing and unbox all the inner lists
i don't think that's necessarily true. you only need to do the wrap/unwrap if you definitely know you are calling a polymorphic-but-unspecialized function. which in practice is going to be a small amount of the functions in your program (since it's only at 3p library API boundaries)
Also, it doesn't need to be a nested unwrap. If you pass `List (List Str)` to a function like that, you only need to wrap the whole structure, not the nested ones
this is succinct explanation of what i was saying: https://lobste.rs/s/ei5bp4/apple_is_killing_swift#c_zwjc1i. very coincidental