Stream: compiler development

Topic: Calculating name attributes during parsing


view this post on Zulip Sam Mohr (Jan 10 2025 at 16:08):

I've been working on efficiently calculating attributes for idents, including legal attributes like ! suffix, _ suffix, etc. as well as problems like double underscores. I think this information is useful throughout the rest of the compiler, so it feels appropriate to store these in a byte as bitflags and pass them around the parsing code that way. It would require slightly more space for every string we keep in the AST, but it would ensure we don't have to repeat parsing work for every single ident in the compiler.

Does anyone think this is a bad idea? I'm gonna try this out post implementing string interpolation, but will avoid that work if people think it's better to duplicate work rather than store more info in the AST

view this post on Zulip Sam Mohr (Jan 10 2025 at 16:10):

The reason why this is tempting to me is because I'm trying to report all problems with an ident in a single message for every string we see in Roc

view this post on Zulip Sam Mohr (Jan 10 2025 at 16:12):

And it seems like we calculate a lot of these attributes later during canonicalization, but we can actually just do it while chomping characters. That way, we don't have to worry about handling non-ASCII or whatever when calculating in roc_can, because we know exactly what we parsed earlier.

view this post on Zulip Sam Mohr (Jan 10 2025 at 16:13):

This aligns better with "always inform, never block"

view this post on Zulip Sam Mohr (Jan 10 2025 at 16:13):

Currently, we give a runtime error if a function has two underscores. I think we should just parse it, give a warning, and run anyway

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 16:22):

i don't have a preference for the specific mechanism but I think these should definitely be consolidated behind an API and the specific API implementation should be opaque to the caller. Ideally these are methods (is_effectful, etc.) on the ident or symbol itself. If that's not possible, they should be methods on whatever object manages the symbol.

view this post on Zulip Sam Mohr (Jan 10 2025 at 16:22):

That's what I was thinking.

view this post on Zulip Sam Mohr (Jan 10 2025 at 16:23):

After simultaneously parsing an ident and its attributes, we'd return a

struct Ident<'a> {
    text: &'a str,
    attributes: u8,
    problems: u8,
}

view this post on Zulip Sam Mohr (Jan 10 2025 at 16:23):

And you'd access those values via methods
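A minimal Rust sketch of what that could look like: bitflag constants packed into the `attributes` byte, exposed only through methods so the representation stays opaque to callers, as Ayaz suggested. The flag names and bit positions here are hypothetical, not Roc's actual layout.

```rust
// Hypothetical flag layout; illustrative only.
const SUFFIXED_BANG: u8 = 1 << 0; // trailing `!` (effectful)
const SUFFIXED_UNDERSCORE: u8 = 1 << 1; // trailing `_`
const IGNORED: u8 = 1 << 2; // leading `_`
const UPPERCASE: u8 = 1 << 3; // tag/alias-style ident

pub struct Ident<'a> {
    text: &'a str,
    attributes: u8,
    problems: u8,
}

impl<'a> Ident<'a> {
    // Callers never touch the raw bytes; they go through these methods.
    pub fn is_effectful(&self) -> bool {
        self.attributes & SUFFIXED_BANG != 0
    }

    pub fn is_ignored(&self) -> bool {
        self.attributes & IGNORED != 0
    }

    pub fn has_problems(&self) -> bool {
        self.problems != 0
    }
}

fn main() {
    let ident = Ident {
        text: "do_something!",
        attributes: SUFFIXED_BANG,
        problems: 0,
    };
    assert!(ident.is_effectful());
    assert!(!ident.is_ignored());
    assert!(!ident.has_problems());
    println!("{} is effectful: {}", ident.text, ident.is_effectful());
}
```

Since the fields stay private and immutable, the internal representation (one byte, two bytes, or bits borrowed from an ident ID) can change without touching callers.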

view this post on Zulip Sam Mohr (Jan 10 2025 at 16:23):

I've already written these methods for a post-parsing IdentAttributes

view this post on Zulip Sam Mohr (Jan 10 2025 at 16:24):

But I think they belong at parsing and later instead

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 16:24):

seems fine. as long as the fields are always private and immutable

view this post on Zulip Sam Mohr (Jan 10 2025 at 16:24):

Yep, sounds good

view this post on Zulip Richard Feldman (Jan 10 2025 at 18:31):

what attributes specifically did you have in mind?

view this post on Zulip Richard Feldman (Jan 10 2025 at 18:31):

(out of curiosity)

view this post on Zulip Sam Mohr (Jan 10 2025 at 19:15):

These are the current valid flags I'm parsing:

And these are the current ident errors (though some may go away without needing to verify the ident parsed):

My plan for these errors is to combine them into a single warning per ident with a list of errors to maximize info without clustering multiple warnings on the same ident

view this post on Zulip Sam Mohr (Jan 10 2025 at 19:15):

For the flags, not all of them are needed the whole time. The uppercase thing isn't needed once we figure out if a thing is a tag/alias or a var/function/value/typevar

view this post on Zulip Sam Mohr (Jan 10 2025 at 19:18):

Underscore at the start gets the ident ignored, but we probably still want to track the ident in case someone tries to run code that looks like

_val = 123
do_something_else!()?
_val = 456

view this post on Zulip Sam Mohr (Jan 10 2025 at 19:18):

The current technique Agus is employing uses the first bit of the 32-bit ident ID as a bitflag for whether the var is !-suffixed

view this post on Zulip Sam Mohr (Jan 10 2025 at 19:19):

That gives us about 2 billion idents available instead of 4 billion per module

view this post on Zulip Sam Mohr (Jan 10 2025 at 19:19):

Using 4 bits brings us down to a quarter of a billion

view this post on Zulip Sam Mohr (Jan 10 2025 at 19:20):

We may not even need 4 bits, but if we use 4 bits that doesn't feel like much of a cost

view this post on Zulip Sam Mohr (Jan 10 2025 at 19:22):

This change also means that we probably can get away with not storing string values during parsing. We can just store a 64-bit rolling hash for each string and check for equality based on that value.

view this post on Zulip Sam Mohr (Jan 10 2025 at 19:23):

If you need the original string, you use the region of the string to get it back
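A sketch of the rolling-hash idea: a simple polynomial hash over the bytes, so two idents can be compared by `u64` equality instead of an O(n) string walk. This is an illustrative hasher, not the one Roc uses; and as the collision discussion below notes, equal hashes alone don't prove equal strings.

```rust
// Minimal polynomial rolling hash over a string's bytes; illustrative only.
fn rolling_hash(s: &str) -> u64 {
    let mut h: u64 = 0;
    for &b in s.as_bytes() {
        // Classic base-31 polynomial hash with wrapping arithmetic.
        h = h.wrapping_mul(31).wrapping_add(b as u64);
    }
    h
}

fn main() {
    // Unequal hashes prove unequal strings, so most comparisons are O(1);
    // equal hashes would still need the region lookup to confirm equality.
    assert_eq!(rolling_hash("foo"), rolling_hash("foo"));
    assert_ne!(rolling_hash("foo"), rolling_hash("bar"));
    println!("hash of \"foo\" = {:#x}", rolling_hash("foo"));
}
```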

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 19:27):

the string is just a pointer right? so there should be no size difference

view this post on Zulip Sam Mohr (Jan 10 2025 at 19:27):

If it's always a pointer, then yes

view this post on Zulip Sam Mohr (Jan 10 2025 at 19:27):

But string equality checking in can right now is O(n)

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 19:28):

yeah but the strings are small and that only matters before you turn the string into a symbol. it should be benchmarked but i would bet a lot of money that it is very low on the flamegraph

view this post on Zulip Sam Mohr (Jan 10 2025 at 19:29):

Yeah, fair

view this post on Zulip Sam Mohr (Jan 10 2025 at 19:29):

My thought is that we use a lot of VecMaps

view this post on Zulip Sam Mohr (Jan 10 2025 at 19:29):

And that's because we want to avoid the cost of hashing

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 19:29):

for small collections yes

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 19:30):

debugability is probably more important than perf in the current implementation though

view this post on Zulip Sam Mohr (Jan 10 2025 at 19:30):

But if strings are always hashed already, then you can use them as keys in a proper hashmap

view this post on Zulip Sam Mohr (Jan 10 2025 at 19:30):

Okay, for now I'll keep strings as strings

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 19:30):

you still have to deal with collisions right? or write your own string hasher

view this post on Zulip Sam Mohr (Jan 10 2025 at 19:31):

You'd make your object a Vec<(string hash, Vec<item>)>

view this post on Zulip Sam Mohr (Jan 10 2025 at 19:31):

Write your own string hasher would just mean a rolling hash

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 19:32):

you would have to do something to ensure there are no hash conflicts right?

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 19:33):

at which point it's an interner b/c 1-to-1 mapping

view this post on Zulip Sam Mohr (Jan 10 2025 at 19:33):

Yeah, true

view this post on Zulip Sam Mohr (Jan 10 2025 at 19:33):

https://github.com/roc-lang/roc/blob/a69a326161a49c19186b8bf07b50c625d3b95686/crates/compiler/collections/src/small_string_interner.rs#L185

view this post on Zulip Sam Mohr (Jan 10 2025 at 19:34):

This is where we walk the entire string interner every time we look for a symbol that matches what we have in scope

view this post on Zulip Sam Mohr (Jan 10 2025 at 19:34):

It'd be nice for the interner to get closer to O(1) than O(n) by hashing the strings for insert

view this post on Zulip Sam Mohr (Jan 10 2025 at 19:35):

The way to avoid that is to have your string interner's keys be the hashes of the strings, and the values be Vec<(&'a str, other, data)>

view this post on Zulip Sam Mohr (Jan 10 2025 at 19:36):

I'll be improving this performance by maintaining a separate data structure in the can rewrite for tracking ident IDs in scope

view this post on Zulip Sam Mohr (Jan 10 2025 at 19:36):

But I'm very conscious of the time we put into interning strings

view this post on Zulip Sam Mohr (Jan 10 2025 at 19:37):

As you point out, this doesn't matter right now

view this post on Zulip Sam Mohr (Jan 10 2025 at 19:38):

But O(n) searches of all idents interned so far every time we want to deduplicate ident IDs by string search seems pretty sus

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 19:39):

the way i would do it is to keep a hash map instead. then look up by hash and check for any collisions. if there are no collisions insert
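A Rust sketch of that approach, under assumed names: buckets keyed by hash, with an O(bucket-length) string comparison on a hash hit to resolve collisions (buckets are almost always length 1), and an insert on a miss. This keeps the 1-to-1 string-to-ID mapping of an interner while making lookup close to O(1).

```rust
use std::collections::HashMap;

// Hypothetical interner: look up by hash, compare strings to rule out
// collisions, insert if absent. Names and types are illustrative.
struct Interner<'a> {
    // hash -> list of (string, ident id) sharing that hash
    buckets: HashMap<u64, Vec<(&'a str, u32)>>,
    next_id: u32,
}

impl<'a> Interner<'a> {
    fn new() -> Self {
        Interner { buckets: HashMap::new(), next_id: 0 }
    }

    fn intern(&mut self, s: &'a str, hash: u64) -> u32 {
        let bucket = self.buckets.entry(hash).or_default();
        // Collision check: only strings with the same hash are compared.
        if let Some(&(_, id)) = bucket.iter().find(|(text, _)| *text == s) {
            return id;
        }
        let id = self.next_id;
        self.next_id += 1;
        bucket.push((s, id));
        id
    }
}

// Simple illustrative hasher (same scheme as a rolling hash).
fn hash_str(s: &str) -> u64 {
    s.bytes().fold(0u64, |h, b| h.wrapping_mul(31).wrapping_add(b as u64))
}

fn main() {
    let mut interner = Interner::new();
    let a = interner.intern("foo", hash_str("foo"));
    let b = interner.intern("foo", hash_str("foo"));
    let c = interner.intern("bar", hash_str("bar"));
    assert_eq!(a, b); // same string -> same id
    assert_ne!(a, c); // different string -> different id
}
```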

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 19:40):

i wouldn't get too nerdsniped by this though. the really expensive parts are typechecking and monomorphization. everything else is kind of a drop in the bucket comparatively

view this post on Zulip Richard Feldman (Jan 10 2025 at 19:42):

yeah and +1 on "correctness first"

view this post on Zulip Sam Mohr (Jan 10 2025 at 19:42):

Ok. An aside: is mono alone really that expensive?

view this post on Zulip Richard Feldman (Jan 10 2025 at 19:42):

like architecturally we want to be set up for caching

view this post on Zulip Richard Feldman (Jan 10 2025 at 19:42):

but on the level of an individual data structure, we can optimize those later as desired

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 20:45):

yes

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 20:46):

the reason c++/rust style languages have slow compilers is because of specialization

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 20:46):

specialization is the #1 cost in any compiler

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 20:47):

everything else doesn't matter (comparatively). specialization forces a potentially-exponential growth in program size, and hence compute during compilation

view this post on Zulip Sam Mohr (Jan 10 2025 at 20:51):

Wow, good to know

view this post on Zulip Sam Mohr (Jan 10 2025 at 20:53):

That makes me wonder if long-term, Roc would have a faster dev backend startup time if we did dynamic dispatch for everything

view this post on Zulip Sam Mohr (Jan 10 2025 at 20:53):

That would be very long-term, though

view this post on Zulip Richard Feldman (Jan 10 2025 at 20:53):

I'm not sure about that...seems like Zig and JAI are way faster than Rust or C++ even though they both specialize, and they specialize via compile-time interpreting of source code, which is likely to be slower than parametric polymorphism type inference

view this post on Zulip Richard Feldman (Jan 10 2025 at 20:54):

"specialization is costly" is definitely true, but "specializing compilers are all slow" is empirically false thanks to those counterexamples, so there is obviously more to the story

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 20:56):

i'm talking about relative speeds, not in absolute terms. zig is faster than c++ and rust but that doesn't mean that zig without specialization would be just as fast as zig with specialization. it would certainly be faster

view this post on Zulip Richard Feldman (Jan 10 2025 at 20:56):

oh yeah for sure!

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 20:56):

c++ and rust have a lot of other stuff that doesn't help. auto derefs are a good example

view this post on Zulip Richard Feldman (Jan 10 2025 at 20:56):

yep haha

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 20:57):

but what i mean is, in terms what is the bottleneck for any given compiler, specialization is definitely at the top

view this post on Zulip Richard Feldman (Jan 10 2025 at 20:57):

totally

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 20:57):

specialization is the only thing that cannot be O(N) where N is program size

view this post on Zulip Richard Feldman (Jan 10 2025 at 20:58):

well, after linking and code gen in the case of LLVM - although specialization does create more work for both of those steps

view this post on Zulip Sam Mohr (Jan 10 2025 at 20:58):

Can typechecking be O(N)-ish with our type system?

view this post on Zulip Sam Mohr (Jan 10 2025 at 20:58):

It feels very graph traversal-y at the moment

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 20:58):

yes typechecking is o(n)

view this post on Zulip Richard Feldman (Jan 10 2025 at 20:58):

in practice

view this post on Zulip Sam Mohr (Jan 10 2025 at 20:58):

Ah

view this post on Zulip Richard Feldman (Jan 10 2025 at 20:58):

just don't write a giant pyramid of nested defs please :stuck_out_tongue:

view this post on Zulip Sam Mohr (Jan 10 2025 at 20:58):

I can see in practice

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 20:58):

even just for generating the specializations, it's bigger than o(n)

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 20:59):

because you emit equal or more code than is fed in

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 20:59):

so just producing the specializations is expensive

view this post on Zulip Sam Mohr (Jan 10 2025 at 20:59):

I get that specialization gen is bigger than O(n), but just making concretely-typed copies seems cheap even if you do a lot of it

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 20:59):

and it's compounded by the fact that you have to duplicate data, etc (and there is little getting around that)

view this post on Zulip Sam Mohr (Jan 10 2025 at 20:59):

But the part that would suck is everything after it working on more than the original N functions/values

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 21:02):

yeah the reason is that specialization is more than making trivial copies

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 21:02):

you are basically chopping up graphs of the program and rewriting them

view this post on Zulip Sam Mohr (Jan 10 2025 at 21:02):

I figured

view this post on Zulip Sam Mohr (Jan 10 2025 at 21:03):

Oh, even just the naive graph expansion is bottleneck number one for relative time spent by compiler passes??

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 21:04):

there are things that you can do to make the constant cost better, e.g. avoid over-allocating, have a dense union-find representation of types, compile to a good representation (e.g. passing large objects by pointer instead of by value everywhere). but the non-constant cost can't change

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 21:06):

maybe a better way to put it is anything that is rewriting the program to a non-constant equivalent is going to be an expensive part. optimizations are another good example of this

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 21:09):

another kind of interesting thing about specialization is that it is inherently at odds with separate compilation

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 21:09):

this is why you don't really have separate compilation in Rust and in C++ you have to include generic functions in header files (so you have separate compilation until you need a generic function)

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 21:09):

which adds to the feeling of compilation time overall

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 21:10):

there are ways to get around this though. for example boxing values at compilation unit boundaries so there is a uniform representation. this is what Swift does to support separate compilation/dynamic linking with generics IIRC

view this post on Zulip Richard Feldman (Jan 10 2025 at 21:14):

yeah that route is tricky for us because of host ABIs unfortunately

view this post on Zulip Richard Feldman (Jan 10 2025 at 21:15):

like if we give a List Str to the host, it needs to look the same way in memory with or without optimizations, and we definitely want specialization in optimized builds even if we wanted dev builds to do something else

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 21:16):

i honestly think the long term answer for dev builds if you care about compiler perf (for any specializing compiler) is an interpreter

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 21:17):

in roc's case the interpreter would have some c ffi and the rest is just interpreted, no specialization

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 21:18):

you also don't need to worry about it for the host ABI because the host ABI has to be concretized - there is no polymorphism over it

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 21:19):

it only matters within same-language (Roc<>Roc in this case) calls

view this post on Zulip Richard Feldman (Jan 10 2025 at 21:30):

so like convert at runtime right at the boundary?

view this post on Zulip Brendan Hansknecht (Jan 10 2025 at 21:33):

long term answer for dev builds if you care about compiler perf (for any specializing compiler) is an interpreter

Yeah, this is probably correct. Probably a bytecode interpreter like python. (Though this does leave weird holes where dev and optimized differ even more).

view this post on Zulip Brendan Hansknecht (Jan 10 2025 at 21:35):

Like host calls into roc (which just launches an interpreter over the roc bytecode). When calling back into the host, just use libffi to deal with all the types and calling the host pointers.

view this post on Zulip Brendan Hansknecht (Jan 10 2025 at 21:36):

This should be very doable with lots of gains.

view this post on Zulip Brendan Hansknecht (Jan 10 2025 at 21:36):

Likely still want to do at a minimum tail call optimization, but most things could remain boxed and etc to run faster.

view this post on Zulip Brendan Hansknecht (Jan 10 2025 at 21:37):

The problem with the dev backend compilation time long term, isn't the dev backend. It is all the parts above the dev backend to deal with type checking and specialization.

view this post on Zulip Brendan Hansknecht (Jan 10 2025 at 21:38):

Those are the slow parts that benefit significantly from a boxed representation and such.

view this post on Zulip Brendan Hansknecht (Jan 10 2025 at 21:40):

Also, I'm not sure which would be faster, dev assembly or a bytecode interpreter. Probably still the dev assembly, but less clear especially if you do some smart bytecode assembler tricks (not to mention some lambda chaining tricks that get pretty optimized code without writing any assembly).

view this post on Zulip Brendan Hansknecht (Jan 10 2025 at 21:41):

Obviously, we could box everything and still do assembly gen fast; just change the assembly gen to understand boxed types.

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 21:55):

so like convert at runtime right at the boundary?

yes

view this post on Zulip Richard Feldman (Jan 10 2025 at 21:56):

yeah to Brendan's point, interpreter vs fast machine code generator is orthogonal to specialization

view this post on Zulip Richard Feldman (Jan 10 2025 at 21:57):

the real benefit would be if we could have the dev backend do something other than specialization at the cost of (much) more boxing at runtime

view this post on Zulip Richard Feldman (Jan 10 2025 at 21:57):

and yet at the boundary convert things over to the representation the host needs

view this post on Zulip Richard Feldman (Jan 10 2025 at 21:57):

which in some cases might be very expensive, unfortunately :sweat_smile:

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 21:58):

well the great thing is you don't need the boundary conversion at the host, that's what im saying

view this post on Zulip Brendan Hansknecht (Jan 10 2025 at 21:59):

What about something like List Str? In the boxed non-specialized world, wouldn't it be List GenericRocType?

view this post on Zulip Richard Feldman (Jan 10 2025 at 21:59):

hm, but unless I'm missing something, List Str would be different in memory depending on whether it's been monomorphized or not

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 21:59):

i don't think it's that expensive. it's only a pointer dereference and indirect jumps for abilities

view this post on Zulip Richard Feldman (Jan 10 2025 at 21:59):

right, there would be some runtime metadata there I'd think

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 21:59):

at the host there's no polymorphism

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 21:59):

at the host boundary*

view this post on Zulip Richard Feldman (Jan 10 2025 at 21:59):

but you can give a List Str to a host

view this post on Zulip Richard Feldman (Jan 10 2025 at 22:00):

there are no polymorphic variables

view this post on Zulip Richard Feldman (Jan 10 2025 at 22:00):

they're all concrete

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 22:00):

sure, but you don't need to box it because you know that the function will always have the same concrete inputs and outputs

view this post on Zulip Richard Feldman (Jan 10 2025 at 22:00):

but List Str and List U8 have different memory layouts if they've been specialized

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 22:01):

right, but no host-exposed function has more than one specialization

view this post on Zulip Anthony Bullard (Jan 10 2025 at 22:01):

Other languages are more free to do this because the language runtime and the "platform" are not separated and are cohesive

view this post on Zulip Richard Feldman (Jan 10 2025 at 22:01):

but let's say I have a call to List.map which returns a List Str which I hand off to the platform

view this post on Zulip Richard Feldman (Jan 10 2025 at 22:02):

how can List.map return a List Str sometimes and a List U8 other times I call it, without either specialization or some runtime metadata on List?

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 22:02):

okay, i see what you're getting at now

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 22:02):

I'm not sure that is supported though

view this post on Zulip Brendan Hansknecht (Jan 10 2025 at 22:03):

but you don't need to box it because you know that the function will always have the same concrete inputs and outputs

Wait, your suggestion is more meta than I realized. You don't want to box everything, You just want to box anything that has more than one specialization. That said, you want to box only over the possible specializations. So for a function that in practice has two specializations List Str -> ... or List U8 -> ..., you would essentially convert it into a [ ListStr (List Str), ListU8 (List U8) ] -> ...

view this post on Zulip Brendan Hansknecht (Jan 10 2025 at 22:03):

So very different from the standard interpreter that boxes everything and does so in a recursive manner.

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 22:04):

i don't think you can pass a function to the host that is polymorphic for exactly that reason - the host has to have a concrete (non-generic) interface so i'm not sure how you would pass such a function through?

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 22:04):

like I can't say on the host side "i expose an effect f: Str -> (a -> a) -> Str"

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 22:05):

you have to say f: Str -> (Str -> Str) -> Str at which point you know exactly what specialization you need and you don't need anything more

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 22:07):

Wait, your suggestion is more meta than I realized. You don't want to box everything, You just want to box anything that has more than one specialization. That said, you want to box only over the possible specializations. So for a function that in practice has two specializations List Str -> ... or List U8 -> ..., you would essentially convert it into a [ ListStr (List Str), ListU8 (List U8) ] -> ...

Yes, that could work too. But i'm not quite following how the host can take any function that is generic. I don't think it's possible but maybe i'm missing something

view this post on Zulip Brendan Hansknecht (Jan 10 2025 at 22:08):

I'm not worried about directly interacting with the host. I am worried about indirectly interacting with the host.

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 22:08):

what is the difference?

view this post on Zulip Brendan Hansknecht (Jan 10 2025 at 22:09):

I thought part of the goal of this work was to avoid specialization. As such, even though the host has concrete types, it has to be able to call functions that have multiple specializations and thus take generic boxed types. As such, there has to be a way to convert concrete types to/from generic types.

view this post on Zulip Brendan Hansknecht (Jan 10 2025 at 22:10):

Aside, I think an interpreter could theoretically avoid boxing and lazily specialize when a function with a specific specialization is called.

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 22:11):

My argument is that the host can never interact with any Roc function that is polymorphic, hence you don't need to worry about generics over the host boundary. Because how could that work if it did? You would need to re-compile the host every time the app uses another specialization

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 22:12):

yes, i think you're right

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 22:14):

sorry for devolving your thread Sam lmao

view this post on Zulip Sam Mohr (Jan 10 2025 at 22:20):

This was educational

view this post on Zulip Sam Mohr (Jan 10 2025 at 22:20):

I got what I wanted, which was approval for me doing code cleanup

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 23:03):

i'm still confused on if i'm missing something about the app<>host boundary. am i? or is it correct that the host actually has no way to call a polymorphic function

view this post on Zulip Brendan Hansknecht (Jan 10 2025 at 23:28):

Host could call something with a Box model

view this post on Zulip Brendan Hansknecht (Jan 10 2025 at 23:29):

But that is kinda different

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 23:29):

but that has to be equivalent to a single function, it can't dispatch to one of N functions right

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 23:29):

yeah

view this post on Zulip Brendan Hansknecht (Jan 10 2025 at 23:29):

Yeah

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 23:29):

but the host can't actually call List.map polymorphically

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 23:30):

there's no way to do that and be able to compile the host separately i think

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 23:30):

without boxed polymorphic types obviously

view this post on Zulip Brendan Hansknecht (Jan 10 2025 at 23:30):

Yeah

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 23:30):

okay, so then this works just fine at host boundaries. you don't need any boxing at all.

view this post on Zulip Brendan Hansknecht (Jan 10 2025 at 23:31):

I just don't understand how, without specializing List.map, we go from the concrete list passed in from the host to calling a polymorphic function.

view this post on Zulip Brendan Hansknecht (Jan 10 2025 at 23:33):

Like if main is simply:

main : List Str -> List U64
main = \strs ->
    if someArbitraryConditional then
        List.map strs Str.toU64
    else
        List.map strs Str.toI64
        |> List.map Num.abs

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 23:35):

sorry i think i wasn't clear

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 23:35):

List.map would be specialized

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 23:36):

I was talking about 3p functions at compilation boundaries (let's say libraries for now)

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 23:37):

but the case is similar. in this example you would simply unwrap the boxed type at the end and return that.

view this post on Zulip Richard Feldman (Jan 10 2025 at 23:37):

yeah the problem with that is what if you have like List (List Str) or something

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 23:37):

why is that a problem?

view this post on Zulip Richard Feldman (Jan 10 2025 at 23:38):

now you gotta traverse the whole thing and unbox all the inner lists

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 23:39):

i don't think that's necessarily true. you only need to do the wrap/unwrap if you definitely know you are calling a polymorphic-but-unspecialized function. which in practice is going to be a small amount of the functions in your program (since it's only at 3p library API boundaries)

view this post on Zulip Ayaz Hafiz (Jan 10 2025 at 23:40):

Also, it doesn't need to be nested unwrap. If you pass List (List Str) to a function like that, you only need to wrap the whole structure, not the nested ones

view this post on Zulip Ayaz Hafiz (Jan 11 2025 at 17:31):

this is succinct explanation of what i was saying: https://lobste.rs/s/ei5bp4/apple_is_killing_swift#c_zwjc1i. very coincidental


Last updated: Jul 06 2025 at 12:14 UTC