Just to ask this concretely, should these effects be allowed:
storeAnything : Box a -> Effect U64
loadAnything : U64 -> Effect (Box a)
They are fundamentally what enable dynamic FFI and would enable caching any data in basic-webserver. They just have no type safety. Unless loadAnything could inspect the type expected to be returned, Roc user code could request a Box Str when you actually have a Box U64 stored.
These remind me a bit of the Var type in Haskell. Maybe you're already familiar; their types (translated to Roc syntax) would look like:
new : Box a -> Effect (Var a)
read : Var a -> Effect (Box a)
write : Var a, Box a -> Effect {}
Where Var a represents a mutable reference to some value of type a. I think that would allow caching in a type-safe manner.
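To make the Var a idea concrete, here is a minimal sketch in Python rather than Roc. The Host class, its slot table, and the method names are all hypothetical stand-ins for a platform. The type parameter on Var is a phantom: it is never stored at runtime, it only lets a static checker (for example mypy) reject a read at the wrong type.

```python
from typing import Dict, Generic, TypeVar

T = TypeVar("T")


class Var(Generic[T]):
    """Hypothetical typed handle around the untyped slot id a host hands back."""

    def __init__(self, slot: int) -> None:
        self._slot = slot


class Host:
    """Stand-in for the platform: it only ever sees opaque values keyed by id."""

    def __init__(self) -> None:
        self._slots: Dict[int, object] = {}
        self._next = 0

    def new(self, value: T) -> Var[T]:
        # Equivalent of storeAnything, but the id is wrapped in a typed handle.
        slot = self._next
        self._next += 1
        self._slots[slot] = value
        return Var(slot)

    def read(self, var: Var[T]) -> T:
        # Equivalent of loadAnything; the result type is fixed by the handle.
        return self._slots[var._slot]  # type: ignore[return-value]

    def write(self, var: Var[T], value: T) -> None:
        self._slots[var._slot] = value


host = Host()
counter = host.new(41)          # a static checker infers Var[int]
host.write(counter, host.read(counter) + 1)
```

Because the raw integer id never escapes the handle, application code has no way to ask for a different type than the one that was stored.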
I don't think Roc currently would allow you to define Var a, because it would have an unused type variable.
Also, just to give the more complex ffi example, it is essentially:
ffi : Lib, Str, Box a -> Effect (Box b)
So I don't think there is any sort of Var a strategy that could work. But I think that is expected. Obviously, FFI can totally transform types.
I think Roc allows it. Maybe I misunderstand?
$ roc repl
» Var a := U64
» mkvar : a -> Var a
… mkvar = \_ -> @Var 42
<function> : a -> Var a
» mkvar "Hi!"
@Var 42 : Var Str
Oh interesting. I thought we generated a warning for that. Guess not.
I think we don't because phantom types are useful :smile:
yeah it's an error for type aliases but not even a warning for opaque type (because yeah, phantom types are useful!)
Ah, alias vs opaque. Makes sense.
Brendan Hansknecht said:
Just to ask this concretely, should these effects be allowed:
storeAnything : Box a -> Effect U64
loadAnything : U64 -> Effect (Box a)
They are fundamentally what enable dynamic FFI and would enable caching any data in basic webserver. They just have no type safety. Unless loadAnything could inspect the type expected to be returned, roc user code could request a Box Str when you actually have a Box U64 stored.
this is a very good question
there's some other way to do it which is less efficient, right?
like with List U8 instead of a
Yeah, you could do a List U8 instead of Box a and force some form of encoding and decoding.
You also can make it type safe with Var a as @Jasper Woudenberg mentioned.
To the platform, box is just a pointer, so they can handle and store it safely for the most part. (refcounting may have some complications, but I'm not 100% sure)
yeah I think for now we should be conservative and disallow this
and consider supporting it in the future if there’s demand for it in practice
because it’s both super unsafe and also prevents an automatic replay feature
because we can’t know the layout of what’s behind that pointer, so we don’t know how to write it down to replay it later
Hmm... You have to allow some form of it due to Box a being used for models? Or is that different?
Like in an elm architecture style app
yeah that's different
I specifically mean *
hm, actually
I guess for replay we can know what the type is after monomorphization :thinking:
because we'll have inferred it
so we could use that to know the layout, which in turn means we'd know how to traverse it for replay
Yeah, Roc knows all the types. The platform just doesn't.
it's unsafe, but to be fair, literally anything we get from the host could be a bad pointer
so it's all equally unsafe in that sense
the difference is whether application authors can cause UB
The part that makes it less safe is that if exposed directly to userland, the app author can hit it
which they can if we allow this, and can't if we don't allow this
right
so we'd go from "only platform authors can cause UB" to "application authors (or any of the libraries they use which return Task) can cause UB if the platform supports it"
Yep. Though I don't think UB is the right term. More like can generate totally broken bindings that crash the app or return garbage data.
That said, it is one of those weird cases where depending on how it is exposed, it could be totally safe. Like stored with Var a. So kinda a case where it enables new features, but those features may have too much power.
unfortunately I think UB is accurate here :big_smile:
if you get the layout wrong, all bets are off and anything could happen
including overwriting data in a completely unrelated data structure which was unfortunate enough to be adjacent in memory to the thing with the wrong layout, which in turn leads to other incorrect pointers, which…etc etc
yeah this feels like “we can always introduce support later if it feels worth it, but it would be a big breaking change to take away if we support it now and regret it later”
UB is just a very specific compiler term that means something different. Like when C++ or LLVM says UB, it means that they won't specify a specific behaviour, generally for performance reasons. Then the optimizer will pick the faster choice for the specific hardware.
Having a wrong layout for a call isn't UB. It is totally incorrect and broken code.
Richard Feldman said:
yeah this feels like “we can always introduce support later if it feels worth it, but it would be a big breaking change to take away if we support it now and regret it later”
Note, we do support this today
i’m pretty sure exposing any generic types to the platform, in either input or output is a bad idea
it feels like a mistake that it can be done today
How are you supposed to do something elm architecture like then?
Where the model depends on the app?
that seems like a special case, because the type is specialized to the app right? in that case the runtime should box the model type, but it’s not generic in the sense that the application cannot call it with any other type
said another way, from the perspective of the app there are no generic inputs or outputs
In all of these cases, I think we are talking about actually sending a Box a over to the platform. The app is required to know what a is. I guess in the FFI case specifically, it is so flexible that generally it requires a type annotation or Roc compilation will break.
I think any sort of type variable passed to the platform has to be wrapped in an indirection. So like List a or Box a. I 100% agree that a raw a passed to a platform is definitely wrong.
to me the problem is that this allows the programmer to write a function a -> b where a and b are generic. i think this is prone to writing mistakes, and makes it hard to manage your applications (whenever you change the input or output type, you need to make sure the types propagate but the compiler won’t help you). i think the right solution is runtime type tagging (not saying the language runtime should provide that, but some serialization should be used). The model case is a bit different I think because the programmer must explicitly type the model type (as something non-generic), and so you don’t end up being able to write a function a->b
I think nested runtime type tagging would be too slow. On the other hand, I think sideband type tagging could work great.
Basically taggedEffect (Box.box a) (Type.type a)
Where if a is a List (Str, I32), Type.type would return a tag explaining that info: ListType (TupleType [StrType, I32Type]).
Then the Roc type only has to be boxed and no other serialization has to happen. Type.type would actually always be a compile time constant. So no cost to building it out. Mono would just know the answer.
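A rough sketch of what sideband tagging could look like from the host's side, in Python rather than Roc. Everything here is hypothetical: the tag constructors mirror the ListType (TupleType [StrType, I32Type]) shape from the message above, and TaggedHost stands in for a platform that stores the boxed value untouched next to its tag. The only check is one structural comparison of tags on load.

```python
# Hypothetical tag vocabulary: plain nested tuples, compared structurally.
STR_TYPE = ("StrType",)
I32_TYPE = ("I32Type",)
U64_TYPE = ("U64Type",)


def list_type(elem):
    return ("ListType", elem)


def tuple_type(*elems):
    return ("TupleType", elems)


class TaggedHost:
    """Stand-in platform: keeps (tag, boxed value) pairs, never inspects the value."""

    def __init__(self):
        self._slots = {}

    def store(self, boxed, tag):
        slot = len(self._slots)
        self._slots[slot] = (tag, boxed)
        return slot

    def load(self, slot, expected_tag):
        tag, boxed = self._slots[slot]
        # One tag comparison per load; the payload itself carries no type info.
        if tag != expected_tag:
            raise TypeError(f"stored {tag}, but caller expected {expected_tag}")
        return boxed


host = TaggedHost()
stored_tag = list_type(tuple_type(STR_TYPE, I32_TYPE))
slot = host.store([("a", 1), ("b", 2)], stored_tag)

ok = host.load(slot, stored_tag)

try:
    host.load(slot, list_type(tuple_type(STR_TYPE, U64_TYPE)))
    mismatch_caught = False
except TypeError:
    mismatch_caught = True
```

In this sketch the tag is built by hand; the suggestion in the thread is that the compiler would produce it as a constant after monomorphization.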
yeah, that’s a much larger change though. that has some things to figure out too, for example if you do not want to add type tags to all runtime values you need to figure out where to drop them, and it may not be trivial to do so, because of the vitality of type inference
Why is it such a large change? After type inference runs and mono, won't the variable a have a concrete type? So it would just be running a mapping over the concrete type of a? I mean today, inspect could be used to build the Type.type function.
yes, but there are a lot of details to discuss - how and when to check the tags, whether to infer whether tags should be added, the tag representation, and so on. you’re right that it’s simple, but my guess is it’s not trivial to implement. yes, the actual tag to add is easy to compute
Ok, yeah. Totally agree with that sentiment.
I just want to make sure we don't lock to only decode/encode. I think that would be really sad for the perf story of some common patterns. Even having to box is a bit sad, but it gets around variable sized types so makes sense.
I also empathize with the concern about runtime type data being too slow, but i would suggest using serialization at the ffi boundary for now until performance becomes a concern in practice at which point some ecosystem or language level solution can be devised. i don’t think any real uses of Roc are going to run into a performance problem here while Roc is still being used for small/medium enterprise applications - and if you do run into something that has a perf problem, you can create a special-case effect. reducing the surface area of the language also makes development easier; there are still a ton of holes in compilation, and might be worth avoiding another potential source of those right now.
reducing the surface area of the language also makes development easier; there are still a ton of holes in compilation, and might be worth avoiding another potential source of those right now.
Very true
I guess maybe we just need someone to implement a simple binary serialization format that avoids needing to tag every individual piece of data. Make sure it can represent a List (Str, Int) without generating the equivalent of everything being nested RocObjects. Where you have a RocObject (List) -> RocObject (Tuple) -> RocObject (Str/Int). Cause that is a horrid amount of wrapping that would definitely ruin perf.
Also, by invent, probably just mean implement. I'm sure something must exist.
Brendan Hansknecht said:
I guess maybe we just need someone to implement a simple binary serialization format that avoids needing to tag every individual piece of data. Make sure it can represent a List (Str, Int) without generating the equivalent of everything being nested RocObjects. Where you have a RocObject (List) -> RocObject (Tuple) -> RocObject (Str/Int). Cause that is a horrid amount of wrapping that would definitely ruin perf.
this would be a nice binary serialization format to have in general, not just for FFI!
Yeah, maybe that will be something I dig into after sqlite. Not really sure what binary format to target though.
Things like protobuf and cap'n proto have no type info in the serialized format and require codegen for decode/encode.
msgpack or maybe bson look reasonable. They have repeated types inline, which is kinda annoying. So a list of 100 strings specifies that each individual element in the list is a string. That said, with both of these, specifying the element type is a single byte. And everything gets built into a single flat buffer. So should be reasonable for perf.
Apache Avro is one of the few things that seems to have out-of-band types that are sent as metadata. They send the types as JSON for some reason. I guess it is meant for big data, so parsing a single JSON document to learn you are decoding a List (Str, Int) is no big deal. Then the actual data is densely packed with no type info.
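To put rough numbers on the inline versus out-of-band tradeoff, here is a toy comparison in Python. Neither encoder is the real msgpack, bson, or Avro wire format; the tag value, the fixed 4-byte length fields, and the "List Str" schema string are all assumptions made up for this sketch. The only point is where the type information lives.

```python
import struct

TAG_STR = 0x01  # hypothetical one-byte type tag


def encode_inline(strings):
    """Toy inline-typed format: every element repeats its own type tag."""
    out = bytearray(struct.pack("<I", len(strings)))
    for s in strings:
        data = s.encode("utf-8")
        out += struct.pack("<BI", TAG_STR, len(data)) + data
    return bytes(out)


def encode_out_of_band(strings):
    """Toy out-of-band format: the type is stated once, the body is untagged."""
    schema = b"List Str"  # hypothetical schema descriptor, sent once
    out = bytearray(struct.pack("<I", len(schema)) + schema)
    out += struct.pack("<I", len(strings))
    for s in strings:
        data = s.encode("utf-8")
        out += struct.pack("<I", len(data)) + data
    return bytes(out)


items = ["item"] * 100
inline_size = len(encode_inline(items))            # 904 bytes
out_of_band_size = len(encode_out_of_band(items))  # 816 bytes
```

In this toy setup the inline format pays one extra byte per element, while the out-of-band format pays a fixed 12 bytes for the schema header, so the header wins once the list has more than 12 elements.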
Anyone have general input on binary formats with type info? I would guess that bson is the most popular simply because it is used with MongoDB. That said, msgpack looks a lot cleaner and simpler. But I don't really have much knowledge of the various options in this space.
Not quite sure if it's the same. But folkert mentioned this format to me when I was asking about something similar: https://postcard.jamesmunns.com/wire-format
Benefit would be interop with rust
I think postcard is in the same category as protobuf and cap'n proto. Totally type unsafe. I think it is kinda a simpler variant of those two libraries.
So if you pick the wrong type to decode into, List (I32, Str) when it should have been List (U64, Str), you will just decode wrong. It may fail. It may succeed and just give you garbage data.
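A small Python sketch of that failure mode, using a made-up schema-less format (not postcard's actual wire format): a 4-byte element count, then for each element a fixed-width integer followed by a length-prefixed UTF-8 string. The decoder has to be told the integer width, and nothing in the bytes can contradict it.

```python
import struct

U64 = "<Q"  # 8-byte little-endian unsigned
I32 = "<i"  # 4-byte little-endian signed


def encode(pairs, int_fmt):
    """Toy untagged encoding of a list of (integer, string) pairs."""
    out = bytearray(struct.pack("<I", len(pairs)))
    for n, s in pairs:
        data = s.encode("utf-8")
        out += struct.pack(int_fmt, n)
        out += struct.pack("<I", len(data)) + data
    return bytes(out)


def decode(buf, int_fmt):
    """Decode using whatever integer width the caller claims."""
    (count,) = struct.unpack_from("<I", buf, 0)
    pos = 4
    pairs = []
    for _ in range(count):
        (n,) = struct.unpack_from(int_fmt, buf, pos)
        pos += struct.calcsize(int_fmt)
        (length,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        pairs.append((n, buf[pos:pos + length].decode("utf-8")))
        pos += length
    return pairs


wire = encode([(5, "hi")], U64)

right = decode(wire, U64)  # [(5, 'hi')]
# Decoding as List (I32, Str): the upper half of the U64 is read as the
# string length (0), so this succeeds and silently returns [(5, '')].
wrong = decode(wire, I32)
```

With other inputs the same mistake can instead raise a struct.error or a UnicodeDecodeError, which matches the "it may fail, it may succeed with garbage" description.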
I think for our first spec here, we probably want something with some form of type info, but maybe that is a wrong assumption.
One possibility would be to have a header section with a serialized version of the Layout, then a body section with the actual data, in the same format we use at runtime. Pointers would get translated to byte offsets within the payload.
That pointer translation is something we already do in the Web REPL, where the user's compiled app is in a separate address space from the REPL app.
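A minimal Python sketch of the header-plus-body idea for a list of strings. This is not Roc's runtime layout or what the Web REPL does; the "List Str" descriptor, the 8-byte table entries, and the function names are assumptions for illustration. Each element's pointer is replaced by a byte offset measured from the start of the body.

```python
import struct

ENTRY_SIZE = 8  # hypothetical element entry: u32 offset + u32 length


def serialize(strings):
    layout = b"List Str"  # hypothetical serialized layout descriptor
    table_size = ENTRY_SIZE * len(strings)
    table = bytearray()
    heap = bytearray()
    for s in strings:
        data = s.encode("utf-8")
        # The "pointer" becomes an offset from the start of the body.
        table += struct.pack("<II", table_size + len(heap), len(data))
        heap += data
    header = struct.pack("<II", len(layout), len(strings)) + layout
    return header + bytes(table) + bytes(heap)


def deserialize(buf):
    layout_len, count = struct.unpack_from("<II", buf, 0)
    layout = buf[8:8 + layout_len]
    body = buf[8 + layout_len:]
    strings = []
    for i in range(count):
        # Follow each offset within the body, as a host would follow a pointer.
        offset, length = struct.unpack_from("<II", body, ENTRY_SIZE * i)
        strings.append(body[offset:offset + length].decode("utf-8"))
    return layout, strings


layout, round_tripped = deserialize(serialize(["roc", "", "platform"]))
```

Since the offsets are relative to the body, the payload can be copied into a different address space and read without fixing anything up.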
we can make a roc-specific one like rvn if there’s nothing off the shelf that does what we want!
For sure. I was trying to find something off the shelf simply to help with value. Like bson would both help here and would be useful if someone interacts with mongo DB. Plus it means we don't have to implement it in all host languages.
Brian Carroll said:
One possibility would be to have a header section with a serialized version of the Layout, then a body section with the actual data, in the same format we use at runtime. Pointers would get translated to byte offsets within the payload.
This is something I am trying to make as an Encoder/Decoder written in Roc. I think it would actually be pretty hard to match Roc's runtime format using an encoder.
Brendan Hansknecht said:
For sure. I was trying to find something off the shelf simply to help with value. Like bson would both help here and would be useful if someone interacts with mongo DB
true, although if someone wants bson and we have this, it should give them a very strong starting point!
Last updated: Jul 06 2025 at 12:14 UTC