Boxing Anything · compiler development

Brendan Hansknecht (Jul 05 2024 at 18:26):

storeAnything : Box a -> Effect U64
loadAnything : U64 -> Effect (Box a)

They are fundamentally what enable dynamic FFI and would enable caching any data in basic-webserver. They just have no type safety. Unless loadAnything could inspect the type expected to be returned, Roc user code could request a Box Str when you actually have a Box U64 stored.
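
A minimal Python sketch of the hazard described above, not Roc platform code. It assumes a host that keeps raw bytes under a U64-style key and reinterprets them as whatever the caller asks for. The names `store_anything`, `load_as_u64`, and `load_as_str` are hypothetical.

```python
import struct

# Hypothetical host-side store: raw bytes keyed by an integer handle, no type info kept.
_store: dict[int, bytes] = {}
_next_key = 0


def store_anything(payload: bytes) -> int:
    """Store raw bytes and return a handle, like storeAnything returning a U64."""
    global _next_key
    key = _next_key
    _next_key += 1
    _store[key] = payload
    return key


def load_as_u64(key: int) -> int:
    """Reinterpret the first 8 stored bytes as a little-endian U64."""
    return struct.unpack("<Q", _store[key][:8])[0]


def load_as_str(key: int) -> str:
    """Reinterpret the stored bytes as UTF-8 text, whatever they really were."""
    return _store[key].decode("utf-8", errors="replace")


key = store_anything(struct.pack("<Q", 42))
right = load_as_u64(key)  # the caller asked for the type that was stored
wrong = load_as_str(key)  # the caller asked for a Str; nothing stops it
```

Here `wrong` is an eight-character string built from the integer's bytes. In a real host the same mistake would reinterpret a pointer rather than a byte string, which is considerably worse.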

Jasper Woudenberg (Jul 05 2024 at 19:27):

These remind me a bit of the Var type in Haskell. Maybe you're already familiar; their types (translated to Roc syntax) would look like:

new : Box a -> Effect (Var a)
read : Var a -> Effect (Box a)
write : Var a, Box a -> Effect {}

Where Var a represents a mutable reference to some value of type a. I think that would allow caching in a type-safe manner.
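
As a rough illustration of the idea in Python rather than Roc or Haskell: a generic handle carries the element type, so reads and writes through one handle have to agree. The class below is a hypothetical sketch of the new/read/write shape, not an existing API.

```python
from typing import Generic, TypeVar

T = TypeVar("T")


class Var(Generic[T]):
    """A mutable reference whose type parameter records what it holds."""

    def __init__(self, value: T) -> None:
        self._value = value

    def read(self) -> T:
        return self._value

    def write(self, value: T) -> None:
        self._value = value


# The annotation pins the handle to list[str] for its whole lifetime.
cache: Var[list[str]] = Var(["a"])
cache.write(cache.read() + ["b"])
```

Python does not enforce the parameter at runtime. The point is only that a static checker has enough information to reject a write of the wrong type through `cache`, which is what the `Var a` signatures give Roc.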

Brendan Hansknecht (Jul 05 2024 at 19:29):

I don't think roc currently would allow you to define Var a cause it would have an unused type variable.

Brendan Hansknecht (Jul 05 2024 at 19:35):

ffi: Lib, Str, Box a -> Effect (Box b)

So I don't think there is any sort of Var a strategy that could work. But I think that is expected. Obviously, ffi can totally transform types.

Jasper Woudenberg (Jul 05 2024 at 19:36):

$ roc repl
» Var a := U64
» mkvar : a -> Var a
… mkvar = \_ -> @Var 42

<function> : a -> Var a
» mkvar "Hi!"

@Var 42 : Var Str

Brendan Hansknecht (Jul 05 2024 at 19:36):

Agus Zubiaga (Jul 05 2024 at 19:41):

Richard Feldman (Jul 05 2024 at 19:44):

yeah it's an error for type aliases but not even a warning for opaque types (because yeah, phantom types are useful!)

Brendan Hansknecht (Jul 05 2024 at 19:45):

Richard Feldman (Jul 05 2024 at 19:48):

Brendan Hansknecht (Jul 05 2024 at 19:51):

Yeah, you could do a List U8 instead of Box a and force some form of encoding and decoding.

Brendan Hansknecht (Jul 05 2024 at 19:52):

Brendan Hansknecht (Jul 05 2024 at 19:53):

To the platform, box is just a pointer, so they can handle and store it safely for the most part. (refcounting may have some complications, but I'm not 100% sure)

Richard Feldman (Jul 05 2024 at 21:21):

Richard Feldman (Jul 05 2024 at 21:22):

because we can’t know the layout of what’s behind that pointer, so we don’t know how to write it down to replay it later

Brendan Hansknecht (Jul 05 2024 at 21:24):

Hmm... You have to allow some form of it due to Box a being used for models? Or is that different

Brendan Hansknecht (Jul 05 2024 at 21:26):

Richard Feldman (Jul 05 2024 at 22:04):

Richard Feldman (Jul 05 2024 at 22:06):

I guess for replay we can know what the type is after monomorphization :thinking:

Richard Feldman (Jul 05 2024 at 22:07):

so we could use that to know the layout, which in turn means we'd know how to traverse it for replay

Brendan Hansknecht (Jul 05 2024 at 22:07):

Richard Feldman (Jul 05 2024 at 22:07):

it's unsafe, but to be fair, literally anything we get from the host could be a bad pointer

Richard Feldman (Jul 05 2024 at 22:07):

Richard Feldman (Jul 05 2024 at 22:08):

Brendan Hansknecht (Jul 05 2024 at 22:08):

The part that makes it less safe is that if exposed directly to userland, the app author can hit it

Richard Feldman (Jul 05 2024 at 22:08):

Richard Feldman (Jul 05 2024 at 22:09):

so we'd go from "only platform authors can cause UB" to "application authors (or any of the libraries they use which return Task) can cause UB if the platform supports it"

Brendan Hansknecht (Jul 05 2024 at 22:20):

Yep. Though I don't think UB is the right term. More like can generate totally broken bindings that crash the app or return garbage data.

That said, it is one of those weird cases where depending on how it is exposed, it could be totally safe. Like stored with Var a. So kinda a case where it enables new features, but those features may have too much power

Richard Feldman (Jul 05 2024 at 22:42):

Richard Feldman (Jul 05 2024 at 22:43):

including overwriting data in a completely unrelated data structure which was unfortunate enough to be adjacent in memory to the thing with the wrong layout, which in turn leads to other incorrect pointers, which…etc etc

Richard Feldman (Jul 05 2024 at 22:45):

yeah this feels like “we can always introduce support later if it feels worth it, but it would be a big breaking change to take away if we support it now and regret it later”

Brendan Hansknecht (Jul 05 2024 at 22:53):

UB is just a very specific compiler term that means something different. Like when c++ or llvm says UB, it means that they won't specify a specific behaviour generally for performance reasons. Then the optimizer will pick the faster choice for the specific hardware.

Having a wrong layout for a call isn't UB. It is totally incorrect and broken code.

Brendan Hansknecht (Jul 05 2024 at 22:54):

Ayaz Hafiz (Jul 05 2024 at 23:45):

i’m pretty sure exposing any generic types to the platform, in either input or output is a bad idea

Ayaz Hafiz (Jul 05 2024 at 23:45):

Brendan Hansknecht (Jul 05 2024 at 23:45):

Brendan Hansknecht (Jul 05 2024 at 23:46):

Ayaz Hafiz (Jul 05 2024 at 23:48):

that seems like a special case, because the type is specialized to the app right? in that case the runtime should box the model type, but it’s not generic in the sense that the application cannot call it with any other type

Ayaz Hafiz (Jul 05 2024 at 23:52):

said another way, from the perspective of the app there are no generic inputs or outputs

Brendan Hansknecht (Jul 05 2024 at 23:55):

In all of these cases, I think we are talking about actually sending a Box a over to the platform. The app is required to know what a is. I guess in the ffi case specifically, it is so flexible that generally it requires a type annotation or roc compilation will break.

Brendan Hansknecht (Jul 05 2024 at 23:55):

I think any sort of type variable passed to the platform has to be wrapped in an indirection. So like List a or Box a. I 100% agree that a raw a passed to a platform is definitely wrong.

Ayaz Hafiz (Jul 06 2024 at 00:02):

to me the problem is that this allows the programmer to write a function a -> b where a and b are generic. i think this is prone to writing mistakes, and makes it hard to manage your applications (whenever you change the input or output type, you need to make sure the types propagate but the compiler won’t help you). i think the right solution is runtime type tagging (not saying the language runtime should provide that, but some serialization should be used). The model case is a bit different I think because the programmer must explicitly type the model type (as something non-generic), and so you don’t end up being able to write a function a->b

Brendan Hansknecht (Jul 06 2024 at 00:15):

I think nested runtime type tagging would be too slow. On the other hand, I think sideband type tagging could work great.

Where if a is a List (Str, I32), Type.type would return a tag explaining that info: ListType (TupleType [StrType, I32Type]).

Brendan Hansknecht (Jul 06 2024 at 00:17):

Then the roc type only has to be boxed and no other serialization has to happen. Type.type would actually always be a compile time constant. So no cost to building it out. Mono would just know the answer.
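
A small Python sketch of what sideband tagging could look like on the storing side, assuming one descriptor is kept beside each value and compared on load. The descriptor constructors and the `store`/`load` helpers are hypothetical. In the proposal the descriptor would be a compile-time constant produced by mono, whereas here it is simply passed in by hand.

```python
# Type descriptors as nested tuples, mirroring ListType (TupleType [StrType, I32Type]).
STR = ("StrType",)
I32 = ("I32Type",)
U64 = ("U64Type",)


def list_of(elem):
    return ("ListType", elem)


def tuple_of(*elems):
    return ("TupleType", elems)


_store = {}


def store(key, descriptor, value):
    """Keep one descriptor beside the value; the value itself is not wrapped."""
    _store[key] = (descriptor, value)


def load(key, expected):
    """Return the value only if the caller expects the descriptor that was stored."""
    descriptor, value = _store[key]
    if descriptor != expected:
        raise TypeError(f"stored {descriptor}, caller expected {expected}")
    return value


pairs = list_of(tuple_of(STR, I32))
store(1, pairs, [("a", 1), ("b", 2)])
ok = load(1, pairs)

try:
    load(1, list_of(tuple_of(STR, U64)))
    mismatch_caught = False
except TypeError:
    mismatch_caught = True
```

The check costs one descriptor comparison per load, independent of how many elements the value holds, which is the contrast with tagging every nested element.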

Ayaz Hafiz (Jul 06 2024 at 00:19):

yeah, that’s a much larger change though. that has some things to figure out too, for example if you do not want to add type tags to all runtime values you need to figure out where to drop them, and it may not be trivial to do so, because of the virality of type inference

Brendan Hansknecht (Jul 06 2024 at 00:21):

Why is it such a large change? After type inference and mono run, won't the variable a have a concrete type? So it would just be running a mapping over the concrete type of a? I mean today, inspect could be used to build the Type.type function.

Ayaz Hafiz (Jul 06 2024 at 00:23):

yes, but there are a lot of details to discuss - how and when to check the tags, whether to infer whether tags should be added, the tag representation, so on. you’re right that it’s simple, but my guess is it’s not trivial to implement. yes, the actual tag to add is easy to compute

Brendan Hansknecht (Jul 06 2024 at 00:24):

Brendan Hansknecht (Jul 06 2024 at 00:25):

I just want to make sure we don't lock to only decode/encode. I think that would be really sad for the perf story of some common patterns. Even having to box is a bit sad, but it gets around variable sized types so makes sense.

Ayaz Hafiz (Jul 06 2024 at 00:26):

I also empathize with the concern about runtime type data being too slow, but i would suggest using serialization at the ffi boundary for now until performance becomes a concern in practice at which point some ecosystem or language level solution can be devised. i don’t think any real uses of Roc are going to run into a performance problem here while Roc is still being used for small/medium enterprise applications - and if you do run into something that has a perf problem, you can create a special-case effect. reducing the surface area of the language also makes development easier; there are still a ton of holes in compilation, and might be worth avoiding another potential source of those right now.

Brendan Hansknecht (Jul 06 2024 at 00:28):

Brendan Hansknecht (Jul 06 2024 at 00:31):

I guess maybe we just need someone to implement a simple binary serialization format that avoids needing to tag every individual piece of data. Make sure it can represent a List (Str, Int) without generating the equivalent of everything being nested RocObjects. Where you have a RocObject (List) -> RocObject (Tuple) -> RocObject(Str/Int). Cause that is a horrid amount of wrapping that would definitely ruin perf.

Brendan Hansknecht (Jul 06 2024 at 00:31):

Richard Feldman (Jul 06 2024 at 01:24):

this would be a nice binary serialization format to have in general, not just for FFI!

Brendan Hansknecht (Jul 06 2024 at 01:50):

Yeah, maybe that will be something I dig into after sqlite. Not really sure what binary format to target though.

Things like protobuf and cap'n proto have no type info in the serialized format and require codegen for decode/encode.

msgpack or maybe bson look reasonable. They have repeated types inline, which is kinda annoying. So a list of 100 strings specifies that each individual element in the list is a string. That said, with both of these, specifying the element type is a single byte. And everything gets built into a single flat buffer. So should be reasonable for perf.

apache avro is one of the few things that seems to have out of band types that are sent as metadata. They send the types as json for some reason. I guess it is meant for big data, so parsing a single json to learn you are decoding a List (Str, Int) is no big deal. Then the actual data is densely packed with no type info.
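
To make the out-of-band idea concrete, here is a Python sketch that writes one small JSON schema header and then packs the rows with no per-element type tags. It illustrates the general shape only: the byte layout, the schema spelling, and the `encode`/`decode` names are assumptions for this example and are not Avro's actual wire format.

```python
import json
import struct

# Hypothetical schema for a List (Str, Int), sent once ahead of the data.
SCHEMA = {"list": ["str", "i64"]}


def encode(rows: list[tuple[str, int]]) -> bytes:
    """Write the schema header once, then densely packed rows."""
    schema = json.dumps(SCHEMA).encode("utf-8")
    out = bytearray(struct.pack("<I", len(schema)) + schema)
    out += struct.pack("<I", len(rows))
    for text, number in rows:
        raw = text.encode("utf-8")
        out += struct.pack("<I", len(raw)) + raw + struct.pack("<q", number)
    return bytes(out)


def decode(buf: bytes) -> list[tuple[str, int]]:
    """Check the header against the expected schema, then read the packed rows."""
    (schema_len,) = struct.unpack_from("<I", buf, 0)
    pos = 4
    schema = json.loads(buf[pos:pos + schema_len])
    if schema != SCHEMA:
        raise TypeError(f"unexpected schema: {schema}")
    pos += schema_len
    (count,) = struct.unpack_from("<I", buf, pos)
    pos += 4
    rows = []
    for _ in range(count):
        (n,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        text = buf[pos:pos + n].decode("utf-8")
        pos += n
        (number,) = struct.unpack_from("<q", buf, pos)
        pos += 8
        rows.append((text, number))
    return rows


rows = [("a", 1), ("bc", -2)]
round_trip = decode(encode(rows))
```

The header is parsed once per message, so its cost does not grow with the number of rows.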

Anyone have general input on binary formats with type info? I would guess that bson is the most popular simply because it is used with MongoDB. That said, msgpack looks a lot cleaner and simpler. But I don't really have much knowledge of the various options in this space.

Luke Boswell (Jul 06 2024 at 01:56):

Luke Boswell (Jul 06 2024 at 01:57):

Brendan Hansknecht (Jul 06 2024 at 02:01):

I think postcard is in the same category as protobuf and cap'n proto. Totally type unsafe. I think it is kinda a simpler variant of those two libraries.

So if you pick the wrong type to decode into List (I32, Str) when it should have been List (U64, Str), you will just decode wrong. It may fail. It may succeed and just give you garbage data.
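
A tiny Python demonstration of that failure mode with a made-up untagged layout (an integer, then a length-prefixed string). This is not postcard's format, just an illustration of decoding the same bytes with the wrong integer width.

```python
import struct

# Encode a (U64, Str)-like record with no type info:
# 8-byte integer, 4-byte string length, then the string bytes.
text = b"hi"
encoded = struct.pack("<QI", 2**32 + 7, len(text)) + text

# Correct reader: U64 first, so the length field is found at offset 8.
value_u64, length = struct.unpack_from("<QI", encoded, 0)
good = (value_u64, encoded[12:12 + length].decode("utf-8"))

# Wrong reader: assumes an I32 first, so the "length" is really the upper
# half of the U64 and the string is read from the wrong offset.
value_i32, bad_length = struct.unpack_from("<iI", encoded, 0)
bad = (value_i32, encoded[8:8 + bad_length].decode("utf-8"))
```

The wrong reader does not fail. It returns a plausible-looking integer and a one-character string that was never stored.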

I think for our first spec here, we probably want something with some form of type info, but maybe that is a wrong assumption.

Brian Carroll (Jul 06 2024 at 09:13):

One possibility would be to have a header section with a serialized version of the Layout, then a body section with the actual data, in the same format we use at runtime. Pointers would get translated to byte offsets within the payload.

That pointer translation is something we already do in the Web REPL, where the user's compiled app is in a separate address space from the REPL app.
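
The pointer-to-offset step can be sketched in Python for a list of strings, under the assumption that each element becomes an (offset, length) entry relative to the start of the payload. This is not Roc's runtime layout and it omits the serialized Layout header; `flatten` and `unflatten` are hypothetical names.

```python
import struct


def flatten(strings: list[str]) -> bytes:
    """Lay out a table of (offset, length) entries followed by the string bytes.

    Offsets are relative to the start of the payload, standing in for pointers.
    """
    raws = [s.encode("utf-8") for s in strings]
    table_size = 4 + 8 * len(raws)  # count, then one (offset, length) pair each
    table = bytearray(struct.pack("<I", len(raws)))
    data = bytearray()
    for raw in raws:
        table += struct.pack("<II", table_size + len(data), len(raw))
        data += raw
    return bytes(table + data)


def unflatten(payload: bytes) -> list[str]:
    """Follow each offset from the payload start, as a loader would rebuild pointers."""
    (count,) = struct.unpack_from("<I", payload, 0)
    result = []
    for i in range(count):
        offset, length = struct.unpack_from("<II", payload, 4 + 8 * i)
        result.append(payload[offset:offset + length].decode("utf-8"))
    return result


restored = unflatten(flatten(["roc", "", "platform"]))
```

Because every offset is relative, the payload can be copied into a different address space and read back without fixing up absolute addresses first.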

Richard Feldman (Jul 06 2024 at 10:49):

we can make a roc-specific one like rvn if there’s nothing off the shelf that does what we want!

Brendan Hansknecht (Jul 06 2024 at 15:36):

For sure. I was trying to find something off the shelf simply to help with value. Like bson would both help here and would be useful if someone interacts with MongoDB. Plus it means we don't have to implement it in all host languages.

Brendan Hansknecht (Jul 06 2024 at 15:38):

This is something I am trying to make as an Encoder/Decoder written in roc. I think it would actually be pretty hard to match Roc's runtime format using an encoder.

Richard Feldman (Jul 06 2024 at 16:27):

true, although if someone wants bson and we have this, it should give them a very strong starting point!
