There was a back-and-forth about Cap'n Proto-style type compression and whether it'd be faster, and I'll chip in that there are benchmarks on this in Rust serialization land. Richard's serialization scheme is abomonation, and there's also a Rust Cap'n Proto implementation. I'm not deeply invested in this area, but my memory is that abomonation trounces all the other serialization strategies by a significant margin when things aren't coming over a network.
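To make the scheme concrete: abomonation-style serialization basically copies a value's in-memory bytes verbatim and reinterprets them on the other end. Here's a minimal, self-contained sketch of that idea (not the real crate's API; `Point`, `encode`, and `decode` are made up for illustration, and this is only sound because the struct is `repr(C)` and pointer-free):

```rust
// Simplified illustration of the abomonation-style idea: "serialize" by
// copying the struct's bytes, "deserialize" by reinterpreting them.
// Sound here only because Point is Copy, repr(C), and contains no pointers.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: f64,
    y: f64,
}

fn encode(p: &Point) -> Vec<u8> {
    // View the struct as raw bytes and copy them out.
    let bytes = unsafe {
        std::slice::from_raw_parts(p as *const Point as *const u8, std::mem::size_of::<Point>())
    };
    bytes.to_vec()
}

fn decode(bytes: &[u8]) -> Point {
    assert_eq!(bytes.len(), std::mem::size_of::<Point>());
    // Reinterpret the bytes as a Point (unaligned read, since Vec<u8>
    // makes no alignment promises for f64).
    unsafe { std::ptr::read_unaligned(bytes.as_ptr() as *const Point) }
}

fn main() {
    let p = Point { x: 1.0, y: 2.0 };
    let q = decode(&encode(&p));
    assert_eq!(p, q);
    println!("round-tripped: {:?}", q);
}
```

The speed comes from skipping any field-by-field transcoding; the caveats (discussed below) come from the exact same place.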
Sure, but metaphysically what good is a message you can't send to another?
There are a lot of external variables, in particular the size of messages and the rhythm/cadence of the (multi-message) protocol interactions the system produces in normal operation.
Cap'n Proto takes a "lens" (in the Haskell sense) approach to the opaque blob: it can get you down to just the part you want to modify or observe. The cost for this is loop detection to guarantee termination on malformed data.
Protocol Buffers completely parses the message into your local language's structures, and rejects malformed objects at parse time. The costs are a full traversal and transcoding of the message (at least one copy in RAM), plus needing the entire message to have arrived before parsing can start.
Cap'n Proto would be "infinitely faster" for a moderately sized (1 MB) protobuf message, because the first scan over the data can be used directly by our program, rather than decoding into a separate buffer that we then also have to read.
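The zero-copy idea above can be sketched without any framework: instead of decoding the whole message into fresh structs, you read the one field you need straight out of the received buffer. A minimal sketch (the buffer contents and offset are invented for illustration; real Cap'n Proto computes offsets from its pointer encoding, with loop detection):

```rust
// Zero-copy-style field access: read a little-endian u32 at a given
// offset directly from the wire buffer, with bounds checking but no
// up-front decode of the rest of the message.
fn read_u32_le(buf: &[u8], offset: usize) -> Option<u32> {
    let bytes = buf.get(offset..offset + 4)?;
    Some(u32::from_le_bytes(bytes.try_into().ok()?))
}

fn main() {
    // Pretend this arrived off the wire; the field we care about lives
    // at offset 4. Everything before it is never touched.
    let buf = [0u8, 0, 0, 0, 42, 0, 0, 0];
    assert_eq!(read_u32_le(&buf, 4), Some(42));
    // Out-of-range reads fail cleanly instead of panicking.
    assert_eq!(read_u32_le(&buf, 6), None);
}
```

The contrast with a protobuf-style decode is that the cost here is proportional to the fields you actually look at, not to the size of the whole message.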
Some of these choices could be a function of the larger system architecture. If you spent the setup cost on RDMA or io_uring with pre-allocated, page-aligned buffers for speed, and then do a protobuf decode on top, you're hacking around on stuff that works, but you're being quite silly about performance.
The interactions between service A <-> service B vary much more fundamentally in size, frequency, and computational cost per message than these overheads do.
I can pontificate on RPC systems all day. In the hangout I was only getting 40% signal, this being my first interaction in the community. @Richard Feldman can you remind us of the context where you're thinking carefully about structure packing for Roc?
In case it wasn't clear from the crate name, nobody is arguing in favor of abomonation for general-purpose RPC, including the author. It was written to be a fast way to distribute computation across Timely nodes, and it comes with a whole stack of caveats. I only mention it because the question came up and I'd expect the comparison to be representative.
I did not parse abomination
as code, but as an adjective followed by awkward grammar. :upside_down: :slight_smile:
Richard's serialization scheme is abomonation and there's also a rust capnproto implementation.
@Karl can you share a link to something I can read about this. I'm interested, but having trouble searching for it.
Oh, nvm I think I found it... https://crates.io/crates/abomonation
The crate: https://github.com/TimelyDataflow/abomonation
If you look at Rust serialization benchmarks, it shows up relatively often. It's basically the same scheme Richard outlined.
https://www.frankmcsherry.org/serialization/2015/05/04/unsafe-at-any-speed.html
Warning: Abomonation should not be used on any data you care strongly about, or from any computer you value the data on. The `encode` and `decode` methods do things that may be undefined behavior, and you shouldn't stand for that.
What's that about? I guess it's not relevant for the way we're talking about using it in Roc?
10 messages were moved here from #gatherings > Roc Online Meetup Jul 2025 by Luke Boswell.
The blog post I just linked outlines the reasons, but the short version is that in-memory representation isn't a reliable format for something that can wind up on machines with different architectures. It also can't handle versioning and whatnot. Basically it's reasonable enough if you're running the same software on the same CPU architecture; otherwise you're in undefined behavior, and there are no guardrails for any of this.
in this specific case the only relevant distinction between machines would be 32-bit vs 64-bit pointers
we could prevent that if we wanted to by padding our ~20 pointers with zeros on 32-bit targets (so, just wasm32) which would mean the cache files could be generated on a 64-bit system and then used in wasm32 builds of the compiler
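The padding idea can be sketched like this: store every pointer-sized field in the cache format as a `u64` regardless of target, so the serialized layout is byte-identical on a 64-bit host and a wasm32 build. This is a hypothetical illustration, not the Roc compiler's actual cache format; `CacheHeader` and its fields are invented names:

```rust
// Hypothetical cache-file header: pointer-like fields are widened to u64
// so the layout doesn't depend on the target's pointer width. On wasm32
// a real pointer is 4 bytes; the upper 4 bytes here would just be zeros.
#[repr(C)]
struct CacheHeader {
    node_count: u32,
    _pad: u32,          // explicit padding so the u64s stay 8-aligned
    strings_offset: u64, // would be usize if we didn't care about wasm32
    nodes_offset: u64,
}

fn main() {
    // Same size whether the host has 32-bit or 64-bit pointers, which is
    // what lets a 64-bit machine generate caches a wasm32 build can read.
    assert_eq!(std::mem::size_of::<CacheHeader>(), 24);
    println!(
        "CacheHeader is {} bytes on every target",
        std::mem::size_of::<CacheHeader>()
    );
}
```

Using `u64` (plus explicit padding) instead of `usize` is the whole trick: `usize` is exactly the field whose size changes between targets.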
I thought about this but it didn't seem worth doing right now since I couldn't think of a use for it, plus we can kinda do it whenever we want to later if desired
I'm really fascinated by this idea of compiling on my machine and sending the ModuleEnv over the network for a host running in the cloud, potentially a WASM runtime.
Like I'm wondering if we could use this for hot-reloading something like basic-dom?
well it's pretty straightforward to do that if we want to
I'm definitely leaning towards we should do this now, while we have the context loaded and are testing it etc. Such an incredible capability.
ok I'll make a pr for it tonight or tomorrow prob
Last updated: Aug 17 2025 at 12:14 UTC