Roc for database operations · ideas

Zellyn Hunter (Jun 29 2024 at 00:39):

So, I have this weird idea for a uniquely-shaped datastore that won't leave me alone, and it would actually be a pretty good fit for Roc.

The tl;dr is that it's an Event Sourcing system ("The log is the DB, everything else is a cache") where what's in the log are actions. The actions in the log are an action name, a version number of the action implementation, and a JSON blob of parameters. To run the action, you use the name and version to get a WASM blob, and you execute it.

So each action is a deterministic pure function that takes the current object state and parameters as inputs and returns the new object state.
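A minimal sketch of the replay loop described above, in Python rather than WASM for brevity. The registry, the `deposit` action, and the log entries are hypothetical illustrations; in the scheme as described, the `(name, version)` lookup would return a WASM blob instead of a Python function.

```python
import json
from typing import Callable, Optional

State = dict
Action = Callable[[State, dict], State]

# Hypothetical registry: (action name, version) -> implementation.
# In the scheme described above, this lookup would yield a WASM blob.
REGISTRY: dict = {}

def register(name: str, version: int):
    def wrap(fn: Action) -> Action:
        REGISTRY[(name, version)] = fn
        return fn
    return wrap

@register("deposit", 1)
def deposit_v1(state: State, params: dict) -> State:
    # Pure: builds a new state instead of mutating the input.
    return {**state, "balance": state.get("balance", 0) + params["amount"]}

@register("deposit", 2)
def deposit_v2(state: State, params: dict) -> State:
    # A newer version may behave differently; old entries still replay with v1.
    new = deposit_v1(state, params)
    return {**new, "deposits": state.get("deposits", 0) + 1}

def replay(log: list, initial: Optional[State] = None) -> State:
    """Fold the log over the state, picking the implementation per entry."""
    state = dict(initial or {})
    for entry in log:
        action = REGISTRY[(entry["name"], entry["version"])]
        state = action(state, json.loads(entry["params"]))
    return state

LOG = [
    {"name": "deposit", "version": 1, "params": '{"amount": 10}'},
    {"name": "deposit", "version": 2, "params": '{"amount": 5}'},
]
```

Because every entry pins a version, replaying the same log anywhere (server, device) should give the same state, as long as the actions really are deterministic.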

How hard would it be to do that with Roc, and would it be a good fit? The idea is that actions could be (re)played against a local version of the objects on a server, on a device, etc. (Hence the WASM).

Easy mode is "an object is a JSON blob representing a single entity", harder mode is "an object is a JSON blob representing a collection of related objects" and galaxy-brain mode is "an object is a SQLite database". So it would be nice if the "platform" had "run this SQL query and give me the results" tasks.
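A rough sketch of the "galaxy-brain" mode, assuming the host hands each action a single `query` capability and nothing else. It uses Python's standard `sqlite3` module; `run_action`, `rename_user`, and the `users` table are made up for illustration and are not part of any existing platform.

```python
import sqlite3

def rename_user(query, params):
    # The action only sees the `query` capability handed to it by the host.
    query("UPDATE users SET name = ? WHERE id = ?", (params["name"], params["id"]))
    return query("SELECT id, name FROM users ORDER BY id")

def run_action(db: sqlite3.Connection, action, params):
    """Host side: expose 'run this SQL and give me the rows' to one action."""
    def query(sql, args=()):
        return db.execute(sql, args).fetchall()
    with db:  # commit if the action succeeds, roll back if it raises
        return action(query, params)

# The "object" here is an in-memory SQLite database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users (id, name) VALUES (1, 'ada'), (2, 'bob')")
db.commit()
```

Calling `run_action(db, rename_user, {"id": 2, "name": "grace"})` runs the action inside one transaction, so a failing action leaves the object unchanged.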

Like I said, weird, but the idea isn't leaving me alone until I elucidate it to myself.

The thing I'm unclear on: would WASM/WASI typically constitute the platform for Roc, or would my app be the platform and the Roc code just happen to be WASM? I think I want the latter, so that e.g. SQL queries or actions can be provided as tasks from my app(s), be they Go or Java or Swift, etc.

I guess I could keep it all within WASM by using WASM-compiled SQLite, but it would be nice if there were a clean easy API to use Roc for plugins via WASM, and just implement the Tasks you wanted to add in as part of the platform, the way you can link in a set of external functions to a WASM execution environment.

Luke Boswell (Jun 29 2024 at 01:12):

I think it might help you to mock out an app and explore this idea from a user's perspective. Imagine this thing existed and was all there, and you are a user of this new datastore. Write an app or something that uses it.

Luke Boswell (Jun 29 2024 at 01:25):

You pay a performance cost moving data across the WASM boundary if you have to encode everything into numbers.
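To make that cost concrete, here is a toy model in Python, not real WASM. Core WASM function calls only carry numeric values, so structured data is typically serialized into the module's linear memory and passed as a pointer and length. `LinearMemory`, `host_call`, and `guest_count_keys` are hypothetical names for this sketch.

```python
import json

class LinearMemory:
    """Stand-in for a WASM module's linear memory: a flat byte array."""
    def __init__(self, size: int = 1024) -> None:
        self.buf = bytearray(size)
        self.next_free = 0

    def write(self, data: bytes):
        ptr = self.next_free
        if ptr + len(data) > len(self.buf):
            raise MemoryError("linear memory exhausted")
        self.buf[ptr:ptr + len(data)] = data
        self.next_free += len(data)
        return ptr, len(data)

    def read(self, ptr: int, length: int) -> bytes:
        return bytes(self.buf[ptr:ptr + length])

def guest_count_keys(mem: LinearMemory, ptr: int, length: int) -> int:
    # Guest side: only two integers arrived, so the data is decoded again.
    obj = json.loads(mem.read(ptr, length))
    return len(obj)

def host_call(mem: LinearMemory, state: dict) -> int:
    # Host side: encode, copy into guest memory, pass (pointer, length).
    ptr, length = mem.write(json.dumps(state).encode("utf-8"))
    return guest_count_keys(mem, ptr, length)
```

Every call pays for one encode, one copy, and one decode, which is the overhead being pointed at.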

Zellyn Hunter (Jun 29 2024 at 01:28):

It's actually not a terribly convenient datastore to use. It's trying to make the same bargain that Google AppEngine did back in the day: warp your storage needs into this weird shape, and I promise it'll have the characteristics you want later: for AppEngine, horizontal scalability, for this scheme, an object store with automatic local caching where the store can be easily and safely replicated, and the "write owner" failed over in a Disaster Recovery situation.

Zellyn Hunter (Jun 29 2024 at 01:30):

To write, you traverse the network, possibly to the other coast, to ask the current "write owner" to enqueue an action: the change is real when it appears in the log. To read, you run a local cache server that follows the log and creates a local version of the data. Or you could subscribe to changes filtered down to a set of objects you care about and keep the cache yourself.
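A small in-process sketch of that write/read split, with no networking. `WriteOwner` and `CacheFollower` are hypothetical names, and the only "action" here is a field merge, which is an assumption made to keep the example short.

```python
from typing import Optional

class WriteOwner:
    """The single node allowed to append to the log."""
    def __init__(self) -> None:
        self.log: list = []

    def enqueue(self, object_id: str, params: dict) -> int:
        # The change is "real" once it has an offset in the log.
        self.log.append({"object": object_id, "params": params})
        return len(self.log) - 1

    def read_from(self, offset: int):
        for i in range(offset, len(self.log)):
            yield i, self.log[i]

class CacheFollower:
    """Follows the log and keeps a local copy, optionally for a subset of objects."""
    def __init__(self, owner: WriteOwner, objects: Optional[set] = None) -> None:
        self.owner = owner
        self.objects = objects  # None means "follow everything"
        self.offset = 0         # next log offset to read
        self.cache: dict = {}

    def catch_up(self) -> None:
        for offset, entry in self.owner.read_from(self.offset):
            self.offset = offset + 1
            oid = entry["object"]
            if self.objects is not None and oid not in self.objects:
                continue
            # Assumed single action kind: merge the params into the object.
            self.cache[oid] = {**self.cache.get(oid, {}), **entry["params"]}
```

A follower that remembers its offset can disconnect and later resume from where it stopped, which is what makes the local cache cheap to keep current.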

Luke Boswell (Jun 29 2024 at 02:43):

The Zellyn Object Store (ZellosDB)

This is an architecture for a distributed object store, which provides reliable offline actions and seamless synchronisation across a network.

Zellos Library - the core component of ZellosDB is the library. This is compiled to native machine code and provides all of the core functionality for the system. It is responsible for managing the object data store (local copies of the log and caches), executing actions to manipulate objects, and synchronising with other nodes in the network, e.g. negotiating the “write owner” and uploading new actions to nodes across the network.

Zellos CLI - the ZellosDB CLI is used to interact with ZellosDB. It provides key functionality to manage the network, such as querying for data manually, inspecting available actions, and managing logs.

Zellos API - the ZellosDB API is a roc platform, which describes a DSL for writing “actions” to work with objects. Each action is a roc app (plugin) which can manipulate objects in the data store. When an end client queries the ZellosDB object store, they will run an action and (optionally) provide data which can be processed by the roc plugin, and used to manipulate the objects in the store, before returning a result.

Zellos Server - the ZellosDB Server is a standalone service which provides primary storage for the object store, and redundancy in the event another server is unavailable. In normal operation each end device in the network will be running actions on a ZellosDB Server.

Zellos Client - a ZellosDB Client is an end device or application which is interacting with the distributed object store. It may be a Swift, Kotlin, Go or another application or server which uses the Library to interact with the network. The most common method of using ZellosDB Library is to natively link it with the application. It is also possible for a client to use the CLI or run a Server in another process if required. A Client will store some objects locally, those they need to work with, and can run actions locally.

Zellos Action - A ZellosDB action is a Roc application (plugin) which is able to manipulate objects in the store, before returning a result. When loading a new action into the store, it is first compiled to WASM bytecode before being stored in the network. Actions are versioned. When required to be executed, a WASM execution engine running in a server (or possibly an offline client) will run the WASM bytecode to perform the desired actions on the store. Note – while the action is WASM bytecode, it is not expected that these actions will be written in any languages other than Roc.

Zellyn Hunter (Jun 29 2024 at 02:46):

lol, you have no idea how distracting it is to try to read text littered with the first four letters of my (uncommon) first name in all caps :joy:

Zellyn Hunter (Jun 29 2024 at 02:51):

I think the main difference between that description and the fuzzy idea in my mind is probably that I'd been thinking of using something like Kafka for transporting streams of actions around, rather than requiring native code for interacting with the network.

Luke Boswell (Jun 29 2024 at 02:53):

I forgot to mention, I think this is a great fit for roc. Also, I would imagine Zig or Rust would be a good choice to build this in.

Zellyn Hunter (Jun 29 2024 at 02:55):

I think perhaps an interesting subset/alternative of the idea could be: "What if we take CRDTs the way the local-first/inkandswitch folks are using them, but _also_ allow small Roc programs that manipulate the data to travel around the network too?" Is there a way to dynamically update the set of operations you can perform on a CRDT and ship those new actions around? Roc would be ideal for that kind of sandboxed action code. I think the big problem would be that it's very easy to break the CRDT properties with arbitrary code, but it might be interesting to only give the Roc platform CRDT-safe primitives.
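As one illustration of what a "CRDT-safe primitive" could look like, here is a grow-only counter (G-Counter), a standard CRDT, written in Python. Choosing this particular CRDT is an assumption for the sketch; the discussion above does not name one. The point is that if shipped code can only call `increment`, merges stay commutative and idempotent.

```python
def increment(counter: dict, replica: str, amount: int = 1) -> dict:
    """The only mutation exposed to action code: bump this replica's slot."""
    if amount < 0:
        raise ValueError("a grow-only counter cannot decrease")
    return {**counter, replica: counter.get(replica, 0) + amount}

def merge(a: dict, b: dict) -> dict:
    """Per-replica maximum: commutative, associative, and idempotent."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

def value(counter: dict) -> int:
    return sum(counter.values())
```

Arbitrary code that could write a replica's slot directly (or decrement it) would break these merge properties, which is the risk mentioned above.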

Zellyn Hunter (Jun 29 2024 at 03:04):

Ok, I think something is crystallizing in my mind. It's not (necessarily) that you want Roc compiled to WASM.

It's that the easy and safe sandboxing ability of Roc platforms means that in some ways Roc is asking to be a viable alternative for use-cases where WASM was being used for sandboxing. The interesting thing there is that the mode of distribution of WASM is "compiled WASM blob", but the mode of distribution for Roc in the same use-case is a text Roc program that a user still needs to compile. (I've heard Richard describe the "I feel perfectly safe downloading and running this Roc script" use-case a couple of times.)

In the script case, there's an assumption of a shared "safe script execution platform", and whatever machinery is there to let you "run" a Roc script stored as text — presumably, behind the scenes, it's compiling it against a defined platform and then executing it.

If I imagine using Roc as a language to customize an editor, say, or writing code in a game engine, I think the same implicit platform plus ability to think of the Roc program as text rather than compiled executable would be useful too.

Luke Boswell (Jun 29 2024 at 03:06):

For reference. The roc compiler itself is running in the browser for the web REPL. It compiles roc source code to WASM and then runs that in the browser.

Zellyn Hunter (Jun 29 2024 at 03:07):

I'm low-key hopeful that working on Zed will push Richard to introduce a modality to Roc where it can be used to write textual editor-customization scripts the way Emacs folks do with Elisp, where the platform and compiler are embedded in the host application and mostly invisible (I guess except for error messages :smile:)

Luke Boswell (Jun 29 2024 at 03:07):

So you could share the Roc plugins around as source code and run them locally on end devices. Or compile to WASM and then execute the bytecode.

Zellyn Hunter (Jun 29 2024 at 03:07):

Fair. So it's 100% feasible to ship compiler + platform as WASM, and then consume text "scripts"… nice.

Luke Boswell (Jun 29 2024 at 03:08):

Yeah, or ship the compiler as native. It's not super large -- particularly if you are ok using a dev backend -- like WASM which is pretty mature.

Zellyn Hunter (Jun 29 2024 at 03:10):

At least in my original conception, the actions are already-compiled WASM programs, so you just need to grab one of the many off-the-shelf WASM execution engines available for your platform/language. So actions could be written in any language that compiles to WASM. Roc just seemed like a nice fit because of the pure function nature of the actions: (current state, parameters) → new state | error.

Zellyn Hunter (Jun 29 2024 at 03:11):

But I also kinda love the idea of small Roc programs wandering around the network :smile:

Luke Boswell (Jun 29 2024 at 03:15):

I would go with "small Roc programs flying around the network" ... we're trying to stick with bird themes

Luke Boswell (Jun 29 2024 at 03:16):

Zellyn Hunter (Jun 29 2024 at 03:17):

I think probably ever since I messed around with MUDs/MOOs back in the day, where "portals" between MUDs were a concept, I've wanted objects/programs to be able to wander afar across the network, do things, and then later return with the news…

Albert (Jul 04 2024 at 07:34):

Albert (Jul 04 2024 at 07:35):

Albert (Jul 04 2024 at 07:36):

each event is cryptographically signed so that we can verify data validity locally in the client and don't rely on any central set of servers

Albert (Jul 04 2024 at 07:36):

Albert (Jul 04 2024 at 07:37):

Albert (Jul 04 2024 at 07:38):

{"limit":2,"kinds":[0], "authors": ["the id"]},
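A hedged sketch of how a filter shaped like the one above might be evaluated against a list of events. The event field names (`kind`, `author`, `created_at`) and the choice to return the newest matches first are assumptions made for illustration; the message does not spell them out.

```python
def matches(event: dict, flt: dict) -> bool:
    # A missing filter key means "no constraint on that field".
    if "kinds" in flt and event["kind"] not in flt["kinds"]:
        return False
    if "authors" in flt and event["author"] not in flt["authors"]:
        return False
    return True

def apply_filter(events: list, flt: dict) -> list:
    hits = [e for e in events if matches(e, flt)]
    # Assumption: newest first, then cut to `limit` if one is given.
    hits.sort(key=lambda e: e["created_at"], reverse=True)
    limit = flt.get("limit")
    return hits if limit is None else hits[:limit]

EVENTS = [
    {"kind": 0, "author": "the id", "created_at": 1},
    {"kind": 1, "author": "the id", "created_at": 2},
    {"kind": 0, "author": "the id", "created_at": 3},
    {"kind": 0, "author": "someone else", "created_at": 4},
    {"kind": 0, "author": "the id", "created_at": 5},
]
FILTER = {"limit": 2, "kinds": [0], "authors": ["the id"]}
```

A filter like this can only express fixed field matches, which is why anything more precise ends up as over-fetching plus client-side work.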

Albert (Jul 04 2024 at 07:39):

I am thinking about embedding a simple language in the relay so that the client can have precise queries.

Albert (Jul 04 2024 at 07:39):

Albert (Jul 04 2024 at 07:40):

Albert (Jul 04 2024 at 07:41):

I would like to send Roc source from the client to the server, and the server can just-in-time compile it and run it

Albert (Jul 04 2024 at 07:42):

@Richard Feldman @Luke Boswell what are the technical challenges to implementing such a system at the current stage?

Albert (Jul 04 2024 at 07:43):

This project is not a hobbyist project; many developers and I have real incentives to do this.

Albert (Jul 04 2024 at 07:44):

currently, the client always over-fetches and stores all events locally in the browser's IndexedDB

Albert (Jul 04 2024 at 07:45):

but that's the only way to make a purely event-driven client fast, because we don't have the ability to precisely query the server. We have to over-fetch.

Albert (Jul 04 2024 at 07:46):

The good side is that even if the server goes down, the client can still work with historical data and the user can still browse

Albert (Jul 04 2024 at 07:47):

Effectively, with a local-first approach, we turn the client into a true node/peer in this distributed & decentralized system. The client is just a server with only 1 user's data and the server is just a client without UI

Zellyn Hunter (Jul 04 2024 at 11:55):

fwiw, SpacetimeDB is another database that (only) allows operations that have been registered before, using WASM blobs.

Albert (Jul 04 2024 at 15:49):

Agus Zubiaga (Jul 04 2024 at 19:15):

I don't know if the compiler would be fast enough for this use case. Maybe it's ok if the queries don't vary much (outside of param values) and you can cache them?
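The caching idea can be sketched as follows: key the compiled artifact on a hash of the query source, and pass the varying values as parameters at run time. `QueryCache` and `fake_compile` are hypothetical; `fake_compile` merely stands in for invoking a real compiler.

```python
import hashlib

class QueryCache:
    """Compile each distinct query source once; reuse it for all param values."""
    def __init__(self, compile_fn) -> None:
        self._compile = compile_fn
        self._compiled: dict = {}
        self.misses = 0  # number of times the (slow) compiler actually ran

    def get(self, source: str):
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if key not in self._compiled:
            self.misses += 1
            self._compiled[key] = self._compile(source)
        return self._compiled[key]

def fake_compile(source: str):
    # Stand-in for a real compiler: treats the source as a field name and
    # returns a predicate comparing that field to a run-time parameter.
    field = source.strip()
    return lambda event, params: event.get(field) == params["value"]

cache = QueryCache(fake_compile)
```

With this shape, a thousand requests that differ only in `params` cost one compilation; the cache only helps if clients send textually identical sources.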

Agus Zubiaga (Jul 04 2024 at 19:16):

Luke Boswell (Jul 04 2024 at 23:09):

One option to explore would be using the roc compiler with the WASM dev backend -- which I think is our most mature dev backend and also fast. Compile the source to WASM bytecode for your queries either ahead of time, or maybe just in time. The website has a version of the roc compiler which does this for the REPL.

You could start experimenting by taking the REPL and feeding it "queries" and measure how fast that is perhaps?

Well, roc is still a Work in Progress... so there are plenty of known bugs and performance issues. Things are rapidly evolving (for the better) and the documentation on platform development has yet to be written (because that is still heavily WIP).

From what I understand, roc could be a great fit (particularly in the long term), but there are a lot of open questions. I suggest you consider building a prototype and explore the technology risks, specifically with the intention of throwing it all away.

I'm only a little bit selfish, as I would love to see this happen so we can find any critical issues for the plugin use-case while roc is still early in development.

I've spoken to a few people who have very similar ideas in mind, so if you can do something in the open and are open to contributions, it could be beneficial for all.
