Stream: ideas

Topic: Encoders/Decoders


Kevin Gillette (Mar 14 2022 at 02:47):

Regarding Encoders/Decoders (as briefly mentioned in the #Abilities doc), has there been any thought about supporting incremental operations, i.e. streaming to or from a file, connection, or "effect source/sink"? Buffering wire data in memory (encoding/decoding in terms of a string) is regularly problematic: even with refcounting, there is a moment in which you pay the memory cost for both the wire data and the structured form, and processing gigabytes of data at a time is a realistic workload. Further, even if the data source can be trusted, it is unreasonable to read gigabytes of data you expect to be JSON, only to then attempt to decode it and realize that the very first byte is "<" (as in XML, for example).
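To make the "first byte is <" point concrete, here is a rough Go sketch of rejecting a stream before buffering it. The names `checkJSONStart` and `startsLikeJSON` are made up for illustration, and the check is only a cheap plausibility test on the first non-whitespace byte, not JSON validation.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// checkJSONStart (hypothetical helper) skips leading whitespace and inspects
// the first significant byte. That byte is pushed back so a real decoder can
// still read the stream from the same position afterwards.
func checkJSONStart(br *bufio.Reader) error {
	for {
		b, err := br.ReadByte()
		if err != nil {
			return err // includes io.EOF for an empty stream
		}
		switch b {
		case ' ', '\t', '\r', '\n':
			continue // JSON permits leading whitespace
		}
		if err := br.UnreadByte(); err != nil {
			return err
		}
		// Bytes that can begin a JSON value: object, array, string,
		// number, or the literals true/false/null.
		if strings.IndexByte(`{["-0123456789tfn`, b) < 0 {
			return fmt.Errorf("input does not look like JSON: starts with %q", b)
		}
		return nil
	}
}

// startsLikeJSON (hypothetical helper) wraps the check for in-memory input.
func startsLikeJSON(s string) bool {
	return checkJSONStart(bufio.NewReader(strings.NewReader(s))) == nil
}

func main() {
	fmt.Println(startsLikeJSON(`<xml/>`))
	fmt.Println(startsLikeJSON(`  {"a": 1}`))
}
```

With a file or socket in place of the string reader, the check costs at most one buffer fill of the `bufio.Reader` rather than a read of the whole payload.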

Also relevant to this discussion are streaming encoders and decoders, such as for https://jsonlines.org/. If support is flexible enough, it would be compelling (though niche) to also allow encoding to or decoding from a JSON array of objects (or the equivalent in other formats) without needing to buffer either the full wire data or the full list of records within Roc, essentially by handling and disposing of the envelope (the array) and then streaming the internal data one element at a time.
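For reference, this is roughly how the envelope-then-elements approach looks in Go with the standard `encoding/json` decoder. The `Record` type, `streamArray`, and `sumIDs` are hypothetical names for illustration; this is a sketch of the idea, not a proposal for what the Roc API should look like.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// Record is a hypothetical element type used only for illustration.
type Record struct {
	ID   int    `json:"id"`
	Name string `json:"name"`
}

// streamArray consumes the opening '[' of a JSON array, then decodes one
// element at a time and hands each to handle. Neither the full wire data
// nor the full list of records is held in memory at once.
func streamArray(r io.Reader, handle func(Record) error) error {
	dec := json.NewDecoder(r)

	// Read the envelope's opening token and fail early if it is not '['.
	tok, err := dec.Token()
	if err != nil {
		return err
	}
	if d, ok := tok.(json.Delim); !ok || d != '[' {
		return fmt.Errorf("expected '[', got %v", tok)
	}

	// Decode elements until the closing ']' is next in the stream.
	for dec.More() {
		var rec Record
		if err := dec.Decode(&rec); err != nil {
			return err
		}
		if err := handle(rec); err != nil {
			return err
		}
	}

	// Consume the closing ']' of the envelope.
	_, err = dec.Token()
	return err
}

// sumIDs (hypothetical) shows a per-element callback that keeps only a
// running total instead of the records themselves.
func sumIDs(input string) (int, error) {
	total := 0
	err := streamArray(strings.NewReader(input), func(rec Record) error {
		total += rec.ID
		return nil
	})
	return total, err
}

func main() {
	total, err := sumIDs(`[{"id":1,"name":"a"},{"id":2,"name":"b"}]`)
	fmt.Println(total, err)
}
```

The same loop without the two `Token` calls handles JSON Lines style input, since `Decode` reads consecutive whitespace-separated values from the stream.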

The Go language, for example, does a pretty good job of making "readers" and "writers" consistently applicable to just about anything that looks like a byte stream (files/stdio, sockets, memory buffers, gzip wrappers around any of the above, etc.), and can encode or decode an arbitrary amount of data with only ~4 KiB of overhead (a fixed buffer window for performance). For example, generating on-the-fly tar.bz2 data containing gzip'd members, sourced from data arriving on the same socket we're sending the result back to, is trivial to express and has negligible overhead regardless of the amount of data being streamed. What Go typically does not do well in this regard is separating compatibility/validation from the actual streaming (making sure the data can be encoded prior to sending the first byte, or can be decoded before receiving the first byte). It sounds like the Abilities approach in Roc will allow this determination to be made at compile time.
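A small sketch of what that composition looks like in practice, using only the standard `io` and `compress/gzip` packages. The function names are made up for illustration, and the in-memory buffers in `roundTrip` stand in for files or sockets; the buffer size used internally by `io.Copy` is an implementation detail and not something this sketch controls.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"strings"
)

// compress (hypothetical helper) streams src through a gzip writer into dst.
// Both ends are plain interfaces, so they can be files, sockets, or buffers.
func compress(dst io.Writer, src io.Reader) error {
	zw := gzip.NewWriter(dst)
	if _, err := io.Copy(zw, src); err != nil {
		zw.Close()
		return err
	}
	// Close flushes any pending compressed data and writes the gzip footer.
	return zw.Close()
}

// decompress (hypothetical helper) is the mirror image: a gzip reader wraps
// src and the decompressed bytes are copied to dst as they become available.
func decompress(dst io.Writer, src io.Reader) error {
	zr, err := gzip.NewReader(src)
	if err != nil {
		return err
	}
	defer zr.Close()
	_, err = io.Copy(dst, zr)
	return err
}

// roundTrip (hypothetical helper) uses in-memory buffers as stand-ins for
// real byte-stream endpoints.
func roundTrip(s string) (string, error) {
	var wire, out bytes.Buffer
	if err := compress(&wire, strings.NewReader(s)); err != nil {
		return "", err
	}
	if err := decompress(&out, &wire); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	got, err := roundTrip("hello, streams")
	fmt.Println(got, err)
}
```

Nothing in `compress` or `decompress` knows what kind of endpoint it is talking to, which is the property I'd hope a Roc abstraction could preserve.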

In any case, I'd normally consider this a platform consideration, but if Encoders/Decoders are to be included in the stdlib, this topic should be broached. While most situations will be served correctly by consuming or producing strings, without a consistent approach to handling byte-stream data (preferably via an abstraction, such as a per-datum callback), some needs will prove especially awkward to express, particularly in a platform-independent way.

Richard Feldman (Mar 14 2022 at 03:16):

sounds interesting! I hadn't thought about it, but seems worth keeping in mind when getting into design specifics of those abilities (which I haven't yet gotten to)


Last updated: Jun 16 2026 at 16:19 UTC