On /fast it is indicated that deforestation - elision of intermediate lists - is a goal for roc. Reading the linked paper (https://www.cs.tufts.edu/~nr/cs257/archive/duncan-coutts/stream-fusion.pdf) it seems like there are four things needed (for this approach): `stream`, `unstream`, a rewrite rule that removes `stream (unstream s)` pairs, and general optimisations (inlining, simplification) that fuse the remaining non-recursive stream functions. Now the last two involve complicated operations on (I think) roc's intermediate representation; these seem hard. These also seem like useful optimisations to try to implement anyway.
For the first two, or at least the first one, it seems like a draft version could be done entirely in roc. What's more, these streams seem to me to be useful as a general lazy list type in a language that is otherwise completely strict.
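To make the first two concrete, here's a rough Python sketch (mine, not the paper's Haskell; the `Skip` constructor that `filter` needs is omitted for brevity): a stream is a non-recursive step function plus a state, and only `unstream` actually loops.

```python
# Hypothetical sketch of the paper's Stream type in Python.
# A stream is (step, state); step returns ("Yield", x, s') or ("Done",).

def stream(xs):
    # stream : list -> Stream; the state is the current index
    def step(i):
        return ("Yield", xs[i], i + 1) if i < len(xs) else ("Done",)
    return (step, 0)

def unstream(s):
    # unstream : Stream -> list; the only recursive/looping part
    step, state = s
    out = []
    while True:
        r = step(state)
        if r[0] == "Done":
            return out
        _, x, state = r
        out.append(x)

def smap(f, s):
    # map over a stream by wrapping its step function -- no recursion,
    # so an optimiser could fuse it by plain inlining
    step, state = s
    def step2(st):
        r = step(st)
        return ("Yield", f(r[1]), r[2]) if r[0] == "Yield" else r
    return (step2, state)

# List.map expressed as unstream . smap f . stream
doubled = unstream(smap(lambda x: x * 2, stream([1, 2, 3])))
```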
Is there an appetite for a streams library, either simply for use in deforestation or for general use in roc?
Yeah, 4 is definitely wanted for certain other closure-changing optimisations, like the ones that happen a ton with tasks
As for the streams, I think our goal is to hide it 100% from user land. So end users would have no concept of streams. Just lists. Silently under the hood we would do all this streaming.
That said, I'm not sure if it would be written in roc or zig or other.
I don't know, wouldn't a lazy list type be useful for users? In python I use their lazy lists (iterators) all the time. They can avoid creating huge intermediate results (imagine a video decoder that emits a lot of high-resolution frames that need processing) without relying on the language to optimise those results away. For example, I might be streaming frames from my webcam, ready to stop when a frame contains a recognisable Aruco target. This is an endless stream of large items, but we could handle it with the stream version of findFirst.
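Roughly like this, in Python terms (`frames` and `has_target` are made-up stand-ins, not a real camera API): an endless generator of large frames, consumed by a find-first that stops as soon as it sees a match, so no intermediate list of frames ever exists.

```python
# Hypothetical webcam example using Python's lazy iterators.

def frames():
    # endless stream of large items (a stand-in for real frames)
    n = 0
    while True:
        yield {"id": n, "data": bytes(16)}
        n += 1

def has_target(frame):
    # pretend frame 5 is the one containing the Aruco target
    return frame["id"] == 5

def find_first(pred, it):
    # stream version of findFirst: stops consuming at the first match
    for x in it:
        if pred(x):
            return x
    return None

first = find_first(has_target, frames())
```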
that is completely valid, but we can't really make lazy streams work well with language design: now when an api needs a sequence (e.g. creating a string from bytes), do you pass in a list or a stream? it makes apis more complicated.
So a stream type could (should!) exist in the ecosystem, but we don't want to include it in the language/standard library
@Richard Feldman I think the guide needs to make that more explicit. The design is not really "list fusion" as it is commonly understood
yeah is there something different we could link to?
no, just something like "eliminating intermediate values" instead of specifically fusion
or well deforestation apparently means the same thing in the literature
yeah that was my understanding :big_smile:
but I'd like to link to something more concrete so it doesn't sound like we're trying to do something purely theoretical that's never been done in any form before
well what we propose is just term rewriting. standard compiler stuff (inline, simplify)
What is our actual plan for composing two list operations? Will zig builtins need functions like mapReduce and all of the other many many combinations? Obviously with some things like map, we just need to chain the inner lambda, but it is less clear for more complex functions and various intermixed functions.
I wrote a quick translation of the above Coutts et al. paper: https://github.com/aarchiba/roc-experiments/blob/main/Streams.roc [ETA: moved]
I can't speak to more complicated optimisations, but they make a convincing argument that an optimiser rule that recognises and removes occurrences of `\x -> unstream x |> stream` can help. It's technically not the identity, because removing it means that an infinite stream will no longer crash with an out-of-memory error. Nevertheless, they show that if your stream functions are non-recursive you can fuse them down very cleanly once that pair has been removed everywhere.
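Here's a small Python model of what that rule buys you (my own sketch; the counter is just illustrative): composing two stream-based maps naively materializes an intermediate list, and deleting the `unstream`/`stream` pair leaves a single loop building one list.

```python
# Model of the `stream (unstream s)` rewrite rule's effect,
# counting how many lists get materialized.
materialized = 0

def stream(xs):
    def step(i):
        return ("Yield", xs[i], i + 1) if i < len(xs) else ("Done",)
    return (step, 0)

def unstream(s):
    global materialized
    materialized += 1          # count every list we build
    step, state = s
    out = []
    while True:
        r = step(state)
        if r[0] == "Done":
            return out
        _, x, state = r
        out.append(x)

def smap(f, s):
    step, state = s
    def step2(st):
        r = step(st)
        return ("Yield", f(r[1]), r[2]) if r[0] == "Yield" else r
    return (step2, state)

inc = lambda x: x + 1

# unfused: map . map, with a stream(unstream(..)) pair in the middle
a = unstream(smap(inc, stream(unstream(smap(inc, stream([1, 2, 3]))))))
two_lists = materialized       # intermediate list + final list

# fused: the rewrite rule  stream (unstream s) -> s  removes the pair
materialized = 0
b = unstream(smap(inc, smap(inc, stream([1, 2, 3]))))
one_list = materialized        # only the final list
```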
I don't know how roc's optimisation works at all, but if it operates on something like the syntax tree, it might help to have stream and unstream implemented in another language, or at least tagged for easy recognition. Certainly to achieve this kind of optimisation they should live in the compiler, one way or another. Leaving them in roc allows for the possibility that roc's hypothetically good optimiser could combine parts of their internals with their producers or consumers when they don't occur as a pair. There are also possible advantages to operations like streamReversed (List.reverse can't be stream-based, but a list can easily be read backwards into a stream).
I get the impression that there is a hope that roc could very generally recognise construction of, iteration over, and then discarding of many kinds of containers?
in theory, yes, because the list api is more built into the compiler than in say haskell where you can define many of the functions efficiently in userspace (map, folds, etc)
in practice, this sort of thing is hard. Haskell has an elaborate system of (phased) rewrite rules to optimize operations on lists
we'd need something similar, but ideally it would be simpler and less fragile than that
(equality saturation promises that simpler solution, but the research isn't quite at the point yet where we can just implement it for a language as complex as roc)
Folkert de Vries said:
we'd need something similar, but ideally it would be simpler and less fragile than that
I can see that it would be difficult for the compiler to decide which would be more efficient, a sequence of list operations with the possibility of large intermediate lists but also the possibility of opportunistic mutation, or a sequence of stream operations that might or might not be fused into an efficient loop. So perhaps it makes sense to let the user decide which data structure is more efficient and convenient for their application?
In case it's of any interest to anyone, I did the experiment of implementing most of the roc list-processing functions in terms of easily fusible streams.
I think that with fromList/toList/fromListReversed/toListAppend, fromDictItems/toDict/writeToDict etc. these could provide implementations of most of the operations on lists and dicts (and sets) that are as, or nearly as, efficient as the direct implementations. It would also save writing all the combinations of walkListBackwardsUntil etc. for each container type in turn.
An example of where this might be less efficient would be keepIf/dropIf: if you use `myList |> fromList |> dropIf predicate |> toListAppend (List.withCapacity (List.len myList))` then you will get unnecessary copying of elements up until the first item is dropped; after that you will be doing the necessary copying with similar efficiency in a similar loop.
More minor operations that impact only a few list elements can often be handled with `readFromList` and `writeToList`, which use indices and allow opportunistic mutation. Dict operations likewise.
I do think some kind of iterator concept, possibly as simple as supporting `next`, or possibly a more complex ability like rust's to allow more efficient implementations of specific operations, would simplify operations on containers. The alternative seems to be implementing a huge list of (hopefully the same) transformation operations and hoping the compiler is able to elide all the intermediate containers.
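For illustration, a hypothetical Python sketch of the "just supporting next" idea (names are mine, not roc's): one `walk_until` written once against a plain iterator covers every container, instead of a walkUntil variant per container type.

```python
# One generic walk-until over anything iterable, written once.
def walk_until(it, state, f):
    # f returns ("Continue", new_state) or ("Break", final_state)
    for x in it:
        tag, state = f(state, x)
        if tag == "Break":
            return state
    return state

# sum elements, stopping at the first negative one
stop_at_negative = lambda acc, x: ("Break", acc) if x < 0 else ("Continue", acc + x)

# the same function works for list-, set-, and dict-shaped containers
from_list = walk_until([1, 2, -1, 9], 0, stop_at_negative)
from_set  = walk_until({3}, 0, stop_at_negative)
from_dict = walk_until({"a": 1, "b": 2}.values(), 0, stop_at_negative)
```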
Cool! Got a link we could check out?
Brian Carroll said:
Cool! Got a link we could check out?
Oops! Meant to point to this directly:
https://github.com/aarchiba/roc-experiments/blob/main/Stream.roc
There is an experimental attempt to make it possible to write streams using a generator syntax, using `{} <- Yield value` to mean "yield this value next then resume with the following statements". Unfortunately I ran into a compiler bug and don't know how to work around it, so it's half-finished (`Yield` works but `yieldFrom` doesn't). But it's here:
https://github.com/aarchiba/roc-experiments/blob/main/Generator.roc
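For comparison, Python's built-in `yield` / `yield from` are roughly the behaviour being sketched: yield a value, then resume with the following statements, with `yield from` splicing in a sub-generator.

```python
# The yield / yieldFrom behaviour, in Python's native generator syntax.
def inner():
    yield 1
    yield 2              # yield this value next, then resume here

def outer():
    yield 0
    yield from inner()   # the yieldFrom analogue: delegate to a sub-generator
    yield 3

values = list(outer())   # consume the whole (lazy) generator
```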
Last updated: Jul 06 2025 at 12:14 UTC