On /fast it is indicated that deforestation - elision of intermediate lists - is a goal for roc. Reading the linked paper (https://www.cs.tufts.edu/~nr/cs257/archive/duncan-coutts/stream-fusion.pdf) it seems like there are four things needed (for this approach): `stream`, `unstream`, a rewrite rule that removes `stream (unstream s)` pairs, and general optimisations (inlining, simplification) that fuse the remaining non-recursive stream functions. Now the last two involve complicated operations on (I think) roc's intermediate representation; these seem hard. These also seem like useful optimisations to try to implement anyway.
For the first two, or at least the first one, it seems like a draft version could be done entirely in roc. What's more, these streams seem to me to be useful as a general lazy list type in a language that is otherwise completely strict.
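To make the first two concrete, here's a rough Python sketch (mine, not the paper's Haskell; the `Skip` constructor that `filter` needs is omitted for brevity): a stream is a non-recursive step function plus a state, and only `unstream` actually loops.

```python
# Hypothetical sketch of the paper's Stream type in Python.
# A stream is (step, state); step returns ("Yield", x, s') or ("Done",).

def stream(xs):
    # stream : list -> Stream; the state is the current index
    def step(i):
        return ("Yield", xs[i], i + 1) if i < len(xs) else ("Done",)
    return (step, 0)

def unstream(s):
    # unstream : Stream -> list; the only recursive/looping part
    step, state = s
    out = []
    while True:
        r = step(state)
        if r[0] == "Done":
            return out
        _, x, state = r
        out.append(x)

def smap(f, s):
    # map over a stream by wrapping its step function -- no recursion,
    # so an optimiser could fuse it by plain inlining
    step, state = s
    def step2(st):
        r = step(st)
        return ("Yield", f(r[1]), r[2]) if r[0] == "Yield" else r
    return (step2, state)

# List.map expressed as unstream . smap f . stream
doubled = unstream(smap(lambda x: x * 2, stream([1, 2, 3])))
```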
Is there an appetite for a streams library, either simply for use in deforestation or for general use in roc?
Yeah, 4 is definitely wanted for certain other closure-changing optimisations, like the ones that happen a ton with tasks
As for the streams, I think our goal is to hide it 100% from user land. So end users would have no concept of streams. Just lists. Silently under the hood we would do all this streaming.
That said, I'm not sure if it would be written in roc or zig or other.
I don't know, wouldn't a lazy list type be useful for users? In python I use their lazy lists (iterators) all the time. They can avoid creating huge intermediate results (imagine a video decoder that emits a lot of high-resolution frames that need processing) without relying on the language to optimise those results away. For example, I might be streaming frames from my webcam, ready to stop when a frame contains a recognisable Aruco target. This is an endless stream of large items, but we could handle it with the stream version of findFirst.
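Roughly like this, in Python terms (`frames` and `has_target` are made-up stand-ins, not a real camera API): an endless generator of large frames, consumed by a find-first that stops as soon as it sees a match, so no intermediate list of frames ever exists.

```python
# Hypothetical webcam example using Python's lazy iterators.

def frames():
    # endless stream of large items (a stand-in for real frames)
    n = 0
    while True:
        yield {"id": n, "data": bytes(16)}
        n += 1

def has_target(frame):
    # pretend frame 5 is the one containing the Aruco target
    return frame["id"] == 5

def find_first(pred, it):
    # stream version of findFirst: stops consuming at the first match
    for x in it:
        if pred(x):
            return x
    return None

first = find_first(has_target, frames())
```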
that is completely valid, but we can't really make lazy streams work well with language design: now when an api needs a sequence (e.g. creating a string from bytes), do you pass in a list or a stream? it makes apis more complicated.
So a stream type could (should!) exist in the ecosystem, but we don't want to include it in the language/standard library
@Richard Feldman I think the guide needs to make that more explicit. The design is not really "list fusion" as it is commonly understood
yeah is there something different we could link to?
no, just something like "eliminating intermediate values" instead of specifically fusion
or well deforestation apparently means the same thing in the literature
yeah that was my understanding :big_smile:
but I'd like to link to something more concrete so it doesn't sound like we're trying to do something purely theoretical that's never been done in any form before
well what we propose is just term rewriting. standard compiler stuff (inline, simplify)
What is our actual plan for composing two list operations? Will zig builtins need functions like mapReduce and all of the other many many combinations? Obviously with some things like map, we just need to chain the inner lambda, but it is less clear for more complex functions and various intermixed functions.
I wrote a quick translation of the above Coutts et al. paper: https://github.com/aarchiba/roc-experiments/blob/main/Streams.roc [ETA: moved]
I can't speak to more complicated optimisations, but they make a convincing argument that an optimiser rule that recognises and removes occurrences of `\x -> unstream x |> stream` can help. It's technically not the identity, because removing it means that an infinite stream will no longer crash with an out-of-memory error. Nevertheless, they show that if your stream functions are non-recursive you can fuse them down very cleanly once that pair has been removed everywhere.
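Here's a small Python model of what that rule buys you (my own sketch; the counter is just illustrative): composing two stream-based maps naively materializes an intermediate list, and deleting the `unstream`/`stream` pair leaves a single loop building one list.

```python
# Model of the `stream (unstream s)` rewrite rule's effect,
# counting how many lists get materialized.
materialized = 0

def stream(xs):
    def step(i):
        return ("Yield", xs[i], i + 1) if i < len(xs) else ("Done",)
    return (step, 0)

def unstream(s):
    global materialized
    materialized += 1          # count every list we build
    step, state = s
    out = []
    while True:
        r = step(state)
        if r[0] == "Done":
            return out
        _, x, state = r
        out.append(x)

def smap(f, s):
    step, state = s
    def step2(st):
        r = step(st)
        return ("Yield", f(r[1]), r[2]) if r[0] == "Yield" else r
    return (step2, state)

inc = lambda x: x + 1

# unfused: map . map, with a stream(unstream(..)) pair in the middle
a = unstream(smap(inc, stream(unstream(smap(inc, stream([1, 2, 3]))))))
two_lists = materialized       # intermediate list + final list

# fused: the rewrite rule  stream (unstream s) -> s  removes the pair
materialized = 0
b = unstream(smap(inc, smap(inc, stream([1, 2, 3]))))
one_list = materialized        # only the final list
```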
I don't know how roc's optimisation works at all, but if it operates on something like the syntax tree, it might help to have stream and unstream implemented in another language, or at least tagged for easy recognition. Certainly to achieve this kind of optimisation they should live in the compiler, one way or another. Leaving them in roc allows for the possibility that roc's hypothetically good optimiser could combine parts of their internals with their producers or consumers when they don't occur as a pair. There are also possible advantages to operations like streamReversed (List.reverse can't be stream-based, but a list can easily be read backwards into a stream).
I get the impression that there is a hope that roc could very generally recognise construction of, iteration over, and then discarding of many kinds of containers?
in theory, yes, because the list api is more built into the compiler than in say haskell where you can define many of the functions efficiently in userspace (map, folds, etc)
in practice, this sort of thing is hard. Haskell has an elaborate system of (phased) rewrite rules to optimize operations on lists
we'd need something similar, but ideally it would be simpler and less fragile than that
(equality saturation promises that simpler solution, but the research isn't quite at the point yet where we can just implement it for a language as complex as roc)
Folkert de Vries said:
we'd need something similar, but ideally it would be simpler and less fragile than that
I can see that it would be difficult for the compiler to decide which would be more efficient, a sequence of list operations with the possibility of large intermediate lists but also the possibility of opportunistic mutation, or a sequence of stream operations that might or might not be fused into an efficient loop. So perhaps it makes sense to let the user decide which data structure is more efficient and convenient for their application?
In case it's of any interest to anyone, I did the experiment of implementing most of the roc list-processing functions in terms of easily fusible streams.
I think that with fromList/toList/fromListReversed/toListAppend, fromDictItems/toDict/writeToDict etc. these could provide implementations of most of the operations on lists and dicts (and sets) that are as, or nearly as, efficient as the direct implementations. It would also save writing all the combinations of walkListBackwardsUntil etc. for each container type in turn.
An example of where this might be less efficient would be keepIf/dropIf: if you use `myList |> fromList |> dropIf predicate |> toListAppend (List.withCapacity (List.len myList))` then you will get unnecessary copying of elements up until the first item is dropped; after that you will be doing the necessary copying with similar efficiency in a similar loop.
More minor operations that impact only a few list elements can often be handled with `readFromList` and `writeToList`, which use indices and allow opportunistic mutation. Dict operations likewise.
I do think some kind of iterator concept, possibly as simple as supporting `next`, or possibly a more complex ability like rust's to allow more efficient implementations of specific operations, would simplify operations on containers. The alternative seems to be implementing a huge list of (hopefully the same) transformation operations and hoping the compiler is able to elide all the intermediate containers.
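For illustration, a hypothetical Python sketch of the "just supporting next" idea (names are mine, not roc's): one `walk_until` written once against a plain iterator covers every container, instead of a walkUntil variant per container type.

```python
# One generic walk-until over anything iterable, written once.
def walk_until(it, state, f):
    # f returns ("Continue", new_state) or ("Break", final_state)
    for x in it:
        tag, state = f(state, x)
        if tag == "Break":
            return state
    return state

# sum elements, stopping at the first negative one
stop_at_negative = lambda acc, x: ("Break", acc) if x < 0 else ("Continue", acc + x)

# the same function works for list-, set-, and dict-shaped containers
from_list = walk_until([1, 2, -1, 9], 0, stop_at_negative)
from_set  = walk_until({3}, 0, stop_at_negative)
from_dict = walk_until({"a": 1, "b": 2}.values(), 0, stop_at_negative)
```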
Cool! Got a link we could check out?
Brian Carroll said:
Cool! Got a link we could check out?
Oops! Meant to point to this directly:
https://github.com/aarchiba/roc-experiments/blob/main/Stream.roc
There is an experimental attempt to make it possible to write streams using a generator syntax, using `{} <- Yield value` to mean "yield this value next then resume with the following statements". Unfortunately I ran into a compiler bug and don't know how to work around it, so it's half-finished (`Yield` works but `yieldFrom` doesn't). But it's here:
https://github.com/aarchiba/roc-experiments/blob/main/Generator.roc
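For comparison, Python's built-in `yield` / `yield from` are roughly the behaviour being sketched: yield a value, then resume with the following statements, with `yield from` splicing in a sub-generator.

```python
# The yield / yieldFrom behaviour, in Python's native generator syntax.
def inner():
    yield 1
    yield 2              # yield this value next, then resume here

def outer():
    yield 0
    yield from inner()   # the yieldFrom analogue: delegate to a sub-generator
    yield 3

values = list(outer())   # consume the whole (lazy) generator
```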
Last updated: Jul 06 2025 at 12:14 UTC