Iterators and fusion proposal · ideas

I like this overal! Minor nitpick about the for example with reverse and index, so not even all that relevant yet. If you do for index, elem in List.toIterRev list, does the index go up from zero? As a user I would expect the indices to correspond with their list index, but from reading the proposal it looks like it would just count up from zero. In rust this would be the difference between list.iter().rev().enumerate() and list.iter().enumerate().rev(). I guess it should just be clear to users what would be the case

Richard Feldman (Sep 18 2024 at 12:53):

ah that's a good point! maybe it's clearer to do something like |> Iter.enumerate explicitly instead of baking that in

Brendan Hansknecht (Sep 18 2024 at 13:07):

Could easily do both. Default is that it counts up (as if enumerate was applied at the very end). But if a user needs more control, they could explicitly call enumerate at some point

Richard Feldman (Sep 18 2024 at 14:59):

yeah, although seems simpler to start without it and then see if there's demand for the extra feature

Brendan Hansknecht (Sep 18 2024 at 15:07):

Pearce Keesling (Sep 18 2024 at 15:30):

This looks awesome! Relying on inlining makes me anxious but I suppose that is optional so probably good. Rust requires explicit iteration which is inconvenient but makes me confident, on the other side kotlin in theory optimizes away intermediate lists but I have no way to verify it so I don't really know. Having the ability to manually do it myself when I'm feeling paranoid seems good to me

Richard Feldman (Sep 18 2024 at 16:04):

I think it's possible to offer in the general case, but it seems like we can't guarantee that it's efficient in the general case

Richard Feldman (Sep 18 2024 at 16:05):

e.g. if I have a singly linked list, and I call Iter.reverse on that...it's implementable, but it can't be efficient

Richard Feldman (Sep 18 2024 at 16:05):

e.g. in Rust, .rev() is only available on a DoubleEndedIterator presumably for this reason

Richard Feldman (Sep 18 2024 at 16:06):

but I don't think it's worth getting into traits/abilities for this; seems more straightforward to just say "expose toIter no matter what, and toIterRev only if you can support that efficiently" (which is Rust's equivalent of only implementing DoubleEndedIterator if that's efficient)

Sam Mohr (Sep 18 2024 at 16:06):

We should probably then require explicit allocation in the cases where it's not possible to reverse an existing iterator.

Richard Feldman (Sep 18 2024 at 16:07):

Sam Mohr (Sep 18 2024 at 16:08):

Sam Mohr (Sep 18 2024 at 16:09):

iter
|> List.fromIter
|> List.reverse
|> List.toIter

Sam Mohr (Sep 18 2024 at 16:09):

Brendan Hansknecht (Sep 18 2024 at 16:38):

Sorry, I more mean, once I have an Iter, it is kinda opaque to the initial container type. I guess you have to pass the reverse function into Iter.custom when constructing the iterator?

Brendan Hansknecht (Sep 18 2024 at 16:39):

Like you can never build an efficient reverse from the primitives currently given to the iterator.

Brendan Hansknecht (Sep 18 2024 at 16:41):

Brendan Hansknecht (Sep 18 2024 at 16:43):

Brendan Hansknecht (Sep 18 2024 at 16:52):

Is there a way for us to support that? Preferably while allowing them to all use the same Iter API?

Brendan Hansknecht (Sep 18 2024 at 17:04):

My only current thought on supporting this in Roc without splitting the type is to Iter.custom take an optional nextBack function.

Then just for iter.nextBack and iter.Rev, it will return a result. So you would do

[ ... ]
|> List.iter
|> Iter.enumerate ...
|> try Iter.rev

Richard Feldman (Sep 18 2024 at 17:04):

I don't think so without involving either a second type parameter on Iter or else abilities, which would require introducing parameterized abilities

Richard Feldman (Sep 18 2024 at 17:04):

Brendan Hansknecht (Sep 18 2024 at 17:04):

Richard Feldman (Sep 18 2024 at 17:05):

Brendan Hansknecht (Sep 18 2024 at 17:05):

Richard Feldman (Sep 18 2024 at 17:06):

Brendan Hansknecht (Sep 18 2024 at 17:07):

["a", "b", "c"]
|> List.toIterRev
|> Iter.enumerate

["a", "b", "c"]
|> List.iter
|> Iter.enumerate
|> Iter.rev

Brendan Hansknecht (Sep 18 2024 at 17:08):

Brendan Hansknecht (Sep 18 2024 at 17:16):

Just skimmed through roc source cause I was curious if we ever do anything more complex here.

        .iter()
        .zip(state_fields)
        .zip(index_vars)
        .zip(state_field_vars)
        .rev()

        .iter()
        .rev()
        .zip(state_fields.iter().rev())
        .zip(index_vars.iter().rev())
        .zip(state_field_vars.iter().rev())

Richard Feldman (Sep 18 2024 at 17:21):

well the simplest fix would be to require in custom specifying how to reverse, and then always supporting it

Richard Feldman (Sep 18 2024 at 17:21):

but that has the downside that it wouldn't be obvious when the perf of that would be terrible

Richard Feldman (Sep 18 2024 at 17:22):

which it wouldn't be in any builtin data structures, but easily could be in a userspace data structure

Brendan Hansknecht (Sep 18 2024 at 17:25):

I feel like that will lead to some containers just crashing and noting the bad perf. That is at least what I would probably do.

Brendan Hansknecht (Sep 18 2024 at 17:27):

But that would randomly block an iterator from using functions that depend on reverse....which isn't good either.

Agus Zubiaga (Sep 18 2024 at 17:27):

Richard Feldman (Sep 18 2024 at 17:28):

Agus Zubiaga (Sep 18 2024 at 17:28):

Agus Zubiaga (Sep 18 2024 at 17:29):

I don’t think that’s that bad. I don’t think people are likely to write many Iter annotations, and we consider it fine for numbers.

Richard Feldman (Sep 18 2024 at 17:30):

Iter.map : Iter a x, (a -> b) -> Iter b x

Brendan Hansknecht (Sep 18 2024 at 17:31):

Agus Zubiaga (Sep 18 2024 at 17:31):

Richard Feldman (Sep 18 2024 at 17:33):

I think it would be that you get either Iter DoubleEnded or Iter SingleEnded and then Iter.rev would only accept Iter DoubleEnded as an argument

Richard Feldman (Sep 18 2024 at 17:33):

Agus Zubiaga (Sep 18 2024 at 17:34):

Richard Feldman (Sep 18 2024 at 17:34):

that way if you zip a double-ended with a single-ended it unifies to SingleEnded and you can't rev anymore

Richard Feldman (Sep 18 2024 at 17:35):

if they're both concrete types, then they can't unify and you couldn't zip them together

Brendan Hansknecht (Sep 18 2024 at 17:37):

Agus Zubiaga (Sep 18 2024 at 17:39):

You can also have a phantom type for whether the length is known so that Iter.len doesn't have to be Iter.lenIfKnown and return a Result

Agus Zubiaga (Sep 18 2024 at 17:39):

Brendan Hansknecht (Sep 18 2024 at 17:39):

Richard Feldman (Sep 18 2024 at 17:40):

Brendan Hansknecht (Sep 18 2024 at 17:41):

Agus Zubiaga (Sep 18 2024 at 17:41):

Iter a { unknownLen: {}, singleEnded: {} }

Agus Zubiaga (Sep 18 2024 at 17:41):

Agus Zubiaga (Sep 18 2024 at 17:42):

Richard Feldman (Sep 18 2024 at 17:42):

Agus Zubiaga (Sep 18 2024 at 17:50):

I was going to suggest something like Fused in Rust, but we should probably just make that the default

Agus Zubiaga (Sep 18 2024 at 17:51):

if the inner function returns Err NoMore, the returned Iter from Iter.next can just have the function replaced with one that always returns Err NoMore, so it doesn't have to keep checking

Agus Zubiaga (Sep 18 2024 at 18:03):

Agus Zubiaga (Sep 18 2024 at 18:04):

Maybe something like whether enumeration happened before reversing or the other way around, but that's probably not worth it

Brendan Hansknecht (Sep 18 2024 at 18:12):

For List.fromIter it will want to get the length if available, but it also wants to work even if there is no length. It might even want a minimum or upper bound on the length to use withCapacity as an esimate

Agus Zubiaga (Sep 18 2024 at 18:13):

Brendan Hansknecht (Sep 18 2024 at 18:20):

Agus Zubiaga (Sep 18 2024 at 18:22):

Agus Zubiaga (Sep 18 2024 at 18:25):

We can make the phantom type indicate whether the length is definitely known or not, but you could still call a function that returns a Result

Brendan Hansknecht (Sep 18 2024 at 18:25):

If the iterator has a known length, you want to List.withCapacity knownLen. If not, you likely want to query for an minimum bound and upper bound. If the upper bound is small, you want to just list the capacity to match it. If the upper bound is larger or unknown, you would set the capacity to the minimum.

Agus Zubiaga (Sep 18 2024 at 18:26):

Agus Zubiaga (Sep 18 2024 at 18:27):

while still making it possible to get the length for Iters that may or may not know it

Brendan Hansknecht (Sep 18 2024 at 18:29):

How often do you require a specific sized iterator/query the length? I don't think this concept even exists in rust and I don't think I have ever called size hint either (though it is used and required for collecting to a container with perf)

Agus Zubiaga (Sep 18 2024 at 18:31):

idk in some function that needs to do something special for the last item or something

Agus Zubiaga (Sep 18 2024 at 18:32):

that's common in UIs when you have to draw the first and last elements differently

Agus Zubiaga (Sep 18 2024 at 18:33):

Agus Zubiaga (Sep 18 2024 at 18:36):

but it seems convenient in cases where you're chaining multiple Iter and you need to know the total length

Agus Zubiaga (Sep 18 2024 at 18:39):

iter =
    Result.iter editingItem
    |> Iter.concat (List.iter existingItems)

total = Iter.len iter

total = List.len existingItems + (if Result.isOk editingItem then 1 else 0)

iter =
    Result.iter editingItem
    |> Iter.concat (List.iter existingItems)

Agus Zubiaga (Sep 18 2024 at 18:44):

Jasper Woudenberg (Sep 18 2024 at 19:41):

This is probably a no-go for some obvious reason, but the line in the doc about List.map being best implemented in terms of Iter got me thinking: would it be possible and/or a good idea for iterators to be an entirely behind-the-scenes concept? So we'd only have a List type, but behind the scenes that list might be represented using an array or an iterator.

So Iter.custom (which I guess would need a different name if there's no Iter module) would (secretly) always produce the iter implementation, and so all the functions implemented in terms of it (like Dict.keys) would too. List.map would take an interator-backed-list, so if we're passing it a array-backed-list the compiler would have to insert a conversion from array-backed-list to iterator-backed-list first.

Upside would be that folks don't need to learn an Iter concept and or convert between these types. But there's probably some scenarios I'm overlooking where it's ambiguous which representation is most performant for a piece of code and where we'd like control.

Brendan Hansknecht (Sep 18 2024 at 19:58):

Brendan Hansknecht (Sep 18 2024 at 20:10):

Iterators behind the scenes end up having a lot of sharp edges and unpredictable performance.

Brendan Hansknecht (Sep 18 2024 at 20:10):

It has been discussed previously on Zulip, but I honestly don't think it would work out well.

Brendan Hansknecht (Sep 18 2024 at 20:11):

Even doing the reverse (implementing List.map in terms of iterators), I am weary of. I think this is a case where being explicit is best. That said, some simple pattern matching on list pipelines would be reasonable.

Richard Feldman (Sep 18 2024 at 20:41):

regarding length - I actually use size_hint on a regular basis, and .len() when I happen to have an ExactSizeIterator :big_smile:

Richard Feldman (Sep 18 2024 at 20:41):

Brendan Hansknecht (Sep 18 2024 at 20:42):

Brendan Hansknecht (Sep 18 2024 at 20:43):

Like it is just that fusion behind the sense would happen to call List.fromIter which uses size hint, right?

Richard Feldman (Sep 18 2024 at 20:43):

Brendan Hansknecht (Sep 18 2024 at 20:43):

Also, List.fromIter wont be able to call length cause it will have to assume a generic Iterator (which may not have a length)

Richard Feldman (Sep 18 2024 at 20:45):

Brendan Hansknecht (Sep 18 2024 at 20:47):

Also, though I'm weary of fusion behind the scemes, I think it is totally worth testing and seeing if llvm can figure it out.

My gut feeling is that llvm won't inline and merge everything well enough. So it will add overhead (especially given it loses inplace mapping).

Richard Feldman (Sep 18 2024 at 20:50):

Richard Feldman (Sep 18 2024 at 20:51):

because the fusion optimization is literally just detecting pairs of toIter/fromIter and deleting those calls

Richard Feldman (Sep 18 2024 at 20:51):

meaning you end up with the equivalent of Rust's .iter().map().map().map().collect()

Richard Feldman (Sep 18 2024 at 20:51):

Richard Feldman (Sep 18 2024 at 20:52):

so maybe something will be relevantly different, but the precedent of how well it handles Rust's iterators means it seems like the safe default assumption is that it'll do just as well here

Richard Feldman (Sep 18 2024 at 21:11):

I think if we'd just be adding a second type parameter for reversibility it's not worth it

Sam Mohr (Sep 18 2024 at 21:13):

If we're pushing to remove *, then needing to write Iter Str a or Iter Str notUsed every time would be pretty confusing

Brendan Hansknecht (Sep 18 2024 at 21:14):

Sorry, fusion isn't the right word.....I mean the plan to implement List.map and such in terms of iterators

Brendan Hansknecht (Sep 18 2024 at 21:14):

Richard Feldman (Sep 18 2024 at 21:16):

the scenario where we get any benefit out of it is that all of the following have to be true:

Richard Feldman (Sep 18 2024 at 21:17):

like basically we're talking about a super rare scenario where the consequence is that maybe someone has to debug a perf issue that might not even be fixable if going in reverse isn't actually avoidable

Richard Feldman (Sep 18 2024 at 21:18):

ah yeah, that's fair - although if that happens we could do a separate rewrite rule to translate "list into iter into map into list" into a direct call to our current implementation

Richard Feldman (Sep 18 2024 at 21:19):

so then the inner implementation being different would only be observable if you were actually doing multiple operations that could benefit from the fusion optimizations

Richard Feldman (Sep 18 2024 at 21:20):

Brendan Hansknecht (Sep 18 2024 at 21:24):

Hmm. I thought it was going to be internally implemented as an iterator. That is different than the implementation today which is a zig builtin.

Brendan Hansknecht (Sep 18 2024 at 21:27):

List.map = \list, fn ->
    list
    |> List.toIter
    |> Iter.map fn
    |> List.fromIter

Richard Feldman (Sep 18 2024 at 21:35):

yeah so what I'm saying is that we implement it that way, but then after fusion runs, if we still see that exact pattern of calls (to iter, map, from iter) - meaning no fusion occurred and this was a vanilla List.map - then we replace those (to iter, map, from iter) calls with a RunLowLevel which code gens to our current implementation

Richard Feldman (Sep 18 2024 at 21:35):

Brendan Hansknecht (Sep 18 2024 at 21:35):

Richard Feldman (Sep 18 2024 at 21:36):

Brendan Hansknecht (Sep 18 2024 at 21:36):

Yeah, I missed that. I was thinking we were making it depending on llvm inlining

Richard Feldman (Sep 18 2024 at 21:36):

Brendan Hansknecht (Sep 18 2024 at 21:36):

Richard Feldman (Sep 18 2024 at 21:36):

Richard Feldman (Sep 18 2024 at 21:37):

Richard Feldman (Sep 18 2024 at 22:15):

hm, I'm not sure how much of a problem this is, but having a separate Iter.rev function means every Iter will need 2 references to the underlying collection unless we special-case it somehow

Richard Feldman (Sep 18 2024 at 22:15):

Iter.custom :
    # the collection being iterated, or e.g. a range
    source,
    # the length of the collection, if available
    # (some collections may not know length up front)
    [Known U64, Unknown],
    # get the next element in the collection
    (source -> Result (elem, Iter elem) [NoMore])
    -> Iter elem
Iter.custom = \source, lenIfKnown, next -> @Iter {
    lenIfKnown,
    # create a thunk which captures source and next
    # (we can't store them directly without
    # adding a second type parameter to Iter)
    next: \{} ->
        when next source is
            # translate Result into [One …, Done, Skip …]
            # (this difference will come up later)
            Ok (elem, iter) -> One elem iter
            Err NoMore -> Done
}

Iter elem := {
    lenIfKnown : [Known U64, Unknown],
    next : ({} -> [One elem (Iter elem), Skip U64, Done])
}

Richard Feldman (Sep 18 2024 at 22:15):

List.toIter : List elem -> Iter elem
List.toIter = \list ->
    Iter.custom Unknown \{} ->
        when List.splitFirst list is
            Ok (first, rest) ->
                Ok (first, List.toIter rest)
            Err ListWasEmpty -> Err NoMore

Richard Feldman (Sep 18 2024 at 22:18):

Brendan Hansknecht (Sep 18 2024 at 22:46):

Brendan Hansknecht (Sep 18 2024 at 22:48):

Richard Feldman (Sep 19 2024 at 00:40):

Iter.custom :
    # the collection being iterated, or e.g. a range
    source,
    {
        # the length of the collection, if available
        # (some collections may not know length up front)
        lenIfKnown : [Known U64, Unknown],

        # get the next element in the collection
        next: (source -> Result (elem, Iter elem) [NoMore]),

        # get the previous element in the collection
        prev: (source -> Result (elem, Iter elem) [NoMore]),
    }
    -> Iter elem
Iter.custom = \source, { lenIfKnown, next, prev } -> @Iter {
    reverse: Bool.false,
    lenIfKnown,
    # create a thunk which captures source and next
    # (we can't store them directly without
    # adding another type parameter to Iter)
    advance: \reverse ->
        fn = if reverse then prev else next

        when fn source is
            # translate Result into [One …, Done, Skip …]
            # (this difference will come up later)
            Ok (elem, iter) -> One elem iter
            Err NoMore -> Done
}

Richard Feldman (Sep 19 2024 at 00:43):

because if next and prev don't capture anything (which the collection author can make sure they don't, if they want to avoid unnecessary sharing) then this line should be a cmov of just the lambda set discriminant integer:

fn = if reverse then prev else next

Richard Feldman (Sep 19 2024 at 00:43):

Richard Feldman (Sep 19 2024 at 00:44):

Richard Feldman (Sep 19 2024 at 00:47):

Richard Feldman (Sep 19 2024 at 00:48):

Sam Mohr (Sep 19 2024 at 00:55):

For something like a custom fibonacci iterator that can only go forwards, it doesn't seem like this allows the author to prevent consumers from trying to reverse their iterator. Is the plan to still put a type variable, or is this just a case where we want documentation to be written to say "don't use Iter.rev on this type"?

Sam Mohr (Sep 19 2024 at 00:58):

It seems like a consumer of said iterator would just get an Err NoMore in that case, leading to "silent failure", where the iterator would collect to a list with just the initial value, but wouldn't signal to the consumer that an issue occurred

Sam Mohr (Sep 19 2024 at 01:04):

I think toIterRev solves this by only exposing the capacity to iterate backwards for types that support it

Sam Mohr (Sep 19 2024 at 01:05):

Richard Feldman (Sep 19 2024 at 01:26):

in the specific case of something like fibonacci or pi or any other infinite sequence based on an algorithm, I'm absolutely fine with saying "don't use Iter for that, just do what you want to do a different way" :big_smile:

Krzysztof Skowronek (Aug 21 2025 at 20:11):

I looked at the repo, can't see anything on iterators on main branch yet, so maybe it's not too late for some feedback, since it's important enough part of the language that I used the zulip search to get here :)

I work with F# (and C#) a lot, and LINQ, which is basically iterators, are my everyday tool.

File.ReadLines "someFile.txt" // this return an enumerator that yields lines one by one
|> Seq.map _.ToUpper() // this is a shorthand for fun x -> x.ToUpper()
|> File.WriteLines "output.txt"

This will allocate only a single line at a time, and the compiler and GC are smart enough that it's basically ref counted. It will work on files that are 50GB no issue (except you will not max out the read speed, because the whole chain is single threaded)

In Roc, I assume that a function like File.ReadLines would return an iterator that is of length Unknown, and that's great.

But that's the generic case: iterators returned from a list, or array, work in a different way.

I can call `File.ReadLines(input.txt).Reverse()" - if it fits into memory, that's great! Although I am fairly sure that that iterator is actually using seek on the file stream going from end to beginning (just guessing), so it's not an issue.

I can also write a generator for Fibbonaci like Sam mentioned, and call reverse on it - it will use the generic implementation with a buffer. It will allocate all the memory it can, and crash with OOM and then as a programmer I will realize that I am trying to reverse an infinite sequence, and I can now fix it by adding .Take(1_000_000), or whatever.

I really don't see a good reason on why you would want to prevent this on a type level.

What's more, maybe the generic iterator reverse could even take the max buffer size as a param? And for everything else the reverse should be directly on the type?

someList.rev()

vs

someList.iter().rev(1_000) // not providing this is a type error

in worst case you can emit a compiler warning like this rev() call will use the generic buffered approach. Materialize a collection with toList to make this go away or something.

Krzysztof Skowronek (Aug 21 2025 at 20:20):

Another REALLY cool thing about C# and iterators is that all the db related packages also use iterators:

// with Dapper, micro ORM that "just" does mapping from SQL result sets into objects
var products = dbConnection.Query<Product>("select * from products");

var listProducts = [ {...}, {...}];
var dbProducts = dbContext.Products;

var products = listProducts.Where(x => x.Name.Contains("mouse").Select(x => x.Price).Sum();

the above works both with just the list objects, AND with the DbContext. It will be actually translated to SQL:

select sum(price) from products where name like "%mouse%"

Krzysztof Skowronek (Aug 21 2025 at 20:21):

var products = listProducts.Where(x => x.Name.Contains("mouse").Select(x => x.Price)

will return an iterator, it's really pretty cool (although the dbContext.Products is actually an IQueryable instead of IEnumerable)

Krzysztof Skowronek (Aug 21 2025 at 20:24):

in C# they also implemented some cool stuff around this pipeline fusion idea, but that's not a compiler optimization, it's just implementation of different iterators.

For example, when you do .Skip(10).Take(20) for paging, and you are iterating over an array, those two in a row will become effectively array.Slice(10, 30)

Krzysztof Skowronek (Aug 21 2025 at 20:27):

going back to the db queries returning iterators, it's really great because you can do a foreach loop over it, and "process" millions of rows with minimal memory footprint

Richard Feldman (Aug 21 2025 at 21:50):

so without going too far into the weeds right away, I think we'd want a separate type (e.g. Stream) for the type of thing where advancing has effects

Richard Feldman (Aug 21 2025 at 21:51):

Richard Feldman (Aug 21 2025 at 21:52):

we could have a separate for!keyword that calls .next!() where for would call .next()

Krzysztof Skowronek (Aug 22 2025 at 08:00):

let count = File.ReadLines "input"
   |> Seq.map function // function is shorthand for a lambda that does pattern matching
     | "Cat" ->
             Console.WriteLine "Found a cat!"
             "Cat"
     | x -> x
   |> Seq.count

I guess I would need to use something like 'Seq.map!' for that? And then the whole function that contains this would need the bang?

Effectful iterators are very useful, especially if they can clean up after they consumed everything (like close the stream in File.ReadLines). Whether it's a separate type or not

Richard Feldman (Aug 22 2025 at 11:41):

Richard Feldman (Aug 22 2025 at 11:42):

so you could have a map that's just using a pure function to transform the data coming out of a Stream

Richard Feldman (Jun 10 2026 at 17:30):

ok, I learned something about fusion that makes me come to to the surprising conclusion that we shouldn't implement it after all.

conclusion: don't do fusion after all, just encourage .iter() followed by .collect() at the end if you want to do that sort of thing.

Richard Feldman (Jun 10 2026 at 17:41):

if you're curious about the optimization problem, here's a Claude Fable example of why Set would optimize to a bug because of its deduplication (and of course any user could author a collection with a similar problem)

# Why fusion on userspace types would break Set

This example shows how a fusion rule that cancels `to_iter`/`from_iter` round trips
would make a pure function return a different answer when the optimizer runs.

**The program:**

```roc
answer = Set.from_list([1, 2, 3]).map(|n| n % 2).fold(0, I64.plus)
```

**Without the optimization**, it runs exactly as written:

1. `.map(|n| n % 2)` computes `1 % 2`, `2 % 2`, `3 % 2` — that's `1, 0, 1` — and puts
   them into a new Set. A Set throws away duplicates, so the two `1`s become one. The
   new Set is `{0, 1}`.
2. `.fold(0, I64.plus)` adds up that Set's contents: `0 + 1`.

**`answer == 1`**

**With the optimization**: `Set.map` and `Set.fold` are each written as a pipeline
(`Set.map = |s, f| Set.from_iter(Iter.map(Set.to_iter(s), f))`,
`Set.fold = |s, acc, step| Iter.fold(Set.to_iter(s), acc, step)`), so after inlining
both, the whole expression is:

```roc
Iter.fold(Set.to_iter(Set.from_iter(Iter.map(Set.to_iter(s), |n| n % 2))), 0, I64.plus)
```

The fusion rule spots `Set.to_iter(Set.from_iter(...))` — "build a Set, then
immediately iterate it" — and deletes both calls. It would rewrite the program to:

```roc
Iter.fold(Iter.map(Set.to_iter(s), |n| n % 2), 0, I64.plus)
```

That adds up `1, 0, 1` directly — the Set that would have removed the duplicate `1`
no longer exists.

**`answer == 2`**

**Why this happens:** fusion's whole premise is that "build the collection, then
iterate it" is wasted work you can skip. For a list that's true — the list hands back
exactly the items you put in. For a Set it's false — building the Set *removes
duplicates*, so it isn't wasted work, and deleting it changes the answer. The compiler
can't tell these two cases apart: both `from_iter` functions take an `Iter(I64)` and
both `to_iter` functions give back an `Iter(I64)`, so the types look identical.
Whether items survive the round trip depends on what the user's code *does*, which the
optimizer can't know — it would just have to trust it, and for Set that trust is wrong.

Matthieu Pizenberg (Jun 10 2026 at 19:11):

Is there any property that would guaranty safe fusion? And a way to encode such property in the types?

Richard Feldman (Jun 10 2026 at 19:20):

this is apparently why Haskell requires you to use a special language feature to have a type opt into it if you want to do it

Richard Feldman (Jun 10 2026 at 19:21):

kinda like Rust's unsafe - like "hey you are verifying that you don't have bugs or else the optimizer might break purity" - which is just not compatible with Roc's design goals

Jake Brownson (Jun 10 2026 at 19:42):

Set isn't actually a functor right, does it work for things that obey the functor laws in addition to the type signature? Not that that would help really.

Richard Feldman (Jun 10 2026 at 19:44):

Krzysztof Skowronek (Jun 10 2026 at 20:14):

$result = []

for i in list {
     x = i * 2
     if x % 2 { unsafe_append($result, x}
}

and also make the map, filter, fold and so on works on where [ a.iter() : a -> Iter(item)], and calls the iter for you

basically anything that returns anything other than Iter(_) breaks the fusion chain

in principle, I think if there is a materialization in the user space, the compiler should not be able to remove it

Richard Feldman (Jun 10 2026 at 20:17):

Krzysztof Skowronek (Jun 10 2026 at 20:17):

I know that this is probably off-topic, but you can still do a lot about just the iterators

Krzysztof Skowronek (Jun 10 2026 at 20:19):

so fusion was about optimizing away some internal materializations, not about merging iterator operators together?

Richard Feldman (Jun 10 2026 at 20:20):

Richard Feldman (Jun 10 2026 at 20:21):

and we designed ours to end up optimizing the same way theirs do, so LLVM just takes care of that for us

Krzysztof Skowronek (Jun 10 2026 at 20:28):

Krzysztof Skowronek (Jun 10 2026 at 20:30):

is true for some operations? like if stream and unstream are for the same collection type

Krzysztof Skowronek (Jun 10 2026 at 20:31):

it would know for Set.ofList it cannot fuse it, but for list.iter().map...toList() it could

although if everything just works with iterators, and llvm optimizes those into for loops with a single body... maybe implementing the skip that has number of skipped elements is a bigger improvement (and other specialized iterators)

Fabian Schmalzried (Jun 12 2026 at 00:01):

On first glance, I would have expected the wrong result for the set example. :see_no_evil:

Crazy idea: .map always returns an iterator instead of the original type. The 2nd .map is the in this iterator and therefore can easily be optimized.

Krzysztof Skowronek (Jun 12 2026 at 06:17):

I think that's already the case? Those operations are defined on the Iher type, not per collection

Kasper Møller Andersen (Jun 12 2026 at 10:20):

I appreciate the idea of keeping builtin collections on equal footing with custom collections. But I feel like custom collections are already secondary to the builtin ones, so I don’t understand why this crosses a boundary that wasn’t already crossed.

For example, if I want a Dict with a different behavior than the builtin one (different read/write tradeoffs, different distribution of keys, whatever), I basically have to make a new type MyDict and implement it from scratch. Okay, fine. But because there’s no interface/trait that MyDict can implement to make it compatible with Dict, I can’t pass MyDict to any function which requires a Dict. And although you could technically write Roc code that is generic over Dict-like structures with the current facilities in the language, that is fragile and hardly scalable.

This is also my experience from Elm. Even if there are data structures which are more suited to a given problem, you end up not using them, because in the end you to convert them to an Elm List type anyway to use them elsewhere, so the point is lost.

I expect Roc code to be similar, where custom collection types are really niche and probably best avoided. Conversely I expect the builtins to be so fast, that in most cases, it’s fine to fall back to them. Applying fusion only to them thus seems like a sensible choice.

Jonathan (Jun 12 2026 at 11:06):

I haven't really ever used Elm, but it was my understanding that a combination of privileged kernel-implemented data structures, and a lack of parametricity over behaviour is what led to consolidation to the core types (citation needed).

If that's correct, to the second point, you can still write APIs that expect a "key-value collection", regardless of whether that is Dict or MyDict. It is up to the library to decide whether it really needs precisely Dict(k, v) or just

libraryFunc : coll, k, v -> ... where [
    coll.insert : coll, k, v -> coll
    coll.remove : coll, k -> coll
    etc...
]

Your new type MyDict is acceptable in this case as long as it implements the set of dict-like functions required, even though there is no explicit trait. Of course, if the library specifies it needs a Dict, there's nothing you can do but that isn't unique to Roc.

To return to the first point, it is my hope that with the various reference counting and reuse optimisations, data structures implemented in user-land will not lag so far behind those builtin to the compiler or platform, but I'm very out my depth here :sweat_smile:

Jonathan (Jun 12 2026 at 11:17):

If this was written under that understanding though, I'm not sure what the evidence is to suggest that it won't be scalable. I do like the aspect of traits-as-documentation, but I know it's also blamed for a lot of overly complex code, for example being avoided entirely in the Gleam language.

Kasper Møller Andersen (Jun 12 2026 at 11:17):

Your analysis is correct from my point of view, and I know Roc isn’t as tightly bound to its builtins as Elm, but I still foresee generic code just taking Dict, even if it could be more generic. Because if there is no easy way to refer to a common DictLike interface, then the community won’t do so I expect.

Jonathan (Jun 12 2026 at 11:45):

I do share (admittedly with no evidence) the concern that lack of explicit grouping and labeling could lead to poorer discoverability/documentation, and more speculatively that this could lead to fragmentation, but I don't think it is inevitable. For example, it will be difficult for an outlier to interact with the community if they do not share similar API patterns, and so there is still a normalising force. Libraries may also be pressured to accept a wider range of possibilities, rather than applications conforming to only core types.

Anyhow, traits have their negatives; a function may label something DictLike because that is the typical level of granularity for traits, when it only needs insert and not keys, values, remove, get... placing more onus on you at the call site to implement potentially unneeded functions in order to reuse code. That said, some form of labeling is not entirely ruled out either.

Kasper Møller Andersen (Jun 12 2026 at 12:44):

And to be clear, I’m not advocating for interfaces in the language, I just think there will be strong pressure towards using the builtins without some kind of solution like them. I’m personally okay with that :blush:

Jonathan (Jun 12 2026 at 14:43):

No of course, I just thought I'd do my best to demonstrate some of the tradeoffs :grinning_face_with_smiling_eyes:

Brendan Hansknecht (Jun 14 2026 at 07:23):

@Richard Feldman is the issue above just that you need to stay in iter land to get fusion. Like if you do Set.from_iter(some_set.to_iter().map(...).map(...)) you should be able to fuse everything, right?

The issue with your initial example from Claude fable is that you don't stay in the world of iters. You keep returning to the world of sets (which applies extra constraints).

List may be a special type where its from other method has no effects and this can be fused, but that is not generally guaranteed.

Seems like you would still want iter fusion in my view. Just not in the generic from that merges from and into. At least not without guaranteeing to element changes. Technically you could fold set to and from iter if you essentially inserted iter.unique() between all calls. Not sure if that is derivable though

Richard Feldman (Jun 14 2026 at 12:56):

the issue is that the optimization depends on it being true that converting a collection from an Iter to the collection, and then back to an Iter again, is a no-op that can safely be optimized away without changing the program's behavior, but it turns out that's not true in the general case (and Set is an example of that)

Lukas Juhrich (Jun 14 2026 at 14:37):

I'm not sure I understand the reasoning here. I thought it was very common for interfaces (for lack of a better word) which user-defined types implement to imply certain properties/laws whose validity the compiler cannot decide. Like commutativity of the implemented equality, transitivity of an implemented .lt(), and so forth. if I violate that, a sorting or maximum finding algorithm might just not work correctly.

So why not allow the user to mark a from_iter method as [inverted_by_to_iter] or something similar and have the optimizer allow rewriting relevant occurrences to id if this attribute is present?

Sorry if this is a stupid question but I'm genuinely curious about the intricacies and design tradeoffs here.

Richard Feldman (Jun 14 2026 at 15:01):

in Roc, changing optimization levels must never change program behavior - otherwise, they can't be relied upon.

if my code passes tests in the (fast) dev build, and if works when I try it locally in a dev build, it's a huge time saver if I don't need to bother redoing all that testing with release builds because I can be confident that a release build won't change my program's behavior, it'll just make the same logic run faster.

Brendan Hansknecht (Jun 14 2026 at 15:02):

That is true for merging from_iter and into_iter, but if you have an iter and call map on it twice, that should be fine, right?

Brendan Hansknecht (Jun 14 2026 at 15:02):

Richard Feldman (Jun 14 2026 at 15:02):

Brendan Hansknecht (Jun 14 2026 at 15:02):

Brendan Hansknecht (Jun 14 2026 at 15:03):

I guess it depends if each iter.map loops the whole collection or if it is a pull based model

Brendan Hansknecht (Jun 14 2026 at 15:03):

Brendan Hansknecht (Jun 14 2026 at 15:04):

Richard Feldman (Jun 14 2026 at 15:04):

Richard Feldman (Jun 14 2026 at 15:05):

so if you want to do two maps in a row, you should just use iter like you always have to in Rust

Brendan Hansknecht (Jun 14 2026 at 15:05):

may be slower than:
my_set.into_iter().map(...).map(...).into_set()
or if needed:
my_set.into_iter().map(...).unique().map(...).into_set()

Richard Feldman (Jun 14 2026 at 15:07):

well that's the thing - in the general case, collection.map(...).map(...) should be slower than collection.iter().map(...).map(...).collect()

Richard Feldman (Jun 14 2026 at 15:08):

however, if everyone knows that culturally, then at least you know the fix (and we could consider a lint/warning for it)

Richard Feldman (Jun 14 2026 at 15:09):

the problem is that for Set (and similar collections where converting to/from iter can change the collection), collection.map(...).map(...) is still slower than collection.iter().map(...).map(...).collect() but it also potentially gives a different answer

Richard Feldman (Jun 14 2026 at 15:09):

so if you change it explicitly, and get different answers, it's good that you'll get them in both dev and release builds, so if that causes a bug, you'll find it in both dev and release builds

Brendan Hansknecht (Jun 14 2026 at 15:11):

Brendan Hansknecht (Jun 14 2026 at 15:14):

Would be nice if we could somehow guarantee a correct fold that still fits into an iterator. Like for set, folding into .unique() and list folding to nothing... but I don't see how that could safely be generically done.

Stream: ideas

Topic: Iterators and fusion proposal

Richard Feldman (Sep 18 2024 at 01:54):

Kilian Vounckx (Sep 18 2024 at 12:30):

Richard Feldman (Sep 18 2024 at 12:53):

Brendan Hansknecht (Sep 18 2024 at 13:07):

Richard Feldman (Sep 18 2024 at 14:59):

Brendan Hansknecht (Sep 18 2024 at 15:07):

Pearce Keesling (Sep 18 2024 at 15:30):

Richard Feldman (Sep 18 2024 at 16:04):

Richard Feldman (Sep 18 2024 at 16:05):

Richard Feldman (Sep 18 2024 at 16:05):

Richard Feldman (Sep 18 2024 at 16:06):

Sam Mohr (Sep 18 2024 at 16:06):

Richard Feldman (Sep 18 2024 at 16:07):

Richard Feldman (Sep 18 2024 at 16:07):

Sam Mohr (Sep 18 2024 at 16:08):

Sam Mohr (Sep 18 2024 at 16:09):

Sam Mohr (Sep 18 2024 at 16:09):

Brendan Hansknecht (Sep 18 2024 at 16:38):

Brendan Hansknecht (Sep 18 2024 at 16:39):

Brendan Hansknecht (Sep 18 2024 at 16:41):

Brendan Hansknecht (Sep 18 2024 at 16:43):

Brendan Hansknecht (Sep 18 2024 at 16:52):

Brendan Hansknecht (Sep 18 2024 at 17:04):

Richard Feldman (Sep 18 2024 at 17:04):

Richard Feldman (Sep 18 2024 at 17:04):

Richard Feldman (Sep 18 2024 at 17:04):

Brendan Hansknecht (Sep 18 2024 at 17:04):

Richard Feldman (Sep 18 2024 at 17:05):

Brendan Hansknecht (Sep 18 2024 at 17:05):

Richard Feldman (Sep 18 2024 at 17:06):

Brendan Hansknecht (Sep 18 2024 at 17:07):

Brendan Hansknecht (Sep 18 2024 at 17:08):

Brendan Hansknecht (Sep 18 2024 at 17:16):

Richard Feldman (Sep 18 2024 at 17:21):

Richard Feldman (Sep 18 2024 at 17:21):

Richard Feldman (Sep 18 2024 at 17:21):

Richard Feldman (Sep 18 2024 at 17:22):

Brendan Hansknecht (Sep 18 2024 at 17:25):

Brendan Hansknecht (Sep 18 2024 at 17:27):

Agus Zubiaga (Sep 18 2024 at 17:27):

Richard Feldman (Sep 18 2024 at 17:28):

Agus Zubiaga (Sep 18 2024 at 17:28):

Agus Zubiaga (Sep 18 2024 at 17:29):

Richard Feldman (Sep 18 2024 at 17:30):

Brendan Hansknecht (Sep 18 2024 at 17:31):

Agus Zubiaga (Sep 18 2024 at 17:31):

Richard Feldman (Sep 18 2024 at 17:33):

Richard Feldman (Sep 18 2024 at 17:33):

Agus Zubiaga (Sep 18 2024 at 17:34):

Richard Feldman (Sep 18 2024 at 17:34):

Richard Feldman (Sep 18 2024 at 17:34):

Richard Feldman (Sep 18 2024 at 17:35):

Brendan Hansknecht (Sep 18 2024 at 17:37):

Agus Zubiaga (Sep 18 2024 at 17:39):

Agus Zubiaga (Sep 18 2024 at 17:39):

Brendan Hansknecht (Sep 18 2024 at 17:39):

Brendan Hansknecht (Sep 18 2024 at 17:39):

Richard Feldman (Sep 18 2024 at 17:40):

Brendan Hansknecht (Sep 18 2024 at 17:41):

Agus Zubiaga (Sep 18 2024 at 17:41):

Agus Zubiaga (Sep 18 2024 at 17:41):

Agus Zubiaga (Sep 18 2024 at 17:42):

Richard Feldman (Sep 18 2024 at 17:42):

Richard Feldman (Sep 18 2024 at 17:42):

Agus Zubiaga (Sep 18 2024 at 17:50):

Agus Zubiaga (Sep 18 2024 at 17:51):

Agus Zubiaga (Sep 18 2024 at 18:03):

Agus Zubiaga (Sep 18 2024 at 18:04):

Brendan Hansknecht (Sep 18 2024 at 18:12):

Agus Zubiaga (Sep 18 2024 at 18:13):

Brendan Hansknecht (Sep 18 2024 at 18:20):

Agus Zubiaga (Sep 18 2024 at 18:22):

Agus Zubiaga (Sep 18 2024 at 18:25):

Brendan Hansknecht (Sep 18 2024 at 18:25):

Agus Zubiaga (Sep 18 2024 at 18:26):

Agus Zubiaga (Sep 18 2024 at 18:27):

Agus Zubiaga (Sep 18 2024 at 18:27):

Brendan Hansknecht (Sep 18 2024 at 18:29):