regarding this:
Richard Feldman said:
walkRangeisn't necessary anymore thanks to seamless slices, and once we have deforestation/fusion we can replace the others with just|> List.reversefor backwards and|> List.withIndexfor "with index"
I wonder if we could get a sort of "80-20" ergonomics win without being blocked on full deforestation, by just having canonicalization detect when we're doing List.reverse (List.walk ...) and compile that to a behind-the-scenes walkBackwards implementation, and a few other similar things like that
the idea being that if we had a hardcoded list of these basic optimizations, we could:
this would be really straightforward to implement, very similar to how we desugar operators like |> and == except "desugaring" specific nested function calls (after desugaring |> so it would work with or without pipeline style) instead of operators
thoughts?
(cc @Folkert de Vries and @Ayaz Hafiz since I know we've talked about deforestation in the past!)
I think relying on automatic deforestation to elide potentially enormous/infinite lists seems very brittle and introduces a strong element of "for me but not for thee" - users' data structures will never be able to experience similar benefits. An iterator interface could outright replace many of the numerous list functions and do the same in a uniform way for other data structures as well. It would give users explicit control of when they made intermediate lists and when they didn't. And it would allow function fusion now, not in some hypothetical future where deforestation covers the particular use case of interest.
In this concrete case, one could use fromListReversed instead of List.reverse to achieve the same effect; if one wanted some more complicated indexing operation (every other element, say) one could simply map a list lookup over a stream of indices.
I guess it is always a question of when explicit functions make sense vs iterators vs just lists.
Iterators if not seamless split the ecosystem.
That said, obviously iterators can have huge memory wins (though can also have performance losses).
I think infinite iterators are not really the best talking point though. Infinite iterators tend to be a design choice rather than a design constraint. There is always an alternative view.
yeah I'm very skeptical that iterators are the right API design for Roc's stdlib
long-term I mean
(or short-term for that matter I guess)
I think it results in the ecosystem having much more straightforward types if there's instead a convention that collections expose a way to convert the collection to List, and then the compiler knows what to do from there
otherwise you end up with API design questions like whether List.concat should take an iterator as its second argument, except multiplied across the entire ecosystem - plus, anytime someone makes a library which doesn't do that, now they're getting issues opened on the repo asking to make N public functions more flexible for performance
as opposed to today, where the type of List.concat is straightforward, the API design choice is obvious (second argument should be a List), and there's only one, obvious way to use it (convert whatever you've got to a List if it isn't already one)
put another way, if "convert to List" can be optimized into the behind-the-scenes equivalent of an iterator, I think that's the best long-term design :big_smile:
and if we're going to enable something nicer than the status quo in the short term, I think we should enable the thing we want to build towards anyway!
as an example, we could expose this function:
List.fromIter : state, (state -> [Continue (state, elem), Done elem]) -> List elem
at that point, we sort of have "List is the iterator"
in the sense that deforestation of List.fromIter on some custom collection, followed by other List operations, could theoretically fuse in the same way as if we had an iterator type
:thinking: an interesting consideration about that: you could use it to define an infinite iterator, and then call fromIter to create an "infinite List" with it - which normally wouldn't terminate, but then as long as the next operation was something that only grabbed a few elements from the list, after deforestation it wouldn't actually materialize the list
which in turn would mean that code would only work if the optimization succeeded, and otherwise it would hang, which in turn would mean if the optimization didn't work in dev builds, it would potentially work fine in optimized builds but hang in dev builds
and it's generally already true that some code will work in optimized builds but fail in dev builds, but the failure in a dev build looks more like "it's too slow" or "it overflowed the stack," not "the program didn't terminate"
what's further concerning about that is that even if we had the optimization enabled in dev builds, whether or not it successfully applied could depend on inlining, which in turn would suggest avoiding that problem would require the exact same amount of inlining in dev builds as optimized builds
Kinda unrelated, but kinda intriguing to think about. The idea of simd iterators. So it is iterators, but they handle a simd chunk of elements at a time. Doesn't make much sense for default iterators, but might actually make sense for list deforestation where we can imagine that the full list was generated, but as an optmization, we just run a simd chunk of the iterator at a time.
Richard Feldman said:
what's further concerning about that is that even if we had the optimization enabled in dev builds, whether or not it successfully applied could depend on inlining, which in turn would suggest avoiding that problem would require the exact same amount of inlining in dev builds as optimized builds
This is exactly what I mean by "brittle" - even without the dev/optimized distinction, writing code that will only work if certain optimizations are applied is very questionable unless the optimization is a guarantee of the language (for example tail call optimization in Scheme, or laziness in Haskell). It is bad enough to know that I can't pull out code from my function into a second function because it breaks roc's tail recursion optimization, but it is worse if whether that happens depends on compiler internals (and there are at least two different compilers).
More, allowing users to choose which optimizations should be applied to their code is strictly more powerful than expecting the compiler to determine which optimizations are beneficial and which are not. I know there are research papers where the compiler considers all possible combinations of optimizations and tries to determine which combination is the most efficient (called something like "saturation" I'm afraid I don't remember) but how is the compiler to evaluate what the performance tradeoffs will be for an arbitrary piece of code?
do you think there's a design that wouldn't have that problem?
like for example, what would be the types of List.map, List.keepIf, and List.concat (making up syntax if it would require things that don't exist today) be, such that you could combine them in various ways without creating intermediary lists?
Brendan Hansknecht said:
Kinda unrelated, but kinda intriguing to think about. The idea of simd iterators. So it is iterators, but they handle a simd chunk of elements at a time. Doesn't make much sense for default iterators, but might actually make sense for list deforestation where we can imagine that the full list was generated, but as an optmization, we just run a simd chunk of the iterator at a time.
For iterators of the style I implemented, following the research paper, SIMD iterators could just happen - because of the inclusion of the "Skip" state, all the step logic is very simple and loop-free. So an optimizing compiler will produce one loop (driven from toList) combining multiple functions, probably inlined. An aggressive optimizing compiler that knows the types are small enough should already apply SIMD optimizations to that kind of loop, if they are possible.
Richard Feldman said:
like for example, what would be the types of
List.map,List.keepIf, andList.concat(making up syntax if it would require things that don't exist today) be, such that you could combine them in various ways without creating intermediary lists?
Why should they exist? I'd rather use Iterator.map, whether it's a list or a dict or whatever. Iterators concatenate fine too (both Iterator.concat and Iterator.concatMap). The underlying loop will be the same. I guess List.concat could be more efficiently written as Iterator.toListAppend (which should exist even if just to accommodate List.withCapacity).
If you want special faster versions for lists of the few functions that can be faster when implemented directly on lists (dropIf when dropping is unlikely maybe?), something like an Iterator ability that had default implementations could do this. The rust iterators provide an example of this - you could provide a special implementation of dropIf specifically for iterating over a list, or for some other data type you might just provide (say) an implementation of Iterator.fromStateTransitionFunction and call it a day; then dropIf on this kind of iterator would just use the generic iterator implementation. You could also allow iterators to provide minLength and maxLength if known, as rust does, to automate withCapacity usage.
One could even provide a canonical iterator construction pattern: Dict.keys, Dict. values, Dict.items, List.keys, List.values, List.items, ... I guess sets don't have keys/items versions, or maybe only have keys versions. There would also be non-canonical iterator constructions like SomeTree.depthFirst, SomeTree.breadthFirst, KdTree.byDistanceFrom, ...Multidimensional arrays would also have a variety of natural iterators (all elements, all n-1-dimensional arrays along some axis, index and with-index versions of the above possibly). These canonical ones would give rise to patterns like iterating over keys or key/value pairs, dropping most, and applying the rest as changes to the original structure - which work for dicts as well as lists.
I haven't quite thought it through but it might be possible to make List implement an ability that let you run Iterator functions on it directly? so instead of thing |> List.values |> Iterator.map f |> Iterator.toListAppend (List.withCapacity (List.len thing)) you could drop the List.values? You'd still want List.items/List.enumerate for when you needed the indices too but it could be a shortcut for extracting the "obvious" iterator. If iterators (and map in particular) implemented (estimated) length reporting when available then you could just use Iterator.toList.
ok so let's say everyone calls Iterator.map instead of List.map and Iterator.keepIf instead of List.keepIf - how is it any less brittle if I call map followed by keepIf (and expect that to be optimized into no intermediate data structure) if I use Iterator vs List?
either way I rely on the optimization to get acceptable performance! :big_smile:
(and if we can say "well iterators are guaranteed to get the optimization" then we could just say the same of lists)
either way, the problem is the same as before, regarding whether or not dev backends do the optimization
Richard Feldman said:
ok so let's say everyone calls
Iterator.mapinstead ofList.mapandIterator.keepIfinstead ofList.keepIf- how is it any less brittle if I callmapfollowed bykeepIf(and expect that to be optimized into no intermediate data structure) if I useIteratorvsList?
Er. Iterators don't produce intermediate data structures? Even with nothing but tail recursion optimization, thing |> Iterator.fromList |> Iterator.map f |> Iterator.keepIf g |> Iterator.toList never produces any intermediate structures beyond the input and outputs for f and g (which may well live on the stack)? And it's one loop, implemented with tail recursion in Iterator.toList? So this is about as efficient as it's possible for an unoptimized program to be. And the optimizations would be inlining f and g and the state transition functions, and removing the tagging/untagging in the state transition functions. Maybe I don't understand what you mean by intermediate structures?
If you do it with lists and the deforestation doesn't happen, you create an intermediate list or two the size of the input and output lists (or much larger if map does something like produce a string and keepIf drops most of them; regardless they go on the heap). With iterators, under no circumstances is there an intermediate object of more than constant size. More, memory locality is excellent: the sequence of computations is "Load from thing, apply f, apply g, save to output list", possibly with some tagging/untagging at each step. The advantage for longer chains of iterators, or if your result isn't a list (any/all for example) is even stronger.
My normal example here would be processing a large text file line by line, but I don't know how to make iterators interact with Tasks, so I can't speak to that one.
I suppose one could speculatively keep the results of an iterator around in case someone else ran the same iterator looking for the same elements (for example if one wrote the infinite iterator "primes" and made it globally available), but this is a question of whether to memoize functions in a pure language, and here the optimization would be if the compiler decided to keep the intermediate results around.
Maybe I don't understand what you're asking?
I mean if you do thing |> Iterator.map f |> Iterator.toList |> Iterator.fromList |> Iterator.keepIf g |> Iterator.toList then sure you get an intermediate list and separate loops unless the optimizer removes the toList/fromList, but why would you write that? Unless you for some reason knew that that would be more efficient? (I can't see it in this case, but the user can materialize a list if at any point they want one.)
oh I see, so you're seeing Iterator as a wrapper around a function that produces elements, rather than e.g. an ability that would be implemented on concrete collection types like List
so then instead of List.map, people would always have to call Iterator.fromList followed by Iterator.map, is that right?
Richard Feldman said:
so then instead of
List.map, people would always have to callIterator.fromListfollowed byIterator.map, is that right?
Yes, unless one created a shortcut that allowed Lists in an Iterator context to produce the "obvious" iterator (as opposed to a reversed one or one that attached indices).
Iterator tools would be the primary way to, um, iterate through a data structure, whether list, dict, set, or user-created one.
:thinking: what would that shortcut look like in practice?
If all one wanted was to apply map and then have a list again, it would be more verbose; but if one wanted a more complicated operation than map, expressed as a sequence of operations, one wouldn't bother converting back to list (or dict or whatever) until the end.
maybe a good specific example to ask: right now on roc-lang.org the first code sample is this:
list = List.map songs \song ->
"Artist: $(song.artist)"
what would that change to in this idea?
If iterators were more ubiquitous one might choose a shorter module name than Iterator :-P or just import "map" unqualified.
sure, like Seq in Clojure
but what would that code snippet look like?
So, no shortcut: list = songs |> List.values |> Iterator.map (\song -> "Artist: $(song.artist)" |> List.fromIterator.
The shortcut I'm not quite sure as I don't have much experience with abilities. I suppose there would be an Iterable ability that gave a standard way to get the obvious iterator out of types that implemented it, but if that needed to be called explicitly that wouldn't help much. Python has a pattern like this, where some objects are Iterable, which allows (for example) for loops to extract an Iterator (an object with state for traversing the original object) and then walk through it. I guess there's not really a way to detect when a List is being treated like an Iterator.
If iterators were ubiquitous, though, one might not bother with the fromIterator, depending what one wanted to do with it afterward.
maybe, but all of this is ostensibly in the name of improving ergonomics compared to like walkWithIndex and such
comparing those two snippets, I don't think this is an ergonomics improvement compared to status quo :big_smile:
I'm not sure what the "shortcut" would do to that though - what would the shortcut snippet look like?
Ergonomics and performance.
For ergonomics, I'd rather keep orthogonal tools in mind than their product: List.values, List.valuesReversed, List.items, Dict.keys, Dict.values, Dict.items / Iterator.map, Iterator.walk, Iterator.walkUntil / List.fromIterator, Dict.fromIterator, Dict.setFromIterator, List.setFromIterator, ... not to mention user-defined data structures.
This seems like a situation where the tools that are nicest for the simple cases explode once you get to more complicated cases, while there are tools that cope just fine with complexity. And there has been virtually no work put into making Iterators tidy for simple cases.
One way to keep the simple case simple would be to provide List.map but not List.walk or any of the more complicated variants; List.map would then be a shortcut for the above conversion to an iterator, map, followed by conversion from an iterator. This could reduce the List functions to those that would be needed in simple cases and those that can gain efficiency by knowing that the input and output are lists. They could be given names identical to the Iterator versions, to reduce memory load. Any operation that was complicated anyway might as well use the iterator versions and gain efficiency and uniformity across types.
(I'm assuming that comprehension syntax is off the table.)
I think what we have today is better than that version in both ergonomics and explicit performance:
List.mapList.walkBackwardsWithIndex (or whatever), it's maximally explicit what you're doing, and the performance is as good or better than what you'd get with iteratorswhat would a comprehension syntax look like?
Richard Feldman said:
I think what we have today is better than that version in both ergonomics and explicit performance:
- it's more concise and straightforward in the common cases like
List.map- in the uncommon cases where you actually do want
List.walkBackwardsWithIndex(or whatever), it's maximally explicit what you're doing, and the performance is as good or better than what you'd get with iterators
What about the cases where one wants an operation that is best expressed as a sequence of operations? map then keepIf, say, or a longer chain? Here explicit lists take a serious hit.
I also don't like the ergonomics of keeping a combinatorial explosion of List (and dict and set) functions around to handle all the possible forward/reverse, state/no state, Maybe/raw, stopping/no stopping combinations.
Richard Feldman said:
what would a comprehension syntax look like?
In roc? I'm not sure exactly. The usual syntax for list comprehension in other languages writes the comprehension in square brackets, with an expression combined with one or more "for variable in thing" and an "if" clause. Dictionary and set comprehensions would be written using something similarly analogous to their literal syntax; generator comprehensions would require some inventiveness since they have no literal syntax. I don't know if this could be made to work:
list = ["Artist: $(song.artist)" for song in songs]
Anne Archibald said:
Richard Feldman said:
I think what we have today is better than that version in both ergonomics and explicit performance:
- it's more concise and straightforward in the common cases like
List.map- in the uncommon cases where you actually do want
List.walkBackwardsWithIndex(or whatever), it's maximally explicit what you're doing, and the performance is as good or better than what you'd get with iteratorsWhat about the cases where one wants an operation that is best expressed as a sequence of operations? map then keepIf, say, or a longer chain? Here explicit lists take a serious hit.
in practice, these sequences tend to be very short (e.g. map followed by keepIf is a common one), which makes the added conversion to/from an iterator proportionately more of an ergonomics downside
I also don't like the ergonomics of keeping a combinatorial explosion of List (and dict and set) functions around to handle all the possible forward/reverse, state/no state, Maybe/raw, stopping/no stopping combinations.
same here! (Hence the original idea in this thread.) I just think the "convert to/from iterators all the time" design would be a step in the wrong direction ergonomically, even compared to what we have today
regarding the list comprehension, I'm not sure what problem that solves :thinking:
like as written it's syntax sugar for List.map, but the scenario in question is where you want to write a pipeline of multiple operations that iterate, without getting intermediate data structures, and I'm not sure how comprehensions could help with that
I guess with if involved in the syntax it would help in the very specific case of List.map and List.keepIf back to back, but the original idea of special-casing those to always get optimized seems like the same level of special-casing without introducing new syntax and style :big_smile:
maybe another way to say what I like about the original idea is that:
List.map, List.keepIf, List.reverse, and List.withIndex, and writing them in a pipelineList API by removing walkBackwardsWithIndexUntil and such For iterators of the style I implemented, following the research paper, SIMD iterators could just happen - because of the inclusion of the "Skip" state, all the step logic is very simple and loop-free. So an optimizing compiler will produce one loop (driven from toList) combining multiple functions, probably inlined. An aggressive optimizing compiler that knows the types are small enough should already apply SIMD optimizations to that kind of loop, if they are possible.
Optimizing compilers are still significantly less likely to produce any sort of SIMD with iterators compared to regular list operations. Fundamentally, optimizing compilers like llvm are actually really bad at creating SIMD loops unless everything is setup perfectly. One of the most common ways to stop an optmizing compiler from creating a SIMD loop is to make the loop too long/complicate. This is exactly what iterators do. As such, iterators are much less likely to get any sort of SIMD optimization even if it is theoretically possible.
They still might be way faster in a number of situation due to avoiding intermediate allocation, but with resuse during operations like map, explicit lists can be faster.
Here is a simple example:
Code is something like
vals
|> List.map \v -> v + 7
|> List.map \v -> if v < 10 then v - 3 else v
This is what iterators would end up generating if inlined
for v in vals {
*v += 1;
if *v < 10 {
*v -= 3;
}
}
This is what regular list ops would generate:
for v in &mut *vals {
*v += 1;
}
for v in &mut *vals {
if *v < 10 {
*v -= 3;
}
}
In this case, the list version is probably better. It will optimize the first loop to use simd. The second loop will not use simd because it is too complex. With iteraters, no simd will be used at all.
Asm: https://rust.godbolt.org/z/c3Psjoa9T
Also, I am really sad that llvm can't even turn this into simd:
for v in &mut *vals {
let n = if *v < 10 {3} else {0};
*v -= n;
}
I would stick with the absolute most simple API for now, which is automatic list fusion as an optimization in the compiler. The vast majority of operations across all users are likely to be over List. If this is insufficient, you can always implement iterators yourself as a library. If that is also insufficient, you can implement the operation you desire as a for loop using recursion.
If all of that is insufficient, the idea of iterators in the standard library can be revisited, but I think Roc usage is far from seeing this problem right now
Anyway, I think my last post is a side track from the main point.
I think we should really anchor back to a few fundamentals:
List.map and related calls that are creating temporary lists (killing either performance or memory usage). In fact, that should be visible with static compiler analysis.An interesting note, rust has overall done a decent job at avoiding 3. I think this is because rust has so so many container types that have different low level tradeoffs from Vec. As such, many libraries just use iterators. That said, I have seen plenty of code that assumes a specific container and won't take an iterator. I don't think Roc would be this lucky. We have List. There will be no other low level alternative to List that people will be picking between. As such, the split would be much clearer and more stark.
Lastly, I think it is really important to realize that it is generally easy for an end user to write a wrapping function to avoid a lot of the pain points of intermediate allocations. If a user sees a group of list operations that lead to multiple costly intermediate allocations, they can take the group as a whole and do them in a single function. This function would be equivalent to what iterators would generate, but written out explicitly. As such, even today, an end user can get identical results to iterators if they need that performance lever.
Richard Feldman said:
regarding the list comprehension, I'm not sure what problem that solves :thinking:
List comprehensions can serve as syntax sugar for a combination of map, concatMap, and keepIf, eliding explicit calls to those, intermediate construction of lists, as well as the need for an explicit lambda. They really are primarily syntax sugar, but the desugaring can produce functions that express complicated operations without intermediate lists.
Last updated: Jun 16 2026 at 16:19 UTC