Fusion without Iter · ideas · Zulip Chat Archive

Stream: ideas

Topic: Fusion without Iter

Ayaz Hafiz (Sep 19 2024 at 01:14):

Cool proposal. Glad this is being discussed.

One thought I have is I think this will actually increase compile times much more than it may seem, and the optimization is not guaranteed. This optimization problem is exactly the kind that typically requires a lot of compute work. The reason I say this is because at its core, the optimization relies on the user writing a lot of IR with intermediate calls, and having some batch optimization later on aggregate and cancel out the intermediates.

I wonder if there's a way to achieve the goals here without generating a ton of intermediate code that then needs to be thrown away. One idea would be to treat List as the common denominator, and have an analysis that tracks direct calls to List.* functions in a sequence. Then, if there is a region of code that only contains sequenced calls to List.* functions, you know you can inline those directly, and specialize sequenced List.map/etc. without having to traverse big chains of code that you need to eliminate or consider inlining - the directive for what to do is defined explicitly by the analysis.

Ayaz Hafiz (Sep 19 2024 at 01:20):

(the reason this is so significant is this is exactly why LLVM or other optimizers can be pathologically slow - if you feed them a ton of code to be optimized away, they will do so but take a very long time. if it's possible to feed optimal code earlier on, you win compile time)

Richard Feldman (Sep 19 2024 at 01:28):

Ayaz Hafiz said:

One thought I have is I think this will actually increase compile times much more than it may seem, and the optimization is not guaranteed. This optimization problem is exactly the kind that typically requires a lot of compute work. The reason I say this is because at its core, the optimization relies on the user writing a lot of IR with intermediate calls, and having some batch optimization later on aggregate and cancel out the intermediates.

just to check my understanding - is the extra IR from intermediate calls from people writing |> List.map fn1 |> List.map fn2 where otherwise they might have just combined them into one map call?

Richard Feldman (Sep 19 2024 at 01:29):

or the fact that each of those map calls would expand to 3 calls to Iter.fromList, Iter.map, and then Iter.toList? (whereas today they'd just be one call)

Richard Feldman (Sep 19 2024 at 01:30):

or I guess both, since either compounds the other

Richard Feldman (Sep 19 2024 at 01:31):

Ayaz Hafiz said:

I wonder if there's a way to achieve the goals here without generating a ton of intermediate code that then needs to be thrown away. One idea would be to treat List as the common denominator, and have an analysis that tracks direct calls to List.* functions in a sequence. Then, if there is a region of code that only contains sequenced calls to List.* functions, you know you can inline those directly, and specialize sequenced List.map/etc. without having to traverse big chains of code that you need to eliminate or consider inlining - the directive for what to do is defined explicitly by the analysis.

yeah I definitely think this is a situation where there would be an extreme power law distribution in play

Richard Feldman (Sep 19 2024 at 01:31):

like I would not be at all surprised if in practice it turned out to be 95% of the time it's specifically a List higher-order function in a pipeline with 1+ other List higher order functions

Richard Feldman (Sep 19 2024 at 01:32):

and "everything else" accounts for 5% of the scenarios where extra work is done compared to today

Richard Feldman (Sep 19 2024 at 01:32):

and yeah, those would be straightforward to detect prior to inlining even

Ayaz Hafiz (Sep 19 2024 at 02:00):

or the fact that each of those map calls would expand to 3 calls to Iter.fromList, Iter.map, and then Iter.toList? (whereas today they'd just be one call)

yes, this. and the fact that is can be deeply nested, hence making the inline analysis hard

Ayaz Hafiz (Sep 19 2024 at 02:02):

yeah I definitely think this is a situation where there would be an extreme power law distribution in play

yes, this is why it might make sense to do a simpler heuristic based just on list. I think in practice you're going to want to convert to a list-like structure, or represent your data structure as a list under the hood anyway.

Richard Feldman (Sep 19 2024 at 02:23):

as in don't do optimization on Iter at all?

Richard Feldman (Sep 19 2024 at 02:24):

or as in like don't implement data structure operations in terms of Iter

Ayaz Hafiz (Sep 19 2024 at 03:51):

use List as the basic unit for supporting fusion rather than Iter

Brendan Hansknecht (Sep 19 2024 at 04:17):

We've discussed fusion a lot in the past, personally, I think it is likely to be much more finicky than iterators. I think this is something that likely needs to be explicit. An iterator over a range of numbers needs to never allocate a full list.

Brendan Hansknecht (Sep 19 2024 at 04:17):

Using explicit iterators (even if nothing inlines) avoids the intermediate allocations

Brendan Hansknecht (Sep 19 2024 at 04:18):

I do also agree that it will make a crazy chain of lambdas that llvm will have to deal with unwinding and inlining.

Sam Mohr (Sep 19 2024 at 05:24):

Maybe we could make iterators easier to start and stop, that might make it just as desirable to use, which seems to be a solution here

Sam Mohr (Sep 19 2024 at 05:24):

(more keywords, basically)

Sam Mohr (Sep 19 2024 at 05:24):

I'm AFK at the moment, I can brainstorm a bit at home

Sam Mohr (Sep 19 2024 at 05:25):

If we were able to make iterators just as easy to use in 90% or more of situations, we'd be happy to use those, right?

Sam Mohr (Sep 19 2024 at 05:26):

Basically, if we could come up with some syntax to start and collect iterators, that seems to be a solution here

Sam Mohr (Sep 19 2024 at 05:26):

At which point, we could radically remove builtins like List.map and List.filter

Sam Mohr (Sep 19 2024 at 05:38):

It feels like a stupid question, but could someone tell me why the reason we want fusion is anything beyond "iterators are more effort to use"?

Sam Mohr (Sep 19 2024 at 05:39):

Considering we have size hints meaning we get the same allocations and all

Sam Mohr (Sep 19 2024 at 10:15):

I'm messing with the best I can do with some magic iter <list> keyword. It will take some brainpower, but again, if the limit really is ergonomics, that seems like a much more reasonable battle to fight than manual inlining and fusion.

Sam Mohr (Sep 19 2024 at 10:52):

Current thought is something like this, using a new iter block:

iter items as List
    Iter.map \x -> x + 2
    |> Iter.keepIf \x -> x % 2 == 0

# desugars to

items
    |> List.toIter
    |> Iter.map \x -> x + 2
    |> Iter.keepIf \x -> x % 2 == 0
    |> List.fromIter

for collecting to a different type

iter items as List to Personal.Dict
    Iter.map \x -> x + 2

# desugars to

items
    |> List.toIter
    |> Iter.map \x -> x + 2
    |> Personal.Dict.fromIter

It's a line longer for the standard case, but it's plainly readable and seems to sidestep the need for alternatives because we would not support List.map, just List.toIter and List.fromIter

Notification Bot (Sep 19 2024 at 11:22):

28 messages were moved here from #ideas > Iterators and fusion proposal by Richard Feldman.

Richard Feldman (Sep 19 2024 at 11:23):

Ayaz Hafiz said:

use List as the basic unit for supporting fusion rather than Iter

yeah, I was really into that idea for awhile, but it turned out to have a number of problems.

Richard Feldman (Sep 19 2024 at 11:24):

one category of problems was around custom data structures

Richard Feldman (Sep 19 2024 at 11:50):

for example, if I make a Tree data structure and I give it a toList, I'm not sure how fusion could work on tree |> Tree.toList |> List.map fn1 |> List.map fn2 etc

Richard Feldman (Sep 19 2024 at 11:51):

because converting the tree to a List isn't the same as converting it to the Iter data structure (which is set up to be able to do these transformations) - and I think it would be really hard to detect that and make it work

Richard Feldman (Sep 19 2024 at 11:52):

one way to address that would be to offer Iter.custom but have it be List.custom intead

Richard Feldman (Sep 19 2024 at 11:52):

but then we have the problem that if you have a List in general, you're assuming it has certain properties (e.g. seamless slices) but actually sometimes it might be a "secret iterator" created with List.custom which doesn't have the perf characters of an actual List

Richard Feldman (Sep 19 2024 at 11:54):

another design direction is to say "okay custom data structures just don't benefit from fusion"

Richard Feldman (Sep 19 2024 at 11:55):

but the problem there is that now it becomes a reasonable style to do things like |> List.map fn1 |> List.map fn2 until you make a custom data structure, at which point it's like "oh yeah don't do the normal style at all or else your performance will be terrible - switch to loops/folds/etc"

Richard Feldman (Sep 19 2024 at 11:55):

which doesn't seem like a good outcome :sweat_smile:

Richard Feldman (Sep 19 2024 at 11:56):

so I wonder about the design direction of having Iter but sort of inverting the default :thinking:

Richard Feldman (Sep 19 2024 at 11:57):

in other words, don't have the builtins implemented in terms of Iter, but give them toIter and fromIter

Richard Feldman (Sep 19 2024 at 11:57):

and then do these optimizations:

Iter fusion
separate, special-cased fusion on the builtins (so, List, Dict, Set)

Richard Feldman (Sep 19 2024 at 11:58):

Richard Feldman (Sep 19 2024 at 12:17):

Sam Mohr said:

It feels like a stupid question, but could someone tell me why the reason we want fusion is anything beyond "iterators are more effort to use"?

the way I think about this is that there are a few reasonable designs possible:

Status quo ("don't do things like |> List.map fn1 |> List.keepIf fn2 because it will allocate an intermediate list") - downside is that you sometimes have to choose between code styles that feel natural in a functional language, are easy to read and understand, but don't have good perf bc of intermediate allocations
Always convert to Iterexplicitly - make super common operations like List.map ~3x as verbose, to increase code consistency while avoiding intermediate allocations. This design makes less sense to me than status quo. The pitch sounds to me like "let's facilitate writing code in a nice concise functional style without sacrificing perf...by making it so that it's always really verbose to do basically anything in a functional style." :sweat_smile:
Fusion

Richard Feldman (Sep 19 2024 at 12:19):

there's another possible direction, which is essentially "introduce the concept of ability parameters, have an Iter elem abilitiy, give it to List", and then you can write:

foo
|> Iter.map fn1
|> Iter.keepIf fn2

where foo is a List.

Richard Feldman (Sep 19 2024 at 12:22):

some downsides of that design:

compared to status quo, |> Iter.map is no more concise than |> List.map but it's less self-documenting. I had to tell you that foo was a list because you didn't have the extra context that |> List.map provides.
it might be even worse for compiler perf because I think we still have about the same amount of IR as a naive Iter value implementation, but we also have a lot of extra ability resolution all over the place
calling style gets more fragmented, e.g. it's |> Result.map if you have a Result, |> Parser.map if you have a Parser, but |> Iter.map if you have a List
it requires the extra language feature of parameterized abilities

Ayaz Hafiz (Sep 19 2024 at 14:03):

I think you can make the List approach work if you do the fusion all at once; there is no need for intermediate “list but secretly an iterator”. And the case of custom data structures can be solved by specialized the toList constructor around fusion.

There are other ways here to avoid potentially blowing up the amount of code in user space, which is the larger thing I’m trying to get at, for example by using external iterators. Happy to chat more Friday

Ayaz Hafiz (Sep 19 2024 at 14:09):

it might be even worse for compiler perf because I think we still have about the same amount of IR as a naive Itervalue implementation, but we also have a lot of extra ability resolution all over the place

This would very likely be better! The reason being the typechecking happens before any specialization so you have less code to check up front, the typechecking can direct you to exactly what code to inline and simplify, and you have much less code to delete because there are less trivially discardable fromIter/toIter in sequence.

+1 on it probably not being the right approach for ergonomic reasons

Ayaz Hafiz (Oct 07 2024 at 02:27):

https://arxiv.org/pdf/2410.02232

Richard Feldman (Oct 07 2024 at 15:48):

so if I understand this correctly, it seems like:

it's a way to do fusion based on a separate type inference pass where you can specify your own primitive types and unification rules, meaning you could (for example) special-case all the builtins like List.map
you no longer need the "match and eliminate fromIter/toIter pairs" pass, so you also don't need inlining to happen first - which could be beneficial for dev build performance if we wanted to try to get fusion without having to do inlining (although it seems like inlining is still desired after this fusion pass for optimized builds, in order to get rid of some of the thunks this strategy creates)
it works on more than just iterators

does that all sound right? Did I miss anything?

Ayaz Hafiz (Oct 07 2024 at 20:12):

yes i think so

Richard Feldman (Oct 07 2024 at 22:08):

very cool! sounds more promising than the previous state of the art

Richard Feldman (Jan 09 2025 at 18:53):

I think based on :point_up: it probably makes sense to hold off on doing an Iter API for right now. We can have for loops work on List for now, and if you want to iterate over something other than a List, you have to convert it to a List first

Richard Feldman (Jan 09 2025 at 18:53):

obviously that will be bad for performance in the immediate term, but it does mean you have a way to use for loops on anything if you really want to, and if the perf is too bad you can fall back on the pre-for strategies

Richard Feldman (Jan 09 2025 at 18:53):

(for list perf it'll be fine, of course)

Richard Feldman (Jan 09 2025 at 18:54):

because it seems like - especially in light of that paper - it might actually be possible to get the benefit of Iter without actually introducing a separate Iter type, and instead just using List for everything

Richard Feldman (Jan 09 2025 at 18:54):

this would be really nice from an API perspective, but it feels like something that needs a lot more investigation and experimentation, and I don't want to block for on that

Richard Feldman (Jan 09 2025 at 18:55):

I also don't want to build and introduce an Iter API that we later end up removing as a big breaking change

Sam Mohr (Jan 09 2025 at 18:57):

And if for item in list relies on some static dispatch a.iter() -> Iter elem then there won't be much of a migration step if we move to Iter later

Richard Feldman (Jan 09 2025 at 19:01):

hm, I'm not sure that would work in the current type system

Sam Mohr (Jan 09 2025 at 19:01):

Because no HKT?

Richard Feldman (Jan 09 2025 at 19:01):

yeah

Sam Mohr (Jan 09 2025 at 19:02):

I'll have to bug @Ayaz Hafiz about that special set of HKT that we'd be able to support with principle type inference

Richard Feldman (Jan 09 2025 at 19:02):

although if we did end up having List be the intermediary, then they'd all be doing the right thing already anyway

Richard Feldman (Jan 09 2025 at 19:02):

Sam Mohr said:

I'll have to bug Ayaz Hafiz about that special set of HKT that we'd be able to support with principle type inference

we definitely can do that, I just don't think we should

Sam Mohr (Jan 09 2025 at 19:03):

If we move to Iter instead of List, then we'd need something like that. Otherwise we'd be good with what you suggest.

Richard Feldman (Jan 09 2025 at 19:04):

well or just call list.to_iter() explicitly, which I think would be totally fine :big_smile:

Sam Mohr (Jan 09 2025 at 19:07):

Yep

Last updated: Jul 23 2026 at 13:15 UTC