SIMD API · ideas · Zulip Chat Archive

Instead of a constant you could repeat some of the real data. So if you have 3 leftover items at the end then you could repeat the last one to get 4. Then it won't throw a divide-by-zero unless it would have done so anyway.

Richard Feldman (Apr 23 2022 at 20:52):

interesting! Algorithms where each chunk references the others could still cause problems there, but it seems like that would be a super rare use case

Folkert de Vries (Apr 23 2022 at 20:55):

Richard Feldman (Apr 23 2022 at 20:58):

I don't think any of the "fill in leftovers automatically" designs can work well with that :big_smile:

Richard Feldman (Apr 23 2022 at 21:00):

Jared Cone (Apr 23 2022 at 22:33):

User-provided fill value may be necessary. The value may need to be different depending on the operation. Adds a bit of complexity for the initial implementer but most people would probably just use functions from a library of functions that hide this detail. One downside is it may require duplicating the array if the array has more than one reference and the specified fill value doesn't match what's already there?

Jared Cone (Apr 23 2022 at 22:39):

Might want a version where user can declare it doesn't matter what the fill value is, the operation doesn't care

Brendan Hansknecht (Apr 23 2022 at 23:13):

Don't you just not use simd on the last iteration of the loop if it isn't a multiple of the lanes?

Thomas Dwyer (Apr 23 2022 at 23:19):

Awesome proposal! The addition of tuples, should this proposal pass, sounds great! On the handling of overflow, giving the user choice between garbage fill and defined fill seems preferable. just wondering, the last bullet on #3 under proposals shows Simd having kind * -> * but doesn't it have kind * -> * -> * for defining the number of lanes? Not being too knowledgeable about simd beyond basics, is it not possible to supply a function a -> a and automatically fit that onto all lanes so that it can be used for both the simd operations and the leftovers?

Jared Cone (Apr 23 2022 at 23:57):

Brian Carroll (Apr 24 2022 at 00:21):

Brian Carroll (Apr 24 2022 at 00:26):

Does that still count as "single instruction multiple data"? (Sounds to me like it shouldn't but maybe CPU makers see it differently?)

Folkert de Vries (Apr 24 2022 at 00:41):

Folkert de Vries (Apr 24 2022 at 00:42):

so it fits in the "works on 256/512-bit registers" category, which we typically call simd I guess but yeah that is not entirely accurate

Richard Feldman (Apr 24 2022 at 01:04):

Richard Feldman (Apr 24 2022 at 01:09):

this would be equivalent to having map use SIMD under the hood; the problem is that SIMD instructions only sometimes have equivalents in non-SIMD instructions, and vice versa

Richard Feldman (Apr 24 2022 at 01:12):

so sometimes they can be automated - e.g. we can offer a List.mul that silently uses SIMD under the hood, and handles leftovers in this way, and sometimes LLVM can automatically SIMDify certain loops (this is known as "autovectorization") - but it can only do it for pretty basic examples.

The point of having a function like this would be to give you the power to do operations that are complex enough (could be as simple as doing one instruction or another based on a conditional) that autovectorization wouldn't be able to figure out how to correctly translate them from non-SIMD to SIMD instructions.

Richard Feldman (Apr 24 2022 at 01:32):

yeah they have things like swizzle which reorders the lanes in a vector, and also things like masks, "give me the min/max of all the numbers in the lanes" etc.

I don't think any of those operations would make sense to use in a map (as opposed to a walk that accumulates state) but there's nothing stopping someone from using those operations if they decide to :big_smile:

Richard Feldman (Apr 24 2022 at 01:32):

I agree! Great link @Jared Cone - I think the overlapping strategy is the most promising one so far

Martin Stewart (Apr 24 2022 at 15:27):

Is the introducing tuples necessary for this? Doesn't using T2, T3, T4 tag unions work equally well? For example, instead of

matrix = Matrix4x4.new
    ((5, 4, 7, 8),
     (9, 0, 1, 2),
     (3, 4, 5, 6),
     (7, 8, 9, 0))

matrix = Matrix4x4.new
    (T4 5 4 7 8)
    (T4 9 0 1 2)
    (T4 3 4 5 6)
    (T4 7 8 9 0)

Richard Feldman (Apr 24 2022 at 15:34):

Martin Stewart (Apr 24 2022 at 15:43):

Wouldn't matrix4x4 define getter functions? Matrix4x4.row1column2 matrix or something short like Matrix4x4.v12 matrix, though I kind of like the first one because I always get row and column mixed up when working with matrices.

Nikita Mounier (Apr 24 2022 at 16:37):

To me tuples seem like a natural addition to the language, and I like how you define them on top of records which will definitely reduce implementation complexity.

Richard Feldman (Apr 24 2022 at 16:38):

it could, but then you also need a separate set of them for Matrix4x4 and Matrix3x3 and Matrix4x3 and Matrix3x4, all of which come up in games - so a total of 49 getter functions

Richard Feldman (Apr 24 2022 at 16:39):

edit: also you'd want setter functions for all those positions too, so it's either tuples or +112 functions for known use cases games have

Richard Feldman (Apr 24 2022 at 16:39):

that's totally a plausible API, but I was on the fence about tuples before, and this pushes me over to "yeah they're worth it"

Martin Stewart (Apr 24 2022 at 16:55):

Forcing the API author to write getters/setters for the matrices nudges them towards giving useful names that say whether the matrix is row major or column major. (toTuple matrix).1.2 doesn't tell the end user if row or column is accessed first. To me the time saved by one person not needing to write the getters and setters will be lost later by people needing to slow down and carefully think through this (or not realizing it and potentially introducing a bug)

Edit: To clarify why I think row/column are easily mixed up. I know that matrices tend to be row then column but anyone new to matrices might think xy coordinates in which case column would come first.

Jared Cone (Apr 24 2022 at 17:16):

API authors could still write those getters and setters even if underlying is a tuple. Though I think adding a language feature should always be approached with caution. What would the ergonomics be with using tags instead of tuples? Is it inconvenient to access those elements?

Brendan Hansknecht (Apr 24 2022 at 17:23):

Tags are quite inconvenient for accessing a specific element in general. You have to pattern matching the tag to get the element. Just ends up being quite verbose

Brendan Hansknecht (Apr 24 2022 at 17:25):

Something like this:
(Matrix4x4 _ (T4 _ _ elem _) _ _) = m
is equivalent to:
elem = m.1.2

Richard Feldman (Apr 24 2022 at 17:45):

I certainly don't think tuples are necessary for this, but I also certainly think they're nicer; I definitely prefer the version without the T tags! :big_smile:

Richard Feldman (Apr 24 2022 at 17:47):

to be fair, if not for the fact that roc format would jumble it up, you could also theoretically have new take 16 arguments:

matrix = Matrix4x4.new
     5 4 7 8
     9 0 1 2
     3 4 5 6
     7 8 9 0

Richard Feldman (Apr 24 2022 at 18:06):

I also think it's reasonable to want a Matrix API that exposes 100+ functions for getters and setters, to make it harder to mix up row and column, but I also think it's reasonable to prefer polymorphic getters and setters and module documentation that doesn't primarily consist of 100+ getter and setter functions.

Richard Feldman (Apr 24 2022 at 18:14):

as an aside, I've run into wanting tuples in both application design and API design often enough that I'm convinced they will end up in the language eventually

Richard Feldman (Apr 24 2022 at 18:16):

if it's not going to be because of matrices and vectors, I expect they'd end up being worth it for other reasons anyway

Kevin Gillette (Apr 25 2022 at 14:40):

List.mapSimd List a, Simd.Policy (Simd a lanes -> Simd b lanes) -> List b

[
  Panic,
  CopyAny # likely equivalent to CopyLast
  CopyLast # repeat last element as needed
  CopyFirst
  FillConst a
  FillFunc (Nat -> a) # based on destination index (past end of list)
]

Stream: ideas

Topic: SIMD API

Richard Feldman (Apr 23 2022 at 16:22):

Brian Carroll (Apr 23 2022 at 20:11):

Brian Carroll (Apr 23 2022 at 20:12):

Richard Feldman (Apr 23 2022 at 20:52):

Folkert de Vries (Apr 23 2022 at 20:55):

Richard Feldman (Apr 23 2022 at 20:58):

Richard Feldman (Apr 23 2022 at 21:00):

Jared Cone (Apr 23 2022 at 22:33):

Jared Cone (Apr 23 2022 at 22:39):

Brendan Hansknecht (Apr 23 2022 at 23:13):

Thomas Dwyer (Apr 23 2022 at 23:19):

Jared Cone (Apr 23 2022 at 23:57):

Brian Carroll (Apr 24 2022 at 00:21):

Brian Carroll (Apr 24 2022 at 00:26):

Folkert de Vries (Apr 24 2022 at 00:41):

Folkert de Vries (Apr 24 2022 at 00:41):

Folkert de Vries (Apr 24 2022 at 00:42):

Richard Feldman (Apr 24 2022 at 01:04):

Richard Feldman (Apr 24 2022 at 01:09):

Richard Feldman (Apr 24 2022 at 01:12):

Richard Feldman (Apr 24 2022 at 01:32):

Richard Feldman (Apr 24 2022 at 01:32):

Martin Stewart (Apr 24 2022 at 15:27):

Richard Feldman (Apr 24 2022 at 15:34):

Richard Feldman (Apr 24 2022 at 15:34):

Martin Stewart (Apr 24 2022 at 15:43):

Nikita Mounier (Apr 24 2022 at 16:37):

Richard Feldman (Apr 24 2022 at 16:38):

Richard Feldman (Apr 24 2022 at 16:39):

Richard Feldman (Apr 24 2022 at 16:39):

Martin Stewart (Apr 24 2022 at 16:55):

Jared Cone (Apr 24 2022 at 17:16):

Brendan Hansknecht (Apr 24 2022 at 17:23):

Brendan Hansknecht (Apr 24 2022 at 17:25):

Richard Feldman (Apr 24 2022 at 17:45):

Richard Feldman (Apr 24 2022 at 17:47):

Richard Feldman (Apr 24 2022 at 18:06):

Richard Feldman (Apr 24 2022 at 18:14):

Richard Feldman (Apr 24 2022 at 18:16):

Kevin Gillette (Apr 25 2022 at 14:40):