I've been chatting with @Johan Lindskogen and @Ryan Bates on this GitHub issue about how best to represent matrix indices: https://github.com/mulias/roc-array2d/issues/5
Most of the necessary context is in the thread, but in case it's helpful, the current implementation uses `Index : { x: Nat, y: Nat }`, and all of the array functions treat `x` as the row component and `y` as the column component of the index. As Johan points out, that's not always going to match the user's expectations.
I think we've reached a point where I need to experiment with some options and see which ones make the most readable Roc code, but I thought I'd bring the discussion over here to give people with different perspectives and backgrounds a chance to chime in. In particular, I'm interested in advice from people with data science or array programming experience, since I have done very little of either!
Generally, I think row and column are more intuitively clear to people
I've got data science experience :)
I like `Index : { row: Nat, col: Nat }` (4) for its clarity. Algorithms that use multi-dimensional arrays can very easily become hard to follow, so I think (4) mitigates this best.
I've done a little bit of data science and computer graphics and I was never sure which one is which. I think generally `y` is the rows and `x` is the columns, so you index like this: `array[y][x]`. But that can differ between use cases (a table of records vs. an image) and libraries. I think row and column would clear that up, but then you still need to remember which one is the "outer" one (this is often important for performance).
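To make the `array[y][x]` convention concrete, here is a minimal Python sketch using nested lists. The `grid` name and its values are made up for illustration, and this is not how roc-array2d stores its data; it only shows the outer index selecting the row and the inner index selecting the column:

```python
# A 2x3 grid stored as a list of rows (illustrative values only).
grid = [
    [10, 11, 12],  # y = 0
    [20, 21, 22],  # y = 1
]

y, x = 1, 2
value = grid[y][x]  # outer index picks the row, inner index picks the column

height = len(grid)     # number of rows (extent of y)
width = len(grid[0])   # number of columns (extent of x)
```

Writing `grid[x][y]` here would fail or silently read the wrong cell, which is the kind of confusion that field names are meant to prevent.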
Yes, that's a good point. I'm currently using row-major ordering: https://en.wikipedia.org/wiki/Row-_and_column-major_order
My understanding is that some APIs will let you choose your data order. I've thought about that a little bit but it's fiddly.
Maybe `Index : { outer: Nat, inner: Nat }` could work? That's more explicit and makes fewer assumptions, but it's not widely used, so it can be confusing. Although, IMO, it's better to have to think about it than to be confused about what `x` or `row` means in a given context.
Interesting. I think the issue with that is `outer` could be either the row or the column, depending on whether the data is stored in row-major or column-major order. I'll keep thinking about it though.
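As a rough illustration of why "outer" depends on the storage order, here is a small Python sketch that maps the same logical (row, col) position into a flat buffer under both layouts. The function names are hypothetical and not part of roc-array2d:

```python
def row_major_offset(row, col, n_rows, n_cols):
    """Flat offset when each row is contiguous (row is the outer index)."""
    return row * n_cols + col


def col_major_offset(row, col, n_rows, n_cols):
    """Flat offset when each column is contiguous (column is the outer index)."""
    return col * n_rows + row


# The same logical 2x3 matrix [[1, 2, 3], [4, 5, 6]] stored both ways.
n_rows, n_cols = 2, 3
row_major = [1, 2, 3, 4, 5, 6]
col_major = [1, 4, 2, 5, 3, 6]

# Both lookups address the element at row 1, column 2.
a = row_major[row_major_offset(1, 2, n_rows, n_cols)]
b = col_major[col_major_offset(1, 2, n_rows, n_cols)]
```

With `{row, col}` the caller's code stays the same under either layout; with `{outer, inner}` the meaning of each field would change along with the storage order.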
I feel like what `row` and `column` mean depends on the context and maybe on how you get your data. If you load a table from a CSV file it's pretty unambiguous, but in other cases not so much. If you transpose an image, rows are now columns and columns are now rows, but they're still arrays of pixels. When you render an image to the screen, you have a "contract" with the library/API/whatever that inner arrays will go horizontally. But it's not a "property" of the data.
But maybe that's too philosophical/abstract and not productive :sweat_smile:
Sure. We need to start with something that people can build their mental model off of, and I think it's pretty well understood that columns are the uppy-downies and rows are the side-to-sidies.
If you're only interested in sticking with 2D arrays, I'd suggest `{row, col}`. If you're planning to go to higher dimensions later on, I'd honestly go with the tuple option, and expose helpers like `dim0`, `dim1`, ... to extract the position in a dimension.
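For what the tuple option might look like, here is a minimal Python sketch. The `Index` alias and the `dim` helper are assumptions for illustration; only the names `dim0` and `dim1` come from the suggestion above:

```python
from typing import Tuple

# Hypothetical N-dimensional index: one position per dimension.
Index = Tuple[int, ...]


def dim(index: Index, n: int) -> int:
    """Return the position along dimension n."""
    return index[n]


def dim0(index: Index) -> int:
    """Position along the first dimension."""
    return dim(index, 0)


def dim1(index: Index) -> int:
    """Position along the second dimension."""
    return dim(index, 1)


idx2d: Index = (3, 7)
idx3d: Index = (1, 2, 3)
```

The helpers keep call sites readable without tying the index type to a fixed number of dimensions.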
No solid plans, but I'm considering 1d with an integer index, 2d with `{row, col}`, 3d with `{row, col, depth}`, and Nd with a tuple. I don't think that the index types being different would be too big of a deal.
Wouldn't you prefer to use column-major for "scientific" purposes? I think every scientific-purpose language or library I've used (R, Matlab, Julia, Eigen/BLAS/LAPACK, etc.) other than NumPy (although it has `order='F'`) uses it by default or allows you to modify it. For math/physics/science books in general it's standard, but computer graphics textbooks said "nah"... Although in the end it doesn't really matter that much; it's more a matter of preference and consistency.
Last updated: Jul 06 2025 at 12:14 UTC