Stream: ideas

Topic: Souped up slices


view this post on Zulip Anne Archibald (Feb 20 2024 at 19:27):

I see that roc supports "slices", that is, indexing objects that point to a subset of an existing List's elements. This sounds very like the central idea that produced Python's numpy: you can represent a huge number of data layouts by giving an arena of contiguous memory plus an indexing object that specifies a list of dimensions, a starting point, a "stride" in memory corresponding to increasing that index by one, and a size for each dimension. So if you want an m by n matrix, you could have two dimensions, of size m and n, and with strides n and 1. If you wanted to transpose it, you would leave the data untouched and simply produce a new indexing operator with the sizes and strides reversed. You can easily extract subsets, skip elements, and reverse along any axis. (There are also some anomalous things one can do with "improper" combinations of stride and size, like make a sliding-window matrix that shares elements, but these are perhaps best not encouraged.)

I don't know whether roc has a use for a multidimensional array data type - in particular whether roc is going to have the syntax required to make using one tolerable - but one could be built on top of List. Making things go fast would require calling into compiled code for pure functions (GotoBLAS/ATLAS/vendor-optimized toolkits have cache-optimal loops for various matrix operations) but adequate speed should be achievable with a fairly simple pure roc implementation.

More relevantly, one could have slice objects that took not only a start and length argument, but also a stride argument. Then one could create slices that captured every other element, or more relevantly, one could create a slice that, without copying, simply indexed a list backwards. Then List.reverse() could simply return a view of the original object, no copying required. Opportunistic mutation would require no more complexity than current slices do. (Unless you wanted to optimize the case where only two non-overlapping slices could access the underlying memory.)

Would improving slices in this way be useful?

view this post on Zulip Brendan Hansknecht (Feb 20 2024 at 20:18):

I think it would be hard to do with the current constraint on slices to be seamless with lists and to fit in 3 usizes.

view this post on Zulip Brendan Hansknecht (Feb 20 2024 at 20:19):

So definitely would be interesting and may be worth trying, but might not have the flexibility to make it work without making all lists more expensive (which we would prefer not to do)

view this post on Zulip Brendan Hansknecht (Feb 20 2024 at 20:21):

Also, there has been some discussion on matrix/tensor types in roc, but the stance so far seems to be that first we would want to see a serious userland library to see what the cost of userland truly is before we consider a bespoke syntax in roc. (Cost in terms of ergonomics, perf, full picture)

view this post on Zulip Brendan Hansknecht (Feb 20 2024 at 20:23):

I think it currently falls into an area of interest, but no strong enough driver/use case to consider an implementation.

view this post on Zulip Richard Feldman (Feb 21 2024 at 01:03):

something that's definitely interesting about the idea (maybe as a separate type from List) is that it sounds like maybe it could be used to implement multiple different matrix representations?

view this post on Zulip Richard Feldman (Feb 21 2024 at 01:03):

in userspace I mean

view this post on Zulip Richard Feldman (Feb 21 2024 at 01:04):

I recall that being a concern from the discussions before: there wasn't one in-memory representation of a matrix that made sense for all use cases

view this post on Zulip Brendan Hansknecht (Feb 21 2024 at 07:28):

For sure, high quality matrices often require any tricks around representation and slicing and such.

view this post on Zulip Brendan Hansknecht (Feb 21 2024 at 07:29):

That said, as discussed in other chats. One of the biggest concerns for matrices is probably the lack of custom infix operators to give them nice organics. Currently to get that would require them to be builtin.

view this post on Zulip Brendan Hansknecht (Feb 21 2024 at 07:33):

I think a fancy slice type would be in the same boat. You would want it to just work with list pattern matching. You would want it to just work with the list apis like reverse just setting a bit instead of actually doing any work.

As a concept separate from list it probably isn't too useful. Would only be used by people that really really need it for perf. Also, would lead to similar issues to iterators if it caught on. Split ecosystem.

view this post on Zulip Brendan Hansknecht (Feb 21 2024 at 07:33):

So I think it mostly is valuable as a seamless extension to list. A lot less valuable as a standalone special list wrapper type with extra data.

view this post on Zulip Anne Archibald (Feb 21 2024 at 17:38):

Brendan Hansknecht said:

I think it would be hard to do with the current constraint on slices to be seamless with lists and to fit in 3 usizes.

Yes, I was definitely thinking of an implementation that would be seamless, at least from the roc side. If this isn't possible, then they lose most of their appeal.

view this post on Zulip Anne Archibald (Feb 21 2024 at 17:57):

Richard Feldman said:

I recall that being a concern from the discussions before: there wasn't one in-memory representation of a matrix that made sense for all use cases

This is specifically the idea that strided memory access is designed to solve; it doesn't matter to most code whether the matrix is in row-major or column-major form; or if it's actually one of a stack of many matrices arranged along some other axis. It also defers the computation of transposes or other data reordering operations until they are actually needed.

For a multidimensional array library, reasonable syntax is very important. Python's numpy was able to take advantage of the flexibility of the indexing (square bracket) operator to allow constructions like a[...,::-1] to mean "a reversed along the last axis, regardless of how many axes it has". This can also cope with a[a>0] to select the positive elements of a. You can also write perfectly normal expressions like a+b*c and they work element-wise if you have arrays; including "broadcasting" scalar values to match the matrix ones. For years, though, there was no infix matrix multiplication operator, which forced all matrix multiplication to be written prefix (dot(a, dot(b,c))). This was a major point of contention, and there was enough demand for it to push the new @ operator into the syntax of the language.

The other thing any matrix library would need would be the ability to call into compiled (non-roc) code from pure functions. While you can implement matrix multiplication with loops, including strides, it will be grossly inefficient. So will matrix transposes. The BLAS library exists simply to make this sort of very simple operation efficient (it was created to speed up C and FORTRAN code). LAPACK extends this approach to highly non-trivial procedures like LU decomposition and singular value decomposition. Modern BLAS/LAPACK implementations are usually highly optimized (often by chip vendors) to be fast with specific cache layouts and architectures. Many also use OpenMP or other mechanisms to enlist multiple cores for large matrix operations, giving people doing heavy linear algebra work "free" parallelization. There is no way these libraries will ever be replicated in pure roc and there is no way linear algebra/matrix operations will be even close to as fast without them. Linear algebra operations often live at the core of complex algorithms, and many of them are pure functions, or can be run in-place if the inputs are no longer needed. I do not think that forcing users to include apparent side effects in code that does not have any, just so they can multiply two matrices, will produce a tolerable user experience. As I understand it this is a constraint that roc intends to enforce: no compiled code outside the compiler can be treated as pure.

Finally, a really major application of multidimensional matrix types is in machine learning, where the actual matrix data often lives elsewhere than the heap; in GPU RAM, or distributed across a cluster, or otherwise in unusual places. Accessing this necessarily requires third-party tools. Some of the interfaces these third-party tools provide would naturally be pure, but others build side-effects into their way of working. It is not clear how or if such tools can coexist with a functional style of code.


Last updated: Jun 16 2026 at 16:19 UTC