Stream: ideas

Topic: buffer protocol


view this post on Zulip Brendan Hansknecht (Sep 30 2022 at 15:47):

So I have been messing around with using Python and C++ together recently. Of course, a major problem is avoiding copying data around, especially with multidimensional array-like types. Python exposes a buffer protocol that is essentially a data pointer, size of element, element type, number of dimensions, size of each dimension, and strides in bytes. This makes it easy to share data buffers of any shape while avoiding copying.
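To make the fields listed above concrete, here is a minimal sketch using only the standard library's `memoryview`. The array contents and the helper `element_offset` are illustrative assumptions, not something from the thread; the point is only that shape and strides describe the same bytes without copying them.

```python
import array
import sys

# A flat, contiguous buffer of six C ints (typecode "i").
flat = array.array("i", [0, 1, 2, 3, 4, 5])

# View the same memory as raw bytes, then as a 2x3 array. No bytes are copied.
raw = memoryview(flat).cast("B")
view = raw.cast("i", shape=[2, 3])

# These attributes correspond to the buffer protocol fields:
# element type, element size, number of dimensions, shape, strides in bytes.
print(view.format, view.itemsize, view.ndim, view.shape, view.strides)


def element_offset(strides, index):
    """Byte offset of an element: sum of index[k] * strides[k] (hypothetical helper)."""
    return sum(s * i for s, i in zip(strides, index))


# Writing through the original array is visible through the view,
# because both refer to the same memory.
flat[5] = 42

# Locate element [1, 2] by hand from the strides and decode it from the raw bytes.
offset = element_offset(view.strides, (1, 2))
value = int.from_bytes(raw[offset:offset + view.itemsize], sys.byteorder, signed=True)
```

Here `value` and `view[1, 2]` read the same element, one via manual stride arithmetic and one via the view's own indexing.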

I know that we eventually plan to add seamless slices, and we believe that minor abuses of seamless slices should enable passing a buffer from host to Roc without copying. I am wondering if there would be some merit in digging into a more structured/full-featured buffer protocol.

Of course we have some complications around mutability and garbage collection, but performance of platform to app communication can be super important when talking about large amounts of data.

In the case of performance critical code, it would be amazing to pass in a unique buffer owned by the host, have a way to ensure roc only does in place transformations on the buffer, and profit from never copying.

Anyway, here is an article talking some about the buffer protocol: https://pybind11.readthedocs.io/en/stable/advanced/pycpp/numpy.html#buffer-protocol

view this post on Zulip Richard Feldman (Sep 30 2022 at 16:14):

I might be missing something, but doesn't the current seamless slices design (with the changes noted in the issue itself) cover that use case?

view this post on Zulip Richard Feldman (Sep 30 2022 at 16:15):

you have to allocate a refcount somewhere for the buffer, but I don't think that's avoidable for Roc's automatic memory management to work

view this post on Zulip Richard Feldman (Sep 30 2022 at 16:15):

but if that refcount is 1, you'd still get in-place mutation
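A toy model of the "refcount of 1 means in-place mutation" idea, assuming a hypothetical `RefCounted` wrapper and `set_item` helper. This is not Roc's runtime, only a sketch of the rule: mutate when unique, copy when shared.

```python
class RefCounted:
    """Toy stand-in for a refcounted buffer (hypothetical, not Roc's real layout)."""

    def __init__(self, items):
        self.items = list(items)
        self.refcount = 1


def share(buf):
    """Simulate a second reference to the same buffer."""
    buf.refcount += 1
    return buf


def set_item(buf, index, value):
    """Return a buffer whose items[index] == value, reusing buf when it is unique."""
    if buf.refcount == 1:
        # Unique owner: nobody else can observe the change, so mutate in place.
        buf.items[index] = value
        return buf
    # Shared: this caller gives up its reference and works on a private copy.
    buf.refcount -= 1
    copy = RefCounted(buf.items)
    copy.items[index] = value
    return copy
```

With a single reference, `set_item` hands back the very same object; after `share`, it hands back a copy and leaves the original contents untouched.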

view this post on Zulip Brendan Hansknecht (Sep 30 2022 at 16:26):

I guess it does mostly cover it. It doesn't have any extra information around shapes and whatnot, but I guess that can be done in userland.

I also think that something I keep coming back to is wanting a Unique type level tag that at compile time ensures you never copy.

In this case for example, you really want an inout parameter. As such, refcounting doesn't really make sense. If you try to free the buffer, there will be problems. So yes it is unique and roc can edit it, but no, roc can't ever free it. Also, roc should edit it in place because we don't want to return anything. Just modify the buffer. So I guess if we had a return type, we would want to be able to assert that we return the exact same buffer that came in.

view this post on Zulip Brendan Hansknecht (Sep 30 2022 at 16:26):

Also, I think that is something we missed with seamless slices: they likely can't have a refcount of 1, because if roc frees the slice, it would fail.

view this post on Zulip Richard Feldman (Sep 30 2022 at 21:40):

hm, that's true

view this post on Zulip Richard Feldman (Sep 30 2022 at 21:41):

although the host can say "this has a refcount of 1, so go ahead and free it, but in my roc_dealloc implementation I'll look up the pointer and see that it's actually still in use, and so make dealloc be a no-op for this particular pointer"
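A rough model of that lookup, written in Python with `ctypes` addresses standing in for raw pointers. The names `host_owned`, `freed`, and `lend_to_roc` are hypothetical, and the Python `roc_dealloc` here only models the decision; it is not the real host function or its signature. It also assumes the pointer passed to dealloc is the same address the host registered.

```python
import ctypes

# Addresses of buffers the host still owns (hypothetical registry).
host_owned = set()

# Addresses that were actually released, recorded only for illustration.
freed = []


def lend_to_roc(buf):
    """Register a host buffer and return its address, as it would be handed to Roc."""
    addr = ctypes.addressof(buf)
    host_owned.add(addr)
    return addr


def roc_dealloc(addr):
    """Model of the host's dealloc: a no-op for pointers the host still owns.

    Returns True if the address was released, False if it was left alone.
    """
    if addr in host_owned:
        return False  # still in use by the host: do nothing
    freed.append(addr)  # a real host would call its allocator's free here
    return True
```

The cost is one lookup per dealloc call, which is the messiness discussed in the following messages.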

view this post on Zulip Richard Feldman (Sep 30 2022 at 21:47):

"I also think that something I keep coming back to is wanting a Unique type level tag that at compile time ensures you never copy."

I'm pretty sure this requires linear types, which (aside from being a big complication to the type system) would also be infectious, which would mean you couldn't just put a wrapper type around one type

view this post on Zulip Richard Feldman (Sep 30 2022 at 21:48):

I do think expect-unique sounds plausible though, which would at least let you verify specific paths through tests

view this post on Zulip Brendan Hansknecht (Sep 30 2022 at 21:51):

The roc_dealloc option is a possibility, though definitely a lot messier than I would hope.

view this post on Zulip Brendan Hansknecht (Sep 30 2022 at 21:53):

And yeah, if it requires linear types, that is definitely problematic. I would really hope there is some middle ground here where we can still get some stronger guarantees... but I am also currently debating trying to use roc in something with machine learning and large tensors. Probably a niche use case.

view this post on Zulip Richard Feldman (Sep 30 2022 at 21:55):

yeah and I think it's worth trying to figure out ways to make things work in these use cases!

view this post on Zulip Richard Feldman (Sep 30 2022 at 21:56):

just gotta balance that with the potential costs to the type system

view this post on Zulip Brendan Hansknecht (Sep 30 2022 at 23:03):

Also, just to clarify, my main question: can I get roc to consistently update the data in a tensor in place, without the overhead of a function call to an effect per write?

view this post on Zulip Brendan Hansknecht (Sep 30 2022 at 23:03):

If we can solve that nicely, I will be happy.

view this post on Zulip Richard Feldman (Oct 01 2022 at 07:17):

I assume a tensor is an array of data that was allocated by the host, and doesn't have space for a refcount at the start?

view this post on Zulip Brendan Hansknecht (Oct 01 2022 at 14:16):

That is correct.

Also, a tensor is multidimensional and might be pinned to a special location of memory that is automatically being copied to the GPU.

view this post on Zulip Richard Feldman (Oct 01 2022 at 15:42):

yeah then I think the "make sure dealloc doesn't free them" approach should work

view this post on Zulip Richard Feldman (Oct 01 2022 at 15:45):

one way this could be done efficiently: implement roc_alloc to mmap a big chunk of contiguous virtual memory up front, and then parcel out chunks of it as normal (the way malloc would).

Now roc_dealloc can trivially test whether it's trying to free a tensor: see if the given pointer is inside that big chunk of mmap'd virtual address space reserved for non-host Roc allocations. If it is not, the pointer must be a host-owned tensor, and dealloc can be a no-op.
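The range check can be sketched as follows, again as a Python model rather than a real host implementation. `ARENA_SIZE`, `in_arena`, and the bump allocator are assumptions made for illustration; a real allocator would reserve far more address space and reuse freed chunks.

```python
import ctypes
import mmap

ARENA_SIZE = 1 << 20  # 1 MiB for the sketch; a real host would reserve much more

# Reserve one contiguous anonymous mapping up front.
_arena = mmap.mmap(-1, ARENA_SIZE)
_arena_view = (ctypes.c_char * ARENA_SIZE).from_buffer(_arena)
ARENA_BASE = ctypes.addressof(_arena_view)
_next_offset = 0


def roc_alloc(size, alignment=16):
    """Bump-allocate `size` bytes from the arena and return the address."""
    global _next_offset
    start = (_next_offset + alignment - 1) // alignment * alignment
    if start + size > ARENA_SIZE:
        raise MemoryError("arena exhausted")
    _next_offset = start + size
    return ARENA_BASE + start


def in_arena(addr):
    """True if the address falls inside the reserved range."""
    return ARENA_BASE <= addr < ARENA_BASE + ARENA_SIZE


def roc_dealloc(addr):
    """Return True for arena pointers, False (no-op) for anything else."""
    if not in_arena(addr):
        return False  # outside the arena: a host-owned tensor, leave it alone
    # A real allocator would return the chunk to a free list here.
    return True
```

Compared with a per-pointer lookup table, the test is two comparisons, and the host never has to register or unregister individual tensors.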

view this post on Zulip Richard Feldman (Oct 01 2022 at 15:46):

you could also get fancier and support multiple big backing allocations (in which case you might have to check a small N number of pointer ranges as the program's memory usage grows) but that might never be necessary in practice if you can get away with a big enough virtual memory allocation at the start

view this post on Zulip Richard Feldman (Oct 01 2022 at 15:47):

this does mean you need to have a custom roc_alloc, of course, but depending on your perspective, having a good reason to do that might be a fun bonus :grinning_face_with_smiling_eyes:


Last updated: Jun 16 2026 at 16:19 UTC