Stream: ideas

Topic: SIMD runtime detection


view this post on Zulip Richard Feldman (Jul 17 2023 at 01:52):

so it seems safe to assume that when we add SIMD support to Roc, there will be demand for the following scenarios:

view this post on Zulip Richard Feldman (Jul 17 2023 at 01:52):

those last two options seem pretty straightforward to support, but the first one seems tricky

view this post on Zulip Richard Feldman (Jul 17 2023 at 01:54):

like one option is to try to have some sort of Simd.support pure function which tells you if it's supported. By default I strongly prefer to avoid things that fit the description "is a pure function that returns a totally different value depending on what system it's running on" so I'd rather not do that

view this post on Zulip Richard Feldman (Jul 17 2023 at 01:57):

we could make that be a Task instead, but then it becomes potentially clunky to use in various situations because it's common to want to use simd in pure functions - so you'd need to detect it and then thread it through probably.

This might especially be a problem for libraries, which might end up accepting configuration parameters for whether or not to use simd, which doesn't sound great - and kind of defeats the purpose of having a Simd abstraction that's capable of doing graceful fallbacks
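To make the "detect it and then thread it through" concern concrete, here is a minimal sketch in Rust rather than Roc. Every name in it (`SimdSupport`, `detect_support`, `dot`, `norm_squared`) is hypothetical, and the "wide" path is a scalar stand-in so the sketch runs on any machine. The point is only that each pure function downstream picks up an extra parameter.

```rust
/// Hypothetical capability value, produced once by effectful code.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct SimdSupport {
    pub wide: bool,
}

/// Effectful detection step (the part that would be a Task in Roc).
pub fn detect_support() -> SimdSupport {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            return SimdSupport { wide: true };
        }
    }
    SimdSupport { wide: false }
}

fn dot_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Stand-in for a real SIMD implementation; it reuses the scalar body.
fn dot_wide(a: &[f32], b: &[f32]) -> f32 {
    dot_scalar(a, b)
}

/// A pure function that now needs the capability passed in.
pub fn dot(support: SimdSupport, a: &[f32], b: &[f32]) -> f32 {
    if support.wide {
        dot_wide(a, b)
    } else {
        dot_scalar(a, b)
    }
}

/// Callers of `dot` must accept and forward the parameter too.
pub fn norm_squared(support: SimdSupport, v: &[f32]) -> f32 {
    dot(support, v, v)
}
```

A library built this way ends up with `support` in most of its public signatures, which is the configuration-parameter problem described above.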

view this post on Zulip Richard Feldman (Jul 17 2023 at 01:59):

another issue with that idea is that it seems like sometimes detecting simd requires syscalls (specifically on arm32 Linux) which is not something Roc builtins do today. So we might need to make it another "platforms expose a function which tells us what's supported" thing

view this post on Zulip Richard Feldman (Jul 17 2023 at 02:03):

another approach is to say: let platform authors request which simd options are supported, and then compile a different specialized version of the application's entrypoints for each requested simd feature set, and let the platform call the appropriate one depending on its own process for determining what's available at runtime
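A rough sketch of what the platform side of this could look like, written in Rust. The names (`SimdLevel`, `entry_*`, `detect`, `select_entrypoint`) are made up for illustration and are not Roc or platform APIs. The three entrypoints share one scalar body here; in a real build each would be the same application code compiled for a different feature set.

```rust
/// Hypothetical feature sets a platform might request specializations for.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum SimdLevel {
    Scalar,
    Sse2,
    Avx2,
}

// Stand-ins for the per-feature-set copies of one application entrypoint.
// They all use the same scalar body so the sketch runs anywhere.
fn entry_scalar(xs: &[f32]) -> f32 {
    xs.iter().sum()
}
fn entry_sse2(xs: &[f32]) -> f32 {
    xs.iter().sum()
}
fn entry_avx2(xs: &[f32]) -> f32 {
    xs.iter().sum()
}

/// The platform's own process for deciding what is available at runtime.
pub fn detect() -> SimdLevel {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            return SimdLevel::Avx2;
        }
        if std::arch::is_x86_feature_detected!("sse2") {
            return SimdLevel::Sse2;
        }
    }
    SimdLevel::Scalar
}

/// Pick the specialized entrypoint once; the branch lives here,
/// outside the application's arithmetic.
pub fn select_entrypoint(level: SimdLevel) -> fn(&[f32]) -> f32 {
    match level {
        SimdLevel::Scalar => entry_scalar,
        SimdLevel::Sse2 => entry_sse2,
        SimdLevel::Avx2 => entry_avx2,
    }
}
```

The platform would call `select_entrypoint(detect())` once at startup and reuse the returned function pointer, so application code inside each specialization carries no per-operation check.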

view this post on Zulip Richard Feldman (Jul 17 2023 at 02:04):

that has its own set of problems, one of which is that this feels like something the application author should control rather than the platform author. (It's very related to --target)

view this post on Zulip Brendan Hansknecht (Jul 17 2023 at 02:05):

Personally I would err towards testing solutions where the simd size is opaque to roc.

view this post on Zulip Brendan Hansknecht (Jul 17 2023 at 02:06):

I am not fully convinced they will turn out well, but that seems to be what is being pushed for and tested by a number of modern languages and developers who have messed with simd a lot

view this post on Zulip Richard Feldman (Jul 17 2023 at 02:06):

so I like that in theory, but then there's a problem of: when do we do the runtime detection?

view this post on Zulip Brendan Hansknecht (Jul 17 2023 at 02:06):

The question is if we can get almost all of the gains with a lot more simplicity for the users

view this post on Zulip Richard Feldman (Jul 17 2023 at 02:07):

like assuming we have the answer of what's supported already in memory somewhere, there has to be a branch which checks it at runtime. Where does that branch go?

view this post on Zulip Brendan Hansknecht (Jul 17 2023 at 02:07):

Probably would just do it lazily on the first call to a simd function. We already do that for memcpy.
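As a sketch of lazy detection on first call, here is one way it could look in Rust using `std::sync::OnceLock`. The function names (`simd_add`, `pick_add`, `add_wide`) are hypothetical, and `add_wide` is a scalar stand-in rather than real SIMD code. This is not how Roc or any particular memcpy implements it, just the general shape.

```rust
use std::sync::OnceLock;

type AddFn = fn(&[f32], &[f32], &mut [f32]);

fn add_scalar(a: &[f32], b: &[f32], out: &mut [f32]) {
    for ((o, x), y) in out.iter_mut().zip(a).zip(b) {
        *o = x + y;
    }
}

/// Stand-in for a wider implementation; it reuses the scalar body.
fn add_wide(a: &[f32], b: &[f32], out: &mut [f32]) {
    add_scalar(a, b, out)
}

/// Runs the actual CPU feature check. Called at most once.
fn pick_add() -> AddFn {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            return add_wide;
        }
    }
    add_scalar
}

/// Cached choice, filled in lazily by the first call to `simd_add`.
static ADD_IMPL: OnceLock<AddFn> = OnceLock::new();

pub fn simd_add(a: &[f32], b: &[f32], out: &mut [f32]) {
    // Every call loads the cached pointer and checks that it is initialized;
    // only the first call pays for detection.
    let f: AddFn = *ADD_IMPL.get_or_init(pick_add);
    f(a, b, out)
}
```

The load and initialization check inside `simd_add` happen on every call, so the per-call cost stays small but is not zero.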

view this post on Zulip Richard Feldman (Jul 17 2023 at 02:07):

ok but now every single simd function has an extra memory load and branch attached to it? :grimacing:

view this post on Zulip Brendan Hansknecht (Jul 17 2023 at 02:08):

Essentially the same way a call to libc or any shared library works, but for simd.

view this post on Zulip Richard Feldman (Jul 17 2023 at 02:08):

right but these are supposed to be fast arithmetic functions haha

view this post on Zulip Richard Feldman (Jul 17 2023 at 02:08):

unlike memcpy

view this post on Zulip Brendan Hansknecht (Jul 17 2023 at 02:08):

That's fair, maybe it wouldn't work for simd (though compared to other instructions, simd instructions are quite slow)

view this post on Zulip Richard Feldman (Jul 17 2023 at 02:09):

sure, but even things like loading into a simd register would need it

view this post on Zulip Brendan Hansknecht (Jul 17 2023 at 02:09):

Also, if we really want perf, we could have code to actually rewrite the binary at runtime. Then the cost is one 100% predictable jump instruction.

view this post on Zulip Richard Feldman (Jul 17 2023 at 02:09):

it feels like some amount of hoisting and/or specialization would be necessary

view this post on Zulip Richard Feldman (Jul 17 2023 at 02:10):

I thought about that, but can we actually do that with a normal executable on every OS?

view this post on Zulip Brendan Hansknecht (Jul 17 2023 at 02:10):

Oh, I think you are missing a piece, with the opaque simd functions the function you expose would operate on arrays of data, not single values
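A small Rust sketch of the array-level idea, under the assumption that the lane count is chosen inside the function and never shown to the caller. `scale_in_place`, `scale_chunks`, and `lanes` are hypothetical names, and the inner loops are plain scalar code standing in for real vector instructions.

```rust
/// Lane count the implementation would use on this machine (illustrative values).
fn lanes() -> usize {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            return 8;
        }
        if std::arch::is_x86_feature_detected!("sse2") {
            return 4;
        }
    }
    1
}

/// Processes the slice in blocks of N, then handles the leftover tail.
fn scale_chunks<const N: usize>(xs: &mut [f32], factor: f32) {
    let mut chunks = xs.chunks_exact_mut(N);
    for chunk in &mut chunks {
        // A real implementation would do this block with one vector multiply.
        for x in chunk.iter_mut() {
            *x *= factor;
        }
    }
    for x in chunks.into_remainder() {
        *x *= factor;
    }
}

/// Opaque-width entry point: the caller passes a whole array and never sees N.
pub fn scale_in_place(xs: &mut [f32], factor: f32) {
    // One dispatch per call, amortized over every element in the array.
    match lanes() {
        8 => scale_chunks::<8>(xs, factor),
        4 => scale_chunks::<4>(xs, factor),
        _ => scale_chunks::<1>(xs, factor),
    }
}
```

Because the check runs once per array rather than once per element, its cost shrinks relative to the work as the input grows.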

view this post on Zulip Richard Feldman (Jul 17 2023 at 02:10):

I thought to be able to write to executable memory you need special permissions nowadays

view this post on Zulip Brendan Hansknecht (Jul 17 2023 at 02:12):

Simd on single values is almost never worth it. When you go back and forth between simd and normal code, it often just thermal throttles the processor and you lose the gains

view this post on Zulip Brendan Hansknecht (Jul 17 2023 at 02:12):

But all of this, I would just label as worth thoroughly testing. I would not label it as definitely a good plan.

view this post on Zulip Richard Feldman (Jul 17 2023 at 02:13):

that's fair - it does seem like the easiest to test, if nothing else

view this post on Zulip Richard Feldman (Jul 17 2023 at 02:14):

also I suppose a potential option is to say we don't support it on arm32 Linux if that's the only one that doesn't support detecting it as a CPU instruction


Last updated: Jun 16 2026 at 16:19 UTC