SIMD runtime detection · ideas · Zulip Chat Archive

Stream: ideas

Topic: SIMD runtime detection

Richard Feldman (Jul 17 2023 at 01:52):

so it seems safe to assume that when we add SIMD support to Roc, there will be demand for the following scenarios:

1. I want to make use of SIMD when it's available, and use a fallback otherwise. I want runtime detection of this in my binary, so I can tell my users to download a binary for their particular CPU architecture and OS, and it will Just Work (and make use of whatever SIMD features their CPU has)
2. I want to compile my app to work only on one target architecture (e.g. my servers or my raspberry pi) and I want it to run as fast as possible. I don't care about fallbacks at all, and don't want to pay any runtime check cost for them, or have the binary get bigger because it's including fallbacks.
3. I don't think SIMD is going to improve my performance at all in practice, so I want maximum compatibility: fallbacks only.

Richard Feldman (Jul 17 2023 at 01:52):

those last two options seem pretty straightforward to support, but the first one seems tricky

Richard Feldman (Jul 17 2023 at 01:54):

like one option is to try to have some sort of Simd.support pure function which tells you if it's supported. By default I strongly prefer to avoid things that fit the description "is a pure function that returns a totally different value depending on what system it's running on" so I'd rather not do that

Richard Feldman (Jul 17 2023 at 01:57):

we could make that be a Task instead, but then it becomes potentially clunky to use in various situations because it's common to want to use simd in pure functions - so you'd need to detect it and then thread it through probably.

This might especially be a problem for libraries, which might end up accepting configuration parameters for whether or not to use simd, which doesn't sound great - and kind of defeats the purpose of having a Simd abstraction that's capable of doing graceful fallbacks

Richard Feldman (Jul 17 2023 at 01:59):

another issue with that idea is that it seems like sometimes detecting simd requires syscalls (specifically on arm32 Linux) which is not something Roc builtins do today. So we might need to make it another "platforms expose a function which tells us what's supported" thing

Richard Feldman (Jul 17 2023 at 02:03):

another approach is to say: let platform authors request which simd options are supported, and then compile a different specialized version of the application's entrypoints for each requested simd feature set, and let the platform call the appropriate one depending on its own process for determining what's available at runtime
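To make the "one specialized entrypoint per feature set" idea concrete, here is a rough sketch in Rust (Roc has no SIMD API yet, so this is only an analogy). Everything named here is hypothetical: SimdLevel, Entrypoint, the entry_* functions, and select_entrypoint are made up for illustration, and the "specialized" bodies are plain-Rust stand-ins rather than real vector code. The point is only the shape: the platform detects once at startup, picks one entrypoint, and nothing inside the application branches on CPU features afterwards.

```rust
/// Hypothetical feature sets a platform author might request.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SimdLevel {
    Fallback,
    Sse2,
    Avx2,
}

/// Hypothetical signature of the application's entrypoint.
pub type Entrypoint = fn(&[i32]) -> i64;

/// Stand-in for vectorized code: sums in N independent lanes using plain Rust.
fn sum_in_lanes<const N: usize>(xs: &[i32]) -> i64 {
    let mut lanes = [0i64; N];
    let chunks = xs.chunks_exact(N);
    let rest = chunks.remainder();
    for c in chunks {
        for (lane, &x) in lanes.iter_mut().zip(c) {
            *lane += x as i64;
        }
    }
    lanes.iter().sum::<i64>() + rest.iter().map(|&x| x as i64).sum::<i64>()
}

// One copy of the entrypoint per requested feature set. A compiler would
// generate these from the same source; here they are written by hand.
fn entry_fallback(xs: &[i32]) -> i64 {
    xs.iter().map(|&x| x as i64).sum()
}
fn entry_sse2(xs: &[i32]) -> i64 {
    sum_in_lanes::<4>(xs)
}
fn entry_avx2(xs: &[i32]) -> i64 {
    sum_in_lanes::<8>(xs)
}

/// The platform's side: map whatever it detected to one entrypoint.
pub fn select_entrypoint(level: SimdLevel) -> Entrypoint {
    match level {
        SimdLevel::Fallback => entry_fallback,
        SimdLevel::Sse2 => entry_sse2,
        SimdLevel::Avx2 => entry_avx2,
    }
}

/// One possible detection routine; a platform could use any process it likes.
pub fn detect_level() -> SimdLevel {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            return SimdLevel::Avx2;
        }
        if std::arch::is_x86_feature_detected!("sse2") {
            return SimdLevel::Sse2;
        }
    }
    SimdLevel::Fallback
}
```

A platform would call select_entrypoint(detect_level()) once and then invoke the returned function. The cost is binary size: each requested feature set adds another copy of the specialized code.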

Richard Feldman (Jul 17 2023 at 02:04):

that has its own set of problems, one of which is that this feels like something the application author should control rather than the platform author. (It's very related to --target)

Brendan Hansknecht (Jul 17 2023 at 02:05):

Personally I would err towards testing solutions where the simd size is opaque to roc.

Brendan Hansknecht (Jul 17 2023 at 02:06):

I am not fully convinced they will turn out well, but that seems to be what is being pushed for and tested by a number of modern languages and developers who have messed with simd a lot

Richard Feldman (Jul 17 2023 at 02:06):

so I like that in theory, but then there's a problem of: when do we do the runtime detection?

Brendan Hansknecht (Jul 17 2023 at 02:06):

The question is if we can get almost all of the gains with a lot more simplicity for the users

Richard Feldman (Jul 17 2023 at 02:07):

like assuming we have the answer of what's supported already in memory somewhere, there has to be a branch which checks it at runtime. Where does that branch go?

Brendan Hansknecht (Jul 17 2023 at 02:07):

Probably would just do it lazily on the first call to a simd function. We already do that for memcpy.
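A minimal sketch of lazy detection on first call, again in Rust as an analogy and not a proposal for how Roc would implement it. The names sum, sum_scalar, sum_wide, detect, and SUM_IMPL are hypothetical, and sum_wide is a plain-Rust stand-in for real vector code so the example builds on any target. Detection runs once; every later call still loads the cached function pointer and makes an indirect call through it.

```rust
use std::sync::OnceLock;

/// Signature shared by every implementation of the operation.
type SumFn = fn(&[f32]) -> f32;

/// Portable fallback.
fn sum_scalar(xs: &[f32]) -> f32 {
    xs.iter().sum()
}

/// Stand-in for a SIMD version: accumulates in four independent lanes using
/// plain Rust, so the sketch compiles on any target.
fn sum_wide(xs: &[f32]) -> f32 {
    let mut lanes = [0.0f32; 4];
    let chunks = xs.chunks_exact(4);
    let rest = chunks.remainder();
    for c in chunks {
        for (lane, &x) in lanes.iter_mut().zip(c) {
            *lane += x;
        }
    }
    lanes.iter().sum::<f32>() + rest.iter().sum::<f32>()
}

/// Runs once, on the first call to `sum`.
fn detect() -> SumFn {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::arch::is_x86_feature_detected!("sse2") {
            return sum_wide;
        }
    }
    sum_scalar
}

/// Cached choice of implementation, filled in lazily.
static SUM_IMPL: OnceLock<SumFn> = OnceLock::new();

pub fn sum(xs: &[f32]) -> f32 {
    // Every call pays for this load and the indirect call that follows.
    let f = *SUM_IMPL.get_or_init(detect);
    f(xs)
}
```

Calling sum(&[1.0, 2.0, 3.0]) triggers detection the first time and reuses the cached pointer after that.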

Richard Feldman (Jul 17 2023 at 02:07):

ok but now every single simd function has an extra memory load and branch attached to it? :grimacing:

Brendan Hansknecht (Jul 17 2023 at 02:08):

Essentially the same way a call to libc or any shared library works, but for simd.

Richard Feldman (Jul 17 2023 at 02:08):

right but these are supposed to be fast arithmetic functions haha

Richard Feldman (Jul 17 2023 at 02:08):

unlike memcpy

Brendan Hansknecht (Jul 17 2023 at 02:08):

That's fair, maybe it wouldn't work for simd (though compared to other instructions, simd instructions are quite slow)

Richard Feldman (Jul 17 2023 at 02:09):

sure, but even things like loading into a simd register would need it

Brendan Hansknecht (Jul 17 2023 at 02:09):

Also, if we really want perf, we could have code to actually rewrite the binary at runtime. Then the cost is one 100% predictable jump instruction.

Richard Feldman (Jul 17 2023 at 02:09):

it feels like some amount of hoisting and/or specialization would be necessary

Richard Feldman (Jul 17 2023 at 02:10):

I thought about that, but can we actually do that with a normal executable on every OS?

Brendan Hansknecht (Jul 17 2023 at 02:10):

Oh, I think you are missing a piece, with the opaque simd functions the function you expose would operate on arrays of data, not single values
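The difference between a per-value API and an array-level API can be sketched by counting how often the feature check runs. This is a Rust illustration with hypothetical names (simd_available, add_value, add_arrays, add_chunked); the check is a stand-in that just looks at the compile target and bumps a counter, and the "SIMD" path is plain Rust working on chunks of four.

```rust
/// Stand-in for "load the cached feature flag and branch on it".
/// Counts how many times it is consulted.
fn simd_available(checks: &mut usize) -> bool {
    *checks += 1;
    cfg!(any(target_arch = "x86_64", target_arch = "aarch64"))
}

/// Per-value API: the check runs once for every pair of numbers.
fn add_value(a: f32, b: f32, checks: &mut usize) -> f32 {
    let _use_simd = simd_available(checks); // a real version would branch here
    a + b
}

/// Stand-in for the vector path: adds four elements at a time, then the tail.
fn add_chunked(xs: &[f32], ys: &[f32]) -> Vec<f32> {
    let mut out = Vec::with_capacity(xs.len());
    let mut xc = xs.chunks_exact(4);
    let mut yc = ys.chunks_exact(4);
    for (a, b) in xc.by_ref().zip(yc.by_ref()) {
        out.extend([a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]);
    }
    out.extend(xc.remainder().iter().zip(yc.remainder()).map(|(x, y)| x + y));
    out
}

/// Array-level API: the check runs once per call, however long the input is.
fn add_arrays(xs: &[f32], ys: &[f32], checks: &mut usize) -> Vec<f32> {
    assert_eq!(xs.len(), ys.len());
    if simd_available(checks) {
        add_chunked(xs, ys)
    } else {
        xs.iter().zip(ys).map(|(x, y)| x + y).collect()
    }
}
```

With N elements, looping over add_value consults the flag N times, while add_arrays consults it once, so the dispatch cost is amortized over the whole array.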

Richard Feldman (Jul 17 2023 at 02:10):

I thought to be able to write to executable memory you need special permissions nowadays

Brendan Hansknecht (Jul 17 2023 at 02:12):

Simd on single values is almost never worth it. When you go back and forth between simd and normal code, it often just thermal throttles the processor and you lose gains

Brendan Hansknecht (Jul 17 2023 at 02:12):

But all of this, I would just label as worth thoroughly testing. I would not label it as definitely a good plan.

Richard Feldman (Jul 17 2023 at 02:13):

that's fair - it does seem like the easiest to test, if nothing else

Richard Feldman (Jul 17 2023 at 02:14):

also I suppose a potential option is to say we don't support it on arm32 Linux if that's the only one that doesn't support detecting it as a CPU instruction

Last updated: Jul 23 2026 at 13:15 UTC