so it seems safe to assume that when we add SIMD support to Roc, there will be demand for the following scenarios:
those last two options seem pretty straightforward to support, but the first one seems tricky
like one option is to try to have some sort of Simd.support pure function which tells you if it's supported. By default I strongly prefer to avoid things that fit the description "is a pure function that returns a totally different value depending on what system it's running on" so I'd rather not do that
we could make that be a Task instead, but then it becomes potentially clunky to use in various situations because it's common to want to use simd in pure functions - so you'd need to detect it and then thread it through probably.
This might especially be a problem for libraries, which might end up accepting configuration parameters for whether or not to use simd, which doesn't sound great - and kind of defeats the purpose of having a Simd abstraction that's capable of doing graceful fallbacks
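To make the threading concern concrete, here is a minimal Rust sketch (Rust only because Roc has no SIMD API yet). Everything in it is hypothetical: `SimdSupport`, `sum_with` and `mean_with` are made-up names, and the lane-wise accumulators only stand in for real vector instructions. It assumes the `SimdSupport` value was obtained once from some effectful detection step and then passed down.

```rust
/// Hypothetical capability value; Roc has no such type today.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SimdSupport {
    None,
    Wide128,
    Wide256,
}

/// Pure: the result depends only on the arguments.
/// Lane-wise accumulators stand in for a real vector register.
pub fn sum_with(support: SimdSupport, xs: &[u32]) -> u64 {
    let lanes = match support {
        SimdSupport::None => 1,
        SimdSupport::Wide128 => 4,
        SimdSupport::Wide256 => 8,
    };
    let mut acc = vec![0u64; lanes];
    let mut chunks = xs.chunks_exact(lanes);
    for chunk in chunks.by_ref() {
        for (a, &x) in acc.iter_mut().zip(chunk) {
            *a += u64::from(x);
        }
    }
    // Elements that did not fill a whole "register" are added one by one.
    let tail: u64 = chunks.remainder().iter().map(|&x| u64::from(x)).sum();
    acc.iter().sum::<u64>() + tail
}

/// A library function built on top has to accept the same parameter,
/// which is the configuration leak described above.
pub fn mean_with(support: SimdSupport, xs: &[u32]) -> Option<f64> {
    if xs.is_empty() {
        return None;
    }
    Some(sum_with(support, xs) as f64 / xs.len() as f64)
}
```

Every caller of `mean_with` now has to get a `SimdSupport` from somewhere, even though the answer it receives is the same for every variant.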
another issue with that idea is that it seems like sometimes detecting simd requires syscalls (specifically on arm32 Linux) which is not something Roc builtins do today. So we might need to make it another "platforms expose a function which tells us what's supported" thing
another approach is to say: let platform authors request which simd options are supported, and then compile a different specialized version of the application's entrypoints for each requested simd feature set, and let the platform call the appropriate one depending on its own process for determining what's available at runtime
that has its own set of problems, one of which is that this feels like something the application author should control rather than the platform author. (It's very related to --target)
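A rough sketch of what the platform side of that could look like, in Rust. This is an assumption about the shape, not how Roc platforms work today: `FeatureSet`, `ENTRYPOINTS` and `pick_entrypoint` are hypothetical names, and the three `app_main_*` functions stand in for the specialized copies of the application entrypoint the compiler would emit.

```rust
/// Hypothetical feature sets a platform might request.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum FeatureSet {
    Scalar,
    Sse2,
    Avx2,
}

pub type Entrypoint = fn() -> &'static str;

// Stand-ins for the specialized copies of the app entrypoint.
fn app_main_scalar() -> &'static str {
    "scalar"
}
fn app_main_sse2() -> &'static str {
    "sse2"
}
fn app_main_avx2() -> &'static str {
    "avx2"
}

/// Ordered from most to least capable.
pub const ENTRYPOINTS: [(FeatureSet, Entrypoint); 3] = [
    (FeatureSet::Avx2, app_main_avx2 as Entrypoint),
    (FeatureSet::Sse2, app_main_sse2 as Entrypoint),
    (FeatureSet::Scalar, app_main_scalar as Entrypoint),
];

/// The platform does its own detection, then picks the best match.
/// Falls back to the scalar build if nothing in `available` matches.
pub fn pick_entrypoint(available: &[FeatureSet]) -> Entrypoint {
    ENTRYPOINTS
        .iter()
        .find(|(fs, _)| available.contains(fs))
        .map(|&(_, f)| f)
        .unwrap_or(app_main_scalar as Entrypoint)
}
```

The branch happens exactly once, before any application code runs, but the list of feature sets is fixed by whoever builds the platform, which is the ownership problem mentioned above.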
Personally I would err towards testing solutions where the simd size is opaque to roc.
I am not fully convinced they will turn out well, but that seems to be what is being pushed for and tested by a number of modern languages and developers who have messed with simd a lot
so I like that in theory, but then there's a problem of: when do we do the runtime detection?
The question is if we can get almost all of the gains with a lot more simplicity for the users
like assuming we have the answer of what's supported already in memory somewhere, there has to be a branch which checks it at runtime. Where does that branch go?
Probably would just do it lazily on the first call to a simd function. We already do that for memcpy.
ok but now every single simd function has an extra memory load and branch attached to it? :grimacing:
Essentially the same way a call to libc or any shared library works, but for simd.
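For reference, a minimal Rust sketch of the lazy-on-first-call idea. It is not how Roc's builtins are implemented: `lazy_sum`, `SUM_IMPL` and the two `sum_*` bodies are hypothetical, and `sum_wide` only imitates a vector loop with four accumulators. The detection uses Rust's `is_x86_feature_detected!` on x86_64 and assumes no wide support elsewhere.

```rust
use std::sync::OnceLock;

type SumFn = fn(&[u32]) -> u64;

fn sum_scalar(xs: &[u32]) -> u64 {
    xs.iter().map(|&x| u64::from(x)).sum()
}

// Stand-in for a real SIMD body: four independent accumulators.
fn sum_wide(xs: &[u32]) -> u64 {
    let mut acc = [0u64; 4];
    let mut chunks = xs.chunks_exact(4);
    for chunk in chunks.by_ref() {
        for (a, &x) in acc.iter_mut().zip(chunk) {
            *a += u64::from(x);
        }
    }
    let tail: u64 = chunks.remainder().iter().map(|&x| u64::from(x)).sum();
    acc.iter().sum::<u64>() + tail
}

#[cfg(target_arch = "x86_64")]
fn detect_wide() -> bool {
    std::arch::is_x86_feature_detected!("avx2")
}

// Assumption for the sketch: no detection on other architectures.
#[cfg(not(target_arch = "x86_64"))]
fn detect_wide() -> bool {
    false
}

/// Filled in on the first call, read on every later call.
static SUM_IMPL: OnceLock<SumFn> = OnceLock::new();

pub fn lazy_sum(xs: &[u32]) -> u64 {
    let f = SUM_IMPL.get_or_init(|| {
        if detect_wide() {
            sum_wide as SumFn
        } else {
            sum_scalar as SumFn
        }
    });
    f(xs)
}
```

Detection runs once, but every call to `lazy_sum` still pays for a load of the cached pointer, a check that it is initialized, and an indirect call.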
right but these are supposed to be fast arithmetic functions haha
unlike memcpy
That's fair, maybe it wouldn't work for simd (though compared to other instructions, simd instructions are quite slow)
sure, but even things like loading into a simd register would need it
Also, if we really want perf, we could have code to actually rewrite the binary at runtime. Then the cost is one 100% predictable jump instruction.
it feels like some amount of hoisting and/or specialization would be necessary
I thought about that, but can we actually do that with a normal executable on every OS?
Oh, I think you are missing a piece, with the opaque simd functions the function you expose would operate on arrays of data, not single values
I thought to be able to write to executable memory you need special permissions nowadays
Simd on single values is almost never worth it. When you go back and forth between simd and normal code, it often just thermal throttles the processor and you lose the gains
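A small Rust sketch of why the array-level API changes the cost picture. All names here (`Dispatcher`, `add_one`, `add_all`) are hypothetical, and the check counter exists only to make the number of runtime branches visible; the arithmetic is plain scalar code standing in for vector instructions.

```rust
use std::cell::Cell;

/// Hypothetical dispatcher that counts how often support is checked.
pub struct Dispatcher {
    wide: bool,
    checks: Cell<usize>,
}

impl Dispatcher {
    pub fn new(wide: bool) -> Self {
        Dispatcher { wide, checks: Cell::new(0) }
    }

    pub fn checks(&self) -> usize {
        self.checks.get()
    }

    // The runtime "is it supported?" load and branch.
    fn use_wide(&self) -> bool {
        self.checks.set(self.checks.get() + 1);
        self.wide
    }

    /// Per-value API: one check for every single addition.
    pub fn add_one(&self, a: f32, b: f32) -> f32 {
        // A real implementation would pick an instruction sequence here.
        let _wide = self.use_wide();
        a + b
    }

    /// Array API: one check up front, then a branch-free loop.
    pub fn add_all(&self, a: &[f32], b: &[f32]) -> Vec<f32> {
        let lanes = if self.use_wide() { 8 } else { 1 };
        let mut out = Vec::with_capacity(a.len().min(b.len()));
        for (ca, cb) in a.chunks(lanes).zip(b.chunks(lanes)) {
            out.extend(ca.iter().zip(cb).map(|(x, y)| x + y));
        }
        out
    }
}
```

Adding two 1000-element arrays through `add_one` performs 1000 checks, while `add_all` performs one, so the dispatch cost is amortized over the whole array.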
But all of this, I would just label as worth thoroughly testing. I would not label it as definitely a good plan.
that's fair - it does seem like the easiest to test, if nothing else
also I suppose a potential option is to say we don't support it on arm32 Linux if that's the only one that doesn't support detecting it as a CPU instruction
Last updated: Jun 16 2026 at 16:19 UTC