I'm happy to report that in my experiment making chompTrivia
faster using SIMD (https://github.com/roc-lang/roc/pull/7815)... I think I may have made it slower.
12:06:41 ~/Documents/GitHub/roc simd_chomp_trivia $ zig build test_chomp_perf -Dsimd=false
=== chompTrivia Performance Test ===
SIMD Enabled: false
SIMD Width: 16 bytes
Testing with src/BigFile.roc
Time (µs) | Throughput (MB/s) | SIMD Chunks | File Size (bytes)
------------------------------------------------------------
9.78e3 | 2.32e1 | 0 | 226943
12:08:08 ~/Documents/GitHub/roc simd_chomp_trivia $ zig build test_chomp_perf -Dsimd-width=8
=== chompTrivia Performance Test ===
SIMD Enabled: true
SIMD Width: 8 bytes
Testing with src/BigFile.roc
Time (µs) | Throughput (MB/s) | SIMD Chunks | File Size (bytes)
------------------------------------------------------------
1.06e4 | 2.14e1 | 371 | 226943
:shrug:
It was a fun experiment... I definitely need to rethink my approach though if this has any chance of landing someday.
Not sure it will help here, Simd width likely should be bigger than 8 bytes
Probably 32?
Anyway, likely your are hiting fallback cases to much and need some sort of better handling of index of specific characters instead of doing it byte by byte
Thank you for the comments Brendan. I'm a SIMD noob, so learning by doing on this one.
I simplified the problem a little and have been noodling around with it a bit.
I've managed to get some things working ok, but there are a lot of edge cases that need to be handled or really thought through. Using a fixed (compile time) width SIMD chunk makes things a little challenging.
Here is an example of where I got to today. It's definitely not doing everything we want, but compiles and passes a bunch of simple test cases. I can't tell you if it's any faster than the while
loop or anything, I'm just exploring how to write SIMD things like this with Zig and thought I'd share in case anyone is interested.
// Return the indent after the first newline (if any) or null if no newline is found
pub fn chompTrivia(buf: []const u8) ?u16 {
var first_newline_idx: ?u16 = null;
const Vec = @Vector(SIMD_WIDTH, u8);
const ones = @as(Vec, @splat(1));
const zeros = @as(Vec, @splat(0));
const chunk: Vec = buf[0..SIMD_WIDTH][0..SIMD_WIDTH].*;
// Find the index of the first newline in the chunk
{
// Create a vector filled with newline characters
const newline_needle = @as(Vec, @splat('\n'));
// Compare each byte with '\n', resulting in a vector of booleans
const matches = chunk == newline_needle;
// Convert the boolean vector to a bitmask (where each true becomes a bit)
const mask = @as(u16, @bitCast(@select(u1, matches, ones, zeros)));
// If we found any newlines, return the position of the first one
if (mask != 0) {
first_newline_idx = @ctz(mask);
}
}
// If no newline found, return null
if (first_newline_idx == null) {
return null;
}
// Get index of position after newline
const idx = first_newline_idx.? + 1;
if (idx >= SIMD_WIDTH or idx >= buf.len) {
return 0; // No space after newline
}
// Create masks for spaces and tabs
const space_matches = chunk == @as(Vec, @splat(' '));
const tab_matches = chunk == @as(Vec, @splat('\t'));
// Find the first non-space, non-tab character after the newline
const space_mask = @select(u8, space_matches, ones, zeros);
const tab_mask = @select(u8, tab_matches, ones, zeros);
const trivia_mask = space_mask | tab_mask;
const non_trivia = ~trivia_mask;
// Create a vector with indices to handle position-based operations
var indices: [SIMD_WIDTH]u8 = undefined;
for (0..SIMD_WIDTH) |i| {
indices[i] = @intCast(i);
}
const index_vec = @as(Vec, indices);
// Create masks for the range after the newline
const after_newline = index_vec >= @as(Vec, @splat(@as(u8, @intCast(idx))));
const after_newline_u8 = @select(u8, after_newline, ones, zeros);
const non_trivia_after_newline = non_trivia & after_newline_u8;
const non_trivia_mask = @as(u16, @bitCast(@select(u1, non_trivia_after_newline != zeros, ones, zeros)));
// Calculate the ending position (first non-trivia after newline or end of chunk)
const end_pos = if (non_trivia_mask != 0)
@ctz(non_trivia_mask)
else
SIMD_WIDTH;
// Create a mask for the valid range (after newline and before the first non-trivia)
const before_end = index_vec < @as(Vec, @splat(end_pos));
const valid_range = @select(u8, after_newline, ones, zeros) & @select(u8, before_end, ones, zeros);
// Apply the masks to get valid spaces and tabs
const valid_spaces = @select(u8, space_matches, ones, zeros) & valid_range;
const valid_tabs = @select(u8, tab_matches, ones, zeros) & valid_range;
// Count the number of spaces and tabs using popcount on bitmasks
const spaces_mask = @as(u16, @bitCast(@select(u1, valid_spaces != zeros, ones, zeros)));
const tabs_mask = @as(u16, @bitCast(@select(u1, valid_tabs != zeros, ones, zeros)));
const spaces_count = @popCount(spaces_mask);
const tabs_count = @popCount(tabs_mask);
// Calculate the indentation (1 for each space, 4 for each tab)
return spaces_count + (tabs_count * 4);
}
const chunk: Vec = buf[0..SIMD_WIDTH][0..SIMD_WIDTH].*;
This line looks off
I think this should be an inline for
:
for (0..SIMD_WIDTH) |i| {
I think that line is ok. I got if from the zig docs...
// To extract a comptime-known length from a runtime-known offset,
// first extract a new slice from the starting offset, then an array of
// comptime-known length
const vec3: @Vector(2, f32) = slice[offset..][0..2].*;
why [0..SIMD_WIDTH]
twice?
Also, yeah, getting simd right is hard. It is easy to use simd and get worse perf.
Last updated: Jul 06 2025 at 12:14 UTC