Stream: compiler development

Topic: zig compiler - SIMD tokenize exploration


view this post on Zulip Luke Boswell (Jun 05 2025 at 02:09):

I'm happy to report that in my experiment making chompTrivia faster using SIMD (https://github.com/roc-lang/roc/pull/7815)... I think I may have made it slower.

12:06:41 ~/Documents/GitHub/roc simd_chomp_trivia $ zig build test_chomp_perf -Dsimd=false
=== chompTrivia Performance Test ===
SIMD Enabled: false
SIMD Width: 16 bytes

Testing with src/BigFile.roc
Time (µs) | Throughput (MB/s) | SIMD Chunks | File Size (bytes)
------------------------------------------------------------
   9.78e3 |            2.32e1 |           0 |            226943
12:08:08 ~/Documents/GitHub/roc simd_chomp_trivia $ zig build test_chomp_perf -Dsimd-width=8
=== chompTrivia Performance Test ===
SIMD Enabled: true
SIMD Width: 8 bytes

Testing with src/BigFile.roc
Time (µs) | Throughput (MB/s) | SIMD Chunks | File Size (bytes)
------------------------------------------------------------
   1.06e4 |            2.14e1 |         371 |            226943

:shrug:

view this post on Zulip Luke Boswell (Jun 05 2025 at 02:13):

It was a fun experiment... I definitely need to rethink my approach though if this has any chance of landing someday.

view this post on Zulip Brendan Hansknecht (Jun 05 2025 at 02:35):

Not sure it will help here, Simd width likely should be bigger than 8 bytes

view this post on Zulip Brendan Hansknecht (Jun 05 2025 at 02:36):

Probably 32?

view this post on Zulip Brendan Hansknecht (Jun 05 2025 at 02:41):

Anyway, likely your are hiting fallback cases to much and need some sort of better handling of index of specific characters instead of doing it byte by byte

view this post on Zulip Luke Boswell (Jun 05 2025 at 03:25):

Thank you for the comments Brendan. I'm a SIMD noob, so learning by doing on this one.

view this post on Zulip Luke Boswell (Jun 05 2025 at 07:55):

I simplified the problem a little and have been noodling around with it a bit.

I've managed to get some things working ok, but there are a lot of edge cases that need to be handled or really thought through. Using a fixed (compile time) width SIMD chunk makes things a little challenging.

Here is an example of where I got to today. It's definitely not doing everything we want, but compiles and passes a bunch of simple test cases. I can't tell you if it's any faster than the while loop or anything, I'm just exploring how to write SIMD things like this with Zig and thought I'd share in case anyone is interested.

// Return the indent after the first newline (if any) or null if no newline is found
pub fn chompTrivia(buf: []const u8) ?u16 {
    var first_newline_idx: ?u16 = null;
    const Vec = @Vector(SIMD_WIDTH, u8);
    const ones = @as(Vec, @splat(1));
    const zeros = @as(Vec, @splat(0));

    const chunk: Vec = buf[0..SIMD_WIDTH][0..SIMD_WIDTH].*;

    // Find the index of the first newline in the chunk
    {
        // Create a vector filled with newline characters
        const newline_needle = @as(Vec, @splat('\n'));

        // Compare each byte with '\n', resulting in a vector of booleans
        const matches = chunk == newline_needle;

        // Convert the boolean vector to a bitmask (where each true becomes a bit)
        const mask = @as(u16, @bitCast(@select(u1, matches, ones, zeros)));

        // If we found any newlines, return the position of the first one
        if (mask != 0) {
            first_newline_idx = @ctz(mask);
        }
    }

    // If no newline found, return null
    if (first_newline_idx == null) {
        return null;
    }

    // Get index of position after newline
    const idx = first_newline_idx.? + 1;
    if (idx >= SIMD_WIDTH or idx >= buf.len) {
        return 0; // No space after newline
    }

    // Create masks for spaces and tabs
    const space_matches = chunk == @as(Vec, @splat(' '));
    const tab_matches = chunk == @as(Vec, @splat('\t'));

    // Find the first non-space, non-tab character after the newline
    const space_mask = @select(u8, space_matches, ones, zeros);
    const tab_mask = @select(u8, tab_matches, ones, zeros);
    const trivia_mask = space_mask | tab_mask;
    const non_trivia = ~trivia_mask;

    // Create a vector with indices to handle position-based operations
    var indices: [SIMD_WIDTH]u8 = undefined;
    for (0..SIMD_WIDTH) |i| {
        indices[i] = @intCast(i);
    }
    const index_vec = @as(Vec, indices);

    // Create masks for the range after the newline
    const after_newline = index_vec >= @as(Vec, @splat(@as(u8, @intCast(idx))));
    const after_newline_u8 = @select(u8, after_newline, ones, zeros);
    const non_trivia_after_newline = non_trivia & after_newline_u8;
    const non_trivia_mask = @as(u16, @bitCast(@select(u1, non_trivia_after_newline != zeros, ones, zeros)));

    // Calculate the ending position (first non-trivia after newline or end of chunk)
    const end_pos = if (non_trivia_mask != 0)
        @ctz(non_trivia_mask)
    else
        SIMD_WIDTH;

    // Create a mask for the valid range (after newline and before the first non-trivia)
    const before_end = index_vec < @as(Vec, @splat(end_pos));
    const valid_range = @select(u8, after_newline, ones, zeros) & @select(u8, before_end, ones, zeros);

    // Apply the masks to get valid spaces and tabs
    const valid_spaces = @select(u8, space_matches, ones, zeros) & valid_range;
    const valid_tabs = @select(u8, tab_matches, ones, zeros) & valid_range;

    // Count the number of spaces and tabs using popcount on bitmasks
    const spaces_mask = @as(u16, @bitCast(@select(u1, valid_spaces != zeros, ones, zeros)));
    const tabs_mask = @as(u16, @bitCast(@select(u1, valid_tabs != zeros, ones, zeros)));
    const spaces_count = @popCount(spaces_mask);
    const tabs_count = @popCount(tabs_mask);

    // Calculate the indentation (1 for each space, 4 for each tab)
    return spaces_count + (tabs_count * 4);
}

view this post on Zulip Brendan Hansknecht (Jun 06 2025 at 01:01):

const chunk: Vec = buf[0..SIMD_WIDTH][0..SIMD_WIDTH].*;

This line looks off

view this post on Zulip Brendan Hansknecht (Jun 06 2025 at 01:02):

I think this should be an inline for:
for (0..SIMD_WIDTH) |i| {

view this post on Zulip Luke Boswell (Jun 06 2025 at 01:03):

I think that line is ok. I got if from the zig docs...

    // To extract a comptime-known length from a runtime-known offset,
    // first extract a new slice from the starting offset, then an array of
    // comptime-known length
    const vec3: @Vector(2, f32) = slice[offset..][0..2].*;

view this post on Zulip Brendan Hansknecht (Jun 06 2025 at 01:03):

why [0..SIMD_WIDTH] twice?

view this post on Zulip Brendan Hansknecht (Jun 06 2025 at 01:04):

Also, yeah, getting simd right is hard. It is easy to use simd and get worse perf.


Last updated: Jul 06 2025 at 12:14 UTC