Stream: show and tell

Topic: roc-base64


view this post on Zulip Artur Domurad (Jul 09 2024 at 20:30):

I needed base64 decoding for my r2e (e2e framework) to save base64-encoded PDFs and screenshots from the browser to disk.

I could not find any base64 lib for roc, so I did a basic implementation:
https://github.com/adomurad/roc-base64

The implementation is quite slow, but it seems to work, and it is enough for me for now.

BUT I have noticed that it is really slow when running with an unoptimized build.

e.g. decoding a PNG file (254 KB of base64 text on disk / 191 KB after decoding on disk)

when running with a binary from roc build decode.roc:

➜ time ./decode
./decode 5.17s user 0.00s system 99% cpu 5.172 total

when running with a binary from roc build decode.roc --optimize:

➜ time ./decode
./decode  0.01s user 0.00s system 98% cpu 0.011 total

Here is a "one file repro" if someone wants to look into this:
https://github.com/adomurad/roc-base64/blob/main/temp/decode.roc

I'm using a big when .. is expression to look up indexes for the 64 characters.
When I tried to use a Dict, it went even slower.
unoptimized:

./decodeDict  18.95s user 1.67s system 99% cpu 20.626 total

optimized:

./decodeDict  1.15s user 1.46s system 99% cpu 2.606 total

And here is the implementation with a Dict:
https://github.com/adomurad/roc-base64/blob/main/temp/decodeDict.roc

If someone has a faster implementation - please share :pray:
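For context, the when .. is lookup mentioned above maps each base64 character to its 6-bit index. A condensed sketch of that approach (hypothetical, abridged; the real table in decode.roc has one branch per alphabet character):

```roc
decodeChar : U8 -> U32
decodeChar = \char ->
    when char is
        'A' -> 0
        'B' -> 1
        # ... one branch for each of the 64 alphabet characters ...
        '+' -> 62
        '/' -> 63
        _ -> 0 # padding ('=') and invalid bytes fall through
```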

view this post on Zulip Kiryl Dziamura (Jul 09 2024 at 20:42):

There’s an implementation in roc’s benchmarks. Not sure how good it is tho :big_smile:

https://github.com/roc-lang/roc/tree/main/crates/cli/tests/benchmarks/Base64

view this post on Zulip Artur Domurad (Jul 09 2024 at 20:53):

Thanks :)
I will try it tomorrow.

view this post on Zulip Brendan Hansknecht (Jul 09 2024 at 20:57):

Dict currently has a perf issue with refcounted data (like strings or nested lists). I have most of a fix, but have been blocked on getting it to work with the dev backends. So that is probably why it is so much slower.

view this post on Zulip Brendan Hansknecht (Jul 09 2024 at 20:57):

As for when .. is, I would guess that an unoptimized build might be generating a giant if/else chain.

view this post on Zulip Brendan Hansknecht (Jul 09 2024 at 20:58):

Definitely something I can look into later to verify

view this post on Zulip Agus Zubiaga (Jul 10 2024 at 14:45):

I haven't tested it but I imagine doing some math would be faster (at least in a dev build):

decodeChar = \char ->
    if char >= 'a' then
        char - 'a' + 26
    else if char >= 'A' then
        char - 'A'
    else if char >= '0' then
        char - '0' + 52
    else if char == '+' then
        62
    else if char == '/' then
        63
    else
        0 # ignore

view this post on Zulip Agus Zubiaga (Jul 10 2024 at 14:57):

(This assumes it will only get valid chars, you can add more guards if that isn't the case)
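A guarded variant could return a Result instead of silently mapping invalid bytes to 0. A sketch (not from the original thread; decodeCharChecked and the error tag are hypothetical names):

```roc
decodeCharChecked : U8 -> Result U32 [InvalidBase64Char U8]
decodeCharChecked = \char ->
    if char >= 'a' && char <= 'z' then
        Ok (Num.toU32 (char - 'a' + 26))
    else if char >= 'A' && char <= 'Z' then
        Ok (Num.toU32 (char - 'A'))
    else if char >= '0' && char <= '9' then
        Ok (Num.toU32 (char - '0' + 52))
    else if char == '+' then
        Ok 62
    else if char == '/' then
        Ok 63
    else
        Err (InvalidBase64Char char)
```

Note that the ordering of the range checks in the unguarded version above works because of ASCII values ('a' is 97, 'A' is 65, '0' is 48), so the lowercase check must come first.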

view this post on Zulip Artur Domurad (Jul 10 2024 at 15:53):

I have tried that yesterday, and using a dev build it is:

./decode2  4.05s user 0.00s system 99% cpu 4.054 total

so ~1s faster than the when .. is expression.

view this post on Zulip Brendan Hansknecht (Jul 10 2024 at 16:42):

Do you have files/browser.base64.txt? I'm curious to run the same file and look at the flamegraph.

Also, your basic-cli URL doesn't seem to exist anymore (I'm guessing this is due to depending on the 0.12.0 prerelease)

view this post on Zulip Artur Domurad (Jul 10 2024 at 17:04):

Here is the file:
browser.base64.txt
It is a screenshot of roc-lang.org :grinning_face_with_smiling_eyes:

And I did notice today my mistake of depending on a prerelease, as my examples stopped working.

view this post on Zulip Anton (Jul 10 2024 at 17:26):

I did notice today my mistake of depending on a prerelease as my examples stopped working.

I added an extra visible warning to the pre-release notes to prevent this in the future

view this post on Zulip Brendan Hansknecht (Jul 10 2024 at 17:28):

Interesting. The problem is 100% List.chunksOf.

This should fix debug perf:

decodeList = \list, state ->
    when list is
        [a, b, c, d, .. as rest] ->
            b1 = (0u32 |> Num.shiftLeftBy 6) + (decodeChar a)
            b2 = (b1 |> Num.shiftLeftBy 6) + (decodeChar b)
            b3 = (b2 |> Num.shiftLeftBy 6) + (decodeChar c)
            b4 = (b3 |> Num.shiftLeftBy 6) + (decodeChar d)

            c1 = b4 |> Num.shiftRightBy 16 |> Num.bitwiseAnd 0xff |> Num.toU8
            c2 = b4 |> Num.shiftRightBy 8 |> Num.bitwiseAnd 0xff |> Num.toU8
            c3 = b4 |> Num.shiftRightBy 0 |> Num.bitwiseAnd 0xff |> Num.toU8

            next = List.concat state [c1, c2, c3]
            decodeList rest next

        [] ->
            state

        _ -> crash "Base64.decode: this error should not be possible"

decode : List U8 -> List U8
decode = \bytes ->
    # pad missing "="
    list = bytes |> padMissingChars

    # calc final padding chars
    paddingLen = calculatePadding list

    # Note: Totally would be best to use `List.withCapacity` here.
    decodedList = decodeList list []

    decodedList |> List.dropLast paddingLen
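Following the note in the code above, a sketch of the List.withCapacity change (assuming the same padMissingChars, calculatePadding, and decodeList helpers): every 4 base64 characters decode to 3 bytes, so the output size is known up front and the accumulator list never needs to reallocate.

```roc
decode : List U8 -> List U8
decode = \bytes ->
    list = bytes |> padMissingChars
    paddingLen = calculatePadding list

    # Pre-allocate: every 4 base64 chars decode to 3 bytes.
    capacity = List.len list // 4 * 3
    decodedList = decodeList list (List.withCapacity capacity)

    decodedList |> List.dropLast paddingLen
```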

view this post on Zulip Artur Domurad (Jul 10 2024 at 17:38):

wow, it is almost as fast as the optimized version now.
Thanks!

view this post on Zulip Brendan Hansknecht (Jul 10 2024 at 17:44):

Yeah, definitely something we should debug more. I definitely don't think the difference should be that stark. Oh, though maybe the chunked list of lists is hitting the refcounting perf bug. Not sure though.

view this post on Zulip Brendan Hansknecht (Jul 10 2024 at 18:09):

Yeah, 100% the refcounting issue. Perf is also fixed by the list-size-on-heap branch if I can ever manage to land it.

Summary
  ./ref-fix-opt ran
    1.04 ± 0.08 times faster than ./main-opt
    1.07 ± 0.28 times faster than ./main-no-chunksOf-opt
    1.68 ± 0.28 times faster than ./main-no-chunksOf
    2.25 ± 0.14 times faster than ./ref-fix
 1861.04 ± 122.30 times faster than ./main

view this post on Zulip Brendan Hansknecht (Jul 10 2024 at 18:32):

Also, just to clarify, I would expect avoiding chunksOf to generally be the better design even after the perf bug is fixed. This is seen when benchmarking at a really high sample size (2000 executions in this case) to get a more accurate difference in perf:

Summary
  ./main-no-chunksOf-opt ran
    1.15 ± 0.07 times faster than ./ref-fix-opt
    1.18 ± 0.07 times faster than ./main-opt
    1.84 ± 0.08 times faster than ./main-no-chunksOf
    2.62 ± 0.11 times faster than ./ref-fix

Last updated: Jul 06 2025 at 12:14 UTC