Stream: compiler development

Topic: Valgrind failure on musl


Dan G Knutson (Feb 04 2026 at 16:23):

I need a hand figuring out this valgrind failure:
https://github.com/roc-lang/roc/actions/runs/21645888338/job/62398322030?pr=9083

Claude's claim is that this is related to a DWARF generation bug that loses the function names.
I thought this was related to, or triggered by, compiling the new glue platform as part of the build. If I understand correctly, that should be impossible now on the branch, because it 'should' now be using the committed glue host lib for musl. I may be misunderstanding the Zig build system, and I definitely don't understand the problem causing the '???' function names in valgrind.

Anton (Feb 04 2026 at 16:51):

I have also had issues with ??? in the past but I forgot how I solved that. Are you able to reproduce the valgrind error locally?

Dan G Knutson (Feb 04 2026 at 17:19):

It looks like it's reproducing locally with:
zig build -Doptimize=ReleaseFast -Dcpu=x86_64_v3 -Dtarget=x86_64-linux-musl
./ci/custom_valgrind.sh ./zig-out/bin/roc --no-cache test/str/app.roc

Anton (Feb 04 2026 at 17:32):

Great, I recommend starting by giving that to Claude and seeing if it can fix it.

Dan G Knutson (Feb 04 2026 at 21:36):

I can continue doing that :sweat_smile:. It was pointing to a Zig bug and suggesting broader valgrind suppression, so that made me think I should ask here. I don't feel qualified to say whether the stuff it wants to add to the valgrind suppression is reasonable or not.

Anton (Feb 06 2026 at 07:57):

Can you paste Claude's justification for the suppression here?

Dan G Knutson (Feb 06 2026 at 17:22):

So, initially it was suggesting object-based suppression since there was no debug info to allow function-based. That seems completely unworkable, as far as actually using valgrind to catch errors goes. It seems like it's just a way to subvert CI.

It seems like part of the issue may be the specific build config used by the snap version of valgrind. Switching from the snap build of valgrind to either the older version from apt or a built-from-source version allows using function-based suppression for the original musl alloc thing I was running into, which Claude claims (and I believe) is a false positive.
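
To illustrate the difference between the two suppression styles mentioned above, here is a minimal sketch of a function-based suppression file. It is not taken from the thread: the suppression name, the error kind (Memcheck:Cond) and the function name are hypothetical placeholders, and the real entries would have to come from valgrind's own report (for example via --gen-suppressions=all).

```shell
#!/bin/sh
# Sketch only: every name inside the suppression is a hypothetical placeholder.
supp_file="${TMPDIR:-/tmp}/musl_example.supp"

# A function-based suppression matches on a frame's function name (fun:).
# This only works when valgrind can resolve names, i.e. no '???' frames.
cat > "$supp_file" <<'EOF'
{
   hypothetical_musl_alloc_false_positive
   Memcheck:Cond
   fun:hypothetical_musl_alloc_fn
}
EOF

# An object-based suppression would use an obj: line instead of fun:,
# which matches any frame located in that binary and is therefore much broader.

# Print (rather than run) the command that would load the suppression file.
echo "valgrind --suppressions=$supp_file ./zig-out/bin/roc --no-cache test/str/app.roc"
```

Because an obj: entry silences every matching error originating in the named binary, it hides real bugs along with false positives, which is the concern raised about object-based suppression.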

On a recent, default-configuration build of valgrind, used in the most recent build on the giesch/roc-glue branch, we're getting a new valgrind failure that seems like an actual error (not related to basic musl behavior).

The context I'm hoping to gain by asking here is how to dig into valgrind errors in the compiler, and the history of working around this apparent Zig DWARF bug. It would also help to know whether Ubuntu 22 or the specific version of valgrind is important. I don't get these valgrind errors on my local Ubuntu-24-based OS with a recent default valgrind. I'm fuzzy on the details, but it seems possible that there's some kind of mismatch between the recent valgrind and the older kernel version.

Anton (Feb 06 2026 at 18:16):

Dan G Knutson said:

On a recent, default-configuration build of valgrind

By "default-configuration", do you mean the way we currently use it on CI on the main branch?

Anton (Feb 06 2026 at 18:21):

Dan G Knutson said:

I don't get these valgrind errors on my local ubuntu-24-based OS with a recent default valgrind.

You do need to build Roc with the same flags as on CI; a plain zig build roc can yield different results. And yes, the Ubuntu and valgrind versions matter. Anyway, I will try to reproduce this on my machine and take a look.

Anton (Feb 06 2026 at 18:49):

Hmm ./zig-out/bin/snapshot --debug --verbose is not giving me any output...

Anton (Feb 06 2026 at 18:54):

I do get all debug output on macOS using your branch, very strange :thinking:

Anton (Feb 06 2026 at 18:55):

I need to go, I can pick up on Monday. Are you getting any output with ./zig-out/bin/snapshot --debug --verbose? If not, it makes sense that you are not seeing the valgrind errors.

Dan G Knutson (Feb 06 2026 at 18:55):

By 'default configuration', I mean passing no arguments to 'configure' when building valgrind from source (which that particular commit did in CI). The build of valgrind used by snap has some level of customization. The lack of debug info seems unique to Ubuntu 22 as far as I can tell.

Dan G Knutson (Feb 06 2026 at 19:14):

It sounds like maybe we need to set omit_frame_pointer = false in the ReleaseFast build used by valgrind in CI? But that would also be kind of cheating CI, maybe.

Dan G Knutson (Feb 06 2026 at 19:47):

It looks like this is the configuration used by snap, but I'm not sure what commit would end up being used by Ubuntu 22:
https://github.com/ralight/valgrind-snap/blob/main/snap/snapcraft.yaml

Anton (Feb 09 2026 at 10:42):

Building with zig build snapshot -Doptimize=ReleaseFast -Dfuzz -Dsystem-afl=false -Dcpu=x86_64_v3 -Dtarget=x86_64-linux-musl results in no output when running ./zig-out/bin/snapshot --debug --verbose. But I do see the expected output when building with zig build snapshot. So valgrind is not the issue here, I'm looking into a fix...

Anton (Feb 09 2026 at 10:46):

This also happens on the main branch.

Anton (Feb 09 2026 at 10:53):

Ok that was just: "Zig's std.log.debug is compiled out in ReleaseFast builds." :sweat_smile:

Anton (Feb 09 2026 at 14:27):

Dan G Knutson said:

It looks like it's reproducing locally with:
zig build -Doptimize=ReleaseFast -Dcpu=x86_64_v3 -Dtarget=x86_64-linux-musl
./ci/custom_valgrind.sh ./zig-out/bin/roc --no-cache test/str/app.roc

I was able to reproduce this on the main branch as well.

Anton (Feb 09 2026 at 14:41):

Adding -Dstrip=false during the build gives a clean trace.
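
One way to check whether a given build kept its debug information is to look for a .debug_info section in the binary. The sketch below assumes readelf (from binutils) is installed and that the binary is at ./zig-out/bin/roc as in the reproduction steps; the helper name has_debug_info is made up for this example, and the thread does not say that this check was actually used.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the ELF file lists a .debug_info section.
# It fails if the file is missing, is not an ELF file, or readelf is unavailable.
has_debug_info() {
    readelf -S "$1" 2>/dev/null | grep -q '\.debug_info'
}

bin="./zig-out/bin/roc"
if has_debug_info "$bin"; then
    echo "debug info present: $bin"
else
    echo "no debug info found (or file missing): $bin"
fi
```

A stripped binary would be expected to fail this check, which would be consistent with valgrind printing '???' in place of function names.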

Anton (Feb 09 2026 at 14:48):

That specific valgrind error was indeed a false positive, pushing a fix soon...

Anton (Feb 09 2026 at 16:24):

PR#9167

Anton (Feb 09 2026 at 18:33):

Fixed in PR#9169 :tada:

Anton (Feb 09 2026 at 18:35):

I'll merge the fix into your branch tomorrow @Dan G Knutson, I want to get #9167 on main first so I don't make a mess of the branches.


Last updated: Feb 20 2026 at 12:27 UTC