I'm trying to add some debug printing to the List.appendUnsafe builtin. If my understanding is correct, this should be done in the Zig code in compiler/builtins/bitcode/src/list.zig.
Adding a simple std.debug.print("appendUnsafe called!", .{}) does not work. Compilation fails with the error:
zig failed: /home/qqwy/.asdf/installs/zig/0.9.1/lib/std/os.zig:134:24: error: container 'std.os.system' has no member called 'fd_t'
pub const fd_t = system.fd_t;
It looks like this code is being compiled in some kind of 'no std'-like mode.
What is the proper way to do this?
you need to turn on DEBUG mode in builtins/build.rs
it's because WASM doesn't have that type
Ah, so switching that flag to true essentially disables WASM?
it disables the wasm backend builtins but do you need that for your debugging?
Nope! Just trying to understand how it works. Thanks :+1:
Hmm. Using std.debug.print results in an immediate segfault on app startup.
Using std.os.write works, but there is no way to format numbers.
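A possible workaround for the missing number formatting, sketched in C (the language of the hello-world host) rather than Zig: format into a stack buffer first, then hand the bytes to a raw write call. The helper names format_debug_line and debug_number are made up for illustration and are not part of Roc or its builtins; in Zig, std.fmt.bufPrint plays roughly the role snprintf plays here, though whether it is usable inside the builtins build is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: format "label: value\n" into buf.
 * Returns the number of bytes to write (0 on error); output is
 * truncated to cap - 1 bytes if the buffer is too small. */
size_t format_debug_line(char *buf, size_t cap, const char *label, long value) {
    assert(buf != NULL && label != NULL);
    if (cap == 0) return 0;
    int n = snprintf(buf, cap, "%s: %ld\n", label, value);
    if (n < 0) return 0;
    return (size_t)n < cap ? (size_t)n : cap - 1;
}

/* Hypothetical helper: print a labelled number to stderr (fd 2)
 * using only a stack buffer and the raw write(2) call. */
void debug_number(const char *label, long value) {
    char buf[64];
    size_t len = format_debug_line(buf, sizeof buf, label, value);
    if (len > 0) {
        ssize_t ignored = write(2, buf, len);
        (void)ignored; /* best-effort debug output */
    }
}
```

Calling debug_number("len", 24) would send the line "len: 24" to stderr without going through any buffered output machinery.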
what are you running over? do you have a branch?
Publishing the branch now
https://github.com/rtfeldman/roc/issues/3494
how do you repro the segfault? is it only with the debug.print?
@Ayaz Hafiz Marten noted earlier that there is almost no debug info in the final binary. That used to work with debugir, I believe it worked with llvm13 too. Any idea what might have changed recently? (or am I misremembering and has this not worked for a long time?)
Surgical linking? It is default now and doesn't handle debug info
ah, that might be it
good to know
Yep! That is the reason! I have line numbers now :heart_eyes:
Yeah, eventually need to look into linking dwarf debug info.
debugir shouldn't be touching debug symbols from the zig builtins anyway I think, it only aids in debugging the generated LLVM
Seems like there is a double free of a refcounted object (which manifests as invalid read of the refcount -> invalid write of the refcount -> invalid free).
is there a free of the address before the first read?
There is
Interestingly memory is also leaked, depending on how long the input is to this example
(Short inputs 'only' leak, inputs > 23 characters result in the double-free)
Which makes me think that it might be related to the short string optimization
yes almost certainly
Even though 99% of the code manipulates List U8
It is turned from a Str into a List U8 at the very beginning and turned back at the very end.
hold up
are you using the hello world platform?
in that platform's source code, we have
// Write to stdout
if (write(1, str_bytes, str_len) >= 0) {
    // Writing succeeded!
    // NOTE: the string is a static string, read from in the binary
    // if you make it a heap-allocated string, it'll be leaked here
    return 0;
} else {
    printf("Error writing to stdout: %s\n", strerror(errno));
    // NOTE: the string is a static string, read from in the binary
    // if you make it a heap-allocated string, it'll be leaked here
    return 1;
}
Ah!
That would explain the leaking
But not the memory corruption
yeah, so what I usually do is use the platform-switching platform and pick the zig one there
The sole reason I'm using hello-world while developing the parser rather than cli-platform is that that one currently does not compile :sweat:
I'll try to use the zig one :+1:
But I have to leave for a couple of hours first
that platform also has a convenient const DEBUG: bool = false;
Can you make an issue for the cli-platform @Qqwy / Marten?
I got it down to
main =
    when manyImpl [48u8] [] is
        _ -> "done"

manyImpl = \input, vals ->
    { before: start, others: inputRest } = List.split input 1
    when List.get start 0 is
        Err _ ->
            Ok { val: vals, input: input }
        Ok _startCodepoint ->
            manyImpl inputRest []

the allocation of [48u8] does not get freed, really not sure why
RC just loses track of it
no decrement for it is emitted
Anton said:
Can you make an issue for the cli-platform Qqwy / Marten?
It was this one: https://github.com/rtfeldman/roc/issues/3438
Folkert de Vries said:
no decrement for it is emitted
Really odd that the opposite (double free) happens for large inputs.
often the one snowballs into the other
it just means RC is wrong and then all bets are off
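To illustrate why a wrong refcount can show up as either a leak or a double free, here is a toy model in C. It is only a sketch under stated assumptions: the refcount sits in a header right in front of the payload, and the names rc_alloc, rc_inc, rc_dec and rc_live_allocations are invented for this example; this is not Roc's actual runtime code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model: a refcount header stored directly in front of the payload. */
typedef struct {
    size_t refcount;
} RcHeader;

static int live_allocations = 0; /* bookkeeping for the demo only */

/* Allocate a payload with an initial refcount of 1. */
void *rc_alloc(size_t payload_bytes) {
    RcHeader *header = malloc(sizeof(RcHeader) + payload_bytes);
    if (header == NULL) return NULL;
    header->refcount = 1;
    live_allocations++;
    return header + 1; /* payload starts right after the header */
}

void rc_inc(void *payload) {
    RcHeader *header = (RcHeader *)payload - 1;
    header->refcount++;
}

/* Returns 1 if this decrement released the allocation, 0 otherwise.
 * Calling it again after it returned 1 would read, write and free
 * already-freed memory: the read -> write -> free pattern of a double free. */
int rc_dec(void *payload) {
    RcHeader *header = (RcHeader *)payload - 1; /* read the refcount */
    assert(header->refcount > 0);
    header->refcount--;                         /* write the refcount */
    if (header->refcount == 0) {
        free(header);                           /* free the allocation */
        live_allocations--;
        return 1;
    }
    return 0;
}

int rc_live_allocations(void) {
    return live_allocations;
}
```

In this model, one rc_dec too few leaves rc_live_allocations() above zero at exit (a leak), while one rc_dec too many touches the header after free. Both are the same underlying bug: the number of decrements does not match the number of owners.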
got it
when the slice we end up making is empty, the logic is wrong
just need to turn sublist into

sublist : List elem, { start : Nat, len : Nat } -> List elem
sublist = \list, config ->
    if config.len == 0 then
        []
    else
        sublistLowlevel list config.start config.len

in List.roc
Even with this fix in place, the larger example still fails with a use-after-free :sad:
can you minimize the example again?
I'm not sure; when I tried minimizing it further before, Roc no longer used refcounting and then the problem disappears
so long as you use lists it should just be using RC
I managed to make the example shorter
# Use after free. List has 24 elems.
# Making list 1 elem shorter resolves problem (short string optimization?)
main =
    [65u8,65u8,65u8,65u8,65u8,65u8,65u8,65u8,65u8,65u8,65u8,65u8,65u8,65u8,65u8,65u8,65u8,65u8,65u8,65u8,65u8,65u8,65u8,65u8,]
    |> Str.fromUtf8
    |> Result.withDefault "(not used)"
This also breaks in the REPL
And obviously also with Valgrind
I think that Str.fromUtf8 encounters a problem iff the resulting string is 'big'
In the REPL, the output becomes "%v����AAAAAAAAAAAAAAAA" : Str. (Deterministic because of WASM?)
And running the binary outside of valgrind gives arbitrary output for the first 8 characters of the 'hello world' example.
got that one too, now there is just a memory leak that remains
Wait you already fixed this one somehow?
for the small string cases
Do I need to update the current branch with trunk or something?
yeah it's an RC thing where you flip a bool and it works
oh, it's not on trunk
just locally
I can make a branch, then you can cherry-pick to get unblocked
:+1:
see the string-memory-problems branch
eww that has a build error,
all right, all good. The example from the issue now compiles without valgrind errors
OK, :cherries: :ok: time
Your fix resolves the problem. I only see the one leak from the hello-world platform now
Qqwy / Marten said:
In the REPL, the output becomes "%v����AAAAAAAAAAAAAAAA" : Str. (Deterministic because of WASM?)
If you mean the web REPL, which runs in Wasm, then be aware that on a 32-bit platform, short strings are <=11 bytes rather than <=23
Not sure if that affects your conclusions or not
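The 23 vs 11 byte limits fall out of simple arithmetic if one assumes the string value occupies three machine words (pointer, length, capacity) and that the inline form reserves one byte for its own length. That layout is an assumption for this sketch, and small_str_capacity and fits_inline are illustrative names, not Roc functions:

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout: three machine words (pointer, length, capacity). */
enum { STR_WORDS = 3 };

/* Bytes available for inline storage: the whole struct minus one
 * byte that is assumed to hold the inline length. */
size_t small_str_capacity(size_t word_bytes) {
    assert(word_bytes > 0);
    return STR_WORDS * word_bytes - 1;
}

/* Nonzero if a string of `len` bytes would be stored inline
 * instead of in a heap allocation. */
int fits_inline(size_t len, size_t word_bytes) {
    return len <= small_str_capacity(word_bytes);
}
```

With 8-byte words this gives 23 and with 4-byte words 11, which matches the observation that a 24-byte string is the first one to need a refcounted heap allocation on a 64-bit target.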
No, I meant the command-line roc repl
Does that one use WASM or LLVM?
LLVM
though given the speed of our wasm backend now, using wasm there might not be a bad idea
eventually we'd go for just assembly I think, but that backend is fairly immature
Last updated: Jul 05 2025 at 12:14 UTC