Having a crack at this and I'm a little stuck. Just wondering if there is something obvious I'm missing here.
no method named `is_refcounted` found for reference `&ManuallyDrop<T>` in the current scope
found the following associated functions; to be used as methods, functions must have a `self` parameter
impl<T, E> RocRefcounted for RocResult<T, E>
where
T: RocRefcounted,
E: RocRefcounted,
{
fn inc(&mut self) {
match self.as_result_of_refs() {
Ok(payload) if payload.is_refcounted() => {
payload.inc();
}
Err(payload) if payload.is_refcounted() => {
payload.inc();
}
_ => {}
}
}
fn dec(&mut self) {
todo!()
}
fn is_refcounted() -> bool {
todo!()
}
}
I think I'm closer...
impl<T, E> RocRefcounted for RocResult<T, E>
where
T: RocRefcounted,
E: RocRefcounted,
{
fn inc(&mut self) {
unsafe {
match self.tag {
RocResultTag::RocOk => (*self.payload.ok).inc(),
RocResultTag::RocErr => (*self.payload.err).inc(),
}
}
}
fn dec(&mut self) {
unsafe {
match self.tag {
RocResultTag::RocOk => (*self.payload.ok).dec(),
RocResultTag::RocErr => (*self.payload.err).dec(),
}
}
}
fn is_refcounted() -> bool {
unsafe {
match self.tag {
RocResultTag::RocOk => (self.payload.ok).is_refcounted(),
RocResultTag::RocErr => (self.payload.err).is_refcounted(),
}
}
}
}
Screenshot-2024-07-14-at-21.32.50.png
pub trait RocRefcounted {
/// Increments the refcount.
fn inc(&mut self);
/// Decrements the refcount potentially freeing the underlying allocation.
fn dec(&mut self);
/// Returns true if the type is actually refcounted by roc.
fn is_refcounted() -> bool;
}
is_refcointed should have a branch
Should be true if either the ok or err is refcounted
fn is_refcounted() -> bool {
T::is_refcounted() || E::is_refcounted()
}
Also need to implement this
impl roc_std::RocRefcounted for SQLiteValue {
fn inc(&mut self) {
unimplemented!();
}
fn dec(&mut self) {
unimplemented!();
}
fn is_refcounted() -> bool {
true
}
}
This was generated by roc glue. I might try hand implementing first.
I think I know where the problem might be for basic-cli...
We have a lot of hand-rolled interface types like the following.
#[repr(C)]
pub struct InternalResponse {
body: RocList<u8>,
metadata: Metadata,
variant: RocStr,
}
#[repr(C)]
pub struct Metadata {
headers: RocList<Header>,
url: RocStr,
statusCode: u16,
statusText: RocStr,
}
Do we need to do/change anything to manage the refcounting in these correctly?
Like if roc gives the host one of these, or the host gives roc one, do we need to use the RocRefcounted fn's at all?
Do we need to do/change anything to manage the refcounting in these correctly?
If they aren't put in lists, nothing should change
So only if you give a List InternalResponse
To/from roc.
SQLiteValue
Must have missed that, I thought it would fall under a struct style tag, but I guess it is a non-recursive tag which needs smarter rules
impl should just be a match statement and then calling inc/dec on the values within it.
OK.
Well, I'm currently porting some of the changes we made in basic-cli across to basic-webserver so there is less glue there and trying to implement these RefCounted things.
I haven't found the issue in basic-cli
I haven't found the issue in basic-cli
I think likely the bug is on the roc internal side and not platform related.
Also, is args.roc the only one with issues?
I'm not sure, it's the first one on the list and fails in the expect script
Ok, figured you can run the expect scripts manually using
$ EXAMPLES_DIR=examples/ expect ci/expect_scripts/args.exp
spawn examples//args div -n 5 -d 20
args(82068,0x201c58c00) malloc: Heap corruption detected, free list is damaged at 0x600000f48080
*** Incorrect guard value: 145982044012671
args(82068,0x201c58c00) malloc: *** set a breakpoint in malloc_error_break to debug
Error: output was different from expected value.
Just ran them all manually, only args.roc
is failing
Args with valgrind, core issue:
# Probably reading heap size on a list allocated without a heap size?
=56308== Invalid read of size 8
==56308== at 0x1CDF43: ??? (roc_app:0)
==56308== by 0x1D1E03: ??? (roc_app:0)
==56308== by 0x1AAAB7: List_walkTryHelp_147f9a29f721bc6a47c431ffbf897368d53af6834eae9416f984da7fc2dd89 (roc_app:0)
# Double free
==56308== Address 0x4b6d4b0 is 0 bytes inside a block of size 9 free'd
==56308== at 0x48469E4: free (in /nix/store/78zxvxfy6klvwfc2s95y71y0b284fd6v-valgrind-3.22.0/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==56308== by 0x1D4882: roc_dealloc (lib.rs:81)
==56308== by 0x1CE2CA: decrement_refcounted_ptr_8 (roc_app:0)
==56308== by 0x1CE27D: ??? (roc_app:0)
==56308== by 0x18DB36: #Attr_#generic_rc_by_ref_2_dec (roc_app:0)
==56308== by 0x18DB36: Arg.Parser_constructSetOfOptions_3655121d7ba98da90331d38b92032d451b515e695221ee2a9d34aa692cd2c (roc_app:0)
# Use after free
==56308== Invalid write of size 8
==56308== at 0x1CDF64: ??? (roc_app:0)
==56308== by 0x1D1E03: ??? (roc_app:0)
==56308== by 0x1AAAB7: List_walkTryHelp_147f9a29f721bc6a47c431ffbf897368d53af6834eae9416f984da7fc2dd89 (roc_app:0)
==56308== by 0x194BAE: List_walkTry_1d346aa605d9beef3d3c8b59d397858b899a3c7efdcede8d0efd4953334f936 (roc_app:0)
So almost certainly a refcounting bug of some sort.
Just need to figure out how to debug it.
Add some sort of minimal repro to gen_refcount
All the cli parser combinators are pretty complex, so would prefer something more minimal....
I think I've found a more minimal repro, but some reason I can't get it to run as a gen_refcount
test. So still need to mess with debugging more.
Much more minimal repro of at least one refcounting issue:
app [main] {
pf: platform "../platform/main.roc",
}
import pf.Stdout
import pf.Task exposing [Task]
Arg : [
Short Str,
ShortGroup { names : List Str, complete : [Complete, Partial] },
Long { name : Str, value : Result Str [NoValue] },
Parameter Str,
]
constructSetOfOptions : Str -> Arg
constructSetOfOptions = \combined ->
options =
combined
|> Str.toUtf8
|> List.keepOks \c -> Str.fromUtf8 [c]
when options is
[alone] -> Short alone
other -> ShortGroup { names: other, complete: Complete }
main =
["-d"]
|> List.map \x ->
when Str.splitFirst x "-" is
Ok {after} ->
constructSetOfOptions after
Err _ ->
Parameter "err"
|> List.len
|> Num.toStr
|> Stdout.line
Seems like the bug is somehow with pattern matching extracting a value...no idea how though. Maybe I am missing something:
constructSetOfOptions : Str -> Arg
constructSetOfOptions = \combined ->
options =
combined
|> Str.toUtf8
|> List.keepOks \c -> Str.fromUtf8 [c]
dbg options
when options is
[alone] ->
dbg alone
Short alone
other -> ShortGroup { names: other, complete: Complete }
main =
["-d"]
|> List.map \x ->
when Str.splitFirst x "-" is
Ok {after} ->
out = constructSetOfOptions after
dbg out
out
Err _ ->
Parameter "err"
|> List.len
|> Num.toStr
|> Stdout.line
[examples/args.roc:22] options = ["d"]
[examples/args.roc:25] alone = "d"
[examples/args.roc:35] out = (Short "๏ฟฝ")
Or maybe treating something as a large string/slice incorrectly.
Oh, interesting, I think the issue might actually be pattern matching directly... (I guess I don't know how pattern matching does refcounting).
This is fine:
if List.len options == 1 then
when List.get options 0 is
Ok alone -> Short alone
_ -> crash "impossible"
else
ShortGroup { names: options, complete: Complete }
Ok even more minimal:
main =
in = [Str.fromUtf8 ['d']]
out =
when in is
[alone] -> [alone]
other -> other
out
|> List.len
|> Num.toStr
|> Stdout.line
Ok, so I've done a fair bit of refactoring for basic-webserver for these changes. It looks like everything is happy, the platform and all the examples check and build ok, but whenever I run the server and actually call into roc I get the following error from inside Roc.
thread 'tokio-runtime-worker' panicked at crates/roc_host/src/roc.rs:42:5:
The Roc app crashed with: NoImplementationNamed { def_symbol: `38.IdentId(2)` }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'tokio-runtime-worker' panicked at crates/roc_host/src/http_server.rs:87:26:
called `Result::unwrap()` on an `Err` value: JoinError::Panic(Id(10), ...)
Never mind, I found the thing that causes that is when you have a type annotation but have provided the implementation. Oops on my part.
Ok, so I think I've completed the refactor for basic-webserver. I added all the changes to the refactor-host PR. :tada:
I have some failing expect scripts, but I can verify it's behaving correctly manually... so I think it's just a macos versus linux CI thing.
Before this will pass in CI it also needs a new nightly with the Refcount changes, and a release of basic-cli. But otherwise looks like everything is running ok on my macos.
I can also test it on my linux machine later.
I've been testing the updated/refactored basic-webserver with the demo web app I've been working on, so am pretty confident it's working well.
The RC bug looks to be llvm specific (which makes sense, cause llvm rolls its own rc while other backends generate refcount procs).
Yep, a use after free for sure in llvm. The list ['d']
was converted to a Str
, but still got freed anyway.
+ allocated u8@60000166c040 (1 bytes with alignment 8)
+ allocated u8@600001a600c0 (32 bytes with alignment 8)
decref list: (8,32,true,fn(?[*]u8) callconv(.C) void@1049a9420)
freeing underlying elements from: (u8@600001a600d0, 1)
| decrement isize@60000166c040: 1 - 1 = 0!
๐ freed u8@60000166c040
| decrement isize@600001a600c8: 1 - 1 = 0!
๐ freed u8@600001a600c0
+ allocated u8@600001a600c0 (32 bytes with alignment 8)
decref list: (8,32,true,fn(?[*]u8) callconv(.C) void@1049a9420)
freeing underlying elements from: (u8@600001a600d0, 1)
| decrement isize@60000166c040: 9223523571393151041 - 1 = 9223523571393151040!
| decrement isize@600001a600c8: 1 - 1 = 0!
๐ freed u8@600001a600c0
60000166c040
specifically
Is this output from a tool?
Debug prints in our zig builtins
Some behind flags that already exist. A few new.
Testing the changes on my x64 linux machine -- in short basic-webserver refactor host branch is good to go.
The above free issue prevents us from using basic-cli.
08:44:37 ~/Documents/Github/basic-cli refactor-host$ roc --prebuilt-platform examples/args.roc -- div 5 2
free(): invalid next size (fast)
This prevents us from using the build script for basic-webserver as it uses argument parsing.
08:44:11 ~/Documents/Github/basic-webserver refactor-host$ roc --prebuilt-platform build.roc
free(): invalid pointer
Running through the build steps manually I can verify that basic-webserver is working well.
Yeah, hopefully I can narrow down the bug and then everything will be green.
Ok. I think it is a bug in drop specialization or something related to the dec
to decref
conversion.
After refcounting ir is correct:
procedure : `#UserApp.main` Str
procedure = `#UserApp.main` ():
let `#UserApp.16` : U8 = 100i64;
let `#UserApp.15` : List U8 = Array [`#UserApp.16`];
let `#UserApp.inStr` : [C {U64, U8}, C Str] = CallByName `Str.fromUtf8` `#UserApp.15`;
let `#UserApp.in` : List [C {U64, U8}, C Str] = Array [`#UserApp.inStr`];
joinpoint `#UserApp.8` `#UserApp.out`:
let `#UserApp.7` : U64 = CallByName `List.len` `#UserApp.out`;
dec `#UserApp.out`;
let `#UserApp.6` : Str = CallByName `Num.toStr` `#UserApp.7`;
ret `#UserApp.6`;
in
let `#UserApp.12` : U64 = lowlevel ListLenUsize `#UserApp.in`;
let `#UserApp.13` : U64 = 1i64;
let `#UserApp.14` : Int1 = lowlevel Eq `#UserApp.12` `#UserApp.13`;
if `#UserApp.14` then
let `#UserApp.11` : U64 = 0i64;
let `#UserApp.alone` : [C {U64, U8}, C Str] = lowlevel ListGetUnsafe `#UserApp.in` `#UserApp.11`;
inc `#UserApp.alone`;
dec `#UserApp.in`;
let `#UserApp.9` : List [C {U64, U8}, C Str] = Array [`#UserApp.alone`];
jump `#UserApp.8` `#UserApp.9`;
else
jump `#UserApp.8` `#UserApp.in`;
After drop specialization ir is incorrect cause it is missing the increment of #UserApp.alone
. For lists, there should not longer be a difference between dec
and decref
or at least, that is my understanding of how my PR changes things, but drop specialization (some other pass after refcounting) doesn't know that....Though maybe there should still be a difference... Cause we could save on the increment here if there was a difference.
procedure : `#UserApp.main` Str
procedure = `#UserApp.main` ():
let `#UserApp.16` : U8 = 100i64;
let `#UserApp.15` : List U8 = Array [`#UserApp.16`];
let `#UserApp.inStr` : [C {U64, U8}, C Str] = CallByName `Str.fromUtf8` `#UserApp.15`;
let `#UserApp.in` : List [C {U64, U8}, C Str] = Array [`#UserApp.inStr`];
joinpoint `#UserApp.8` `#UserApp.out`:
let `#UserApp.7` : U64 = CallByName `List.len` `#UserApp.out`;
dec `#UserApp.out`;
let `#UserApp.6` : Str = CallByName `Num.toStr` `#UserApp.7`;
ret `#UserApp.6`;
in
let `#UserApp.12` : U64 = lowlevel ListLenUsize `#UserApp.in`;
let `#UserApp.13` : U64 = 1i64;
let `#UserApp.14` : Int1 = lowlevel Eq `#UserApp.12` `#UserApp.13`;
if `#UserApp.14` then
let `#UserApp.11` : U64 = 0i64;
let `#UserApp.alone` : [C {U64, U8}, C Str] = lowlevel ListGetUnsafe `#UserApp.in` `#UserApp.11`;
decref `#UserApp.in`;
let `#UserApp.9` : List [C {U64, U8}, C Str] = Array [`#UserApp.alone`];
jump `#UserApp.8` `#UserApp.9`;
else
jump `#UserApp.8` `#UserApp.in`;
Thinking about this more, I don't think there can be this optimization anymore. Cause decrementing the list may be freeing tail elements that are in the capacity of the ilst but not in the specific slice being freed. So while it may be correct for the slice elements, it won't be correct for the underlying elements.
So I think I just need to block this optimization.
I guess it could still be allowed if there was some sort of check on slice vs list though.
cc: @Folkert de Vries in case he has any comments around this optimization and how it maps to the new list refcounting
Oh, I think this optimization may only apply to lists with known lengths. So maybe we can make it safe.
Ok. I think I have a fix
If anyone has thoughts on drop specialization for lists, left a big todo with info on the refcounting change. Currently just disabled it, but not sure if it should be fully removed or not: https://github.com/roc-lang/roc/pull/6907/files#r1678554981 (extra context in todo right above comment)
cc @J.Teeuwissen who worked on drop specialization
Test results for https://github.com/roc-lang/roc/pull/6907 using
ROC=roc EXAMPLES_DIR=examples/ bash ci/all_tests.sh
5dc20e0
Ok, I think we are ready for the next steps @Anton
The following platforms have been updated and compatible with these changes;
The following packages/platforms should be updated at some point soon, not blocking
So, if my understanding of the new lists is correct, a dec
of a list does something like this:
if (unique list)
dec items
free list
else
decref list
If this is the case, inc item; dec list
(with list being a singleton), it could be optimized as follows
if (unique list)
free list
else
inc item
decref list
Iirc the previous list dec
simply decremented all items. I think all drop specializations for lists (with a known length) would have to be updated with a uniqueness check (and free/decref).
I'm not sure how "dead" references factor in, I would assume they do not matter as their reference count should not be touched either way.
That is correct with the caveat that seamless slices can make it more complex
str = "some super long string that will definitely be allocated"
list = [str, str]
subList = List.dropFirst 1
# keep sublist alive and use it later
In this case, we are left with 2 allocations. 1 for the Str and 1 for the List.
The Str has a refcount of 2.
The slice has a size of 1, but knows how to get back to the initial allocation so it can free the entire original list.
When subList
goes out of scope, it will end up decrementing the refcount to str
twice and freeing both it and the underlying list allocation.
So more accurate refcount note to add to the code would be:
if (unique list)
dec items (this includes all items in the entire allocation, they don't have to be in the current sublist)
free list
else
decref list
Ah okay. Using the suggested slice/list check would be a simple fix. It would only make sense to drop specialise a slice if the size of the underlying list size is known at compile time. But even then, you would have to inline both the dec
for the slice and the dec
for the list. (Assuming the slice one calls the list one).
Makes sense. Thanks for the info.
linux x64 basic-cli refactor-host :check:
@Luke Boswell did you run this inside nix?
I'm now hitting:
โฏ ./examples/args div -n 5 -d 20
free(): invalid next size (fast)
Aborted (core dumped)
Did you delete and rebuild the host for surgical linking? I think we hit that as well if the surgical linker pulled in the old host (should pass with --linker=legacy` or if the surgical linker host is properly rebuilt
Thanks @Brendan Hansknecht :heart:
that fixed it :)
Last updated: Jul 06 2025 at 12:14 UTC