hey, just setting up my development environment to play around in, and I hit a small snag.
When I run cargo test within nix from the project root directory, it comes back saying that ~70-90 of the tests have failed (it seems to be a different number every time). This is with a clean working tree on commit 116463893.
Is this expected behavior? If not, any ideas on how to fix that?
which tests fail?
they all start with gen_
I can give you the complete list from my last run if you want
just a sample is fine. is any particular error printed?
and also, what are your system details?
failures:
gen_abilities::encode_derived_list_of_lists_of_strings
gen_abilities::encode_derived_nested_record_tag_record
gen_abilities::encode_immediate::u16
gen_abilities::hash::immediate::u16
gen_compare::eq_i64
gen_compare::list_eq_empty
gen_compare::unit
gen_list::bool_list_literal
gen_list::list_drop_if_geq3
gen_list::list_drop_if_string_eq
gen_list::list_drop_last_mutable
gen_list::list_find
gen_list::list_find_index_not_found
...
the error that cargo comes back with is simply
error: test failed, to rerun pass '-p test_gen --test test_gen'
I'm on an M1 Macbook, macOS Monterey version 12.3.1
hmm, there is nothing else above the failures: line?
oh sorry, lots, something like this:
---- gen_tags::pattern_matching_unit stdout ----
thread 'gen_tags::pattern_matching_unit' panicked at 'called `Result::unwrap()` on an `Err` value: DlClose { desc: "dlclose(0x20fc175a0): cannot dlclose until fork() handlers have completed" }', crates/compiler/test_gen/src/gen_tags.rs:712:5
79 times
from a quick scan they all seem to be panics on unwrap calls.
hmm. well that is the interesting bit
but I don't know what it means
some sort of race condition maybe?
ok, well, that must be it: running cargo test -p test_gen --test test_gen -- --test-threads=1 returns no failures.
but it also took 10 minutes. any ideas on how to fix that?
yeah, I always run those tests with --test-threads=1 due to that race condition, generally with --release
do you have any idea what causes the race condition? hardware? OS? just the way the tests are designed?
Not really. I assume it is running a kernel call in parallel over many tests, since dlclose is directly a kernel call
maybe we could modify the test to use a lock when opening and closing dylibs
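(for illustration, a minimal sketch of that idea — names like DYLIB_LOCK are made up here, not actual test_gen code:)

```rust
use std::sync::Mutex;

use libloading::Library;

// Hypothetical global lock serializing dlopen/dlclose across test threads,
// so no two tests open or close a dylib at the same time.
static DYLIB_LOCK: Mutex<()> = Mutex::new(());

fn open_dylib(path: &str) -> Result<Library, libloading::Error> {
    let _guard = DYLIB_LOCK.lock().unwrap();
    unsafe { Library::new(path) }
}

fn close_dylib(library: Library) -> Result<(), libloading::Error> {
    let _guard = DYLIB_LOCK.lock().unwrap();
    library.close()
}
```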
interesting, also maybe we should add some text to CONTRIBUTING.md explaining the issue in the meantime (in the running tests section)
either way, thank you very much for the help :)
Update: it seems to work for up to 4 threads. don't know if that's significant, but it seemed to yield a small performance boost
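(i.e. something like cargo test -p test_gen --test test_gen -- --test-threads=4)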
So I tried locks and it actually made things worse. Don't know why, maybe I just did something wrong. That said, it did work to just ignore the error instead of crashing. I guess the one concern with ignoring the error is that it could maybe lead to enough file handles building up that some random tests fail?
Thoughts on just changing from library.close().unwrap() to let _ = library.close();? That or just removing the line completely. The library should close when dropped (just ignoring errors), so maybe explicitly closing isn't needed at all.
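(a sketch of the before/after — the surrounding code is illustrative, not the exact test_gen source:)

```rust
use libloading::Library;

fn run_generated_test(path: &str) -> Result<(), libloading::Error> {
    let library = unsafe { Library::new(path) }?;
    // ... look up and call the generated test symbols here ...

    // before: a dlclose failure panics the whole test
    // library.close().unwrap();

    // after: still request the close, but ignore a failure;
    // `close` consumes the library either way
    let _ = library.close();
    Ok(())
}
```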
yeah seems unnecessary in tests, unless I'm missing something!
I thought we had a case in CI where we ran out of file handles. Thought this was maybe added because of that...but I could be wrong.
we did do that because multiple tests might try to rebuild e.g. the cli platform at the same time
I put something in place to manage that but I'm not confident it always works
That is just locking which platform can build, right? This shouldn't relate to opening and closing dynamic libraries in test_gen. This failure is specifically from closing dylibs and making sure that doesn't error.
is the problem only due to trying to close the dylib? i'm not sure i totally follow what the symptom is aside from tests hanging
on macos we spin trying to open the dylib if it doesn’t load the first time
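(roughly this shape, as a hedged sketch — the bound and yield are illustrative, not the actual code:)

```rust
use libloading::Library;

// Keep retrying dlopen until the dylib loads, since on macOS the first
// attempt sometimes fails.
fn open_with_spin(path: &str) -> Result<Library, libloading::Error> {
    let mut last_err = None;
    for _ in 0..1_000 {
        match unsafe { Library::new(path) } {
            Ok(lib) => return Ok(lib),
            Err(err) => {
                last_err = Some(err);
                std::thread::yield_now();
            }
        }
    }
    Err(last_err.unwrap())
}
```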
Yeah. Only on closing the dylib. Always this error: DlClose { desc: "dlclose(0x20fc175a0): cannot dlclose until fork() handlers have completed" }
if we ignore the error on close all the tests pass for me consistently
tried to dig into the why, but really not sure
@Nathan Freestone does cargo test --release work without errors? That is how we test it on CI.
For these tests, it doesn't for me. I still have to limit threads in release builds for it to pass consistently with current main on test_gen
I think this test_gen failure is the same as here on the GitHub CI macOS x86_64 server. I could not reproduce it, so I never ended up bisecting it. If someone wants to give that a try: the failure first showed up 16 days ago. We can probably learn something from the commit that introduced the failure.
I think it is likely different: I only hit it in test_gen locally (pretty rare, so I just used 1 test thread when it happened)
Oh alright, good to know
Also, it happens often enough that I will hit it consistently over the 1200ish tests in test_gen.
Based on the comments here: https://github.com/nagisa/rust_libloading/blob/6e284984aee68cc2d1b7e7d5e7b5a2a2279cf8e3/src/os/unix/mod.rs#L316
Especially those inside the function, I think we should just remove the lib.close line and see if it passes CI. With the current API, we can't even retry closing the lib. So I think we should just ignore the errors and make sure that doesn't break anything. The dylib will still be requested to close, it just won't crash the test if that fails.
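(concretely, the drop-only variant would look something like this — illustrative, not the exact change:)

```rust
use libloading::Library;

fn run_generated_test(path: &str) -> Result<(), libloading::Error> {
    let library = unsafe { Library::new(path) }?;
    // ... run the test through the loaded symbols ...

    // no explicit library.close(): dropping `library` at the end of
    // scope still requests dlclose, and an error there is ignored
    // instead of crashing the test
    Ok(())
}
```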
I'll push a CL in a minute
Then we can see if it passes CI
Anton said:
Nathan Freestone does cargo test --release work without errors? That is how we test it on CI.
I have the same experience as Brendan, also on Apple silicon
Ok. merged the fix.
If you pull main, it should work with full threading
I will test
cargo test --release now works without errors! Thanks for looking into it!