Stream: contributing

Topic: Setting up development enviroment


view this post on Zulip Nathan Freestone (Dec 11 2022 at 23:32):

hey, Just setting up my development environment to play around in, and I hit a small snag.

When I run cargo test within nix from the project root directory, it comes back saying that ~70-90 of the tests have failed (it seems to be a different number every time). This is with a clean working tree on commit 116463893.

Is this expected behavior? If not, any ideas on how to fix that?

view this post on Zulip Folkert de Vries (Dec 11 2022 at 23:36):

which tests fail?

view this post on Zulip Nathan Freestone (Dec 11 2022 at 23:37):

they all start with gen_ I can give you a complete list for my last test if you want

view this post on Zulip Folkert de Vries (Dec 11 2022 at 23:40):

just a sample is fine. is any particular error printed?

view this post on Zulip Folkert de Vries (Dec 11 2022 at 23:40):

and also, what are your system details

view this post on Zulip Nathan Freestone (Dec 11 2022 at 23:43):

failures:
    gen_abilities::encode_derived_list_of_lists_of_strings
    gen_abilities::encode_derived_nested_record_tag_record
    gen_abilities::encode_immediate::u16
    gen_abilities::hash::immediate::u16
    gen_compare::eq_i64
    gen_compare::list_eq_empty
    gen_compare::unit
    gen_list::bool_list_literal
    gen_list::list_drop_if_geq3
    gen_list::list_drop_if_string_eq
    gen_list::list_drop_last_mutable
    gen_list::list_find
    gen_list::list_find_index_not_found
...

the error that cargo comes back with is simply

error: test failed, to rerun pass '-p test_gen --test test_gen'

I'm on an M1 Macbook, macOS Monterey version 12.3.1

view this post on Zulip Folkert de Vries (Dec 11 2022 at 23:44):

hmm there is nothing else above the failures: line?

view this post on Zulip Nathan Freestone (Dec 11 2022 at 23:46):

oh sorry, lots, something like this:

---- gen_tags::pattern_matching_unit stdout ----
thread 'gen_tags::pattern_matching_unit' panicked at 'called `Result::unwrap()` on an `Err` value: DlClose { desc: "dlclose(0x20fc175a0): cannot dlclose until fork() handlers have completed" }', crates/compiler/test_gen/src/gen_tags.rs:712:5

79 times

from a quick scan they all seem to be panics on unwrap calls.

view this post on Zulip Folkert de Vries (Dec 11 2022 at 23:48):

hmm. well that is the interesting bit

view this post on Zulip Folkert de Vries (Dec 11 2022 at 23:48):

but I don't know what it means

view this post on Zulip Folkert de Vries (Dec 11 2022 at 23:48):

some sort of race condition maybe?

view this post on Zulip Nathan Freestone (Dec 12 2022 at 00:06):

ok, well, that must be it, running cargo test -p test_gen --test test_gen -- --test-threads=1 returns no failures.

view this post on Zulip Nathan Freestone (Dec 12 2022 at 00:06):

but it also took 10 minutes. any ideas on how to fix?

view this post on Zulip Brendan Hansknecht (Dec 12 2022 at 00:09):

yeah, I always run those tests with --test-threads=1 due to that race condition, generally with release

view this post on Zulip Nathan Freestone (Dec 12 2022 at 00:13):

do you have any idea what causes the race condition? hardware? OS? just the way the tests are designed?

view this post on Zulip Brendan Hansknecht (Dec 12 2022 at 00:17):

Not really. I assume it is running a kernel call in parallel over many tests.

view this post on Zulip Brendan Hansknecht (Dec 12 2022 at 00:17):

since dlclose is directly a kernel call

view this post on Zulip Brendan Hansknecht (Dec 12 2022 at 00:18):

maybe we could modify the test to use a lock when opening and closing dylibs

view this post on Zulip Nathan Freestone (Dec 12 2022 at 00:21):

interesting, also maybe we should add some text explaining this to CONTRIBUTING.mdexplaining the issue in the mean time. (in the running tests section)

view this post on Zulip Nathan Freestone (Dec 12 2022 at 00:23):

ether way, thank you very much for the help :)

view this post on Zulip Nathan Freestone (Dec 12 2022 at 00:57):

Update: it seems to work for up to 4 threads. don't know if that's significant, but it seemed to yield a small performance boost

view this post on Zulip Brendan Hansknecht (Dec 12 2022 at 03:11):

So I tried locks and it actually made things worse. Don't know why, maybe I just did something wrong. That said, it did work to just ignore the error instead of crashing. I guess the one concern of ignoring the error is that it could maybe lead to enough file handles building up that some random tests fail?

Thoughts on just changing from library.close().unwrap() to let _ = library.close();. That or just removing the line completely. The library should close when dropped(just ignoring errors). So maybe explicitly closing isn't need at all.

view this post on Zulip Richard Feldman (Dec 12 2022 at 03:52):

yeah seems unnecessary in tests, unless I'm missing something!

view this post on Zulip Brendan Hansknecht (Dec 12 2022 at 03:54):

I thought we had a case in CI where we ran out of file handles. Thought this was maybe added because of that...but I could be wrong.

view this post on Zulip Ayaz Hafiz (Dec 12 2022 at 04:06):

we did do that because multiple tests might try to rebuild eg the cli platform at the same time

view this post on Zulip Ayaz Hafiz (Dec 12 2022 at 04:06):

I put something in place to manage that but i’m not confident it always works

view this post on Zulip Brendan Hansknecht (Dec 12 2022 at 04:09):

That is just locking which platform can build, right? This shouldn't relate to opening and closing dynamic libraries in test_gen. This failure is specifically from closing dls and making sure that doesn't error.

view this post on Zulip Ayaz Hafiz (Dec 12 2022 at 04:31):

is the problem only due to trying to close the dylib? i’m not sure i totally follow what the symptom is aside tests hanging
on macos we spin trying to open the dylib if it doesn’t load the first time

view this post on Zulip Brendan Hansknecht (Dec 12 2022 at 04:34):

Yeah. Only on closing the dylib. Always this error: DlClose { desc: "dlclose(0x20fc175a0): cannot dlclose until fork() handlers have completed" }

view this post on Zulip Brendan Hansknecht (Dec 12 2022 at 04:35):

if we ignore the error on close all the tests pass for me consistently

view this post on Zulip Brendan Hansknecht (Dec 12 2022 at 04:36):

tried to dig into the why, but really not sure

view this post on Zulip Anton (Dec 12 2022 at 09:00):

@Nathan Freestone does cargo test --release work without errors? That is how we test it on CI.

view this post on Zulip Brendan Hansknecht (Dec 12 2022 at 15:56):

For these tests, it doesn't for me. I still have to limit threads in release builds for it to pass consistently with current main on test_gen

view this post on Zulip Anton (Dec 12 2022 at 16:18):

I think this test_gen failure is the same as here on the github ci macos x86_64 server. I could not reproduce it, so I never ended up bisecting it. If someone wants to give that a try; the failure first showed up 16 days ago. We can probably learn something from the commit that introduced the failure.

view this post on Zulip Brendan Hansknecht (Dec 12 2022 at 16:23):

I think it is likely different:

  1. I hit it on M1 macos
  2. It is a dlclose here that gets unwrapped, there is no default unlike the issue on the CI server
  3. I have been hitting it for months whenever I have to run test_gen locally (pretty rare, so i just used 1 test thread when it happened)

view this post on Zulip Anton (Dec 12 2022 at 16:24):

Oh alright, good to know

view this post on Zulip Brendan Hansknecht (Dec 12 2022 at 16:24):

Also, it happens often enough that i will hit it consistently over the 1200ish tests in test_gwn.

view this post on Zulip Brendan Hansknecht (Dec 12 2022 at 16:28):

Based on the comments here: https://github.com/nagisa/rust_libloading/blob/6e284984aee68cc2d1b7e7d5e7b5a2a2279cf8e3/src/os/unix/mod.rs#L316

Especially those inside the function, i think we should just remove the lib.close line and see if it passes CI. With the current API, we can't even retry closing the lib. So i think we should just ignore the errors and make sure that doesn't break anything. The dl will still be requested to close, it just won't crash the test if that fails.

view this post on Zulip Brendan Hansknecht (Dec 12 2022 at 16:34):

I'll push a CL in a minute

view this post on Zulip Brendan Hansknecht (Dec 12 2022 at 16:34):

Then we can see if it passes CI

view this post on Zulip Brendan Hansknecht (Dec 12 2022 at 16:49):

#4741

view this post on Zulip Nathan Freestone (Dec 12 2022 at 18:28):

Anton said:

Nathan Freestone does cargo test --release work without errors? That is how we test it on CI.

I have the same experience as Brendan, also on Apple sillicon

view this post on Zulip Brendan Hansknecht (Dec 12 2022 at 19:15):

Ok. merged the fix.

view this post on Zulip Brendan Hansknecht (Dec 12 2022 at 19:15):

If you pull main, it should work with full threading

view this post on Zulip Nathan Freestone (Dec 12 2022 at 19:16):

I will test

view this post on Zulip Nathan Freestone (Dec 12 2022 at 19:46):

cargo test --release now works without errors! Thanks for looking into it!


Last updated: Jul 06 2025 at 12:14 UTC