troubleshooting trunk tests · beginners

Howdy! I'm having some trouble getting tests to complete on trunk (4990fb3). I'm running earthly +test-all with Docker v20.10.12 on Fedora, and several tests are failing with this message:

          +test-rust *failed* | valgrind stderr was: "runtime: vex amd64->IR: unhandled instruction bytes: 0x62 0xF1 0x7C 0x48 0x28 0xD 0x2D 0x69 0x1 0x0
          +test-rust *failed* | vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
          +test-rust *failed* | vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
          +test-rust *failed* | vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
          +test-rust *failed* | ==12150== valgrind: Unrecognised instruction at address 0x10e809.
          +test-rust *failed* | ==12150== Your program just tried to execute an instruction that Valgrind
          +test-rust *failed* | ==12150== did not recognise.  There are two possible reasons for this.
          +test-rust *failed* | ==12150== 1. Your program has a bug and erroneously jumped to a non-code
          +test-rust *failed* | ==12150==    location.  If you are running Memcheck and you just saw a
          +test-rust *failed* | ==12150==    warning about a bad jump, it's probably your program's fault.
          +test-rust *failed* | ==12150== 2. The instruction is legitimate but Valgrind doesn't handle it,
          +test-rust *failed* | ==12150==    i.e. it's Valgrind's fault.  If you think this is the case or
          +test-rust *failed* | ==12150==    you are not sure, please let us know and we'll try to fix it.
          +test-rust *failed* | ==12150== Either way, Valgrind will now raise a SIGILL signal which will
          +test-rust *failed* | ==12150== probably kill your program.
          +test-rust *failed* | "', cli/tests/cli_run.rs:153:17
          +test-rust *failed* | stack backtrace:
          +test-rust *failed* |    0: rust_begin_unwind
          +test-rust *failed* |              at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/panicking.rs:517:5
          +test-rust *failed* |    1: std::panicking::begin_panic_fmt
          +test-rust *failed* |              at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/panicking.rs:460:5
          +test-rust *failed* |    2: cli_run::cli_run::check_output_with_stdin
          +test-rust *failed* |    3: core::ops::function::FnOnce::call_once
          +test-rust *failed* |    4: serial_test::serial_core
          +test-rust *failed* |    5: core::ops::function::FnOnce::call_once
          +test-rust *failed* |              at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/ops/function.rs:227:

Emi (Feb 20 2022 at 19:51):

Brendan Hansknecht (Feb 20 2022 at 19:54):

I think most people don't actually run tests with earthly; it is mostly used for CI.

Brendan Hansknecht (Feb 20 2022 at 19:54):

Emi (Feb 20 2022 at 19:54):

Emi (Feb 20 2022 at 19:55):

Brendan Hansknecht (Feb 20 2022 at 20:05):

Emi (Feb 20 2022 at 20:06):

Emi (Feb 20 2022 at 20:23):

I'm actually getting the same error when running tests outside of Docker. The command is cargo test cli_run

Emi (Feb 20 2022 at 20:24):

Emi (Feb 20 2022 at 20:29):

---- cli_run::hello_zig stdout ----
thread 'cli_run::hello_zig' panicked at '`valgrind` exited with no exit code. valgrind stdout was: ""

valgrind stderr was: "vex amd64->IR: unhandled instruction bytes: 0x62 0xF1 0xFE 0x48 0x6F 0x5 0xD6 0xDC 0x4 0x0
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
==106615== valgrind: Unrecognised instruction at address 0x114c88.
==106615== Your program just tried to execute an instruction that Valgrind
==106615== did not recognise.  There are two possible reasons for this.
==106615== 1. Your program has a bug and erroneously jumped to a non-code
==106615==    location.  If you are running Memcheck and you just saw a
==106615==    warning about a bad jump, it's probably your program's fault.
==106615== 2. The instruction is legitimate but Valgrind doesn't handle it,
==106615==    i.e. it's Valgrind's fault.  If you think this is the case or
==106615==    you are not sure, please let us know and we'll try to fix it.
==106615== Either way, Valgrind will now raise a SIGILL signal which will
==106615== probably kill your program.
"', cli/tests/cli_run.rs:153:17
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Emi (Feb 20 2022 at 20:30):

the binaries produced seem to run fine on their own
(when i run the hello-world binary, i get the expected output)

╰─$ cargo run examples/hello-zig/Hello.roc
    Finished dev [unoptimized + debuginfo] target(s) in 0.14s
     Running `target/debug/roc examples/hello-zig/Hello.roc`
🔨 Rebuilding host... Done!
Hello, World!
runtime: 0.020ms

Emi (Feb 20 2022 at 20:31):

==107064== Memcheck, a memory error detector
==107064== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==107064== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==107064== Command: examples/hello-zig/hello-world
==107064==
vex amd64->IR: unhandled instruction bytes: 0x62 0xF1 0xFE 0x48 0x6F 0x5 0xD6 0xDC 0x4 0x0
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
==107064== valgrind: Unrecognised instruction at address 0x114c88.
==107064==    at 0x114C88: std.debug.attachSegfaultHandler (debug.zig:1763)
==107064==    by 0x113428: std.debug.maybeEnableSegfaultHandler (debug.zig:1748)
==107064==    by 0x111AD2: std.start.callMainWithArgs (start.zig:366)
==107064==    by 0x111882: main (start.zig:383)
==107064== Your program just tried to execute an instruction that Valgrind
==107064== did not recognise.  There are two possible reasons for this.
==107064== 1. Your program has a bug and erroneously jumped to a non-code
==107064==    location.  If you are running Memcheck and you just saw a
==107064==    warning about a bad jump, it's probably your program's fault.
==107064== 2. The instruction is legitimate but Valgrind doesn't handle it,
==107064==    i.e. it's Valgrind's fault.  If you think this is the case or
==107064==    you are not sure, please let us know and we'll try to fix it.
==107064== Either way, Valgrind will now raise a SIGILL signal which will
==107064== probably kill your program.
==107064==
==107064== Process terminating with default action of signal 4 (SIGILL): dumping core
==107064==  Illegal opcode at address 0x114C88
==107064==    at 0x114C88: std.debug.attachSegfaultHandler (debug.zig:1763)
==107064==    by 0x113428: std.debug.maybeEnableSegfaultHandler (debug.zig:1748)
==107064==    by 0x111AD2: std.start.callMainWithArgs (start.zig:366)
==107064==    by 0x111882: main (start.zig:383)
==107064==
==107064== HEAP SUMMARY:
==107064==     in use at exit: 0 bytes in 0 blocks
==107064==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==107064==
==107064== All heap blocks were freed -- no leaks are possible
==107064==
==107064== For lists of detected and suppressed errors, rerun with: -s
==107064== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
fish: Job 1, 'valgrind examples/hello-zig/hel…' terminated by signal SIGILL (Illegal instruction)

Emi (Feb 20 2022 at 20:32):

Folkert de Vries (Feb 20 2022 at 20:40):

Folkert de Vries (Feb 20 2022 at 20:41):

Brendan Hansknecht (Feb 20 2022 at 21:59):

This likely means that zig is generating an instruction that valgrind doesn't know about. That has happened to me in the past, but it is pretty rare. It is entirely possible, though, especially with a cpu target of native/host.
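
As a rough check on that reading of the log: both failing instructions start with the byte 0x62, which on x86-64 is the EVEX prefix used by AVX-512 encodings. The Python sketch below pulls the bytes out of a vex line like the ones pasted above and reports the vector length encoded in the prefix. The function names are invented for illustration, and this only inspects the prefix; it is not a disassembler.

```python
import re


def parse_unhandled_bytes(line):
    """Extract the byte values from a Valgrind 'unhandled instruction bytes' line."""
    _, _, tail = line.partition("unhandled instruction bytes:")
    return [int(tok, 16) for tok in re.findall(r"0x[0-9A-Fa-f]+", tail)]


def describe_evex(insn):
    """Return prefix details if the bytes look EVEX-encoded, else None."""
    if len(insn) < 4 or insn[0] != 0x62:
        return None
    # Bits 6-5 of the last prefix byte (L'L) select the vector length.
    lengths = {0b00: 128, 0b01: 256, 0b10: 512}
    return {"prefix": "EVEX", "vector_bits": lengths.get((insn[3] >> 5) & 0b11)}


# Line copied from the hello_zig failure in this thread.
line = ("vex amd64->IR: unhandled instruction bytes: "
        "0x62 0xF1 0xFE 0x48 0x6F 0x5 0xD6 0xDC 0x4 0x0")
print(describe_evex(parse_unhandled_bytes(line)))
```

For the hello_zig line this reports a 512-bit EVEX instruction, which would fit zig emitting AVX-512 code for the host cpu.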

Brendan Hansknecht (Feb 20 2022 at 22:00):

That may mean they need a newer version of valgrind to run the tests, or need to tell zig to target an older cpu.
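
One way to see whether a native/host cpu target could pull in such instructions is to look at the feature flags the kernel reports in /proc/cpuinfo on Linux. A minimal Python sketch, with a hypothetical helper name and a made-up sample line standing in for real output:

```python
def avx512_flags(cpuinfo_text):
    """Return the sorted AVX-512 feature flags found in /proc/cpuinfo-style text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            found.update(f for f in value.split() if f.startswith("avx512"))
    return sorted(found)


# Made-up sample text, not taken from the machine in this thread.
sample = "processor\t: 0\nflags\t\t: fpu sse2 avx2 avx512f avx512bw\n"
print(avx512_flags(sample))
```

On a real machine the input would be the contents of /proc/cpuinfo. An empty result would suggest the unhandled instruction has some other cause.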

Folkert de Vries (Feb 20 2022 at 22:04):

Brian Carroll (Feb 20 2022 at 22:44):

We're still on Zig 0.8.1 rather than the latest 0.9.1, right? What's your zig version, @Emi ?

Emi (Feb 20 2022 at 22:44):

Oh yes! It's just the zig examples, I don't know how I missed that, thank you! I'm running zig 0.8.1, but I'll try messing around with valgrind versions and seeing if I can figure out how to target different cpus, like Brendan said

Brian Carroll (Feb 20 2022 at 22:46):

Also just FYI, I've never run valgrind on this project but I'm a regular contributor!

Emi (Feb 20 2022 at 22:46):

Folkert de Vries (Feb 20 2022 at 22:51):

Emi (Feb 20 2022 at 22:52):

Folkert de Vries (Feb 20 2022 at 22:55):

so something you could do is have valgrind just do nothing (e.g. make a script with the name "valgrind" that just calls the binary)
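
A minimal sketch of such a stand-in script, written in Python here as an assumption (any language would do). It skips leading option tokens, on the assumption that valgrind options are single tokens starting with "-", and then runs the program directly. Whether the test harness in cli/tests/cli_run.rs accepts a run that produces no Valgrind report is not confirmed in this thread.

```python
#!/usr/bin/env python3
"""Stand-in for valgrind: run the target program directly, without instrumentation."""
import os
import sys


def target_command(args):
    """Drop leading valgrind-style options (tokens starting with '-') and return the rest."""
    i = 0
    while i < len(args) and args[i].startswith("-"):
        i += 1
    return args[i:]


def main(argv):
    command = target_command(argv[1:])
    if not command:
        sys.exit("valgrind stub: no program given")
    # Replace this process with the target program; its exit status becomes ours.
    os.execvp(command[0], command)


# Runs only when invoked with arguments, e.g. valgrind -q ./hello-world
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv)
```

To try it, save the file under the name valgrind, mark it executable, and put its directory ahead of the real valgrind on PATH while running the tests.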

Folkert de Vries (Feb 20 2022 at 22:55):

That is, if the problem persists, of course. It might also be fixed when we upgrade to zig 0.9.1

Emi (Feb 20 2022 at 23:10):

Emi (Feb 20 2022 at 23:15):

Brendan Hansknecht (Feb 20 2022 at 23:27):