Stream: compiler development

Topic: CI


view this post on Zulip Richard Feldman (Aug 17 2023 at 12:31):

looks like Windows CI is failing because the basic-cli 0.5.0 release doesn't have a prebuilt host for Windows: https://github.com/roc-lang/roc/actions/runs/5886643841/job/15964820569?pr=5747#step:12:189

does anyone know if there was a limitation that prevented 0.5.0 from having that? Or is it something we could fix in an 0.5.1 release?

view this post on Zulip Anton (Aug 18 2023 at 08:19):

does anyone know if there was a limitation that prevented 0.5.0 from having that?

I don't think so, I'll try setting up the basic-cli release workflow to include the windows file.

view this post on Zulip Anton (Aug 23 2023 at 16:23):

Many basic-cli examples fail on windows without errors so I'll need to dig into those more before I add the necessary windows files to the next basic-cli release.

view this post on Zulip Richard Feldman (Aug 23 2023 at 16:31):

huh! Was that true of the previous release too?

view this post on Zulip Anton (Aug 23 2023 at 17:36):

I would think so but I'll check

view this post on Zulip Anton (Aug 23 2023 at 18:15):

Yes identical behavior using older roc and basic-cli 0.4.0

view this post on Zulip Richard Feldman (Aug 23 2023 at 18:18):

gotcha

view this post on Zulip Richard Feldman (Aug 23 2023 at 18:18):

so maybe worth doing an 0.5.1 release just to fix CI, and then investigate the errors later?

view this post on Zulip Anton (Aug 23 2023 at 18:21):

I've ignored that basic-cli test to fix CI a while ago. I prefer that option, I dislike making it seem like basic-cli supports windows while it would break easily with normal usage.

view this post on Zulip Richard Feldman (Aug 23 2023 at 18:21):

that's fair :thumbs_up:

view this post on Zulip Luke Boswell (Aug 24 2023 at 00:04):

Yeah, some of the basic-cli tests assume linux tools available. I would like to have it working on Windows but I think there are still issues with Roc itself which really prevent anyone from using it on Windows

view this post on Zulip Richard Feldman (Aug 24 2023 at 00:47):

do you happen to know which issues in particular are blocking it?

view this post on Zulip Folkert de Vries (Sep 12 2023 at 16:28):

have we seen this before?

  thread 'main' panicked at 'zig build object -Drelease=true failed 10 times in a row. The following error is unlikely to be a flaky error: /Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target.zig:447:29: error: unable to load '/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target/bpf.zig': FileNotFound
  /Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target.zig:444:32: error: unable to load '/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target/amdgpu.zig': FileNotFound
  /Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target.zig:449:30: error: unable to load '/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target/mips.zig': FileNotFound
  /Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target.zig:442:33: error: unable to load '/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target/aarch64.zig': FileNotFound
  /Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target.zig:445:29: error: unable to load '/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target/arm.zig': FileNotFound
  /Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target.zig:443:29: error: unable to load '/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target/arc.zig': FileNotFound
  /Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target.zig:446:29: error: unable to load '/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target/avr.zig': FileNotFound
  /Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target.zig:450:32: error: unable to load '/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target/msp430.zig': FileNotFound
  /Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target.zig:451:31: error: unable to load '/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target/nvptx.zig': FileNotFound
  /Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target.zig:453:31: error: unable to load '/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target/riscv.zig': FileNotFound
  /Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target.zig:454:31: error: unable to load '/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target/sparc.zig': FileNotFound
  /Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target.zig:455:31: error: unable to load '/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target/spirv.zig': FileNotFound
  /Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target.zig:448:33: error: unable to load '/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target/hexagon.zig': FileNotFound
  /Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target.zig:452:33: error: unable to load '/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target/powerpc.zig': FileNotFound
  /Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target.zig:456:33: error: unable to load '/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target/systemz.zig': FileNotFound
  /Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target.zig:457:28: error: unable to load '/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target/ve.zig': FileNotFound
  /Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target.zig:459:29: error: unable to load '/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target/x86.zig': FileNotFound
  /Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target.zig:458:30: error: unable to load '/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target/wasm.zig': FileNotFound
  ', crates/compiler/builtins/bitcode/build.rs:179:25

view this post on Zulip Folkert de Vries (Sep 12 2023 at 16:28):

on https://github.com/roc-lang/roc/actions/runs/6162086596/job/16722836284?pr=5799

view this post on Zulip Anton (Sep 12 2023 at 17:07):

No, I don't think so

view this post on Zulip Folkert de Vries (Sep 12 2023 at 17:16):

it's hitting 2 PRs that don't really do anything on macos. are we downloading some weird/bad version of zig?

view this post on Zulip Anton (Sep 12 2023 at 18:11):

Oh, I see what's wrong now. I was cleaning up disk space on the CI machine, searched for folders named target, and deleted them. I was under the impression I was searching in my roc folder, but it appears to have been systemwide. I'll try to fix it now.
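A minimal sketch of a scoped, dry-run variant of that cleanup, written in Rust to match the other code in this thread. It is not the command that was run on the CI machine. The function name `find_dirs_named` and the default root are hypothetical. The point is that the search root is an explicit argument and nothing is deleted until the printed list has been checked.

```rust
use std::ffi::OsStr;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Collect directories called `name` below `root`.
/// Symlinks are not followed, and a matching directory is not searched further.
fn find_dirs_named(root: &Path, name: &str) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    let mut stack = vec![root.to_path_buf()];
    while let Some(dir) = stack.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            // `DirEntry::file_type` does not traverse symlinks, so a symlinked
            // directory is skipped here instead of leading outside `root`.
            if !entry.file_type()?.is_dir() {
                continue;
            }
            let path = entry.path();
            if path.file_name() == Some(OsStr::new(name)) {
                found.push(path);
            } else {
                stack.push(path);
            }
        }
    }
    found.sort();
    Ok(found)
}

fn main() -> io::Result<()> {
    // Hypothetical usage: pass the checkout directory as the first argument.
    let root = std::env::args().nth(1).unwrap_or_else(|| ".".to_string());
    for dir in find_dirs_named(Path::new(&root), "target")? {
        // Dry run: only report what a cleanup would remove.
        println!("would delete: {}", dir.display());
    }
    Ok(())
}
```

With a root such as the Zig download directory, this would have listed `lib/std/target` as a candidate, which is the folder the error messages above report as missing.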

view this post on Zulip Anton (Sep 12 2023 at 18:26):

Re-running now

view this post on Zulip Anton (Sep 12 2023 at 18:44):

Success :tada:

view this post on Zulip Folkert de Vries (Oct 07 2023 at 14:25):

what is happening here?

🎉 Docs generated in ./generated-docs
+ mv generated-docs/ www/build/builtins
+ find www/build/builtins -type f -name index.html -exec sed -i 's!</nav>!<div class="builtins-tip"><b>Tip:</b> <a href="/different-names">Some names</a> differ from other languages.</div></nav>!' '{}' ';'
+ rm -rf roc_nightly roc_releases.json
+ '[' -v GITHUB_TOKEN_READ_ONLY ']'
Building tutorial.html from tutorial.md...
+ echo 'Building tutorial.html from tutorial.md...'
+ mkdir www/build/tutorial
+ cargo build --release --bin roc
    Finished release [optimized] target(s) in 0.23s
+ roc=target/release/roc
+ target/release/roc version
roc built-from-source
+ target/release/roc run www/generate_tutorial/src/tutorial.roc -- www/generate_tutorial/src/input/ www/build/tutorial/
🔨 Rebuilding platform...
Processing 1 input files...
/home/small-ci-user/actions-runner/_work/roc/roc/www/generate_tutorial/src/input/tutorial.md -> /home/small-ci-user/actions-runner/_work/roc/roc/www/build/tutorial/tutorial.html
Processed 1 files with 1 successes and 0 errors
www/build.sh: line 91: 2753219 Segmentation fault      (core dumped) $roc run www/generate_tutorial/src/tutorial.roc -- www/generate_tutorial/src/input/ www/build/tutorial/

view this post on Zulip Anton (Oct 07 2023 at 14:30):

I think that's a flake, I restarted the job now.

view this post on Zulip Richard Feldman (Oct 07 2023 at 15:02):

yeah I've seen that before

view this post on Zulip Anton (Oct 07 2023 at 15:03):

It segfaulted again, I don't really see why the clippy changes would make it more likely though. I did create #5772 for it earlier with some valgrind output

view this post on Zulip Folkert de Vries (Oct 07 2023 at 15:40):

ah, ok the cause is pretty obvious actually

view this post on Zulip Folkert de Vries (Oct 07 2023 at 15:45):

fix https://github.com/roc-lang/roc/pull/5892

view this post on Zulip Brendan Hansknecht (Nov 28 2023 at 21:16):

@Anton Looks to be issues with a specific ci device:

Runner name: 'anton-m1-mac-mini'
Runner group name: 'Default'
Machine name: 'm1cis-Mac-mini'

example failures:

view this post on Zulip Brendan Hansknecht (Nov 29 2023 at 04:10):

It seems that the mac CI runner is fully down now. #5775 is just sitting with the action queued. Same with a few other PRs. I am gonna assume the machine was hitting problems and then crashed or something.

view this post on Zulip Anton (Nov 29 2023 at 09:34):

It's running now, this was due to #6106, I'll set up a temporary fix today

view this post on Zulip Anton (Nov 29 2023 at 15:06):

I've set up a new workflow that will clean up the nix store on the m1 mac mini CI server once a day. It can also be triggered manually if needed.

view this post on Zulip Luke Boswell (Nov 30 2023 at 20:27):

Just saw the new workflow on https://github.com/roc-lang/roc/pull/6122 and it worked great! :muscle:

view this post on Zulip Ayaz Hafiz (Dec 01 2023 at 04:39):

Is this known? https://github.com/roc-lang/roc/actions/runs/7055754074/job/19206699414?pr=6128

view this post on Zulip Brendan Hansknecht (Dec 01 2023 at 06:58):

That definitely was flaky on some of my past PRs. Not sure why

view this post on Zulip Anton (Dec 01 2023 at 11:41):

I've also seen it flake before, we should move the gui examples over to the examples repo

view this post on Zulip Brendan Hansknecht (Dec 04 2023 at 22:00):

#6176 is failing in CI on some valgrind tests that I shouldn't have affected at all. Any ideas?

view this post on Zulip Folkert de Vries (Dec 04 2023 at 23:05):

no idea, likely something specific to the machine/hardware

view this post on Zulip John Murray (Dec 05 2023 at 02:30):

Maybe it's some zig unreachables like the issue I had in https://github.com/roc-lang/roc/pull/6062 ?

view this post on Zulip Anton (Dec 05 2023 at 10:22):

[...] that I shouldn't have affected at all.

If the builtins changed things can shift around and end up revealing an existing problem.

view this post on Zulip Brendan Hansknecht (Dec 05 2023 at 20:32):

Sorry that I forgot about #6062, thought I had already merged it.

view this post on Zulip Brendan Hansknecht (Dec 05 2023 at 20:32):

I hope rebasing on it will fix my ci issues.

view this post on Zulip Brendan Hansknecht (Dec 09 2023 at 18:58):

This keeps timing out for me: https://github.com/roc-lang/roc/actions/runs/7149876106/job/19477835985?pr=6216

view this post on Zulip Brendan Hansknecht (Dec 09 2023 at 20:12):

passes locally on x86 for me in reasonable time. Gonna launch another time and hope.

view this post on Zulip Anton (Dec 23 2023 at 11:11):

Main is failing on:

── FILE NOT FOUND ──────────────────────────────────────────────── UNKNOWN.roc ─

I am looking for this file, but it's not there:

    downloaded-basic-cli/src/main.roc

Fixing it now

view this post on Zulip Folkert de Vries (Jan 03 2024 at 02:27):

a change to decimal parsing somehow causes a segfault https://github.com/roc-lang/roc/actions/runs/7391040574/job/20109491250?pr=6340

that is likely unrelated to the actual changes, so maybe something is flaky?

view this post on Zulip Anton (Jan 03 2024 at 08:39):

That's the same segfault I'm hitting in PR#6333, tracked as #5924. I planned on digging into it more today, we can ignore and merge if it's not (close to) fixed today. It's a particularly difficult bug because the segfault only occurs when it's cleaning up the memory, after execution is done. I've found similar bugs on the internet where this happened because of invalid pointers.

Some interesting behavior; the segfault is not triggered if I change:

#[no_mangle]
pub unsafe extern "C" fn roc_dbg(loc: *mut RocStr, msg: *mut RocStr, src: *mut RocStr) {
    eprintln!("[{}] {} = {}", &*loc, &*src, &*msg);
}

to:

#[no_mangle]
pub unsafe extern "C" fn roc_dbg(loc: *mut RocStr, msg: *mut RocStr, src: *mut RocStr) {
    eprintln!("");
    eprintln!("[{}] {} = {}", &*loc, &*src, &*msg);
}

All while dbg is not even used in this test.

If anybody has any debugging tips, I'd love to hear them :)

view this post on Zulip Folkert de Vries (Jan 03 2024 at 13:18):

but is the valgrind info clean when you add that extra eprintln? I'd guess not

view this post on Zulip Folkert de Vries (Jan 03 2024 at 13:19):

this just reeks of UB. I had something kind of similar happen a while ago when we didn't implement the ABI correctly in the dev backend

view this post on Zulip Folkert de Vries (Jan 03 2024 at 13:24):

oh, with the legacy linker it just works. Was that known?

view this post on Zulip Anton (Jan 03 2024 at 13:28):

oh, with the legacy linker it just works. Was that known?

yes indeed

view this post on Zulip Anton (Jan 03 2024 at 13:28):

but is the valgrind info clean when you add that extra eprintln? I'd guess not

yes:

โฏ valgrind ~/Desktop/closures_comparison/closures_good/app
==41746== Memcheck, a memory error detector
==41746== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==41746== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==41746== Command: /home/username/Desktop/closures_comparison/closures_good/app
==41746==
Answer was: 672
==41746==
==41746== HEAP SUMMARY:
==41746==     in use at exit: 0 bytes in 0 blocks
==41746==   total heap usage: 12 allocs, 12 frees, 3,205 bytes allocated
==41746==
==41746== All heap blocks were freed -- no leaks are possible
==41746==
==41746== For lists of detected and suppressed errors, rerun with: -s
==41746== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

view this post on Zulip Folkert de Vries (Jan 03 2024 at 13:29):

well that makes more sense if things do work with the legacy linker

view this post on Zulip Folkert de Vries (Jan 03 2024 at 13:30):

it's plausible that e.g. something is not aligned or updated correctly by the legacy linker, and that extra statement somehow makes it work again

view this post on Zulip Anton (Jan 03 2024 at 13:30):

... or updated correctly by the legacy linker

I think you mean surgical linker here right?

view this post on Zulip Folkert de Vries (Jan 03 2024 at 13:31):

yes, sorry

view this post on Zulip Folkert de Vries (Jan 03 2024 at 13:33):

so maybe we can get something out of diffing the working and non-working program somehow?

view this post on Zulip Folkert de Vries (Jan 03 2024 at 13:41):

could be nothing but

Start of section headers:          6556800 (bytes into file) # good
Start of section headers:          6552512 (bytes into file) # bad

that could be an alignment thing?
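A quick way to test the alignment idea on the two offsets quoted above is to look at the largest power of two that divides each one. This is only arithmetic on the numbers from the `readelf` output. The helper name `natural_alignment` is made up for the sketch.

```rust
/// Largest power of two that divides `offset` (returns 0 for an offset of 0).
fn natural_alignment(offset: u64) -> u64 {
    if offset == 0 {
        return 0;
    }
    1u64 << offset.trailing_zeros()
}

fn main() {
    // "Start of section headers" values quoted in the message above.
    let good: u64 = 6_556_800;
    let bad: u64 = 6_552_512;

    for (label, offset) in [("good", good), ("bad", bad)] {
        println!(
            "{label}: offset {offset}, largest power-of-two divisor {}, remainder mod 4096 = {}",
            natural_alignment(offset),
            offset % 4096
        );
    }
    println!("difference: {} bytes", good - bad);
}
```

The good offset is divisible by 128 and the bad one by 64, neither is page aligned, and they differ by 4288 bytes. Both are far beyond 8-byte alignment, so these two numbers alone do not point at a misaligned section header table.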

view this post on Zulip Folkert de Vries (Jan 03 2024 at 13:44):

hmm, no, other tests here also have arbitrary starts.

view this post on Zulip Anton (Jan 03 2024 at 14:24):

so maybe we can get something out of diffing the working and non-working program somehow?

Sounds good, I'll get on that later today

view this post on Zulip Folkert de Vries (Jan 03 2024 at 15:47):

I tried but with readelf at least I don't really see anything interesting

view this post on Zulip Folkert de Vries (Jan 03 2024 at 15:48):

besides that the text section changes size and it is right before the .fini section so that moves around

view this post on Zulip Folkert de Vries (Jan 03 2024 at 15:48):

but that seems to happen even when I make other changes and then the platform keeps working

view this post on Zulip Anton (Jan 03 2024 at 19:34):

we can ignore and merge if it's not (close to) fixed today.

Given that we're hitting this in four different places I'd like to take some extra time.

I have assembled 4 good/bad versions of executables in a tar. If we find what all bad executables have in common we should be able to figure out the cause :)

view this post on Zulip Folkert de Vries (Jan 03 2024 at 21:17):

so, I'm trying some debugging here. My version segfaults at this pop instruction.

│   0x555555691ffa <__libc_csu_init+90>     pop    rbx
│   0x555555691ffb <__libc_csu_init+91>     pop    rbp
│   0x555555691ffc <__libc_csu_init+92>     pop    r12
│   0x555555691ffe <__libc_csu_init+94>     pop    r13
│  >0x555555692000 <__libc_csu_init+96>     pop    r14
│   0x555555692002 <__libc_csu_init+98>     pop    r15
│   0x555555692004 <__libc_csu_init+100>    ret

that could make sense if the stack got corrupted somehow. the stack pointer value here:

rax            0x0                 0
rbx            0x555555691fa0      93824993533856
rcx            0x555555691fa0      93824993533856
rdx            0x7fffffffe068      140737488347240
rsi            0x0                 0
rdi            0x5555556f3108      93824993931528
rbp            0x0                 0x0
rsp            0x7fffffffdf58      0x7fffffffdf58
r8             0x0                 0
r9             0x7ffff7fe0d60      140737354009952
r10            0x1                 1

stores the r14 register based on the frame info

Stack level 0, frame at 0x7fffffffdf70:
 rip = 0x555555692000 in __libc_csu_init; saved rip = 0x7ffff7d8f010
 called by frame at 0x7fffffffe040
 Arglist at 0x7fffffffdf50, args:
 Locals at 0x7fffffffdf50, Previous frame's sp is 0x7fffffffdf70
 Saved registers:
  rbx at 0x7fffffffdf38, rbp at 0x7fffffffdf40, r12 at 0x7fffffffdf48, r13 at 0x7fffffffdf50, r14 at 0x7fffffffdf58, r15 at 0x7fffffffdf60,
  rip at 0x7fffffffdf68

So, notice how the arglist and locals start at a stack offset that is in the middle of that sequence of stored registers. A working version does not do that

Stack level 0, frame at 0x7fffffffdf70:
 rip = 0x555555691ff5 in __libc_csu_init; saved rip = 0x7ffff7d8f010
 called by frame at 0x7fffffffe040
 Arglist at 0x7fffffffdf30, args:
 Locals at 0x7fffffffdf30, Previous frame's sp is 0x7fffffffdf70
 Saved registers:
  rbx at 0x7fffffffdf38, rbp at 0x7fffffffdf40, r12 at 0x7fffffffdf48, r13 at 0x7fffffffdf50, r14 at 0x7fffffffdf58, r15 at 0x7fffffffdf60,
  rip at 0x7fffffffdf68

so, somehow the stack got messed up? Here my knowledge kind of runs out. I don't think the bug is in this function, it must already kind of start with a messed-up stack. Do we influence somehow where the stack starts?

view this post on Zulip Folkert de Vries (Jan 03 2024 at 21:28):

hmm, or not? it looks like push/pop change the location of the arglist/locals

view this post on Zulip Folkert de Vries (Jan 03 2024 at 21:29):

at which point I'm back to: why ever would a pop cause a segfault?

view this post on Zulip Folkert de Vries (Jan 03 2024 at 21:33):

@Brendan Hansknecht any thoughts here?

view this post on Zulip Brendan Hansknecht (Jan 03 2024 at 21:39):

Probably the same root cause as #6121

view this post on Zulip Brendan Hansknecht (Jan 03 2024 at 21:43):

No immediate other thoughts. I probably need to play with it for a bit so that I can refresh my brain on the details. Probably can do that later today.

view this post on Zulip Folkert de Vries (Jan 03 2024 at 21:48):

ok well my datapoint then is that with gdb/lldb the error is different to valgrind. based on the naming it seems to error before any roc code is even called. The printing of the answer also does not happen when running with the debugger, so that seems to match up.

probably the same root cause, but the debugger changes something about the environment that makes the problem show up in a different place

view this post on Zulip Anton (Jan 05 2024 at 16:11):

The printing of the answer also does not happen when running with the debugger

The printing does happen for me inside gdb

view this post on Zulip Anton (Jan 05 2024 at 18:59):

glue_option_bad_vs_good.png
I found a suspicious diff when printing the string variable with gdb for the failing option test. The left is for the segfaulting version, and the right for the good version (with the unused extra eprintln). The bytes in the SmallString should form "Hello World!" but that does not seem to work out with my understanding of utf8 :sweat_smile: Also should the length be 0?
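For comparison with the gdb output, here is a small sketch of what the bytes of "Hello World!" look like in UTF-8. The 24-byte buffer, with its last byte holding the length plus a high flag bit, is an assumption made for illustration. It is not taken from this thread or checked against the RocStr source, and `encode_small` and `decode_small` are hypothetical helpers.

```rust
/// Assumed small-string size on a 64-bit target (illustrative only).
const SMALL_STR_BYTES: usize = 24;

/// Pack `s` into the assumed layout: UTF-8 bytes first, then zero padding,
/// then a final byte holding the length with the high bit set.
fn encode_small(s: &str) -> Option<[u8; SMALL_STR_BYTES]> {
    let bytes = s.as_bytes();
    if bytes.len() >= SMALL_STR_BYTES {
        return None; // too long for the inline representation
    }
    let mut buf = [0u8; SMALL_STR_BYTES];
    buf[..bytes.len()].copy_from_slice(bytes);
    buf[SMALL_STR_BYTES - 1] = bytes.len() as u8 | 0x80;
    Some(buf)
}

/// Read a string back out of the assumed layout.
fn decode_small(buf: &[u8; SMALL_STR_BYTES]) -> Option<&str> {
    let last = buf[SMALL_STR_BYTES - 1];
    if last & 0x80 == 0 {
        return None; // flag bit not set: not a small string under this assumption
    }
    let len = (last & 0x7f) as usize;
    if len >= SMALL_STR_BYTES {
        return None;
    }
    std::str::from_utf8(&buf[..len]).ok()
}

fn main() {
    let buf = encode_small("Hello World!").expect("fits in the small buffer");
    let hex: Vec<String> = buf.iter().map(|b| format!("{b:02x}")).collect();
    println!("{}", hex.join(" "));
    println!("{:?}", decode_small(&buf));
}
```

"Hello World!" is plain ASCII, so its UTF-8 encoding is the 12 bytes `48 65 6c 6c 6f 20 57 6f 72 6c 64 21`. Under the assumed layout, a separate length field that reads 0 would not by itself mean the string is empty, because the length would be stored in the final byte.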

view this post on Zulip Anton (Jan 31 2024 at 14:15):

Going to do some CI maintenance, all servers should be available again in ~60 mins

view this post on Zulip Richard Feldman (Apr 26 2024 at 03:14):

I'm seeing this failure on multiple branches - maybe the macOS x64 machine's LLVM setup is misconfigured somehow?

https://github.com/roc-lang/roc/actions/runs/8842548683/job/24281394888?pr=6676#step:6:283
https://github.com/roc-lang/roc/actions/runs/8842567707/job/24281450979?pr=6678#step:6:284

view this post on Zulip Anton (Apr 26 2024 at 12:25):

I'll check it out, I updated z3 on that server Wednesday, homebrew may have updated other things too

view this post on Zulip Anton (Apr 26 2024 at 18:09):

The macos x64 issues have been resolved but I'm still encountering some strange linking issue on macos aarch64. Will continue investigating tomorrow...

view this post on Zulip Bryce Miller (Apr 26 2024 at 22:58):

Ah, glad it wasn't just me!

view this post on Zulip Anton (Apr 27 2024 at 12:43):

macos apple silicon issue will be resolved with PR#6680

view this post on Zulip Anton (May 08 2024 at 13:31):

cli_run::static_site_gen is experiencing flaky segfaults, I'll check it out

view this post on Zulip Folkert de Vries (Jun 29 2024 at 16:08):

I think https://github.com/roc-lang/roc/pull/6849 got stuck. this may be my fault because for some reason turning an assert into a debug_assert triggers an infinite loop that eats all ram. So maybe some of those machines didn't recover properly from that earlier

view this post on Zulip Anton (Jun 29 2024 at 16:16):

I'll check it out

view this post on Zulip Anton (Jun 29 2024 at 16:25):

I restarted both macos servers, they indeed ran out of memory. Both servers are running jobs again now :)

view this post on Zulip Brendan Hansknecht (Jun 29 2024 at 16:55):

turning an assert into a debug_assert triggers an infinite loop that eats all ram

That does not make sense to me. Especially when dealing with bools. I am really curious about the actual cause.

view this post on Zulip Folkert de Vries (Jun 29 2024 at 18:07):

if anyone wants to look at the details https://github.com/roc-lang/roc/pull/6849/files#r1659918271

view this post on Zulip Luke Boswell (Aug 13 2024 at 03:01):

We are seeing this failure in CI for @Agus Zubiaga 's PR on module params.

---- lowlevel_list_calls stdout ----
thread 'lowlevel_list_calls' panicked at crates/valgrind/src/lib.rs:202:9:
`valgrind` exited with exit code 1. valgrind stdout was: ""

valgrind stderr was: "--41196-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting
--41196-- si_code=1;  Faulting address: 0x2C9000;  sp: 0x1002db94f0

valgrind: the 'impossible' happened:
   Killed by fatal signal

host stacktrace:
==41196==    at 0x581985FA: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==41196==    by 0x581BA239: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==41196==    by 0x581BAAE3: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==41196==    by 0x58146BE6: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==41196==    by 0x58147D32: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==41196==    by 0x5812B932: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==41196==    by 0x5812C2A0: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==41196==    by 0x5805A431: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==41196==    by 0x5809B269: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==41196==    by 0x580E3F40: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable (lwpid 41196)
client stack range: [0x1FFEFF7000 0x1FFF000FFF] client SP: 0x1FFEFF74D0
valgrind stack range: [0x1002CBA000 0x1002DB9FFF] top usage: 18744 of 1048576

I cannot reproduce the issue locally on my linux x64 dev machine

view this post on Zulip Luke Boswell (Aug 13 2024 at 03:02):

Just looking for ideas for how to investigate further.

view this post on Zulip Luke Boswell (Aug 13 2024 at 03:14):

Perhaps the best COA is to log this as an issue, and skip this test in the PR as it looks unrelated. That way we can unblock work on Module Params.

view this post on Zulip Luke Boswell (Aug 13 2024 at 03:15):

This looks similar to the other CI issue Anton has been investigating for basic-cli 0.13.0, so it may already be tracked by an issue.

view this post on Zulip Sam Mohr (Aug 13 2024 at 06:45):

Luke Boswell said:

Perhaps the best COA is to log this as an issue, and skip this test in the PR as it looks unrelated. That way we can unblock work on Module Params.

I agree. Though maybe I'm missing something, I don't see anything in the Module Params PR that would cause this. We should make an issue and unblock the PR

view this post on Zulip Anton (Aug 13 2024 at 08:40):

This looks similar to the other CI issue Anton has been investigating for basic-cli 0.13.0,

It is possible they have the same cause, but this is a different kind of valgrind error.

view this post on Zulip Anton (Aug 13 2024 at 08:46):

I'll try to do a bit of investigating this week and we can ignore it if I don't make progress

view this post on Zulip Luke Boswell (Aug 22 2024 at 08:31):

CI failures in https://github.com/roc-lang/roc/pull/7016 look like a flake. I'm very sure this is unrelated to adding more tests for leb128.

test tests::tuple ... ok
This Roc code crashed with: "Hit an erroneous type when creating a layout for `Num.add`"
test tests::type_problem_binary_operator ... ok
This Roc code crashed with: "Hit an erroneous type when creating a layout for `Num.add`"
test tests::type_problem_function ... ok
This Roc code crashed with: "Hit an erroneous type when creating a layout for `Bool.not`"
test tests::type_problem_unary_operator ... ok

failures:

---- tests::list_concat stdout ----
thread 'tests::list_concat' panicked at crates/repl_test/src/cli.rs:96:13:
repl exited unexpectedly before finishing evaluation. Exit status was ExitStatus(unix_wait_status(25856)) and stderr was ""
stack backtrace:
   0: rust_begin_unwind
   1: core::panicking::panic_fmt
   2: repl_test::cli::repl_eval
   3: repl_test::cli::expect_success
   4: core::ops::function::FnOnce::call_once
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.


failures:
    tests::list_concat

test result: FAILED. 142 passed; 1 failed; 2 ignored; 0 measured; 0 filtered out; finished in 6.47s
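The `unix_wait_status(25856)` in the panic message can be decoded by hand. The sketch below follows the usual Linux wait-status encoding: the low 7 bits hold a terminating signal, and the next byte up holds the exit code. The names `WaitOutcome` and `decode_wait_status` are made up for this example.

```rust
#[derive(Debug, PartialEq)]
enum WaitOutcome {
    /// Process called exit (or returned from main) with this code.
    Exited(i32),
    /// Process was terminated by a signal.
    Signaled { signal: i32, core_dumped: bool },
    /// Stopped/continued or otherwise not covered by this sketch.
    Other(i32),
}

/// Decode a raw wait status using the Linux encoding.
fn decode_wait_status(status: i32) -> WaitOutcome {
    let low = status & 0x7f;
    if low == 0 {
        WaitOutcome::Exited((status >> 8) & 0xff)
    } else if low != 0x7f {
        WaitOutcome::Signaled {
            signal: low,
            core_dumped: status & 0x80 != 0,
        }
    } else {
        WaitOutcome::Other(status)
    }
}

fn main() {
    // Value from the failing `tests::list_concat` run above.
    println!("{:?}", decode_wait_status(25856));
    // For contrast: a raw status for SIGSEGV (signal 11) with a core dump.
    println!("{:?}", decode_wait_status(0x8b));
}
```

25856 is 0x6500, so the repl process exited normally with code 101 instead of being killed by a signal. 101 is the exit code a Rust program uses when it terminates because of a panic, which suggests the repl panicked, although the empty stderr in the log leaves that unconfirmed.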

view this post on Zulip Anton (Nov 26 2024 at 10:21):

Power is out, some work going on in the street. Not sure how long it will take, we didn't receive any communication

view this post on Zulip Anton (Nov 26 2024 at 11:14):

Back up

view this post on Zulip Richard Feldman (May 18 2025 at 18:42):

anyone know what these CI failures are? https://github.com/roc-lang/roc/pull/7795

error: failed to spawn build runner C:\Users\runneradmin\AppData\Local\zig\o\5cc848492ce7dcb02ce75ed22301d2f5\build.exe: FileNotFound

view this post on Zulip Brendan Hansknecht (May 18 2025 at 18:44):

Suggests that zig failed to build the build.zig script

view this post on Zulip Brendan Hansknecht (May 18 2025 at 18:44):

Or some sort of caching issue

view this post on Zulip Brendan Hansknecht (May 18 2025 at 19:50):

The new zig caching action may use too much data...hmm... This really should be one global cache for all this, but I think it is per branch...worse, it seems to be making many caches per branch...more than just per os/arch, but also has a timestamp...
Screenshot 2025-05-18 at 12.49.55 PM.png

view this post on Zulip Anthony Bullard (May 18 2025 at 19:52):

can't we have a lru or TTL policy on the cache

view this post on Zulip Brendan Hansknecht (May 18 2025 at 19:53):

Github uses LRU policy for kicking old data

view this post on Zulip Brendan Hansknecht (May 18 2025 at 19:53):

But I'm not convinced it is caching correctly vs just spinning up tons of duplicate caches

view this post on Zulip Brendan Hansknecht (May 18 2025 at 19:53):

Not sure fully though

view this post on Zulip Brendan Hansknecht (May 18 2025 at 20:01):

anyone know what these CI failures are?

I nuked the zig cache and it is working now. So yeah...caching

view this post on Zulip Brendan Hansknecht (May 18 2025 at 20:06):

https://github.com/roc-lang/roc/pull/7796


Last updated: Jul 06 2025 at 12:14 UTC