looks like Windows CI is failing because the basic-cli 0.5.0 release doesn't have a prebuilt host for Windows: https://github.com/roc-lang/roc/actions/runs/5886643841/job/15964820569?pr=5747#step:12:189
does anyone know if there was a limitation that prevented 0.5.0 from having that? Or is it something we could fix in an 0.5.1 release?
I don't think so. I'll try setting up the basic-cli release workflow to include the Windows file.
Many basic-cli examples fail on Windows without printing any errors, so I'll need to dig into those more before I add the necessary Windows files to the next basic-cli release.
huh! Was that true of the previous release too?
I would think so but I'll check
Yes identical behavior using older roc and basic-cli 0.4.0
gotcha
so maybe worth doing an 0.5.1 release just to fix CI, and then investigate the errors later?
I already ignored that basic-cli test a while ago to fix CI. I prefer that option; I dislike making it seem like basic-cli supports Windows when it would break easily with normal usage.
that's fair :thumbs_up:
Yeah, some of the basic-cli tests assume Linux tools are available. I would like to have it working on Windows, but I think there are still issues with Roc itself that really prevent anyone from using it on Windows.
do you happen to know which issues in particular are blocking it?
have we seen this before?
```
thread 'main' panicked at 'zig build object -Drelease=true failed 10 times in a row. The following error is unlikely to be a flaky error: /Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target.zig:447:29: error: unable to load '/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target/bpf.zig': FileNotFound
/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target.zig:444:32: error: unable to load '/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target/amdgpu.zig': FileNotFound
/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target.zig:449:30: error: unable to load '/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target/mips.zig': FileNotFound
/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target.zig:442:33: error: unable to load '/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target/aarch64.zig': FileNotFound
/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target.zig:445:29: error: unable to load '/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target/arm.zig': FileNotFound
/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target.zig:443:29: error: unable to load '/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target/arc.zig': FileNotFound
/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target.zig:446:29: error: unable to load '/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target/avr.zig': FileNotFound
/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target.zig:450:32: error: unable to load '/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target/msp430.zig': FileNotFound
/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target.zig:451:31: error: unable to load '/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target/nvptx.zig': FileNotFound
/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target.zig:453:31: error: unable to load '/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target/riscv.zig': FileNotFound
/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target.zig:454:31: error: unable to load '/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target/sparc.zig': FileNotFound
/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target.zig:455:31: error: unable to load '/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target/spirv.zig': FileNotFound
/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target.zig:448:33: error: unable to load '/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target/hexagon.zig': FileNotFound
/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target.zig:452:33: error: unable to load '/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target/powerpc.zig': FileNotFound
/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target.zig:456:33: error: unable to load '/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target/systemz.zig': FileNotFound
/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target.zig:457:28: error: unable to load '/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target/ve.zig': FileNotFound
/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target.zig:459:29: error: unable to load '/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target/x86.zig': FileNotFound
/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target.zig:458:30: error: unable to load '/Users/username1/Downloads/zig-macos-x86_64-0.9.1/lib/std/target/wasm.zig': FileNotFound
', crates/compiler/builtins/bitcode/build.rs:179:25
```
on https://github.com/roc-lang/roc/actions/runs/6162086596/job/16722836284?pr=5799
No, I don't think so
it's hitting 2 PRs that don't really do anything on macOS. Are we downloading some weird/bad version of zig?
Oh, I see what's wrong now. I was cleaning up disk space on the CI machine by searching for folders named `target` and deleting them. I was under the impression that I was searching inside my roc folder, but the search appears to have been system-wide, which also explains the missing files under zig's `lib/std/target/` in the error above. I'll try to fix it now.
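A sketch of a more tightly scoped cleanup, assuming the goal is to remove only Cargo build directories: it walks a single root and only reports `target` directories that contain a `CACHEDIR.TAG` file, which recent Cargo versions write into their build directories. Zig's `lib/std/target/` has no such marker, so it would be skipped. The function name is hypothetical, and it only collects candidates; actually deleting them (e.g. with `std::fs::remove_dir_all`) is left to the caller.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: recursively collect directories named `target` under
/// `root` that contain a `CACHEDIR.TAG` file (written by Cargo into its build
/// directories). Matching directories are not descended into. Nothing is deleted.
pub fn find_cargo_target_dirs(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    let mut stack = vec![root.to_path_buf()];
    while let Some(dir) = stack.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            // file_type() does not follow symlinks, so symlinked dirs are skipped.
            if !entry.file_type()?.is_dir() {
                continue;
            }
            let path = entry.path();
            let is_named_target = entry.file_name().to_str() == Some("target");
            if is_named_target && path.join("CACHEDIR.TAG").is_file() {
                // Looks like Cargo output: report it instead of recursing into it.
                found.push(path);
            } else {
                // Any other directory (including zig's lib/std/target): keep walking.
                stack.push(path);
            }
        }
    }
    found.sort();
    Ok(found)
}
```

Listing before deleting makes it easy to print the candidates in a dry run and eyeball them before anything is removed.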
Re-running now
Success :tada:
what is happening here?
```
Docs generated in ./generated-docs
+ mv generated-docs/ www/build/builtins
+ find www/build/builtins -type f -name index.html -exec sed -i 's!</nav>!<div class="builtins-tip"><b>Tip:</b> <a href="/different-names">Some names</a> differ from other languages.</div></nav>!' '{}' ';'
+ rm -rf roc_nightly roc_releases.json
+ '[' -v GITHUB_TOKEN_READ_ONLY ']'
Building tutorial.html from tutorial.md...
+ echo 'Building tutorial.html from tutorial.md...'
+ mkdir www/build/tutorial
+ cargo build --release --bin roc
Finished release [optimized] target(s) in 0.23s
+ roc=target/release/roc
+ target/release/roc version
roc built-from-source
+ target/release/roc run www/generate_tutorial/src/tutorial.roc -- www/generate_tutorial/src/input/ www/build/tutorial/
Rebuilding platform...
Processing 1 input files...
/home/small-ci-user/actions-runner/_work/roc/roc/www/generate_tutorial/src/input/tutorial.md -> /home/small-ci-user/actions-runner/_work/roc/roc/www/build/tutorial/tutorial.html
Processed 1 files with 1 successes and 0 errors
www/build.sh: line 91: 2753219 Segmentation fault (core dumped) $roc run www/generate_tutorial/src/tutorial.roc -- www/generate_tutorial/src/input/ www/build/tutorial/
```
I think that's a flake, I restarted the job now.
yeah I've seen that before
It segfaulted again. I don't really see why the clippy changes would make it more likely, though. I did create #5772 for it earlier with some valgrind output.
ah, ok the cause is pretty obvious actually
fix https://github.com/roc-lang/roc/pull/5892
@Anton It looks like there are issues with a specific CI machine:
```
Runner name: 'anton-m1-mac-mini'
Runner group name: 'Default'
Machine name: 'm1cis-Mac-mini'
```
Example failures:
It seems that the Mac CI runner is fully down now: #5775 is just sitting with its action queued, and the same goes for a few other PRs. I'm going to assume the machine was hitting problems and then crashed or something.
It's running now, this was due to #6106, I'll set up a temporary fix today
I've set up a new workflow that will clean up the nix store on the m1 mac mini CI server once a day. It can also be triggered manually if needed.
Just saw the new workflow on https://github.com/roc-lang/roc/pull/6122 and it worked great! :muscle:
Is this known? https://github.com/roc-lang/roc/actions/runs/7055754074/job/19206699414?pr=6128
That definitely was flaky on some of my past PRs. Not sure why
I've also seen it flake before; we should move the GUI examples over to the examples repo.
#6176 is failing in CI on some valgrind tests that I shouldn't have affected at all. Any ideas?
no idea, likely something specific to the machine/hardware
Maybe it's some zig unreachables, like the issue I had in https://github.com/roc-lang/roc/pull/6062?
[...] that I shouldn't have affected at all.
If the builtins changed things can shift around and end up revealing an existing problem.
Sorry that I forgot about #6062; I thought I had already merged it.
I hope rebasing on it will fix my CI issues.
This keeps timing out for me: https://github.com/roc-lang/roc/actions/runs/7149876106/job/19477835985?pr=6216
It passes locally on x86 for me in reasonable time. Gonna launch it again and hope.
Main is failing on:
```
── FILE NOT FOUND ──────────────────────────────────────────────── UNKNOWN.roc ─
I am looking for this file, but it's not there:
downloaded-basic-cli/src/main.roc
```
Fixing it now
a change to decimal parsing somehow causes a segfault https://github.com/roc-lang/roc/actions/runs/7391040574/job/20109491250?pr=6340
that is likely unrelated to the actual changes, so maybe something is flaky?
That's the same segfault I'm hitting in PR#6333, tracked as #5924. I planned on digging into it more today, we can ignore and merge if it's not (close to) fixed today. It's a particularly difficult bug because the segfault only occurs when it's cleaning up the memory, after execution is done. I've found similar bugs on the internet where this happened because of invalid pointers.
Some interesting behavior: the segfault is not triggered if I change:
```rust
#[no_mangle]
pub unsafe extern "C" fn roc_dbg(loc: *mut RocStr, msg: *mut RocStr, src: *mut RocStr) {
    eprintln!("[{}] {} = {}", &*loc, &*src, &*msg);
}
```
to:
```rust
#[no_mangle]
pub unsafe extern "C" fn roc_dbg(loc: *mut RocStr, msg: *mut RocStr, src: *mut RocStr) {
    eprintln!("");
    eprintln!("[{}] {} = {}", &*loc, &*src, &*msg);
}
```
All while `dbg` is not even used in this test.
If anybody has any debugging tips, I'd love to hear them :)
but is the valgrind info clean when you add that extra eprintln? I'd guess not
this just reeks of UB. I had something kind of similar happen a while ago when we didn't implement the ABI correctly in the dev backend
oh, with the legacy linker it just works. Was that known?
yes indeed
but is the valgrind info clean when you add that extra eprintln? I'd guess not
yes:
```
❯ valgrind ~/Desktop/closures_comparison/closures_good/app
==41746== Memcheck, a memory error detector
==41746== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==41746== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==41746== Command: /home/username/Desktop/closures_comparison/closures_good/app
==41746==
Answer was: 672
==41746==
==41746== HEAP SUMMARY:
==41746== in use at exit: 0 bytes in 0 blocks
==41746== total heap usage: 12 allocs, 12 frees, 3,205 bytes allocated
==41746==
==41746== All heap blocks were freed -- no leaks are possible
==41746==
==41746== For lists of detected and suppressed errors, rerun with: -s
==41746== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
```
well that makes more sense if things do work with the legacy linker
it's plausible that e.g. something is not aligned or updated correctly by the legacy linker, and that extra statement somehow makes it work again
... or updated correctly by the legacy linker
I think you mean surgical linker here right?
yes, sorry
so maybe we can get something out of diffing the working and non-working program somehow?
could be nothing, but:
```
Start of section headers: 6556800 (bytes into file) # good
Start of section headers: 6552512 (bytes into file) # bad
```
that could be an alignment thing?
hmm, no: other tests here also have arbitrary starts.
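To compare this field across many good/bad binaries instead of eyeballing `readelf` output, a small sketch can read it straight from the file. "Start of section headers" is `readelf`'s label for the ELF header field `e_shoff`, which sits at byte offset 0x28 in an ELF64 header. The helper names are hypothetical, and the sketch only handles 64-bit little-endian ELF. For the two values above it reports a power-of-two alignment of 128 for the good binary and 64 for the bad one; since other working binaries also have arbitrary offsets, that alone explains nothing.

```rust
/// Byte offset of `e_shoff` ("Start of section headers") in an ELF64 header.
const E_SHOFF_OFFSET: usize = 0x28;

/// Read `e_shoff` from the raw bytes of a little-endian ELF64 file.
/// Returns None if the input is too short or not ELF64 little-endian.
pub fn section_header_offset(bytes: &[u8]) -> Option<u64> {
    // e_ident: magic, then EI_CLASS (2 = 64-bit) and EI_DATA (1 = little-endian).
    if bytes.len() < 64 || &bytes[..4] != b"\x7fELF" || bytes[4] != 2 || bytes[5] != 1 {
        return None;
    }
    let mut raw = [0u8; 8];
    raw.copy_from_slice(&bytes[E_SHOFF_OFFSET..E_SHOFF_OFFSET + 8]);
    Some(u64::from_le_bytes(raw))
}

/// Largest power of two (capped at one 4 KiB page) that divides `offset`.
pub fn alignment_of(offset: u64) -> u64 {
    if offset == 0 {
        return 4096;
    }
    (1u64 << offset.trailing_zeros()).min(4096)
}
```

In practice the input would come from something like `std::fs::read(path)` on each good and bad executable (paths are up to you), printing the offset and alignment side by side.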
so maybe we can get something out of diffing the working and non-working program somehow?
Sounds good, I'll get on that later today
I tried but with readelf at least I don't really see anything interesting
besides that, the .text section changes size, and since it sits right before the .fini section, .fini moves around too
but that seems to happen even when I make other changes and then the platform keeps working
we can ignore and merge if it's not (close to) fixed today.
Given that we're hitting this in four different places I'd like to take some extra time.
I have assembled 4 good/bad versions of executables in a tar. If we find what all bad executables have in common we should be able to figure out the cause :)
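One way to make "what do all bad executables have in common" mechanical: reduce each executable to a set of observed features (for example, normalized `readelf -S` lines), then keep the features present in every bad binary and in none of the good ones. The feature extraction is the hard part and is left out here; the function name and the example feature strings are hypothetical.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: features present in every bad executable and in no
/// good executable. Returns an empty set if there are no bad samples.
pub fn bad_only_features(bad: &[BTreeSet<String>], good: &[BTreeSet<String>]) -> BTreeSet<String> {
    let (first, rest) = match bad.split_first() {
        Some(split) => split,
        None => return BTreeSet::new(),
    };
    // Start from the first bad binary and intersect with the other bad ones.
    let mut common = first.clone();
    for set in rest {
        common.retain(|f| set.contains(f));
    }
    // Drop anything that also shows up in a working binary.
    for set in good {
        common.retain(|f| !set.contains(f));
    }
    common
}
```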
so, I'm trying some debugging here. My version segfaults at this pop instruction:
```
│ 0x555555691ffa <__libc_csu_init+90> pop rbx
│ 0x555555691ffb <__libc_csu_init+91> pop rbp
│ 0x555555691ffc <__libc_csu_init+92> pop r12
│ 0x555555691ffe <__libc_csu_init+94> pop r13
│ >0x555555692000 <__libc_csu_init+96> pop r14
│ 0x555555692002 <__libc_csu_init+98> pop r15
│ 0x555555692004 <__libc_csu_init+100> ret
```
That could make sense if the stack got corrupted somehow. The stack pointer value here:
```
rax 0x0 0
rbx 0x555555691fa0 93824993533856
rcx 0x555555691fa0 93824993533856
rdx 0x7fffffffe068 140737488347240
rsi 0x0 0
rdi 0x5555556f3108 93824993931528
rbp 0x0 0x0
rsp 0x7fffffffdf58 0x7fffffffdf58
r8 0x0 0
r9 0x7ffff7fe0d60 140737354009952
r10 0x1 1
```
is where the r14 register is stored, based on the frame info:
```
Stack level 0, frame at 0x7fffffffdf70:
rip = 0x555555692000 in __libc_csu_init; saved rip = 0x7ffff7d8f010
called by frame at 0x7fffffffe040
Arglist at 0x7fffffffdf50, args:
Locals at 0x7fffffffdf50, Previous frame's sp is 0x7fffffffdf70
Saved registers:
rbx at 0x7fffffffdf38, rbp at 0x7fffffffdf40, r12 at 0x7fffffffdf48, r13 at 0x7fffffffdf50, r14 at 0x7fffffffdf58, r15 at 0x7fffffffdf60,
rip at 0x7fffffffdf68
```
So, notice how the arglist and locals start at a stack offset that is in the middle of that sequence of stored registers. A working version does not do that:
```
Stack level 0, frame at 0x7fffffffdf70:
rip = 0x555555691ff5 in __libc_csu_init; saved rip = 0x7ffff7d8f010
called by frame at 0x7fffffffe040
Arglist at 0x7fffffffdf30, args:
Locals at 0x7fffffffdf30, Previous frame's sp is 0x7fffffffdf70
Saved registers:
rbx at 0x7fffffffdf38, rbp at 0x7fffffffdf40, r12 at 0x7fffffffdf48, r13 at 0x7fffffffdf50, r14 at 0x7fffffffdf58, r15 at 0x7fffffffdf60,
rip at 0x7fffffffdf68
```
so, somehow the stack got messed up? Here my knowledge kind of runs out. I don't think the bug is in this function; it must already start with a messed-up stack. Do we somehow influence where the stack starts?
hmm, or not? it looks like push/pop change the location of the arglist/locals
at which point I'm back to: why ever would a pop cause a segfault?
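One observation that may be worth checking (an unverified hypothesis, not something established in this thread): the faulting `pop r14` is at 0x555555692000, exactly on a 4 KiB page boundary, while every earlier instruction in the listing is on the previous page. The `pop` itself reads from rsp = 0x7fffffffdf58, which the frame info shows is a valid saved-register slot, so the fault might come from fetching the instruction rather than from the stack access, for example if the surgical linker left that page unmapped or not executable. The sketch below only makes the page arithmetic explicit; it assumes 4 KiB pages, and the helper names are hypothetical.

```rust
/// Page size assumed for x86-64 Linux (4 KiB).
pub const PAGE_SIZE: u64 = 4096;

/// Offset of `addr` within its page.
pub fn page_offset(addr: u64) -> u64 {
    addr % PAGE_SIZE
}

/// Given the addresses of consecutive instructions (as in a disassembly
/// listing), return the first one on a different page than the instruction
/// before it.
pub fn first_address_on_new_page(addrs: &[u64]) -> Option<u64> {
    addrs
        .windows(2)
        .find(|w| w[0] / PAGE_SIZE != w[1] / PAGE_SIZE)
        .map(|w| w[1])
}
```

If this hypothesis holds, `info proc mappings` in gdb at the crash should show whether 0x555555692000 falls inside an executable mapping.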
@Brendan Hansknecht any thoughts here?
Probably the same root cause as #6121
No immediate other thoughts. I probably need to play with it for a bit so that I can refresh my brain on the details. Probably can do that later today.
ok, well, my data point then is that with gdb/lldb the error is different from the one valgrind reports. Based on the naming, it seems to error before any Roc code is even called. The answer also isn't printed when running with the debugger, so that seems to match up.
probably the same root cause, but the debugger changes something about the environment that makes the problem show up in a different place
The printing of the answer also does not happen when running with the debugger
The printing does happen for me inside gdb
[Screenshot: glue_option_bad_vs_good.png]
I found a suspicious diff when printing the string variable with gdb for the failing option test. The left is for the segfaulting version, and the right for the good version (with the unused extra eprintln). The bytes in the SmallString should form "Hello World!", but that does not seem to work out with my understanding of UTF-8 :sweat_smile: Also, should the length be 0?
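For reading dumps like this one, a sketch of small-string decoding may help. It assumes the layout used by `roc_std`'s `RocStr` on 64-bit targets: 24 bytes in total, short strings stored inline with their bytes at the front, and the last byte holding the length with its high bit set as the "small string" flag. Please double-check this against `roc_std` before relying on it. Under that assumption, "Hello World!" should appear as its 12 UTF-8 bytes followed by padding, with a final byte of 0x8C (0x80 | 12); a final byte without the high bit set would mean the value is not being treated as a small string at all. The helper names are hypothetical.

```rust
/// Size of a RocStr on a 64-bit target (assumed: three usize fields).
pub const ROC_STR_SIZE: usize = 24;

/// Decode raw RocStr bytes under the assumed small-string layout: if the high
/// bit of the last byte is set, the first `last & 0x7F` bytes are the string.
/// Returns None for (assumed) heap strings or invalid UTF-8.
pub fn decode_small_str(raw: &[u8; ROC_STR_SIZE]) -> Option<&str> {
    let last = raw[ROC_STR_SIZE - 1];
    if last & 0x80 == 0 {
        return None; // not a small string under this assumption
    }
    let len = (last & 0x7F) as usize;
    if len > ROC_STR_SIZE - 1 {
        return None;
    }
    std::str::from_utf8(&raw[..len]).ok()
}

/// Inverse of `decode_small_str`, useful for building a known-good reference
/// to compare byte-by-byte against a gdb memory dump.
pub fn encode_small_str(s: &str) -> Option<[u8; ROC_STR_SIZE]> {
    if s.len() > ROC_STR_SIZE - 1 {
        return None;
    }
    let mut raw = [0u8; ROC_STR_SIZE];
    raw[..s.len()].copy_from_slice(s.as_bytes());
    raw[ROC_STR_SIZE - 1] = 0x80 | s.len() as u8;
    Some(raw)
}
```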
Going to do some CI maintenance, all servers should be available again in ~60 mins
I'm seeing this failure on multiple branches - maybe the macOS x64 machine's LLVM setup is misconfigured somehow?
https://github.com/roc-lang/roc/actions/runs/8842548683/job/24281394888?pr=6676#step:6:283
https://github.com/roc-lang/roc/actions/runs/8842567707/job/24281450979?pr=6678#step:6:284
I'll check it out. I updated z3 on that server on Wednesday, and Homebrew may have updated other things too.
The macOS x64 issues have been resolved, but I'm still encountering some strange linking issue on macOS aarch64. Will continue investigating tomorrow...
Ah, glad it wasn't just me!
macos apple silicon issue will be resolved with PR#6680
cli_run::static_site_gen is experiencing flaky segfaults, I'll check it out
I think https://github.com/roc-lang/roc/pull/6849 got stuck. This may be my fault, because for some reason turning an assert into a debug_assert triggers an infinite loop that eats all RAM. So maybe some of those machines didn't recover properly from that earlier.
I'll check it out
I restarted both macos servers, they indeed ran out of memory. Both servers are running jobs again now :)
turning an assert into a debug_assert triggers an infinite loop that eats all ram
That does not make sense to me, especially when dealing with bools. I am really curious about the actual cause.
if anyone wants to look at the details https://github.com/roc-lang/roc/pull/6849/files#r1659918271
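The actual cause is discussed in the linked review comment and isn't reproduced here. One general Rust behavior worth keeping in mind when swapping `assert!` for `debug_assert!` (offered as a possibility, not a diagnosis of #6849): `debug_assert!` does not evaluate its condition at all when debug assertions are disabled, as in the default release profile, so any side effect inside the condition silently disappears. If a loop relied on that side effect to make progress, it could spin forever. The function names below are hypothetical.

```rust
/// Count how many times a `debug_assert!` condition actually runs.
pub fn run_with_debug_assert() -> u32 {
    let mut evaluated = 0;
    // The condition is type-checked in every profile, but only evaluated
    // when debug assertions are enabled (e.g. the default dev profile).
    debug_assert!({
        evaluated += 1;
        true
    });
    evaluated
}

/// Count how many times an `assert!` condition actually runs.
pub fn run_with_assert() -> u32 {
    let mut evaluated = 0;
    // assert! always evaluates its condition, in every build profile.
    assert!({
        evaluated += 1;
        true
    });
    evaluated
}
```

Built with `--release`, `run_with_debug_assert()` returns 0 while `run_with_assert()` still returns 1.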
We are seeing this failure in CI for @Agus Zubiaga's PR on module params.
```
---- lowlevel_list_calls stdout ----
thread 'lowlevel_list_calls' panicked at crates/valgrind/src/lib.rs:202:9:
`valgrind` exited with exit code 1. valgrind stdout was: ""
valgrind stderr was: "--41196-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting
--41196-- si_code=1; Faulting address: 0x2C9000; sp: 0x1002db94f0
valgrind: the 'impossible' happened:
Killed by fatal signal
host stacktrace:
==41196== at 0x581985FA: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==41196== by 0x581BA239: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==41196== by 0x581BAAE3: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==41196== by 0x58146BE6: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==41196== by 0x58147D32: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==41196== by 0x5812B932: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==41196== by 0x5812C2A0: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==41196== by 0x5805A431: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==41196== by 0x5809B269: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==41196== by 0x580E3F40: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
sched status:
running_tid=1
Thread 1: status = VgTs_Runnable (lwpid 41196)
client stack range: [0x1FFEFF7000 0x1FFF000FFF] client SP: 0x1FFEFF74D0
valgrind stack range: [0x1002CBA000 0x1002DB9FFF] top usage: 18744 of 1048576
```
I cannot reproduce the issue locally on my linux x64 dev machine
Just looking for ideas for how to investigate further.
Perhaps the best COA is to log this as an issue, and skip this test in the PR as it looks unrelated. That way we can unblock work on Module Params.
This looks similar to the other CI issue Anton has been investigating for basic-cli 0.13.0, so it may already be tracked by an issue.
Luke Boswell said:
Perhaps the best COA is to log this as an issue, and skip this test in the PR as it looks unrelated. That way we can unblock work on Module Params.
I agree. Though maybe I'm missing something, I don't see anything in the Module Params PR that would cause this. We should make an issue and unblock the PR
This looks similar to the other CI issue Anton has been investigating for basic-cli 0.13.0,
It is possible they have the same cause, but this is a different kind of valgrind error.
I'll try to do a bit of investigating this week, and we can ignore it if I don't make progress.
CI failures in https://github.com/roc-lang/roc/pull/7016 look like a flake. I'm very sure this is unrelated to adding more tests for leb128.
```
test tests::tuple ... ok
This Roc code crashed with: "Hit an erroneous type when creating a layout for `Num.add`"
test tests::type_problem_binary_operator ... ok
This Roc code crashed with: "Hit an erroneous type when creating a layout for `Num.add`"
test tests::type_problem_function ... ok
This Roc code crashed with: "Hit an erroneous type when creating a layout for `Bool.not`"
test tests::type_problem_unary_operator ... ok
failures:
---- tests::list_concat stdout ----
thread 'tests::list_concat' panicked at crates/repl_test/src/cli.rs:96:13:
repl exited unexpectedly before finishing evaluation. Exit status was ExitStatus(unix_wait_status(25856)) and stderr was ""
stack backtrace:
0: rust_begin_unwind
1: core::panicking::panic_fmt
2: repl_test::cli::repl_eval
3: repl_test::cli::expect_success
4: core::ops::function::FnOnce::call_once
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
failures:
tests::list_concat
test result: FAILED. 142 passed; 1 failed; 2 ignored; 0 measured; 0 filtered out; finished in 6.47s
```
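The exit status in that failure can be decoded by hand: on Linux, a wait status whose low 7 bits are zero means a normal exit, with the exit code in the next 8 bits. 25856 = 101 × 256, so the repl process exited normally with code 101, which is the code Rust uses when a program panics. Given that stderr was empty, that is a hint rather than a conclusion. The sketch below hand-rolls the standard `WIFEXITED`/`WEXITSTATUS` decoding (the function name is hypothetical); on Unix, `std::os::unix::process::ExitStatusExt::from_raw` followed by `ExitStatus::code()` does the same job.

```rust
/// Decode a raw Unix wait status (the number inside `unix_wait_status(..)`).
/// Returns Ok(exit_code) for a normal exit, Err(signal) if killed by a signal.
/// Stopped/continued states are ignored for simplicity.
pub fn decode_wait_status(status: i32) -> Result<i32, i32> {
    let signal = status & 0x7f;
    if signal == 0 {
        // WIFEXITED: the exit code lives in bits 8..16 (WEXITSTATUS).
        Ok((status >> 8) & 0xff)
    } else {
        // WIFSIGNALED: the low 7 bits hold the signal number
        // (bit 0x80 only records whether a core dump was produced).
        Err(signal)
    }
}
```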
Power is out, some work going on in the street. Not sure how long it will take, we didn't receive any communication
Back up
anyone know what these CI failures are? https://github.com/roc-lang/roc/pull/7795
```
error: failed to spawn build runner C:\Users\runneradmin\AppData\Local\zig\o\5cc848492ce7dcb02ce75ed22301d2f5\build.exe: FileNotFound
```
Suggests that zig failed to build the build.zig script
Or some sort of caching issue
The new zig caching action may use too much data...hmm... This really should be one global cache for all this, but I think it is per branch...worse, it seems to be making many caches per branch...more than just per os/arch, but also has a timestamp...
[Screenshot 2025-05-18 at 12.49.55 PM.png]
can't we have an LRU or TTL policy on the cache?
GitHub uses an LRU policy for evicting old data
But I'm not convinced it is caching correctly vs just spinning up tons of duplicate caches
Not sure fully though
anyone know what these CI failures are?
I nuked the zig cache and it is working now. So yeah...caching
https://github.com/roc-lang/roc/pull/7796
Last updated: Jul 06 2025 at 12:14 UTC