Stream: contributing

Topic: CI


view this post on Zulip Anton (Nov 18 2023 at 18:31):

If any of you were getting "No space left on device" errors for the apple silicon tests, that should now be resolved.

view this post on Zulip Norbert Hajagos (Nov 22 2023 at 07:42):

@Anton I just got one for https://github.com/roc-lang/roc/pull/5983. They ran ~8 hours ago with the last commit from main being 2afd9ca0a9b846fc4127b3d7fb55c521c6ae9ff9, which was done on Nov 20.
Here are the two failed runs: devtools macos and nix macos apple silicon.
I re-merged main into the PR branch, so the workflows are waiting for approval again, but idk if that fixes anything

view this post on Zulip Anton (Nov 22 2023 at 09:27):

I've approved the run. I think merging main should fix it: I very recently added a cleanup step that should prevent "No space left on device".

view this post on Zulip Anton (Nov 22 2023 at 11:16):

The apple silicon CI server is not picking up jobs, I'm investigating...

view this post on Zulip Anton (Nov 22 2023 at 11:29):

Should be fixed now

view this post on Zulip Norbert Hajagos (Nov 22 2023 at 11:31):

Cool! Do I need to merge main, or do anything?

view this post on Zulip Anton (Nov 22 2023 at 12:24):

No, your jobs are in the queue and will be started automatically

view this post on Zulip Richard Feldman (Nov 22 2023 at 13:29):

maybe we should add some paths-ignore to our workflows so we don't run them when only .md files change, e.g.

on:
  push:
    branches:
      - main
    paths-ignore:
      - '**/*.md'
  pull_request:
    branches:
      - main
    paths-ignore:
      - '**/*.md'

we might need to make an exception for a workflow that rebuilds the website, since we have .md files which go into that

view this post on Zulip Anton (Nov 22 2023 at 14:06):

paths-ignore will not work unfortunately but a more convoluted solution using if: should be possible. I'll try to look at that this week.

view this post on Zulip Luke Boswell (Nov 23 2023 at 06:20):

Is it possible to also not run CI on draft PRs? I just cancelled CI for something that is WIP, but I wanted to push it to a PR to record my progress.

view this post on Zulip Luke Boswell (Nov 23 2023 at 06:21):

Apparently this could work

on:
  push:
    branches:
      - main
  pull_request:
    branches: [main]
    paths:
      - "**"
      - "!**/*.md"
    types:
      - ready_for_review

But you have to create them as Draft and then click "Ready for Review".... hmmm, I'm not good with this stuff

view this post on Zulip Brian Carroll (Nov 23 2023 at 08:10):

Luke Boswell said:

But you have to create them as Draft and then click "Ready for Review".... hmmm, I'm not good with this stuff

GitHub's blog post, from when they first announced the feature, shows a screenshot of how to do it. It still looks the same. Once you create it, the "ready for review" button appears near the bottom of the PR page.

view this post on Zulip Anton (Nov 24 2023 at 11:10):

Luke Boswell said:

Apparently this could work

I think there are some problems with that approach. Including [skip ci] in your commit message is a simple and effective way.

view this post on Zulip Ayaz Hafiz (Nov 24 2023 at 17:27):

For knowledge, what are the problems with the approach Luke listed?

view this post on Zulip Anton (Nov 24 2023 at 18:21):

push is only triggered after the PR is merged, which would be too late :p
If there are only md file changes, required checks (github settings) would not be completed, because of this issue. If you have only md changes, we could then run tests with the "ready for review" button but newcomers will then press this too early. I'm also not sure if anybody but the author can trigger "ready for review". Lots of people don't have CI privileges so "ready for review" will not actually start CI.

But I will hopefully be able to prevent unnecessary runs with some changes to CI tomorrow.

view this post on Zulip Anton (Nov 25 2023 at 19:17):

I've got a working prototype of the smarter orchestration, I will set it up in full next week
Screenshot_20231125_201542.png

view this post on Zulip Luke Boswell (Nov 25 2023 at 19:24):

That looks really great!

view this post on Zulip Anton (May 10 2024 at 14:27):

CI issues with the failing static_site_gen test have been resolved :)
You can use the "update branch" button to get the fix on your branch.

view this post on Zulip Anton (Jun 07 2024 at 14:24):

macos-11 is deprecated by github CI and will soon be removed so I'm going to remove it from all our workflows.

view this post on Zulip Luke Boswell (Jun 10 2024 at 00:28):

Looks like we have a common issue with CI missing xcrun on the X86-64 MacOS machine

view this post on Zulip Anton (Jun 10 2024 at 06:48):

Yeah that will be due to the upgrade to macos 12, I'll check it out.

view this post on Zulip Anton (Jun 10 2024 at 10:27):

The xcrun issue has been fixed, I'm doing a full test run now to see if any issues come up

view this post on Zulip Anton (Jun 10 2024 at 10:48):

Test run succeeded :)

view this post on Zulip Anton (Dec 16 2024 at 13:11):

You may hit this issue when running CI on macos (#7380):

error: failed to run custom build command for `roc_bitcode_bc v0.0.1 (/Users/m1ci/actions-runner2/_work/roc/roc/crates/compiler/builtins/bitcode/bc)`
note: To improve backtraces for build dependencies, set the CARGO_PROFILE_RELEASE-WITH-LTO_BUILD_OVERRIDE_DEBUG=true environment variable to enable debug information generation.

Caused by:
  process didn't exit successfully: `/Users/m1ci/actions-runner2/_work/roc/roc/target/release-with-lto/build/roc_bitcode_bc-8ac377685705d80e/build-script-build` (exit status: 1)
  --- stdout
  cargo:rerun-if-changed=build.rs
  Compiling host ir to: /Users/m1ci/actions-runner2/_work/roc/roc/crates/compiler/builtins/bitcode/bc/../zig-out/builtins-host.ll
  Compiling 64-bit bitcode to: /Users/m1ci/actions-runner2/_work/roc/roc/crates/compiler/builtins/bitcode/bc/../zig-out/builtins-host.bc
  Compiling host ir to: /Users/m1ci/actions-runner2/_work/roc/roc/crates/compiler/builtins/bitcode/bc/../zig-out/builtins-wasm32.ll
  Compiling 64-bit bitcode to: /Users/m1ci/actions-runner2/_work/roc/roc/crates/compiler/builtins/bitcode/bc/../zig-out/builtins-wasm32.bc

  --- stderr
  An internal compiler expectation was broken.
  This is definitely a compiler bug.
  Please file an issue here: <https://github.com/roc-lang/roc/issues/new/choose>
  zig build ir-wasm32 -Drelease=true failed with:

I'm investigating it now

view this post on Zulip Anton (Dec 16 2024 at 14:51):

I've disconnected the macos apple silicon CI server so I don't have to debug this in my garage

view this post on Zulip Anton (Dec 16 2024 at 18:54):

That server is back up, the bug did not want to reproduce for me, so I'm just going to add a retry workaround...

view this post on Zulip Anton (Dec 16 2024 at 19:00):

It is interesting to see the number of flaky errors we have accumulated in builtins/bitcode/build.rs:

 error_str.contains("FileNotFound")
 || error_str.contains("unable to save cached ZIR code")
 || error_str.contains("LLVM failed to emit asm")
 || error_str.contains("ir-wasm32 transitive failure")

Perhaps they share the same parallelism weirdness

view this post on Zulip Anton (Dec 20 2024 at 17:20):

I've seen a bunch more failures with builtins bitcode on macos, I'm working on a new workaround

view this post on Zulip Brendan Hansknecht (Dec 20 2024 at 17:32):

Hopefully we can fix the root at some point

view this post on Zulip Anton (Dec 30 2024 at 11:42):

Heads up: CI is broken on main for nix apple silicon, I'll check it out

view this post on Zulip Anton (Dec 30 2024 at 13:30):

This was due to 723e35f (PR#7424), I'm going to revert it

view this post on Zulip Anton (Dec 30 2024 at 14:36):

Fixed in #7435

view this post on Zulip Jakub Konka (Jan 04 2025 at 10:47):

Is it just me or does it look like sometimes it takes a workflow over 1.5h to just compile roc, and sometimes it is almost instant? Is this some caching issue perhaps that then leads to some flakiness? On the other hand, I also occasionally see the following:

error: failed to run custom build command for `roc_bitcode_bc v0.0.1 (/Users/m1ci/actions-runner2/_work/roc/roc/crates/compiler/builtins/bitcode/bc)`
note: To improve backtraces for build dependencies, set the CARGO_PROFILE_RELEASE_BUILD_OVERRIDE_DEBUG=true environment variable to enable debug information generation.

Caused by:
  process didn't exit successfully: `/Users/m1ci/actions-runner2/_work/roc/roc/target/release/build/roc_bitcode_bc-01dfdbf045b35be2/build-script-build` (exit status: 1)
  --- stdout
  cargo:rerun-if-changed=build.rs
  Compiling host ir to: /Users/m1ci/actions-runner2/_work/roc/roc/crates/compiler/builtins/bitcode/bc/../zig-out/builtins-host.ll
  Compiling 64-bit bitcode to: /Users/m1ci/actions-runner2/_work/roc/roc/crates/compiler/builtins/bitcode/bc/../zig-out/builtins-host.bc
  Compiling host ir to: /Users/m1ci/actions-runner2/_work/roc/roc/crates/compiler/builtins/bitcode/bc/../zig-out/builtins-wasm32.ll
  Compiling 64-bit bitcode to: /Users/m1ci/actions-runner2/_work/roc/roc/crates/compiler/builtins/bitcode/bc/../zig-out/builtins-wasm32.bc

  --- stderr
  An internal compiler expectation was broken.
  This is definitely a compiler bug.
  Please file an issue here: <https://github.com/roc-lang/roc/issues/new/choose>
  zig build ir-wasm32 -Drelease=true failed with:

    error: Unexpected


  Location: crates/compiler/builtins/bitcode/bc/build.rs:115:21

view this post on Zulip Anton (Jan 04 2025 at 10:48):

Jakub Konka said:

it takes a workflow over 1.5h to just compile roc

Does it get stuck, or does it actually finish in that time?

view this post on Zulip Anton (Jan 04 2025 at 10:50):

The bitcode/bc/build.rs issue is caused by multithreading but I don't know much more than that

view this post on Zulip Anton (Jan 04 2025 at 10:51):

Jakub Konka said:

Is this some caching issue

It's not due to a cache that we use for CI fyi, it just seems to be a problematic interaction between the rust and zig build processes

view this post on Zulip Anton (Jan 04 2025 at 10:53):

We used to have a bandaid solution for this problem but it stopped working since the llvm 18 zig 13 upgrade

view this post on Zulip Jakub Konka (Jan 04 2025 at 10:54):

Anton said:

We used to have a bandaid solution for this problem but it stopped working since the llvm 18 zig 13 upgrade

Interesting, and what was the bandaid solution?

view this post on Zulip Jakub Konka (Jan 04 2025 at 10:54):

Anton said:

it takes a workflow over 1.5h to just compile roc

Does it get stuck, or does it actually finish in that time?

I've seen it do both: finish and not finish in time

view this post on Zulip Anton (Jan 04 2025 at 10:59):

Jakub Konka said:

Interesting, and what was the bandaid solution?

Retry the failing command up to 10 times
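
A minimal sketch of what such a retry wrapper could look like, assuming the flaky-error substrings quoted earlier in this thread. The names `is_flaky`, `retry_flaky`, and `MAX_ATTEMPTS` are hypothetical and this is not the actual code in build.rs:

```rust
/// Substrings of the zig error output treated as flaky
/// (taken from the list quoted earlier in this thread).
const FLAKY_MARKERS: [&str; 4] = [
    "FileNotFound",
    "unable to save cached ZIR code",
    "LLVM failed to emit asm",
    "ir-wasm32 transitive failure",
];

/// Hypothetical cap, matching the "up to 10 times" mentioned above.
const MAX_ATTEMPTS: u32 = 10;

/// Returns true if the error text contains any known flaky marker.
fn is_flaky(error_str: &str) -> bool {
    FLAKY_MARKERS.iter().any(|marker| error_str.contains(marker))
}

/// Calls `attempt` until it succeeds, fails with a non-flaky error,
/// or has been tried MAX_ATTEMPTS times.
fn retry_flaky<T>(mut attempt: impl FnMut() -> Result<T, String>) -> Result<T, String> {
    let mut last_err = String::new();
    for _ in 0..MAX_ATTEMPTS {
        match attempt() {
            Ok(value) => return Ok(value),
            // Known flaky failure: remember it and try again.
            Err(err) if is_flaky(&err) => last_err = err,
            // Anything else is treated as a real failure and reported immediately.
            Err(err) => return Err(err),
        }
    }
    Err(last_err)
}
```

In a build script, the closure passed to `retry_flaky` would run the zig command and return its stderr as the `Err` value. Note that retrying only hides the underlying parallelism problem; it does not fix it.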

view this post on Zulip Jakub Konka (Jan 04 2025 at 11:05):

Ahh classic. FWIW I've seen a similar solution used in the past: https://github.com/rust-lang/rust/pull/40422/files Interestingly, last I checked this bit of code is still present in the rustc implementation.

view this post on Zulip Jakub Konka (Jan 04 2025 at 11:50):

Argh, looks like clippy is hanging on the M1 https://github.com/roc-lang/roc/actions/runs/12608939154/job/35144084934?pr=7455

view this post on Zulip Anton (Jan 06 2025 at 15:12):

I'm going to look at the flaky CI issues now

view this post on Zulip Anton (Jan 06 2025 at 17:48):

We've picked up a new issue with nix-linux-x86-64-tests:

 test cli_tests::test_platform_effects_zig::effectful_form has been running for over 60 seconds

Looking at it now

view this post on Zulip Anton (Jan 07 2025 at 13:42):

Fixed in PR#7475

view this post on Zulip Anton (Feb 03 2025 at 17:37):

I'm going to disconnect the macos x64 CI machine for easier debugging of a build failure.

view this post on Zulip Anton (Feb 03 2025 at 19:01):

Fixed :)

view this post on Zulip Anton (Feb 04 2025 at 13:16):

If you get the error below, update your branch with latest main:

Run zig version
/Users/m1ci/actions-runner2/_work/_temp/f59e70e6-faef-46ea-a406-0a88e0decf65.sh: line 1: zig: command not found
Error: Process completed with exit code 127.

view this post on Zulip Anton (Feb 07 2025 at 21:19):

The new CI workflow is on main :)
.github/workflows/ci_zig.yml is called if there are changes to the src folder, build.zig or build.zig.zon; modify the two lists here if you want to alter that change detection. The old workflows are not called if the changes are only to new compiler files.

If you want to add additional CI checks for the new compiler they can be added here.

I did a bunch of testing, but I may still have missed something, so feel free to mention me if you think something's off.
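
To illustrate the change detection described above, here is a small sketch of the decision logic only. The real detection lives in the workflow YAML; `is_new_compiler_path`, `Plan`, and `plan_ci` are hypothetical names used for illustration:

```rust
/// Paths that belong to the new compiler, per the description above:
/// the src folder, build.zig and build.zig.zon.
fn is_new_compiler_path(path: &str) -> bool {
    path.starts_with("src/") || path == "build.zig" || path == "build.zig.zon"
}

/// Which workflow groups should run for a set of changed files.
#[derive(Debug, PartialEq)]
struct Plan {
    run_zig_ci: bool,
    run_old_ci: bool,
}

/// The zig workflow runs if any new-compiler file changed;
/// the old workflows run if any other file changed.
fn plan_ci(changed: &[&str]) -> Plan {
    Plan {
        run_zig_ci: changed.iter().any(|p| is_new_compiler_path(p)),
        run_old_ci: changed.iter().any(|p| !is_new_compiler_path(p)),
    }
}
```

Under this sketch, a PR touching only `src/` files skips the old workflows, while a PR that also edits any file outside the new compiler triggers them as well.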

view this post on Zulip Brendan Hansknecht (Feb 07 2025 at 21:36):

As a note, I think you can just zig build test instead of doing any sort of direct exe running or oa based checks

view this post on Zulip Anton (Feb 08 2025 at 12:55):

I'm going to disconnect macos x64 CI for investigation again, it's hitting the same issue as before. If your changes are limited to the new compiler files, CI should still be able to complete.

view this post on Zulip Anton (Feb 08 2025 at 18:24):

macos x64 CI is back up but I'm still trying workarounds

view this post on Zulip Anton (Feb 24 2025 at 16:16):

Going to do some CI maintenance, this should not affect zig compiler workflows, those all use github CI machines

view this post on Zulip Anton (Feb 24 2025 at 16:37):

Done

view this post on Zulip Anton (Mar 04 2025 at 17:27):

Looks like a recent github runner image update broke something on our windows-2022 tests, I'm looking at it now

view this post on Zulip Anton (Mar 04 2025 at 18:35):

It's only on my PR but it doesn't make any sense given the changes :thinking:

view this post on Zulip Brendan Hansknecht (Mar 04 2025 at 18:40):

Rust or zig? Also, have a link?

view this post on Zulip Anton (Mar 04 2025 at 18:41):

zig
https://github.com/roc-lang/roc/actions/runs/13658753462/job/38188514556

view this post on Zulip Anton (Mar 04 2025 at 18:43):

It may be due to random CI machines, some that have the update and some that don't

view this post on Zulip Anton (Mar 04 2025 at 18:46):

No, they're the same version :sweat_smile:
I'm just going to open a new PR, this one is haunted

view this post on Zulip Brendan Hansknecht (Mar 04 2025 at 18:53):

Best I can guess is zig cache issue and corrupted download. Worst case, try adding:

rm -rf .zig-cache zig-out $ZIG_LOCAL_CACHE_DIR

...
Actually, might need to nuke the global cache.

view this post on Zulip Brendan Hansknecht (Mar 04 2025 at 18:55):

Not sure it would make a difference, but I know that the mlugg zig cache thing sets some cache folders (I assume it does this cause it saves the folders).

view this post on Zulip Anton (Mar 04 2025 at 18:56):

Uhu, that could be a likely cause, it passed in the new PR, so I think we're good now but we know where to look if we see it again

view this post on Zulip Anton (Mar 10 2025 at 19:11):

CI is now calling old rust workflows when it doesn't need to :(
I'll try to fix it tomorrow.

view this post on Zulip Brendan Hansknecht (Mar 11 2025 at 05:13):

If you are referring to this PR specifically, there is a zig file from the old compiler that got reformatted: https://github.com/roc-lang/roc/pull/7672

view this post on Zulip Brendan Hansknecht (Mar 11 2025 at 05:13):

So CI may be working as expected.

view this post on Zulip Luke Boswell (Mar 11 2025 at 05:19):

It's not just that PR. I've had to force merge a couple today that shouldn't have run the old workflows

view this post on Zulip Luke Boswell (Mar 11 2025 at 05:20):

I wanted the changes in main so I could merge for my PR and avoid more conflicts, so I didn't wait for Anton to look at it.

view this post on Zulip Brendan Hansknecht (Mar 11 2025 at 05:31):

more-sexprs ran full ci due to editing .gitignore

view this post on Zulip Brendan Hansknecht (Mar 11 2025 at 05:33):

improve-zig-comments looks like it may have been a path filters bug/issue...

view this post on Zulip Brendan Hansknecht (Mar 11 2025 at 05:36):

Oh, found it: https://github.com/roc-lang/roc/pull/7683

view this post on Zulip Brendan Hansknecht (Mar 11 2025 at 05:36):

it removed the predicate-quantifier

view this post on Zulip Brendan Hansknecht (Mar 11 2025 at 05:40):

Fix: https://github.com/roc-lang/roc/pull/7685

view this post on Zulip Brendan Hansknecht (Mar 11 2025 at 05:43):

Aside, we may want to go through and explicitly ignore CI on certain files/folders like .gitignore. Would just take expanding the !file list in that filter.

view this post on Zulip Anton (Mar 11 2025 at 09:03):

Brendan Hansknecht said:

Fix: https://github.com/roc-lang/roc/pull/7685

Yeah, I removed it because I spotted a warning in CI saying that predicate-quantifier was not a valid option, but now I no longer see it :shrug:

view this post on Zulip Brendan Hansknecht (Mar 11 2025 at 15:42):

Yeah, the warning was a bug that got fixed last week

view this post on Zulip Brendan Hansknecht (Mar 11 2025 at 15:42):

They added the feature, but forgot to document it, so GitHub didn't know it existed

view this post on Zulip Anton (Mar 11 2025 at 15:43):

Oh, that makes sense :)

view this post on Zulip Anton (Mar 25 2025 at 15:46):

I may have finally found a workaround for the nix apple silicon workflow failures (old compiler), it ran successfully 3 times in a row :)


Last updated: Jul 05 2025 at 12:14 UTC