ci: cancel in-progress jobs immediately when a sibling fails #5732
mythical-fred
left a comment
LGTM. Sentinel pattern is the right approach here — GitHub Actions has no native "cancel siblings on failure" setting, so per-job cancel sentinels with if: failure() is the standard workaround. The YAML duplication is unfortunately unavoidable. -fsSL on the curl is fine; if the workflow is already being cancelled a second sentinel would just get a 409 and exit non-zero, which doesn't matter since it's being torn down anyway.
gz
left a comment
OK, let's make sure we test it in the ci-test repo first before we merge.
      environment: ci
    secrets: inherit

  # GitHub doesn't cancel in-progress sibling jobs when one job fails — they keep running
looks good
minor nit: is it somehow possible to write some "function" in GitHub that does this, e.g. so that for every job we define above, like
invoke-publish-crates-dry-run:
  name: Publish Crates (Dry Run)
  needs: [invoke-build-rust]
  uses: ./.github/workflows/publish-crates.yml
  with:
    environment: ci
  secrets: inherit
it generates the necessary abort task/step?
(If this isn't possible we can maybe at least add a comment at the top where the real invoke-* things are to make sure we always add a cancel job in case we need to add a new job there)
I was wondering whether some of this might work?
https://docs.github.com/en/actions/tutorials/create-actions/create-a-composite-action
Good point! Composite actions only work for steps inside a job, not for generating job definitions themselves.
So following your suggestion, I added a comment above the invoke-* jobs to remind anyone adding a new job to also add a matching sentinel.
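For illustration, such a reminder could sit directly above the first invoke-* job, roughly as sketched below. The wording of the comment is hypothetical; the actual text added in the PR is not shown in this conversation. The job body is the one quoted in the review comment above.

```yaml
jobs:
  # REMINDER (hypothetical wording): every invoke-* job below needs a matching
  # cancel sentinel job. When you add a new invoke-* job, add its sentinel too,
  # otherwise a failure in the new job will not cancel its siblings.
  invoke-publish-crates-dry-run:
    name: Publish Crates (Dry Run)
    needs: [invoke-build-rust]
    uses: ./.github/workflows/publish-crates.yml
    with:
      environment: ci
    secrets: inherit
```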
c99252e to 501c028
GitHub does not automatically abort running parallel jobs when one fails — they keep running until completion, which wastes runner time on long builds and tests. Add a cancel sentinel job for each workflow job. Each sentinel depends only on its paired job (`needs: [invoke-X]`) and runs `if: failure()`, so it starts the moment that job fails. It then calls the GitHub API to cancel the entire workflow run, stopping all other in-progress jobs. Fixes feldera#5534
501c028 to 1dc301e
GitHub does not automatically abort running parallel jobs when one fails; they keep running until completion, which wastes runner time on long builds and tests.
Add a cancel sentinel job for each workflow job. Each sentinel depends only on its paired job ('needs: [invoke-X]') and runs 'if: failure()', so it starts the moment that job fails. It then calls the GitHub API to cancel the entire workflow run, stopping all other in-progress jobs.
Fixes #5534
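A minimal sketch of one such sentinel, assuming it is paired with the invoke-publish-crates-dry-run job quoted in the review. The sentinel's job name, the `ubuntu-latest` runner, and the exact curl headers are assumptions for illustration, not the PR's actual YAML. The request targets GitHub's "cancel a workflow run" REST endpoint, which requires the token to have `actions: write` permission.

```yaml
jobs:
  # Hypothetical sentinel paired with invoke-publish-crates-dry-run.
  cancel-on-publish-crates-dry-run-failure:
    name: Cancel run (Publish Crates dry run failed)
    # Depends only on the paired job, so it is evaluated as soon as that job ends.
    needs: [invoke-publish-crates-dry-run]
    # Runs only when the paired job failed; otherwise the sentinel is skipped.
    if: failure()
    runs-on: ubuntu-latest  # assumption: any small hosted runner works
    permissions:
      actions: write  # required to cancel workflow runs via the API
    steps:
      - name: Cancel the whole workflow run
        env:
          GH_TOKEN: ${{ github.token }}
        run: |
          # POST /repos/{owner}/{repo}/actions/runs/{run_id}/cancel
          curl -fsSL -X POST \
            -H "Authorization: Bearer ${GH_TOKEN}" \
            -H "Accept: application/vnd.github+json" \
            "${{ github.api_url }}/repos/${{ github.repository }}/actions/runs/${{ github.run_id }}/cancel"
```

As noted in the review, if the run is already being cancelled when a second sentinel fires, the request returns a 409 and `-f` makes curl exit non-zero. That is harmless because the run is being torn down anyway.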