
ci: cancel in-progress jobs immediately when a sibling fails #5732

Merged
jyotshnayaparla-00 merged 2 commits into feldera:main from
jyotshnayaparla-00:ci-fast-fail-on-job-failure
Mar 2, 2026

Conversation

@jyotshnayaparla-00
Contributor

GitHub does not automatically abort running parallel jobs when one fails; they keep running until completion, which wastes runner time on long builds and tests.

Add a cancel sentinel job for each workflow job. Each sentinel depends only on its paired job (`needs: [invoke-X]`) and runs `if: failure()`, so it starts the moment that job fails. It then calls the GitHub API to cancel the entire workflow run, stopping all other in-progress jobs.

Fixes #5534
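
For readers who have not seen the pattern, here is a minimal sketch of one sentinel as described above. It is not the merged workflow: the sentinel job name, the `runs-on` value, and the exact curl headers are assumptions, and the paired `invoke-publish-crates-dry-run` job is assumed to be defined elsewhere in the same file.

```yaml
jobs:
  # Hypothetical sentinel paired with invoke-publish-crates-dry-run,
  # which is assumed to be defined elsewhere under `jobs:`.
  cancel-on-publish-crates-dry-run-failure:
    needs: [invoke-publish-crates-dry-run]
    # Run only when the paired job (or one of its ancestors) has failed.
    if: failure()
    runs-on: ubuntu-latest
    permissions:
      actions: write   # the cancel endpoint needs write access to Actions
    steps:
      - name: Cancel the whole workflow run
        env:
          GH_TOKEN: ${{ github.token }}
        run: |
          # POST to the REST endpoint that cancels the current run.
          curl -fsSL -X POST \
            -H "Authorization: Bearer ${GH_TOKEN}" \
            -H "Accept: application/vnd.github+json" \
            "${GITHUB_API_URL}/repos/${GITHUB_REPOSITORY}/actions/runs/${GITHUB_RUN_ID}/cancel"
```

`GITHUB_API_URL`, `GITHUB_REPOSITORY`, and `GITHUB_RUN_ID` are default environment variables on GitHub-hosted runners, so the step needs no repository-specific values.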

@jyotshnayaparla-00 jyotshnayaparla-00 requested a review from gz March 2, 2026 18:40
Collaborator

@mythical-fred mythical-fred left a comment

LGTM. The sentinel pattern is the right approach here: GitHub Actions has no native "cancel siblings on failure" setting, so per-job cancel sentinels with `if: failure()` are the standard workaround. The YAML duplication is unfortunately unavoidable. `-fsSL` on the curl is fine; if the workflow is already being cancelled, a second sentinel would just get a 409 and exit non-zero, which doesn't matter since the run is being torn down anyway.
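
If the non-zero exit from a second sentinel ever becomes noisy, one possible variation is to swallow the curl failure. This is only a sketch of the sentinel's `steps:` list, not something the PR does; the step name and message are made up for illustration.

```yaml
steps:
  - name: Cancel the whole workflow run (tolerate an already-cancelled run)
    env:
      GH_TOKEN: ${{ github.token }}
    run: |
      # -f makes curl exit non-zero on HTTP errors such as 409;
      # the fallback keeps the step from failing in that case.
      curl -fsSL -X POST \
        -H "Authorization: Bearer ${GH_TOKEN}" \
        -H "Accept: application/vnd.github+json" \
        "${GITHUB_API_URL}/repos/${GITHUB_REPOSITORY}/actions/runs/${GITHUB_RUN_ID}/cancel" \
        || echo "Cancel request failed; the run may already be cancelling."
```

The trade-off is that this also hides genuine errors such as a missing `actions: write` permission, so leaving the plain `-fsSL` call as is remains a reasonable choice.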

Contributor

@gz gz left a comment

OK, let's make sure we test it in the ci-test repo first before we merge.

environment: ci
secrets: inherit

# GitHub doesn't cancel in-progress sibling jobs when one job fails — they keep running
Contributor

@gz gz Mar 2, 2026

looks good

minor nit: is it somehow possible to write some "function" in GitHub that does this? E.g., for every job we define above, like


  invoke-publish-crates-dry-run:
    name: Publish Crates (Dry Run)
    needs: [invoke-build-rust]
    uses: ./.github/workflows/publish-crates.yml
    with:
      environment: ci
    secrets: inherit

it generates the necessary abort task/step?

(If this isn't possible we can maybe at least add a comment at the top where the real invoke-* things are to make sure we always add a cancel job in case we need to add a new job there)

Contributor Author

Good point! Composite actions only work for steps inside a job, not for generating job definitions themselves.

So following your suggestion, I added a comment above the invoke-* jobs to remind anyone adding a new job to also add a matching sentinel.
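
As a rough illustration of that reminder, the comment could sit directly above the job definitions like this. The wording below is hypothetical and not copied from the merged change; only the job definition itself comes from the review thread.

```yaml
jobs:
  # REMINDER (hypothetical wording): every invoke-* job needs a matching
  # cancel sentinel with `needs: [invoke-<job>]` and `if: failure()`.
  # When adding a new invoke-* job here, add its sentinel as well.
  invoke-publish-crates-dry-run:
    name: Publish Crates (Dry Run)
    needs: [invoke-build-rust]
    uses: ./.github/workflows/publish-crates.yml
    with:
      environment: ci
    secrets: inherit
```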

@jyotshnayaparla-00 jyotshnayaparla-00 force-pushed the ci-fast-fail-on-job-failure branch from c99252e to 501c028 Compare March 2, 2026 19:39
GitHub does not automatically abort running parallel jobs when one
fails — they keep running until completion, which wastes runner time
on long builds and tests.

Add a cancel sentinel job for each workflow job. Each sentinel depends
only on its paired job (`needs: [invoke-X]`) and runs `if: failure()`,
so it starts the moment that job fails. It then calls the GitHub API to
cancel the entire workflow run, stopping all other in-progress jobs.

Fixes feldera#5534
@jyotshnayaparla-00 jyotshnayaparla-00 force-pushed the ci-fast-fail-on-job-failure branch from 501c028 to 1dc301e Compare March 2, 2026 19:45
@jyotshnayaparla-00 jyotshnayaparla-00 added this pull request to the merge queue Mar 2, 2026
Merged via the queue into feldera:main with commit 2f1299e Mar 2, 2026
3 checks passed
@jyotshnayaparla-00 jyotshnayaparla-00 deleted the ci-fast-fail-on-job-failure branch March 2, 2026 22:23

Development

Successfully merging this pull request may close these issues.

Aborting the CI run should abort all outstanding subtasks

3 participants