libstore: split DerivationBuilder::unprepareBuild to fix flock self-deadlock (#15846)#15859
Open
RazYang wants to merge 1 commit into
Open
libstore: split DerivationBuilder::unprepareBuild to fix flock self-deadlock (#15846)#15859RazYang wants to merge 1 commit into
RazYang wants to merge 1 commit into
Conversation
…eadlock Fixes NixOS#15846. `PathLocks::lockPaths` uses `flock(2)`, whose locks are associated with the open file description rather than the process. When a goal running `registerOutputs` tries to flock the actual output path while a sibling goal in the same daemon already holds that path's lock via a different open file description, the flock blocks even though both file descriptors live in the same process. Because `registerOutputs` runs synchronously from the build goal coroutine, that flock freezes the daemon's worker thread; the sibling goal can no longer make progress (its builder's stdout pipe fills and the build blocks on write), and the daemon deadlocks. The reporter's `lsof` shows the same `-source.lock` file open on two file descriptors of one process. Split `DerivationBuilder` output registration into two phases: * `unprepareBuild()` tears down the build environment, validates the builder exit status, canonicalises the scratch outputs and computes their content-addressed paths. It returns the set of output store paths that the per-derivation pre-build locks do not already cover (floating CA outputs, and fixed-output derivations whose actual hash differs from the declared one). It no longer moves outputs. * `registerOutputs()` moves the prepared outputs into the store, registers them as valid, and applies output checks. The caller is responsible for holding `PathLocks` on the paths returned by `unprepareBuild()`. `DerivationBuildingGoal::buildLocally` acquires the dynamic output locks between the two calls using the same try-lock + `co_await waitForAWhile()` retry pattern already used for the pre-build locks in `acquireResources`. The worker thread yields the coroutine instead of blocking on flock, so any sibling goal holding the lock can progress and release it.
Author
|
@xokdvium — would appreciate your eyes on this if you have a moment, given the libstore/coroutine angle. Thanks! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
Fixes #15846.
PathLocks::lockPathsusesflock(2), whose locks are associated with theopen file description rather than the process. When a goal running
registerOutputstries to flock the actual output path while a siblinggoal in the same daemon already holds that path's lock via a different
open file description, the flock blocks even though both file descriptors
live in the same process. Because
registerOutputsruns synchronouslyfrom the build goal coroutine, that flock freezes the daemon's worker
thread; the sibling goal can no longer make progress (its builder's
stdout pipe fills and the build blocks on write), and the daemon
deadlocks. The reporter's
lsofshows the same-source.lockfile openon two file descriptors of one process — the smoking gun for this
pattern.
Approach
Split
DerivationBuilderoutput registration into two phases so thedynamic output locks can be acquired from the goal coroutine, where a
non-blocking try-lock can yield instead of blocking the worker:
unprepareBuild()tears down the build environment, validates thebuilder exit status, canonicalises the scratch outputs and computes
their content-addressed paths. It now returns the set of output store
paths that the per-derivation pre-build locks do not already cover
(floating CA outputs, and fixed-output derivations whose actual hash
differs from the declared one). It no longer moves outputs.
registerOutputs()moves the prepared outputs into the store,registers them as valid, and applies output checks. The caller is
responsible for holding
PathLockson the paths returned byunprepareBuild().DerivationBuildingGoal::buildLocallyacquires the dynamic output locksbetween the two calls using the same try-lock +
co_await waitForAWhile()retry pattern already used for the pre-build locks inacquireResources. The worker thread yields the coroutine instead ofblocking on flock, so any sibling goal holding the lock can progress and
release it.
The set of paths locked, and the order in which they are locked, is
unchanged versus the pre-existing code; only the call site (and thus
the blocking discipline) moves.
Notes on behavioural preservation
The
finish()call that updatesoutputRewritesfor subsequentiterations now runs at the end of phase 1 (before the move) instead of
at the end of phase 2 (after the move). The update is guarded with
if (buildMode != bmCheck)so that bmCheck does not callfinish()for newly-built outputs, matching the prior behaviour.
Per-output state that previously lived in the third loop's stack frame
(
newInfo,actualPath,references,tempDirFd,delTempDir) isstashed on the builder as
pendingOutputs. The RAII guards arecleared after the move loop in phase 2 (success) or by the
DerivationBuilderdestructor (failure paths).ChrootLinuxDerivationBuilder::unprepareBuildis updated to the newsignature; its namespace-reset behaviour is preserved.
I deliberately avoided fcntl OFD locks: they have the same per-OFD
semantics as
flockand would not fix the in-process collision.Plain POSIX
F_SETLKis process-wide but has well-known "close any fddrops all locks" semantics that interact badly with how
PathLocksmanages file descriptors. Moving the lock acquisition into the
coroutine seemed like the smallest principled fix.
Test plan
nix buildon the patched tree builds clean. The full test suitederivations (
nix-store-tests-run,nix-expr-tests-run,nix-flake-tests-run,nix-fetchers-tests-run,nix-functional-tests-...) all build green.nix build -j3 'github:RazYang/nix_deadlock') no longer hangs. Three fetchers whosedeclared and actual hashes disagree now serialise at the dynamic-lock
stage instead of deadlocking the worker thread.
finish()re-timing in phase 1 vs phase 2 (derivation-builder.ccin the third loop). I believe the bmCheck guard preserves the prior
semantics exactly, but a second opinion would be welcome.
DerivationBuilderinterface change(
unprepareBuild()return type, newregisterOutputs()) isacceptable, or whether you'd prefer a non-breaking internal
refactor with a wrapper. Only
ChrootLinuxDerivationBuilderoverrides this in-tree.
Known follow-up
A pre-existing benign
error (ignored): cannot rename: Permission deniedfrom
ChrootDerivationBuilder::cleanupBuild's rescue-rename loop becomesvisible after this fix (it was previously masked by the deadlock — the
build never reached the post-build cleanup path). The rescue path uses an
initialOutputssnapshot of validity rather than querying the store, soon a successful build it still tries to rename the newly-built outputs
over their (now-registered) destinations. The build result is unaffected
and the error is logged with the
(ignored)prefix. Happy to address itas a follow-up commit on this PR or as a separate PR — let me know which
you'd prefer.
Disclosure
Parts of this change were drafted with the assistance of an LLM (Anthropic
Claude, via Claude Code). The LLM helped with: navigating the existing
registerOutputsbody to find a clean split point, drafting thetwo-phase API, and explaining the
flock(2)per-OFD semantics thatcaused the original deadlock.
All architectural decisions, the choice between candidate fixes,
behavioural-preservation reasoning, build/test verification, and the
final code were reviewed and verified by me. The reproducer is mine; the
backtrace and
lsofevidence in the linked issue are real and werecaptured before this work began. I am responsible for the correctness of
the patch.