gc: add CollectResult, stats fields, get_referrers, and fix count reset#7354
gc: add CollectResult, stats fields, get_referrers, and fix count reset#7354youknowone wants to merge 6 commits intoRustPython:mainfrom
Conversation
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughIntroduces a public Changes
Sequence Diagram(s)sequenceDiagram
participant Py as Python (gc module)
participant VM as VirtualMachine
participant GC as GcState
participant Heap as Object Heap
Py->>VM: gc.collect_force(generation)
VM->>GC: collect_force(generation)
GC->>Heap: scan generations / mark & sweep (candidates)
GC-->>GC: measure duration (start / stop)
GC->>VM: return CollectResult{collected, uncollectable, candidates, duration}
VM->>Py: invoke_callbacks(phase, generation, &CollectResult)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
📦 Library DependenciesThe following Lib/ modules were modified. Here are their dependencies: [ ] test: cpython/Lib/test/test_gc.py (TODO: 2) dependencies: dependent tests: (167 tests)
Legend:
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
crates/vm/src/stdlib/gc.rs (1)
191-202: Short-circuitget_referrers()when no targets are provided.At Line 201, this still scans all tracked objects when
argsis empty, but the result must be empty. An early return avoids an unnecessary heap-wide scan.Suggested patch
let targets: HashSet<usize> = args .args .iter() .map(|obj| obj.as_ref() as *const crate::PyObject as usize) .collect(); +if targets.is_empty() { + return vm.ctx.new_list(Vec::<PyObjectRef>::new()); +} + let mut result = Vec::new();🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/gc.rs` around lines 191 - 202, get_referrers currently builds targets and then scans all tracked objects even when args.args is empty; short-circuit by returning an empty Vec immediately when args.args is empty (or when targets.is_empty()) before calling gc_state::gc_state().get_objects to avoid the heap-wide scan—update the get_referrers function to check args (or targets) and return early.crates/vm/src/gc_state.rs (1)
423-430: Extract the repeated “finish collection” block into one shared path.The branches at Line 423, Line 536, and Line 556 repeat the same duration calculation, stats update, and
CollectResultconstruction. Consolidating this will reduce maintenance drift.As per coding guidelines "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code".
Also applies to: 536-543, 556-563
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/gc_state.rs` around lines 423 - 430, Branches duplicate the "finish collection" sequence (compute duration via start_time.elapsed().as_secs_f64(), call self.generations[generation].update_stats(...), and return a CollectResult) in gc_state.rs; refactor by first assigning the differing numeric results (collected, uncollectable, candidates) into local variables in each branch, then move the common logic that computes duration, calls self.generations[generation].update_stats, and constructs/returns the CollectResult into a single shared path at the end of the function (referencing start_time, generation, self.generations, update_stats, and CollectResult) so the repeated block is centralized.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/vm/src/gc_state.rs`:
- Around line 420-422: The loop in gc_state.rs that zeroes generation counters
(the for i in 0..=generation { self.generations[i].count.store(0, ...) } block)
can corrupt counts during a full GC because promote_survivors can exit early
leaving live objects in lists while counters are zeroed; update this by guarding
zeroing so it does not run for the full-collection generation (e.g., only zero
for non-full collections or when generation != FULL_GEN/2), or alternatively
recompute counts from the actual generation lists when generation ==
full-getting-value; apply the same fix to the other reset sites noted (the
similar zeroing at the blocks referenced and ensure untrack_object's fetch_sub
won't underflow by preserving correct counters).
---
Nitpick comments:
In `@crates/vm/src/gc_state.rs`:
- Around line 423-430: Branches duplicate the "finish collection" sequence
(compute duration via start_time.elapsed().as_secs_f64(), call
self.generations[generation].update_stats(...), and return a CollectResult) in
gc_state.rs; refactor by first assigning the differing numeric results
(collected, uncollectable, candidates) into local variables in each branch, then
move the common logic that computes duration, calls
self.generations[generation].update_stats, and constructs/returns the
CollectResult into a single shared path at the end of the function (referencing
start_time, generation, self.generations, update_stats, and CollectResult) so
the repeated block is centralized.
In `@crates/vm/src/stdlib/gc.rs`:
- Around line 191-202: get_referrers currently builds targets and then scans all
tracked objects even when args.args is empty; short-circuit by returning an
empty Vec immediately when args.args is empty (or when targets.is_empty())
before calling gc_state::gc_state().get_objects to avoid the heap-wide
scan—update the get_referrers function to check args (or targets) and return
early.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 653e5adb-1fc3-44e1-a8b5-9ab538cc9236
⛔ Files ignored due to path filters (1)
Lib/test/test_gc.pyis excluded by!Lib/**
📒 Files selected for processing (2)
crates/vm/src/gc_state.rscrates/vm/src/stdlib/gc.rs
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/vm/src/frame.rs`:
- Around line 8097-8100: specialization_eval_frame_active currently always
returns false, disabling eval-frame guards; change it to consult the VM
tracing/profiling state so specializations are suppressed when tracing is
active. Replace the constant false with a check that returns false when
tracing/profile hooks are enabled — e.g. use vm.use_tracing.get() (and, if
available in the VM API, also check for frame trace callbacks such as
vm.has_frame_trace_callbacks() or vm.tracefunc presence) and only return true
when tracing/profile hooks are inactive. Update the body of
specialization_eval_frame_active(&self, vm: &VirtualMachine) to use these checks
so fast paths (SendGen, ForIterGen, specialized call variants) are disabled
during active tracing/profiling.
In `@crates/vm/src/gc_state.rs`:
- Around line 536-538: The reset loop currently iterates `for i in
0..generation` which differs from other reset paths; change it to compute
`reset_end = if generation >= 2 { 2 } else { generation + 1 }` and iterate `for
i in 0..reset_end` so `self.generations[i].count.store(0, Ordering::SeqCst)`
clears the same set of generations as the other code paths (referencing the
`generation` variable and the `self.generations[..].count.store` calls).
- Around line 556-558: Replace the manual for-loop that resets generation counts
with the same implementation used earlier to avoid the inconsistent reset
behavior: iterate over self.generations up to generation (e.g.,
self.generations.iter().take(generation)) and reset each generation's count
using the same store/order semantics used in the previous fix (match the exact
Ordering and method used there) so that the reset logic for self.generations and
count.store mirrors the earlier corrected code path.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 52975f19-5596-4763-b703-c4a1be226b57
📒 Files selected for processing (3)
crates/vm/src/coroutine.rscrates/vm/src/frame.rscrates/vm/src/gc_state.rs
fad1b25 to
36e912b
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (1)
crates/vm/src/gc_state.rs (1)
527-529:⚠️ Potential issue | 🟠 MajorUnify generation counter reset bounds in these early-return paths.
Line 527 and Line 547 still use
for i in 0..generation, while other return paths (Line 416 and Line 693) use thereset_endlogic. This makes post-collection counter resets path-dependent and can skewgc.get_count()behavior after successful collections with no unreachable objects.Proposed fix
- for i in 0..generation { + let reset_end = if generation >= 2 { 2 } else { generation + 1 }; + for i in 0..reset_end { self.generations[i].count.store(0, Ordering::SeqCst); }Apply the same change in both blocks (Line 527-529 and Line 547-549).
As per coding guidelines, when branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code.
Also applies to: 547-549
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/gc_state.rs` around lines 527 - 529, The two early-return blocks reset per-generation counters using for i in 0..generation which is inconsistent with other paths that use the shared reset_end logic; replace these loops to compute the same reset_end value used elsewhere and then run the common reset loop that sets self.generations[i].count.store(0, Ordering::SeqCst). Concretely, in the blocks containing for i in 0..generation (the occurrences around self.generations[i].count.store...), compute reset_end the same way other branches do (matching the logic used at lines that currently use reset_end) and then call the shared loop that iterates 0..reset_end to zero the counters so all return paths use the identical bounds and logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@crates/vm/src/gc_state.rs`:
- Around line 527-529: The two early-return blocks reset per-generation counters
using for i in 0..generation which is inconsistent with other paths that use the
shared reset_end logic; replace these loops to compute the same reset_end value
used elsewhere and then run the common reset loop that sets
self.generations[i].count.store(0, Ordering::SeqCst). Concretely, in the blocks
containing for i in 0..generation (the occurrences around
self.generations[i].count.store...), compute reset_end the same way other
branches do (matching the logic used at lines that currently use reset_end) and
then call the shared loop that iterates 0..reset_end to zero the counters so all
return paths use the identical bounds and logic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: d3c56248-f6a8-4a1f-a6d9-4ce2e4718315
⛔ Files ignored due to path filters (1)
Lib/test/test_gc.pyis excluded by!Lib/**
📒 Files selected for processing (2)
crates/vm/src/gc_state.rscrates/vm/src/stdlib/gc.rs
c995570 to
3b7879e
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (2)
crates/vm/src/gc_state.rs (1)
526-528:⚠️ Potential issue | 🟠 MajorFix generation count reset inconsistency in early-return paths.
At Line 526 and Line 546,
for i in 0..generationunder-resets counts (e.g., generation 0 resets nothing; generation 1 skips gen1). This diverges from thereset_endlogic at Line 692 and can leave stale counts.Suggested patch
if unreachable.is_empty() { drop(gen_locks); self.promote_survivors(generation, &survivor_refs); - for i in 0..generation { + let reset_end = if generation >= 2 { 2 } else { generation + 1 }; + for i in 0..reset_end { self.generations[i].count.store(0, Ordering::SeqCst); } let duration = start_time.elapsed().as_secs_f64(); self.generations[generation].update_stats(0, 0, candidates, duration); return CollectResult { @@ if unreachable_refs.is_empty() { self.promote_survivors(generation, &survivor_refs); - for i in 0..generation { + let reset_end = if generation >= 2 { 2 } else { generation + 1 }; + for i in 0..reset_end { self.generations[i].count.store(0, Ordering::SeqCst); } let duration = start_time.elapsed().as_secs_f64(); self.generations[generation].update_stats(0, 0, candidates, duration); return CollectResult {As per coding guidelines, "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code."
Also applies to: 546-548
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/gc_state.rs` around lines 526 - 528, The loop that zeros generation counts uses 0..generation which omits the current generation (e.g., generation==1 skips generations[1]); change the range to include the current generation (use 0..=generation or equivalent) so counts for generations[0..=generation] are cleared; apply the same fix at the other occurrence (the second 0..generation block) and refactor by computing the inclusive end index once and calling the shared reset logic on self.generations[..=end].iter().for_each(|g| g.count.store(0, Ordering::SeqCst)) to avoid duplicated code and match the reset_end behavior.crates/vm/src/frame.rs (1)
8171-8173:⚠️ Potential issue | 🟠 MajorImplement
specialization_eval_frame_activeinstead of returning a constant.Line 8172 hardcodes
false, so all!self.specialization_eval_frame_active(vm)guards are effectively always enabled, even when frame-eval/tracing-sensitive paths should fall back.Suggested fix
- fn specialization_eval_frame_active(&self, _vm: &VirtualMachine) -> bool { - false + fn specialization_eval_frame_active(&self, vm: &VirtualMachine) -> bool { + vm.use_tracing.get() && !vm.is_none(&self.object.trace.lock()) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/frame.rs` around lines 8171 - 8173, The method specialization_eval_frame_active currently returns a hardcoded false, which disables correct guards (see uses like !self.specialization_eval_frame_active(vm)); replace the constant with the real runtime check: consult the frame's state and the VM tracing/eval flags (use the appropriate field or VM accessor that indicates frame-eval/tracing is active) and return that boolean so callers correctly detect when frame-eval/tracing-sensitive paths must fall back; update specialization_eval_frame_active(&self, _vm: &VirtualMachine) to read and return that real condition rather than false.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@crates/vm/src/frame.rs`:
- Around line 8171-8173: The method specialization_eval_frame_active currently
returns a hardcoded false, which disables correct guards (see uses like
!self.specialization_eval_frame_active(vm)); replace the constant with the real
runtime check: consult the frame's state and the VM tracing/eval flags (use the
appropriate field or VM accessor that indicates frame-eval/tracing is active)
and return that boolean so callers correctly detect when
frame-eval/tracing-sensitive paths must fall back; update
specialization_eval_frame_active(&self, _vm: &VirtualMachine) to read and return
that real condition rather than false.
In `@crates/vm/src/gc_state.rs`:
- Around line 526-528: The loop that zeros generation counts uses 0..generation
which omits the current generation (e.g., generation==1 skips generations[1]);
change the range to include the current generation (use 0..=generation or
equivalent) so counts for generations[0..=generation] are cleared; apply the
same fix at the other occurrence (the second 0..generation block) and refactor
by computing the inclusive end index once and calling the shared reset logic on
self.generations[..=end].iter().for_each(|g| g.count.store(0, Ordering::SeqCst))
to avoid duplicated code and match the reset_end behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: f1a87e25-424d-47a0-aa70-390e46364f68
📒 Files selected for processing (4)
crates/vm/src/coroutine.rscrates/vm/src/frame.rscrates/vm/src/gc_state.rscrates/vm/src/stdlib/gc.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/vm/src/stdlib/gc.rs
06db3b4 to
ee0ffda
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (2)
crates/vm/src/gc_state.rs (1)
526-528:⚠️ Potential issue | 🟡 MinorNormalize generation-count reset in these early-return branches.
At Line 526 and Line 546,
for i in 0..generationdiverges from the rest of the function’s reset behavior and can skip resetting generation 0/1 counts in paths where other branches do reset them.♻️ Proposed fix
- for i in 0..generation { + let reset_end = if generation >= 2 { 2 } else { generation + 1 }; + for i in 0..reset_end { self.generations[i].count.store(0, Ordering::SeqCst); } @@ - for i in 0..generation { + let reset_end = if generation >= 2 { 2 } else { generation + 1 }; + for i in 0..reset_end { self.generations[i].count.store(0, Ordering::SeqCst); }As per coding guidelines, "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code".
Also applies to: 546-548
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/gc_state.rs` around lines 526 - 528, The early-return branches use for i in 0..generation which differs from other branches and can omit resetting generation 0/1; change this to use a single, consistent reset path: compute the target generation bound once (e.g. let end = generation) and then reset counts for all generations up through that bound (use 0..=end or a helper like self.reset_generation_counts(end) that calls self.generations[i].count.store(0, Ordering::SeqCst) for each i), and replace the duplicate loops at the spots using generation (the blocks that call self.generations[i].count.store) with that single common call.crates/vm/src/frame.rs (1)
8312-8315:⚠️ Potential issue | 🟠 Major
specialization_eval_frame_active()is still stubbed, so all eval-frame/tracing guards are effectively disabled.Line 8314 always returns
false, which means the new guard checks at Line 3156, Line 3195, Line 5355, Line 7709, and Line 8029 never block specialized execution. This can break tracing/profiling semantics.Suggested fix
- fn specialization_eval_frame_active(&self, _vm: &VirtualMachine) -> bool { - false + fn specialization_eval_frame_active(&self, vm: &VirtualMachine) -> bool { + vm.use_tracing.get() }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/frame.rs` around lines 8312 - 8315, specialization_eval_frame_active currently always returns false, disabling all eval-frame/tracing guards; implement it to consult the VM's tracing/profiling state and return true only when eval-frame tracing is actually active. Specifically, inside specialization_eval_frame_active(&self, vm: &VirtualMachine) query the appropriate VM flags or methods (e.g., vm.is_tracing_enabled(), vm.trace_config().eval_frame_enabled, or the thread-local/frame-level trace flag used elsewhere) and return that boolean so the guard checks in the call sites (the specialization guards) correctly enable/disable specialized execution. Ensure you use the existing VM APIs for tracing state rather than introducing new globals.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@crates/vm/src/frame.rs`:
- Around line 8312-8315: specialization_eval_frame_active currently always
returns false, disabling all eval-frame/tracing guards; implement it to consult
the VM's tracing/profiling state and return true only when eval-frame tracing is
actually active. Specifically, inside specialization_eval_frame_active(&self,
vm: &VirtualMachine) query the appropriate VM flags or methods (e.g.,
vm.is_tracing_enabled(), vm.trace_config().eval_frame_enabled, or the
thread-local/frame-level trace flag used elsewhere) and return that boolean so
the guard checks in the call sites (the specialization guards) correctly
enable/disable specialized execution. Ensure you use the existing VM APIs for
tracing state rather than introducing new globals.
In `@crates/vm/src/gc_state.rs`:
- Around line 526-528: The early-return branches use for i in 0..generation
which differs from other branches and can omit resetting generation 0/1; change
this to use a single, consistent reset path: compute the target generation bound
once (e.g. let end = generation) and then reset counts for all generations up
through that bound (use 0..=end or a helper like
self.reset_generation_counts(end) that calls self.generations[i].count.store(0,
Ordering::SeqCst) for each i), and replace the duplicate loops at the spots
using generation (the blocks that call self.generations[i].count.store) with
that single common call.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 97fe905e-141b-4ef0-8f3a-534d909a531b
⛔ Files ignored due to path filters (1)
Lib/test/test_gc.pyis excluded by!Lib/**
📒 Files selected for processing (8)
crates/derive-impl/src/pyclass.rscrates/derive-impl/src/pymodule.rscrates/derive-impl/src/util.rscrates/vm/src/coroutine.rscrates/vm/src/frame.rscrates/vm/src/function/method.rscrates/vm/src/gc_state.rscrates/vm/src/stdlib/gc.rs
a3e2c87 to
c092718
Compare
- Add CollectResult struct with collected/uncollectable/candidates/duration - Add candidates and duration fields to GcStats and gc.get_stats() - Pass CollectResult to gc.callbacks info dict - Reset generation counts for all collected generations (0..=N) - Return 0 for third value in gc.get_threshold() (3.13+) - Implement gc.get_referrers() by scanning all tracked objects - Add DEBUG_COLLECTABLE output for collectable objects - Update test_gc.py to expect candidates/duration in stats
Taken from v3.15 (not v3.14.3) because get_stats() candidates/duration fields were added in 3.13+ and the corresponding test assertions only exist in 3.15.
…d exc leak - get_referrers: skip frame objects on the execution stack, since they are not GC-tracked in CPython (_PyInterpreterFrame) - _asyncio Future/Task make_cancelled_error_impl: clear the stored cancelled exception after returning it, matching the Python _make_cancelled_error behavior
c092718 to
357ff59
Compare
Summary
CollectResultstruct withcollected/uncollectable/candidates/durationfieldscandidatesanddurationtoGcStatsandgc.get_stats()to match CPython 3.13+CollectResulttogc.callbacksinfo dict0..=N), not just gen00for third value ingc.get_threshold()(CPython 3.13+ compat)gc.get_referrers()by scanning all tracked objectsDEBUG_COLLECTABLEoutput for collectable objectstest_gc.pyto expectcandidates/durationin statsTest plan
cargo fmt— no issuescargo clippy -p rustpython-vm— no warningscargo run -- -m test test_gc— 57 tests passed, 16 skipped🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Refactor