Replace GC tracking HashSet with intrusive linked list #7328
youknowone merged 5 commits into RustPython:main
Conversation
📝 Walkthrough
Replaces HashSet-based GC tracking with intrusive per-generation linked lists.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Alloc as Allocator
    participant GcState as GcState
    participant GenList as LinkedList<GcLink, PyObject>
    participant PyObj as PyObject
    participant Collector as Collector

    Alloc->>GcState: track_object(PyObject)
    GcState->>GenList: push_front(PyObject)
    note right of GenList: object inserted into gen0 list
    Collector->>GcState: collect(generation)
    GcState->>GenList: iterate -> build GcPtr set
    Collector->>PyObj: reachability marking
    Collector->>GcState: promote_survivors(from_gen, survivors)
    GcState->>GenList: remove nodes from old list, push_front into higher-gen list
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Code has been automatically formatted
The code in this PR has been formatted. Update your local branch with:

```shell
git pull origin gc-intrusive-linked-list
```
Force-pushed from 35ee888 to adf9282
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/vm/src/gc_state.rs`:
- Around line 703-708: freeze/unfreeze hold locks in opposite orders and set
GC_PERMANENT before insertion, causing deadlocks and transient inconsistent
state; fix both methods (freeze and unfreeze) to acquire locks in a consistent
order (always lock generation_lists entries first, then permanent_list) and
avoid marking objects permanent until they are inserted into permanent_list —
e.g. drain each generation into a temporary Vec while holding the
generation_list write lock, then acquire permanent_list write lock and
push_front each ptr and call ptr.as_ref().set_gc_generation(GC_PERMANENT) while
holding the permanent_list lock; apply the same consistent lock-ordering pattern
for the unfreeze path and the other affected block (lines ~721–727) referencing
generation_lists, permanent_list, set_gc_generation, GC_PERMANENT, and the
freeze/unfreeze functions.
- Around line 275-283: In untrack_object the GC_PERMANENT branch returns early
when list.remove(obj) is None, which can miss concurrent moves; change the logic
in untrack_object so that if unsafe { list.remove(obj) } returns None you do not
return immediately but instead re-check the object's generation and continue to
the appropriate handling (or loop/retry) so the object will still be untracked
if it moved between lists. Update the block that touches permanent_list,
permanent.count.fetch_sub, obj_ref.clear_gc_tracked, and
obj_ref.set_gc_generation(GC_UNTRACKED) to only perform those when a remove
actually succeeded; otherwise recompute obj_ref.get_gc_generation() (or retry)
and handle the object in the correct branch (GC_YOUNG/GC_OLD) instead of
returning.
ℹ️ Review info
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
crates/common/src/linked_list.rscrates/vm/src/gc_state.rscrates/vm/src/object/core.rscrates/vm/src/object/mod.rs
Force-pushed from adf9282 to c9f0692
♻️ Duplicate comments (1)
crates/vm/src/gc_state.rs (1)
711-735: ⚠️ Potential issue | 🔴 Critical: Resolve lock-order inversion between `freeze` and `unfreeze` (deadlock risk).
Line 712 locks `permanent_list` first, while lines 733-734 lock `generation_lists[2]` then `permanent_list`. That opposing order can deadlock under concurrent calls.
🔧 Suggested lock-order fix for `freeze`:

```diff
 pub fn freeze(&self) {
     let mut count = 0usize;
-    let mut perm = self.permanent_list.write();
     for (gen_idx, gen_list) in self.generation_lists.iter().enumerate() {
         let mut list = gen_list.write();
+        let mut perm = self.permanent_list.write();
         while let Some(ptr) = list.pop_front() {
             perm.push_front(ptr);
             unsafe { ptr.as_ref().set_gc_generation(GC_PERMANENT) };
             count += 1;
         }
         self.generations[gen_idx].count.store(0, Ordering::SeqCst);
     }
     self.permanent.count.fetch_add(count, Ordering::SeqCst);
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/gc_state.rs` around lines 711 - 735, The freeze function currently acquires self.permanent_list.write() before taking per-generation locks, which inverts the lock order used by unfreeze (which takes generation_lists[2].write() then permanent_list.write()) and can deadlock; fix by reordering locks in freeze to always acquire generation_lists (e.g., iterate and take each gen_list.write() or at least acquire generation_lists[2].write() before permanent_list.write()) so the lock ordering matches unfreeze (generation_lists -> permanent_list), and update the logic in freeze (and any helper blocks) to move nodes into permanent_list after holding the generation lock(s) but only take permanent_list.write() after those generation locks are held to maintain consistent lock order between freeze and unfreeze.
🧹 Nitpick comments (1)
crates/vm/src/gc_state.rs (1)
260-287: Extract shared untrack logic instead of duplicating generation/permanent branches.
The two branches repeat the same re-check/remove/clear flow; only target list/counter selection differs. Consolidating this reduces maintenance risk.
As per coding guidelines: "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/gc_state.rs` around lines 260 - 287, Extract the duplicated re-check/remove/clear sequence into a single helper and call it for both generation and permanent cases: compute the target list and counter first (for gen <= 2 set list = self.generation_lists[obj_gen as usize].write() and counter = &self.generations[obj_gen as usize].count; for GC_PERMANENT set list = self.permanent_list.write() and counter = &self.permanent.count), then perform the re-check using obj_ref.gc_generation(), call unsafe { list.remove(obj) } and if Some decrement counter.fetch_sub(1, Ordering::SeqCst), call obj_ref.clear_gc_tracked() and obj_ref.set_gc_generation(GC_UNTRACKED), and return; keep the outer loop/continue logic unchanged so the helper is retried when the generation changed.
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/common/src/linked_list.rs
Force-pushed from c9f0692 to c0054a4
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/vm/src/gc_state.rs`:
- Around line 107-108: Remove the unconditional unsafe impls for Send and Sync
on the GcPtr type: delete the lines "unsafe impl Send for GcPtr {}" and "unsafe
impl Sync for GcPtr {}". GcPtr is only used within the GC collection algorithm
(the GC-related functions in this module) and must not be advertised as
thread-safe; if any compilation error occurs after removal, narrow the required
bounds where a type must be Send/Sync (rather than making GcPtr global
Send/Sync) or add a targeted wrapper that implements Send/Sync with a safe
justification; run cargo check/tests to confirm no other code relies on those
impls.
- Around line 286-297: The methods is_finalized and mark_finalized currently
dereference NonNull<PyObject> in safe public functions; change their signatures
to unsafe fn is_finalized(&self, obj: NonNull<PyObject>) -> bool and unsafe fn
mark_finalized(&self, obj: NonNull<PyObject>) so callers must guarantee the
pointer is valid before dereferencing, and update all call sites to wrap calls
in unsafe blocks (or otherwise ensure safety invariants); keep the internal
logic (unsafe { obj.as_ref() } / obj_ref.set_gc_finalized()) as-is but only
reachable from the new unsafe functions.
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/vm/src/object/mod.rs
Force-pushed from c38f9d2 to 40fc922
♻️ Duplicate comments (1)
crates/vm/src/gc_state.rs (1)
289-299: ⚠️ Potential issue | 🔴 Critical: Make raw-pointer finalization helpers unsafe (or take `&PyObject`).
`is_finalized` and `mark_finalized` are safe `pub fn`s but dereference raw pointers internally (lines 290, 297). That makes UB reachable from safe callers passing an invalid `NonNull<PyObject>`.
🔧 Proposed fix:

```diff
-    pub fn is_finalized(&self, obj: NonNull<PyObject>) -> bool {
+    /// # Safety
+    /// `obj` must point to a valid live `PyObject`.
+    pub unsafe fn is_finalized(&self, obj: NonNull<PyObject>) -> bool {
         let obj_ref = unsafe { obj.as_ref() };
         obj_ref.gc_finalized()
     }

-    pub fn mark_finalized(&self, obj: NonNull<PyObject>) {
+    /// # Safety
+    /// `obj` must point to a valid live `PyObject`.
+    pub unsafe fn mark_finalized(&self, obj: NonNull<PyObject>) {
         let obj_ref = unsafe { obj.as_ref() };
         obj_ref.set_gc_finalized();
     }
```

Run this to verify definition/call-site updates after the signature change:

```bash
#!/bin/bash
set -euo pipefail
# Confirm helper signatures
rg -nP --type=rust 'pub\s+(unsafe\s+)?fn\s+(is_finalized|mark_finalized)\s*\(' crates/vm/src/gc_state.rs -C2
# Find all call sites that must satisfy unsafe contract (or be ref-based after refactor)
rg -nP --type=rust '\.(is_finalized|mark_finalized)\s*\(' crates/vm/src -C2
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/gc_state.rs` around lines 289 - 299, The methods is_finalized and mark_finalized currently dereference a raw NonNull<PyObject> inside safe functions which is UB; change their signatures to either pub unsafe fn is_finalized(&self, obj: NonNull<PyObject>) -> bool and pub unsafe fn mark_finalized(&self, obj: NonNull<PyObject>) or accept a safe reference pub fn is_finalized(&self, obj: &PyObject) / pub fn mark_finalized(&self, obj: &PyObject), update all call sites accordingly to either call them within an unsafe block or pass &PyObject, and run the ripgrep checks suggested to locate and fix all call sites that must be adjusted.
Force-pushed from 7b739f8 to 19e1ab3
Replace per-generation HashSet<GcObjectPtr> with intrusive doubly-linked lists for GC object tracking. Each PyInner now carries gc_pointers (prev/next) and gc_generation fields, enabling O(1) track/untrack without hashing.
- Add gc_pointers (Pointers<PyObject>) and gc_generation (u8) to PyInner
- Implement GcLink trait for intrusive list integration
- Replace generation_objects/permanent_objects/tracked_objects/finalized_objects HashSets with generation_lists/permanent_list LinkedLists
- Use GcBits::FINALIZED flag instead of finalized_objects HashSet
- Change default_dealloc to untrack directly before memory free
- Hold both src/dst list locks in promote_survivors to prevent race conditions with concurrent untrack_object calls
- Add pop_front to LinkedList for freeze/unfreeze operations
Move unreachable_refs creation before drop(gen_locks) so that raw pointer dereferences and refcount increments happen while generation list read locks are held. Previously, after dropping read locks, other threads could untrack and free objects, causing use-after-free when creating strong references from the raw GcPtr pointers. Also add linked list integrity assertions and diagnostic warnings for debugging GC list corruption.
Force-pushed from 19e1ab3 to b279716
- Remove unused is_finalized/mark_finalized methods from GcState
- Remove unnecessary Send/Sync impls on GcPtr (only used locally within collect_inner, never shared across threads)
Replace per-generation HashSet with intrusive doubly-linked lists for GC object tracking. Each PyInner now carries gc_pointers (prev/next) and gc_generation fields, enabling O(1) track/untrack without hashing.