
Fix thread-safety in GC, type cache, and instruction cache#7355

Merged
youknowone merged 6 commits into RustPython:main from youknowone:gc-use-after-free
Mar 5, 2026

Conversation

@youknowone (Member) commented Mar 5, 2026

Summary by CodeRabbit

  • Bug Fixes

    • Tightened reference-count and ownership checks to reduce use-after-free and TOCTOU races.
    • Safer GC and callback collection paths to avoid torn weak references.
    • Improved inline-cache and attribute access guards to prevent stale references.
  • New Features

    • Pointer-sized atomic cache support for runtime caches.
    • SeqLock-based method cache protocol for stronger cache invalidation.
    • RLock state save/restore helpers for context switches.
  • Other

    • Dict-keys now handled via the generic iterable path.

coderabbitai bot (Contributor) commented Mar 5, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough


Adds a CAS-based ownership upgrade (try_to_owned) and tightens RefCount::safe_inc to refuse increments when strong count is zero. Weakref, GC, frame inline-cache, and type cache flows now attempt guarded ownership; codeunit caches switch to pointer-sized atomics. RLock gained save/restore helpers.
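The guarded increment at the heart of this change can be sketched as follows. This is a simplified model, assuming a RefCount reduced to a single strong counter (the real RustPython type also tracks weak counts and a destructed state), so treat it as an illustration of the technique, not the PR's code:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Simplified RefCount with only a strong counter.
pub struct RefCount {
    strong: AtomicUsize,
}

impl RefCount {
    pub fn new(strong: usize) -> Self {
        Self {
            strong: AtomicUsize::new(strong),
        }
    }

    /// Atomically increment the strong count unless it is already zero.
    /// The naive `if strong() > 0 { inc() }` has a TOCTOU window in which
    /// another thread can drop the last reference between the check and
    /// the increment; the CAS loop fuses both into one atomic step.
    pub fn safe_inc(&self) -> bool {
        let mut cur = self.strong.load(Ordering::Relaxed);
        loop {
            if cur == 0 {
                // Object is effectively dead; refuse to revive it.
                return false;
            }
            match self.strong.compare_exchange_weak(
                cur,
                cur + 1,
                Ordering::Acquire,
                Ordering::Relaxed,
            ) {
                Ok(_) => return true,
                Err(observed) => cur = observed,
            }
        }
    }

    pub fn strong(&self) -> usize {
        self.strong.load(Ordering::Relaxed)
    }
}
```

try_to_owned() is then a thin wrapper: call safe_inc(), and only on success materialize an owned PyObjectRef.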

Changes

Refcount core — crates/common/src/refcount.rs
  RefCount::safe_inc now returns false when strong() == 0, disallowing increments once the object is effectively dead.

PyObject & weakrefs — crates/vm/src/object/core.rs
  Added pub fn try_to_owned(&self) -> Option<PyObjectRef> (CAS-based); weakref callback collection and weak-ref iteration now use try_to_owned() instead of raw strong-count checks.

GC iteration/collection — crates/vm/src/gc_state.rs
  Replaced strong_count() > 0 checks with try_to_owned() when collecting survivors/unreachables; added TOCTOU/CAS comments around ownership attempts.

Inline caches & attribute lookup — crates/vm/src/frame.rs, crates/vm/src/builtins/type.rs
  Switched cache storage to pointer-sized atomics; added SeqLock-style begin/end read-write helpers; cache reads attempt try_to_owned() and fall back to the deopt/slow path on failure; invalidation ordering adjusted.

VM iterable handling — crates/vm/src/vm/mod.rs
  Removed the special-case snapshot path for dict_keys_type in extract_elements_with; the PyDictKeys import was removed and dict keys are now handled via the generic iterable path.

Bytecode pointer cache — crates/compiler-core/src/bytecode.rs
  Added pointer_cache: Box<[AtomicUsize]> to CodeUnits; replaced 64-bit cache methods with pointer-sized write_cache_ptr/read_cache_ptr; updated Clone/From to copy pointer_cache entries.

Threading / RLock — crates/vm/src/stdlib/thread.rs
  RLock::release now enforces ownership; added _release_save(&self, vm) -> PyResult<(usize, u64)> and _acquire_restore(&self, state, vm) -> PyResult<()> to save/restore reentrant lock state.
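The bytecode pointer-cache change is the most mechanical of these: a single pointer-sized atomic slot means a reader can never observe a torn, half-written pointer. A minimal sketch, assuming a simplified stand-in for the real CodeUnits structure (the method names follow the summary above; everything else here is illustrative):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Simplified stand-in for the CodeUnits pointer cache: each slot holds a
// pointer-sized value, with 0 as the "empty" sentinel.
pub struct PointerCache {
    slots: Box<[AtomicUsize]>,
}

impl PointerCache {
    pub fn new(len: usize) -> Self {
        Self {
            slots: (0..len).map(|_| AtomicUsize::new(0)).collect(),
        }
    }

    /// One atomic store; concurrent readers see the old value or the new
    /// value, never a mix of both halves (unlike a split 2x32-bit scheme).
    pub fn write_cache_ptr(&self, index: usize, value: usize) {
        self.slots[index].store(value, Ordering::Relaxed);
    }

    /// One atomic load; panics if `index` is out of bounds.
    pub fn read_cache_ptr(&self, index: usize) -> usize {
        self.slots[index].load(Ordering::Relaxed)
    }
}
```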

Sequence Diagram

sequenceDiagram
    participant GC as GC Collector
    participant PyObj as PyObject
    participant RefCnt as RefCount
    participant Callback as CallbackHandler

    Note over GC,Callback: Guarded ownership attempt before using object
    GC->>PyObj: try_to_owned()
    activate PyObj
    PyObj->>RefCnt: safe_inc() (CAS)
    activate RefCnt
    RefCnt->>RefCnt: check destructed()
    RefCnt->>RefCnt: check strong() == 0
    alt guard fails
        RefCnt-->>PyObj: return None
    else guard passes
        RefCnt-->>PyObj: increment strong -> return Some(PyObjectRef)
    end
    deactivate RefCnt
    deactivate PyObj

    alt Ownership acquired
        GC->>Callback: invoke callback with owned PyObjectRef
    else Ownership failed
        GC->>GC: skip / deopt to slow path
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped in with a CAS so bright,

Safe_inc now guards against the night,
Weakrefs wait while ownership's checked,
Caches publish pointers, nicely decked,
A rabbit cheers — the refs feel right.

🚥 Pre-merge checks | ✅ 3 passed

Description Check — ✅ Passed: check skipped; CodeRabbit’s high-level summary is enabled.
Title Check — ✅ Passed: the title accurately summarizes the main objectives of the PR, which addresses thread-safety issues across three critical subsystems: garbage collection, type cache, and instruction cache.
Docstring Coverage — ✅ Passed: docstring coverage is 100.00%, above the required threshold of 80.00%.


@youknowone youknowone marked this pull request as ready for review March 5, 2026 02:34
@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

coderabbitai bot (Contributor) left a comment

🧹 Nitpick comments (1)
crates/vm/src/vm/mod.rs (1)

1322-1329: Consolidate duplicated dict-key snapshot branches.

Both branches do the same map/collect after selecting key source; extracting only the differing source keeps this tighter and easier to maintain.

Refactor sketch
-        } else if cls.is(self.ctx.types.dict_type) {
-            // Atomic snapshot of dict keys for thread-safe iteration.
-            let keys = value.downcast_ref::<PyDict>().unwrap().keys_vec();
-            return keys.into_iter().map(func).collect();
-        } else if cls.is(self.ctx.types.dict_keys_type) {
-            // Atomic snapshot of dict keys - prevents race condition during iteration
-            let keys = value.downcast_ref::<PyDictKeys>().unwrap().dict.keys_vec();
+        } else if cls.is(self.ctx.types.dict_type) || cls.is(self.ctx.types.dict_keys_type) {
+            let keys = if cls.is(self.ctx.types.dict_type) {
+                value.downcast_ref::<PyDict>().unwrap().keys_vec()
+            } else {
+                value.downcast_ref::<PyDictKeys>().unwrap().dict.keys_vec()
+            };
             return keys.into_iter().map(func).collect();

As per coding guidelines, "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/vm/mod.rs` around lines 1322 - 1329, The two branches that
handle dict and dict_keys are duplicated: both compute an atomic snapshot via
keys_vec() and then map/collect with func; refactor by first selecting the key
source into a single variable (e.g., if cls.is(self.ctx.types.dict_type) use
value.downcast_ref::<PyDict>().unwrap().keys_vec(), else if
cls.is(self.ctx.types.dict_keys_type) use
value.downcast_ref::<PyDictKeys>().unwrap().dict.keys_vec()), assign that Vec to
a local variable (e.g., keys), and then perform
keys.into_iter().map(func).collect() once; update the code around the cls checks
and preserve the unwrap() calls and func usage so behavior is unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 5893b646-9765-4d5f-9945-0b0e63fdcb36

📥 Commits

Reviewing files that changed from the base of the PR and between 62766fd and f56a0b3.

⛔ Files ignored due to path filters (1)
  • Lib/test/_test_multiprocessing.py is excluded by !Lib/**
📒 Files selected for processing (4)
  • crates/common/src/refcount.rs
  • crates/vm/src/gc_state.rs
  • crates/vm/src/object/core.rs
  • crates/vm/src/vm/mod.rs

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
crates/compiler-core/src/bytecode.rs (1)

630-635: Document read_cache_ptr panic behavior for API parity.

This method indexes directly and can panic on out-of-bounds; adding # Panics would align it with read_cache_u16.

Suggested doc update
     /// Load a pointer-sized value atomically from the pointer cache at `index`.
     ///
     /// Uses a single `AtomicUsize` load to prevent torn reads.
+    ///
+    /// # Panics
+    /// Panics if `index` is out of bounds.
     pub fn read_cache_ptr(&self, index: usize) -> usize {
         self.pointer_cache[index].load(Ordering::Relaxed)
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/compiler-core/src/bytecode.rs` around lines 630 - 635, The doc comment
for read_cache_ptr is missing a Panics section describing that direct indexing
into pointer_cache can panic on out-of-bounds; update the documentation for pub
fn read_cache_ptr(&self, index: usize) -> usize to include a "# Panics"
paragraph mirroring read_cache_u16, stating that it will panic if index is out
of bounds of self.pointer_cache (because it uses direct indexing), so callers
know the behavior and API parity is maintained.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/vm/src/frame.rs`:
- Around line 3696-3706: The code pops the attribute owner via self.pop_value()
before confirming the property getter exists, which mutates the stack and causes
load_attr_slow() to double-pop on the slow path; change the sequence in the
read_cache_ptr handling (around read_cache_ptr, PyProperty, get_fget, pop_value,
push_value) to query prop.get_fget() first and only call self.pop_value() after
you have Some(getter), then call getter.call((owner,), vm) and
push_value(result); this ensures the stack is not modified when fget is absent
and preserves correct behavior when falling back to load_attr_slow().

---

Nitpick comments:
In `@crates/compiler-core/src/bytecode.rs`:
- Around line 630-635: The doc comment for read_cache_ptr is missing a Panics
section describing that direct indexing into pointer_cache can panic on
out-of-bounds; update the documentation for pub fn read_cache_ptr(&self, index:
usize) -> usize to include a "# Panics" paragraph mirroring read_cache_u16,
stating that it will panic if index is out of bounds of self.pointer_cache
(because it uses direct indexing), so callers know the behavior and API parity
is maintained.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 831c40c5-46e1-44fe-9ddc-925a16fc0e57

📥 Commits

Reviewing files that changed from the base of the PR and between f56a0b3 and 95ba6ee.

⛔ Files ignored due to path filters (1)
  • Lib/test/_test_multiprocessing.py is excluded by !Lib/**
📒 Files selected for processing (7)
  • crates/common/src/refcount.rs
  • crates/compiler-core/src/bytecode.rs
  • crates/vm/src/builtins/type.rs
  • crates/vm/src/frame.rs
  • crates/vm/src/gc_state.rs
  • crates/vm/src/object/core.rs
  • crates/vm/src/vm/mod.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • crates/vm/src/vm/mod.rs
  • crates/vm/src/gc_state.rs

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
crates/vm/src/frame.rs (1)

3702-3704: ⚠️ Potential issue | 🔴 Critical

Avoid popping owner before confirming property getter exists.

Line 3702 pops from the stack before prop.get_fget() is validated at Line 3703. If fget is missing, fallback to load_attr_slow() runs with a mutated stack and can double-pop.

Proposed fix
-                            let owner = self.pop_value();
-                            if let Some(getter) = prop.get_fget() {
+                            if let Some(getter) = prop.get_fget() {
+                                let owner = self.pop_value();
                                 let result = getter.call((owner,), vm)?;
                                 self.push_value(result);
                                 return Ok(None);
                             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/frame.rs` around lines 3702 - 3704, The code pops the owner
unconditionally before checking prop.get_fget(), which corrupts the stack if
fget is None and load_attr_slow() runs; change the flow so you first check
prop.get_fget() and only call self.pop_value() when the getter exists (i.e.,
move the let owner = self.pop_value() inside the if let Some(getter) branch),
and ensure the else path calls load_attr_slow() without having mutated the stack
(use existing stack state/peek rather than pop). Update uses of
getter.call((owner,), vm) accordingly so owner is only created when needed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/vm/src/builtins/type.rs`:
- Around line 741-754: The code dereferences the raw pointer ptr into &*ptr
(creating a &PyObject) before calling obj.try_to_owned(), causing a potential
use-after-free; change the flow so you do not form a reference before the
ownership upgrade: call try_to_owned (or a new unsafe wrapper that accepts *mut
PyObject) using the raw pointer directly and only create the &PyObject / &*ptr
after try_to_owned succeeds (or return the owned clone without ever forming a
temporary reference), then perform the seq-lock version recheck (entry.version
v2 vs v1) and return cloned; update usages of try_to_owned/safe_inc to accept a
raw pointer if needed and remove the early &*ptr dereference to avoid undefined
behavior.

---

Duplicate comments:
In `@crates/vm/src/frame.rs`:
- Around line 3702-3704: The code pops the owner unconditionally before checking
prop.get_fget(), which corrupts the stack if fget is None and load_attr_slow()
runs; change the flow so you first check prop.get_fget() and only call
self.pop_value() when the getter exists (i.e., move the let owner =
self.pop_value() inside the if let Some(getter) branch), and ensure the else
path calls load_attr_slow() without having mutated the stack (use existing stack
state/peek rather than pop). Update uses of getter.call((owner,), vm)
accordingly so owner is only created when needed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 6116694d-725f-4541-abfd-4e002eba8b2c

📥 Commits

Reviewing files that changed from the base of the PR and between 95ba6ee and 2adc355.

📒 Files selected for processing (3)
  • crates/compiler-core/src/bytecode.rs
  • crates/vm/src/builtins/type.rs
  • crates/vm/src/frame.rs

GC / refcount:
- Add safe_inc() check for strong()==0 in RefCount
- Add try_to_owned() to PyObject for atomic refcount acquire
- Replace strong_count()+to_owned() with try_to_owned() in GC
  collection and weakref callback paths to prevent TOCTOU races

Type cache:
- Add proper SeqLock (sequence counter) to TypeCacheEntry
- Readers spin-wait on odd sequence, validate after read
- Writers bracket updates with begin_write/end_write
- Use try_to_owned + pointer revalidation on read path
- Call modified() BEFORE attribute modification in set_attr

Instruction cache:
- Add pointer_cache (AtomicUsize array) to CodeUnits for
  single atomic pointer load/store (prevents torn reads)
- Add try_read_cached_descriptor with try_to_owned + pointer
  and version revalidation after increment
- Add write_cached_descriptor with version-bracketed writes

RLock:
- Fix release() to check is_owned_by_current_thread
- Add _release_save/_acquire_restore methods
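The type-cache SeqLock protocol described above (readers spin-wait on an odd sequence and validate after reading; writers bracket updates) can be sketched in miniature. SeqLockCell below is a hypothetical stand-in for TypeCacheEntry, guarding a single usize instead of the real pointer/version pair:

```rust
use std::hint::spin_loop;
use std::sync::atomic::{AtomicUsize, Ordering};

// Miniature SeqLock over one usize payload. An odd sequence means
// "write in progress"; a reader only trusts what it read if the
// sequence was even and unchanged across the read.
pub struct SeqLockCell {
    sequence: AtomicUsize,
    value: AtomicUsize,
}

impl SeqLockCell {
    pub fn new(v: usize) -> Self {
        Self {
            sequence: AtomicUsize::new(0),
            value: AtomicUsize::new(v),
        }
    }

    /// Reader: spin while a write is in progress, then validate that no
    /// write overlapped the read before returning the value.
    pub fn read(&self) -> usize {
        loop {
            let seq1 = self.sequence.load(Ordering::Acquire);
            if seq1 & 1 != 0 {
                spin_loop();
                continue;
            }
            let v = self.value.load(Ordering::Acquire);
            if self.sequence.load(Ordering::Acquire) == seq1 {
                return v;
            }
            // A writer raced us; retry with a fresh sequence.
        }
    }

    /// Writer: sequence goes odd, mutation happens, sequence goes even.
    /// Writers must additionally be serialized against one another
    /// (e.g. by an outer lock); a plain fetch_add alone does not do that.
    pub fn write(&self, v: usize) {
        self.sequence.fetch_add(1, Ordering::AcqRel);
        self.value.store(v, Ordering::Release);
        self.sequence.fetch_add(1, Ordering::Release);
    }
}
```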
@youknowone youknowone changed the title from "Fix GC use-after-free via atomic try_to_owned" to "Fix thread-safety in GC, type cache, and instruction cache" on Mar 5, 2026
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 3

♻️ Duplicate comments (1)
crates/vm/src/builtins/type.rs (1)

786-788: ⚠️ Potential issue | 🔴 Critical

Do not form &*ptr before ownership upgrade; this is still a UAF/UB window.

At Line 786, creating &PyObject from raw ptr happens before successful try_to_owned(). If another thread drops the cached strong ref first, that dereference is invalid.

Suggested direction
- let obj: &PyObject = unsafe { &*ptr };
- if let Some(cloned) = obj.try_to_owned() {
+ // Upgrade from raw pointer atomically first; only then materialize references.
+ if let Some(cloned) = unsafe { PyObject::try_to_owned_raw(ptr) } {
     let same_ptr = core::ptr::eq(entry.value.load(Ordering::Relaxed), ptr);
     if same_ptr && entry.end_read(seq1) {
         return Some(cloned);
     }
     drop(cloned);
     continue;
 }

(If try_to_owned_raw does not exist yet, add it in crates/vm/src/object/core.rs and keep all pre-upgrade handling pointer-only.)

#!/bin/bash
set -euo pipefail

echo "== cache read path around raw deref =="
sed -n '770,800p' crates/vm/src/builtins/type.rs

echo
echo "== all raw-pointer -> reference conversions in type cache path =="
rg -n -C3 'unsafe \{ &\*ptr \}|try_to_owned\(' crates/vm/src/builtins/type.rs
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/type.rs` around lines 786 - 788, The code currently
forms a reference with `let obj: &PyObject = unsafe { &*ptr };` before calling
`obj.try_to_owned()`, which can cause a UAF/UB if another thread drops the
cached strong ref; change the upgrade path to operate on the raw pointer only
(e.g., add and use a pointer-based upgrade like `try_to_owned_raw` in
crates/vm/src/object/core.rs) so you never create `&PyObject` before the
ownership is confirmed; update the call site in the type cache (the block using
`ptr`, `entry.value.load(Ordering::Relaxed)`, and `try_to_owned`) to call the
new raw-pointer upgrade routine and only create a safe reference after the
ownership upgrade succeeds and the pointer is still the same.
🧹 Nitpick comments (1)
crates/compiler-core/src/bytecode.rs (1)

619-638: Add focused tests for the new pointer-cache API.

Please add direct coverage for round-trip read/write and out-of-bounds panic behavior to prevent regressions in this concurrency-critical path.

✅ Suggested tests
 #[cfg(test)]
 mod tests {
     use super::*;
     use alloc::{vec, vec::Vec};

+    #[test]
+    fn test_pointer_cache_roundtrip_and_clone() {
+        let units = CodeUnits::from(vec![CodeUnit::new(Instruction::Cache, 0u8.into())]);
+        unsafe {
+            units.write_cache_ptr(0, 0x1234usize);
+        }
+        assert_eq!(units.read_cache_ptr(0), 0x1234usize);
+
+        let cloned = units.clone();
+        assert_eq!(cloned.read_cache_ptr(0), 0x1234usize);
+    }
+
+    #[test]
+    #[should_panic]
+    fn test_pointer_cache_oob_panics() {
+        let units = CodeUnits::from(vec![CodeUnit::new(Instruction::Cache, 0u8.into())]);
+        let _ = units.read_cache_ptr(1);
+    }
+
     #[test]
     fn test_exception_table_encode_decode() {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/compiler-core/src/bytecode.rs` around lines 619 - 638, Add focused
unit tests exercising the pointer-cache API: create tests that (1) perform a
round-trip using write_cache_ptr and read_cache_ptr to store and retrieve a
usize value (covering typical and edge values) and assert equality, and (2)
assert that read_cache_ptr and unsafe write_cache_ptr panic or trigger UB when
given an out-of-bounds index (use std::panic::catch_unwind or #[should_panic] as
appropriate). Locate usages of pointer_cache, read_cache_ptr, and
write_cache_ptr to implement these tests, and include concurrent access variants
if possible to exercise atomic behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/compiler-core/src/bytecode.rs`:
- Around line 624-627: The Safety docs for unsafe fn write_cache_ptr currently
only require an in-bounds index; extend the contract to state that `value` must
be either 0 (sentinel) or a valid `*const PyObject` encoded as a usize, and that
callers must follow the cache invalidation/upgrade protocol used by
write_cached_descriptor (i.e., ensure version guards are published/checked in
the proper order before any reader dereferences the stored pointer such as the
dereference at descr_ptr in frame.rs); update the doc comment on write_cache_ptr
to enumerate these constraints so callers know they must write only valid
pointer values and respect the version-guard semantics.

In `@crates/vm/src/builtins/type.rs`:
- Around line 106-113: begin_write currently uses sequence.fetch_add(1) which
allows concurrent writers and breaks the SeqLock invariant; change it to acquire
an exclusive writer lock (e.g., a dedicated atomic flag or Mutex) before making
the sequence odd, so only one writer can enter, then increment sequence to mark
the start of write; in end_write release the lock after incrementing sequence to
mark write completion. Update the implementations of begin_write and end_write
and add/use the new writer lock field (referencing sequence, begin_write,
end_write) to ensure writers are serialized.

In `@crates/vm/src/stdlib/thread.rs`:
- Around line 295-305: The _acquire_restore implementation currently ignores the
saved owner; update fn _acquire_restore(&self, state: (usize, u64), _vm:
&VirtualMachine) to unpack and validate the saved owner against the current
thread id (use current_thread_id() or the crate's equivalent) and return an
error if they differ instead of silently accepting it; only when the owner
matches proceed to re-lock by calling self.mu.lock() count times and restore
self.count.store(count, Ordering::Relaxed). Ensure the error return follows the
surrounding error type/contract used by other methods (e.g., match the error
style used by _release_save/_acquire) so callers receive a proper
protocol-mismatch failure.

---

Duplicate comments:
In `@crates/vm/src/builtins/type.rs`:
- Around line 786-788: The code currently forms a reference with `let obj:
&PyObject = unsafe { &*ptr };` before calling `obj.try_to_owned()`, which can
cause a UAF/UB if another thread drops the cached strong ref; change the upgrade
path to operate on the raw pointer only (e.g., add and use a pointer-based
upgrade like `try_to_owned_raw` in crates/vm/src/object/core.rs) so you never
create `&PyObject` before the ownership is confirmed; update the call site in
the type cache (the block using `ptr`, `entry.value.load(Ordering::Relaxed)`,
and `try_to_owned`) to call the new raw-pointer upgrade routine and only create
a safe reference after the ownership upgrade succeeds and the pointer is still
the same.

---

Nitpick comments:
In `@crates/compiler-core/src/bytecode.rs`:
- Around line 619-638: Add focused unit tests exercising the pointer-cache API:
create tests that (1) perform a round-trip using write_cache_ptr and
read_cache_ptr to store and retrieve a usize value (covering typical and edge
values) and assert equality, and (2) assert that read_cache_ptr and unsafe
write_cache_ptr panic or trigger UB when given an out-of-bounds index (use
std::panic::catch_unwind or #[should_panic] as appropriate). Locate usages of
pointer_cache, read_cache_ptr, and write_cache_ptr to implement these tests, and
include concurrent access variants if possible to exercise atomic behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 8a57c331-cc95-44dc-8045-0b04cccb47ed

📥 Commits

Reviewing files that changed from the base of the PR and between 2adc355 and 70169e8.

⛔ Files ignored due to path filters (1)
  • Lib/test/_test_multiprocessing.py is excluded by !Lib/**
📒 Files selected for processing (8)
  • crates/common/src/refcount.rs
  • crates/compiler-core/src/bytecode.rs
  • crates/vm/src/builtins/type.rs
  • crates/vm/src/frame.rs
  • crates/vm/src/gc_state.rs
  • crates/vm/src/object/core.rs
  • crates/vm/src/stdlib/thread.rs
  • crates/vm/src/vm/mod.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • crates/common/src/refcount.rs
  • crates/vm/src/vm/mod.rs

Comment on lines +106 to +113

    fn begin_write(&self) {
        self.sequence.fetch_add(1, Ordering::AcqRel);
    }

    #[inline]
    fn end_write(&self) {
        self.sequence.fetch_add(1, Ordering::Release);
    }
coderabbitai bot (Contributor) commented:

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Full TypeCacheEntry impl with SeqLock helpers =="
sed -n '90,150p' crates/vm/src/builtins/type.rs

echo
echo "== Unsafe pointer deref sequence around line 786 =="
sed -n '780,795p' crates/vm/src/builtins/type.rs

echo
echo "== All begin_write/end_write call sites =="
rg -n 'begin_write|end_write' crates/vm/src/builtins/type.rs

Repository: RustPython/RustPython

Length of output: 3200


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Writer context 1: lines 165-180 (type_cache_clear_version) =="
sed -n '165,180p' crates/vm/src/builtins/type.rs

echo
echo "== Writer context 2: lines 190-205 (type_cache_clear) =="
sed -n '190,205p' crates/vm/src/builtins/type.rs

echo
echo "== Writer context 3: lines 815-835 (find_name_in_mro write) =="
sed -n '815,835p' crates/vm/src/builtins/type.rs

Repository: RustPython/RustPython

Length of output: 2369




🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Verify SeqLock sequence usage in begin_read/end_read =="
sed -n '119,129p' crates/vm/src/builtins/type.rs

echo
echo "== Full unsafe deref context (lines 750-800) =="
sed -n '750,800p' crates/vm/src/builtins/type.rs

Repository: RustPython/RustPython

Length of output: 2739


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Verify what try_to_owned does =="
rg -A10 'fn try_to_owned' --type rust

echo
echo "== Confirm fetch_add behavior is atomic but unordered =="
echo "fetch_add returns old value and atomically increments, but does not prevent concurrent increments"
echo "Two concurrent fetch_add operations on same atomic will both complete, resulting in +2"
echo "If seq=0: Thread A fetch_add → seq=1, Thread B fetch_add → seq=2"
echo "Readers see seq=2 (even) and incorrectly assume no writer is active"

Repository: RustPython/RustPython

Length of output: 1033


Serialize writers in begin_write; fetch_add allows concurrent writers and breaks SeqLock invariants.

At Line 107, fetch_add(1) does not acquire exclusive writer ownership. With concurrent writers starting from an even sequence: Writer A makes sequence odd, but Writer B's simultaneous fetch_add makes it even again. Readers see an even sequence and proceed to read while both writers are modifying state, violating the SeqLock invariant that sequence must remain odd during any write.

Suggested fix
 fn begin_write(&self) {
-    self.sequence.fetch_add(1, Ordering::AcqRel);
+    let mut seq = self.sequence.load(Ordering::Acquire);
+    loop {
+        while (seq & 1) != 0 {
+            core::hint::spin_loop();
+            seq = self.sequence.load(Ordering::Acquire);
+        }
+        match self.sequence.compare_exchange_weak(
+            seq,
+            seq.wrapping_add(1),
+            Ordering::AcqRel,
+            Ordering::Acquire,
+        ) {
+            Ok(_) => break,
+            Err(observed) => seq = observed,
+        }
+    }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/type.rs` around lines 106 - 113, begin_write currently
uses sequence.fetch_add(1) which allows concurrent writers and breaks the
SeqLock invariant; change it to acquire an exclusive writer lock (e.g., a
dedicated atomic flag or Mutex) before making the sequence odd, so only one
writer can enter, then increment sequence to mark the start of write; in
end_write release the lock after incrementing sequence to mark write completion.
Update the implementations of begin_write and end_write and add/use the new
writer lock field (referencing sequence, begin_write, end_write) to ensure
writers are serialized.

Comment on lines +295 to +305

    fn _acquire_restore(&self, state: (usize, u64), _vm: &VirtualMachine) {
        let (count, _owner) = state;
        if count == 0 {
            return;
        }
        for _ in 0..count {
            self.mu.lock();
        }
        self.count
            .store(count, core::sync::atomic::Ordering::Relaxed);
    }

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect expected Python-side protocol
echo "=== Python Lib/threading.py RLock methods ==="
rg -n -A10 'def _release_save|def _acquire_restore|def _is_owned' Lib/threading.py || echo "File not found or patterns not matched"

# Inspect current Rust implementation
echo -e "\n=== Rust thread.rs RLock implementation ==="
rg -n -A16 -B3 'fn _release_save|fn _acquire_restore|fn _is_owned' crates/vm/src/stdlib/thread.rs || echo "File not found or patterns not matched"

Repository: RustPython/RustPython

Length of output: 3208


Validate or reject mismatched owner in _acquire_restore.

The Python RLock protocol at lines 251–263 of Lib/threading.py shows that _acquire_restore(state) unpacks and restores both _count and _owner from the saved state (line 253). The Rust implementation extracts the owner field into _owner but never uses it, silently discarding it instead of validating or restoring it.

While the current code "works" when the same thread calls _release_save and then _acquire_restore, it creates a protocol divergence: any thread can call _acquire_restore with saved state from another thread and will silently acquire the lock for itself instead of rejecting the mismatch. This breaks the semantic contract expected by Python's threading module.

The proposed fix is appropriate: validate that the owner matches current_thread_id() and return an error if it doesn't, ensuring the RLock protocol is correctly enforced.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/thread.rs` around lines 295 - 305, The _acquire_restore
implementation currently ignores the saved owner; update fn
_acquire_restore(&self, state: (usize, u64), _vm: &VirtualMachine) to unpack and
validate the saved owner against the current thread id (use current_thread_id()
or the crate's equivalent) and return an error if they differ instead of
silently accepting it; only when the owner matches proceed to re-lock by calling
self.mu.lock() count times and restore self.count.store(count,
Ordering::Relaxed). Ensure the error return follows the surrounding error
type/contract used by other methods (e.g., match the error style used by
_release_save/_acquire) so callers receive a proper protocol-mismatch failure.

@github-actions
Contributor

github-actions bot commented Mar 5, 2026

📦 Library Dependencies

The following Lib/ modules were modified. Here are their dependencies:

[x] lib: cpython/Lib/threading.py
[x] lib: cpython/Lib/_threading_local.py
[ ] test: cpython/Lib/test/test_threading.py (TODO: 17)
[ ] test: cpython/Lib/test/test_threadedtempfile.py
[ ] test: cpython/Lib/test/test_threading_local.py (TODO: 3)

dependencies:

  • threading

dependent tests: (148 tests)

  • threading: test_android test_asyncio test_bz2 test_code test_concurrent_futures test_contextlib test_ctypes test_decimal test_docxmlrpc test_email test_enum test_fork1 test_frame test_ftplib test_functools test_gc test_hashlib test_httplib test_httpservers test_imaplib test_importlib test_inspect test_io test_itertools test_largefile test_linecache test_logging test_memoryview test_opcache test_pathlib test_poll test_queue test_robotparser test_sched test_signal test_smtplib test_socket test_socketserver test_sqlite3 test_ssl test_subprocess test_super test_sys test_syslog test_termios test_threadedtempfile test_threading test_threading_local test_time test_urllib2_localnet test_weakref test_winreg test_wsgiref test_xmlrpc test_zstd
    • asyncio: test_asyncio test_contextlib_async test_os test_sys_settrace test_unittest
    • bdb: test_bdb
    • concurrent.futures._base: test_concurrent_futures
    • concurrent.futures.process: test_compileall test_concurrent_futures
    • concurrent.futures.thread: test_genericalias
    • dummy_threading: test_dummy_threading
    • http.cookiejar: test_http_cookiejar test_urllib2
      • urllib.request: test_pathlib test_pydoc test_sax test_site test_urllib test_urllib2net test_urllibnet
    • importlib.util: test_ctypes test_doctest test_importlib test_pkgutil test_py_compile test_reprlib test_runpy test_zipfile test_zipimport
      • py_compile: test_argparse test_cmd_line_script test_importlib test_multiprocessing_main_handling
      • pyclbr: test_pyclbr
      • sysconfig: test_c_locale_coercion test_dtrace test_launcher test_osx_env test_posix test_pyexpat test_regrtest test_support test_sysconfig test_tools test_venv
      • zipfile: test_shutil test_zipapp test_zipfile test_zipfile64
    • logging: test_unittest
      • hashlib: test_hmac test_tarfile test_unicodedata
    • multiprocessing: test_fcntl test_re
    • queue: test_dummy_thread
    • subprocess: test_atexit test_audit test_cmd_line test_ctypes test_faulthandler test_file_eintr test_gzip test_json test_msvcrt test_ntpath test_platform test_plistlib test_quopri test_repl test_script_helper test_select test_tempfile test_traceback test_unittest test_utf8_mode test_wait3 test_webbrowser
      • ctypes.util: test_ctypes
      • ensurepip: test_ensurepip
      • platform: test__locale test__osx_support test_baseexception test_builtin test_cmath test_math test_mimetypes
    • sysconfig:
      • trace: test_trace
    • zipfile:
      • shutil: test_filecmp test_glob test_string_literals test_unicode_file test_zoneinfo

[ ] lib: cpython/Lib/zipapp.py
[ ] test: cpython/Lib/test/test_zipapp.py

dependencies:

  • zipapp (native: sys)
    • argparse, contextlib, os, pathlib, shutil, stat, zipfile

dependent tests: (1 tests)

  • zipapp: test_zipapp

Legend:

  • [+] path exists in CPython
  • [x] up-to-date, [ ] outdated

Contributor

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (2)
crates/vm/src/builtins/type.rs (1)

807-809: ⚠️ Potential issue | 🔴 Critical

Unsafe &*ptr is created before ownership upgrade.

At Line 807, the code builds &PyObject from a raw cache pointer before try_to_owned(). Under concurrent writer replacement/drop, this can create a reference to freed memory.

Suggested direction
- let obj: &PyObject = unsafe { &*ptr };
- if let Some(cloned) = obj.try_to_owned() {
+ // Avoid creating `&PyObject` before ownership is safely upgraded.
+ // Use a raw-pointer upgrade helper (e.g. `PyObject::try_to_owned_from_ptr(ptr)`)
+ // that performs safe refcount CAS first, then materializes owned ref.
+ if let Some(cloned) = unsafe { PyObject::try_to_owned_from_ptr(ptr) } {
#!/bin/bash
set -euo pipefail

echo "=== Cache-hit dereference site ==="
rg -n -C4 'let obj: &PyObject = unsafe \{ &\*ptr \};|try_to_owned\(' crates/vm/src/builtins/type.rs

echo
echo "=== Ownership upgrade APIs currently available ==="
rg -n -C4 'fn try_to_owned|safe_inc' crates/vm/src/object/core.rs crates/common/src/refcount.rs
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/type.rs` around lines 807 - 809, The code currently
forms a borrowed reference let obj: &PyObject = unsafe { &*ptr } before
attempting an ownership upgrade, which can produce a dangling reference under
concurrent replacement; instead, perform the ownership upgrade first from the
raw pointer (use the available try_to_owned API on the raw pointer/handle
returned by entry.value.load(Ordering::Relaxed)) and only if try_to_owned
succeeds obtain a safe reference from the owned handle (e.g.,
owned.as_ref()/deref) to use for validation; also compute same_ptr against the
original raw pointer value from entry.value.load so comparisons remain correct
(avoid creating any &PyObject from ptr until after successful try_to_owned).
crates/vm/src/stdlib/thread.rs (1)

295-303: ⚠️ Potential issue | 🟠 Major

_acquire_restore currently ignores saved owner and can accept cross-thread state.

At Line 302, owner is parsed but discarded. That lets a different thread call _acquire_restore with foreign saved state and proceed, which diverges from the expected restore contract.

Suggested fix
 fn _acquire_restore(&self, state: PyTupleRef, vm: &VirtualMachine) -> PyResult<()> {
     let [count_obj, owner_obj] = state.as_slice() else {
         return Err(
             vm.new_type_error("_acquire_restore() argument 1 must be a 2-item tuple")
         );
     };
     let count: usize = count_obj.clone().try_into_value(vm)?;
-    let _owner: u64 = owner_obj.clone().try_into_value(vm)?;
+    let owner: u64 = owner_obj.clone().try_into_value(vm)?;
+    if owner != current_thread_id() {
+        return Err(vm.new_runtime_error(
+            "cannot restore lock acquired by another thread".to_owned(),
+        ));
+    }
     if count == 0 {
         return Ok(());
     }
#!/bin/bash
set -euo pipefail

echo "=== Rust _thread RLock restore implementation ==="
sed -n '281,312p' crates/vm/src/stdlib/thread.rs

echo
echo "=== Python threading protocol reference ==="
sed -n '251,263p' Lib/threading.py
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/thread.rs` around lines 295 - 303, The _acquire_restore
implementation parses but ignores the saved owner; change it to validate that
the restored owner matches the current thread before accepting the state. In
function _acquire_restore, keep the parsed owner (currently assigned to _owner),
fetch the current thread's native id (or equivalent runtime thread id used
elsewhere in this module), and if owner != current_thread_id return an
Err(vm.new_runtime_error(...) or appropriate PyError) so cross-thread restore is
rejected; only when the owner matches and count is valid should you proceed with
the acquire restore logic. Ensure you reference and use the symbols
_acquire_restore, owner (parsed from state), and whatever helper/field provides
the current thread id in this module.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@crates/vm/src/builtins/type.rs`:
- Around line 807-809: The code currently forms a borrowed reference let obj:
&PyObject = unsafe { &*ptr } before attempting an ownership upgrade, which can
produce a dangling reference under concurrent replacement; instead, perform the
ownership upgrade first from the raw pointer (use the available try_to_owned API
on the raw pointer/handle returned by entry.value.load(Ordering::Relaxed)) and
only if try_to_owned succeeds obtain a safe reference from the owned handle
(e.g., owned.as_ref()/deref) to use for validation; also compute same_ptr
against the original raw pointer value from entry.value.load so comparisons
remain correct (avoid creating any &PyObject from ptr until after successful
try_to_owned).

In `@crates/vm/src/stdlib/thread.rs`:
- Around line 295-303: The _acquire_restore implementation parses but ignores
the saved owner; change it to validate that the restored owner matches the
current thread before accepting the state. In function _acquire_restore, keep
the parsed owner (currently assigned to _owner), fetch the current thread's
native id (or equivalent runtime thread id used elsewhere in this module), and
if owner != current_thread_id return an Err(vm.new_runtime_error(...) or
appropriate PyError) so cross-thread restore is rejected; only when the owner
matches and count is valid should you proceed with the acquire restore logic.
Ensure you reference and use the symbols _acquire_restore, owner (parsed from
state), and whatever helper/field provides the current thread id in this module.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 76a8b21b-1f36-480c-9237-ad3e45b51fbb

📥 Commits

Reviewing files that changed from the base of the PR and between 70169e8 and de01099.

⛔ Files ignored due to path filters (1)
  • Lib/test/test_threading.py is excluded by !Lib/**
📒 Files selected for processing (3)
  • crates/compiler-core/src/bytecode.rs
  • crates/vm/src/builtins/type.rs
  • crates/vm/src/stdlib/thread.rs

Instead of calling lock()/unlock() N times for recursion depth N,
keep parking_lot at 1 level and manage the count ourselves.
This makes acquire/release O(1) and matches CPython's
_PyRecursiveMutex approach (lock once + set level directly).
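The approach this commit describes can be sketched with a spinlock standing in for parking_lot (hypothetical types and names; the real implementation uses parking_lot's mutex): the underlying lock is taken exactly once, and re-entry only bumps a level counter, making recursive acquire/release O(1).

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, AtomicUsize, Ordering};

// O(1) recursive mutex sketch: one flag for the underlying lock plus an
// owner id and a level counter, so re-acquisition never loops N times.
struct RecursiveMutex {
    locked: AtomicBool,
    owner: AtomicU64, // 0 = unowned
    level: AtomicUsize,
}

impl RecursiveMutex {
    fn acquire(&self, me: u64) {
        if self.owner.load(Ordering::Relaxed) == me {
            self.level.fetch_add(1, Ordering::Relaxed); // re-entry: O(1)
            return;
        }
        // First entry for this thread: take the underlying lock once.
        while self
            .locked
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop();
        }
        self.owner.store(me, Ordering::Relaxed);
        self.level.store(1, Ordering::Relaxed);
    }

    fn release(&self, me: u64) {
        assert_eq!(self.owner.load(Ordering::Relaxed), me);
        if self.level.fetch_sub(1, Ordering::Relaxed) == 1 {
            self.owner.store(0, Ordering::Relaxed);
            self.locked.store(false, Ordering::Release); // last level: unlock
        }
    }
}
```

Restoring a saved recursion depth then becomes "lock once + store the level", matching the _PyRecursiveMutex shape the commit cites.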
Use addr_of! to access ref_count directly from a raw pointer
without forming &PyObject first. Applied in type cache and
instruction cache hit paths where the pointer may be stale.
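The `addr_of!` pattern from the commit above can be illustrated in isolation (the `Object` type and `strong_count` helper here are hypothetical, not the VM's types): the macro projects a field through a raw pointer without ever materializing a reference to the whole struct, which a plain `(&*ptr).ref_count` would do.

```rust
use std::ptr::addr_of;
use std::sync::atomic::{AtomicUsize, Ordering};

struct Object {
    ref_count: AtomicUsize,
    payload: u64,
}

/// # Safety
/// `ptr` must point to an allocation still live for reads of `ref_count`
/// (in the PR, the GC guarantees this window on cache-hit paths).
unsafe fn strong_count(ptr: *const Object) -> usize {
    // addr_of! computes the field address without creating `&Object`,
    // so no reference to possibly-stale memory is ever formed.
    let rc: *const AtomicUsize = unsafe { addr_of!((*ptr).ref_count) };
    unsafe { (*rc).load(Ordering::Acquire) }
}
```

The cache-hit path can then probe the count and only build `&PyObject` after a successful ownership upgrade.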
- Fix "minimising" -> "minimizing" for cspell
- xfail test_thread_safety: dict iteration races with
  concurrent GC mutations in _finalizer_registry
@youknowone youknowone merged commit 375b547 into RustPython:main Mar 5, 2026
14 checks passed
@youknowone youknowone deleted the gc-use-after-free branch March 5, 2026 11:33
