Conversation
📝 Walkthrough

This PR makes bytecode slots atomic for concurrent specialization, centralizes adaptive inline-caching and deoptimization flows, moves JIT cache storage to a mutex-protected optional, strengthens several atomic ordering choices across types/dicts, and distinguishes Python-level methods from native slots in slot lookup/update logic.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Frame as ExecutingFrame
    participant CodeUnits as Code Storage
    participant Adaptive as Adaptive Engine
    participant Specialized as Specialized Handler
    participant Base as Base Handler
    Frame->>CodeUnits: read_op(index) (Acquire)
    CodeUnits-->>Frame: opcode
    Frame->>Adaptive: check adaptive counter & candidate (Relaxed)
    Adaptive->>Adaptive: increment counter (Relaxed)
    Adaptive->>Specialized: attempt specialization
    alt Success
        Specialized->>CodeUnits: compare_exchange_op(index, expected, new) (Release)
        CodeUnits-->>Specialized: CAS success
        Specialized-->>Frame: execute specialized path
    else Fail
        Specialized->>CodeUnits: write base opcode (Release)
        CodeUnits-->>Base: base opcode written
        Specialized-->>Frame: fall back to base handler
    end
```
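The CAS-based specialization flow in the diagram above can be sketched as a standalone Rust program. Note this is an illustrative sketch: the `u8` opcode constants and the tiny `CodeUnits` wrapper stand in for the real `Instruction` enum and `UnsafeCell`-backed storage in the PR.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// Hypothetical opcode values standing in for Instruction variants.
const LOAD_ATTR: u8 = 10;
const LOAD_ATTR_INSTANCE: u8 = 11;

struct CodeUnits(Vec<AtomicU8>);

impl CodeUnits {
    // Acquire read pairs with the Release CAS below, so a thread that
    // observes the specialized opcode also observes any cache writes
    // sequenced before the CAS.
    fn read_op(&self, index: usize) -> u8 {
        self.0[index].load(Ordering::Acquire)
    }

    // Only one thread wins the specialization; losers keep the base op.
    fn compare_exchange_op(&self, index: usize, expected: u8, new: u8) -> bool {
        self.0[index]
            .compare_exchange(expected, new, Ordering::Release, Ordering::Relaxed)
            .is_ok()
    }
}

fn main() {
    let units = CodeUnits(vec![AtomicU8::new(LOAD_ATTR)]);
    // First specialization attempt succeeds...
    assert!(units.compare_exchange_op(0, LOAD_ATTR, LOAD_ATTR_INSTANCE));
    // ...a racing second attempt loses and falls back to the base handler.
    assert!(!units.compare_exchange_op(0, LOAD_ATTR, LOAD_ATTR_INSTANCE));
    assert_eq!(units.read_op(0), LOAD_ATTR_INSTANCE);
}
```

The CAS makes the "alt Success / else Fail" branch in the diagram explicit: the failure ordering is Relaxed because a losing thread does not publish anything.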
```mermaid
sequenceDiagram
    participant Setter as PyType SetAttr
    participant Version as Version Tag
    participant Readers as Concurrent Readers
    Setter->>Version: modified() (SeqCst store)
    Version->>Readers: readers load version (Acquire)
    Setter->>Setter: perform attribute insert/remove
    Setter-->>Readers: changes ordered via version operations
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (inconclusive)
📦 Library Dependencies

The following Lib/ modules were modified: cpython/Lib/doctest.py (33 dependent tests).
Force-pushed: b3456e0 → 2954b60
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/vm/src/builtins/type.rs (1)
2171-2196: ⚠️ Potential issue | 🔴 Critical: Post-mutation cache invalidation is missing after pre-invalidation.

At line 2174, `zelf.modified()` runs before the attribute write. `find_name_in_mro()` can reassign `tp_version_tag` and repopulate cache entries from pre-mutation state in that window. Since this path has no second invalidation after commit, stale cache entries can survive.

💡 Proposed fix (keep pre-invalidate, add post-commit invalidate)

```diff
 // Invalidate inline caches before modifying attributes.
 // This ensures other threads see the version invalidation before
 // any attribute changes, preventing use-after-free of cached descriptors.
 zelf.modified();
 if let PySetterValue::Assign(value) = value {
     zelf.attributes.write().insert(attr_name, value);
 } else {
     let prev_value = zelf.attributes.write().shift_remove(attr_name); // TODO: swap_remove applicable?
     if prev_value.is_none() {
         return Err(vm.new_attribute_error(format!(
             "type object '{}' has no attribute '{}'",
             zelf.name(),
             attr_name,
         )));
     }
 }
 if attr_name.as_wtf8().starts_with("__") && attr_name.as_wtf8().ends_with("__") {
     if assign {
         zelf.update_slot::<true>(attr_name, &vm.ctx);
     } else {
         zelf.update_slot::<false>(attr_name, &vm.ctx);
     }
 }
+// Flush any entries repopulated between pre-invalidation and commit.
+zelf.modified();
 Ok(())
```

Verification script:

```bash
#!/bin/bash
set -euo pipefail
# Verify invalidation placement in setattro
rg -n -C8 'fn setattro|modified\(\)|attributes\.write\(\)\.(insert|shift_remove)' crates/vm/src/builtins/type.rs
# Verify cache/version reassignment path that can run concurrently
rg -n -C8 'fn find_name_in_mro|assign_version_tag|tp_version_tag\.load|tp_version_tag\.store' crates/vm/src/builtins/type.rs
```

Expected result: `modified()` appears before mutation in `setattro`, and no post-mutation invalidation currently exists.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/builtins/type.rs` around lines 2171 - 2196, The pre-mutation cache invalidation (zelf.modified()) in setattro is currently invoked before the attribute write but there is no post-commit invalidation, allowing find_name_in_mro to repopulate caches from the old state; fix this by keeping the existing pre-invalidation and adding a second invalidation immediately after the mutation commit (after attributes.write().insert or shift_remove completes and before calling update_slot::<...>), i.e., invoke zelf.modified() again to ensure tp_version_tag/cache entries are refreshed and prevent stale cache reuse.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/compiler-core/src/bytecode.rs`:
- Around line 509-510: The code casts CodeUnit pointers to *const AtomicU16 (see
usages like units.as_ptr().wrapping_add(index) as *const AtomicU16 and the
subsequent .store calls), but CodeUnit is currently #[repr(C)] with two u8
fields and only 1-byte alignment, which violates atomic alignment requirements;
fix by changing the struct declaration to #[repr(C, align(2))] for CodeUnit and
add a compile-time assertion to ensure alignment (e.g.
assert_eq!(std::mem::align_of::<CodeUnit>(), 2) or a const_assert) so the
AtomicU16 casts are safe.
- Around line 485-499: The public safe readers (read_op, read_arg,
read_adaptive_counter) currently compute raw pointers with wrapping_add(index)
and dereference without bounds checks, which can cause UB on out-of-range
indices; update each function to load units first (as in read_cache_u16) and
assert the index (and any additional byte offsets, e.g., index+1 for arg) is
within units.len() before computing pointers and dereferencing so out-of-bounds
inputs panic instead of producing UB, keeping the existing pointer/dereference
logic otherwise.
In `@crates/vm/src/frame.rs`:
- Around line 4717-4723: The specialized LoadGlobalBuiltin path incorrectly
accepts dict subclasses via downcast_ref::<PyDict>(), which can bypass subclass
mapping; change the guard to require an exact PyDict type before doing the
direct dict lookup: first check that self.builtins is exactly the dict type
(e.g. via an exact-type check against vm.ctx.types.dict_type or a
PyDict::is_exact method), only then downcast_ref::<PyDict>() and use
cached_builtins_ver/current_builtins_ver and the fast get_item_opt path;
otherwise fall back to the generic global lookup; apply the same exact-type
check change to the other occurrence mentioned (around the 7313–7319 region).
- Around line 2442-2445: The current synchronization using
self.code.quickened.swap(true, atomic::Ordering::Relaxed) plus
atomic::fence(atomic::Ordering::Release) is misleading because there is no
matching Acquire reader; update the quickening logic in frame.rs: either remove
the explicit atomics (drop the swap + fence) if the actual memory ordering is
already provided by the replace_op (Release) ↔ read_op (Acquire) protocol in
CodeUnits, or make the intent explicit by switching to an acquire-aware pattern
(e.g., use swap/store with Release paired with an explicit Acquire load where
readers check quickened) and add a short comment referencing the CodeUnits
protocol; target the quickening site (self.code.quickened,
self.code.instructions.quicken(), atomic::fence) when making the change.
---
Outside diff comments:
In `@crates/vm/src/builtins/type.rs`:
- Around line 2171-2196: The pre-mutation cache invalidation (zelf.modified())
in setattro is currently invoked before the attribute write but there is no
post-commit invalidation, allowing find_name_in_mro to repopulate caches from
the old state; fix this by keeping the existing pre-invalidation and adding a
second invalidation immediately after the mutation commit (after
attributes.write().insert or shift_remove completes and before calling
update_slot::<...>), i.e., invoke zelf.modified() again to ensure
tp_version_tag/cache entries are refreshed and prevent stale cache reuse.
ℹ️ Review info
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- crates/compiler-core/src/bytecode.rs
- crates/vm/src/builtins/frame.rs
- crates/vm/src/builtins/function.rs
- crates/vm/src/builtins/type.rs
- crates/vm/src/dict_inner.rs
- crates/vm/src/frame.rs
- crates/vm/src/types/slot.rs
```rust
pub fn read_op(&self, index: usize) -> Instruction {
    let units = unsafe { &*self.0.get() };
    let ptr = units.as_ptr().wrapping_add(index) as *const AtomicU8;
    let byte = unsafe { &*ptr }.load(Ordering::Acquire);
    // SAFETY: Only valid Instruction values are stored via replace_op/compare_exchange_op.
    unsafe { mem::transmute::<u8, Instruction>(byte) }
}

/// Atomically read the arg byte at `index` with Relaxed ordering.
pub fn read_arg(&self, index: usize) -> OpArgByte {
    let units = unsafe { &*self.0.get() };
    let ptr = units.as_ptr().wrapping_add(index) as *const u8;
    let arg_ptr = unsafe { ptr.add(1) } as *const AtomicU8;
    OpArgByte::from(unsafe { &*arg_ptr }.load(Ordering::Relaxed))
}
```
🧩 Analysis chain

🏁 Scripts executed:

```bash
cd crates/compiler-core && wc -l src/bytecode.rs
cd crates/compiler-core && sed -n '480,510p' src/bytecode.rs
cd crates/compiler-core && sed -n '564,580p' src/bytecode.rs
cd crates/compiler-core && sed -n '440,460p' src/bytecode.rs
cd crates/compiler-core && sed -n '1,30p' src/bytecode.rs
cd crates/compiler-core && rg -A 3 -B 3 "read_op\(" src/bytecode.rs | head -40
cd crates/compiler-core && rg "read_arg\(" src/bytecode.rs | head -20
cd crates/compiler-core && rg "read_adaptive_counter\(" src/bytecode.rs | head -20
cd crates/compiler-core && sed -n '320,380p' src/bytecode.rs
cd crates/compiler-core && grep -n "struct CodeUnit\|struct CodeUnits\|#\[repr" src/bytecode.rs | head -20
cd crates/compiler-core && grep -rn "read_op\|read_arg\|read_adaptive_counter" src/ --include="*.rs" | grep -v "^\s*//" | head -40
cd crates/compiler-core && sed -n '630,680p' src/bytecode.rs
cd crates/compiler-core && sed -n '380,440p' src/bytecode.rs
cd crates/compiler-core && sed -n '590,640p' src/bytecode.rs
cd crates/compiler-core && grep -n "pub fn len\|pub fn is_empty\|impl.*CodeUnits" src/bytecode.rs
cd crates/compiler-core && sed -n '373,430p' src/bytecode.rs
cd crates/compiler-core && grep -rn "\.read_op\|\.read_arg\|\.read_adaptive_counter" . --include="*.rs" | grep -v "src/bytecode.rs"
cd crates/compiler-core && sed -n '446,480p' src/bytecode.rs
rg -r crates/ "\.read_op\(|\.read_arg\(|\.read_adaptive_counter\(" --include="*.rs" | head -50
rg "CodeUnits" --include="*.rs" -l | head -20
rg "\.read_op\(|\.read_arg\(|\.read_adaptive_counter\(" -t rs
rg "CodeUnits" -t rs -l
rg "\.read_op\(|\.read_arg\(|\.read_adaptive_counter\("
rg "CodeUnits"
sed -n '1,50p' crates/vm/src/frame.rs
rg -B 5 -A 5 "\.read_op\(" crates/vm/src/frame.rs | head -80
sed -n '700,800p' crates/vm/src/frame.rs | grep -A 10 "fn eval"
rg "idx.*read_op|instr_idx.*read_op" crates/vm/src/frame.rs -B 3 -A 3 | head -100
sed -n '485,500p' crates/compiler-core/src/bytecode.rs
sed -n '569,580p' crates/compiler-core/src/bytecode.rs
rg "while.*<.*len\(\)|for.*in.*0.*len" crates/vm/src/frame.rs -B 2 -A 5 | head -60
rg "pub fn" crates/compiler-core/src/bytecode.rs | head -20
sed -n '435,445p' crates/compiler-core/src/bytecode.rs
sed -n '435,500p' crates/compiler-core/src/bytecode.rs | grep -E "fn len|fn is_empty"
rg "impl.*Deref.*CodeUnits" crates/compiler-core/src/bytecode.rs -A 15
grep -n "pub fn len\|pub fn is_empty" crates/compiler-core/src/bytecode.rs
sed -n '435,450p' crates/compiler-core/src/bytecode.rs
rg "\.len\(\)" crates/compiler-core/src/bytecode.rs | head -10
sed -n '515,570p' crates/compiler-core/src/bytecode.rs
```
Safe readers allow potential UB on invalid index.
read_op, read_arg, and read_adaptive_counter are public safe functions that dereference raw pointers derived from wrapping_add(index) without bounds validation. Out-of-range input will cause undefined behavior instead of panicking. This contradicts the pattern already established in the same file for read_cache_u16, which includes an assertion.
Add bounds checks to all three functions:
Proposed fix

```diff
 pub fn read_op(&self, index: usize) -> Instruction {
     let units = unsafe { &*self.0.get() };
+    assert!(index < units.len(), "read_op: index out of bounds");
     let ptr = units.as_ptr().wrapping_add(index) as *const AtomicU8;
     let byte = unsafe { &*ptr }.load(Ordering::Acquire);
     // SAFETY: Only valid Instruction values are stored via replace_op/compare_exchange_op.
     unsafe { mem::transmute::<u8, Instruction>(byte) }
 }

 pub fn read_arg(&self, index: usize) -> OpArgByte {
     let units = unsafe { &*self.0.get() };
+    assert!(index < units.len(), "read_arg: index out of bounds");
     let ptr = units.as_ptr().wrapping_add(index) as *const u8;
     let arg_ptr = unsafe { ptr.add(1) } as *const AtomicU8;
     OpArgByte::from(unsafe { &*arg_ptr }.load(Ordering::Relaxed))
 }

 pub fn read_adaptive_counter(&self, index: usize) -> u8 {
     let units = unsafe { &*self.0.get() };
+    assert!(index < units.len(), "read_adaptive_counter: index out of bounds");
     let ptr = units.as_ptr().wrapping_add(index) as *const u8;
     let arg_ptr = unsafe { ptr.add(1) } as *const AtomicU8;
     unsafe { &*arg_ptr }.load(Ordering::Relaxed)
 }
```

Per coding guidelines: "Follow Rust best practices for error handling and memory management".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/compiler-core/src/bytecode.rs` around lines 485 - 499, The public safe
readers (read_op, read_arg, read_adaptive_counter) currently compute raw
pointers with wrapping_add(index) and dereference without bounds checks, which
can cause UB on out-of-range indices; update each function to load units first
(as in read_cache_u16) and assert the index (and any additional byte offsets,
e.g., index+1 for arg) is within units.len() before computing pointers and
dereferencing so out-of-bounds inputs panic instead of producing UB, keeping the
existing pointer/dereference logic otherwise.
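The bounds-checked reader pattern proposed above can be demonstrated in isolation. This is a sketch: the `Units` wrapper over `Vec<AtomicU8>` stands in for the real `CodeUnits`, which uses `UnsafeCell` and raw pointer arithmetic.

```rust
use std::panic;
use std::sync::atomic::{AtomicU8, Ordering};

// A bounds-checked atomic reader: out-of-range indices panic
// deterministically instead of dereferencing past the allocation
// (which would be undefined behavior with raw pointers).
struct Units(Vec<AtomicU8>);

impl Units {
    fn read_op(&self, index: usize) -> u8 {
        assert!(index < self.0.len(), "read_op: index out of bounds");
        self.0[index].load(Ordering::Acquire)
    }
}

fn main() {
    let units = Units(vec![AtomicU8::new(7)]);
    assert_eq!(units.read_op(0), 7);
    // An out-of-range read now panics instead of being UB.
    assert!(panic::catch_unwind(|| units.read_op(1)).is_err());
}
```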
```rust
let ptr = units.as_ptr().wrapping_add(index) as *const AtomicU16;
unsafe { &*ptr }.store(value, Ordering::Relaxed);
```
🧩 Analysis chain

🏁 Scripts executed:

```bash
# First, locate and read the relevant section of the bytecode file
fd bytecode.rs | head -1
# Read the file to check the CodeUnit struct and the atomic operations around the mentioned lines
cat -n crates/compiler-core/src/bytecode.rs | sed -n '1,50p'
# Get the CodeUnit struct definition and surrounding code
cat -n crates/compiler-core/src/bytecode.rs | sed -n '30,100p'
# Check lines 509-525 to see the actual code
cat -n crates/compiler-core/src/bytecode.rs | sed -n '500,530p'
# Check the size and alignment assertions
cat -n crates/compiler-core/src/bytecode.rs | grep -A 3 "const _:"
# Find the CodeUnit struct definition
cat -n crates/compiler-core/src/bytecode.rs | sed -n '340,360p'
# Search for the CodeUnit struct definition more broadly
rg "struct CodeUnit" -A 5 crates/compiler-core/src/bytecode.rs
# Check if there's any alignment attribute or repr clause on CodeUnit
rg "CodeUnit" crates/compiler-core/src/bytecode.rs | head -20
# Check what Instruction and OpArgByte are
rg "type Instruction|struct Instruction|enum Instruction" crates/compiler-core/src/bytecode.rs
rg "type OpArgByte|struct OpArgByte|enum OpArgByte" crates/compiler-core/src/bytecode.rs
# Look at the instruction module for Instruction definition
cat -n crates/compiler-core/src/bytecode/instruction.rs | head -50
# Look at the oparg module for OpArgByte
cat -n crates/compiler-core/src/bytecode/oparg.rs | head -30
# Check what the alignment of AtomicU16 is and the current alignment of CodeUnit
cat > /tmp/check_alignment.rs << 'EOF'
use core::{mem, sync::atomic::AtomicU16};

#[repr(C)]
#[derive(Copy, Clone, Debug)]
pub struct CodeUnit {
    pub op: u8,
    pub arg: u8,
}

fn main() {
    println!("CodeUnit size: {}", mem::size_of::<CodeUnit>());
    println!("CodeUnit align: {}", mem::align_of::<CodeUnit>());
    println!("AtomicU16 size: {}", mem::size_of::<AtomicU16>());
    println!("AtomicU16 align: {}", mem::align_of::<AtomicU16>());
    println!("u16 align: {}", mem::align_of::<u16>());
}
EOF
rustc /tmp/check_alignment.rs -o /tmp/check_alignment && /tmp/check_alignment
# Verify the actual lines 509-510 and 521-523 in context
cat -n crates/compiler-core/src/bytecode.rs | sed -n '507,524p'
# Check if there are any other alignment-related comments or safety notes
rg "align|Alignment" crates/compiler-core/src/bytecode.rs -B 2 -A 2
# Verify the actual Instruction and OpArgByte alignment requirements
rg "pub struct Instruction|pub enum Instruction" crates/compiler-core/src/bytecode/instruction.rs -A 1
# Look for any repr attributes on Instruction
cat -n crates/compiler-core/src/bytecode/instruction.rs | sed -n '16,30p'
# Check if there are any tests or documentation about alignment
rg "align|unsafe.*AtomicU16" crates/compiler-core/src/ -B 3 -A 3
# Check what the actual byte layout is for CodeUnit
python3 << 'EOF'
# Instruction is #[repr(u8)] - 1 byte
# OpArgByte is #[repr(transparent)] wrapping u8 - 1 byte
# So CodeUnit = 1 + 1 = 2 bytes, but with #[repr(C)], alignment is 1 byte
print("With #[repr(C)] and two u8 fields:")
print("  Size: 2 bytes")
print("  Alignment: 1 byte (minimum of field alignments)")
print()
print("AtomicU16 requires:")
print("  Size: 2 bytes")
print("  Alignment: 2 bytes (natural alignment for atomic operations)")
print()
print("Issue: Casting 1-byte aligned pointer to 2-byte aligned is UB")
EOF
# Check if there are any other unsafe pointer casts in the file with similar issues
rg "as \*const AtomicU" crates/compiler-core/src/bytecode.rs
# Check the SAFETY comment around these operations
cat -n crates/compiler-core/src/bytecode.rs | sed -n '495,535p'
```
Add explicit 2-byte alignment guarantee to CodeUnit struct.
The casts to *const AtomicU16 at lines 509 and 521 require 2-byte alignment, but CodeUnit with #[repr(C)] and two u8 fields has only 1-byte alignment. This violates atomic type requirements and is undefined behavior on any target. The current size assertion only checks that CodeUnit is 2 bytes, not that it's 2-byte aligned.
Add align(2) to the repr attribute and assert alignment:
Proposed fix

```diff
-#[repr(C)]
+#[repr(C, align(2))]
 #[derive(Copy, Clone, Debug)]
 pub struct CodeUnit {
     pub op: Instruction,
     pub arg: OpArgByte,
 }

-const _: () = assert!(mem::size_of::<CodeUnit>() == 2);
+const _: () = {
+    assert!(mem::size_of::<CodeUnit>() == 2);
+    assert!(mem::align_of::<CodeUnit>() >= mem::align_of::<u16>());
+};
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
let ptr = units.as_ptr().wrapping_add(index) as *const AtomicU16;
unsafe { &*ptr }.store(value, Ordering::Relaxed);

#[repr(C, align(2))]
#[derive(Copy, Clone, Debug)]
pub struct CodeUnit {
    pub op: Instruction,
    pub arg: OpArgByte,
}

const _: () = {
    assert!(mem::size_of::<CodeUnit>() == 2);
    assert!(mem::align_of::<CodeUnit>() >= mem::align_of::<u16>());
};
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/compiler-core/src/bytecode.rs` around lines 509 - 510, The code casts
CodeUnit pointers to *const AtomicU16 (see usages like
units.as_ptr().wrapping_add(index) as *const AtomicU16 and the subsequent .store
calls), but CodeUnit is currently #[repr(C)] with two u8 fields and only 1-byte
alignment, which violates atomic alignment requirements; fix by changing the
struct declaration to #[repr(C, align(2))] for CodeUnit and add a compile-time
assertion to ensure alignment (e.g. assert_eq!(std::mem::align_of::<CodeUnit>(),
2) or a const_assert) so the AtomicU16 casts are safe.
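The effect of the proposed `#[repr(C, align(2))]` change can be checked with a self-contained sketch. The field types are simplified to `u8` in place of the real `Instruction`/`OpArgByte` types, which are one byte each.

```rust
use std::mem;
use std::sync::atomic::AtomicU16;

// Simplified stand-in for the real CodeUnit { op: Instruction, arg: OpArgByte }.
#[repr(C, align(2))]
#[derive(Copy, Clone, Debug)]
pub struct CodeUnit {
    pub op: u8,
    pub arg: u8,
}

// Compile-time checks mirroring the proposed const assertion.
const _: () = {
    assert!(mem::size_of::<CodeUnit>() == 2);
    assert!(mem::align_of::<CodeUnit>() >= mem::align_of::<u16>());
};

fn main() {
    // With align(2), every CodeUnit address is valid for an AtomicU16 view.
    assert_eq!(mem::align_of::<CodeUnit>(), 2);
    assert_eq!(mem::align_of::<AtomicU16>(), 2);

    // Without align(2), #[repr(C)] over two u8 fields gives only
    // 1-byte alignment, which is what made the cast unsound.
    #[repr(C)]
    struct Unaligned {
        op: u8,
        arg: u8,
    }
    assert_eq!(mem::align_of::<Unaligned>(), 1);
}
```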
Force-pushed: 2954b60 → 0b66b19
Optimizes user-defined class instantiation MyClass(args...) when tp_new == object.__new__ and __init__ is a simple PyFunction. Allocates the object directly and calls __init__ via invoke_exact_args, bypassing the generic type.__call__ dispatch path.
Change jitted_code from OnceCell to PyMutex<Option<CompiledCode>> so it can be cleared on __code__ assignment. The setter now sets the cached JIT code to None to prevent executing stale machine code.
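The `OnceCell` → mutex-protected `Option` change can be sketched with std types. This is an illustrative sketch: `CompiledCode` and the method names are stand-ins, and RustPython uses its own `PyMutex` rather than `std::sync::Mutex`.

```rust
use std::sync::Mutex;

// Hypothetical stand-in for the JIT compiler's output.
#[derive(Clone)]
struct CompiledCode(String);

struct Function {
    // A OnceCell can only be set once; Mutex<Option<_>> can also be
    // cleared, which is what `__code__` assignment needs.
    jitted_code: Mutex<Option<CompiledCode>>,
}

impl Function {
    // Replacing __code__ must drop any stale machine code.
    fn set_code(&self) {
        *self.jitted_code.lock().unwrap() = None;
    }

    fn jit(&self, code: CompiledCode) {
        *self.jitted_code.lock().unwrap() = Some(code);
    }

    fn jitted(&self) -> Option<CompiledCode> {
        self.jitted_code.lock().unwrap().clone()
    }
}

fn main() {
    let f = Function { jitted_code: Mutex::new(None) };
    f.jit(CompiledCode("machine code".into()));
    assert!(f.jitted().is_some());
    f.set_code(); // __code__ assignment invalidates the JIT cache
    assert!(f.jitted().is_none());
}
```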
- range iterator: deduplicate fast_next/next_fast
- Replace raw pointer reads/writes in CodeUnits with atomic operations (AtomicU8/AtomicU16) for thread safety
- Add read_op (Acquire), read_arg (Relaxed), compare_exchange_op
- Use Release ordering in replace_op to synchronize cache writes
- Dispatch loop reads opcodes atomically via read_op/read_arg
- Fix adaptive counter access: use read/write_adaptive_counter instead of read/write_cache_u16 (was reading wrong bytes)
- Add pre-check guards to all specialize_* functions to prevent concurrent specialization races
- Move modified() before attribute changes in type.__setattr__ to prevent use-after-free of cached descriptors
- Use SeqCst ordering in modified() for version invalidation
- Add Release fence after quicken() initialization
For __getattribute__: only use getattro_wrapper when the type itself defines the attribute; otherwise inherit native slot from base class via MRO. For __setattr__/__delattr__: only store setattro_wrapper when the type has its own __setattr__ or __delattr__; otherwise keep the inherited base slot.
write_cache_u32 at cache_base+3 writes 2 code units (positions 3 and 4), but STORE_ATTR only has 4 cache entries (positions 0-3). This overwrites the next instruction with the upper 16 bits of the slot offset. Changed to write_cache_u16/read_cache_u16 since member descriptor offsets fit within u16 (max 65535 bytes).
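A hedged sketch of the narrowing this commit relies on (the `cache_offset` helper is hypothetical, illustrating why a u16 cache slot suffices for member descriptor offsets and where a guard would still be prudent):

```rust
// Member-descriptor offsets are cached in a single 16-bit cache slot.
// try_from makes the narrowing explicit: an offset that doesn't fit
// yields None instead of being silently truncated.
fn cache_offset(offset: usize) -> Option<u16> {
    u16::try_from(offset).ok()
}

fn main() {
    assert_eq!(cache_offset(64), Some(64));
    assert_eq!(cache_offset(65_535), Some(u16::MAX)); // the stated max
    // Too large: specialization should back off, not write a wrong slot.
    assert_eq!(cache_offset(65_536), None);
}
```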
has_python_cmp incorrectly treated method_descriptor as Python-level comparison methods, causing richcompare slot to use wrapper dispatch instead of inheriting the native slot.
partial_cmp returns None for NaN comparisons. is_some_and incorrectly returned false for all NaN comparisons, but NaN != x should be true per IEEE 754 semantics.
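The `is_some_and` pitfall is easy to reproduce with plain `f64` comparisons; the two helper names below are illustrative:

```rust
use std::cmp::Ordering;

// Correct: NaN makes partial_cmp return None, and None != Some(Equal)
// is true, matching IEEE 754 (NaN != x is always true).
fn ne_correct(a: f64, b: f64) -> bool {
    a.partial_cmp(&b) != Some(Ordering::Equal)
}

// Buggy: is_some_and maps the None from a NaN comparison to false,
// so NaN != x wrongly evaluates to false.
fn ne_buggy(a: f64, b: f64) -> bool {
    a.partial_cmp(&b).is_some_and(|o| o != Ordering::Equal)
}

fn main() {
    assert!(ne_correct(f64::NAN, 1.0)); // IEEE 754: NaN != 1.0 is true
    assert!(!ne_buggy(f64::NAN, 1.0)); // the bug: returned false
    assert!(!ne_correct(1.0, 1.0));
    assert!(ne_correct(1.0, 2.0));
}
```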
Change lookup_slot_in_mro to return a 3-state SlotLookupResult enum (NativeSlot/PythonMethod/NotFound) instead of Option<T>. Previously, both "found a Python-level method" and "found nothing" returned None, causing incorrect slot inheritance. For example, class Test(Mixin, TestCase) would inherit object.slot_init from Mixin via inherit_from_mro instead of using init_wrapper to dispatch TestCase.__init__. Apply this fix consistently to all slot update sites: update_main_slot!, update_sub_slot!, TpGetattro, TpSetattro, TpDescrSet, TpHash, TpRichcompare, SqAssItem, MpAssSubscript.
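The three-state idea can be sketched with a generic enum. The variant names follow the description above; the `resolve` helper is illustrative, showing why collapsing `PythonMethod` and `NotFound` into one `None` produced wrong inheritance:

```rust
// Option<T> conflated "found a Python-level method" with "found nothing",
// so a class whose __init__ must go through a wrapper could instead
// inherit a native slot from elsewhere in the MRO.
enum SlotLookupResult<T> {
    NativeSlot(T),
    PythonMethod,
    NotFound,
}

fn resolve<T>(found: SlotLookupResult<T>, inherited: T) -> Option<T> {
    match found {
        // A native slot can be used directly.
        SlotLookupResult::NativeSlot(f) => Some(f),
        // A Python-level method needs wrapper dispatch, not the
        // inherited native slot.
        SlotLookupResult::PythonMethod => None,
        // Truly absent: inherit from the MRO.
        SlotLookupResult::NotFound => Some(inherited),
    }
}

fn main() {
    assert_eq!(resolve(SlotLookupResult::NativeSlot(1), 0), Some(1));
    // PythonMethod must NOT fall back to the inherited native slot.
    assert_eq!(resolve::<i32>(SlotLookupResult::PythonMethod, 0), None);
    assert_eq!(resolve(SlotLookupResult::NotFound, 0), Some(0));
}
```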
- deoptimize() / deoptimize_at(): replace specialized op with base op
- adaptive(): decrement warmup counter or call specialize function
- commit_specialization(): replace op on success, backoff on failure
- execute_binary_op_int() / execute_binary_op_float(): typed binary ops

Removes 10 duplicate deoptimize_* functions, consolidates 13 adaptive counter blocks, 6 binary op handlers, and 7 specialize tail patterns. Also replaces inline deopt blocks in LoadAttr/StoreAttr handlers.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
crates/vm/src/frame.rs (2)
7391-7407: ⚠️ Potential issue | 🔴 Critical: Guard `UnpackSequenceTwoTuple` behind expected arity (UNPACK_SEQUENCE 2)

`Instruction::UnpackSequenceTwoTuple` is selected from runtime tuple length only. If this opcode is `UNPACK_SEQUENCE N` where `N != 2`, a later execution can incorrectly succeed with 2 elements instead of raising. This is a correctness break and can corrupt stack expectations.

💡 Proposed fix

```diff
@@
-            Instruction::UnpackSequence { size } => {
-                self.adaptive(|s, ii, cb| s.specialize_unpack_sequence(vm, ii, cb));
-                self.unpack_sequence(size.get(arg), vm)
+            Instruction::UnpackSequence { size } => {
+                let expected_size = size.get(arg);
+                self.adaptive(|s, ii, cb| s.specialize_unpack_sequence(vm, expected_size, ii, cb));
+                self.unpack_sequence(expected_size, vm)
             }
@@
-    fn specialize_unpack_sequence(
+    fn specialize_unpack_sequence(
         &mut self,
         vm: &VirtualMachine,
+        expected_size: u32,
         instr_idx: usize,
         cache_base: usize,
     ) {
@@
-        let new_op = if let Some(tuple) = obj.downcast_ref_if_exact::<PyTuple>(vm) {
-            if tuple.len() == 2 {
+        let new_op = if let Some(tuple) = obj.downcast_ref_if_exact::<PyTuple>(vm) {
+            if expected_size == 2 && tuple.len() == 2 {
                 Some(Instruction::UnpackSequenceTwoTuple)
             } else {
                 Some(Instruction::UnpackSequenceTuple)
             }
```
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/frame.rs` around lines 7391 - 7407, The specialization picks Instruction::UnpackSequenceTwoTuple based solely on the runtime tuple length, which can violate the original UNPACK_SEQUENCE N expectation; update specialize_unpack_sequence to first inspect the existing instruction at instr_idx (match on Instruction::UnpackSequence { count }) and only choose UnpackSequenceTwoTuple when count == 2 in addition to the obj being an exact PyTuple of length 2; keep all other paths unchanged so UNPACK_SEQUENCE N (N != 2) still preserves the generic UnpackSequence behavior.
7475-7486: ⚠️ Potential issue | 🟠 Major: Avoid unchecked narrowing of slot offset in `StoreAttrSlot` cache

`offset as u16` can truncate. If that happens, specialization can write to the wrong slot index at runtime. Please guard the conversion and back off specialization when it does not fit.

💡 Proposed fix

```diff
 if let Some(ref descr) = cls_attr
     && let Some(member_descr) = descr.downcast_ref::<PyMemberDescriptor>()
     && let MemberGetter::Offset(offset) = member_descr.member.getter
 {
-    unsafe {
-        self.code
-            .instructions
-            .write_cache_u32(cache_base + 1, type_version);
-        self.code
-            .instructions
-            .write_cache_u16(cache_base + 3, offset as u16);
-        self.code
-            .instructions
-            .replace_op(instr_idx, Instruction::StoreAttrSlot);
-    }
+    if let Ok(slot_offset) = u16::try_from(offset) {
+        unsafe {
+            self.code
+                .instructions
+                .write_cache_u32(cache_base + 1, type_version);
+            self.code
+                .instructions
+                .write_cache_u16(cache_base + 3, slot_offset);
+            self.code
+                .instructions
+                .replace_op(instr_idx, Instruction::StoreAttrSlot);
+        }
+    } else {
+        unsafe {
+            self.code
+                .instructions
+                .write_adaptive_counter(cache_base, ADAPTIVE_BACKOFF_VALUE);
+        }
+    }
 } else {
```

As per coding guidelines: "Follow Rust best practices for error handling and memory management".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/frame.rs` around lines 7475 - 7486, The code currently casts offset as u16 unconditionally in the StoreAttrSlot cache path (see cls_attr, member_descr and MemberGetter::Offset(offset) and the call to write_cache_u16), which can truncate large offsets; add a guard that checks whether offset fits in u16 (e.g. offset <= u16::MAX) before taking the specialization/unsafe path and writing the cache (write_cache_u32/write_cache_u16); if the offset does not fit, back off the specialization and fall through to the generic store-attribute path instead of writing a truncated offset into the cache.
♻️ Duplicate comments (2)
crates/compiler-core/src/bytecode.rs (2)
485-499: ⚠️ Potential issue | 🔴 Critical

Safe readers dereference raw pointers without bounds validation.

Lines 485-499 and 567-574 are safe methods, but they compute/deref raw pointers without checking `index < len`. Out-of-range calls can produce UB instead of a panic.

Proposed fix

```diff
 pub fn read_op(&self, index: usize) -> Instruction {
     let units = unsafe { &*self.0.get() };
+    assert!(index < units.len(), "read_op: index out of bounds");
     let ptr = units.as_ptr().wrapping_add(index) as *const AtomicU8;
     let byte = unsafe { &*ptr }.load(Ordering::Acquire);
     unsafe { mem::transmute::<u8, Instruction>(byte) }
 }

 pub fn read_arg(&self, index: usize) -> OpArgByte {
     let units = unsafe { &*self.0.get() };
+    assert!(index < units.len(), "read_arg: index out of bounds");
     let ptr = units.as_ptr().wrapping_add(index) as *const u8;
     let arg_ptr = unsafe { ptr.add(1) } as *const AtomicU8;
     OpArgByte::from(unsafe { &*arg_ptr }.load(Ordering::Relaxed))
 }

 pub fn read_adaptive_counter(&self, index: usize) -> u8 {
     let units = unsafe { &*self.0.get() };
+    assert!(index < units.len(), "read_adaptive_counter: index out of bounds");
     let ptr = units.as_ptr().wrapping_add(index) as *const u8;
     let arg_ptr = unsafe { ptr.add(1) } as *const AtomicU8;
     unsafe { &*arg_ptr }.load(Ordering::Relaxed)
 }
```

Verification script

```bash
#!/bin/bash
set -e
sed -n '483,500p' crates/compiler-core/src/bytecode.rs
sed -n '567,575p' crates/compiler-core/src/bytecode.rs
```

As per coding guidelines: "Follow Rust best practices for error handling and memory management".
Also applies to: 567-574
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/compiler-core/src/bytecode.rs` around lines 485 - 499, The safe reader methods read_op and read_arg perform raw pointer arithmetic/derefs without validating index, which can cause UB for out-of-range indices; modify both functions to first obtain units (as now), check that index < units.len() and if not panic with a clear message (e.g., index out of bounds) before doing any pointer arithmetic/dereference, then proceed with the existing AtomicU8 load and transmute/OpArgByte conversion; ensure the check uses the same units reference used for pointer math so bounds are validated against the exact slice.
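The effect of the proposed assert can be shown on a plain byte slice: an out-of-range index becomes a defined panic rather than a wild read. This sketch only mirrors the pointer pattern, not the real `CodeUnits` type:

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Read one byte atomically; panics (rather than invoking UB) on a bad index.
fn read_op(units: &[u8], index: usize) -> u8 {
    assert!(index < units.len(), "read_op: index out of bounds");
    // AtomicU8 has the same size and alignment as u8, so this view is sound
    // once the index is known to be in range.
    let ptr = units.as_ptr().wrapping_add(index) as *const AtomicU8;
    unsafe { &*ptr }.load(Ordering::Acquire)
}

fn main() {
    let units = [0xAAu8, 0xBB, 0xCC];
    assert_eq!(read_op(&units, 1), 0xBB);
    // Index 3 is one past the end: the assert converts this into a panic.
    let caught = std::panic::catch_unwind(|| read_op(&units, 3));
    assert!(caught.is_err());
}
```

The key point is that the bounds check uses the same slice the pointer arithmetic starts from, so the validated length always matches the accessed memory.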
342-350: ⚠️ Potential issue | 🔴 Critical

`AtomicU16` pointer casts require alignment that `CodeUnit` does not guarantee.

Lines 509-523 cast `CodeUnit*` to `AtomicU16*`, but `CodeUnit` at lines 342-350 is `#[repr(C)]` with byte-sized fields, so its alignment is not guaranteed to be 2. That is UB for atomic u16 accesses.

Proposed fix

```diff
-#[repr(C)]
+#[repr(C, align(2))]
 #[derive(Copy, Clone, Debug)]
 pub struct CodeUnit {
     pub op: Instruction,
     pub arg: OpArgByte,
 }

-const _: () = assert!(mem::size_of::<CodeUnit>() == 2);
+const _: () = {
+    assert!(mem::size_of::<CodeUnit>() == 2);
+    assert!(mem::align_of::<CodeUnit>() >= mem::align_of::<u16>());
+};
```

Verification script

```bash
#!/bin/bash
set -e
sed -n '340,352p' crates/compiler-core/src/bytecode.rs
rg -n 'as \*const AtomicU16' crates/compiler-core/src/bytecode.rs
```

As per coding guidelines: "Follow Rust best practices for error handling and memory management".
Also applies to: 509-523
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/compiler-core/src/bytecode.rs` around lines 342 - 350, The CodeUnit struct is not guaranteed 2-byte aligned but later code casts CodeUnit pointers to AtomicU16 (unsafe for misaligned data); fix by making CodeUnit 2-byte aligned—add #[repr(C, align(2))] to the CodeUnit definition (keeping the existing fields op: Instruction and arg: OpArgByte) and verify the size assertion still holds, then confirm the atomic pointer casts around the code that converts CodeUnit* to AtomicU16* (the casts at the section around lines that perform "as *const AtomicU16"/"as *mut AtomicU16") are now safe due to the enforced alignment.
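A standalone sketch of why `#[repr(C, align(2))]` makes the word-sized view sound, using `u8` stand-ins for `Instruction` and `OpArgByte`; the compile-time asserts fail the build if the layout ever stops matching the atomic's requirements:

```rust
use std::mem;
use std::sync::atomic::{AtomicU16, Ordering};

#[repr(C, align(2))] // the alignment the review asks for
#[derive(Copy, Clone)]
struct CodeUnit {
    op: u8,  // stand-in for Instruction
    arg: u8, // stand-in for OpArgByte
}

// Layout invariants checked at compile time.
const _: () = {
    assert!(mem::size_of::<CodeUnit>() == 2);
    assert!(mem::align_of::<CodeUnit>() >= mem::align_of::<u16>());
};

fn main() {
    let unit = CodeUnit { op: 0x01, arg: 0x02 };
    assert_eq!((unit.op, unit.arg), (0x01, 0x02));
    // With align(2) guaranteed, viewing the pair as one AtomicU16 is sound.
    let word = unsafe { &*(&unit as *const CodeUnit as *const AtomicU16) };
    let v = word.load(Ordering::Relaxed);
    // Native-endian bytes of the loaded word reproduce the field layout.
    assert_eq!(v.to_ne_bytes(), [0x01, 0x02]);
}
```

Without the `align(2)` attribute, a `CodeUnit` inside a larger allocation could legally start at an odd address, and the same cast would be undefined behavior on targets that require aligned atomics.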
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@crates/vm/src/frame.rs`:
- Around line 7391-7407: The specialization picks
Instruction::UnpackSequenceTwoTuple based solely on the runtime tuple length,
which can violate the original UNPACK_SEQUENCE N expectation; update
specialize_unpack_sequence to first inspect the existing instruction at
instr_idx (match on Instruction::UnpackSequence { count }) and only choose
UnpackSequenceTwoTuple when count == 2 in addition to the obj being an exact
PyTuple of length 2; keep all other paths unchanged so UNPACK_SEQUENCE N (N !=
2) still preserves the generic UnpackSequence behavior.
- Around line 7475-7486: The code currently casts offset as u16 unconditionally
in the StoreAttrSlot cache path (see cls_attr, member_descr and
MemberGetter::Offset(offset) and the call to write_cache_u16), which can
truncate large offsets; add a guard that checks whether offset fits in u16 (e.g.
offset <= u16::MAX) before taking the specialization/unsafe path and writing the
cache (write_cache_u32/write_cache_u16); if the offset does not fit, back off
the specialization and fall through to the generic store-attribute path instead
of writing a truncated offset into the cache.
---
Duplicate comments:
In `@crates/compiler-core/src/bytecode.rs`:
- Around line 485-499: The safe reader methods read_op and read_arg perform raw
pointer arithmetic/derefs without validating index, which can cause UB for
out-of-range indices; modify both functions to first obtain units (as now),
check that index < units.len() and if not panic with a clear message (e.g.,
index out of bounds) before doing any pointer arithmetic/dereference, then
proceed with the existing AtomicU8 load and transmute/OpArgByte conversion;
ensure the check uses the same units reference used for pointer math so bounds
are validated against the exact slice.
- Around line 342-350: The CodeUnit struct is not guaranteed 2-byte aligned but
later code casts CodeUnit pointers to AtomicU16 (unsafe for misaligned data);
fix by making CodeUnit 2-byte aligned—add #[repr(C, align(2))] to the CodeUnit
definition (keeping the existing fields op: Instruction and arg: OpArgByte) and
verify the size assertion still holds, then confirm the atomic pointer casts
around the code that converts CodeUnit* to AtomicU16* (the casts at the section
around lines that perform "as *const AtomicU16"/"as *mut AtomicU16") are now
safe due to the enforced alignment.
ℹ️ Review info
Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: cdcead23-a33a-4961-b807-327211593155
📒 Files selected for processing (7)
- crates/compiler-core/src/bytecode.rs
- crates/vm/src/builtins/frame.rs
- crates/vm/src/builtins/function.rs
- crates/vm/src/builtins/type.rs
- crates/vm/src/dict_inner.rs
- crates/vm/src/frame.rs
- crates/vm/src/types/slot.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- crates/vm/src/dict_inner.rs
- crates/vm/src/builtins/frame.rs
- CONTAINS_OP_SET: add frozenset support in handler and specialize
- TO_BOOL_ALWAYS_TRUE: cache type version instead of checking slots
- LOAD_GLOBAL_BUILTIN: cache builtins dict version alongside globals
- mark_stacks: deoptimize specialized opcodes for correct reachability
Force-pushed from 0b66b19 to dd89151 (Compare)

Code has been automatically formatted

The code in this PR has been formatted automatically. Pull the updated branch with:

`git pull origin specialization2`
Actionable comments posted: 3
♻️ Duplicate comments (2)
crates/compiler-core/src/bytecode.rs (2)
485-499: ⚠️ Potential issue | 🔴 Critical

Add bounds checks to all safe atomic readers.

`read_op`, `read_arg`, and `read_adaptive_counter` are safe APIs but dereference pointers derived from an unchecked `index`. Out-of-range input is UB instead of a panic.

Verification script

```bash
#!/bin/bash
set -euo pipefail
FILE="crates/compiler-core/src/bytecode.rs"
# Verify safe readers and whether they assert bounds.
rg -n "pub fn read_op|pub fn read_arg|pub fn read_adaptive_counter|assert!" "$FILE" -A8 -B2
```

🔧 Suggested fix

```diff
 pub fn read_op(&self, index: usize) -> Instruction {
     let units = unsafe { &*self.0.get() };
+    assert!(index < units.len(), "read_op: index out of bounds");
     let ptr = units.as_ptr().wrapping_add(index) as *const AtomicU8;
     let byte = unsafe { &*ptr }.load(Ordering::Acquire);
     // SAFETY: Only valid Instruction values are stored via replace_op/compare_exchange_op.
     unsafe { mem::transmute::<u8, Instruction>(byte) }
 }

 pub fn read_arg(&self, index: usize) -> OpArgByte {
     let units = unsafe { &*self.0.get() };
+    assert!(index < units.len(), "read_arg: index out of bounds");
     let ptr = units.as_ptr().wrapping_add(index) as *const u8;
     let arg_ptr = unsafe { ptr.add(1) } as *const AtomicU8;
     OpArgByte::from(unsafe { &*arg_ptr }.load(Ordering::Relaxed))
 }

 pub fn read_adaptive_counter(&self, index: usize) -> u8 {
     let units = unsafe { &*self.0.get() };
+    assert!(index < units.len(), "read_adaptive_counter: index out of bounds");
     let ptr = units.as_ptr().wrapping_add(index) as *const u8;
     let arg_ptr = unsafe { ptr.add(1) } as *const AtomicU8;
     unsafe { &*arg_ptr }.load(Ordering::Relaxed)
 }
```

As per coding guidelines: "Follow Rust best practices for error handling and memory management".
Also applies to: 567-574
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/compiler-core/src/bytecode.rs` around lines 485 - 499, The safe atomic readers (read_op, read_arg, read_adaptive_counter) currently compute pointers from an unchecked index causing UB on out-of-range input; fix by reading units = unsafe { &*self.0.get() } then perform explicit bounds checks (e.g., assert!(index < units.len()) for read_op and read_adaptive_counter, and assert!(index + 1 < units.len()) for read_arg because it accesses the next byte) before creating/wrapping the pointer; only after those checks compute the ptr/arg_ptr and perform the AtomicU8 load so out-of-range indices panic instead of producing UB.
509-510: ⚠️ Potential issue | 🔴 Critical

Guarantee 2-byte alignment before casting to `AtomicU16`.

The `CodeUnit* -> AtomicU16*` cast requires 2-byte alignment. With `#[repr(C)]` and two 1-byte fields, this alignment is not guaranteed, so these atomic accesses are UB-prone.

Verification script

```bash
#!/bin/bash
set -euo pipefail
FILE="crates/compiler-core/src/bytecode.rs"
# Show CodeUnit layout declaration + AtomicU16 pointer casts.
rg -n "struct CodeUnit|repr\\(C|AtomicU16|size_of::<CodeUnit>|align_of::<CodeUnit>" "$FILE" -A3 -B2
```

🔧 Suggested fix

```diff
-#[repr(C)]
+#[repr(C, align(2))]
 #[derive(Copy, Clone, Debug)]
 pub struct CodeUnit {
     pub op: Instruction,
     pub arg: OpArgByte,
 }

-const _: () = assert!(mem::size_of::<CodeUnit>() == 2);
+const _: () = {
+    assert!(mem::size_of::<CodeUnit>() == 2);
+    assert!(mem::align_of::<CodeUnit>() >= mem::align_of::<u16>());
+};
```

As per coding guidelines: "Follow Rust best practices for error handling and memory management".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/compiler-core/src/bytecode.rs` around lines 509 - 510, The code casts CodeUnit pointers to AtomicU16 pointers (units.as_ptr().wrapping_add(index) as *const AtomicU16) but CodeUnit (with #[repr(C)] and two 1-byte fields) may not be 2-byte aligned, causing UB; fix by guaranteeing 2-byte alignment on CodeUnit (e.g., annotate the CodeUnit struct with #[repr(C, align(2))]) so the cast is valid, then replace the cast with a safe pointer cast like units.as_ptr().wrapping_add(index).cast::<AtomicU16>() and keep the unsafe { &*ptr }.store(value, Ordering::Relaxed) usage.
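The SAFETY comment above leans on the invariant that new opcodes are only published via `replace_op`/`compare_exchange_op`. The shape of that publish step can be sketched with a plain `AtomicU8`; the opcode byte values here are hypothetical:

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// Hypothetical opcode bytes standing in for Instruction variants.
const GENERIC: u8 = 0x10;
const SPECIALIZED: u8 = 0x11;

/// Publish the specialized opcode only if the slot still holds the
/// generic one, so a concurrent deoptimization wins any race.
fn try_specialize(slot: &AtomicU8) -> bool {
    slot.compare_exchange(
        GENERIC,
        SPECIALIZED,
        Ordering::Release, // make prior cache writes visible before the opcode
        Ordering::Relaxed, // on failure we simply keep the current opcode
    )
    .is_ok()
}

fn main() {
    let slot = AtomicU8::new(GENERIC);
    assert!(try_specialize(&slot));
    // A second attempt fails: the slot no longer holds the generic opcode.
    assert!(!try_specialize(&slot));
    assert_eq!(slot.load(Ordering::Acquire), SPECIALIZED);
}
```

The Release store paired with the readers' Acquire load is what guarantees that inline-cache words written before the CAS are visible to any thread that observes the specialized opcode.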
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/vm/src/builtins/function.rs`:
- Around line 711-716: The update to __code__ must be atomic with JIT-cache
invalidation to avoid running stale jitted code; modify set___code__ so it
acquires the jitted_code mutex and clears the cached entry before performing
self.code.swap_to_temporary_refs, or hold the same jitted_code lock while doing
both operations so no other thread can observe cleared/uncleared state in
between. Specifically, in set___code__, lock jitted_code (the Mutex used at
*self.jitted_code.lock()), set it to None, then call
self.code.swap_to_temporary_refs(code, vm) while still holding the lock, and
finally release the lock so the swap and cache invalidation occur atomically.
In `@crates/vm/src/builtins/type.rs`:
- Around line 2171-2175: The current pre-invalidation (calling zelf.modified()
once before attribute mutation) leaves a window where a concurrent
find_name_in_mro may cache results under the new tag during the mutation; fix
this by invoking zelf.modified() both before starting the mutation and once more
immediately after finishing attribute modifications so any caches created during
the mutation are replay-invalidated; update the code path where attributes are
changed (the block containing the single zelf.modified() call) to perform a
second modified() post-mutation and ensure this behavior aligns with the
cache-checking logic in find_name_in_mro.
---
Duplicate comments:
In `@crates/compiler-core/src/bytecode.rs`:
- Around line 485-499: The safe atomic readers (read_op, read_arg,
read_adaptive_counter) currently compute pointers from an unchecked index
causing UB on out-of-range input; fix by reading units = unsafe { &*self.0.get()
} then perform explicit bounds checks (e.g., assert!(index < units.len()) for
read_op and read_adaptive_counter, and assert!(index + 1 < units.len()) for
read_arg because it accesses the next byte) before creating/wrapping the
pointer; only after those checks compute the ptr/arg_ptr and perform the
AtomicU8 load so out-of-range indices panic instead of producing UB.
- Around line 509-510: The code casts CodeUnit pointers to AtomicU16 pointers
(units.as_ptr().wrapping_add(index) as *const AtomicU16) but CodeUnit (with
#[repr(C)] and two 1-byte fields) may not be 2-byte aligned, causing UB; fix by
guaranteeing 2-byte alignment on CodeUnit (e.g., annotate the CodeUnit struct
with #[repr(C, align(2))]) so the cast is valid, then replace the cast with a
safe pointer cast like units.as_ptr().wrapping_add(index).cast::<AtomicU16>()
and keep the unsafe { &*ptr }.store(value, Ordering::Relaxed) usage.
ℹ️ Review info
Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 3cc6b83b-aec4-4da8-a316-976f49a64f3c
📒 Files selected for processing (7)
- crates/compiler-core/src/bytecode.rs
- crates/vm/src/builtins/frame.rs
- crates/vm/src/builtins/function.rs
- crates/vm/src/builtins/type.rs
- crates/vm/src/dict_inner.rs
- crates/vm/src/frame.rs
- crates/vm/src/types/slot.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- crates/vm/src/builtins/frame.rs
- crates/vm/src/dict_inner.rs
```rust
fn set___code__(&self, code: PyRef<PyCode>, vm: &VirtualMachine) {
    self.code.swap_to_temporary_refs(code, vm);
    #[cfg(feature = "jit")]
    {
        *self.jitted_code.lock() = None;
    }
```
Make `__code__` update atomic with JIT cache invalidation.

`self.code` is swapped before `jitted_code` is cleared. A concurrent call can run stale compiled code against the new `__code__` in that window.
🔧 Suggested fix

```diff
 #[pygetset(setter)]
 fn set___code__(&self, code: PyRef<PyCode>, vm: &VirtualMachine) {
-    self.code.swap_to_temporary_refs(code, vm);
     #[cfg(feature = "jit")]
     {
-        *self.jitted_code.lock() = None;
+        let mut jitted = self.jitted_code.lock();
+        *jitted = None;
+        self.code.swap_to_temporary_refs(code, vm);
+    }
+    #[cfg(not(feature = "jit"))]
+    {
+        self.code.swap_to_temporary_refs(code, vm);
     }
     self.func_version.store(0, Relaxed);
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/vm/src/builtins/function.rs` around lines 711 - 716, The update to
__code__ must be atomic with JIT-cache invalidation to avoid running stale
jitted code; modify set___code__ so it acquires the jitted_code mutex and clears
the cached entry before performing self.code.swap_to_temporary_refs, or hold the
same jitted_code lock while doing both operations so no other thread can observe
cleared/uncleared state in between. Specifically, in set___code__, lock
jitted_code (the Mutex used at *self.jitted_code.lock()), set it to None, then
call self.code.swap_to_temporary_refs(code, vm) while still holding the lock,
and finally release the lock so the swap and cache invalidation occur
atomically.
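The "invalidate and swap under one lock" ordering described in this prompt can be modeled with std `Mutex`es; the types here are simplified stand-ins for `PyCode` and the compiled entry, not the real VM structures:

```rust
use std::sync::Mutex;

struct Function {
    code: Mutex<String>,                // stand-in for the PyCode slot
    jitted_code: Mutex<Option<String>>, // stand-in for compiled machine code
}

impl Function {
    fn set_code(&self, new_code: String) {
        // Hold the JIT-cache lock across both steps so no thread can
        // observe the new code paired with a stale compiled entry.
        let mut jitted = self.jitted_code.lock().unwrap();
        *jitted = None;
        *self.code.lock().unwrap() = new_code;
        // `jitted` guard drops here, releasing the lock after both writes.
    }
}

fn main() {
    let f = Function {
        code: Mutex::new("old".into()),
        jitted_code: Mutex::new(Some("compiled(old)".into())),
    };
    f.set_code("new".into());
    assert!(f.jitted_code.lock().unwrap().is_none());
    assert_eq!(f.code.lock().unwrap().as_str(), "new");
}
```

Any reader that consults the JIT cache under the same lock either sees the old pair or the cleared cache, never the torn state.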
```diff
 if zelf.jitted_code.lock().is_some() {
     return Ok(());
 }
 let arg_types = jit::get_jit_arg_types(&zelf, vm)?;
 let ret_type = jit::jit_ret_type(&zelf, vm)?;
 let code: &Py<PyCode> = &zelf.code;
 let compiled = rustpython_jit::compile(&code.code, &arg_types, ret_type)
     .map_err(|err| jit::new_jit_error(err.to_string(), vm))?;
-let _ = zelf.jitted_code.set(compiled);
+*zelf.jitted_code.lock() = Some(compiled);
 Ok(())
```
Revalidate code identity before publishing compiled JIT code.

`__jit__` does check -> compile -> store with no consistency check. If `__code__` changes during compile, stale machine code can be cached and reused.
🔧 Suggested fix

```diff
 #[cfg(feature = "jit")]
 #[pymethod]
 fn __jit__(zelf: PyRef<Self>, vm: &VirtualMachine) -> PyResult<()> {
     if zelf.jitted_code.lock().is_some() {
         return Ok(());
     }
+    let code_snapshot: PyRef<PyCode> = (*zelf.code).to_owned();
+    let code_id = code_snapshot.get_id();
     let arg_types = jit::get_jit_arg_types(&zelf, vm)?;
     let ret_type = jit::jit_ret_type(&zelf, vm)?;
-    let code: &Py<PyCode> = &zelf.code;
-    let compiled = rustpython_jit::compile(&code.code, &arg_types, ret_type)
+    let compiled = rustpython_jit::compile(&code_snapshot.code, &arg_types, ret_type)
         .map_err(|err| jit::new_jit_error(err.to_string(), vm))?;
-    *zelf.jitted_code.lock() = Some(compiled);
+    let mut jitted = zelf.jitted_code.lock();
+    if jitted.is_none() && (*zelf.code).get_id() == code_id {
+        *jitted = Some(compiled);
+    }
     Ok(())
 }
```
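This revalidation pattern is a general optimistic-concurrency shape: snapshot an identity, do the slow work unlocked, then publish only if the identity still matches. A simplified model, where numeric ids and string "machine code" stand in for the real objects:

```rust
use std::sync::Mutex;

struct Func {
    code_id: Mutex<u64>,                  // identity of the current __code__
    jitted: Mutex<Option<(u64, String)>>, // (compiled-for id, machine code)
}

fn jit(f: &Func) {
    if f.jitted.lock().unwrap().is_some() {
        return;
    }
    let snapshot = *f.code_id.lock().unwrap();
    // Slow compilation happens without holding any lock.
    let compiled = format!("machine-code-for-{snapshot}");
    let mut jitted = f.jitted.lock().unwrap();
    // Publish only if nobody swapped the code (or raced us) meanwhile.
    if jitted.is_none() && *f.code_id.lock().unwrap() == snapshot {
        *jitted = Some((snapshot, compiled));
    }
}

fn main() {
    let f = Func {
        code_id: Mutex::new(7),
        jitted: Mutex::new(None),
    };
    jit(&f);
    assert_eq!(f.jitted.lock().unwrap().as_ref().unwrap().0, 7);
}
```

If the identity changed between snapshot and publish, the compiled artifact is simply dropped, which is cheaper than ever letting stale machine code run.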
```rust
// Invalidate inline caches before modifying attributes.
// This ensures other threads see the version invalidation before
// any attribute changes, preventing use-after-free of cached descriptors.
zelf.modified();
```
Close the version-tag race introduced by pre-invalidation.

Calling `zelf.modified()` only before mutation allows a concurrent `find_name_in_mro` to assign/cache under a fresh tag during the mutation window (see lines 761-777), so stale results can remain valid after the write completes.
🔧 Suggested fix

```diff
     // Invalidate inline caches before modifying attributes.
     // This ensures other threads see the version invalidation before
     // any attribute changes, preventing use-after-free of cached descriptors.
     zelf.modified();
     if let PySetterValue::Assign(value) = value {
         zelf.attributes.write().insert(attr_name, value);
     } else {
         let prev_value = zelf.attributes.write().shift_remove(attr_name); // TODO: swap_remove applicable?
         if prev_value.is_none() {
             return Err(vm.new_attribute_error(format!(
                 "type object '{}' has no attribute '{}'",
                 zelf.name(),
                 attr_name,
             )));
         }
     }
     if attr_name.as_wtf8().starts_with("__") && attr_name.as_wtf8().ends_with("__") {
         if assign {
             zelf.update_slot::<true>(attr_name, &vm.ctx);
         } else {
             zelf.update_slot::<false>(attr_name, &vm.ctx);
         }
     }
+    // Close the race: another thread may have assigned/cached a fresh
+    // version while mutation was in-flight.
+    zelf.modified();
     Ok(())
```
Verify each finding against the current code and only fix it if needed.
In `@crates/vm/src/builtins/type.rs` around lines 2171 - 2175, The current
pre-invalidation (calling zelf.modified() once before attribute mutation) leaves
a window where a concurrent find_name_in_mro may cache results under the new tag
during the mutation; fix this by invoking zelf.modified() both before starting
the mutation and once more immediately after finishing attribute modifications
so any caches created during the mutation are replay-invalidated; update the
code path where attributes are changed (the block containing the single
zelf.modified() call) to perform a second modified() post-mutation and ensure
this behavior aligns with the cache-checking logic in find_name_in_mro.
Summary by CodeRabbit
Performance & Stability
Refactor
Behavior