
Specialized ops #7322

Merged
youknowone merged 12 commits into RustPython:main from youknowone:specialization2
Mar 4, 2026

Conversation

@youknowone
Member

@youknowone youknowone commented Mar 3, 2026

Summary by CodeRabbit

  • Performance & Stability

    • Faster instruction execution via expanded adaptive specialization and improved inline caches.
    • Better thread-safety and stronger memory ordering for reliable concurrent execution and cache visibility.
    • Cache invalidation timing adjusted to reduce stale specializations.
  • Refactor

    • Centralized specialization/deoptimization lifecycle for consistent runtime behavior.
    • JIT cache moved to mutex-guarded storage for safer concurrent updates.
  • Behavior

    • Slot lookup now distinguishes native vs Python-level results for more accurate dispatch.

@coderabbitai
Contributor

coderabbitai bot commented Mar 3, 2026

📝 Walkthrough


This PR makes bytecode slots atomic for concurrent specialization, centralizes adaptive inline-caching and deoptimization flows, moves JIT cache storage to a mutex-protected optional, strengthens several atomic ordering choices across types/dicts, and distinguishes Python-level methods from native slots in slot lookup/update logic.

Changes

Cohort / File(s) Summary
Atomic Bytecode Operations
crates/compiler-core/src/bytecode.rs
Replaced non-atomic in-place opcode/arg/cache/adaptive-counter mutability with per-slot atomics (AtomicU8/AtomicU16). Added compare_exchange_op, read_op, read_arg, read_cache_u16/write_cache_u16, and adaptive counter read/write methods; updated serialization/quicken to use atomic reads; unsafe impl Sync for CodeUnits.
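The per-slot compare-and-swap described above can be sketched with std atomics. The opcode constants and helper below are illustrative stand-ins, not the actual bytecode.rs API:

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// Hypothetical opcode values for illustration only; the real Instruction
// enum lives in crates/compiler-core/src/bytecode.rs.
const BINARY_OP: u8 = 10;
const BINARY_OP_ADD_INT: u8 = 11;

/// Specialize an opcode slot only if it still holds the expected base
/// opcode, so two threads racing to specialize cannot clobber each other.
fn compare_exchange_op(slot: &AtomicU8, expected: u8, new: u8) -> bool {
    slot.compare_exchange(expected, new, Ordering::Release, Ordering::Relaxed)
        .is_ok()
}

fn main() {
    let slot = AtomicU8::new(BINARY_OP);
    assert!(compare_exchange_op(&slot, BINARY_OP, BINARY_OP_ADD_INT));
    // A second racing attempt expecting the base opcode now fails.
    assert!(!compare_exchange_op(&slot, BINARY_OP, BINARY_OP_ADD_INT));
    assert_eq!(slot.load(Ordering::Acquire), BINARY_OP_ADD_INT);
}
```

Release on a successful exchange pairs with the Acquire load in read_op, so a thread that observes the specialized opcode also observes the cache writes made before it.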
Adaptive Inline Caching & Deoptimization
crates/vm/src/frame.rs
Large refactor centralizing specialization lifecycle: added adaptive(), commit_specialization(), deoptimize()/deoptimize_at(), unified adaptive counters and commit paths, many specialized handlers (binary int/float, LoadAttr variants, Call optimizations, ForIter variants, ToBool, CompareOp, StoreSubscr, ContainsOp, etc.), and consolidated guards/version checks.
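The adaptive lifecycle (count down, attempt specialization, commit, deoptimize) can be sketched as below. The struct, constants, and backoff value are hypothetical simplifications of the frame.rs machinery, not its real API:

```rust
// Sketch of the adaptive-counter lifecycle for one opcode slot.
#[derive(Debug)]
struct AdaptiveSlot {
    op: u8,
    counter: u8,
}

// Hypothetical opcode values for illustration.
const LOAD_ATTR: u8 = 20;
const LOAD_ATTR_INSTANCE: u8 = 21;

impl AdaptiveSlot {
    /// One dispatch step: adaptive countdown, then specialize on a
    /// passing guard, or deoptimize back to the base opcode on failure.
    fn tick(&mut self, guard_holds: bool) {
        if self.op == LOAD_ATTR {
            if self.counter > 0 {
                self.counter -= 1; // still warming up
            } else if guard_holds {
                self.op = LOAD_ATTR_INSTANCE; // commit_specialization()
            } else {
                self.counter = 16; // back off before the next attempt
            }
        } else if !guard_holds {
            self.op = LOAD_ATTR; // deoptimize(): restore the base opcode
            self.counter = 16;
        }
    }
}

fn main() {
    let mut slot = AdaptiveSlot { op: LOAD_ATTR, counter: 1 };
    slot.tick(true); // counter reaches 0, no attempt yet
    assert_eq!(slot.op, LOAD_ATTR);
    slot.tick(true); // guard holds: specialize
    assert_eq!(slot.op, LOAD_ATTR_INSTANCE);
    slot.tick(false); // guard fails: deoptimize
    assert_eq!(slot.op, LOAD_ATTR);
}
```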
JIT Cache Storage
crates/vm/src/builtins/function.rs
Replaced OnceCell<CompiledCode> with PyMutex<Option<CompiledCode>> for jitted_code. Reworked all accesses to lock-based reads/writes and added cache invalidation on __code__ setter and in __jit__.
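The OnceCell-to-mutex change matters because a OnceCell can only be written once, while a Mutex<Option<T>> can also be cleared. A minimal sketch with std::sync::Mutex (PyMutex behaves analogously; CompiledCode and the value 42 are stand-ins):

```rust
use std::sync::Mutex;

struct CompiledCode(u32); // stand-in for the real JIT artifact

struct Function {
    jitted_code: Mutex<Option<CompiledCode>>,
}

impl Function {
    /// __code__ setter: clear stale machine code (impossible with OnceCell).
    fn set_code(&self) {
        *self.jitted_code.lock().unwrap() = None;
    }

    /// __jit__: compile lazily under the lock; 42 is a placeholder result.
    fn jit(&self) -> u32 {
        let mut guard = self.jitted_code.lock().unwrap();
        guard.get_or_insert_with(|| CompiledCode(42)).0
    }
}

fn main() {
    let f = Function { jitted_code: Mutex::new(None) };
    assert_eq!(f.jit(), 42);
    f.set_code(); // invalidate on __code__ assignment
    assert!(f.jitted_code.lock().unwrap().is_none());
}
```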
Slot Lookup Result Distinction
crates/vm/src/types/slot.rs
Introduced SlotLookupResult<T> (NativeSlot, PythonMethod, NotFound) and updated lookup_slot_in_mro() and all slot update sites/macros to branch on these outcomes, ensuring Python-level methods yield wrappers and MRO inheritance when not found.
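A minimal sketch of the three-state result and why Option<T> was insufficient (the enum name mirrors the summary; the resolve helper is illustrative, not the actual slot.rs code):

```rust
// Three-state lookup result; with Option<T>, PythonMethod and NotFound
// were conflated as None, breaking MRO inheritance decisions.
enum SlotLookupResult<T> {
    NativeSlot(T), // use the native function pointer directly
    PythonMethod,  // install a wrapper dispatching to the Python method
    NotFound,      // fall back to the slot inherited via the MRO
}

// Illustrative resolution: only NotFound may fall through to the
// inherited slot; PythonMethod must not (the old Option-based bug).
fn resolve<T>(found: SlotLookupResult<T>, inherited: Option<T>) -> Option<T> {
    match found {
        SlotLookupResult::NativeSlot(f) => Some(f),
        SlotLookupResult::PythonMethod => None, // caller installs a wrapper
        SlotLookupResult::NotFound => inherited,
    }
}

fn main() {
    assert_eq!(resolve(SlotLookupResult::NativeSlot(1), Some(9)), Some(1));
    // A Python-level method must not be shadowed by an inherited native slot.
    assert_eq!(resolve::<i32>(SlotLookupResult::PythonMethod, Some(9)), None);
    assert_eq!(resolve(SlotLookupResult::NotFound, Some(9)), Some(9));
}
```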
Atomic Ordering Adjustments
crates/vm/src/dict_inner.rs, crates/vm/src/builtins/type.rs
Strengthened atomic orderings: dict version loads use Acquire and bumps use Release; PyType::modified() now uses SeqCst. SetAttr now invalidates type version before mutation.
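The Acquire/Release pairing on the dict version tag can be sketched as follows; this is a simplified illustration of the protocol, not the dict_inner.rs implementation:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Simplified version-tag protocol: writers bump with Release, readers
// load with Acquire, so seeing an unchanged version also guarantees
// visibility of every write made before the last bump.
struct VersionedDict {
    version: AtomicU64,
}

impl VersionedDict {
    fn bump(&self) {
        self.version.fetch_add(1, Ordering::Release);
    }
    fn load(&self) -> u64 {
        self.version.load(Ordering::Acquire)
    }
}

/// An inline cache entry is usable only while the version is unchanged.
fn cache_still_valid(d: &VersionedDict, cached_version: u64) -> bool {
    d.load() == cached_version
}

fn main() {
    let d = VersionedDict { version: AtomicU64::new(0) };
    let cached = d.load();
    assert!(cache_still_valid(&d, cached));
    d.bump(); // any mutation invalidates outstanding cache entries
    assert!(!cache_still_valid(&d, cached));
}
```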
Stack Analysis Minor Change
crates/vm/src/builtins/frame.rs
Stack analysis now calls deoptimize() on the resolved base opcode (after to_base()), ensuring de-specialization is applied during stack effect analysis.

Sequence Diagram(s)

sequenceDiagram
    participant Frame as ExecutingFrame
    participant CodeUnits as Code Storage
    participant Adaptive as Adaptive Engine
    participant Specialized as Specialized Handler
    participant Base as Base Handler

    Frame->>CodeUnits: read_op(index) (Acquire)
    CodeUnits-->>Frame: opcode
    Frame->>Adaptive: check adaptive counter & candidate (Relaxed)
    Adaptive->>Adaptive: increment counter (Relaxed)
    Adaptive->>Specialized: attempt specialization
    alt Success
        Specialized->>CodeUnits: compare_exchange_op(index, expected, new) (Release)
        CodeUnits-->>Specialized: CAS success
        Specialized-->>Frame: execute specialized path
    else Fail
        Specialized->>CodeUnits: write base opcode (Release)
        CodeUnits-->>Base: base opcode written
        Specialized-->>Frame: fall back to base handler
    end
sequenceDiagram
    participant Setter as PyType SetAttr
    participant Version as Version Tag
    participant Readers as Concurrent Readers

    Setter->>Version: modified() (SeqCst store)
    Version->>Readers: readers load version (Acquire)
    Setter->>Setter: perform attribute insert/remove
    Setter-->>Readers: changes ordered via version operations

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Suggested reviewers

  • ShaharNaveh

🐰 Hops excitedly
Atoms guard the opcode lane,
Slots now tell if Python reign—
Caches nudge their version flag,
Specializers leap, then sag.
Hooray—deopt, commit, and play!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

  • Title check (❓ Inconclusive): The title 'Specialized ops' is vague and generic, and fails to convey what this substantial multi-file refactoring changes. Resolution: consider a more descriptive title, such as 'Implement adaptive specialization framework for bytecode operations' or 'Refactor to atomic-based synchronization with specialized opcode paths'.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): Check skipped; CodeRabbit’s high-level summary is enabled.
  • Docstring Coverage (✅ Passed): Docstring coverage is 100.00%, which meets the required threshold of 80.00%.



@github-actions
Contributor

github-actions bot commented Mar 3, 2026

📦 Library Dependencies

The following Lib/ modules were modified. Here are their dependencies:

[x] lib: cpython/Lib/doctest.py
[ ] test: cpython/Lib/test/test_doctest (TODO: 6)

dependencies:

  • doctest

dependent tests: (33 tests)

  • doctest: test_builtin test_cmd test_code test_collections test_ctypes test_decimal test_deque test_descrtut test_difflib test_doctest test_doctest2 test_enum test_extcall test_generators test_getopt test_heapq test_http_cookies test_itertools test_listcomps test_math test_metaclass test_pep646_syntax test_pickle test_pickletools test_setcomps test_statistics test_syntax test_threading_local test_typing test_unpack test_unpack_ex test_weakref test_zipimport

Legend:

  • [+] path exists in CPython
  • [x] up-to-date, [ ] outdated

@youknowone youknowone force-pushed the specialization2 branch 5 times, most recently from b3456e0 to 2954b60 on March 3, 2026 at 19:46
@youknowone youknowone marked this pull request as ready for review March 3, 2026 20:55
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/vm/src/builtins/type.rs (1)

2171-2196: ⚠️ Potential issue | 🔴 Critical

Post-mutation cache invalidation is missing after pre-invalidation.

At Line 2174, zelf.modified() runs before the attribute write. find_name_in_mro() can reassign tp_version_tag and repopulate cache entries from pre-mutation state in that window. Since this path has no second invalidation after commit, stale cache entries can survive.

💡 Proposed fix (keep pre-invalidate, add post-commit invalidate)
         // Invalidate inline caches before modifying attributes.
         // This ensures other threads see the version invalidation before
         // any attribute changes, preventing use-after-free of cached descriptors.
         zelf.modified();

         if let PySetterValue::Assign(value) = value {
             zelf.attributes.write().insert(attr_name, value);
         } else {
             let prev_value = zelf.attributes.write().shift_remove(attr_name); // TODO: swap_remove applicable?
             if prev_value.is_none() {
                 return Err(vm.new_attribute_error(format!(
                     "type object '{}' has no attribute '{}'",
                     zelf.name(),
                     attr_name,
                 )));
             }
         }

         if attr_name.as_wtf8().starts_with("__") && attr_name.as_wtf8().ends_with("__") {
             if assign {
                 zelf.update_slot::<true>(attr_name, &vm.ctx);
             } else {
                 zelf.update_slot::<false>(attr_name, &vm.ctx);
             }
         }
+        // Flush any entries repopulated between pre-invalidation and commit.
+        zelf.modified();
         Ok(())
#!/bin/bash
set -euo pipefail

# Verify invalidation placement in setattro
rg -n -C8 'fn setattro|modified\(\)|attributes\.write\(\)\.(insert|shift_remove)' crates/vm/src/builtins/type.rs

# Verify cache/version reassignment path that can run concurrently
rg -n -C8 'fn find_name_in_mro|assign_version_tag|tp_version_tag\.load|tp_version_tag\.store' crates/vm/src/builtins/type.rs

Expected result: modified() appears before mutation in setattro, and no post-mutation invalidation currently exists.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/type.rs` around lines 2171 - 2196, The pre-mutation
cache invalidation (zelf.modified()) in setattro is currently invoked before the
attribute write but there is no post-commit invalidation, allowing
find_name_in_mro to repopulate caches from the old state; fix this by keeping
the existing pre-invalidation and adding a second invalidation immediately after
the mutation commit (after attributes.write().insert or shift_remove completes
and before calling update_slot::<...>), i.e., invoke zelf.modified() again to
ensure tp_version_tag/cache entries are refreshed and prevent stale cache reuse.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/compiler-core/src/bytecode.rs`:
- Around line 509-510: The code casts CodeUnit pointers to *const AtomicU16 (see
usages like units.as_ptr().wrapping_add(index) as *const AtomicU16 and the
subsequent .store calls), but CodeUnit is currently #[repr(C)] with two u8
fields and only 1-byte alignment, which violates atomic alignment requirements;
fix by changing the struct declaration to #[repr(C, align(2))] for CodeUnit and
add a compile-time assertion to ensure alignment (e.g.
assert_eq!(std::mem::align_of::<CodeUnit>(), 2) or a const_assert) so the
AtomicU16 casts are safe.
- Around line 485-499: The public safe readers (read_op, read_arg,
read_adaptive_counter) currently compute raw pointers with wrapping_add(index)
and dereference without bounds checks, which can cause UB on out-of-range
indices; update each function to load units first (as in read_cache_u16) and
assert the index (and any additional byte offsets, e.g., index+1 for arg) is
within units.len() before computing pointers and dereferencing so out-of-bounds
inputs panic instead of producing UB, keeping the existing pointer/dereference
logic otherwise.

In `@crates/vm/src/frame.rs`:
- Around line 4717-4723: The specialized LoadGlobalBuiltin path incorrectly
accepts dict subclasses via downcast_ref::<PyDict>(), which can bypass subclass
mapping; change the guard to require an exact PyDict type before doing the
direct dict lookup: first check that self.builtins is exactly the dict type
(e.g. via an exact-type check against vm.ctx.types.dict_type or a
PyDict::is_exact method), only then downcast_ref::<PyDict>() and use
cached_builtins_ver/current_builtins_ver and the fast get_item_opt path;
otherwise fall back to the generic global lookup; apply the same exact-type
check change to the other occurrence mentioned (around the 7313–7319 region).
- Around line 2442-2445: The current synchronization using
self.code.quickened.swap(true, atomic::Ordering::Relaxed) plus
atomic::fence(atomic::Ordering::Release) is misleading because there is no
matching Acquire reader; update the quickening logic in frame.rs: either remove
the explicit atomics (drop the swap + fence) if the actual memory ordering is
already provided by the replace_op (Release) ↔ read_op (Acquire) protocol in
CodeUnits, or make the intent explicit by switching to an acquire-aware pattern
(e.g., use swap/store with Release paired with an explicit Acquire load where
readers check quickened) and add a short comment referencing the CodeUnits
protocol; target the quickening site (self.code.quickened,
self.code.instructions.quicken(), atomic::fence) when making the change.

---

Outside diff comments:
In `@crates/vm/src/builtins/type.rs`:
- Around line 2171-2196: The pre-mutation cache invalidation (zelf.modified())
in setattro is currently invoked before the attribute write but there is no
post-commit invalidation, allowing find_name_in_mro to repopulate caches from
the old state; fix this by keeping the existing pre-invalidation and adding a
second invalidation immediately after the mutation commit (after
attributes.write().insert or shift_remove completes and before calling
update_slot::<...>), i.e., invoke zelf.modified() again to ensure
tp_version_tag/cache entries are refreshed and prevent stale cache reuse.

ℹ️ Review info

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1d106c3 and 2954b60.

📒 Files selected for processing (7)
  • crates/compiler-core/src/bytecode.rs
  • crates/vm/src/builtins/frame.rs
  • crates/vm/src/builtins/function.rs
  • crates/vm/src/builtins/type.rs
  • crates/vm/src/dict_inner.rs
  • crates/vm/src/frame.rs
  • crates/vm/src/types/slot.rs

Comment on lines +485 to 499
pub fn read_op(&self, index: usize) -> Instruction {
let units = unsafe { &*self.0.get() };
let ptr = units.as_ptr().wrapping_add(index) as *const AtomicU8;
let byte = unsafe { &*ptr }.load(Ordering::Acquire);
// SAFETY: Only valid Instruction values are stored via replace_op/compare_exchange_op.
unsafe { mem::transmute::<u8, Instruction>(byte) }
}

/// Atomically read the arg byte at `index` with Relaxed ordering.
pub fn read_arg(&self, index: usize) -> OpArgByte {
let units = unsafe { &*self.0.get() };
let ptr = units.as_ptr().wrapping_add(index) as *const u8;
let arg_ptr = unsafe { ptr.add(1) } as *const AtomicU8;
OpArgByte::from(unsafe { &*arg_ptr }.load(Ordering::Relaxed))
}

⚠️ Potential issue | 🔴 Critical


Safe readers allow potential UB on invalid index.

read_op, read_arg, and read_adaptive_counter are public safe functions that dereference raw pointers derived from wrapping_add(index) without bounds validation. Out-of-range input will cause undefined behavior instead of panicking. This contradicts the pattern already established in the same file for read_cache_u16, which includes an assertion.

Add bounds checks to all three functions:

Proposed fix
 pub fn read_op(&self, index: usize) -> Instruction {
     let units = unsafe { &*self.0.get() };
+    assert!(index < units.len(), "read_op: index out of bounds");
     let ptr = units.as_ptr().wrapping_add(index) as *const AtomicU8;
     let byte = unsafe { &*ptr }.load(Ordering::Acquire);
     // SAFETY: Only valid Instruction values are stored via replace_op/compare_exchange_op.
     unsafe { mem::transmute::<u8, Instruction>(byte) }
 }

 pub fn read_arg(&self, index: usize) -> OpArgByte {
     let units = unsafe { &*self.0.get() };
+    assert!(index < units.len(), "read_arg: index out of bounds");
     let ptr = units.as_ptr().wrapping_add(index) as *const u8;
     let arg_ptr = unsafe { ptr.add(1) } as *const AtomicU8;
     OpArgByte::from(unsafe { &*arg_ptr }.load(Ordering::Relaxed))
 }

 pub fn read_adaptive_counter(&self, index: usize) -> u8 {
     let units = unsafe { &*self.0.get() };
+    assert!(index < units.len(), "read_adaptive_counter: index out of bounds");
     let ptr = units.as_ptr().wrapping_add(index) as *const u8;
     let arg_ptr = unsafe { ptr.add(1) } as *const AtomicU8;
     unsafe { &*arg_ptr }.load(Ordering::Relaxed)
 }

Per coding guidelines: "Follow Rust best practices for error handling and memory management".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/compiler-core/src/bytecode.rs` around lines 485 - 499, The public safe
readers (read_op, read_arg, read_adaptive_counter) currently compute raw
pointers with wrapping_add(index) and dereference without bounds checks, which
can cause UB on out-of-range indices; update each function to load units first
(as in read_cache_u16) and assert the index (and any additional byte offsets,
e.g., index+1 for arg) is within units.len() before computing pointers and
dereferencing so out-of-bounds inputs panic instead of producing UB, keeping the
existing pointer/dereference logic otherwise.

Comment on lines +509 to +510
let ptr = units.as_ptr().wrapping_add(index) as *const AtomicU16;
unsafe { &*ptr }.store(value, Ordering::Relaxed);

⚠️ Potential issue | 🔴 Critical



Add explicit 2-byte alignment guarantee to CodeUnit struct.

The casts to *const AtomicU16 at lines 509 and 521 require 2-byte alignment, but CodeUnit with #[repr(C)] and two u8 fields has only 1-byte alignment. This violates atomic type requirements and is undefined behavior on any target. The current size assertion only checks that CodeUnit is 2 bytes, not that it's 2-byte aligned.

Add align(2) to the repr attribute and assert alignment:

Proposed fix
-#[repr(C)]
+#[repr(C, align(2))]
 #[derive(Copy, Clone, Debug)]
 pub struct CodeUnit {
     pub op: Instruction,
     pub arg: OpArgByte,
 }

-const _: () = assert!(mem::size_of::<CodeUnit>() == 2);
+const _: () = {
+    assert!(mem::size_of::<CodeUnit>() == 2);
+    assert!(mem::align_of::<CodeUnit>() >= mem::align_of::<u16>());
+};
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/compiler-core/src/bytecode.rs` around lines 509 - 510, The code casts
CodeUnit pointers to *const AtomicU16 (see usages like
units.as_ptr().wrapping_add(index) as *const AtomicU16 and the subsequent .store
calls), but CodeUnit is currently #[repr(C)] with two u8 fields and only 1-byte
alignment, which violates atomic alignment requirements; fix by changing the
struct declaration to #[repr(C, align(2))] for CodeUnit and add a compile-time
assertion to ensure alignment (e.g. assert_eq!(std::mem::align_of::<CodeUnit>(),
2) or a const_assert) so the AtomicU16 casts are safe.

Optimizes user-defined class instantiation MyClass(args...)
when tp_new == object.__new__ and __init__ is a simple
PyFunction. Allocates the object directly and calls __init__
via invoke_exact_args, bypassing the generic type.__call__
dispatch path.
Change jitted_code from OnceCell to PyMutex<Option<CompiledCode>> so
it can be cleared on __code__ assignment. The setter now sets the
cached JIT code to None to prevent executing stale machine code.
- range iterator: deduplicate fast_next/next_fast
- Replace raw pointer reads/writes in CodeUnits with atomic
  operations (AtomicU8/AtomicU16) for thread safety
- Add read_op (Acquire), read_arg (Relaxed), compare_exchange_op
- Use Release ordering in replace_op to synchronize cache writes
- Dispatch loop reads opcodes atomically via read_op/read_arg
- Fix adaptive counter access: use read/write_adaptive_counter
  instead of read/write_cache_u16 (was reading wrong bytes)
- Add pre-check guards to all specialize_* functions to prevent
  concurrent specialization races
- Move modified() before attribute changes in type.__setattr__
  to prevent use-after-free of cached descriptors
- Use SeqCst ordering in modified() for version invalidation
- Add Release fence after quicken() initialization
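The compare_exchange guard from the list above can be sketched in isolation (illustrative only — the real compare_exchange_op works on CodeUnit slots, not a bare AtomicU8):

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// Install a specialized opcode only if the slot still holds the expected
// base opcode. If another thread specialized first, the CAS fails and this
// thread keeps the winner's choice, preventing a lost-update race.
// Release on success publishes the cache writes made before the swap.
fn compare_exchange_op(slot: &AtomicU8, expected: u8, specialized: u8) -> bool {
    slot.compare_exchange(expected, specialized, Ordering::Release, Ordering::Relaxed)
        .is_ok()
}
```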
For __getattribute__: only use getattro_wrapper when the type
itself defines the attribute; otherwise inherit native slot from
base class via MRO.

For __setattr__/__delattr__: only store setattro_wrapper when
the type has its own __setattr__ or __delattr__; otherwise keep
the inherited base slot.
write_cache_u32 at cache_base+3 writes 2 code units (positions 3 and 4),
but STORE_ATTR only has 4 cache entries (positions 0-3). This overwrites
the next instruction with the upper 16 bits of the slot offset.

Changed to write_cache_u16/read_cache_u16 since member descriptor offsets
fit within u16 (max 65535 bytes).
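A fallible conversion makes that assumption explicit — a tiny sketch of the guard (hypothetical helper name; the real code writes the value into the inline cache):

```rust
// Only cache the member-descriptor offset when it fits the 16-bit cache
// entry; otherwise the caller should back off and keep the generic path.
fn cache_offset(offset: usize) -> Option<u16> {
    u16::try_from(offset).ok()
}
```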
has_python_cmp incorrectly treated method_descriptor as Python-level
comparison methods, causing richcompare slot to use wrapper dispatch
instead of inheriting the native slot.
partial_cmp returns None for NaN comparisons. is_some_and incorrectly
returned false for all NaN comparisons, but NaN != x should be true
per IEEE 754 semantics.
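The fix reduces to how `None` (unordered) is mapped — a minimal sketch, with the buggy variant noted in a comment:

```rust
use std::cmp::Ordering;

// NaN != x must be true under IEEE 754. partial_cmp returns None for NaN,
// so "not equal" must map None (unordered) to true.
fn float_ne(a: f64, b: f64) -> bool {
    // Buggy: a.partial_cmp(&b).is_some_and(|o| o != Ordering::Equal)
    // returns false whenever either operand is NaN.
    a.partial_cmp(&b).map_or(true, |o| o != Ordering::Equal)
}
```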
Change lookup_slot_in_mro to return a 3-state SlotLookupResult
enum (NativeSlot/PythonMethod/NotFound) instead of Option<T>.

Previously, both "found a Python-level method" and "found nothing"
returned None, causing incorrect slot inheritance. For example,
class Test(Mixin, TestCase) would inherit object.slot_init from
Mixin via inherit_from_mro instead of using init_wrapper to
dispatch TestCase.__init__.

Apply this fix consistently to all slot update sites:
update_main_slot!, update_sub_slot!, TpGetattro, TpSetattro,
TpDescrSet, TpHash, TpRichcompare, SqAssItem, MpAssSubscript.
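The three-state shape can be sketched like this (illustrative — the real SlotLookupResult carries slot payloads, and the strings here just label the three inheritance decisions described above):

```rust
// Option<T> conflated "found a Python-level method" with "found nothing";
// keeping the cases apart lets slot inheritance differ per case.
enum SlotLookupResult<T> {
    NativeSlot(T), // inherit the native slot found in the MRO
    PythonMethod,  // install the wrapper that dispatches the Python method
    NotFound,      // fall back to the default slot
}

fn choose_slot(lookup: SlotLookupResult<fn()>) -> &'static str {
    match lookup {
        SlotLookupResult::NativeSlot(_) => "inherit",
        SlotLookupResult::PythonMethod => "wrapper",
        SlotLookupResult::NotFound => "default",
    }
}
```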
- deoptimize() / deoptimize_at(): replace specialized op with base op
- adaptive(): decrement warmup counter or call specialize function
- commit_specialization(): replace op on success, backoff on failure
- execute_binary_op_int() / execute_binary_op_float(): typed binary ops

Removes 10 duplicate deoptimize_* functions, consolidates 13 adaptive
counter blocks, 6 binary op handlers, and 7 specialize tail patterns.
Also replaces inline deopt blocks in LoadAttr/StoreAttr handlers.
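The adaptive/commit lifecycle can be modeled in a few lines (a sketch under assumed names — the real helpers operate on the bytecode's inline adaptive counter, and ADAPTIVE_BACKOFF_VALUE here is an illustrative constant):

```rust
const ADAPTIVE_BACKOFF_VALUE: u8 = 64;

// Each execution decrements the warmup counter; at zero the specialize
// callback runs. On success the caller's op has been replaced; on failure
// the counter resets to a backoff value so the site is retried later.
fn adaptive(counter: &mut u8, mut specialize: impl FnMut() -> bool) {
    if *counter > 0 {
        *counter -= 1;
        return;
    }
    if !specialize() {
        *counter = ADAPTIVE_BACKOFF_VALUE; // commit_specialization backoff path
    }
}
```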

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
crates/vm/src/frame.rs (2)

7391-7407: ⚠️ Potential issue | 🔴 Critical

Guard UnpackSequenceTwoTuple behind expected arity (UNPACK_SEQUENCE 2)

Instruction::UnpackSequenceTwoTuple is selected from runtime tuple length only. If this opcode is UNPACK_SEQUENCE N where N != 2, a later execution can incorrectly succeed with 2 elements instead of raising. This is a correctness break and can corrupt stack expectations.

💡 Proposed fix
@@
-            Instruction::UnpackSequence { size } => {
-                self.adaptive(|s, ii, cb| s.specialize_unpack_sequence(vm, ii, cb));
-                self.unpack_sequence(size.get(arg), vm)
+            Instruction::UnpackSequence { size } => {
+                let expected_size = size.get(arg);
+                self.adaptive(|s, ii, cb| s.specialize_unpack_sequence(vm, expected_size, ii, cb));
+                self.unpack_sequence(expected_size, vm)
             }
@@
-    fn specialize_unpack_sequence(
+    fn specialize_unpack_sequence(
         &mut self,
         vm: &VirtualMachine,
+        expected_size: u32,
         instr_idx: usize,
         cache_base: usize,
     ) {
@@
-        let new_op = if let Some(tuple) = obj.downcast_ref_if_exact::<PyTuple>(vm) {
-            if tuple.len() == 2 {
+        let new_op = if let Some(tuple) = obj.downcast_ref_if_exact::<PyTuple>(vm) {
+            if expected_size == 2 && tuple.len() == 2 {
                 Some(Instruction::UnpackSequenceTwoTuple)
             } else {
                 Some(Instruction::UnpackSequenceTuple)
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/frame.rs` around lines 7391 - 7407, The specialization picks
Instruction::UnpackSequenceTwoTuple based solely on the runtime tuple length,
which can violate the original UNPACK_SEQUENCE N expectation; update
specialize_unpack_sequence to first inspect the existing instruction at
instr_idx (match on Instruction::UnpackSequence { count }) and only choose
UnpackSequenceTwoTuple when count == 2 in addition to the obj being an exact
PyTuple of length 2; keep all other paths unchanged so UNPACK_SEQUENCE N (N !=
2) still preserves the generic UnpackSequence behavior.

7475-7486: ⚠️ Potential issue | 🟠 Major

Avoid unchecked narrowing of slot offset in StoreAttrSlot cache

offset as u16 can truncate. If that happens, specialization can write to the wrong slot index at runtime. Please guard the conversion and back off specialization when it does not fit.

💡 Proposed fix
             if let Some(ref descr) = cls_attr
                 && let Some(member_descr) = descr.downcast_ref::<PyMemberDescriptor>()
                 && let MemberGetter::Offset(offset) = member_descr.member.getter
             {
-                unsafe {
-                    self.code
-                        .instructions
-                        .write_cache_u32(cache_base + 1, type_version);
-                    self.code
-                        .instructions
-                        .write_cache_u16(cache_base + 3, offset as u16);
-                    self.code
-                        .instructions
-                        .replace_op(instr_idx, Instruction::StoreAttrSlot);
-                }
+                if let Ok(slot_offset) = u16::try_from(offset) {
+                    unsafe {
+                        self.code
+                            .instructions
+                            .write_cache_u32(cache_base + 1, type_version);
+                        self.code
+                            .instructions
+                            .write_cache_u16(cache_base + 3, slot_offset);
+                        self.code
+                            .instructions
+                            .replace_op(instr_idx, Instruction::StoreAttrSlot);
+                    }
+                } else {
+                    unsafe {
+                        self.code
+                            .instructions
+                            .write_adaptive_counter(cache_base, ADAPTIVE_BACKOFF_VALUE);
+                    }
+                }
             } else {

As per coding guidelines: "Follow Rust best practices for error handling and memory management".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/frame.rs` around lines 7475 - 7486, The code currently casts
offset as u16 unconditionally in the StoreAttrSlot cache path (see cls_attr,
member_descr and MemberGetter::Offset(offset) and the call to write_cache_u16),
which can truncate large offsets; add a guard that checks whether offset fits in
u16 (e.g. offset <= u16::MAX) before taking the specialization/unsafe path and
writing the cache (write_cache_u32/write_cache_u16); if the offset does not fit,
back off the specialization and fall through to the generic store-attribute path
instead of writing a truncated offset into the cache.
♻️ Duplicate comments (2)
crates/compiler-core/src/bytecode.rs (2)

485-499: ⚠️ Potential issue | 🔴 Critical

Safe readers dereference raw pointers without bounds validation.

Lines [485-499] and [567-574] are safe methods, but they compute/deref raw pointers without checking index < len. Out-of-range calls can produce UB instead of a panic.

Proposed fix
 pub fn read_op(&self, index: usize) -> Instruction {
     let units = unsafe { &*self.0.get() };
+    assert!(index < units.len(), "read_op: index out of bounds");
     let ptr = units.as_ptr().wrapping_add(index) as *const AtomicU8;
     let byte = unsafe { &*ptr }.load(Ordering::Acquire);
     unsafe { mem::transmute::<u8, Instruction>(byte) }
 }

 pub fn read_arg(&self, index: usize) -> OpArgByte {
     let units = unsafe { &*self.0.get() };
+    assert!(index < units.len(), "read_arg: index out of bounds");
     let ptr = units.as_ptr().wrapping_add(index) as *const u8;
     let arg_ptr = unsafe { ptr.add(1) } as *const AtomicU8;
     OpArgByte::from(unsafe { &*arg_ptr }.load(Ordering::Relaxed))
 }

 pub fn read_adaptive_counter(&self, index: usize) -> u8 {
     let units = unsafe { &*self.0.get() };
+    assert!(index < units.len(), "read_adaptive_counter: index out of bounds");
     let ptr = units.as_ptr().wrapping_add(index) as *const u8;
     let arg_ptr = unsafe { ptr.add(1) } as *const AtomicU8;
     unsafe { &*arg_ptr }.load(Ordering::Relaxed)
 }
#!/bin/bash
set -e
sed -n '483,500p' crates/compiler-core/src/bytecode.rs
sed -n '567,575p' crates/compiler-core/src/bytecode.rs

As per coding guidelines: "Follow Rust best practices for error handling and memory management".

Also applies to: 567-574

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/compiler-core/src/bytecode.rs` around lines 485 - 499, The safe reader
methods read_op and read_arg perform raw pointer arithmetic/derefs without
validating index, which can cause UB for out-of-range indices; modify both
functions to first obtain units (as now), check that index < units.len() and if
not panic with a clear message (e.g., index out of bounds) before doing any
pointer arithmetic/dereference, then proceed with the existing AtomicU8 load and
transmute/OpArgByte conversion; ensure the check uses the same units reference
used for pointer math so bounds are validated against the exact slice.

342-350: ⚠️ Potential issue | 🔴 Critical

AtomicU16 pointer casts require alignment that CodeUnit does not guarantee.

Lines [509-523] cast CodeUnit* to AtomicU16*, but CodeUnit at Lines [342-350] is #[repr(C)] with byte-sized fields, so alignment is not guaranteed to be 2. That is UB for atomic u16 accesses.

Proposed fix
-#[repr(C)]
+#[repr(C, align(2))]
 #[derive(Copy, Clone, Debug)]
 pub struct CodeUnit {
     pub op: Instruction,
     pub arg: OpArgByte,
 }

-const _: () = assert!(mem::size_of::<CodeUnit>() == 2);
+const _: () = {
+    assert!(mem::size_of::<CodeUnit>() == 2);
+    assert!(mem::align_of::<CodeUnit>() >= mem::align_of::<u16>());
+};
#!/bin/bash
set -e
sed -n '340,352p' crates/compiler-core/src/bytecode.rs
rg -n 'as \*const AtomicU16' crates/compiler-core/src/bytecode.rs

As per coding guidelines: "Follow Rust best practices for error handling and memory management".

Also applies to: 509-523

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/compiler-core/src/bytecode.rs` around lines 342 - 350, The CodeUnit
struct is not guaranteed 2-byte aligned but later code casts CodeUnit pointers
to AtomicU16 (unsafe for misaligned data); fix by making CodeUnit 2-byte
aligned—add #[repr(C, align(2))] to the CodeUnit definition (keeping the
existing fields op: Instruction and arg: OpArgByte) and verify the size
assertion still holds, then confirm the atomic pointer casts around the code
that converts CodeUnit* to AtomicU16* (the casts at the section around lines
that perform "as *const AtomicU16"/"as *mut AtomicU16") are now safe due to the
enforced alignment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@crates/vm/src/frame.rs`:
- Around line 7391-7407: The specialization picks
Instruction::UnpackSequenceTwoTuple based solely on the runtime tuple length,
which can violate the original UNPACK_SEQUENCE N expectation; update
specialize_unpack_sequence to first inspect the existing instruction at
instr_idx (match on Instruction::UnpackSequence { count }) and only choose
UnpackSequenceTwoTuple when count == 2 in addition to the obj being an exact
PyTuple of length 2; keep all other paths unchanged so UNPACK_SEQUENCE N (N !=
2) still preserves the generic UnpackSequence behavior.
- Around line 7475-7486: The code currently casts offset as u16 unconditionally
in the StoreAttrSlot cache path (see cls_attr, member_descr and
MemberGetter::Offset(offset) and the call to write_cache_u16), which can
truncate large offsets; add a guard that checks whether offset fits in u16 (e.g.
offset <= u16::MAX) before taking the specialization/unsafe path and writing the
cache (write_cache_u32/write_cache_u16); if the offset does not fit, back off
the specialization and fall through to the generic store-attribute path instead
of writing a truncated offset into the cache.

---

Duplicate comments:
In `@crates/compiler-core/src/bytecode.rs`:
- Around line 485-499: The safe reader methods read_op and read_arg perform raw
pointer arithmetic/derefs without validating index, which can cause UB for
out-of-range indices; modify both functions to first obtain units (as now),
check that index < units.len() and if not panic with a clear message (e.g.,
index out of bounds) before doing any pointer arithmetic/dereference, then
proceed with the existing AtomicU8 load and transmute/OpArgByte conversion;
ensure the check uses the same units reference used for pointer math so bounds
are validated against the exact slice.
- Around line 342-350: The CodeUnit struct is not guaranteed 2-byte aligned but
later code casts CodeUnit pointers to AtomicU16 (unsafe for misaligned data);
fix by making CodeUnit 2-byte aligned—add #[repr(C, align(2))] to the CodeUnit
definition (keeping the existing fields op: Instruction and arg: OpArgByte) and
verify the size assertion still holds, then confirm the atomic pointer casts
around the code that converts CodeUnit* to AtomicU16* (the casts at the section
around lines that perform "as *const AtomicU16"/"as *mut AtomicU16") are now
safe due to the enforced alignment.

ℹ️ Review info
Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: cdcead23-a33a-4961-b807-327211593155

📥 Commits

Reviewing files that changed from the base of the PR and between 2954b60 and 0b66b19.

📒 Files selected for processing (7)
  • crates/compiler-core/src/bytecode.rs
  • crates/vm/src/builtins/frame.rs
  • crates/vm/src/builtins/function.rs
  • crates/vm/src/builtins/type.rs
  • crates/vm/src/dict_inner.rs
  • crates/vm/src/frame.rs
  • crates/vm/src/types/slot.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • crates/vm/src/dict_inner.rs
  • crates/vm/src/builtins/frame.rs

- CONTAINS_OP_SET: add frozenset support in handler and specialize
- TO_BOOL_ALWAYS_TRUE: cache type version instead of checking slots
- LOAD_GLOBAL_BUILTIN: cache builtins dict version alongside globals
- mark_stacks: deoptimize specialized opcodes for correct reachability
@github-actions
Copy link
Contributor

github-actions bot commented Mar 4, 2026

Code has been automatically formatted

The code in this PR has been formatted using:

  • cargo fmt --all

Please pull the latest changes before pushing again:

git pull origin specialization2


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (2)
crates/compiler-core/src/bytecode.rs (2)

485-499: ⚠️ Potential issue | 🔴 Critical

Add bounds checks to all safe atomic readers.

read_op, read_arg, and read_adaptive_counter are safe APIs but dereference pointers derived from unchecked index. Out-of-range input is UB instead of a panic.

#!/bin/bash
set -euo pipefail
FILE="crates/compiler-core/src/bytecode.rs"

# Verify safe readers and whether they assert bounds.
rg -n "pub fn read_op|pub fn read_arg|pub fn read_adaptive_counter|assert!" "$FILE" -A8 -B2
🔧 Suggested fix
     pub fn read_op(&self, index: usize) -> Instruction {
         let units = unsafe { &*self.0.get() };
+        assert!(index < units.len(), "read_op: index out of bounds");
         let ptr = units.as_ptr().wrapping_add(index) as *const AtomicU8;
         let byte = unsafe { &*ptr }.load(Ordering::Acquire);
         // SAFETY: Only valid Instruction values are stored via replace_op/compare_exchange_op.
         unsafe { mem::transmute::<u8, Instruction>(byte) }
     }

     pub fn read_arg(&self, index: usize) -> OpArgByte {
         let units = unsafe { &*self.0.get() };
+        assert!(index < units.len(), "read_arg: index out of bounds");
         let ptr = units.as_ptr().wrapping_add(index) as *const u8;
         let arg_ptr = unsafe { ptr.add(1) } as *const AtomicU8;
         OpArgByte::from(unsafe { &*arg_ptr }.load(Ordering::Relaxed))
     }

     pub fn read_adaptive_counter(&self, index: usize) -> u8 {
         let units = unsafe { &*self.0.get() };
+        assert!(index < units.len(), "read_adaptive_counter: index out of bounds");
         let ptr = units.as_ptr().wrapping_add(index) as *const u8;
         let arg_ptr = unsafe { ptr.add(1) } as *const AtomicU8;
         unsafe { &*arg_ptr }.load(Ordering::Relaxed)
     }

As per coding guidelines: "Follow Rust best practices for error handling and memory management".

Also applies to: 567-574

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/compiler-core/src/bytecode.rs` around lines 485 - 499, The safe atomic
readers (read_op, read_arg, read_adaptive_counter) currently compute pointers
from an unchecked index causing UB on out-of-range input; fix by reading units =
unsafe { &*self.0.get() } then perform explicit bounds checks (e.g.,
assert!(index < units.len()) for read_op and read_adaptive_counter, and
assert!(index + 1 < units.len()) for read_arg because it accesses the next byte)
before creating/wrapping the pointer; only after those checks compute the
ptr/arg_ptr and perform the AtomicU8 load so out-of-range indices panic instead
of producing UB.

509-510: ⚠️ Potential issue | 🔴 Critical

Guarantee 2-byte alignment before casting to AtomicU16.

The CodeUnit* -> AtomicU16* cast requires 2-byte alignment. With #[repr(C)] and two 1-byte fields, this alignment is not guaranteed, so these atomic accesses are UB-prone.

#!/bin/bash
set -euo pipefail
FILE="crates/compiler-core/src/bytecode.rs"

# Show CodeUnit layout declaration + AtomicU16 pointer casts.
rg -n "struct CodeUnit|repr\\(C|AtomicU16|size_of::<CodeUnit>|align_of::<CodeUnit>" "$FILE" -A3 -B2
🔧 Suggested fix
-#[repr(C)]
+#[repr(C, align(2))]
 #[derive(Copy, Clone, Debug)]
 pub struct CodeUnit {
     pub op: Instruction,
     pub arg: OpArgByte,
 }

-const _: () = assert!(mem::size_of::<CodeUnit>() == 2);
+const _: () = {
+    assert!(mem::size_of::<CodeUnit>() == 2);
+    assert!(mem::align_of::<CodeUnit>() >= mem::align_of::<u16>());
+};

As per coding guidelines: "Follow Rust best practices for error handling and memory management".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/compiler-core/src/bytecode.rs` around lines 509 - 510, The code casts
CodeUnit pointers to AtomicU16 pointers (units.as_ptr().wrapping_add(index) as
*const AtomicU16) but CodeUnit (with #[repr(C)] and two 1-byte fields) may not
be 2-byte aligned, causing UB; fix by guaranteeing 2-byte alignment on CodeUnit
(e.g., annotate the CodeUnit struct with #[repr(C, align(2))]) so the cast is
valid, then replace the cast with a safe pointer cast like
units.as_ptr().wrapping_add(index).cast::<AtomicU16>() and keep the unsafe {
&*ptr }.store(value, Ordering::Relaxed) usage.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/vm/src/builtins/function.rs`:
- Around line 711-716: The update to __code__ must be atomic with JIT-cache
invalidation to avoid running stale jitted code; modify set___code__ so it
acquires the jitted_code mutex and clears the cached entry before performing
self.code.swap_to_temporary_refs, or hold the same jitted_code lock while doing
both operations so no other thread can observe cleared/uncleared state in
between. Specifically, in set___code__, lock jitted_code (the Mutex used at
*self.jitted_code.lock()), set it to None, then call
self.code.swap_to_temporary_refs(code, vm) while still holding the lock, and
finally release the lock so the swap and cache invalidation occur atomically.

In `@crates/vm/src/builtins/type.rs`:
- Around line 2171-2175: The current pre-invalidation (calling zelf.modified()
once before attribute mutation) leaves a window where a concurrent
find_name_in_mro may cache results under the new tag during the mutation; fix
this by invoking zelf.modified() both before starting the mutation and once more
immediately after finishing attribute modifications so any caches created during
the mutation are replay-invalidated; update the code path where attributes are
changed (the block containing the single zelf.modified() call) to perform a
second modified() post-mutation and ensure this behavior aligns with the
cache-checking logic in find_name_in_mro.

---

Duplicate comments:
In `@crates/compiler-core/src/bytecode.rs`:
- Around line 485-499: The safe atomic readers (read_op, read_arg,
read_adaptive_counter) currently compute pointers from an unchecked index
causing UB on out-of-range input; fix by reading units = unsafe { &*self.0.get()
} then perform explicit bounds checks (e.g., assert!(index < units.len()) for
read_op and read_adaptive_counter, and assert!(index + 1 < units.len()) for
read_arg because it accesses the next byte) before creating/wrapping the
pointer; only after those checks compute the ptr/arg_ptr and perform the
AtomicU8 load so out-of-range indices panic instead of producing UB.
- Around line 509-510: The code casts CodeUnit pointers to AtomicU16 pointers
(units.as_ptr().wrapping_add(index) as *const AtomicU16) but CodeUnit (with
#[repr(C)] and two 1-byte fields) may not be 2-byte aligned, causing UB; fix by
guaranteeing 2-byte alignment on CodeUnit (e.g., annotate the CodeUnit struct
with #[repr(C, align(2))]) so the cast is valid, then replace the cast with a
safe pointer cast like units.as_ptr().wrapping_add(index).cast::<AtomicU16>()
and keep the unsafe { &*ptr }.store(value, Ordering::Relaxed) usage.

ℹ️ Review info
Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 3cc6b83b-aec4-4da8-a316-976f49a64f3c

📥 Commits

Reviewing files that changed from the base of the PR and between 0b66b19 and dd89151.

📒 Files selected for processing (7)
  • crates/compiler-core/src/bytecode.rs
  • crates/vm/src/builtins/frame.rs
  • crates/vm/src/builtins/function.rs
  • crates/vm/src/builtins/type.rs
  • crates/vm/src/dict_inner.rs
  • crates/vm/src/frame.rs
  • crates/vm/src/types/slot.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • crates/vm/src/builtins/frame.rs
  • crates/vm/src/dict_inner.rs

Comment on lines 711 to +716
fn set___code__(&self, code: PyRef<PyCode>, vm: &VirtualMachine) {
self.code.swap_to_temporary_refs(code, vm);
#[cfg(feature = "jit")]
{
*self.jitted_code.lock() = None;
}

⚠️ Potential issue | 🔴 Critical

Make __code__ update atomic with JIT cache invalidation.

self.code is swapped before jitted_code is cleared. A concurrent call can run stale compiled code against the new __code__ in that window.

🔧 Suggested fix
     #[pygetset(setter)]
     fn set___code__(&self, code: PyRef<PyCode>, vm: &VirtualMachine) {
-        self.code.swap_to_temporary_refs(code, vm);
         #[cfg(feature = "jit")]
         {
-            *self.jitted_code.lock() = None;
+            let mut jitted = self.jitted_code.lock();
+            *jitted = None;
+            self.code.swap_to_temporary_refs(code, vm);
+        }
+        #[cfg(not(feature = "jit"))]
+        {
+            self.code.swap_to_temporary_refs(code, vm);
         }
         self.func_version.store(0, Relaxed);
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/function.rs` around lines 711 - 716, The update to
__code__ must be atomic with JIT-cache invalidation to avoid running stale
jitted code; modify set___code__ so it acquires the jitted_code mutex and clears
the cached entry before performing self.code.swap_to_temporary_refs, or hold the
same jitted_code lock while doing both operations so no other thread can observe
cleared/uncleared state in between. Specifically, in set___code__, lock
jitted_code (the Mutex used at *self.jitted_code.lock()), set it to None, then
call self.code.swap_to_temporary_refs(code, vm) while still holding the lock,
and finally release the lock so the swap and cache invalidation occur
atomically.

Comment on lines +953 to 962
if zelf.jitted_code.lock().is_some() {
return Ok(());
}
let arg_types = jit::get_jit_arg_types(&zelf, vm)?;
let ret_type = jit::jit_ret_type(&zelf, vm)?;
let code: &Py<PyCode> = &zelf.code;
let compiled = rustpython_jit::compile(&code.code, &arg_types, ret_type)
.map_err(|err| jit::new_jit_error(err.to_string(), vm))?;
let _ = zelf.jitted_code.set(compiled);
*zelf.jitted_code.lock() = Some(compiled);
Ok(())

⚠️ Potential issue | 🔴 Critical

Revalidate code identity before publishing compiled JIT code.

__jit__ does check -> compile -> store with no consistency check. If __code__ changes during compile, stale machine code can be cached and reused.

🔧 Suggested fix
     #[cfg(feature = "jit")]
     #[pymethod]
     fn __jit__(zelf: PyRef<Self>, vm: &VirtualMachine) -> PyResult<()> {
         if zelf.jitted_code.lock().is_some() {
             return Ok(());
         }
+        let code_snapshot: PyRef<PyCode> = (*zelf.code).to_owned();
+        let code_id = code_snapshot.get_id();

         let arg_types = jit::get_jit_arg_types(&zelf, vm)?;
         let ret_type = jit::jit_ret_type(&zelf, vm)?;
-        let code: &Py<PyCode> = &zelf.code;
-        let compiled = rustpython_jit::compile(&code.code, &arg_types, ret_type)
+        let compiled = rustpython_jit::compile(&code_snapshot.code, &arg_types, ret_type)
             .map_err(|err| jit::new_jit_error(err.to_string(), vm))?;
-        *zelf.jitted_code.lock() = Some(compiled);
+        let mut jitted = zelf.jitted_code.lock();
+        if jitted.is_none() && (*zelf.code).get_id() == code_id {
+            *jitted = Some(compiled);
+        }
         Ok(())
     }

Comment on lines +2171 to +2175
// Invalidate inline caches before modifying attributes.
// This ensures other threads see the version invalidation before
// any attribute changes, preventing use-after-free of cached descriptors.
zelf.modified();


⚠️ Potential issue | 🔴 Critical

Close the version-tag race introduced by pre-invalidation.

Calling zelf.modified() only before mutation allows a concurrent find_name_in_mro to assign/cache under a fresh tag during the mutation window (see Line 761-Line 777), so stale results can remain valid after the write completes.

🔧 Suggested fix
         // Invalidate inline caches before modifying attributes.
         // This ensures other threads see the version invalidation before
         // any attribute changes, preventing use-after-free of cached descriptors.
         zelf.modified();

         if let PySetterValue::Assign(value) = value {
             zelf.attributes.write().insert(attr_name, value);
         } else {
             let prev_value = zelf.attributes.write().shift_remove(attr_name); // TODO: swap_remove applicable?
             if prev_value.is_none() {
                 return Err(vm.new_attribute_error(format!(
                     "type object '{}' has no attribute '{}'",
                     zelf.name(),
                     attr_name,
                 )));
             }
         }

         if attr_name.as_wtf8().starts_with("__") && attr_name.as_wtf8().ends_with("__") {
             if assign {
                 zelf.update_slot::<true>(attr_name, &vm.ctx);
             } else {
                 zelf.update_slot::<false>(attr_name, &vm.ctx);
             }
         }
+        // Close the race: another thread may have assigned/cached a fresh
+        // version while mutation was in-flight.
+        zelf.modified();
         Ok(())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/builtins/type.rs` around lines 2171 - 2175, The current
pre-invalidation (calling zelf.modified() once before attribute mutation) leaves
a window where a concurrent find_name_in_mro may cache results under the new tag
during the mutation; fix this by invoking zelf.modified() both before starting
the mutation and once more immediately after finishing attribute modifications
so any caches created during the mutation are replay-invalidated; update the
code path where attributes are changed (the block containing the single
zelf.modified() call) to perform a second modified() post-mutation and ensure
this behavior aligns with the cache-checking logic in find_name_in_mro.

@youknowone youknowone changed the title Specialized ops 2 Specialized ops Mar 4, 2026
@youknowone youknowone merged commit c058add into RustPython:main Mar 4, 2026
13 checks passed
@youknowone youknowone deleted the specialization2 branch March 4, 2026 06:39
@youknowone youknowone changed the title Specialized ops Align specialization guards and caching with CPython Mar 4, 2026
@youknowone youknowone changed the title Align specialization guards and caching with CPython Specialized ops Mar 4, 2026