Optimized ListMerger#5727

Merged
ryzhyk merged 2 commits into main from list_merger_optimize
Mar 3, 2026

Conversation

@ryzhyk
Contributor

@ryzhyk ryzhyk commented Mar 2, 2026

This commit applies the same optimization to ListMerger that we previously
implemented for CursorList: replace the linear scan for min values at each step of
the cursor with a binary heap that keeps the cursors partially sorted and only
does O(m × log(n)) work at each step, where n is the number of cursors and m is
the number of cursors that point to the current min key or value.

Benchmark results on the ListMerger benchmarks. The `key range: 100` column
represents a workload with few distinct keys and many values per key; the
`key range: 100000000` column represents a workload with many unique keys in each
batch and a small number of values per key.

The key in these benchmarks is u64; the value is Tup10<u64,...,u64>.

This implementation doesn't yet contain some of the lower-level optimizations we
implemented for CursorList: replacing array indexing with `get_unchecked` and storing
raw pointers to keys and values instead of reading them on each access.
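
The heap-based merge step can be sketched as follows. This is a minimal
standalone illustration using std's `BinaryHeap`, not the actual ListMerger
code: `merge_runs`, the `u64`-only runs, and the index-based cursors are
assumptions made for the example. At each step it pops every cursor tied at
the current min, doing O(m × log(n)) work for m tied cursors out of n total:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Hypothetical sketch: each "cursor" is an index into a sorted run.
// The min-heap keeps cursors partially ordered by their head value;
// each distinct value is emitted once, advancing all tied cursors.
fn merge_runs(runs: Vec<Vec<u64>>) -> Vec<u64> {
    let mut positions = vec![0usize; runs.len()];
    // Reverse turns std's max-heap into a min-heap over (head, run index).
    let mut heap: BinaryHeap<Reverse<(u64, usize)>> = runs
        .iter()
        .enumerate()
        .filter(|(_, r)| !r.is_empty())
        .map(|(i, r)| Reverse((r[0], i)))
        .collect();

    let mut out = Vec::new();
    while let Some(&Reverse((min, _))) = heap.peek() {
        out.push(min);
        // Advance every cursor currently pointing at the min value:
        // O(log n) per pop/push, m iterations for m tied cursors.
        while let Some(&Reverse((v, i))) = heap.peek() {
            if v != min {
                break;
            }
            heap.pop();
            positions[i] += 1;
            if let Some(&next) = runs[i].get(positions[i]) {
                heap.push(Reverse((next, i)));
            }
        }
    }
    out
}
```

The real merger compares keys and values through cursor traits and
consolidates weights rather than deduplicating plain integers, but the heap
discipline is the same.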

BEFORE

Memory-backed batches

┌─────────────┬────────────────────┬──────────────────────────┐
│ # Batches   │ key range: 100     │ key range: 100000000     │
├─────────────┼────────────────────┼──────────────────────────┤
│           1 │                7.2 │                      3.0 │
│           8 │                5.0 │                      2.6 │
│          32 │                3.2 │                      2.0 │
│          64 │                2.2 │                      1.6 │
└─────────────┴────────────────────┴──────────────────────────┘

File-backed batches

┌─────────────┬────────────────────┬──────────────────────────┐
│ # Batches   │ key range: 100     │ key range: 100000000     │
├─────────────┼────────────────────┼──────────────────────────┤
│           1 │                5.3 │                      2.2 │
│           8 │                4.3 │                      1.9 │
│          32 │                3.1 │                      1.7 │
│          64 │                2.4 │                      1.4 │
└─────────────┴────────────────────┴──────────────────────────┘

AFTER

Memory-backed batches

┌─────────────┬────────────────────┬──────────────────────────┐
│ # Batches   │ key range: 100     │ key range: 100000000     │
├─────────────┼────────────────────┼──────────────────────────┤
│           1 │                7.4 │                      3.1 │
│           8 │                5.6 │                      2.6 │
│          32 │                4.9 │                      2.3 │
│          64 │                4.4 │                      2.1 │
└─────────────┴────────────────────┴──────────────────────────┘

File-backed batches

┌─────────────┬────────────────────┬──────────────────────────┐
│ # Batches   │ key range: 100     │ key range: 100000000     │
├─────────────┼────────────────────┼──────────────────────────┤
│           1 │                5.3 │                      2.2 │
│           8 │                4.4 │                      1.9 │
│          32 │                4.0 │                      1.8 │
│          64 │                3.8 │                      1.7 │
└─────────────┴────────────────────┴──────────────────────────┘

@ryzhyk ryzhyk requested review from blp and gz March 2, 2026 06:55
@ryzhyk ryzhyk added the DBSP core Related to the core DBSP library label Mar 2, 2026
@gz gz requested a review from mythical-fred March 2, 2026 09:44
Collaborator

@mythical-fred mythical-fred left a comment


Draft looks good — clean follow-on from #5639. Approach is correct. Two design notes below.


let heap = BinaryHeap::from_vec(std::mem::take(&mut self.key_heap), cmp);
self.key_heap = heap.into_vec();
self.update_key_heap();
Collaborator

init_key_heap builds a valid heap with from_vec, converts it back to a Vec, then immediately hands it to update_key_heap which wraps it again with from_vec_unchecked. Since current_key is empty at init time, update_key_heap is just doing a peek_all — the sift loop is a no-op.

Consider simplifying: build the heap in init_key_heap, call peek_all directly to populate current_key, then store the vec. Removes the awkward double-construction and makes the intent obvious.
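
The suggested single-construction init could look roughly like this. It is a
hedged sketch only: std's `BinaryHeap` stands in for the comparator-carrying
heap used in the PR, and `init_min` is a made-up name. The point is that
`BinaryHeap::from` already establishes the heap invariant in O(n), so
converting to a `Vec` and heapifying a second time is pure overhead:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Build the heap once, read the min to populate the "current key",
// then keep the backing vec around for later heap operations.
fn init_min(vals: Vec<i64>) -> (Option<i64>, Vec<i64>) {
    let heap: BinaryHeap<Reverse<i64>> =
        BinaryHeap::from(vals.into_iter().map(Reverse).collect::<Vec<_>>());
    let min = heap.peek().map(|r| r.0); // one peek, no second heapify
    (min, heap.into_vec().into_iter().map(|r| r.0).collect())
}
```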

@@ -256,6 +356,8 @@ where
}
self.cursors[index].step_key();
if !self.cursors[index].key_valid() {
Collaborator

After step_key(), current_val retains stale heap positions from the previous key. This is harmless today (single-cursor path never calls update_val_heap, and pos is only used by heap operations) — but it's fragile. A future reader who calls update_val_heap here would get UB from from_vec_unchecked on a stale heap.

Consider calling init_val_heap() (or at least clearing val_heap) after step_key() to keep state consistent.

@blp
Member

blp commented Mar 2, 2026

Fantastic performance results! Now reviewing

@mihaibudiu
Contributor

What are the numbers in the tables?

@mihaibudiu
Contributor

From the code it seems to be throughput, so higher is better

@ryzhyk ryzhyk force-pushed the cursor-optimizations branch from 7c9a999 to e2012a8 Compare March 3, 2026 07:16
Base automatically changed from cursor-optimizations to main March 3, 2026 11:13
ryzhyk added 2 commits March 3, 2026 09:29
Two benchmarks to help with performance tuning:
- The ListMerger is the algorithm we use to merge batches in background worker threads.
  We benchmark it by merging N randomly generated indexed Z-sets, on-disk or in-memory.
- The input map benchmark simulates ingestion into a table with a primary key using a
  circuit with a single input_map operator.

Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
This commit applies the same optimization to ListMerger that we previously
implemented for CursorList: replace the linear scan for min values at each step of
the cursor with a binary heap that keeps the cursors partially sorted and only
does O(m × log(n)) work at each step, where n is the number of cursors and m is
the number of cursors that point to the current min key or value.

Benchmark results on the ListMerger benchmarks. The `key range: 100` column
represents a workload with few distinct keys and many values per key; the
`key range: 100000000` column represents a workload with many unique keys in each
batch and a small number of values per key.

The key in these benchmarks is u64; the value is Tup10<u64,...,u64>.

This implementation doesn't yet contain some of the lower-level optimizations we
implemented for CursorList: replacing array indexing with `get_unchecked` and storing
raw pointers to keys and values instead of reading them on each access.


Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
@ryzhyk ryzhyk force-pushed the list_merger_optimize branch from 7c7cd41 to eaa6548 Compare March 3, 2026 17:30
@ryzhyk ryzhyk marked this pull request as ready for review March 3, 2026 17:30
@ryzhyk
Contributor Author

ryzhyk commented Mar 3, 2026

From the code it seems to be throughput, so higher is better

Yes, throughput, forgot to mention.

@ryzhyk ryzhyk enabled auto-merge March 3, 2026 17:32
@ryzhyk ryzhyk added this pull request to the merge queue Mar 3, 2026
Merged via the queue into main with commit 122bf41 Mar 3, 2026
1 check passed
@ryzhyk ryzhyk deleted the list_merger_optimize branch March 3, 2026 23:28