Optimized ListMerger#5727

Merged
ryzhyk merged 2 commits into main from list_merger_optimize
Mar 3, 2026

Conversation

@ryzhyk
Contributor

@ryzhyk ryzhyk commented Mar 2, 2026

This commit applies the same optimization to ListMerger that we previously
implemented for CursorList: replace the linear scan for min values at each step of
the cursor with a binary heap that keeps the cursors partially sorted and only
does O(m × log(n)) work at each step, where n is the number of cursors and m is
the number of cursors that point to the current min key or value.

Benchmark results on the ListMerger benchmarks. The `key range: 100` column
represents a workload with few distinct keys and many values per key; the
`key range: 100000000` column represents a workload with many unique keys in each
batch and a small number of values per key.

The key in these benchmarks is u64; the value is Tup10<u64,...,u64>.

This implementation doesn't yet contain some of the lower-level optimizations we
implemented for CursorList: replacing array indexing with `get_unchecked` and storing
raw pointers to keys and values instead of reading them on each access.
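
The heap-based merge step can be sketched as follows. This is a minimal
standalone illustration using std's `BinaryHeap`, not the actual ListMerger
code: `merge_runs`, the `u64`-only runs, and the index-based cursors are
assumptions made for the example. At each step it pops every cursor tied at
the current min, doing O(m × log(n)) work for m tied cursors out of n total:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Hypothetical sketch: each "cursor" is an index into a sorted run.
// The min-heap keeps cursors partially ordered by their head value;
// each distinct value is emitted once, advancing all tied cursors.
fn merge_runs(runs: Vec<Vec<u64>>) -> Vec<u64> {
    let mut positions = vec![0usize; runs.len()];
    // Reverse turns std's max-heap into a min-heap over (head, run index).
    let mut heap: BinaryHeap<Reverse<(u64, usize)>> = runs
        .iter()
        .enumerate()
        .filter(|(_, r)| !r.is_empty())
        .map(|(i, r)| Reverse((r[0], i)))
        .collect();

    let mut out = Vec::new();
    while let Some(&Reverse((min, _))) = heap.peek() {
        out.push(min);
        // Advance every cursor currently pointing at the min value:
        // O(log n) per pop/push, m iterations for m tied cursors.
        while let Some(&Reverse((v, i))) = heap.peek() {
            if v != min {
                break;
            }
            heap.pop();
            positions[i] += 1;
            if let Some(&next) = runs[i].get(positions[i]) {
                heap.push(Reverse((next, i)));
            }
        }
    }
    out
}
```

The real merger compares keys and values through cursor traits and
consolidates weights rather than deduplicating plain integers, but the heap
discipline is the same.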

BEFORE

Memory-backed batches

┌─────────────┬────────────────────┬──────────────────────────┐
│ # Batches   │ key range: 100     │ key range: 100000000     │
├─────────────┼────────────────────┼──────────────────────────┤
│           1 │                7.2 │                      3.0 │
│           8 │                5.0 │                      2.6 │
│          32 │                3.2 │                      2.0 │
│          64 │                2.2 │                      1.6 │
└─────────────┴────────────────────┴──────────────────────────┘

File-backed batches

┌─────────────┬────────────────────┬──────────────────────────┐
│ # Batches   │ key range: 100     │ key range: 100000000     │
├─────────────┼────────────────────┼──────────────────────────┤
│           1 │                5.3 │                      2.2 │
│           8 │                4.3 │                      1.9 │
│          32 │                3.1 │                      1.7 │
│          64 │                2.4 │                      1.4 │
└─────────────┴────────────────────┴──────────────────────────┘

AFTER

Memory-backed batches

┌─────────────┬────────────────────┬──────────────────────────┐
│ # Batches   │ key range: 100     │ key range: 100000000     │
├─────────────┼────────────────────┼──────────────────────────┤
│           1 │                7.4 │                      3.1 │
│           8 │                5.6 │                      2.6 │
│          32 │                4.9 │                      2.3 │
│          64 │                4.4 │                      2.1 │
└─────────────┴────────────────────┴──────────────────────────┘

File-backed batches

┌─────────────┬────────────────────┬──────────────────────────┐
│ # Batches   │ key range: 100     │ key range: 100000000     │
├─────────────┼────────────────────┼──────────────────────────┤
│           1 │                5.3 │                      2.2 │
│           8 │                4.4 │                      1.9 │
│          32 │                4.0 │                      1.8 │
│          64 │                3.8 │                      1.7 │
└─────────────┴────────────────────┴──────────────────────────┘

@ryzhyk ryzhyk requested review from blp and gz March 2, 2026 06:55
@ryzhyk ryzhyk added the DBSP core Related to the core DBSP library label Mar 2, 2026
@gz gz requested a review from mythical-fred March 2, 2026 09:44
Collaborator

@mythical-fred mythical-fred left a comment


Draft looks good — clean follow-on from #5639. Approach is correct. Two design notes below.


let heap = BinaryHeap::from_vec(std::mem::take(&mut self.key_heap), cmp);
self.key_heap = heap.into_vec();
self.update_key_heap();
Collaborator

init_key_heap builds a valid heap with from_vec, converts it back to a Vec, then immediately hands it to update_key_heap which wraps it again with from_vec_unchecked. Since current_key is empty at init time, update_key_heap is just doing a peek_all — the sift loop is a no-op.

Consider simplifying: build the heap in init_key_heap, call peek_all directly to populate current_key, then store the vec. Removes the awkward double-construction and makes the intent obvious.
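
The suggested single-construction init could look roughly like this. It is a
hedged sketch only: std's `BinaryHeap` stands in for the comparator-carrying
heap used in the PR, and `init_min` is a made-up name. The point is that
`BinaryHeap::from` already establishes the heap invariant in O(n), so
converting to a `Vec` and heapifying a second time is pure overhead:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Build the heap once, read the min to populate the "current key",
// then keep the backing vec around for later heap operations.
fn init_min(vals: Vec<i64>) -> (Option<i64>, Vec<i64>) {
    let heap: BinaryHeap<Reverse<i64>> =
        BinaryHeap::from(vals.into_iter().map(Reverse).collect::<Vec<_>>());
    let min = heap.peek().map(|r| r.0); // one peek, no second heapify
    (min, heap.into_vec().into_iter().map(|r| r.0).collect())
}
```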

@@ -256,6 +356,8 @@ where
}
self.cursors[index].step_key();
if !self.cursors[index].key_valid() {
Collaborator

After step_key(), current_val retains stale heap positions from the previous key. This is harmless today (single-cursor path never calls update_val_heap, and pos is only used by heap operations) — but it's fragile. A future reader who calls update_val_heap here would get UB from from_vec_unchecked on a stale heap.

Consider calling init_val_heap() (or at least clearing val_heap) after step_key() to keep state consistent.

@blp
Member

blp commented Mar 2, 2026

Fantastic performance results! Now reviewing

@mihaibudiu
Contributor

What are the numbers in the tables?

@mihaibudiu
Contributor

From the code it seems to be throughput, so higher is better

@ryzhyk ryzhyk force-pushed the cursor-optimizations branch from 7c9a999 to e2012a8 Compare March 3, 2026 07:16
Base automatically changed from cursor-optimizations to main March 3, 2026 11:13
ryzhyk added 2 commits March 3, 2026 09:29
Two benchmarks to help with performance tuning:
- The ListMerger is the algorithm we use to merge batches in background worker threads.
  We benchmark it by merging N randomly generated indexed Z-sets, on-disk or in-memory.
- The input map benchmark simulates ingestion into a table with a primary key using a
  circuit with a single input_map operator.

Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
This commit applies the same optimization to ListMerger that we previously
implemented for CursorList: replace the linear scan for min values at each step of
the cursor with a binary heap that keeps the cursors partially sorted and only
does O(m × log(n)) work at each step, where n is the number of cursors and m is
the number of cursors that point to the current min key or value.

Benchmark results on the ListMerger benchmarks. The `key range: 100` column
represents a workload with few distinct keys and many values per key; the
`key range: 100000000` column represents a workload with many unique keys in each
batch and a small number of values per key.

The key in these benchmarks is u64; the value is Tup10<u64,...,u64>.

This implementation doesn't yet contain some of the lower-level optimizations we
implemented for CursorList: replacing array indexing with `get_unchecked` and storing
raw pointers to keys and values instead of reading them on each access.


Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
@ryzhyk ryzhyk force-pushed the list_merger_optimize branch from 7c7cd41 to eaa6548 Compare March 3, 2026 17:30
@ryzhyk ryzhyk marked this pull request as ready for review March 3, 2026 17:30
@ryzhyk
Contributor Author

ryzhyk commented Mar 3, 2026

From the code it seems to be throughput, so higher is better

Yes, throughput, forgot to mention.

@ryzhyk ryzhyk enabled auto-merge March 3, 2026 17:32
@ryzhyk ryzhyk added this pull request to the merge queue Mar 3, 2026
Merged via the queue into main with commit 122bf41 Mar 3, 2026
1 check passed
@ryzhyk ryzhyk deleted the list_merger_optimize branch March 3, 2026 23:28