[SQL] Unused field analysis for aggregates by mihaibudiu · Pull Request #5601 · feldera/feldera

mihaibudiu · 2026-02-11T05:04:25Z

mihaibudiu · 2026-02-11T05:25:48Z

Interestingly, this pass actually discovers unused aggregate functions which are introduced by the Calcite decorrelator!

mihaibudiu · 2026-02-12T00:02:30Z

Calcite compiles the following query:

create materialized view q4
                as select
                        o_orderpriority,
                        count(*) as order_count
                from
                        orders
                where
                        o_orderdate >= date '1993-07-01'
                        and o_orderdate < date '1993-07-01' + interval '3' month
                        and exists (
                                select
                                        *
                                from
                                        lineitem
                                where
                                        l_orderkey = o_orderkey
                                        and l_commitdate < l_receiptdate
                        )
                group by
                        o_orderpriority

into a plan containing the following fragment:

              LogicalJoin(condition=[=($0, $9)], joinType=[inner]), id = 574
                LogicalFilter(condition=[SEARCH($4, Sarg[[1993-07-01..1993-10-01)])]), id = 552
                  LogicalTableScan(table=[[schema, orders]]), id = 68
                LogicalAggregate(group=[{0}], agg#0=[MIN($1)]), id = 559
                  LogicalProject(l_orderkey=[$0], $f0=[true]), id = 557
                    LogicalFilter(condition=[<($11, $12)]), id = 555
                      LogicalTableScan(table=[[schema, lineitem]]), id = 70

The plan uses a MIN(true) aggregate to determine whether the EXISTS subquery produces any rows. It turns out that the result of the MIN is never used, and this new optimization can remove it completely, leaving an aggregate that computes nothing. Such an aggregate is implemented much more efficiently as a linear aggregate, which returns Tup0 if the collection contains any value, and nothing otherwise.
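To illustrate the idea of an aggregate that only reports whether a group is non-empty, here is a minimal sketch. It is not the Feldera implementation: it assumes changes arrive as `(key, weight)` pairs (positive weights for insertions, negative for deletions), and the names `Tup0` and `exists_per_key` are stand-ins invented for this example. Because only weights are summed, no per-row MIN has to be evaluated.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the Tup0 value mentioned above: a tuple with no fields.
type Tup0 = ();

/// Sum the weights of all (key, weight) changes per key, then report
/// `Tup0` for every key whose total weight is positive, and nothing
/// for keys whose rows have all been retracted.
fn exists_per_key(changes: &[(i64, i64)]) -> BTreeMap<i64, Tup0> {
    let mut weights: BTreeMap<i64, i64> = BTreeMap::new();
    for &(key, weight) in changes {
        // Weights are simply added; this additivity is what makes the aggregate "linear".
        *weights.entry(key).or_insert(0) += weight;
    }
    weights
        .into_iter()
        .filter(|&(_, w)| w > 0)
        .map(|(key, _)| (key, ()))
        .collect()
}
```

For example, with the changes `(1, +1), (1, +1), (2, +1), (2, -1), (3, +1)`, keys 1 and 3 appear in the result, while key 2 is absent because its insertion was retracted.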

mihaibudiu · 2026-02-12T01:25:59Z

The analysis can also discover that the following SUM aggregate is not used, since it's used only for ORDER BY, which is ignored by default:

                select deptno
                from emp
                group by deptno
                order by sum(sal) filter (where job = 'CLERK')

mihaibudiu · 2026-02-12T01:26:49Z

However, if an aggregate is evaluated just for its side effects (e.g., a crash on overflow), removing it is not sound.
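The concern can be made concrete with a small sketch. It assumes a SUM over 32-bit integers that fails on overflow, modeled here as returning `None` rather than crashing; the function names are hypothetical and exist only for this example. The two query variants agree whenever the sum fits, and differ exactly when it overflows.

```rust
/// SUM with overflow detection. `None` models the point where a runtime
/// that aborts on overflow would crash.
fn checked_sum(values: &[i32]) -> Option<i32> {
    values.iter().try_fold(0i32, |acc, &v| acc.checked_add(v))
}

/// The query with the unused SUM still evaluated: the group key is only
/// produced if the sum can be computed.
fn query_keeping_sum(dept: i32, salaries: &[i32]) -> Option<i32> {
    checked_sum(salaries).map(|_unused| dept)
}

/// The query after the unused SUM has been removed: the group key is
/// always produced, so an overflow goes unnoticed.
fn query_dropping_sum(dept: i32, _salaries: &[i32]) -> Option<i32> {
    Some(dept)
}
```

With salaries `[i32::MAX, 1]`, `query_keeping_sum` fails while `query_dropping_sum` returns the department, which is the observable difference the comment above refers to.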

mythical-fred

Two hard-block issues here, plus one clear test bug.

1. No manual testing description. The PR body contains only "Fixes #5541" with no description of what manual testing was done. Per the team's process, every PR must describe the manual testing steps taken.

2. Soundness concern is unhandled (see inline comment on processAggregate). You flagged this yourself in the PR comments — but the code doesn't address it. Until there is either a fix or an explicit decision that Feldera's semantics don't include overflow-crash guarantees (with that documented in a code comment), this is an open correctness concern that should block merge.

3. Copy-paste bug in issue5541b (see inline comment). The expected output uses column name a but the view produces min.

Minor: the commit message has no body. For a change of this scope — renaming OptimizeMaps → OptimizeProjections, adding aggregate elimination, introducing empty-aggregate support — a body explaining the why (e.g., decorrelator-introduced MIN(true) aggregates, ORDER BY on unreferenced aggregates) would help future readers understand the change in isolation. Please use git rebase -i to add one before merge.

The optimization itself — particularly the "filler" MapIndex that restores the original output shape while skipping the removed computation, and the empty-aggregate → linear-aggregate path — is elegant. The fanout guards are correctly placed.

...compiler/src/main/java/org/dbsp/sqlCompiler/compiler/visitors/outer/OptimizeProjections.java

...compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/ir/aggregate/DBSPAggregateList.java

...er/SQL-compiler/src/test/java/org/dbsp/sqlCompiler/compiler/sql/simple/Regression2Tests.java

Signed-off-by: Mihai Budiu <mbudiu@feldera.com>

blp · 2026-02-27T21:56:33Z

crates/sqllib/src/lib.rs

+// Semigroup for an aggregate which computes nothing
+// Useful when the compiler removes all aggregates from an aggregate operator
+// (This is currently never used, because the compiler generates a linear aggregate
+// for this case)
+#[derive(Clone)]
+#[doc(hidden)]
+pub struct EmptySemigroup;


Why add it if it will never be used?

But I do see a reference later on in the commit.

Yes, that code in the compiler is unreachable, but I tested it with this implementation.
I have spent time thinking about this, so I decided to keep the implementation - maybe it will be necessary in the future?

It does no harm 👍

Well, it does make compilation slower...
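For context on the snippet discussed in this thread: a semigroup is a type with an associative combine operation, and a semigroup over the empty tuple has nothing to combine. The sketch below uses a hypothetical `Semigroup` trait and a local `EmptySemigroupSketch` type defined only for this example; the trait that the real `EmptySemigroup` implements is not shown in this thread and may have a different signature.

```rust
/// Hypothetical trait for this sketch only; the trait used by the real code may differ.
trait Semigroup<V> {
    /// Combine two partial results into one. Must be associative.
    fn combine(left: &V, right: &V) -> V;
}

/// Local stand-in for a semigroup whose values carry no information.
#[derive(Clone)]
struct EmptySemigroupSketch;

impl Semigroup<()> for EmptySemigroupSketch {
    // Combining two empty tuples yields the empty tuple.
    fn combine(_left: &(), _right: &()) {}
}

/// Fold a slice of partial results with a semigroup.
/// Returns `None` for an empty slice, since a semigroup has no identity element.
fn combine_all<V: Clone, S: Semigroup<V>>(values: &[V]) -> Option<V> {
    let (first, rest) = values.split_first()?;
    Some(rest.iter().fold(first.clone(), |acc, v| S::combine(&acc, v)))
}
```

Combining any non-empty collection of `()` values yields `()`, which is why such a semigroup is only a placeholder for an aggregate operator with no remaining aggregates.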

mihaibudiu marked this pull request as draft February 11, 2026 05:25

mihaibudiu force-pushed the issue5541 branch from 71af2a8 to cba90af Compare February 11, 2026 06:09

mihaibudiu marked this pull request as ready for review February 11, 2026 06:10

mihaibudiu marked this pull request as draft February 11, 2026 06:33

mihaibudiu force-pushed the issue5541 branch from cba90af to bcd07b4 Compare February 11, 2026 23:49

mihaibudiu marked this pull request as ready for review February 11, 2026 23:50

mihaibudiu force-pushed the issue5541 branch 2 times, most recently from 887e62d to 03616ff Compare February 11, 2026 23:56

mihaibudiu marked this pull request as draft February 12, 2026 00:01

mihaibudiu force-pushed the issue5541 branch from 03616ff to d7551db Compare February 12, 2026 00:51

mihaibudiu marked this pull request as ready for review February 12, 2026 00:51

mihaibudiu force-pushed the issue5541 branch from d7551db to 0e3c38e Compare February 12, 2026 00:52

mythical-fred suggested changes Feb 23, 2026


[SQL] Unused field analysis for aggregates

8b44513

Signed-off-by: Mihai Budiu <mbudiu@feldera.com>

mihaibudiu force-pushed the issue5541 branch from 0e3c38e to 8b44513 Compare February 27, 2026 19:53

blp approved these changes Feb 27, 2026


mihaibudiu added this pull request to the merge queue Feb 27, 2026

Merged via the queue into main with commit 4842324 Feb 28, 2026
9 checks passed

mihaibudiu deleted the issue5541 branch February 28, 2026 01:46

mihaibudiu merged 1 commit into main from issue5541
