[SQL] Unused field analysis for aggregates#5601
Conversation
|
Interestingly, this pass actually discovers unused aggregate functions which are introduced by the Calcite decorrelator! |
71af2a8 to
cba90af
Compare
cba90af to
bcd07b4
Compare
887e62d to
03616ff
Compare
|
Calcite compiles the following query: To a plan containing the following fragment: By using a |
03616ff to
d7551db
Compare
d7551db to
0e3c38e
Compare
|
The analysis can also discover that the following SUM aggregate is not used, since it's used only for ORDER BY, which is ignored by default: |
|
However, if an aggregate is evaluated just for side-effects (i.e., crash on overflow), removing aggregates is not sound. |
mythical-fred
left a comment
There was a problem hiding this comment.
Two hard-block issues here, plus one clear test bug.
1. No manual testing description. The PR body contains only "Fixes #5541" with no description of what manual testing was done. Per the team's process, every PR must describe the manual testing steps taken.
2. Soundness concern is unhandled (see inline comment on processAggregate). You flagged this yourself in the PR comments — but the code doesn't address it. Until there is either a fix or an explicit decision that Feldera's semantics don't include overflow-crash guarantees (with that documented in a code comment), this is an open correctness concern that should block merge.
3. Copy-paste bug in issue5541b (see inline comment). The expected output uses column name a but the view produces min.
Minor: the commit message has no body. For a change of this scope — renaming OptimizeMaps → OptimizeProjections, adding aggregate elimination, introducing empty-aggregate support — a body explaining the why (e.g., decorrelator-introduced MIN(true) aggregates, ORDER BY on unreferenced aggregates) would help future readers understand the change in isolation. Please use git rebase -i to add one before merge.
The optimization itself — particularly the "filler" MapIndex that restores the original output shape while skipping the removed computation, and the empty-aggregate → linear-aggregate path — is elegant. The fanout guards are correctly placed.
...compiler/src/main/java/org/dbsp/sqlCompiler/compiler/visitors/outer/OptimizeProjections.java
Show resolved
Hide resolved
...compiler/SQL-compiler/src/main/java/org/dbsp/sqlCompiler/ir/aggregate/DBSPAggregateList.java
Outdated
Show resolved
Hide resolved
...er/SQL-compiler/src/test/java/org/dbsp/sqlCompiler/compiler/sql/simple/Regression2Tests.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Mihai Budiu <mbudiu@feldera.com>
0e3c38e to
8b44513
Compare
| // Semigroup for the an aggregate which computes nothing | ||
| // Useful when the compiler removes all aggregates from an aggregate operator | ||
| // (This is currently never used, because the compiler generates a linear aggregate | ||
| // for this case) | ||
| #[derive(Clone)] | ||
| #[doc(hidden)] | ||
| pub struct EmptySemigroup; |
There was a problem hiding this comment.
Why add it if it will never be used?
But I do see a reference later on in the commit.
There was a problem hiding this comment.
Yes, that code in the compiler is unreachable, but I tested it with this implementation.
I have spent time thinking about this, so I decided to keep the implementation - maybe it will be necessary in the future?
There was a problem hiding this comment.
Well, it does make compilation slower...
Fixes #5541