feat: aggregate nbrs api #792

Open
SemyonSinchenko wants to merge 13 commits into graphframes:main from SemyonSinchenko:785-aggregate-nbrs
Conversation

@SemyonSinchenko
Collaborator

What changes were proposed in this pull request?

New API as described in #785

Why are the changes needed?

Close #785

@SemyonSinchenko SemyonSinchenko self-assigned this Feb 3, 2026
@SemyonSinchenko SemyonSinchenko added the scala, pyspark-classic (GraphFrames on PySpark Classic), and pyspark-connect (GraphFrames on PySpark Connect) labels Feb 3, 2026
@SemyonSinchenko
Collaborator Author

Hi @james-willis! I addressed most of your comments; for some I left my answers. It looks like we are close to a final version. It would be nice if you could take another look, so I can start working on the bindings and docs.

see scaladoc issue
@codecov-commenter

codecov-commenter commented Feb 8, 2026

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 70.30303% with 49 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.28%. Comparing base (d36a5ac) to head (929a28f).
⚠️ Report is 7 commits behind head on main.

Files with missing lines Patch % Lines
...park/sql/graphframes/GraphFramesConnectUtils.scala 0.00% 36 Missing ⚠️
...scala/org/graphframes/lib/AggregateNeighbors.scala 89.84% 13 Missing ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #792      +/-   ##
==========================================
- Coverage   84.90%   84.28%   -0.62%     
==========================================
  Files          68       69       +1     
  Lines        3444     3672     +228     
  Branches      431      459      +28     
==========================================
+ Hits         2924     3095     +171     
- Misses        520      577      +57     

☔ View full report in Codecov by Sentry.

@SemyonSinchenko
Collaborator Author

I will work on tests, bindings and docs.

@SemyonSinchenko SemyonSinchenko marked this pull request as ready for review February 26, 2026 15:49
@SemyonSinchenko
Collaborator Author

@james-willis this one is ready for review

@james-willis
Collaborator

Looking at this PR, here are my recommendations:

Code Quality & Performance

  1. Memory Management Concerns (AggregateNeighbors.scala:267-268):
    The hardcoded warning for maxHops > 10 should be configurable or include memory estimation logic rather than an arbitrary threshold.

  2. Potential Optimization (AggregateNeighbors.scala:303):
    The repartition by SRC could be expensive for smaller datasets. Consider making this optional or using coalesce when appropriate.

API Design

  1. Required Parameter Validation:
    The API requires either stoppingCondition or targetCondition but this isn't enforced until run(). Consider validation during builder setup for better UX.

  2. Default Accumulator:
    Most use cases will want basic path tracking. Consider adding a convenience method for common path accumulation patterns.
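On the builder-validation point: since a fluent builder cannot know which setters will still be called, an either/or requirement can only be checked once run() is invoked. A minimal plain-Python sketch of that pattern (the class and method names here are illustrative, not the actual GraphFrames API):

```python
# Hypothetical sketch of deferred builder validation. The names
# (AggregateNeighborsBuilder, stopping_condition, target_condition)
# are illustrative only, not the real GraphFrames API.
class AggregateNeighborsBuilder:
    def __init__(self):
        self._stopping_condition = None
        self._target_condition = None

    def stopping_condition(self, expr):
        # Fluent setter; the order of setter calls is not known in advance.
        self._stopping_condition = expr
        return self

    def target_condition(self, expr):
        self._target_condition = expr
        return self

    def run(self):
        # Deferred validation: only here do we know all setters have run.
        if self._stopping_condition is None and self._target_condition is None:
            raise ValueError(
                "Either stoppingCondition or targetCondition must be set")
        return {"stopping": self._stopping_condition,
                "target": self._target_condition}
```

Validating inside each setter would reject legal call sequences, since either condition alone is sufficient and may be set last.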

Testing & Documentation

  1. Edge Case Testing: Add tests for:

    • Very large graphs (performance characteristics)
    • Disconnected components
    • Graphs with cycles under different stopping conditions
  2. Python API Consistency:
    Ensure Python parameter names follow conventions consistently.

Implementation Robustness

  1. Convergence Logic (AggregateNeighbors.scala:400):
    Currently this only checks whether the frontier is empty, but it doesn't handle cases where states exist yet no new edges can be traversed.
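On the convergence point: in a generic frontier-based traversal, the next frontier is built only from edges that can still be traversed, so "states exist but no traversable edges" produces an empty next frontier, and the empty-frontier check terminates the loop. A minimal plain-Python sketch of the idea (not the actual Spark implementation):

```python
# Generic frontier-based traversal sketch (plain Python, not the real
# Spark/GraphFrames code). If no edge can be traversed from the current
# frontier, next_frontier comes out empty and the loop stops, even
# though visited states still exist.
def traverse(edges, start, max_hops):
    # edges: adjacency dict {src: [dst, ...]}
    visited = {start}
    frontier = {start}
    for _hop in range(max_hops):
        next_frontier = set()
        for v in frontier:
            for dst in edges.get(v, []):
                if dst not in visited:
                    next_frontier.add(dst)
        if not next_frontier:  # empty frontier == converged
            break
        visited |= next_frontier
        frontier = next_frontier
    return visited
```

The same check also terminates on cycles, since already-visited vertices never re-enter the frontier.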

Overall, this is a solid implementation that follows GraphFrames patterns well. The main areas for improvement are around performance optimization configuration and edge case handling. The comprehensive test suite and documentation are excellent.

Comment by Claude (AI Assistant)

@SemyonSinchenko
Collaborator Author

@james-willis

  1. It is a warning; I don't understand how a warning can be configurable.
  2. That is false. The join means repartitioning by src anyway, so even for a single step it is better to keep the repartition.
  3. That is impossible to achieve, because we do not know the order of the setter calls.
  4. I do not understand what this means. If it is about adding implementations, I'm going to do that in follow-up PRs. I plan to add at least allPaths, to find all paths between two nodes.
  5. I'm not going to add huge graphs (performance testing) to the unit tests. The tests already take 10+ minutes; do we want 30-minute tests?
  6. All of the Python API follows snake_case.
  7. In that case the DataFrame will be empty, so I don't fully understand the comment...


Labels

pyspark-classic (GraphFrames on PySpark Classic), pyspark-connect (GraphFrames on PySpark Connect), scala

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: aggregate neighbors API

3 participants