
Running Nexmark Benchmarks with Apache Beam

Apache Beam is a programming layer that runs pipelines on other data processing systems through pluggable backends called "runners". It includes its own implementation of the Nexmark benchmarks. These instructions explain how to run them on four of its runners:

Direct: built into Beam. It is meant for checking correctness and is not optimized for performance.
Flink
Spark
Google Cloud Dataflow
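Each of these runners is selected through Beam's standard --runner pipeline option. The helper below is a hypothetical sketch (the name beam_runner_option is ours and is not part of run-nexmark.sh) that maps the four runners listed above to the corresponding option values:

```shell
#!/usr/bin/env bash
# Hypothetical helper: map one of the four runners named above to the
# value of Beam's standard --runner pipeline option.
beam_runner_option() {
  case "$1" in
    direct)   printf '%s\n' "--runner=DirectRunner" ;;
    flink)    printf '%s\n' "--runner=FlinkRunner" ;;
    spark)    printf '%s\n' "--runner=SparkRunner" ;;
    dataflow) printf '%s\n' "--runner=DataflowRunner" ;;
    *)
      # Anything else is rejected so typos fail loudly.
      printf 'unknown runner: %s\n' "$1" >&2
      return 1
      ;;
  esac
}
```

For example, beam_runner_option flink prints --runner=FlinkRunner. Flink, Spark and Dataflow also need cluster or project settings that this sketch does not cover.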

Each runner supports both batch and streaming processing. In addition, Beam's Nexmark suite implements each query in up to three forms. Every query has an implementation in Beam's native representation, built from transforms and aggregations; some queries are also implemented in two SQL dialects: "standard" SQL and ZetaSQL.
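A run is therefore a combination of runner, processing mode and query form. The sketch below turns the last two into command-line arguments. The function name nexmark_mode_args is hypothetical. --streaming is Beam's standard streaming option; --queryLanguage (with the values sql and zetasql) is our reading of Beam's NexmarkOptions and should be checked against the Beam release you benchmark.

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate a processing mode (batch or streaming) and a
# query form (native, sql, zetasql) into Nexmark arguments.
# Assumption: --queryLanguage is the NexmarkOptions flag that selects SQL forms.
nexmark_mode_args() {
  local mode="$1" form="$2" args
  case "$mode" in
    batch)     args="--streaming=false" ;;
    streaming) args="--streaming=true" ;;
    *) printf 'unknown mode: %s\n' "$mode" >&2; return 1 ;;
  esac
  case "$form" in
    native)  ;;  # Beam-native transforms are the default, so no extra flag
    sql)     args="$args --queryLanguage=sql" ;;
    zetasql) args="$args --queryLanguage=zetasql" ;;
    *) printf 'unknown query form: %s\n' "$form" >&2; return 1 ;;
  esac
  printf '%s\n' "$args"
}
```

Because only some queries have SQL and ZetaSQL forms, a sql or zetasql run covers only that subset of the suite.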

For a Beam overview, see https://beam.apache.org/get-started/beam-overview/. For information about the Nexmark benchmark for Beam, see https://beam.apache.org/documentation/sdks/java/testing/nexmark/.

Prerequisites

Install the Java Development Kit. These instructions were tested with OpenJDK 21.0.2.

Setting up Beam

To set up Beam, either run setup.sh in this directory or follow the steps below.

Clone the Beam repository:

git clone https://github.com/apache/beam.git

If you wish to benchmark a particular version, check it out:

(cd beam && git checkout origin/release-2.55.0)

Apply configurable-spark-master.patch:

(cd beam && git am < ../configurable-spark-master.patch)
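The manual steps above can be collected into one script. This is a sketch, not a copy of setup.sh, whose contents may differ; the names setup_beam and run and the DRY_RUN variable are hypothetical. It assumes git is installed and that you run it from this directory, where configurable-spark-master.patch lives. Because git am records the patch as a commit, git also needs user.name and user.email configured.

```shell
#!/usr/bin/env bash
# Sketch of the manual setup: clone Beam, optionally check out a release
# branch, then apply configurable-spark-master.patch.
# Hypothetical: set DRY_RUN=1 to print the commands instead of running them.
set -euo pipefail

run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

setup_beam() {
  local version="${1:-}"  # e.g. 2.55.0; empty keeps the default branch
  run git clone https://github.com/apache/beam.git
  if [[ -n "$version" ]]; then
    # git -C beam is equivalent to the (cd beam && ...) subshells above.
    run git -C beam checkout "origin/release-$version"
  fi
  # Pass the patch path as an argument; an absolute path avoids
  # depending on the directory git -C switches into.
  run git -C beam am "$PWD/configurable-spark-master.patch"
}
```

After sourcing the script, DRY_RUN=1 setup_beam 2.55.0 prints the three commands, and setup_beam 2.55.0 runs them.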

There's no need to run a separate build step. Beam will build the first time you run a benchmark.

Running the benchmarks

Use run-nexmark.sh in the parent directory.
