[adapters] Upgrade delta and iceberg dependencies.#5682
Conversation
ac2c03a to
2ba39e5
Compare
| // https://docs.rs/parquet/50.0.0/src/parquet/record/api.rs.html#858 | ||
| // the right way is probably to use serde_arrow for deserialization and serialization | ||
| timestamp_format: TimestampFormat::String("%Y-%m-%d %H:%M:%S %:z"), // 2023-11-04 15:33:47 +00:00 | ||
| timestamp_format: TimestampFormat::String("%Y-%m-%d %H:%M:%S.%f %:z"), // 2023-11-04 15:33:47.123 +00:00 |
There was a problem hiding this comment.
Seems safe. It looks like the parquet crate now produces this format when converting records to JSON. In any case, this is only used in testing.
| tokio = { workspace = true, features = ["sync", "macros", "fs", "rt"] } | ||
| utoipa = { workspace = true } | ||
| chrono = { workspace = true, features = ["rkyv-64", "serde"] } | ||
| chrono = { workspace = true, features = ["serde"] } |
80f9428 to
71d9fdb
Compare
71d9fdb to
a0ef7d7
Compare
mythical-fred
left a comment
There was a problem hiding this comment.
Two blockers: a correctness bug in the Glue catalog path and a commit message that doesn't describe most of what's in the commit.
Commit message quality: The single commit in this PR is titled Disable chrono::rkyv feature. with a body that only explains the chrono/datafusion bug fix. But the commit contains the full delta-rs 0.26.2→0.30.2 / iceberg 0.5.1→0.8.0 / arrow 55→57 / datafusion 47→51 upgrade plus the adapter code rewrites — none of which appear in the message. Anyone running git log later will have no idea this commit contains a major dependency upgrade. Feldera has linear history on main; the commit message is the only window into why something changed. Please either split this into two commits (upgrade + chrono fix) or update the message to describe both changes.
crates/iceberg/src/input.rs
Outdated
|
|
||
| let mut props = self.config.fileio_config.clone(); | ||
| if let Some(endpoint) = self.config.glue_catalog_config.endpoint.as_ref() { | ||
| props.insert(GLUE_CATALOG_PROP_CATALOG_ID.to_string(), endpoint.clone()); |
There was a problem hiding this comment.
Copy-paste bug: endpoint is being stored under GLUE_CATALOG_PROP_CATALOG_ID instead of GLUE_CATALOG_PROP_URI. Consequences:
- If both
idandendpointare configured, the endpoint value silently overwrites the catalog ID inprops(the ID is lost). - When only
endpointis set, it ends up under the wrong key and the Glue SDK ignores it — the custom endpoint URL is silently discarded.
GLUE_CATALOG_PROP_URI is exported by iceberg-catalog-glue (it's pub const GLUE_CATALOG_PROP_URI: &str = "uri" in catalog.rs). Import it alongside GLUE_CATALOG_PROP_CATALOG_ID and use it here:
if let Some(endpoint) = self.config.glue_catalog_config.endpoint.as_ref() {
props.insert(GLUE_CATALOG_PROP_URI.to_string(), endpoint.clone());
}Upgrade deltalake and related dependencies:
delta-rs 0.26.2 -> 0.30.2
iceberg 0.5.1 -> 0.8.0
arrow 55 -> 57
datafusion 47 -> 51
The new version of delta-rs is based on delta-kernel and has some features
missing in 0.26, such as support for v2 checkpoints and deletion vectors. The
latter will require additional work on the connector to support deletion
vectors in follow and CDC modes, but they should work out of the box in the
snapshot mode.
Overall we are hoping this version will be more reliable and performant.
As part of the upgrade, I disabled the `chrono::rkyv` feature.
The latest version of datafusion resurfaced this bug
apache/datafusion#14862.
The bug is triggered by the chrono crate when it is compiled with the rkyv
feature enabled. It turns out that we no longer need this feature, except in
the DBSP tutorial. This commit removes the feature and modifies the tutorial to
use an integer that represents the number of days since epoch instead of
chrono::NaiveDate.
Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
a0ef7d7 to
a40e748
Compare
|
@mythical-fred , re-review |
mythical-fred
left a comment
There was a problem hiding this comment.
Both blockers addressed:
- Glue catalog endpoint now correctly inserted under
GLUE_CATALOG_PROP_URI; catalog ID underGLUE_CATALOG_PROP_CATALOG_ID. Both constants imported. - Commit message now properly describes the full upgrade scope.
LGTM.
Upgrade deltalake and related dependencies:
The new version of delta-rs is based on delta-kernel and has some features
missing in 0.26, such as support for v2 checkpoints and deletion vectors. The
latter will require additional work on the connector to support deletion
vectors in follow and CDC modes, but they should work out of the box in the
snapshot mode.
Overall we are hoping this version will be more reliable and performant.
Describe Manual Test Plan
Checklist
Breaking Changes?
Mark if you think the answer is yes for any of these components:
Describe Incompatible Changes