Description
What would you like to happen?
I see that, as of Apache Beam 2.66.0 (after this PR), one can apply a "filter" to an Iceberg ManagedIO read operation.
For example, if I have two columns named day and ingestedAt, I can write something like this and pass it in the config map to my Iceberg ManagedIO.read() transform:
```
filter: "\"day\" = '20260109' AND \"ingestedAt\" = '2026-01-09T15:08:38.122'"
```
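
For reference, here is roughly how I'm wiring up the read in the Java SDK (a minimal sketch; the table name and catalog properties below are placeholders, not my actual setup):

```java
import java.util.Map;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.managed.Managed;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionRowTuple;
import org.apache.beam.sdk.values.Row;

public class IcebergFilteredRead {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create();

    // Managed Iceberg read config; "filter" carries the pushdown predicate.
    // "db.events" and the catalog properties are placeholder values.
    Map<String, Object> config =
        Map.<String, Object>of(
            "table", "db.events",
            "catalog_properties",
                Map.of("type", "hadoop", "warehouse", "gs://my-warehouse"),
            "filter",
                "\"day\" = '20260109' AND \"ingestedAt\" = '2026-01-09T15:08:38.122'");

    // Managed transforms are SchemaTransforms, so they are applied to a
    // PCollectionRowTuple rather than to the Pipeline directly.
    PCollection<Row> rows =
        PCollectionRowTuple.empty(p)
            .apply(Managed.read(Managed.ICEBERG).withConfig(config))
            .getSinglePCollection();

    p.run().waitUntilFinish();
  }
}
```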
What I'd also like to know is whether it's possible to apply this kind of filter to a nested column (a column inside a struct type). For example, if I had a column named eventInfo whose type is a struct, with a field called zone inside it, could I apply a predicate-pushdown filter like this:
```
filter: "\"day\" = '20260109' AND \"ingestedAt\" = '2026-01-09T15:08:38.122' AND \"eventInfo\".\"zone\" = 'abc'"
```
I tried running an example of this sort, where my struct column was named data, and saw this exception:
```
Caused by: java.lang.IllegalArgumentException: Unsupported filter type in field 'data': STRUCT
	at org.apache.beam.sdk.io.iceberg.FilterUtils.convertLiteral(FilterUtils.java:390)
	at org.apache.beam.sdk.io.iceberg.FilterUtils.convertFieldAndLiteral(FilterUtils.java:320)
	at org.apache.beam.sdk.io.iceberg.FilterUtils.convertFieldAndLiteral(FilterUtils.java:305)
	at org.apache.beam.sdk.io.iceberg.FilterUtils.convert(FilterUtils.java:197)
	at org.apache.beam.sdk.io.iceberg.FilterUtils.convertLogicalExpr(FilterUtils.java:271)
	at org.apache.beam.sdk.io.iceberg.FilterUtils.convert(FilterUtils.java:227)
	at org.apache.beam.sdk.io.iceberg.FilterUtils.convert(FilterUtils.java:148)
	... 41 more

26/01/12 23:24:40 ERROR PipelineException: Invalid resource error: Pipeline execution failed: Failed to create partitions for source ScanSource
java.lang.RuntimeException: Failed to create partitions for source ScanSource
	at org.apache.beam.runners.spark.io.SourceRDD$Bounded.getPartitions(SourceRDD.java:128)
	at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:291)
```
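
In the meantime, the workaround I'm considering (a minimal sketch, assuming the eventInfo/zone schema from the example above) is to push down only the supported top-level predicates and evaluate the struct-field predicate inside the pipeline, at the cost of scanning more data:

```java
import org.apache.beam.sdk.transforms.Filter;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;

public class NestedFilterWorkaround {
  // Evaluates the nested-field predicate in the pipeline instead of pushing
  // it down to Iceberg; "eventInfo" and "zone" follow the example schema above.
  static PCollection<Row> filterByZone(PCollection<Row> rows) {
    return rows.apply(
        "FilterByZone",
        Filter.by(
            (Row row) -> {
              Row eventInfo = row.getRow("eventInfo"); // struct-typed column
              return eventInfo != null && "abc".equals(eventInfo.getString("zone"));
            }));
  }
}
```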
Issue Priority
Priority: 2 (default / most feature requests should be filed as P2)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam YAML
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Infrastructure
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner