NullPointerException when querying Iceberg table after expiring snapshots

When querying an Iceberg table right after expiring all but the latest snapshot, and before the table metadata is refreshed by Dremio, Dremio throws a NullPointerException.

Running ALTER TABLE xxx REFRESH METADATA fixes the problem.
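For reference, a minimal sequence that reproduces the behaviour looks roughly like the following. This is only a sketch: the catalog and table names (my_catalog, demo.events) are hypothetical, and the snapshot expiration here is done with Iceberg's Spark expire_snapshots procedure outside of Dremio.

    -- Expire every snapshot except the most recent one
    -- (run outside Dremio, e.g. via Spark SQL against the same Iceberg catalog)
    CALL my_catalog.system.expire_snapshots(
      table => 'demo.events',
      retain_last => 1
    );

    -- Querying the table from Dremio before its metadata is refreshed
    -- fails with the NullPointerException shown below
    SELECT COUNT(*) FROM demo.events;

    -- Forcing a metadata refresh makes the query succeed again
    ALTER TABLE demo.events REFRESH METADATA;
    SELECT COUNT(*) FROM demo.events;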

      SYSTEM ERROR: NullPointerException

SqlOperatorImpl ICEBERG_SUB_SCAN
Location 2:0:3
SqlOperatorImpl ICEBERG_SUB_SCAN
Location 2:0:3
Fragment 2:0

[Error Id: bc4de5ae-4740-4517-ad4f-2d65635d46d5 on big-data-dev-dremio-2:0]

  (java.lang.NullPointerException) null
    com.dremio.exec.store.iceberg.DremioInputFile.newStream():98
    org.apache.iceberg.avro.AvroIterable.newFileReader():101
    org.apache.iceberg.avro.AvroIterable.iterator():77
    org.apache.iceberg.avro.AvroIterable.iterator():37
    org.apache.iceberg.relocated.com.google.common.collect.Iterables.addAll():320
    org.apache.iceberg.relocated.com.google.common.collect.Lists.newLinkedList():237
    org.apache.iceberg.ManifestLists.read():46
    org.apache.iceberg.BaseSnapshot.cacheManifests():142
    org.apache.iceberg.BaseSnapshot.dataManifests():164
    com.dremio.exec.store.iceberg.IcebergManifestListRecordReader.setup():171
    com.dremio.sabot.op.scan.ScanOperator.setupReaderAsCorrectUser():311
    com.dremio.sabot.op.scan.ScanOperator.setupReader():302
    com.dremio.sabot.op.scan.ScanOperator.setup():266
    com.dremio.sabot.driver.SmartOp$SmartProducer.setup():569
    com.dremio.sabot.driver.Pipe$SetupVisitor.visitProducer():79
    com.dremio.sabot.driver.Pipe$SetupVisitor.visitProducer():63
    com.dremio.sabot.driver.SmartOp$SmartProducer.accept():539
    com.dremio.sabot.driver.StraightPipe.setup():102
    com.dremio.sabot.driver.StraightPipe.setup():102
    com.dremio.sabot.driver.StraightPipe.setup():102
    com.dremio.sabot.driver.Pipeline.setup():69
    com.dremio.sabot.exec.fragment.FragmentExecutor.setupExecution():478
    com.dremio.sabot.exec.fragment.FragmentExecutor.run():327
    com.dremio.sabot.exec.fragment.FragmentExecutor.access$1600():97
    com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run():820
    com.dremio.sabot.task.AsyncTaskWrapper.run():120
    com.dremio.sabot.task.slicing.SlicingThread.mainExecutionLoop():247
    com.dremio.sabot.task.slicing.SlicingThread.run():171   

It’s possible the query plan cache got out of sync with the Iceberg metadata stored in the filesystem. You can try disabling the plan cache with the support key “planner.query_plan_cache_enabled”. A bunch of query plan and materialization cache fixes are coming in v23.
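A support key like this can usually be toggled from the SQL runner with an ALTER SYSTEM statement; the sketch below assumes the key name quoted above and that your Dremio version exposes it:

    -- Disable the query plan cache (support key name taken from the reply above)
    ALTER SYSTEM SET "planner.query_plan_cache_enabled" = false;

    -- Re-enable it once testing is done
    ALTER SYSTEM SET "planner.query_plan_cache_enabled" = true;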

That is good to know. I look forward to testing out the next release when it is available.

My main point was that Dremio should handle this gracefully and not throw an NPE. It is not a serious issue for us, as we are much less aggressive about expiring snapshots in our production setup. But it also sounds like a fix is already on the roadmap.

Thanks for the response.

@dotjdk Did disabling the plan cache help?

To be honest, I hadn’t tested it, as it is not something we normally do. Usually we keep a lot more snapshots.

But I just did a quick test with plan cache disabled, and I got the same result.