Dremio Iceberg fails to read a Parquet file it generated itself

Hi there,

I have a question about using Dremio with Iceberg. I'm running a job that reads multiple simple Parquet files and uses MERGE INTO to keep updating an Iceberg table on a defined primary key. Then, in the middle of the job, I hit this error message:

"IOException: /datalake/myicetable/xxxxxxxxxxxx-7056-c0b7-61fc0bd70d00/3_19_0.parquet is not a Parquet file. expected magic number [80, 65, 82, 49] at tail, but found [111, 109, 13, 122]"

I checked that the Parquet file (and the rest of the files referenced in the Iceberg metadata) is valid, roughly as sketched below:

  • printing the last 4 bytes shows the correct magic number (i.e. b'PAR1')
  • moving it to a separate folder and formatting it as a physical dataset lets Dremio process it perfectly
  • reading it with pandas shows the shape and rows perfectly
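
A minimal sketch of those checks in Python (the path is the one from the error message; details simplified):

```python
import pandas as pd

# Path taken from the error message; substitute the actual data file to check.
path = "/datalake/myicetable/xxxxxxxxxxxx-7056-c0b7-61fc0bd70d00/3_19_0.parquet"

# A valid Parquet file ends with the 4-byte magic number b"PAR1"
# (bytes [80, 65, 82, 49], exactly what Dremio says it expected at the tail).
with open(path, "rb") as f:
    f.seek(-4, 2)    # seek to 4 bytes before the end of the file
    tail = f.read(4)
print(tail)          # prints b'PAR1' for this file

# The same file also reads fine with pandas.
df = pd.read_parquet(path)
print(df.shape)
print(df.head())
```

The full error and stack trace: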
SYSTEM ERROR: IOException: /datalake/myicetable/xxxxxxxxxxxx-7056-c0b7-61fc0bd70d00/3_19_0.parquet is not a Parquet file. expected magic number [80, 65, 82, 49] at tail, but found [111, 109, 13, 122]

SqlOperatorImpl TABLE_FUNCTION
Location 5:12:2
Fragment 5:0

[Error Id: e1ff0824-16e3-4819-bda6-0ad478149b48 on dremio-executor-1.dremio-cluster-pod.default.svc.cluster.local:0]

  (java.lang.RuntimeException) Failed to read row groups from block split
    com.dremio.exec.store.parquet.ParquetSplitReaderCreatorIterator.initSplits():393
    com.dremio.exec.store.parquet.ParquetSplitReaderCreatorIterator.addSplits():375
    com.dremio.exec.store.parquet.ParquetScanTableFunction.addSplits():91
    com.dremio.exec.store.parquet.ScanTableFunction.startRow():155
    com.dremio.sabot.op.tablefunction.TableFunctionOperator.outputData():103
    com.dremio.sabot.driver.SmartOp$SmartSingleInput.outputData():209
    com.dremio.sabot.driver.StraightPipe.pump():56
    com.dremio.sabot.driver.Pipeline.doPump():124
    com.dremio.sabot.driver.Pipeline.pumpOnce():114
    com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run():544
    com.dremio.sabot.exec.fragment.FragmentExecutor.run():472
    com.dremio.sabot.exec.fragment.FragmentExecutor.access$1700():106
    com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run():978
    com.dremio.sabot.task.AsyncTaskWrapper.run():121
    com.dremio.sabot.task.slicing.SlicingThread.mainExecutionLoop():249
    com.dremio.sabot.task.slicing.SlicingThread.run():171
  Caused By (java.io.IOException) /datalake/myicetable/xxxxxxxxxxxx-7056-c0b7-61fc0bd70d00/3_19_0.parquet is not a Parquet file. expected magic number [80, 65, 82, 49] at tail, but found [111, 109, 13, 122]
    com.dremio.parquet.pages.FooterReader.checkMagicBytes():165
    com.dremio.parquet.pages.FooterReader.processFooter():95
    com.dremio.parquet.pages.FooterReader.lambda$readFooterFuture$1():82
    java.util.concurrent.CompletableFuture.uniCompose():966
    java.util.concurrent.CompletableFuture$UniCompose.tryFire():940
    java.util.concurrent.CompletableFuture.postComplete():488
    java.util.concurrent.CompletableFuture$AsyncRun.run():1646
    java.util.concurrent.ThreadPoolExecutor.runWorker():1149
    java.util.concurrent.ThreadPoolExecutor$Worker.run():624
    java.lang.Thread.run():750

I'd appreciate any comments, thanks in advance.
Donald

Hey Donald–

In case you missed it, one of my colleagues responded to your post in the #vendor-dremio channel with some follow-up questions and some ideas on how to fix it.

I’ll post them here in case it’s easier for you to reply in this thread.

First, we’d love to know which Dremio edition and version you are using (for example, Enterprise v24.0.0).

Second, my colleague thinks this may be a metadata caching issue that could be resolved by a `FORGET METADATA` call.

Can you try a `FORGET METADATA` followed by a re-promote of the table and see if the issue remains? For example, through SQL:
`ALTER TABLE myicetable FORGET METADATA`
`ALTER TABLE myicetable REFRESH METADATA AUTO PROMOTION`
You can also do this through the dataset UI.

Please do let us know.

Thanks,

Dan