Hi there,
I have a question about using Dremio with Iceberg. My job reads multiple simple Parquet files and uses MERGE INTO to keep updating an Iceberg table on its defined primary key. Partway through the run, I hit this error:
"IOException: /datalake/myicetable/xxxxxxxxxxxx-7056-c0b7-61fc0bd70d00/3_19_0.parquet is not a Parquet file. expected magic number [80, 65, 82, 49] at tail, but found [111, 109, 13, 122]"
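For reference, the two byte lists in the message are plain ASCII codes. A quick Python check (nothing Dremio-specific) shows that the expected tail is `PAR1`, while the 4 bytes Dremio read at what it treated as the end of the file are something else:

```python
# Decode the byte lists quoted in the IOException into ASCII.
expected = bytes([80, 65, 82, 49])   # what the reader expects at the file tail
found = bytes([111, 109, 13, 122])   # what it actually read there

print(expected)  # b'PAR1'
print(found)     # b'om\rz'
```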
I checked that this Parquet file (and the rest of the files referenced by the Iceberg metadata) is valid:
- printing the last 4 bytes gives the correct value (i.e. b'PAR1')
- after moving it to a separate folder and formatting it as a physical dataset, Dremio reads it without any problem
- pandas reads it fine and shows the expected shape and rows
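The magic-byte check from the list above can be sketched with the Python standard library. This is a sketch that assumes you have a local copy of the file. It relies only on the documented Parquet layout: the file starts with `PAR1` and ends with a 4-byte little-endian footer length followed by `PAR1`. The function name `check_parquet_magic` is hypothetical, chosen just for illustration.

```python
import os
import struct
import sys

MAGIC = b"PAR1"


def check_parquet_magic(path):
    """Check the Parquet magic bytes at both ends of a local file.

    Layout: "PAR1" <data> <footer> <4-byte little-endian footer length> "PAR1"
    Returns a dict with the file size, both magic checks and the footer length.
    """
    size = os.path.getsize(path)
    if size < 12:  # too small for two magics plus a footer length
        return {"size": size, "head_ok": False, "tail_ok": False,
                "footer_fits": False, "footer_len": None}
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-8, os.SEEK_END)  # footer length (4 bytes) + tail magic (4 bytes)
        tail = f.read(8)
    footer_len = struct.unpack("<I", tail[:4])[0]
    return {
        "size": size,
        "head_ok": head == MAGIC,
        "tail_ok": tail[4:] == MAGIC,
        # The footer must fit between the leading magic and the trailing 8 bytes.
        "footer_fits": footer_len + 12 <= size,
        "footer_len": footer_len,
    }


if __name__ == "__main__":
    # Usage: python check_parquet.py /path/to/3_19_0.parquet
    for p in sys.argv[1:]:
        print(p, check_parquet_magic(p))
```

If `head_ok`, `tail_ok` and `footer_fits` are all true for the local copy, the file itself is well formed. That matches what pandas and the physical-dataset test already showed.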
Full error and stack trace:
SYSTEM ERROR: IOException: /datalake/myicetable/xxxxxxxxxxxx-7056-c0b7-61fc0bd70d00/3_19_0.parquet is not a Parquet file. expected magic number [80, 65, 82, 49] at tail, but found [111, 109, 13, 122]
SqlOperatorImpl TABLE_FUNCTION
Location 5:12:2
Fragment 5:0
[Error Id: e1ff0824-16e3-4819-bda6-0ad478149b48 on dremio-executor-1.dremio-cluster-pod.default.svc.cluster.local:0]
(java.lang.RuntimeException) Failed to read row groups from block split
com.dremio.exec.store.parquet.ParquetSplitReaderCreatorIterator.initSplits():393
com.dremio.exec.store.parquet.ParquetSplitReaderCreatorIterator.addSplits():375
com.dremio.exec.store.parquet.ParquetScanTableFunction.addSplits():91
com.dremio.exec.store.parquet.ScanTableFunction.startRow():155
com.dremio.sabot.op.tablefunction.TableFunctionOperator.outputData():103
com.dremio.sabot.driver.SmartOp$SmartSingleInput.outputData():209
com.dremio.sabot.driver.StraightPipe.pump():56
com.dremio.sabot.driver.Pipeline.doPump():124
com.dremio.sabot.driver.Pipeline.pumpOnce():114
com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run():544
com.dremio.sabot.exec.fragment.FragmentExecutor.run():472
com.dremio.sabot.exec.fragment.FragmentExecutor.access$1700():106
com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run():978
com.dremio.sabot.task.AsyncTaskWrapper.run():121
com.dremio.sabot.task.slicing.SlicingThread.mainExecutionLoop():249
com.dremio.sabot.task.slicing.SlicingThread.run():171
Caused By (java.io.IOException) /datalake/myicetable/xxxxxxxxxxxx-7056-c0b7-61fc0bd70d00/3_19_0.parquet is not a Parquet file. expected magic number [80, 65, 82, 49] at tail, but found [111, 109, 13, 122]
com.dremio.parquet.pages.FooterReader.checkMagicBytes():165
com.dremio.parquet.pages.FooterReader.processFooter():95
com.dremio.parquet.pages.FooterReader.lambda$readFooterFuture$1():82
java.util.concurrent.CompletableFuture.uniCompose():966
java.util.concurrent.CompletableFuture$UniCompose.tryFire():940
java.util.concurrent.CompletableFuture.postComplete():488
java.util.concurrent.CompletableFuture$AsyncRun.run():1646
java.util.concurrent.ThreadPoolExecutor.runWorker():1149
java.util.concurrent.ThreadPoolExecutor$Worker.run():624
java.lang.Thread.run():750
Any comments would be appreciated. Thanks in advance!
Donald