PyArrow COPY INTO fails with "is not a Parquet file. expected magic number"

Hi Dremio Team,

When running a COPY INTO command via PyArrow against a MinIO bucket I get the error below. Running the same command in the Web UI SQL Runner works:

COPY INTO lakehouse.controlling."Artikelstammdaten" AT BRANCH "DAILY-Artikelstammdaten-2025-05-01"
FROM '@miniostore/controlling'
FILES ('Artikelstammdaten.parquet')

Before the COPY, the file is copied from another location in MinIO, overwriting (replacing) the existing object with the same file name but new data. After that we run REFRESH METADATA. This mechanism works well for other files in the same instance, but not for this one.
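For context, the preparation step is just an object copy inside MinIO followed by the metadata refresh. A minimal sketch of that copy with the MinIO Python client, where the endpoint, credentials and the source bucket/key are placeholders and the target bucket/key split is guessed from the path in the error below (the actual copy may well be done by another tool):

from minio import Minio
from minio.commonconfig import CopySource

# Placeholders: endpoint, credentials, and the source bucket/key are illustrative only.
client = Minio("minio:9000", access_key="ACCESS_KEY", secret_key="SECRET_KEY", secure=False)

# Server-side copy that overwrites the target object under the same name with the new data.
client.copy_object(
    "lakehouse",
    "upload/controlling/Artikelstammdaten.parquet",
    CopySource("staging", "exports/Artikelstammdaten.parquet"),
)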

Errors:

pyarrow._flight.FlightInternalError: Flight returned internal error, with message: SYSTEM ERROR: IOException: /lakehouse/upload/controlling/Artikelstammdaten.parquet is not a Parquet file. expected magic number [80, 65, 82, 49] at tail, but found [8, 0, 0, 0]



Location 2:5:3. gRPC client debug context: UNKNOWN:Error received from peer ipv4:172.17.0.1:32010 {created_time:"2025-05-21T07:12:11.39055654+00:00", grpc_status:13, grpc_message:"SYSTEM ERROR: IOException: /lakehouse/upload/controlling/Artikelstammdaten.parquet is not a Parquet file. expected magic number [80, 65, 82, 49] at tail, but found [8, 0, 0, 0]

SqlOperatorImpl TABLE_FUNCTION
Location 2:5:3
ErrorOrigin: EXECUTOR
[Error Id: 8b96ec5f-79d7-4b52-8784-d1565c905ccd on dremio:0]

  (java.lang.RuntimeException) Failed to read row groups from block split
    com.dremio.exec.store.parquet.ParquetSplitReaderCreatorIterator.initSplits():482
    com.dremio.exec.store.parquet.ParquetSplitReaderCreatorIterator.addSplits():461
    com.dremio.exec.store.parquet.ParquetScanTableFunction.addSplits():119
    com.dremio.exec.store.parquet.ScanTableFunction.startRow():175
    com.dremio.sabot.op.tablefunction.TableFunctionOperator.outputData():128
    com.dremio.sabot.driver.SmartOp$SmartSingleInput.outputData():257
    com.dremio.sabot.driver.StraightPipe.pump():55
    com.dremio.sabot.driver.Pipeline.doPump():134
    com.dremio.sabot.driver.Pipeline.pumpOnce():124
    com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run():690
    com.dremio.sabot.exec.fragment.FragmentExecutor.run():595
    com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run():1274
    com.dremio.sabot.task.AsyncTaskWrapper.run():130
    com.dremio.sabot.task.slicing.SlicingThread.mainExecutionLoop():281
    com.dremio.sabot.task.slicing.SlicingThread.run():186
  Caused By (java.io.IOException) Exception occurred during reading the footer of /lakehouse/upload/controlling/Artikelstammdaten.parquet
    com.dremio.parquet.pages.FooterReader.toIO():87
    com.dremio.parquet.pages.FooterReader.processFooter():191
    com.dremio.parquet.pages.FooterReader.lambda$readFooterFuture$1():115
    java.util.concurrent.CompletableFuture$UniCompose.tryFire():1072
    java.util.concurrent.CompletableFuture.postComplete():506
    java.util.concurrent.CompletableFuture$AsyncRun.run():1742
    java.util.concurrent.ThreadPoolExecutor.runWorker():1128
    java.util.concurrent.ThreadPoolExecutor$Worker.run():628
    java.lang.Thread.run():829
  Caused By (java.io.IOException) /lakehouse/upload/controlling/Artikelstammdaten.parquet is not a Parquet file. expected magic number [80, 65, 82, 49] at tail, but found [8, 0, 0, 0]
    com.dremio.parquet.pages.FooterReader.checkMagicBytes():226
    com.dremio.parquet.pages.FooterReader.processFooter():135
    com.dremio.parquet.pages.FooterReader.lambda$readFooterFuture$1():115
    java.util.concurrent.CompletableFuture$UniCompose.tryFire():1072
    java.util.concurrent.CompletableFuture.postComplete():506
    java.util.concurrent.CompletableFuture$AsyncRun.run():1742
    java.util.concurrent.ThreadPoolExecutor.runWorker():1128
    java.util.concurrent.ThreadPoolExecutor$Worker.run():628
    java.lang.Thread.run():829

SqlOperatorImpl TABLE_FUNCTION
Location 2:5:3"}. Client context: IOError: Server never sent a data message. Detail: Internal

@rbecher Is it possible that /lakehouse/upload/controlling/Artikelstammdaten.parquet is not complete? For some reason Dremio says it is not a valid Parquet file.

Can you try reading just that one file via Spark SQL?
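Something along these lines would do it; the S3A endpoint, credentials and bucket path are placeholders, and the hadoop-aws jars need to be on the Spark classpath:

from pyspark.sql import SparkSession

# Placeholders: point the S3A endpoint/credentials at your MinIO instance.
spark = (
    SparkSession.builder
    .config("spark.hadoop.fs.s3a.endpoint", "http://minio:9000")
    .config("spark.hadoop.fs.s3a.access.key", "ACCESS_KEY")
    .config("spark.hadoop.fs.s3a.secret.key", "SECRET_KEY")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()
)

# Bucket/prefix is a placeholder; use the same object Dremio complains about.
df = spark.read.parquet("s3a://lakehouse/upload/controlling/Artikelstammdaten.parquet")
df.show(5)
print(df.count(), "rows read")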

@balaji.ramaswamy The file is complete and ends with PAR1. It is also processable in the SQL Runner, so it is not a file problem.
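For reference, the tail bytes can be checked directly against MinIO; a minimal sketch with the MinIO Python client, where the endpoint and credentials are placeholders and the bucket/object split is guessed from the path in the error message:

from minio import Minio

# Placeholders: endpoint and credentials of the MinIO instance.
client = Minio("minio:9000", access_key="ACCESS_KEY", secret_key="SECRET_KEY", secure=False)

bucket, key = "lakehouse", "upload/controlling/Artikelstammdaten.parquet"
stat = client.stat_object(bucket, key)

# Read only the last four bytes of the object; a valid Parquet file ends with b"PAR1".
resp = client.get_object(bucket, key, offset=stat.size - 4, length=4)
tail = resp.read()
resp.close()
resp.release_conn()

print(tail)
assert tail == b"PAR1", f"unexpected tail bytes: {list(tail)}"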

@rbecher Missed the part where you mentioned it ran from the UI. Does this fail if you do a FORGET and then a REFRESH, and then first just query the file without using it in a COPY command?

No, a plain SQL query just works. I guess it's a metadata glitch in the PyArrow implementation?

@rbecher Are you OK to share the PyArrow script/command you are using to run the COPY command?

from pyarrow import flight
from pyarrow.flight import FlightClient

...

flight_client = FlightClient(location=LOCATION, disable_server_verification=True)
options = flight.FlightCallOptions(headers=headers)  # LOCATION and headers come from the elided setup above

...

def flight_command(flight, flight_client, command, options):
  # Submit the SQL command, then fetch the full result set from the first endpoint as an Arrow table.
  flight_info = flight_client.get_flight_info(flight.FlightDescriptor.for_command(command), options)
  results = flight_client.do_get(flight_info.endpoints[0].ticket, options)
  rows = results.read_all()
  return rows

...

COPY_INTO = f"""
COPY INTO {CATALOG}.{SCHEMA}."{TABLE}" AT BRANCH "{BRANCH}"
FROM '{COPY_SOURCE}'
FILES ('{SOURCE_FILE}')
"""
print(COPY_INTO)
rows = flight_command(flight, flight_client, COPY_INTO, options)
records = rows[0][0].as_py()  # first column of the single result row: the number of records loaded
print(str(records) + " records inserted")

@rbecher The script seems straightforward. I assume even a straight-up COPY INTO using the SQL Runner on Dremio gives the same message. Can you please send the output of parquet-tools for /lakehouse/upload/controlling/Artikelstammdaten.parquet?

Dremio is complaining that it is unable to read the footer, so I want to see what is going on.
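If parquet-tools is not at hand, PyArrow can dump the same footer information from a local copy of the object, e.g. (the path is a placeholder for wherever you download the file to):

import pyarrow.parquet as pq

# Placeholder path: a local copy of the object pulled from MinIO.
md = pq.read_metadata("Artikelstammdaten.parquet")
print(md)         # footer summary: number of columns, rows, row groups, created_by
print(md.schema)  # column-level schema as stored in the footer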

@balaji.ramaswamy As already said, the file is OK and ends with PAR1. Running the same COPY command in the SQL Runner works without error.