Able to read Parquet file with parquet-tools, but not Dremio

I’m writing Parquet files that Dremio cannot read, although parquet-tools 1.9 reads them fine. Dremio reports this error:

Failed to decode column name::varchar

Turning on Snappy compression for the columns produces a different error:

Failure while attempting to retrieve metadata information for table

Still readable with parquet-tools though.

Dremio is on a 1.8 version; I’m not sure whether that could cause this.

What did you use to generate those Parquet files, and do you know their block and page sizes?
FYI, this is what we typically recommend: https://docs.dremio.com/advanced-administration/parquet-files.html

I’m using parquetjs and verifying the output with parquet-tools version 1.9.0. I’m writing only three small rows as a test.

Test program:

"use strict";

var parquet = require('./parquet')

var schema = new parquet.ParquetSchema({
  name: { type: 'UTF8', compression: 'SNAPPY' },
  quantity: { type: 'INT64', compression: 'SNAPPY' },
  price: { type: 'DOUBLE', compression: 'SNAPPY', optional: true },
  date: { type: 'TIMESTAMP_MILLIS', compression: 'SNAPPY' },
  in_stock: { type: 'BOOLEAN', compression: 'SNAPPY' }
})


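// Write three test rows, then close the writer so the footer gets written.
var writeTestFile = async function() {
  var writer = await parquet.ParquetWriter.openFile(schema, 'fruits.parquet', {
    compression: 'SNAPPY'
  })

  await writer.appendRow({name: 'apples', quantity: 10, price: 2.5, date: new Date(), in_stock: true})
  await writer.appendRow({name: 'oranges', quantity: 10, price: 2.5, date: new Date(), in_stock: true})
  // 'pears' leaves out the optional price column on purpose.
  await writer.appendRow({name: 'pears', quantity: 10, date: new Date(), in_stock: true})

  // close() is async; await it so the file is complete before the process exits.
  await writer.close()
}

// Surface any write error instead of leaving an unhandled rejection.
writeTestFile().catch(console.error)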

Here is the output of parquet-tools:

row group 0

name: BINARY SNAPPY DO:0 FPO:4 SZ:90/90/1.00 VC:3 ENC:PLAIN,RLE
quantity: INT64 SNAPPY DO:0 FPO:156 SZ:77/77/1.00 VC:3 ENC:PLAIN,RLE
price: DOUBLE SNAPPY DO:0 FPO:310 SZ:88/88/1.00 VC:3 ENC:PLAIN,RLE
date: INT64 SNAPPY DO:0 FPO:472 SZ:94/94/1.00 VC:3 ENC:PLAIN,RLE
in_stock: BOOLEAN SNAPPY DO:0 FPO:639 SZ:43/43/1.00 VC:3 ENC:PLAIN,RLE

name TV=3 RL=0 DL=0
----------------------------------------------------------------------------
page 0:  DLE:RLE RLE:RLE VLE:PLAIN ST:[no stats for this column] SZ:30 VC:3

quantity TV=3 RL=0 DL=0
----------------------------------------------------------------------------
page 0:  DLE:RLE RLE:RLE VLE:PLAIN ST:[min: 10, max: 10, num_nulls: 0] SZ:24 [more]...

price TV=3 RL=0 DL=1
----------------------------------------------------------------------------
page 0:  DLE:RLE RLE:RLE VLE:PLAIN ST:[min: 2.50000, max: 2.50000, [more]... VC:3

date TV=3 RL=0 DL=0
----------------------------------------------------------------------------
page 0:  DLE:RLE RLE:RLE VLE:PLAIN ST:[min: 1534690611464, max: 15 [more]... VC:3

in_stock TV=3 RL=0 DL=0
----------------------------------------------------------------------------
page 0:  DLE:RLE RLE:RLE VLE:PLAIN ST:[min: true, max: true, num_nulls: 0] [more]...

BINARY name

*** row group 1 of 1, values 1 to 3 ***
value 1: R:0 D:0 V:apples
value 2: R:0 D:0 V:oranges
value 3: R:0 D:0 V:pears

INT64 quantity

*** row group 1 of 1, values 1 to 3 ***
value 1: R:0 D:0 V:10
value 2: R:0 D:0 V:10
value 3: R:0 D:0 V:10

DOUBLE price

*** row group 1 of 1, values 1 to 3 ***
value 1: R:0 D:1 V:2.5
value 2: R:0 D:1 V:2.5
value 3: R:0 D:0 V:

INT64 date

*** row group 1 of 1, values 1 to 3 ***
value 1: R:0 D:0 V:1534690611464
value 2: R:0 D:0 V:1534690611465
value 3: R:0 D:0 V:1534690611465

BOOLEAN in_stock

*** row group 1 of 1, values 1 to 3 ***
value 1: R:0 D:0 V:true
value 2: R:0 D:0 V:true
value 3: R:0 D:0 V:true

This looks like an issue with parquet-cpp in general. I built parquet-cpp and see errors there as well when reading the output, so it doesn’t look like a Dremio-specific issue. I’ll keep researching, but there is probably nothing to be done on the Dremio side.

Still trying to figure out the issue. The error is here:

(java.lang.NullPointerException) null
com.dremio.parquet.pages.IncrementalPageReaderIterator.next():94
com.dremio.parquet.pages.MemoizingPageIterator.next()

But I don’t see that code on GitHub. Do you know where I should look?

I found the issue: parquet-cpp doesn’t support DATA_PAGE_V2 yet. I had to downgrade the library so that it writes DATA_PAGE pages instead.
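For anyone who would rather not downgrade, here is a minimal sketch of an alternative. It assumes your parquetjs release reads a `useDataPageV2` writer option (later releases appear to, but check `lib/writer.js` in your copy and treat the option name as an assumption if it is not there). The helper names `v1WriterOptions` and `writeFruits` are made up for this example, and the schema is a cut-down version of the test program above.

```javascript
"use strict";

// Build writer options that request V1 data pages (DATA_PAGE).
// ASSUMPTION: `useDataPageV2` is honoured by your parquetjs version.
function v1WriterOptions(extra) {
  return Object.assign({}, extra, { useDataPageV2: false });
}

// `parquet` is the parquetjs module, passed in so this file loads without it.
async function writeFruits(parquet, path) {
  const schema = new parquet.ParquetSchema({
    name: { type: 'UTF8' },
    quantity: { type: 'INT64' },
    price: { type: 'DOUBLE', optional: true }
  });

  const writer = await parquet.ParquetWriter.openFile(schema, path, v1WriterOptions());
  await writer.appendRow({ name: 'apples', quantity: 10, price: 2.5 });
  await writer.appendRow({ name: 'pears', quantity: 10 });
  await writer.close();
}

// Usage (needs parquetjs installed):
//   writeFruits(require('parquetjs'), 'fruits.parquet').catch(console.error);
```

If the option has no effect in your version, the resulting file will still contain V2 pages, so it is worth re-checking the output with a reader that rejects them.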

I have a very similar issue. I generate a Parquet file in spark-shell and can read it back in Spark, but when I open it in Dremio I get a “Something went wrong” error:

Error in parquet reader (complex). Message: Failure in setting up reader Parquet Metadata: ParquetMetaData{FileMetaData{schema: message spark_schema

Could you please try version 3.1.4? We recently fixed a bug around reading V2 data pages in Parquet.

You mean 3.1.3 or 3.1.4? The main page shows 3.1.3 available for download.

Hi @david.lee,

Apologies for the confusion. This fix will ship in the next Community Edition release, 3.1.6, which will be available next week.

Hello Friends

I’m trying to read a Parquet file stored in an S3 bucket as a Dremio dataset. I am receiving this error: Error in parquet reader (complex). Message: Failure in setting up reader Parquet Metadata: ParquetMetaData{FileMetaData{schema: message spark_schema . Can you please help?

Thanks,
Harish.

That is because the file is Snappy-compressed and has a .crc extension on the end. Try generating the files uncompressed and you’ll be good to go.
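One quick way to rule out sidecar files before blaming compression: as far as I know, an unencrypted Parquet file starts and ends with the four bytes `PAR1`, whereas the `.crc` files that Hadoop and Spark write next to the data are checksum files, not Parquet. Below is a minimal sketch using only Node’s `fs` module. `isParquetFile` is a name made up for this example, and it only checks the magic bytes, not the metadata or the page format.

```javascript
"use strict";
const fs = require('fs');

const MAGIC = Buffer.from('PAR1', 'ascii');

// Returns true when the file starts and ends with the Parquet magic bytes.
// Smallest plausible file: magic (4) + footer length (4) + magic (4) = 12 bytes.
function isParquetFile(path) {
  const fd = fs.openSync(path, 'r');
  try {
    const size = fs.fstatSync(fd).size;
    if (size < 12) return false;

    const head = Buffer.alloc(4);
    const tail = Buffer.alloc(4);
    fs.readSync(fd, head, 0, 4, 0);        // first four bytes
    fs.readSync(fd, tail, 0, 4, size - 4); // last four bytes
    return head.equals(MAGIC) && tail.equals(MAGIC);
  } finally {
    fs.closeSync(fd);
  }
}

// Usage: list which entries in a (hypothetical) output directory are Parquet.
//   fs.readdirSync('out').forEach(f => console.log(f, isParquetFile('out/' + f)));
```

A file that passes this check can still fail in Dremio for other reasons, such as the V2 data page issue discussed earlier in the thread.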