Snappy Parquet Failures

I’m able to access Parquet data using ‘hadoop fs -tail’ from the Dremio server, and Hive data via beeline, but I’m not able to access the same data through the Dremio application using either a MapRFS or a Hive Metastore connection. In all three cases I’m using the credentials that run the Dremio service.
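For reference, the checks I ran from the Dremio server look roughly like this; the path, host, and table name below are placeholders for our actual data:

```shell
# Read the tail of a Parquet file directly from MapRFS --
# this works with the Dremio service account (path is a placeholder):
hadoop fs -tail /mapr/my_cluster/data/sales/part-00000.snappy.parquet

# Query the same data through Hive via beeline -- this also works
# (connection string and table are placeholders):
beeline -u "jdbc:hive2://hive-host:10000/default" \
  -e "SELECT * FROM sales LIMIT 10;"
```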

The error I receive when trying a Hive Metastore connection is:
SYSTEM ERROR: AccessControlException:

Followed by:
(java.lang.RuntimeException) Failed to read parquet footer for file

Has anyone seen this behavior before?

Hi @MattD15,

Can you provide a profile for the job associated with this error? This will hopefully include the complete stack trace.

Also, in beeline, can you run describe formatted on one of the tables you cannot access in Dremio and attach the result?
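Something like the following, with the database and table substituted for one of the affected tables (the host and my_db.my_table are placeholders):

```shell
# From beeline, dump the table's full metadata, including the storage
# location, SerDe, and input/output formats (placeholders for your table):
beeline -u "jdbc:hive2://hive-host:10000/default" \
  -e "DESCRIBE FORMATTED my_db.my_table;"
```

The Location and SerDe Library rows in that output are the most useful parts for diagnosing footer-read and permission errors.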

Hi @ben, thanks for the reply. I’m working on getting those details back to you. One thing I noticed in the documentation was this: "Dremio does not support MapR cluster names that are non-URI qualified (e.g. containing “_” character). "

We’ve been using Dremio successfully so far with clusters containing ‘_’, but could this be the cause of the issue I’m facing?
Thanks again.

Hi @MattD15,

I’m not sure about the cluster-name restriction and will get back to you shortly to confirm whether it could have this effect.

Another question: is your Hive service configured through MapR as one of its ecosystem components? If you are running Dremio on MapR and trying to connect to a Hive source external to that cluster, that likely will not work and may produce an error similar to the one you see, i.e. you can add the source but cannot access any of its tables.