'Failed to Fetch' on a Hadoop save

We are using Dremio Community Edition (MapR release) with MapR on Azure.
Dremio can connect to MapRfs and show a preview of files, but when we try to save a file in Parquet format we see the error 'Failed to Fetch'.
After throwing this error, Dremio crashes and we have to restart it manually.
We do not have this issue with any other data sources (SQL, etc.).

Nothing about the event is logged in server.log or server.out.

We’d greatly appreciate any help in this matter.

Hi @ruppala

Kindly send us the server.log and server.out from the time of the error.

Thanks
@balaji.ramaswamy

logs.zip (6.9 KB)

Thank you for the immediate response. Please find the logs attached.

Hi @ruppala

Kindly send me the dremio-env file and output of “free -g” from the unix prompt

Can you also send us /var/log/messages and dmesg at the time this issue happened?

Also, can you please send us the server.log from before and after the event? I see only the startup in the attached log.

Thanks
@balaji.ramaswamy

Output of free -g:

              total        used        free      shared  buff/cache   available
Mem:             62          30          24           0           7          31
Swap:             1           0           1

Please find the logs, which include everything printed after startup until the next restart:
log.zip (3.7 KB)

Hi @balaji.ramaswamy, could you please take a look?

Hi @ruppala

I do not see anything in the logs other than a metadata refresh for the SQL Server query:

2019-04-30 13:34:34,078 [main] INFO com.dremio.dac.server.WebServer - Started on http://localhost:9047
2019-04-30 13:34:34,215 [main] INFO c.dremio.dac.server.LivenessService - Started liveness service on port 46201
2019-04-30 13:34:39,543 [metadata-refresh-Finance DataMart] WARN c.d.e.store.jdbc.JdbcSchemaFetcher - Took longer than 5 seconds to query row count for [FDM].[dbo].[FCT_BAL], Using default value of 1000000000.
com.microsoft.sqlserver.jdbc.SQLServerException: The query has timed out.
at com.microsoft.sqlserver.jdbc.TDSCommand.checkForInterrupt(IOBuffer.java:6498) ~[microsoft-sqljdbc41-4.2.6420.100.jar:na]

Can you send me the server.gc? Try the save again and send both server.gc and server.gc.1.

Thanks
@balaji.ramaswamy

Thank you @balaji.ramaswamy
There is just one message that was printed in the server.gc when I repeated the save:

2019-04-30T19:08:13.712+0000: 22.187: [GC (Allocation Failure) [PSYoungGen: 868352K->37502K(1341952K)] 929037K->98211K(1923584K), 0.0292115 secs] [Times: user=0.09 sys=0.03, real=0.03 secs]

And in server.gc.1:

2019-04-30T19:00:49.928+0000: 209.822: [GC (Allocation Failure) [PSYoungGen: 1227737K->25415K(1338880K)] 1258083K->194728K(1819136K), 0.0922304 secs] [Times: user=0.38 sys=0.03, real=0.09 secs]
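
For anyone reading GC lines like the two above, here is a minimal Python sketch that pulls out the heap sizes and pause time. It assumes the ParallelGC (PSYoungGen) line format exactly as quoted in this thread; the function name `summarize_gc_line` and the returned field names are made up for illustration and are not part of Dremio or the JVM.

```python
import re

# Matches the young-collection portion of a ParallelGC log line, e.g.
# "[PSYoungGen: 868352K->37502K(1341952K)] 929037K->98211K(1923584K), 0.0292115 secs]"
# Assumption: the format is the one shown in the server.gc lines quoted above.
GC_LINE = re.compile(
    r"\[PSYoungGen: (?P<young_before>\d+)K->(?P<young_after>\d+)K\((?P<young_cap>\d+)K\)\] "
    r"(?P<heap_before>\d+)K->(?P<heap_after>\d+)K\((?P<heap_cap>\d+)K\), "
    r"(?P<pause>[\d.]+) secs\]"
)


def summarize_gc_line(line):
    """Return heap usage (KB) and pause time (seconds) for one GC line, or None."""
    match = GC_LINE.search(line)
    if match is None:
        return None  # not a PSYoungGen collection line
    return {
        "heap_before_kb": int(match.group("heap_before")),
        "heap_after_kb": int(match.group("heap_after")),
        "heap_capacity_kb": int(match.group("heap_cap")),
        "pause_secs": float(match.group("pause")),
    }


if __name__ == "__main__":
    sample = (
        "2019-04-30T19:08:13.712+0000: 22.187: [GC (Allocation Failure) "
        "[PSYoungGen: 868352K->37502K(1341952K)] 929037K->98211K(1923584K), "
        "0.0292115 secs] [Times: user=0.09 sys=0.03, real=0.03 secs]"
    )
    print(summarize_gc_line(sample))
```

Run against the two lines above, this reports the heap occupancy before and after each collection and the committed heap size, which makes it easier to compare heap use with the memory shown by free -g.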

Hi @balaji.ramaswamy. Please take a look at this when you can.

Hi @balaji.ramaswamy, please let me know if you need any other information.
Thank you!

Hi @ruppala

Is your data on ADLS?

Thanks
@balaji.ramaswamy

Hi @balaji.ramaswamy,

The data is on MapR, which is running on Azure VMs.

@ruppala

I think it would be good to enable debug logging and review the logs. It will be a bit noisy, but we have to look through it.

In the conf folder, open logback.xml (for example with vi) and change the below to debug

Then restart Dremio and check for errors during startup or just before it crashes.

If this does not reveal anything, change the above back from debug to info and try the below

Restart again

Check log again
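
Since debug output is noisy, a small filter can help when checking the log. The following is a rough Python sketch, assuming server.log lines follow the "timestamp [thread] LEVEL logger - message" layout seen in the excerpt earlier in this thread. The function name `filter_log` and its parameters are hypothetical helpers, not a Dremio tool.

```python
import re
from datetime import datetime

# Assumed layout, taken from the server.log excerpt quoted in this thread:
# "2019-04-30 13:34:34,078 [main] INFO com.dremio.dac.server.WebServer - Started ..."
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} "
    r"\[(?P<thread>[^\]]+)\] (?P<level>[A-Z]+)\s+(?P<rest>.*)$"
)


def filter_log(lines, levels=("WARN", "ERROR"), start=None, end=None):
    """Return (timestamp, level, text) for entries at the given levels.

    start and end are optional datetime bounds (inclusive) used to narrow
    the search to the minutes around the crash.
    """
    selected = []
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match is None:
            continue  # continuation line, e.g. a stack trace frame
        if match.group("level") not in levels:
            continue
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
        if start is not None and ts < start:
            continue
        if end is not None and ts > end:
            continue
        selected.append((ts, match.group("level"), match.group("rest")))
    return selected


if __name__ == "__main__":
    # Path is an assumption; point it at wherever your server.log lives.
    with open("server.log", encoding="utf-8", errors="replace") as handle:
        for ts, level, text in filter_log(handle):
            print(ts, level, text)
```

Passing levels=("DEBUG", "WARN", "ERROR") together with a start and end time around the failed save keeps the debug output down to the window that matters.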