We are working on a single-node Dremio deployment (coordinator + executor) on an 8-core, 64 GB VM. We are trying to access/view a reasonably small (10k-row) virtual dataset built on a physical dataset on HDFS that we access through the Hive connector. This takes far too long considering the hardware and the environment settings:
DREMIO_MAX_HEAP_MEMORY_SIZE_MB=8192 and DREMIO_MAX_DIRECT_MEMORY_SIZE_MB=51200.
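For reference, these values are set in Dremio's environment file (conf/dremio-env under the install directory; path assumed from a default tarball/RPM install):

```shell
# conf/dremio-env -- memory settings on the slower 8-core / 64 GB node.
# Heap memory is used mainly for planning and metadata;
# direct (off-heap) memory is used for query execution.
DREMIO_MAX_HEAP_MEMORY_SIZE_MB=8192
DREMIO_MAX_DIRECT_MEMORY_SIZE_MB=51200
```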
What puzzles us is that the very same virtual dataset, built from the very same physical dataset, answers basically instantly when accessed through a less powerful virtual machine (4 cores, 16 GB) with default environment settings. Both machines run RHEL 7.6, both run Dremio 3.2.4, and both are used only for the Dremio service. I’m attaching the two query profiles for the same run on the two different machines. dremio8cpu64gb.zip (6.7 KB) dremio4cpu16gb.zip (6.5 KB)
On the slower one, you are suffering from the following:
#1 Planning takes 1.4 s (try a second run and see if it is reproducible)
#2 HIVE_SUB_SCAN 00-xx-08 is waiting on IO for 7.1 s (processing is also slightly slower, ~1 s)
Thank you for your response.
Regarding planning, it does not seem consistent; other runs haven’t given us long planning times.
Regarding the scan, it is consistently waiting, and the wait almost always takes about twice the processing time (though not always; sometimes it doesn’t wait as much). On the less powerful machine the waiting time is usually much shorter than the processing time. I’m attaching the last run I did on the more powerful one.
I thought it could have been something related to metadata, but adding the same dataset as a new source doesn’t change the outcome.
What makes me worry it could be something with the machine as a whole is some other tests we ran on an imported Parquet file of 200k rows. In this test too, the less powerful machine performed better (less waiting time on the row scan and less time processing the same operation). I’m attaching those two profiles as well in case they help. Is there any test of any sort I could run to check where this process fails? Thanks again.
dremio4cpu16gb-importedparquet.zip (30,7 KB) dremio8cpu64gb-importedparquet.zip (30,9 KB) dremio8cpu64gb-lastfromhive.zip (6,7 KB)
What I just noticed is that on the more powerful machine, the Dremio process seems to keep one processor at 70–100% even when idle (with zero jobs running). The same thing doesn’t happen on the other machine: with no jobs running, its CPU usage doesn’t even reach 10%. Is there any way to see whether Dremio is busy with operations that are not shown? Tailing server.log didn’t give me any information; the only log that seems interesting is server.gc, which shows a garbage-collection allocation failure every second or so:
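One generic way to see what an apparently idle JVM is doing (not Dremio-specific) is to look at per-thread CPU with top and match the hottest thread against a jstack dump. jstack prints native thread ids in hex (nid=0x...), so the decimal TID from top has to be converted first; DREMIO_PID and TID below are placeholders you would fill in from your own system:

```shell
# Per-thread CPU usage for the Dremio JVM (fill in the real pid):
top -H -b -n 1 -p "$DREMIO_PID" | head -20

# top shows thread ids in decimal; jstack tags threads as nid=0x<hex>.
# Convert the busy thread's TID, then find its stack in the dump:
TID=12345                      # placeholder: the hot thread id from top
printf 'nid=0x%x\n' "$TID"

jstack "$DREMIO_PID" | grep -A 20 "$(printf 'nid=0x%x' "$TID")"
```

Repeating the jstack a few seconds apart shows whether the same stack stays hot.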
2019-10-24T10:22:56.415+0000: 67160.656: [GC (Allocation Failure) [PSYoungGen: 2206256K->217616K(2375680K)] 3253628K->1327556K(4431360K), 1.2654839 secs] [Times: user=8.50 sys=1.35, real=1.26 secs]
2019-10-24T10:23:47.618+0000: 67211.859: [GC (Allocation Failure) [PSYoungGen: 2164752K->182641K(2129920K)] 3274692K->1355861K(4185600K), 1.5794676 secs] [Times: user=8.40 sys=1.73, real=1.58 secs]
2019-10-24T10:24:43.786+0000: 67268.027: [GC (Allocation Failure) [PSYoungGen: 2129777K->148992K(2362880K)] 3302997K->1384043K(4418560K), 1.6208318 secs] [Times: user=8.99 sys=2.90, real=1.62 secs]
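“Allocation Failure” entries like these are ordinary young-generation collections, but the pause column makes their total cost easy to quantify. A small illustrative script (not part of Dremio; the regex assumes the ParallelGC log format shown above) sums the stop-the-world pause time from server.gc:

```python
import re

# Matches the total pause at the end of each ParallelGC record,
# e.g. "...(4431360K), 1.2654839 secs] [Times: ...]"
PAUSE_RE = re.compile(r", ([0-9.]+) secs\]")

def gc_pause_summary(lines):
    """Return (number of GC events, total pause seconds) for a GC log."""
    pauses = []
    for line in lines:
        m = PAUSE_RE.search(line)
        if m:
            pauses.append(float(m.group(1)))
    return len(pauses), sum(pauses)

# The three lines from the server.gc excerpt above:
sample = [
    "2019-10-24T10:22:56.415+0000: 67160.656: [GC (Allocation Failure) [PSYoungGen: 2206256K->217616K(2375680K)] 3253628K->1327556K(4431360K), 1.2654839 secs] [Times: user=8.50 sys=1.35, real=1.26 secs]",
    "2019-10-24T10:23:47.618+0000: 67211.859: [GC (Allocation Failure) [PSYoungGen: 2164752K->182641K(2129920K)] 3274692K->1355861K(4185600K), 1.5794676 secs] [Times: user=8.40 sys=1.73, real=1.58 secs]",
    "2019-10-24T10:24:43.786+0000: 67268.027: [GC (Allocation Failure) [PSYoungGen: 2129777K->148992K(2362880K)] 3302997K->1384043K(4418560K), 1.6208318 secs] [Times: user=8.99 sys=2.90, real=1.62 secs]",
]
count, total = gc_pause_summary(sample)
print(f"{count} young-GC pauses, {total:.2f} s total stop-the-world")
```

Run against the whole server.gc file, this quickly shows whether GC pauses alone account for the idle CPU burn or whether something else is keeping a core busy.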