Dremio query takes too long

I have Dremio installed on my personal machine as a Docker image (for experimentation before moving to production). The data is on Azure and downloads quickly, but executing a query takes too long (about 10 minutes). Is there any additional configuration for managing the nodes and exploiting the hardware fully to optimize the query?

Please share the Query Profile so we can see more details and help you.

Hi @abde139, scanning a single CSV file with 7M records is going to be slow; that's the bottleneck in your query. You should look into loading the CSV into Iceberg format (COPY INTO | Dremio Documentation) or creating a reflection on the CSV (Types of Reflections | Dremio Documentation). Either approach optimizes the physical storage of your data so that reads can be parallelized. If your data grows, say, 1000x larger, you should also look into partitioning to optimize file/split pruning.
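To make the two suggestions concrete, here is a hedged sketch of what each could look like in Dremio SQL. All names (`my_space`, `azure_source`, `customers`, the column list) are placeholders, and the exact syntax should be checked against the linked documentation for your Dremio version:

```sql
-- Option 1: copy the CSV into an Iceberg table so scans can be parallelized.
-- Table name, source path, and columns are hypothetical.
CREATE TABLE my_space.customers (id INT, name VARCHAR);
COPY INTO my_space.customers
  FROM '@azure_source/path/to/csv_folder/'
  FILE_FORMAT 'csv';

-- Option 2: create a raw reflection on the existing CSV-backed dataset,
-- so Dremio materializes it in an optimized columnar layout.
ALTER DATASET azure_source."customers.csv"
  CREATE RAW REFLECTION customers_raw USING DISPLAY (id, name);
```

After either step, queries would be served from the optimized copy rather than re-parsing the CSV on every run.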

Hi @Benny_Chow, many thanks for your valuable answer. As a first step I tried loading the table as Parquet (I tried both a data table and a Delta table from Azure). The problem is that every time, even with a simple query (e.g. SELECT * FROM table_name), the Dremio server stops suddenly without any error. I then have no access to the query profile or running jobs from the GUI to understand what happened.
I've repeated this scenario many times, but in vain: the server always shuts down without any error message (although it works with smaller Parquet tables, and with the same table in CSV format).
I would be more than grateful for any guidance!

Which deployment model are you using?

The coordinator Java Virtual Machine never just shuts down with no error message; we have heap monitors that cancel queries before an OOME would occur. More importantly, a large data volume can affect executor resource consumption, but not the coordinator's. The coordinator is responsible for query planning, which is not really affected by data volume.

Many thanks for your guidance.
I am using Dremio on-premises as a single-node deployment (via the Docker image).
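For a single-node Docker deployment, one common cause of a silent shutdown is the container's memory limit being smaller than Dremio's combined heap and direct memory, in which case the kernel OOM killer terminates the container with no Java-level error. The sketch below assumes the `dremio/dremio-oss` image honors the standard `dremio-env` memory variables when passed as environment variables (worth verifying against the Dremio Docker documentation); the specific sizes are illustrative:

```shell
# Sketch: keep the container limit comfortably above heap + direct memory,
# otherwise the kernel kills the process without any Dremio error message.
docker run -d --name dremio \
  -p 9047:9047 -p 31010:31010 -p 32010:32010 \
  --memory=10g \
  -e DREMIO_MAX_HEAP_MEMORY_SIZE_MB=4096 \
  -e DREMIO_MAX_DIRECT_MEMORY_SIZE_MB=4096 \
  dremio/dremio-oss
```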

I understand. These are the last logs from the server before it shuts down:

2024-01-12 01:56:48,731 [FABRIC-rpc-event-queue] INFO  c.d.sabot.exec.FragmentExecutors - Received remote fragment start instruction for 1a5f639f-5277-3f5b-a5a7-022f2cad2a00:0:0 with assigned weight 1 and scheduling weight 1
2024-01-12 01:56:53,807 [scheduler-7] INFO  c.d.s.r.MaterializationCache - Materialization cache updated.  Count stats: materializationReuse=0 materializationExpand=0 materializationError=0 externalReuse=0 externalExpand=0 externalError=0
2024-01-12 01:56:53,920 [FABRIC-rpc-event-queue] WARN  c.d.s.fabric.FabricMessageHandler - Message of mode REQUEST for protocol 2 of rpc type 14 took longer than 500ms. Actual duration was 532ms.
172.17.0.1 - - [12/Jan/2024:01:56:54 +0000] "GET / HTTP/1.1" 200 1298 "http://localhost:9047/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
2024-01-12 01:56:54,151 [FABRIC-3] WARN  c.d.services.fabric.FabricServer - Message of mode REQUEST of rpc type 1 took longer than 500ms.  Actual duration was 779ms.
2024-01-12 01:56:54,525 [scheduler-7] INFO  c.d.s.r.PlanCacheSynchronizer - Cleaning up 0 datasets based on entries, 0 datasets based on goals. Total (union) processed datasets: 0
2024-01-12 01:56:54,535 [scheduler-7] INFO  c.d.s.r.PlanCacheSynchronizer - Completed plan cache sync.  Cache entries before 9.  Cache entries after 9.
2024-01-12 01:56:54,617 [scheduler-7] INFO  c.d.s.r.ReflectionServiceImpl - Materialization cache sync took 353 ms.  Plan cache sync took 797 ms.
2024-01-12 01:56:57,448 [e13 - 1a5f639f-5277-3f5b-a5a7-022f2cad2a00:frag:0:0] INFO  c.d.s.e.f.FragmentStatusReporter - 1a5f639f-5277-3f5b-a5a7-022f2cad2a00:0:0 scheduling weight 1: State to report: FINISHED
172.17.0.1 - - [12/Jan/2024:01:56:57 +0000] "POST /apiv2/dataset/tmp.UNTITLED/version/0008686797910475/transform_and_run/?newVersion=0003401994254919 HTTP/1.1" 200 797 "http://localhost:9047/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
2024-01-12 01:56:57,519 [async-query-logger6] INFO  query.logger - Query: 1a5f639f-5277-3f5b-a5a7-022f2cad2a00; outcome: COMPLETED
2024-01-12 01:56:57,538 [FABRIC-rpc-event-queue] INFO  c.d.sabot.exec.FragmentExecutors - Received remote fragment start instruction for 1a5f6397-e3c9-51d0-e301-2d159c7a8200:0:0 with assigned weight 1 and scheduling weight 1
172.17.0.1 - - [12/Jan/2024:01:56:57 +0000] "GET /api/v3/info HTTP/1.1" 200 172 "http://localhost:9047/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
172.17.0.1 - - [12/Jan/2024:01:56:58 +0000] "GET / HTTP/1.1" 200 1298 "http://localhost:9047/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
2024-01-12 01:57:02,583 [SlicingThreadMonitor] INFO  c.d.s.t.slicing.SlicingThreadMonitor - Marking thread 7 for Migration candidate as its runningTime is 5033 ms
2024-01-12 01:57:10,504 [cache-async-io-28] WARN  c.d.exec.store.dfs.LoggedFileSystem - asyncRead.complete elapsed=8683ms scheme=dremioAzureStorage:// path=/working/0FILES_TABLES/COMPTE/part-00000-tid-8733486745133430435-4661f76a-9d56-4cb0-bd7b-1c768891c64b-2860-1-c000.snappy.parquet offset=126309151 nbytes=1048576
2024-01-12 01:57:11,580 [FABRIC-rpc-event-queue] WARN  c.d.s.fabric.FabricMessageHandler - Message of mode REQUEST for protocol 2 of rpc type 14 took longer than 500ms. Actual duration was 855ms.
2024-01-12 01:57:11,730 [FABRIC-3] WARN  c.d.services.fabric.FabricServer - Message of mode REQUEST of rpc type 1 took longer than 500ms.  Actual duration was 977ms.
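Since the logs above end abruptly with no fatal error, it would be worth checking from the Docker side whether the container itself was killed. These are environment-dependent diagnostic commands, and `dremio` is a placeholder for your actual container name:

```shell
# Was the container terminated by the kernel OOM killer?
# OOMKilled=true together with exit code 137 (128 + SIGKILL) would explain
# a shutdown that leaves no error in Dremio's own logs.
docker inspect --format '{{.State.OOMKilled}} {{.State.ExitCode}}' dremio

# Also check the tail of the container's stdout/stderr for anything
# that did not make it into server.log.
docker logs --tail 50 dremio
```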