GKE Master disk usage

Hi,
We have deployed Dremio 4.5 on GKE.
1 master node (100 GB disk), 3 executor pods (100 GB disk)

We are getting a "No space left on device" error on the master pod.
98 GB is used by the catalog.

Can you tell me how to debug what is taking up so much disk space?

Hello,
Dremio stores various things under ./data, e.g. a Lucene index (in db/search) for searching in the “Jobs” UI.
catalog/ contains the actual profiles for previous job executions. If you run many queries and have a long retention period configured, it can take up quite a lot of disk space.
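If you want to see where the space is actually going first, a quick du on the master pod usually narrows it down. A minimal sketch, assuming the data directory is mounted at /opt/dremio/data and the pod is named dremio-master-0 (both match the values that show up later in this thread):

# Largest subdirectories of the Dremio data directory on the master pod
kubectl exec dremio-master-0 -- du -h --max-depth=2 /opt/dremio/data | sort -hr | head -20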

You can do two things to reduce the disk consumption:
Bring the cluster into administration mode and run the “dremio-admin clean” task (instructions are in the helm chart); this will require cluster downtime.
Or: set jobs.max.age_in_days in Admin > Support > Support Keys to a smaller value than the default (30 days I think). It will then do a nightly cleanup (default: 01:00 in the morning, support key: job.cleanup.start_at_hour)

Best, Tim

Thank you Tim.

Can you share the other support settings?
I could not find the jobs.max.age_in_days setting in the docs.

Hi @unni, it is not documented, but it should still work if you enter it into the Support Keys field as described in the link.


Hi, @unni
Big portions of Dremio are open-source. I found the config keys in “ExecConstants.java” in the GitHub project. Here’s the link for the current 4.7 release: https://github.com/dremio/dremio-oss/blob/d255abfabad2c9122e1cdf030ea6bbe8f9b7ce50/sabot/kernel/src/main/java/com/dremio/exec/ExecConstants.java
Since some keys might have been added in Dremio versions more recent than yours, you might need to switch to an older version of ExecConstants via git history.

(I’m not a Dremio engineer, so please re-confirm with Dremio that it is actually okay to change some of the keys you’ll find in ExecConstants. Some of the stuff is pretty low-level and you might break things when tweaking the settings.)

Best regards, Tim

@tid Thank you Tim. Will check out the code.

@unni

This is what you do

  • Shut down the executors
  • Shut down the coordinator
  • cd <DREMIO_HOME>/bin
  • ./dremio-admin clean
  • Save the output to a file
  • Start the coordinator
  • Start the executors

Send us the saved file and we can say exactly where the space is being used (a Kubernetes sketch of this procedure is included after the list below).

Mostly it would be in:

  • Metadata splits or metadata multi-splits, which can be cleaned offline using “clean -o”
  • As @tid said, jobs and profiles: these are kept for 30 days by default; you can change that via the support key or run an offline “clean -j n”, where n is the number of days to keep
  • “clean -i” does a reindex
  • “clean -c” compacts the database
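For the Kubernetes deployment in this thread, the above roughly translates to the sketch below. The statefulset names are the dremio-cloud-tools chart defaults and “dremio-admin” is a hypothetical one-off pod that mounts the master’s data volume (the chart’s docs describe how to start one for offline commands), so double-check the names against your setup:

# 1. Stop the executors and the master so the KV store is not locked
kubectl scale statefulset dremio-executor --replicas=0
kubectl scale statefulset dremio-master --replicas=0

# 2. Run the clean task offline and save its output to a file
kubectl exec dremio-admin -- /opt/dremio/bin/dremio-admin clean | tee dremio-admin-clean.out

# 3. Bring the cluster back up
kubectl scale statefulset dremio-master --replicas=1
kubectl scale statefulset dremio-executor --replicas=3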

We ran ./dremio-admin clean on our cluster; please find the attached file.

Currently 95 GB of the disk space is used by the master.
I noticed that the profile section is taking up most of the space.

Please let us know what this means and how to reclaim the space.

dremio-admin-clean-command-output.txt.zip (1.1 KB)

@unni

All your 90 GB is in jobs and profiles. Do you have verbose profiles turned on? If not, how many days of profiles is this? As said above, you can delete jobs older than n days; documentation below:

https://docs.dremio.com/advanced-administration/metadata-cleanup.html#delete-jobs
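As a concrete example of the offline route (a sketch; it has to be run the same way as the plain clean above, i.e. with the executors and coordinator stopped):

# Keep only the last 2 days of jobs and profiles
./dremio-admin clean -j 2

# Optionally compact the RocksDB store afterwards
./dremio-admin clean -c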

jobs
basic rocks store stats
* Estimated Number of Keys: 2056631
* Estimated Live Data Size: 4365202377
* Total SST files size: 5280083846
* Pending Compaction Bytes: 179204667
* Estimated Blob Count: 0
* Estimated Blob Bytes: 0
Index Stats
* live records: 2059900
* deleted records: 270045

profiles
basic rocks store stats
* Estimated Number of Keys: 3834547
* Estimated Live Data Size: 87999247546
* Total SST files size: 93266898549
* Pending Compaction Bytes: 364795392
* Estimated Blob Count: 0
* Estimated Blob Bytes: 0

The jobs.max.age_in_days key is set to 2.
We are running close to 150,000 SQL queries within a span of 8 hours. Could this lead to GC issues?
The master pod restarts after 10 hours because the master is unable to connect to ZooKeeper.
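(For reference, the logs from before such a restart can be pulled with kubectl; the pod name below is the one that appears in the ps output further down in the thread.)

# Logs from the previous (restarted) container of the master pod
kubectl logs dremio-master-0 --previous > dremio-master-previous.log

# Restart count and the reason for the last termination (OOMKilled, failed liveness probe, ...)
kubectl describe pod dremio-master-0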

@unni

What version of Dremio is this? Clicking on the Jobs page might cause a full GC because you have so many jobs. Also, if the SQL statements are very big, that can add to the issue. Here are a few things you can do:

  • Make sure verbose profile is not on
  • Add the below to dremio-master.yaml under the DREMIO_JAVA_EXTRA_OPTS section and restart the pods
    -Xloggc:/opt/dremio/data/gc.log
    -XX:+UseGCLogFileRotation
    -XX:NumberOfGCLogFiles=5
    -XX:GCLogFileSize=4000k
    -XX:+PrintClassHistogramBeforeFullGC
    -XX:+PrintClassHistogramAfterFullGC
    -XX:+HeapDumpOnOutOfMemoryError
    -XX:HeapDumpPath=/opt/dremio/data
    -XX:+UseG1GC
    -XX:G1HeapRegionSize=32M
    -XX:MaxGCPauseMillis=500
    -XX:InitiatingHeapOccupancyPercent=25
    -XX:ErrorFile=/opt/dremio/data/hs_err_pid%p.log
  • Make sure you have a 16 GB heap on the coordinator
  • Once the problem happens, send us the GC logs
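Once the flags are in place, the resulting files can be copied off the pod for sharing. A sketch, assuming the dremio-master-0 pod name and the paths configured above:

# See which GC logs / heap dumps exist (GC log rotation adds .0, .1, ... suffixes)
kubectl exec dremio-master-0 -- ls -lh /opt/dremio/data

# Copy a GC log off the pod
kubectl cp dremio-master-0:/opt/dremio/data/gc.log.0 ./gc.log.0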

Thanks
Bali

We are using Dremio 4.5.
DREMIO_JAVA_EXTRA_OPTS is already set. The heap is set to 40 GB.

dremio@dremio-master-0:/opt/dremio$ ps -ef | grep dremio
dremio       1     0 99 07:16 ?        02:21:15 /usr/local/openjdk-8/bin/java -Djava.util.logging.config.class=org.slf4j.bridge.SLF4JBridgeHandler -Djava.library.path=/opt/dremio/lib -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Ddremio.plugins.path=/opt/dremio/plugins -Xmx40000m -XX:MaxDirectMemorySize=11000m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/dremio -Dio.netty.maxDirectMemory=0 -DMAPR_IMPALA_RA_THROTTLE -DMAPR_MAX_RA_STREAMS=400 -Dzookeeper=zk-hs:2181 -Dservices.coordinator.master.embedded-zookeeper.enabled=false -Dservices.executor.enabled=false -Xloggc:/opt/dremio/data/gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=4000k -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintClassHistogramAfterFullGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/dremio/data -XX:+UseG1GC -XX:G1HeapRegionSize=32M -XX:MaxGCPauseMillis=500 -XX:InitiatingHeapOccupancyPercent=25 -XX:ErrorFile=/opt/dremio/data/hs_err_pid%p.log -cp /opt/dremio/conf:/opt/dremio/jars/*:/opt/dremio/jars/ext/*:/opt/dremio/jars/3rdparty/*:/usr/local/openjdk-8/lib/tools.jar com.dremio.dac.daemon.DremioDaemon

@balaji.ramaswamy Is there anything we can do from our side so that this issue does not occur?

@unni

Great!

Can we have the GC logs so that we can review them and see where the issue is?

Sharing the latest logs
2020-11-25 10:43:56,374 [zk-curator-231] ERROR ROOT - Dremio is exiting. Node lost its master status.
Dremio is exiting. Node lost its master status.
172.38.2.223 - - [25/Nov/2020:10:43:56 +0000] "GET / HTTP/1.1" 200 2522 "-" "kube-probe/1.16"
(repeated kube-probe health-check entries omitted)
2020-11-25 10:43:56,387 [Curator-ConnectionStateManager-0] INFO c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to LOST
2020-11-25 10:43:56,419 [Curator-ConnectionStateManager-0] INFO c.d.s.coordinator.zk.ZKClusterClient - ZK connection state changed to RECONNECTED

gclogs_25Nov.zip (3.1 MB)

@unni

I see your heap is filled with planner and metadata information. Can we validate that you are not refreshing metadata very frequently? I also see you are using Hive UDFs, and maybe a lot of expression-based queries?

How big is the OS RAM? Have you tried adding more scale-out coordinators (since 4.8)?
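(On the scale-out coordinator question: the dremio-cloud-tools chart exposes a coordinator replica count. A sketch only; the chart path and value name below are illustrative and should be verified against the values.yaml of the chart version in use.)

# Add two scale-out coordinators alongside the master
# (value name is illustrative; verify it in the chart's values.yaml)
helm upgrade dremio ./charts/dremio --set coordinator.count=2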

We are refreshing metadata before refreshing the reflection. We are not using any Hive UDFs, but the queries have expressions and joins.
Sharing a sample query.

WITH "_data1" AS
(SELECT "tx_1562674474561"."Month" AS "Month",
"tx_1562674474561"."Year" AS "Year",
"tx_1562674474561"."DayOfMonth" AS "DayOfMonth",
CASE
WHEN 100*count(distinct(tif_flag_1562674600281))/nullif(count(distinct(referencenumber_1561656334791)),0) = 0 THEN
NULL
ELSE 100*count(distinct(tif_flag_1562674600281))/nullif(count(distinct(referencenumber_1561656334791)),0)
END AS "__tif_orders_invoiced_in_te1562675768886"
FROM
(SELECT "tx_1562674474561".*,
"period".*
FROM
(SELECT "tif_flag_1562674600281",
tdsr_1558686830005,
CAST(the_date AS DATE) AS the_date,
referencenumber_1561656334791
FROM "projectid"."system"."tx_1562674474561" AS "tx_1562674474561"
WHERE ( "tx_1562674474561"."tdsr_1558686830005" IN ( 'RO1' ) )
AND ("tx_1562674474561"."the_date" <= '2020-12-31'
AND "tx_1562674474561"."the_date" >= '2019-01-01') ) AS "tx_1562674474561"
JOIN
(SELECT "DayOfMonth",
"yyyyMMdd",
"Year",
"Month",
TO_DATE("period"."the_date",
'YYYY-MM-DD') AS "period_the_date"
FROM "projectid"."system"."period" AS "period"
WHERE ("period"."the_date" <= '2020-12-31'
AND "the_date" >= '2019-01-01') ) AS "period"
ON "tx_1562674474561"."the_date" = "period".period_the_date ) AS "tx_1562674474561"
WHERE "tx_1562674474561"."tdsr_1558686830005" IN ( 'RO1' )
AND (("tx_1562674474561"."yyyyMMdd"
BETWEEN 20200101
AND 20201231)
OR ("tx_1562674474561"."yyyyMMdd"
BETWEEN 20190101
AND 20191231) )
GROUP BY "tx_1562674474561"."Month","tx_1562674474561"."Year","tx_1562674474561"."DayOfMonth" ) , "period" AS
(SELECT min(yyyymmdd) AS min_date,
max(yyyymmdd) AS max_date,
"period"."Month" AS "Month",
"period"."Year" AS "Year",
"period"."DayOfMonth" AS "DayOfMonth"
FROM "projectid"."system"."period" AS "period"
WHERE (("period"."yyyyMMdd"
BETWEEN 20200101
AND 20201231)
OR ("period"."yyyyMMdd"
BETWEEN 20190101
AND 20191231) )
GROUP BY "period"."Month", "period"."Year", "period"."DayOfMonth" )
SELECT *
FROM
(SELECT "__tif_orders_invoiced_in_te1562675768886",
"period"."Month" AS "Month",
"period"."Year" AS "Year",
"period"."DayOfMonth" AS "DayOfMonth",
min_date,
max_date
FROM "_data1" FULL OUTER
JOIN "period"
ON ( COALESCE("_data1"."Month")="period"."Month"
AND COALESCE("_data1"."Year")="period"."Year"
AND COALESCE("_data1"."DayOfMonth")="period"."DayOfMonth" ) )
WHERE ("__tif_orders_invoiced_in_te1562675768886" IS NOT NULL ) AND ( min_date IS NOT NULL
AND max_date IS NOT NULL )

The Dremio master pod has 15 cores and 59 GB of RAM. We are using Kubernetes with Dremio 4.5. We tried using multiple coordinators, but many of the queries time out or the connection is lost.

@unni

Scale-out coordinators are a feature from Dremio 4.7.

@balaji.ramaswamy
The coordinator pods and values were already in the dremio-cloud-tools chart when we were using Dremio 4.5. Maybe 4.7 had already been released when we used the chart.
It did work for queries on 4.5, but we will definitely work on upgrading to 4.9.

Do you have any other suggestions based on the sample query? Are there any options to make the planner less verbose that could improve performance?

@unni

Send us the profile for that query; we are interested in seeing the logical plan and how big it is.