Hi All,
I'm struggling to create reflections on larger datasets.
For example, creating a table took about 1 hour, while the reflection on similar data has already been running for 20 hours and mostly fails. At the same time the cluster is not using any resources.
Attaching the in-progress profile:
47216db1-017a-4f37-b9ff-7cbf9e0a03ea.zip (283.9 KB)
And here is the cluster utilization - one worker at 4%, the rest idle.
Any idea how to make this more efficient?
I now have tables with 17B rows ready for testing, and I suspect I won't be able to create any reflections on them at all…
Thanks
Jaro
Hi, I had to kill the query after 33h…
Attaching the profile for further analysis.
0a0e9389-429a-440c-aa36-f1761e8d84b3.zip (283.9 KB)
Jaro
There's a data skew issue in the sort operator:
There are 84 worker threads but only 2 are actually doing work. The skew is caused by the data not being well distributed across the unique partition tuples. I think you need to adjust the reflection’s partition fields and/or use partition transforms.
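To make that concrete, here is a rough sketch of what re-partitioning the reflection could look like in SQL (the dataset path, reflection name, and column names are placeholders, and the exact DDL and transform support depend on your Dremio version - the same change can also be made from the reflection UI):

ALTER DATASET analytics.my_join_view
CREATE RAW REFLECTION raw_repartitioned
USING DISPLAY (perspective_id, event_date, amount, customer_name)
PARTITION BY (event_date)
LOCALSORT BY (perspective_id)

The idea is to partition on a field whose values spread the rows evenly across many partitions; if your version supports partition transforms on reflections, a bucket or truncate transform on a higher-cardinality column serves the same purpose.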
Hi @Benny_Chow, can you please explain a bit more about how this works?
My intention with this raw reflection was to create a full snapshot of a view joining two tables, with one partition field that has very low cardinality (<10 distinct values).
Both input tables are partitioned by the same field and were rewritten (OPTIMIZE with data rewrite was run after adding the partition). Thank you.
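(For reference, a minimal sketch of the kind of OPTIMIZE statement that was run - the table name is a placeholder and the exact options available depend on the Dremio version:)

OPTIMIZE TABLE warehouse.fact_events REWRITE DATA USING BIN_PACK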
@jaroslav_marko
By default, reflection creation tries to minimize files. Try switching to "Minimize Refresh Time"; it might help with the skew. Open the reflection layout, click the cogwheel, and change it to "Minimize Refresh Time". This may slow down queries that use the reflection, as they have to read more files.
What Benny is saying is to see if you can partition on a key that has a more uniform value distribution.
Are you partitioning on perspective_id? If you run the query below, is it skewed?
select perspective_id, count(*) as ct from <your_view> group by 1 order by 2 desc
Attaching screenshots for changing the refresh setting.
I also see you are creating a raw reflection on the entire dataset, with no filters and no columns reduced. See if you have an opportunity to create the reflection on a reduced set of columns and rows based on the dashboard requirements.
For example, if the report or dashboard only selects a subset of columns, then include only those; or if the reports only run on, say, the last 2 years of data, see if you can add a filter.
Right now you are joining a 1.3 billion row table with a 750 million row table.
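Something along these lines, for example (a rough sketch - the space, view, table, and column names are made up for illustration): create a narrower view with only the columns and date range the dashboards need, and build the reflection on that view instead of on the full join.

CREATE OR REPLACE VIEW analytics.my_join_recent AS
SELECT f.perspective_id, f.event_date, f.amount, d.customer_name
FROM warehouse.fact_events f
JOIN warehouse.dim_customers d ON f.customer_id = d.customer_id
WHERE f.event_date >= DATE '2023-01-01' -- keep only the ~2 years the dashboards actually query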
Hi @balaji.ramaswamy, thanks for your recommendations.
Indeed, I can reduce the fields and content of the reflection. I wanted to "materialize" the join because it is used in many different queries, so every query using the join would be accelerated. This does not seem to be the best approach, correct?
By skew I understood an uneven distribution of values across the physical files (which I wanted to address with OPTIMIZE REWRITE DATA - that was failing too).
What you mean is an uneven distribution of the values themselves, right?
The current distribution looks like this - there are only 3 unique values.
From my perspective it is not ideal, but not too bad either.
Thanks for helping me understand better how this works,
jaro
@jaroslav_marko How long did the refresh take with the "Minimize Refresh Time" option? Kindly provide a profile.
Hi @balaji.ramaswamy, it is still in progress (4.5h). Attaching the profile.
3ea99603-eb18-4bb7-bd0b-c92a6f99b95e.zip (283.9 KB)
The creation failed after 6.5h, and the next attempt after 13 minutes. Attaching both profiles.
Jaro
0dd23100-600d-4271-a987-20efddab0290.zip (230.6 KB)
1cd6aa78-d970-4954-9778-e2306e4c9065.zip (287.8 KB)
@jaroslav_marko I can see an executor disconnect. Do you have GC logs configured? Can you please send them for node 198.18.101.215, along with the server.log? They should cover the times below (in UTC):
Start time: 2024-10-02T07:50:03.545Z
End time: 2024-10-02T14:23:48.447Z
Hi @balaji.ramaswamy, I'm afraid we do not have GC logging set up. Can you please let me know how to do it? Thanks.
On top of that, we just realized that the server.log files are completely missing on both executors and the master node. This is strange, as we remember the files being present in version 25.0.
Is there some setting (maybe a Helm chart update) that influences this?
Jaro
To configure GC logging
extraStartParams: >-
-Xloggc:/opt/dremio/data/gc-%t-%p.log
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10
-XX:GCLogFileSize=8000k
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps
-XX:+PrintClassHistogramBeforeFullGC
-XX:+PrintClassHistogramAfterFullGC
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/opt/dremio/data
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:MaxGCPauseMillis=500
-XX:InitiatingHeapOccupancyPercent=25
-XX:+PrintAdaptiveSizePolicy
-XX:+PrintReferenceGC
-XX:ErrorFile=/opt/dremio/data/hs_err_pid%p.log
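For context, these flags go into the Helm chart values.yaml under the executor section (and under coordinator as well if you want them on the master) - a sketch assuming the standard Dremio Helm chart layout, so adjust to your chart version:

executor:
  extraStartParams: >-
    -Xloggc:/opt/dremio/data/gc-%t-%p.log
    -XX:+UseGCLogFileRotation
    -XX:NumberOfGCLogFiles=10

followed by the rest of the flags listed above. After updating values.yaml, run helm upgrade so the pods restart with the new JVM options.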
Executor logs should already be persistent; please send the output of ps -ef | grep dremio