Hi, when I try to rebuild a table I get an error that suggests we are running out of space somewhere, but we have plenty of available space everywhere. Can you please help me diagnose the issue? Thanks.
IOException: No space left on device
55b58e3a-4131-4b49-8613-b9f024594a3f.zip (101.6 KB)
Thanks
Jaro
@jaroslav_marko Looks like the failure is on EXTERNAL_SORT. Can you check the disk size under data on the executors? Or, if you have configured spill to go to a specific location, check there. One thing to note is that spill files are deleted once the query fails with no space available.
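If it helps, here is a rough way to check from outside the pods (a sketch assuming the default Helm chart pod names, the mlops-lakehouse namespace from your hostnames, and the default /opt/dremio mount paths; adjust to your deployment):

    # Free space on one executor's local data volume (repeat for each executor pod)
    kubectl -n mlops-lakehouse exec dremio-executor-0 -- df -h /opt/dremio/data

    # If a dedicated spill location is configured, it is typically set under paths.spilling in dremio.conf
    kubectl -n mlops-lakehouse exec dremio-executor-0 -- grep -A2 spilling /opt/dremio/conf/dremio.conf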
Hi @balaji.ramaswamy I have upgraded the executors from 100 GB to 1 TB of storage each. Spilling is probably happening, because I see the spill sign next to some queries (can I check this somewhere?).
I ran the job again, but it failed with a different error.
AttemptManager dremio-master-0.dremio-cluster-pod.mlops-lakehouse.svc.cluster.local no longer active. Cancelling fragment 1914442a-2223-0edd-c23e-c040f3713800:1:52
82e47e76-09ee-43a6-bc92-64b09c6cc64d.zip (102.7 KB)
Thanks Jaro
@jaroslav_marko
This is a completely different error; it looks like your master went down. Can you please check the logs to see whether it was restarted, or whether you actually see a shutdown-thread?
Hi @balaji.ramaswamy unfortunately we have lost the logs.
I ran the job again and it did not finish after 17 hours, so I canceled it.
I am trying again, but the table is not so big that this operation should be a problem.
By the way, when the optimize process fails, what happens to the files it has already created? How do I clean them up?
thanks
Jaro
Those are considered ‘orphan files’. We should have a mechanism to discard these half-baked files, similar to the docs link I’ve attached:
iceberg-tables-compaction-expiring-snapshots-and-more/
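If you want to see what those leftovers look like on disk, here is a minimal sketch (assuming the table sits on a filesystem path you can reach; the location below is hypothetical):

    # Hypothetical table location; replace with your table's actual storage path
    TABLE_DIR=/mnt/lakehouse/my_table
    # Data files written recently (e.g. by the failed OPTIMIZE) that the current table metadata may not reference
    find "$TABLE_DIR/data" -name '*.parquet' -mtime -1 -ls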
Hi @balaji.ramaswamy thanks for the guide on how to maintain Iceberg tables.
Coming back to the original issue: I let the job run until it failed (>18 h). See the attached profile. Can you help me identify the error?
6f1e85f3-ee15-4710-8848-b08e7166ee65.zip (122.4 KB)
best regards
Jaro
@jaroslav_marko Looks like dremio-executor-5.dremio-cluster-pod.mlops-lakehouse.svc.cluster.local went unresponsive. Do you have the server.log and GC logs from when the error happened?
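If the pod gets restarted before you can grab them, here is a rough way to pull the files off first (a sketch assuming the default log location /opt/dremio/log inside the container; the GC log name and paths may differ in your deployment):

    # Copy the server log and GC log off the executor
    kubectl cp mlops-lakehouse/dremio-executor-5:/opt/dremio/log/server.log ./executor-5-server.log
    kubectl cp mlops-lakehouse/dremio-executor-5:/opt/dremio/log/server.gc ./executor-5-server.gc
    # If the pod already restarted, the previous container's stdout may still be available
    kubectl -n mlops-lakehouse logs dremio-executor-5 --previous > executor-5-previous.log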