java.io.IOException: org.rocksdb.RocksDBException:No space left on device

Hello Dremio team,

I am running Dremio on a single EC2 m5.large instance with 8GB of RAM. I was trying to read a 150MB xlsx file from an S3 bucket, and upon reading it I got the errors and exceptions below. Currently I cannot even start Dremio and all my data is inaccessible :frowning: . I tried rebooting the EC2 instance and even stopping and starting it from the AWS console, but every time I try to start Dremio I get the same error.

Any idea how I might at least start Dremio :dolphin: ?

Thank you,
George

Errors/Exceptions from /var/log/dremio/server.out

Thu Oct 11 21:15:22 UTC 2018 Starting dremio on ip-x-y-z-w
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 30658
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 30658
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
KVStore version is 2.1.6-201809161906440178-edb5b4d



Catastrophic failure occurred. Exiting. Information follows: Failed to start services, daemon exiting.
java.io.IOException: org.rocksdb.RocksDBException: While appending to file: /var/lib/dremio/db/catalog/000178.sst: No space left on device
	at com.dremio.datastore.RocksDBStore.exclusively(RocksDBStore.java:373)
	at com.dremio.datastore.RocksDBStore.close(RocksDBStore.java:335)
	at com.dremio.datastore.ByteStoreManager$2.onRemoval(ByteStoreManager.java:83)
	at com.google.common.cache.LocalCache.processPendingNotifications(LocalCache.java:1963)
	at com.google.common.cache.LocalCache$Segment.runUnlockedCleanup(LocalCache.java:3562)
	at com.google.common.cache.LocalCache$Segment.postWriteCleanup(LocalCache.java:3538)
	at com.google.common.cache.LocalCache$Segment.clear(LocalCache.java:3309)
	at com.google.common.cache.LocalCache.clear(LocalCache.java:4322)
	at com.google.common.cache.LocalCache$LocalManualCache.invalidateAll(LocalCache.java:4937)
	at com.dremio.datastore.ByteStoreManager.close(ByteStoreManager.java:250)
	at com.dremio.common.AutoCloseables.close(AutoCloseables.java:92)
	at com.dremio.common.AutoCloseables.close(AutoCloseables.java:71)
	at com.dremio.datastore.CoreStoreProviderImpl.close(CoreStoreProviderImpl.java:197)
	at com.dremio.datastore.LocalKVStoreProvider.close(LocalKVStoreProvider.java:152)
	at com.dremio.dac.cmd.upgrade.Upgrade.run(Upgrade.java:172)
	at com.dremio.dac.daemon.DremioDaemon.main(DremioDaemon.java:103)
Caused by: org.rocksdb.RocksDBException: While appending to file: /var/lib/dremio/db/catalog/000178.sst: No space left on device
	at org.rocksdb.RocksDB.flush(Native Method)
	at org.rocksdb.RocksDB.flush(RocksDB.java:1760)
	at com.dremio.datastore.RocksDBStore.lambda$close$2(RocksDBStore.java:339)
	at com.dremio.datastore.RocksDBStore.exclusively(RocksDBStore.java:365)
	... 15 more

You can try our metadata cleanup to free some space - https://docs.dremio.com/advanced-administration/metadata-cleanup.html

Looks like you are out of space on the local file system where /var/lib/dremio/db/catalog/ is mounted. Can you allocate some more space there?
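Before allocating more space, it can help to see what is actually filling the disk. The following is a minimal sketch, not an official Dremio tool: `largest_dirs` is a hypothetical helper name, and it only wraps the standard `du`, `sort`, and `head` utilities.

```shell
# Hypothetical helper: list the largest entries directly under a directory.
# $1 = directory to inspect, $2 = how many entries to show (default 5).
largest_dirs() {
    # du -sk prints one size (in KiB) per entry; sort puts the biggest first.
    du -sk "$1"/* 2>/dev/null | sort -rn | head -n "${2:-5}"
}

# Example (may need sudo, depending on directory permissions):
#   largest_dirs /var/lib/dremio
#   largest_dirs /var/log/dremio 3
```

Running it against /var/lib/dremio and /var/log/dremio should show whether the KV store, job results, or logs are taking up the space.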

Hi @Anthony,

Thank you for this information. I tried all these options but none of them worked. I think my deleting a file, as described below, might have something to do with it. I ended up reinstalling Dremio and losing all my data. I am sure I can replicate this issue by reading a large file (~150MB), so I can try the admin commands once more.

sudo /opt/dremio/bin/dremio-admin clean -j

Failed to complete cleanup.
com.beust.jcommander.ParameterException: Expected a value after parameter -j
sudo /opt/dremio/bin/dremio-admin clean --max-job-days

Failed to complete cleanup.
com.beust.jcommander.ParameterException: Expected a value after parameter --max-job-days
sudo /opt/dremio/bin/dremio-admin clean -I

2018-10-12 14:21:31,117 [main] INFO  c.d.common.scanner.BuildTimeScan - 
Loaded prescanned packages [com.dremio.storage ...

Failed to complete cleanup.
org.rocksdb.RocksDBException: Sst file size mismatch: /var/lib/dremio/db/catalog/000134.sst. Size recorded in manifest 2527531, actual size 0

At this point I think I deleted file /var/lib/dremio/db/catalog/000134.sst thinking that it was the culprit. From the message below, I was wrong:

sudo /opt/dremio/bin/dremio-admin clean --reindex-data

Failed to complete cleanup.
org.rocksdb.RocksDBException: Sst file size mismatch: /var/lib/dremio/db/catalog/000134.sst. Size recorded in manifest 2527531, actual size 0

Thank you for your feedback,
George
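The two `ParameterException` messages above say that `-j` / `--max-job-days` expects a value after the flag. A minimal sketch of passing one follows. The wrapper name `clean_old_jobs`, the `DREMIO_ADMIN` variable, and the example value of 30 days are assumptions for illustration; only the `clean -j` invocation and the install path come from the commands above.

```shell
# Path taken from the commands in this thread; override it if your install differs.
DREMIO_ADMIN="${DREMIO_ADMIN:-/opt/dremio/bin/dremio-admin}"

# Hypothetical wrapper: run "dremio-admin clean -j <days>" only when <days>
# is a non-negative integer, so the flag is never passed without a value.
clean_old_jobs() {
    days="$1"
    case "$days" in
        ''|*[!0-9]*)
            echo "usage: clean_old_jobs <days>" >&2
            return 2
            ;;
    esac
    "$DREMIO_ADMIN" clean -j "$days"
}

# Example (run with the privileges your install needs, e.g. via sudo):
#   clean_old_jobs 30
```

This does not address the later "Sst file size mismatch" error, which is a separate problem with the catalog files themselves.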

Hi @kelly,

That’s what I thought as well, but given that my input file was ~150MB and I had 8GB of space left, I could not see how adding more space would help. Is there a way to confirm which partition /var/lib/dremio/db/catalog/ is mounted on? I would imagine it is one of the two main filesystems below:

df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        3.8G   44K  3.8G   1% /dev
tmpfs           3.8G     0  3.8G   0% /dev/shm
/dev/nvme0n1p1  7.8G  7.2G  518M  94% / 

I ended up reinstalling Dremio and losing all my data. I am sure I can replicate this issue by reading a large file (~150MB), but I want to confirm where the catalog folder is mounted first.

Thank you again,
George
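On the question of which partition holds the catalog folder: `df` accepts a path and reports the filesystem that contains it. In the `df -h` output above, the only disk-backed filesystem is /dev/nvme0n1p1 mounted on / (7.8G, 94% used, 518M available), so the catalog directory most likely lives there. A minimal sketch for checking, where `backing_fs` is a hypothetical helper name:

```shell
# Hypothetical helper: print the mount point of the filesystem holding a path.
# df -P gives POSIX-format output: one header line, then one line per
# filesystem, with the mount point in the sixth column.
backing_fs() {
    df -P "$1" | awk 'NR==2 {print $6}'
}

# Example:
#   backing_fs /var/lib/dremio/db/catalog    # mount point only
#   df -h /var/lib/dremio/db/catalog         # size, used, and available too
```

If the first command prints /, the catalog shares the root volume, and the Avail column for / is the space actually left for RocksDB to write to.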