I’m new to Dremio.
I’m trying to create raw reflections on datasets stored in Azure Data Lake as csv.gz blobs.
I successfully added the data lake as a source, then created a dataset for each of the 4 subfolders; together the datasets total 1.2 TB.
I then enabled a raw reflection on each dataset, specifying a column with a cardinality of about 1,000. Each reflection fails with “Failed to spill to disk. Please check space availability”.
I run a 5-VM cluster on the E16-v3 SKU, each VM with 400 GB of disk, so about 2 TB total. That should be plenty for the 1.2 TB, shouldn’t it? Yet not even one reflection succeeds.
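For reference, I haven’t touched the spill settings, so I believe the executors are using something like the following in dremio.conf (this is a sketch, not my actual file; the exact paths are assumptions on my part, based on what I understand the defaults to be):

```hocon
paths: {
  # Assumed base directory for Dremio's local data on each executor.
  local: "/var/lib/dremio"

  # Directories Dremio spills sort/aggregation data into.
  # I believe this defaults to a "spill" subdirectory under paths.local.
  spilling: ["/var/lib/dremio/spill"]
}
```

If the spill directory happens to live on a small OS disk rather than the 400 GB data disk, that might explain the error, but I’m not sure how to verify which disk it resolves to.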
Here are some details on one of them:
| Metric | Value |
|---|---|
| Input Bytes | 131.64 GB |
Any tips on why this isn’t working?