Hi,
I’m new to Dremio.
I’m trying to create reflections on data stored in Azure Data Lake as csv.gz blobs.
I successfully loaded the data lake, then created a dataset for each of the 4 subfolders. The datasets total 1.2 TB.
I then enabled a raw reflection on each dataset, specifying a column with a cardinality of about 1K. Each reflection fails with “Failed to spill to disk. Please check space availability”.
I run a 5-VM cluster of E16-v3 VMs, each with 400 GB of disk space, so that should be plenty for 1.2 TB, shouldn’t it? Yet not a single reflection works.
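For reference, the capacity reasoning above can be written out as a quick back-of-the-envelope check. This is a minimal Python sketch, not anything Dremio itself computes. It assumes (hypothetically) that the 1.2 TB is spread evenly across the 5 VMs and that spilled data is no larger than the input. Gzip-compressed CSV can grow considerably once decompressed, so treat the result as an optimistic estimate.

```python
# Back-of-the-envelope disk check using the figures from the post.
# The even-distribution assumption is hypothetical, not Dremio behavior.
NODES = 5
DISK_PER_NODE_GB = 400      # local disk per E16-v3 VM, as stated in the post
DATASET_TOTAL_GB = 1200     # 1.2 TB across the 4 datasets


def capacity_summary(nodes: int, disk_per_node_gb: float, total_gb: float) -> dict:
    """Compare aggregate and per-node disk capacity against the data volume."""
    aggregate_gb = nodes * disk_per_node_gb
    # Assumes data (and any spill) is split evenly across the nodes.
    per_node_share_gb = total_gb / nodes
    return {
        "aggregate_gb": aggregate_gb,
        "per_node_share_gb": per_node_share_gb,
        "fits_if_even": per_node_share_gb <= disk_per_node_gb,
    }


if __name__ == "__main__":
    print(capacity_summary(NODES, DISK_PER_NODE_GB, DATASET_TOTAL_GB))
```

Under these assumptions, each node would hold about 240 GB of a 400 GB disk, which matches the intuition that raw capacity alone should not be the limit.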
Here are some details on one of them:
Input Bytes: 131.64 GB
Input Records: 458,525,083
Any tips on why this isn’t working?
@vplauzon
Would it be possible to share the query profile?
Hi Balaji,
Of course, here it is. I scrubbed a couple of details (email and storage account name).
header.zip (47.3 KB)
Thank you!
@vplauzon
It’s possible that Dremio cannot write to “/mnt/resource/dremio/spill” because of permissions. Can we validate this?
How can we validate that?
It’s a vanilla image fresh from the Azure Marketplace (VMSS). I didn’t tweak the config in any way.
@vplauzon Log on to “mydremioq000001.internal.cloudapp.net”, go to “/mnt/resource/dremio/spill”, and check its permissions.
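One way to run that check on an executor is a small Python script like the sketch below. It is a hypothetical helper, not a Dremio tool. It reports whether the spill path exists, whether it is a directory, whether the current user can write to it, and how much free space its filesystem has. For the permission result to be meaningful, run it as the account the Dremio service uses (for example with sudo -u); that account name is an assumption and is not confirmed in this thread.

```python
import os
import shutil
import sys

# Spill path mentioned in this thread; pass another path as argv[1] if needed.
DEFAULT_SPILL_DIR = "/mnt/resource/dremio/spill"


def check_spill_dir(path: str) -> dict:
    """Report existence, writability and free space for a spill directory."""
    report = {
        "path": path,
        "exists": os.path.exists(path),
        "is_dir": os.path.isdir(path),
        "writable": False,
        "free_gb": None,
    }
    if report["is_dir"]:
        # os.access checks permissions for this process's real uid/gid;
        # creating files in a directory needs both write and execute bits.
        report["writable"] = os.access(path, os.W_OK | os.X_OK)
        usage = shutil.disk_usage(path)
        report["free_gb"] = round(usage.free / 1e9, 1)
    return report


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_SPILL_DIR
    for key, value in check_spill_dir(target).items():
        print(f"{key}: {value}")
```

A missing directory shows up as exists: False, which is a different problem from a permissions error on a directory that does exist.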
Hi Balaji,
After logging in to the first executor, I tried a sudo cd and got the following:
-bash: cd: /mnt/resource/dremio/spill: No such file or directory
The furthest I could get down the hierarchy is:
$ sudo ls /mnt/resource
DATALOSS_WARNING_README.txt lost+found
I did update the VM scale set image…
Is there an issue with the Azure Marketplace image?
Any ideas?
Do you think there was a bug in the Azure Marketplace version I used?
@vplauzon Let me check. For Azure, we recommend using AKS. Is that possible?
The Azure Marketplace offering uses a VM scale set. I know AKS is also an option.
@vplauzon On Azure, the recommended deployment is AKS. Would that be OK to try?
http://docs.dremio.com/deployment/azure-aks/azure-aks/
Sure, I’ll give it a shot.