[solved][hive] "Not a file" exception when read Hive table with subdirectories

Our data in Hive is organized with subdirectories under the table directory, but Dremio seems to assume the file structure is flat, i.e. that all the data lives directly under the table directory without any subfolders. When we read the non-flat tables, we get:

Caused by: java.io.IOException: Not a file: adl://home/datasets/subdir/subdir2/subdir3/subdir4/2018-03-14
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:329) ~[hadoop-mapreduce-client-core-2.8.0.jar:na]
        at com.dremio.exec.store.hive.DatasetBuilder$HiveSplitsGenerator.runInner(DatasetBuilder.java:377) ~[dremio-hive-plugin-2.0.1-201804132205050000-10b1de0.jar:2.0.1-201804132205050000-10b1de0]
        at com.dremio.exec.store.hive.DatasetBuilder.buildSplits(DatasetBuilder.java:444) ~[dremio-hive-plugin-2.0.1-201804132205050000-10b1de0.jar:2.0.1-201804132205050000-10b1de0]
        at com.dremio.exec.store.hive.DatasetBuilder.buildIfNecessary(DatasetBuilder.java:285) ~[dremio-hive-plugin-2.0.1-201804132205050000-10b1de0.jar:2.0.1-201804132205050000-10b1de0]
        at com.dremio.exec.store.hive.DatasetBuilder.getDataset(DatasetBuilder.java:204) ~[dremio-hive-plugin-2.0.1-201804132205050000-10b1de0.jar:2.0.1-201804132205050000-10b1de0]
        at com.dremio.exec.catalog.DatasetManager.getTableFromPlugin(DatasetManager.java:297) [dremio-sabot-kernel-2.0.1-201804132205050000-10b1de0.jar:2.0.1-201804132205050000-10b1de0]
        ... 32 common frames omitted

Here /home/datasets/subdir/subdir2/subdir3/subdir4/ is the path for the table, and 2018-03-14 is another folder; the real data lives inside that folder.

Hi @dli16

This usually happens when there are extra folders under the final Hive folder, and Hive complains "Not a file". To check, go to your Hive shell and run the following:

describe formatted <database>.<table>;

This will give you the path to the actual Hive files on disk.

If it is a regular file system, run "ls -ltrh <path>";
if HDFS, run "hadoop fs -ls <path>".

Check whether the table definition matches the files on disk and whether there are any additional subfolders.


Yes, there are extra folders under the final Hive folder. We did this on purpose so we could organize our data better. We can read the table fine with other tools.

I figured it out. Just set mapred.input.dir.recursive and hive.mapred.supports.subdirectories to true.
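
For anyone hitting the same error: both are standard Hive properties. A sketch of how they could be set persistently in hive-site.xml (adjust for your deployment; they can also be set per session with `SET <property>=true;` in the Hive shell):

```xml
<!-- hive-site.xml: let input formats recurse into subdirectories -->
<property>
  <name>mapred.input.dir.recursive</name>
  <value>true</value>
</property>
<property>
  <name>hive.mapred.supports.subdirectories</name>
  <value>true</value>
</property>
```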