Query subfolders


How can I query files in subfolders of a data lake?


routputdata contains multiple folders, and the Parquet files are inside those folders.

I can query files that sit directly under routputdata, but I can't query the Parquet files inside the subfolders. I tried various combinations of the path, but none of them worked. I also checked whether the problem was the numeric folder name, but that wasn't it.

For example, this doesn't work:
SELECT * FROM routputdata."202010608054747"."event.parquet"

As a second question: is there a way to set that folder dynamically through another query, along the lines of:


SELECT * FROM routputdata.(SELECT folder FROM folders.csv LIMIT 1)."event.parquet"

Any help would be appreciated.

@cklar Are the sub-folders partition columns or completely different datasets? If they are completely different datasets, you have to move them out of the parent folder, because when you promote the parent folder, the sub-folder names automatically become values of a partition column (dir0).
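If the sub-folders really are partitions of one dataset, a minimal sketch of querying them looks like this. It assumes the parent folder has already been promoted, so the sub-folder is picked with a WHERE filter on dir0 instead of in the table path. A subquery cannot be used as a path component, but it can be used in that filter, which also covers the second question. The source name `mylake`, the promoted `folders.csv` with a `folder` column, the helper names, the URL/port and the token are all hypothetical. Posting the SQL to `/api/v3/sql` is one way to run it from a script; you can paste the generated SQL into the UI just as well.

```python
import json
import urllib.request


def quote_ident(name):
    # Double-quote a SQL identifier, doubling any embedded double quotes.
    return '"' + name.replace('"', '""') + '"'


def subfolder_query(path, folder_filter):
    # `path` is the promoted parent folder, e.g. ["mylake", "routputdata"].
    # First-level sub-folder names show up in the partition column dir0,
    # so the sub-folder is chosen in WHERE rather than in the FROM path.
    # `folder_filter` is either a literal list ("'2020...'") or a subquery.
    table = ".".join(quote_ident(part) for part in path)
    return f"SELECT * FROM {table} WHERE dir0 IN ({folder_filter})"


def sql_job_request(base_url, auth_header, sql):
    # Build (but do not send) a POST to Dremio's /api/v3/sql endpoint,
    # which submits the query as a job. Send it with urllib.request.urlopen(req).
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v3/sql",
        data=body,
        headers={"Content-Type": "application/json", "Authorization": auth_header},
        method="POST",
    )


# Hypothetical names: source "mylake", a promoted "folders.csv" with a "folder" column.
static_sql = subfolder_query(["mylake", "routputdata"], "'202010608054747'")
dynamic_sql = subfolder_query(
    ["mylake", "routputdata"], 'SELECT folder FROM "mylake"."folders.csv"'
)
req = sql_job_request("http://localhost:9047", "Bearer <token>", dynamic_sql)
```

The filter uses IN rather than = so the same helper works for one folder or several.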

You can promote a folder dynamically using the REST API, or by enabling "Automatically format files into physical datasets when users issue queries" in the source's settings.
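A rough sketch of the REST route, based on my reading of Dremio's v3 catalog API (POST to /api/v3/catalog/{id} with a PHYSICAL_DATASET body). Treat the id format "dremio:/<source>/<folder>/...", the payload fields and the auth header as assumptions to check against the docs for your Dremio version. The source name `mylake`, the helper name, the URL/port and the token are hypothetical. The function only builds the request; send it with urllib.request.urlopen once you've verified it.

```python
import json
import urllib.parse
import urllib.request


def promote_request(base_url, auth_header, path, file_format=None):
    # Build (but do not send) a request asking Dremio to promote the file or
    # folder at `path` (source name first) to a physical dataset.
    # Assumption: not-yet-promoted files/folders are addressed by the id
    # "dremio:/<source>/<folder>/...", URL-encoded in the request path.
    entity_id = "dremio:/" + "/".join(path)
    body = {
        "entityType": "dataset",
        "id": entity_id,
        "path": path,
        "type": "PHYSICAL_DATASET",
        # Parquet needs no extra options; text/CSV would need delimiter settings.
        "format": file_format or {"type": "Parquet"},
    }
    url = (
        base_url.rstrip("/")
        + "/api/v3/catalog/"
        + urllib.parse.quote(entity_id, safe="")
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": auth_header},
        method="POST",
    )


# Hypothetical source name "mylake": promote one sub-folder of routputdata.
req = promote_request(
    "http://localhost:9047",
    "Bearer <token>",
    ["mylake", "routputdata", "202010608054747"],
)
```

Promoting each sub-folder separately like this keeps them as independent datasets, which is the option to use if they turn out to be different datasets rather than partitions.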