Hi
I have a CTAS (CREATE TABLE AS SELECT) statement whose SELECT clause is a complex query that joins about 50 SQL Server tables and produces 200+ columns. It runs in 2 minutes, but Dremio generates 100 Parquet files (the partition folder is 50 MB).
When I take that SELECT statement, cache its result in a SQL table, and use that single table in the CTAS, Dremio generates only 1 file (and the partition folder is half the size, 25 MB).
My questions are:
- Is there any performance difference when querying the resulting Dremio data in the two scenarios?
- Is there any way to make Dremio generate only 1 Parquet file when the CTAS is run with the full query?
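To compare the two scenarios concretely, a small script like the one below can report how many Parquet files each CTAS output folder contains and their total size. It is a minimal sketch under stated assumptions: `summarize_parquet_folder` is a hypothetical helper (not part of Dremio), and it assumes the CTAS output is readable on a local or mounted filesystem.

```python
from pathlib import Path


def summarize_parquet_folder(folder: str) -> tuple[int, int]:
    """Return (number of .parquet files, total size in bytes) under `folder`.

    Hypothetical helper, not a Dremio API. It searches recursively, so files
    inside partition subfolders are counted too; non-Parquet files are ignored.
    """
    files = [p for p in Path(folder).rglob("*.parquet") if p.is_file()]
    total_bytes = sum(p.stat().st_size for p in files)
    return len(files), total_bytes
```

With the figures above, the full-query CTAS averages about 0.5 MB per file (50 MB across 100 files), while the cached-table CTAS writes one 25 MB file. Calling the helper on each output folder and dividing total bytes by file count gives the same per-file comparison for any run.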