Hi
I have a CTAS (CREATE TABLE AS SELECT) statement.
The SELECT clause is a complex query that joins about 50 SQL Server tables and returns 200+ columns. It runs in 2 minutes, but Dremio generates 100 Parquet files (the partition folder is 50 MB).
When I take that SELECT statement, cache its result in a SQL table, and use that single table in the CTAS, Dremio generates only 1 file (the partition folder is half the size, 25 MB).
My questions are:
- Is there any performance difference when querying the Dremio data in these 2 scenarios?
- Is there any way to make Dremio generate only 1 Parquet file when the CTAS is run with the full query?