Over 100 Parquet files generated for Complex CTAS involving several table joins

Hi
I have a CTAS statement
This create table as select (select clause consists of complex SQL with about 50 sql server tables joined and resulting into 200+ columns) This runs in 2 min but Dremio generates 100 Parquet files (Size of partition folder is 50 MB(

When take that select statement and cache the result in SQL Table and use that single table in CTAS Dremio generates only 1 file (Size of partition folder is half 25 MB)

My question is

  1. Is there any performance difference when querying dremio data in 2 scenarios?
  2. Is there any way to make Dremio generate only 1 parquet file when CTAs is run with full query script

@DremioStudent Are you able to provide the job profile of the CTAS the generates 100 Parquet files and the one that generates 1, that will tell us why

For a single CTAS, the number of files will be primarily driven by the partition spec, sort spec and parquet file options. See Parquet File Best Practices | Dremio Documentation.

To answer your specific question, we would need to see the profiles…