Is there any document that explains how Dremio implements "Formatting Data to a Table"? How does Dremio manage the relationship between files and formatted tables? What steps are performed when we query a formatted table?
When a folder is formatted as a PDS (table), Dremio learns every file associated with that table as part of the metadata refresh. When you query a formatted table, once planning is complete, execution starts by pruning manifest files and then pruning data files. This makes sure only a limited set of files is read. If there are additional non-partition filters, those are treated as predicate pushdowns, so only the relevant pages are read.
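To make the order of those steps concrete, here is a rough Python sketch of the idea. It is not Dremio's code: the `Manifest` and `DataFile` classes, the `plan_scan` function, and the per-page min/max ranges are all hypothetical names and structures I made up for illustration, assuming the metadata refresh has recorded a partition range per manifest and value statistics per page.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DataFile:
    # Hypothetical record of one data file learned during metadata refresh.
    path: str
    partition: str                      # partition value, e.g. a date string
    page_ranges: List[Tuple[int, int]]  # (min, max) of a non-partition column, per page


@dataclass
class Manifest:
    # Hypothetical manifest: a group of data files plus their partition range.
    partition_min: str
    partition_max: str
    files: List[DataFile]


def plan_scan(manifests: List[Manifest], partition_value: str,
              column_value: int) -> List[Tuple[str, int]]:
    """Return (file path, page index) pairs that still have to be read."""
    # Step 1: prune manifests whose partition range cannot contain the value.
    kept_manifests = [
        m for m in manifests
        if m.partition_min <= partition_value <= m.partition_max
    ]
    # Step 2: prune data files that belong to other partitions.
    kept_files = [
        f for m in kept_manifests for f in m.files
        if f.partition == partition_value
    ]
    # Step 3: push the non-partition filter down to pages, keeping only
    # pages whose min/max range could contain the requested value.
    return [
        (f.path, i)
        for f in kept_files
        for i, (lo, hi) in enumerate(f.page_ranges)
        if lo <= column_value <= hi
    ]


# Made-up metadata for three files spread over two manifests.
example_manifests = [
    Manifest("2024-01-01", "2024-01-02", [
        DataFile("a.parquet", "2024-01-01", [(0, 9), (10, 19)]),
        DataFile("b.parquet", "2024-01-02", [(0, 9), (10, 19)]),
    ]),
    Manifest("2024-02-01", "2024-02-01", [
        DataFile("c.parquet", "2024-02-01", [(0, 19)]),
    ]),
]

print(plan_scan(example_manifests, "2024-01-02", 12))
```

With this made-up metadata, a filter on partition `2024-01-02` and column value `12` skips the second manifest entirely, then skips `a.parquet`, and finally reads only the second page of `b.parquet`.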