Hi,
Good day! We are planning to implement a query engine on top of our Azure data lake, and we are currently comparing Starburst Presto with Dremio.
I have a question about the number of partitions Dremio supports for a table. From the documentation, I understand that the maximum number of partitions/splits supported in Dremio is 60K. Please correct me if I am wrong.
Also, how long does it take to refresh a table whenever a new partition is added?
For my current use case, we have several tables, and up to 100 partitions (on the job_id column) may be created for each table per day. We have to retain around 7 years of data, so the number of partitions per table would be around 250K. I am concerned that Dremio may not support this.
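As a quick sanity check of that estimate, here is a minimal Python sketch. The 365-day year and the names used are my own assumptions, and the 60K figure is only my reading of the documentation, not a confirmed limit.

```python
# Worst-case partition count per table over the retention period.
PARTITIONS_PER_DAY = 100   # max job_id partitions created per table per day
RETENTION_YEARS = 7        # how long the data must be kept
DAYS_PER_YEAR = 365        # assumption: leap days ignored
ASSUMED_LIMIT = 60_000     # assumption: the 60K figure as I understood it


def total_partitions(per_day: int, years: int, days_per_year: int = DAYS_PER_YEAR) -> int:
    """Return the number of partitions accumulated over the retention period."""
    return per_day * days_per_year * years


total = total_partitions(PARTITIONS_PER_DAY, RETENTION_YEARS)
print(f"partitions per table: {total}")
print(f"exceeds assumed limit: {total > ASSUMED_LIMIT}")
```

This gives 255,500 partitions per table in the worst case, which is where the "around 250K" figure comes from.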
In particular, how long does Dremio take to refresh a table when a new partition is added and the table already has this many partitions (each partition may hold 10 MB to 1 GB of data)?
Below is the high-level folder structure that we are currently planning to use.
output
|_package_1
| |_table1
| | |_job_id=1
| | | |_part_000.parquet
| | | |_part_001.parquet
| | |_job_id=2
| | | |_part_000.parquet
| | | |_part_001.parquet
| |
| |_table2
| | |_job_id=1
| | | |_part_000.parquet
| | | |_part_001.parquet
| | |_job_id=2
| | | |_part_000.parquet
| | | |_part_001.parquet
| |
| |_table3
| | |_job_id=1
| | | |_part_000.parquet
| | | |_part_001.parquet
| | |_job_id=2
| | | |_part_000.parquet
| | | |_part_001.parquet
|
|
|_package_2
| |_table1
| | |_job_id=3
| | | |_part_000.parquet
| | | |_part_001.parquet
| | |_job_id=4
| | | |_part_000.parquet
| | | |_part_001.parquet
| |
| |_table2
| | |_job_id=3
| | | |_part_000.parquet
| | | |_part_001.parquet
| | |_job_id=4
| | | |_part_000.parquet
| | | |_part_001.parquet
| |
| |_table3
| | |_job_id=3
| | | |_part_000.parquet
| | | |_part_001.parquet
| | |_job_id=4
| | | |_part_000.parquet
| | | |_part_001.parquet
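
To make the layout concrete, here is a minimal Python sketch of how a file path maps onto this structure. The helper names (partition_file, job_id_of) are hypothetical and only for illustration; they are not part of Dremio or any other tool.

```python
from pathlib import PurePosixPath


def partition_file(package: str, table: str, job_id: int, part: int,
                   root: str = "output") -> PurePosixPath:
    """Build the path of one Parquet file in the planned layout."""
    return (PurePosixPath(root) / package / table
            / f"job_id={job_id}" / f"part_{part:03d}.parquet")


def job_id_of(path) -> int:
    """Extract the job_id partition value from a file path."""
    for segment in PurePosixPath(path).parts:
        if segment.startswith("job_id="):
            return int(segment.split("=", 1)[1])
    raise ValueError(f"no job_id partition in {path}")


example = partition_file("package_1", "table1", 1, 0)
print(example)             # output/package_1/table1/job_id=1/part_000.parquet
print(job_id_of(example))  # 1
```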
Thanks