We would like to start using the “Enable partition column inference” option to get well-formatted columns for partitioned Parquet files, but it looks like this option doesn’t work as described on the documentation page.
No new well-formatted columns are visible, only the standard old dir0.
Tested on the latest Community Edition, 23.1.0, with a simple dummy partitioned Parquet file.
I created new object storage sources with the option selected (tested on S3 and NAS object storage).
Could you check and advise?
@Pawel It requires you to forget the metadata on the PDS, so you can either remove the format and add it back, or run the following:
alter pds <pds-name> forget metadata;
alter pds <pds-name> refresh metadata;
@balaji.ramaswamy Yes, for that reason I tested on a new object storage source, “local”, with the option selected at creation time.
Additionally, I also tested this sequence:
select * from "local".parquet_test
alter table "local".parquet_test forget metadata
alter table "local".parquet_test refresh metadata
But this also doesn’t help.
Could you check whether this works on any dummy Parquet file? I’ve attached my test file here:
parquet_test.zip (3.3 KB)
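One way to confirm whether inference took effect after a forget/refresh sequence like the one above is to list the dataset's columns rather than eyeballing a `select *` result. The following is only a sketch: it assumes the dataset sits at the root of the “local” source (so its schema is reported as `local`) and that the instance exposes the standard `INFORMATION_SCHEMA."COLUMNS"` view. If inference worked, the partition key names from the folder structure should be listed; if not, only `dir0` (and further `dirN` columns) will appear alongside the file's own columns.

```sql
-- Sketch: list the columns Dremio currently reports for the test dataset.
-- Assumption: the dataset is "local".parquet_test, so TABLE_SCHEMA is 'local'.
SELECT COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA."COLUMNS"
WHERE TABLE_SCHEMA = 'local'
  AND TABLE_NAME = 'parquet_test'
ORDER BY ORDINAL_POSITION;
```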
@Pawel It works as expected on your dataset. Let me try on Community Edition, as I tried on Enterprise.
@balaji.ramaswamy Could you check on Community Edition and let us know? Then we will know that we are not doing anything incorrectly. Thanks.
@Pawel I do not see the feature working in Community Edition. Let me find out whether it is a bug or an EE-only feature.
By the way, I tried on CE v24, with no luck.
Could you please tell us if this also works with the TABLE() construct?
@fetanchaud This is only for file system sources like HDFS/S3/AZ/GCS
Sorry @balaji.ramaswamy, let me explain. I meant this kind of construct:
select * from table( "cbp"."chevoperiod"."rtjepman"."rtemanf" ( type=>'text', fieldDelimiter=>';', extractHeader=>True, trimHeader=>True ) )
Promotion has to be done either via the UI, via the Dremio REST API, or by enabling auto-promotion. Are you trying to promote via this command?
Also, this partition inference feature is only for Parquet. Is your file format Parquet?
@balaji.ramaswamy Could you let us know whether this is a bug or an EE-only feature?
It did not work initially; I had to enable unlimited splits for it to show up on CE. In 22.x, unlimited splits is on by default, but the dist:/// path in dremio.conf needs to be configured to write to something like S3/HDFS/AZ/GCS.
@balaji.ramaswamy Many thanks
It also works with NAS storage; for example, I could use the local path /opt/test:
@Pawel So it works with NAS and not via S3?