Location of PDS when connecting to S3

I have a novice question. I am connecting to my S3 bucket and reading a Parquet file from Dremio. When I open the Parquet file, perform Format folder, and Save, is my dataset being copied to a Dremio location?

From there I create virtual datasets, and the graph shows:

Source (s3) --> Parent (PDS) --> VDS

Is the Physical Dataset pointing to my source S3 or a Dremio location?


The only things Dremio stores are the format options (in this case, Parquet) and some metadata we collect (in this case, the footer) for query planning purposes. Data is not copied into Dremio; all queries run against the file in S3.

You can read more about physical and virtual datasets here, but the basic concept is that a PDS is the data stored on a source.
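
If it helps to picture the Source --> PDS --> VDS chain, here is a minimal Python sketch of the idea. It is only a mental model, not Dremio's API or internal implementation: the class names, field names, bucket path, and the `storage_location` helper are all hypothetical and exist only for illustration. The point is that a PDS holds a pointer to the source data plus format options and cached metadata, and a VDS is just a saved query over a parent.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class PhysicalDataset:
    """Hypothetical stand-in for a PDS: a pointer to data on a source."""
    source_path: str  # where the data actually lives (e.g. an S3 path)
    format_options: dict = field(default_factory=dict)   # e.g. {"type": "parquet"}
    cached_metadata: dict = field(default_factory=dict)  # e.g. footer info for planning


@dataclass
class VirtualDataset:
    """Hypothetical stand-in for a VDS: a saved query over a parent dataset."""
    name: str
    sql: str
    parent: Union[PhysicalDataset, "VirtualDataset"]


def storage_location(dataset: Union[PhysicalDataset, VirtualDataset]) -> str:
    """Walk up the parent chain until a PDS is reached and return its source path."""
    while isinstance(dataset, VirtualDataset):
        dataset = dataset.parent
    return dataset.source_path


# Illustrative values only; the bucket and dataset names are made up.
pds = PhysicalDataset(
    source_path="s3://example-bucket/events/data.parquet",
    format_options={"type": "parquet"},
)
vds = VirtualDataset("recent_events", "SELECT * FROM events", parent=pds)
vds_on_vds = VirtualDataset("recent_event_ids", "SELECT id FROM recent_events", parent=vds)

print(storage_location(vds_on_vds))
```

However many virtual datasets are stacked on top, following the parents always ends at the path on the source, which matches the graph: the "Parent (PDS)" node is a description of the S3 data, not a second copy of it.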

Thank you, that was my understanding. I think the graph showing S3 and a parent as separate entities is what confused me.