Wildcard queries on other sources besides Elastic

allan.sene · March 1, 2019, 6:11pm

Guys,

I would like to run wildcard queries against Parquet files on S3, similar to what BigQuery supports.

Are you planning to release a feature like that soon?

ben · March 11, 2019, 3:54pm

Thanks for the suggestion @allan.sene. What is your particular use case for this feature?

allan.sene · March 11, 2019, 8:47pm

I have many processes that write Parquet files to S3. The files are written into partitions, and we sometimes update this data.

We have queries/views, used in dashboards and reports, that read large amounts of data from many tables. To simplify these views, we automatically create (via scripting) VDSs that each represent one slice of the data, filtered by a date field. This way, when we detect that a Parquet file has been updated, we can refresh only the reflection for that portion of the data. You can think of these VDSs as Support Anchor Datasets.
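The scripting step described above could look roughly like the following minimal sketch. It only builds the SQL text for one date-filtered VDS per day; it does not talk to Dremio. The naming convention (`<prefix>_YYYYMMDD`), the source path `s3.events`, and the column `event_date` are hypothetical placeholders, and how you actually save each statement as a VDS depends on your Dremio version and tooling.

```python
from datetime import date, timedelta


def slice_name(prefix: str, day: date) -> str:
    # Hypothetical naming convention: one VDS per day, e.g. my_partitioned_vds_20190301
    return f"{prefix}_{day:%Y%m%d}"


def slice_sql(source: str, date_column: str, day: date) -> str:
    # SELECT body for one date-filtered VDS (the anchor for that day's reflection)
    return f"SELECT * FROM {source} WHERE {date_column} = DATE '{day.isoformat()}'"


def daily_slices(prefix: str, source: str, date_column: str, start: date, days: int) -> dict:
    """Return {vds_name: sql} for `days` consecutive days starting at `start`."""
    result = {}
    for offset in range(days):
        day = start + timedelta(days=offset)
        result[slice_name(prefix, day)] = slice_sql(source, date_column, day)
    return result


if __name__ == "__main__":
    for name, sql in daily_slices("my_partitioned_vds", "s3.events", "event_date", date(2019, 3, 1), 2).items():
        print(name, "->", sql)
```

Keeping one VDS per day is what lets a single day's reflection be refreshed when only that day's Parquet files change.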

We then have a VDS that combines these VDSs with UNION ALL to return the specific data those reports need.

It would be great to have something like select * from my_partitioned_vds*, so we don’t have to recreate this UNION ALL VDS every time.
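Until something like that exists, a client-side workaround is to expand the wildcard yourself and regenerate the UNION ALL definition. The sketch below assumes you already have a list of dataset names (for example, from your own catalog listing, which is not shown here); the function name `expand_wildcard` is hypothetical and this is not a Dremio feature.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def expand_wildcard(pattern: str, dataset_names: Iterable[str]) -> str:
    """Build a UNION ALL query over every dataset whose name matches a glob pattern."""
    # Sort so the generated definition is stable between runs
    matches = sorted(name for name in dataset_names if fnmatchcase(name, pattern))
    if not matches:
        raise ValueError(f"no dataset matches {pattern!r}")
    return "\nUNION ALL\n".join(f"SELECT * FROM {name}" for name in matches)


if __name__ == "__main__":
    names = ["my_partitioned_vds_20190302", "my_partitioned_vds_20190301", "other_vds"]
    print(expand_wildcard("my_partitioned_vds*", names))
```

Rerunning this whenever a new daily VDS is added keeps the combined VDS in sync without editing it by hand.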
