Couple of queries on Dremio

I have the following questions about Dremio:

  1. Support for multiple datasets using regex
  2. Customization of field size for JSON files
  3. Optimizing and improving performance while fetching data from a dataset
  4. More details on Dremio reflections
  5. If the DB schema changes, how will Dremio stay in sync with that schema?
  6. Dremio can infer the schema, but is there a possibility of providing a schema file?

  1. Support for multiple datasets using regex
    • Are you requesting examples?
  2. Customization of field size for JSON files
    • Are you running into field max-width issues?
  3. Optimizing and improving performance while fetching data from a dataset
    • Our cost-based optimizer should automatically generate the right plan. If you have a query that is not performing as expected, please share the query profile.
  4. More details on Dremio reflections
    • Here is a white paper on Reflection Best Practices.
  5. If the DB schema changes, how will Dremio stay in sync with that schema?
    • In the source properties, under the Metadata tab, you can set the frequency at which Dremio probes the source; see Caching Metadata.
  6. Dremio can infer the schema, but is there a possibility of providing a schema file?
    • Currently we do not have a way to define a schema. This is a feature request that we expect to deliver this year.

@balaji.ramaswamy
Thanks for addressing most of my questions on Dremio; I appreciate your responses. I would like to know more about the first two.

  1. I would like to query all the JSON files (with the same schema) that match a regex pattern in a given directory or directories. I explored this and found that * can be used as the pattern, but it is not working as expected. The other option was to make the folder a dataset, which doesn't support some of my use cases. Can you suggest a way to query tables using a regex?
    Assuming this structure:
    folder/user1/{trip1.json, trips2.json, trips3.json……}
    folder/user2/{trip1.json, trips2.json, trips3.json……}
    folder/user3/{trip1.json, trips2.json, trips3.json……}

  2. When loading the JSON files, I encounter errors such as “Attempting to read a too large value for field with name data. Size was 35486 but limit was 32000”, so I am unable to load those files.
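
To see which fields trigger that error before loading, the files can be scanned outside Dremio. Below is a minimal sketch in Python, not a Dremio feature. It assumes each file holds a single JSON document and that the 32000 limit from the error message is measured in UTF-8 bytes per string value (the message does not say whether it counts bytes or characters). The function names are illustrative.

```python
import json

# Limit quoted in the error message; the unit (bytes vs. characters) is an assumption.
LIMIT = 32000


def oversized_fields(value, limit=LIMIT, path=""):
    """Yield (path, size) for every string value whose UTF-8 size exceeds limit."""
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else key
            yield from oversized_fields(child, limit, child_path)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from oversized_fields(child, limit, f"{path}[{index}]")
    elif isinstance(value, str):
        size = len(value.encode("utf-8"))
        if size > limit:
            yield path, size


def check_file(filename, limit=LIMIT):
    """Return the oversized string fields found in one JSON file."""
    with open(filename, encoding="utf-8") as handle:
        return list(oversized_fields(json.load(handle), limit))
```

Calling `check_file` on each trip file returns a list of field paths and sizes, which shows whether only the `data` field is affected or others as well.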

@shrikanth

#2 is a Dremio limit; we currently support field widths only up to 32K.
#1 For regex, see our docs below:
https://docs.dremio.com/sql-reference/sql-functions/string.html
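
A pattern for the folder layout in question 1 can be drafted and checked outside Dremio first. The following is only a sketch in Python: the pattern and the helper name are illustrative, and whether Dremio's regex functions accept the same syntax should be confirmed against the docs above.

```python
import re

# Hypothetical pattern: tripN.json or tripsN.json directly under folder/userN/
TRIP_FILE = re.compile(r"folder/user\d+/trips?\d+\.json")


def matching_paths(paths, pattern=TRIP_FILE):
    """Return the paths that match the pattern in full, preserving input order."""
    return [p for p in paths if pattern.fullmatch(p)]
```

For example, `matching_paths(["folder/user1/trip1.json", "folder/user3/notes.txt"])` keeps only the first path.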