The UI preview is very slow compared to the run

Correct me if I’m wrong, but when I click on a physical dataset, a UI preview is run by default. From what I understand of Dremio’s implementation, a SELECT * FROM TABLE is issued regardless. If I have a table with 1,000,000 rows, the preview sends SELECT * FROM TABLE to the JDBC driver, the driver prepares all 1,000,000 rows, and Dremio then takes only 10,000 of them. Why doesn’t the preview issue SELECT * FROM TABLE LIMIT 10000 instead? If I run SELECT * FROM TABLE LIMIT 10000 myself, the JDBC driver only prepares 10,000 rows, so it is faster.

@andreat, can you attach a profile for the Preview job?

@ben we are seeing the same thing with ARP based plugins (such as the Snowflake one). My understanding from talking with the developer is that this is a known issue in newer 4.x versions of Dremio?

@patricker,
If you look at that profile, you should see a JDBC scan and then, further downstream, a LIMIT and a PROJECT operation for Preview jobs.

Once ~10k records are processed by the LIMIT then the JDBC scan should stop.

Sometimes the (single-threaded) JDBC scans take a long time to return any records and the Preview hangs. This appears to be mainly an issue on smaller clusters.

@ben So you are saying that the query runs as a SELECT * FROM abc, and after reading 10k rows it cancels the remainder of the query?

That’s correct. So while there may be 1 million records in the RDBMS table, Dremio should not scan all of them during Preview.
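To make the described behavior concrete, here is a minimal sketch of a pull-based LIMIT sitting on top of a scan. It is not Dremio code: `CountingScan` and `PreviewLimitSketch` are hypothetical names, and the scan is just an in-memory counter standing in for a JDBC result set. It only models how many rows the consumer pulls, and says nothing about how much work the source database does before it returns the first row.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a single-threaded JDBC scan:
// yields row ids 0..totalRows-1 and counts how many were handed out.
class CountingScan implements Iterator<Integer> {
    private final int totalRows;
    private int produced = 0;

    CountingScan(int totalRows) {
        this.totalRows = totalRows;
    }

    @Override
    public boolean hasNext() {
        return produced < totalRows;
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return produced++;
    }

    int rowsProduced() {
        return produced;
    }
}

class PreviewLimitSketch {
    // Pulls rows from the scan until `limit` rows are collected, then stops pulling.
    static List<Integer> limit(Iterator<Integer> scan, int limit) {
        List<Integer> out = new ArrayList<>();
        while (out.size() < limit && scan.hasNext()) {
            out.add(scan.next());
        }
        return out;
    }

    public static void main(String[] args) {
        CountingScan scan = new CountingScan(1_000_000); // table size from the question
        List<Integer> preview = limit(scan, 10_000);     // approximate preview limit
        System.out.println("rows returned: " + preview.size());
        System.out.println("rows pulled from scan: " + scan.rowsProduced());
    }
}
```

In this model the scan hands out only as many rows as the LIMIT asks for. Whether the remote source has already materialized the full result by then depends on the source and its driver.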

@ben Working with the Snowflake Plugin, this is leading to realllllly slow preview times. Snowflake is copying the whole table into a set of result files, instead of just the subset/limit that we need.

Any update on when this bug will be fixed in Dremio?

Can you attach the profile for the Preview job? I’d like to see if this is the same behavior.

@ben Let me know which part of the profile to copy/paste here and what I should be looking for; I don’t upload full profiles.

If you look at the JDBC scan, how many records did it scan?
Do you see the LIMIT and PROJECT above these? How many records does it say these scanned?

I ran some tests with Snowflake Cloud (trial), the ARP-based plugin, and Dremio 4.2.2. Here are the results and the complete job profiles:

Table name: SUPPLIER Schema: TPCH_SF100 Rows: 1M Size: 65.5MB Query: UI (preview) Duration: 5s
Job profile: 4c2c46fa-803d-4c9c-b9a9-bc88ec576627.zip (8,4 KB)

Table name: SUPPLIER Schema: TPCH_SF100 Rows: 1M Size: 65.5MB Query: UI (run) Duration: 11s
Job profile: 3fe6109a-5c61-42cb-974f-58b089a01cd5.zip (9,9 KB)

Table name: PART Schema: TPCH_SF10 Rows: 2M Size: 50.4MB Query: UI (preview) Duration: 6s
Job profile: 4cf91651-afde-4c79-afc1-86bfefdc687c.zip (9,0 KB)

Table name: PART Schema: TPCH_SF10 Rows: 2M Size: 50.4MB Query: UI (run) Duration: 8s
Job profile: 9a7c9fe7-abe1-4432-871b-7d80158440cd.zip (10,1 KB)

Table name: SUPPLIER Schema: TPCH_SF1000 Rows: 10M Size: 658.3MB Query: UI (preview) Duration: 22s
Job profile: 21a6b972-9c15-437e-b91c-1a23ef8a7d3a.zip (8,7 KB)

Table name: SUPPLIER Schema: TPCH_SF1000 Rows: 10M Size: 658.3MB Query: UI (run) Duration: 13s
Job profile: 130ae7c0-492d-43c5-897a-037f69e770fe.zip (9,8 KB)

Table name: PART Schema: TPCH_SF100 Rows: 20M Size: 509.0MB Query: UI (preview) Duration: 35s
Job profile: fdb9abb6-cbe8-4df5-bd45-817ff7088efe.zip (9,0 KB)

Table name: PART Schema: TPCH_SF100 Rows: 20M Size: 509.0MB Query: UI (run) Duration: 15s
Job profile: b83bb4b3-706f-43f1-bb98-99672874cfd6.zip (10,1 KB)
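For easier comparison, the durations above can be reduced to a preview/run ratio per table. The snippet below is only a convenience sketch: the class name `PreviewVsRun` is made up, and the numbers are copied from the eight measurements listed above.

```java
class PreviewVsRun {
    // Table labels, in the same order as the measurements above.
    static final String[] LABELS = {
        "SUPPLIER TPCH_SF100 (1M rows)",
        "PART TPCH_SF10 (2M rows)",
        "SUPPLIER TPCH_SF1000 (10M rows)",
        "PART TPCH_SF100 (20M rows)"
    };

    // Durations in seconds: {preview, run}.
    static final int[][] SECONDS = { {5, 11}, {6, 8}, {22, 13}, {35, 15} };

    // A ratio above 1.0 means the preview took longer than the full run.
    static double ratio(int previewSeconds, int runSeconds) {
        return (double) previewSeconds / runSeconds;
    }

    public static void main(String[] args) {
        for (int i = 0; i < LABELS.length; i++) {
            double r = ratio(SECONDS[i][0], SECONDS[i][1]);
            System.out.printf("%-32s preview/run = %.2f%n", LABELS[i], r);
        }
    }
}
```

In these four samples the preview is faster than the run for the 1M and 2M row tables and slower for the 10M and 20M row tables. With one measurement per case this is only an indication, not a benchmark.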

I created a temporary workaround for this issue, as we really only minded it for previews. The code below, when added to the end of the QueryContext constructor, turns planner.enable_relational_planning off for UI Preview queries:

    // If this is a UI Preview query (workload type number 2),
    // disable planner.enable_relational_planning for this query only.
    if (workloadType.getNumber() == 2) {
      this.queryOptions.setOption(OptionValue.createBoolean(OptionValue.OptionType.QUERY, "planner.enable_relational_planning", false));
    }