Dremio SELECT * on joins

Hi, Guys. How’s it going?
I’m trying to query two distinct MySQL databases by performing a join between two tables. My query looks like this:

SELECT e.product
    FROM Database1.schema1.table1 e 
        INNER JOIN Database2.schema2.table2 p ON p.id = e.product
LIMIT 100

However, when I check the PROCESSLIST for both databases, I’ve realized that Dremio is executing a query like this:

SELECT *
FROM table2 
LIMIT 2000
OFFSET 0

This is a query for a sample result on Dremio, but I think that this wouldn’t go well when I deploy it in production, due to the fact that both tables are huge and, at the time that I’ll execute this query, it will run a SELECT * on my whole table, making my query with poor performance and having impact on my database.

Is there a way to optimize this query to narrow my results? Maybe by performing the join on a subquery?

If there’s a predicate you can use in either or both mysql databases then by all means use that. :slight_smile: The way your query is written a table scan makes sense.

You can accomplish this with subqueries, or you can stack VDS on top of one another to get the same effect.

We don’t currently perform broadcast joins, which may be what you were hoping.

You can also look at creating data reflections on your tables as that will be much, much more efficient than table scans and it will prevent your queries from hitting the underlying database. Your dbas may appreciate that. :slight_smile:

1 Like

Thank you so much, @kelly! I created VDS for my queries and accelerated them with reflections. Ran smoothly :slight_smile: