Combine data from multiple datasets

Hello,

I was wondering if it is possible to combine two or more datasets vertically (assume they all have the same field names and the same data types).

For example, assume we have two tables holding book information (isbn, title, author). One of these tables is in MongoDB and the other is in MySQL (assume the schema is the same).

I’d like to create a virtual table in Dremio that combines the data from these two tables and presents it as a single table.
For example, rows 1 to 100 are from MongoDB and rows 101 to 200 are from MySQL.

Is such a thing possible in Dremio?

I tried running multiple select statements, but it seems Dremio doesn’t like it.

Thanks,
Goris.

@goris

Can you not do the below?

select * from mongodb.mybook1
union all
select * from mysql.mybook2
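If the columns are not in the same order in both sources, listing them explicitly is safer than select *, since union all matches columns by position. A rough sketch of saving this as a virtual table, assuming a space called myspace already exists (myspace, allbooks and the src column are made-up names for illustration, and the table paths are the ones from the example above):

```sql
-- Hypothetical names: myspace, allbooks, src. Adjust the paths to your own sources.
CREATE VDS myspace.allbooks AS
SELECT isbn, title, author, 'mongodb' AS src   -- tag each row with where it came from
FROM mongodb.mybook1
UNION ALL                                      -- keeps duplicates; UNION would remove them
SELECT isbn, title, author, 'mysql' AS src
FROM mysql.mybook2;
```

The extra src column is optional. It just makes it easy to tell later which rows came from which source.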

Yes, it worked. Thanks :slight_smile:

Hi,

I am also trying to query two different data sources, but I am getting the following error message. Any idea why this happens?

I am using the following Dremio version.

@vincent_mayer It looks like you’re using a very old version. Try again with something newer, such as v24 (Index of community-server/24.0.0-202302100528110223-3a169b7c/), and let us know if it happens again.

@lenoyjacob thx for the hint. There is no makefile in the .tar file. Can you help me install Dremio using the .tar file?

Follow the tarball instructions here for a Linux machine: Dremio

Thx. I have tried that but I can’t get it to work. I am using a MacBook. Is there a manual for a MacBook?

So I have managed to install the new Dremio version on my Mac. I could also run a query that fetches data from a MongoDB and a PostgreSQL server.

When analyzing the query I stumbled upon the raw profile of the query.

The last step is a JDBC_SUB_SCAN. I thought Dremio is entirely built on Apache Arrow, or did I misunderstand something?

Glad for any explanation of why there is a JDBC sub scan :)

@lenoyjacob @balaji.ramaswamy

@vincent_mayer

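The initial scan has to run against the sources themselves, which here are Mongo and PG. PG is read over JDBC, which is why the profile shows a JDBC_SUB_SCAN. You can create a reflection on top of the VDS so that queries from a dashboard read Parquet instead of hitting Mongo or PG.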
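A reflection can be enabled from the dataset settings in the UI. It can also be created in SQL, roughly like the sketch below. The VDS path myspace.allbooks and the reflection name allbooks_raw are made up for illustration, and the exact statement may differ between Dremio versions, so check the SQL reference for the version you installed:

```sql
-- Hypothetical names: myspace.allbooks (the VDS) and allbooks_raw (the reflection).
-- A raw reflection materializes the listed columns so queries can skip the source scan.
ALTER DATASET myspace.allbooks
CREATE RAW REFLECTION allbooks_raw
USING DISPLAY (isbn, title, author);
```

Once the reflection has been built, the query profile should show whether it was used instead of the scans on Mongo and PG.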