Incremental refresh using multiple column

Hi,

Currently running CE 4.2.1.

We have lots of datasource both in Oracle and SQL Server that have combined primary key.
Some of thoses tables have up to 500 millions rows in them. Full refresh took more than 6 hours during testing. Not to mention space, io, network bandwith consumption.

We are searching for a way to create an incremental refresh using multiple column

image

As we can see here, the first two ID column are required for an incremental refresh to work.
But Dremio support only one :frowning:

How do we get around this limitation ?

Thanks.

Hi Alain,

I think that at the moment the best workaround would be to add an incremental column in your table, or a timestamp column that you would use.
You can keep your existing combined primary/unique key .

Or if one of the 2 columns you mention is incremental (that is to say new lines always get higher values), you could use it for the incremental update. But to ensure unicity of your dataset you should create a VDS / virtual dataset that ensures that there are stil unique values for your combined primary key : a Window query using ROW_NUMBER function could be useful (https://docs.dremio.com/sql-reference/sql-functions/analytic.html)

eg

select ID_TRX_ITM, ID_TRX_SERV, other_columns
from
(
select ID_TRX_ITM, ID_TRX_SERV, other_columns
, ROW_NUMBER() OVER(partition by ID_TRX_ITM, ID_TRX_SERV) as rownum
from __YOUR_TABLE)
where rownum = 1

Hi,

Yes, timestamp would be the way to go. Except that we get a few thousand table like this.

However, there is some applications where we cannot modify the table. They are provided by a third party. For those we were thinking of a view to concat the required column into a unique one.

Anyway, we are still investigating the best way to do this.

Thanks.