Apache Phoenix source

Do you have plans to handle Apache Phoenix sources?

We do not currently have plans to handle Phoenix sources. Phoenix is oriented toward OLTP-style queries. Our recommendation is to query HBase directly: Dremio can read from HBase and is optimized for scan-oriented, OLAP-style workloads.

We would love to hear more about your interest in querying Phoenix however. Perhaps there is something we have overlooked?

Kelly

I tried to use HBase directly, but I have some problems decoding the row_key and timestamp values.

My data is stored in Apache Phoenix.
For now, all transformations are done with Pig or Phoenix,
but end users want to analyze the raw data, and Dremio seems to be perfect for that, if I manage to read Phoenix data from Dremio.

Did you see this tutorial? It might help you work through it: https://www.dremio.com/tutorials/hbase-sql-dremio/
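The general pattern in that tutorial is to select the HBase row key and the column-family columns and decode them with CONVERT_FROM. A minimal sketch, where the source, table, column family, and column names are only placeholders:

SELECT
  CONVERT_FROM(t.row_key, 'UTF8') AS row_key,
  CONVERT_FROM(t."cf"."some_column", 'UTF8') AS some_column
FROM "my-hbase-source"."my_table" t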

Yes, of course,
but Phoenix timestamps are encoded in 12 bytes.
So I tried CONVERT_FROM with TIMESTAMP_IMPALA,
then used BYTE_SUBSTR to get the 8-byte timestamp and exclude the nanoseconds (0 in my case),
but got no result.

In Dremio: gAABYFHysiAAAAAA
In Phoenix: 2017-12-13 22:15:48.000
Long representation: 1513199748

I found a way to get my value, but not in Dremio.

The binary representation of the Phoenix timestamp is:
10000000 00000000 00000001 01100000 01010001 11110010 10110010 00100000 00000000 00000000 00000000 00000000

I delete the last 4 bytes to exclude the nanoseconds:
10000000 00000000 00000001 01100000 01010001 11110010 10110010 00100000

If I change the first bit from 1 to 0 I get:
00000000 00000000 00000001 01100000 01010001 11110010 10110010 00100000
In decimal that is 1513203348000,
which is the millisecond representation of my timestamp.
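As a quick sanity check (my own verification, not something shown in the thread): 1513203348000 ms is 1513203348 s, and Dremio's TO_TIMESTAMP, which takes Unix epoch seconds, maps that back to the Phoenix value:

SELECT TO_TIMESTAMP(1513203348) AS decoded_ts
-- returns 2017-12-13 22:15:48.000 (UTC)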

Is there a way to flip the first bit in Dremio?

I found a way to decode the Phoenix timestamp in Dremio:
TO_TIMESTAMP(CAST((CONVERT_FROM(BYTE_SUBSTR(<TIMESTAMP COLUMN>, 1, 8), 'BIGINT_BE') + POW(2, 63)) / 1000 AS BIGINT))
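For context, here is roughly how that expression might sit in a full query against an HBase source in Dremio. The source, table, and column names below are placeholders, and "0" is only Phoenix's default column family, so yours may differ:

SELECT
  CONVERT_FROM(t.row_key, 'UTF8') AS pk,
  TO_TIMESTAMP(CAST((CONVERT_FROM(BYTE_SUBSTR(t."0"."created_at", 1, 8), 'BIGINT_BE') + POW(2, 63)) / 1000 AS BIGINT)) AS created_at
FROM "my-hbase-source"."my_table" t

The + POW(2, 63) is what undoes the flipped first bit: reading the 8 bytes as a signed big-endian long and adding 2^63 gives the same result as clearing the high bit.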

I'm glad you found a solution, even though it's not very pretty. I've opened an internal ticket to add this and other Phoenix encodings to our supported encodings.

We also plan to release an SDK at some point, which would allow you to write a UDF to handle this case.

Great!!!
I found a less ugly way (shifting left then right by one clears that flipped high bit):
TO_TIMESTAMP(CAST(RSHIFT(LSHIFT(CONVERT_FROM(BYTE_SUBSTR(<TIMESTAMP COLUMN>, 1, 8), 'BIGINT_BE'), 1), 1) / 1000 AS BIGINT))

I think you’ll find that if you build a data reflection on this source you’ll have much, much better performance.

Here’s a tutorial on that: https://www.dremio.com/tutorials/getting-started-with-data-reflections/

I'm aware of data reflections, thanks.

For some tables I get a "SCHEMA_CHANGE ERROR".
I think Phoenix optimizes storage by not storing some empty columns.
Do you have a solution for that?

I would expect that this problem would go away after a few attempts, as we should be learning the new schema each time we encounter a new column. If the problem is not going away, that suggests something is not working right. Could you post the query profile for the query where this is occurring?

@steven, I have sent you the query profile.

I really think it would make a ton of sense if Dremio "understood" Phoenix tables. That doesn't mean it should necessarily use the Phoenix JDBC driver; I understand that one is tuned toward OLTP.