Apache Phoenix source

Do you have plans to handle Apache Phoenix sources?

We do not currently have plans to handle Phoenix sources. Phoenix is oriented toward OLTP-style queries. Our recommendation is to query HBase directly: Dremio can read from HBase and is optimized for scan-oriented, OLAP-style workloads.

We would love to hear more about your interest in querying Phoenix however. Perhaps there is something we have overlooked?

Kelly

I tried to use HBase directly, but I have some problems decoding the row_key and timestamp values.

My data is stored in Apache Phoenix.
For now, all transformations are done with Pig or Phoenix,
but end users want to analyze the raw data, and Dremio seems to be perfect for that, if I manage to read Phoenix data from Dremio.

Did you see this tutorial? It might help you work through it: https://www.dremio.com/tutorials/hbase-sql-dremio/
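The general pattern in that tutorial is to select the HBase row key and the column-family columns and decode them with CONVERT_FROM. A minimal sketch, where the source, table, column family, and column names are only placeholders:

SELECT
  CONVERT_FROM(t.row_key, 'UTF8') AS row_key,
  CONVERT_FROM(t."cf"."some_column", 'UTF8') AS some_column
FROM "my-hbase-source"."my_table" t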

Yes, of course,
but Phoenix timestamps are encoded in 12 bytes.
So I tried CONVERT_FROM with TIMESTAMP_IMPALA,
then used BYTE_SUBSTR to get the 8-byte timestamp and exclude the nanoseconds (0 in my case),
but got no result.

In Dremio: gAABYFHysiAAAAAA
In Phoenix: 2017-12-13 22:15:48.000
Long representation: 1513199748

I found a way to get my value, but not in Dremio.

The binary representation of the Phoenix timestamp is:
10000000 00000000 00000001 01100000 01010001 11110010 10110010 00100000 00000000 00000000 00000000 00000000

I delete the last 4 bytes to exclude the nanoseconds:
10000000 00000000 00000001 01100000 01010001 11110010 10110010 00100000

If I change the first bit from 1 to 0 I get:
00000000 00000000 00000001 01100000 01010001 11110010 10110010 00100000
In decimal that is 1513203348000,
which is the millisecond representation of my timestamp.
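As a quick sanity check (my own verification, not something shown in the thread): 1513203348000 ms is 1513203348 s, and Dremio's TO_TIMESTAMP, which takes Unix epoch seconds, maps that back to the Phoenix value:

SELECT TO_TIMESTAMP(1513203348) AS decoded_ts
-- returns 2017-12-13 22:15:48.000 (UTC)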

Is there a way to flip the first bit in Dremio?

I found a way to decode the Phoenix timestamp in Dremio:
TO_TIMESTAMP(CAST((CONVERT_FROM(BYTE_SUBSTR(<TIMESTAMP COLUMN>, 1, 8), 'BIGINT_BE') + POW(2, 63)) / 1000 AS BIGINT))
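For context, here is roughly how that expression might sit in a full query against an HBase source in Dremio. The source, table, and column names below are placeholders, and "0" is only Phoenix's default column family, so yours may differ:

SELECT
  CONVERT_FROM(t.row_key, 'UTF8') AS pk,
  TO_TIMESTAMP(CAST((CONVERT_FROM(BYTE_SUBSTR(t."0"."created_at", 1, 8), 'BIGINT_BE') + POW(2, 63)) / 1000 AS BIGINT)) AS created_at
FROM "my-hbase-source"."my_table" t

The + POW(2, 63) is what undoes the flipped first bit: reading the 8 bytes as a signed big-endian long and adding 2^63 gives the same result as clearing the high bit.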

I'm glad you found a solution, even though it's not very pretty. I've opened an internal ticket to add this and other Phoenix encodings to our supported encodings.

We also plan to release an SDK at some point, which would allow you to write a UDF to handle this case.

Great!!!
I found a less ugly way (shifting left then right by one clears that flipped high bit):
TO_TIMESTAMP(CAST(RSHIFT(LSHIFT(CONVERT_FROM(BYTE_SUBSTR(<TIMESTAMP COLUMN>, 1, 8), 'BIGINT_BE'), 1), 1) / 1000 AS BIGINT))

I think you’ll find that if you build a data reflection on this source you’ll have much, much better performance.

Here’s a tutorial on that: https://www.dremio.com/tutorials/getting-started-with-data-reflections/

I'm aware of data reflections, thanks.

For some tables I get a "SCHEMA_CHANGE ERROR".
I think Phoenix optimizes storage by not storing some empty columns.
Do you have a solution for that?

I would expect that this problem would go away after a few attempts, as we should be learning the new schema each time we encounter a new column. If the problem is not going away, that suggests something is not working right. Could you post the query profile for the query where this is occurring?

@steven, I have sent you the query profile.

I really think it would make a ton of sense if Dremio "understood" Phoenix tables. That doesn't mean it should necessarily use the Phoenix JDBC driver; I understand that one is tuned toward OLTP.