Creating BigQuery source using ARP framework, failure in reading records

I am trying to create a Google BigQuery source in Dremio using CData’s BigQuery JDBC driver.

So far everything seems to work:

  • The source gets created successfully
  • I can see my datasets and tables from BigQuery in Dremio (with their field names, etc.)

But as soon as I open the dataset to view the records, I encounter this error:

com.dremio.common.exceptions.UserException: IllegalArgumentException
at com.dremio.common.exceptions.UserException$Builder.build(UserException.java:802)
at com.dremio.sabot.driver.SmartOp.contextualize(SmartOp.java:140)
at com.dremio.sabot.driver.SmartOp$SmartProducer.setup(SmartOp.java:567)
at com.dremio.sabot.driver.Pipe$SetupVisitor.visitProducer(Pipe.java:79)
at com.dremio.sabot.driver.Pipe$SetupVisitor.visitProducer(Pipe.java:63)
at com.dremio.sabot.driver.SmartOp$SmartProducer.accept(SmartOp.java:533)
at com.dremio.sabot.driver.StraightPipe.setup(StraightPipe.java:102)
at com.dremio.sabot.driver.StraightPipe.setup(StraightPipe.java:102)
at com.dremio.sabot.driver.StraightPipe.setup(StraightPipe.java:102)
at com.dremio.sabot.driver.StraightPipe.setup(StraightPipe.java:102)
at com.dremio.sabot.driver.StraightPipe.setup(StraightPipe.java:102)
at com.dremio.sabot.driver.StraightPipe.setup(StraightPipe.java:102)
at com.dremio.sabot.driver.Pipeline.setup(Pipeline.java:68)
at com.dremio.sabot.exec.fragment.FragmentExecutor.setupExecution(FragmentExecutor.java:388)
at com.dremio.sabot.exec.fragment.FragmentExecutor.run(FragmentExecutor.java:270)
at com.dremio.sabot.exec.fragment.FragmentExecutor.access$1200(FragmentExecutor.java:92)
at com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run(FragmentExecutor.java:674)
at com.dremio.sabot.task.AsyncTaskWrapper.run(AsyncTaskWrapper.java:104)
at com.dremio.sabot.task.slicing.SlicingThread.mainExecutionLoop(SlicingThread.java:226)
at com.dremio.sabot.task.slicing.SlicingThread.run(SlicingThread.java:156)
Caused by: java.lang.IllegalArgumentException: null
	at com.google.common.base.Preconditions.checkArgument(Preconditions.java:128)
	at com.dremio.exec.store.jdbc.JdbcRecordReader.checkSchemaConsistency(JdbcRecordReader.java:333)
	at com.dremio.exec.store.jdbc.JdbcRecordReader.setup(JdbcRecordReader.java:213)
	at com.dremio.exec.store.CoercionReader.setup(CoercionReader.java:125)
	at com.dremio.sabot.op.scan.ScanOperator.setupReaderAsCorrectUser(ScanOperator.java:222)
	at com.dremio.sabot.op.scan.ScanOperator.setupReader(ScanOperator.java:195)
	at com.dremio.sabot.op.scan.ScanOperator.setup(ScanOperator.java:181)
	at com.dremio.sabot.driver.SmartOp$SmartProducer.setup(SmartOp.java:563)
	... 17 common frames omitted

These are my data type mappings:

data_types:
  mappings:
    # Manually Configured Data Types Mappings Section.
    - source:
        name: "INTEGER"
      dremio:
        name: "BIGINT"
      required_cast_arguments: "none"
    - source:
        name: "NUMERIC"
      dremio:
        name: "DOUBLE"
      required_cast_arguments: "precision_scale"
    - source:
        name: "DATE"
      dremio:
        name: "DATE"
      required_cast_arguments: "none"
    - source:
        name: "STRING"
      dremio:
        name: "VARCHAR"
      required_cast_arguments: "none"

Mappings for the driver that I am using:
http://cdn.cdata.com/help/DBE/jdbc/pg_datatypemapping.htm

What am I missing here? Please advise.

Thanks in advance,
Farhan.

Hey Farhan,

We have a BigQuery connector that we’ve got partially working - we’re going to be publishing it on GitHub later today.

It utilizes the official Simba JDBC connector published by Google.

I’ll update this post with a link once the repo is live.

-Mike

Here is a link to our BigQuery connector code. It’s still very much a work in progress, but we can currently issue queries to BigQuery and get the results in Dremio.

Most of the work left is in the ARP setup, and configuring all the pushdowns. Any and all feedback is appreciated.

Thanks @panomike,

This should be helpful.

But I am trying to make this work for our own internal use cases, and I am still stuck here.

If I am able to solve this, it will open up many more similar use cases for us, where we create additional data sources using ARP not just for BigQuery but for other technologies too.

Since you have worked on a similar scenario, do you think this problem could be caused by incorrect data type mappings?

Hi @balaji.ramaswamy,

We are still getting this error. Any idea where I am going wrong here?

Hi @Farhan,

It’s possible that the CData JDBC driver returns different type names when you execute queries than it does when you request catalog information.

In your ARP file, can you try adding data type entries where the source name is the CData type name in addition to your already existing set of BigQuery type names?

What I’m thinking is your query is returning only types that failed to map due to type name inconsistencies.
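For example (a sketch only; I’m assuming here that the CData driver reports VARCHAR at query time while BigQuery’s catalog reports STRING), both names can be mapped to the same Dremio type:

```yaml
data_types:
  mappings:
    # Type name reported by the BigQuery catalog
    - source:
        name: "STRING"
      dremio:
        name: "VARCHAR"
    # Possible type name reported by the CData driver at query time
    - source:
        name: "VARCHAR"
      dremio:
        name: "VARCHAR"
```

The same pattern would apply to any other type whose catalog and query-time names differ.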

Thanks @jduong.

I tried adding the extra mapping with source type STRING (CData) and Dremio type VARCHAR, but it didn’t work. I am querying a table in BigQuery with just two columns, both of type STRING.

These are my data type mappings:

data_types:
  mappings:
    # Manually Configured Data Types Mappings Section.
    #------------Numeric types--------------#
    - source:
        name: "NUMERIC"
        max_precision: 38
        max_scale: 37
      required_cast_args: "precision_scale"
      dremio:
        name: "DECIMAL"
    - source:
        name: "INT64"
      dremio:
        name: "bigint"
    - source:
        name: "BIGINT"
      dremio:
        name: "bigint"
    - source:
        name: "FLOAT64"
      dremio:
        name: "double"
    - source:
        name: "DOUBLE"
      dremio:
        name: "double"

    #------------String types--------------#
    - source:
        name: "VARCHAR"
        max_precision: 16777216
        literal_length_limit: 16777216
      required_cast_args: "precision"
      dremio:
        name: "varchar"
    - source:
        name: "STRING"
      dremio:
        name: "varchar"

I am only setting the numeric and string types for now. Will this work? Or does it require all data type mappings to be set beforehand?

Do you think my data type mappings are incorrect? These are my references:
http://cdn.cdata.com/help/DBE/jdbc/pg_datatypemapping.htm
https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#geography_type
https://docs.dremio.com/sql-reference/data-types.html

I took the idea from @panomike’s bigquery-arp.yml, which uses Simba’s JDBC driver.

Hi @Farhan,

I don’t have the CData driver, so I’m not sure what the problem is. From examining the stack trace, I believe the driver is returning different type names from the JDBC ResultSetMetaData object than what appears in your ARP file (and also different from the type names reported by DatabaseMetaData.getColumns()).
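One way to confirm this (a sketch only; the JDBC URL, dataset, and table names below are placeholders, and the populated lists are illustrative values, not real driver output) is to compare the type names from both metadata paths and print the ones that don’t line up:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: compare the type names a JDBC driver reports through
// DatabaseMetaData.getColumns() (used when the schema is discovered)
// against ResultSetMetaData.getColumnTypeName() (used when rows are read).
// Query-time names with no catalog match are the ones the ARP file
// would also need entries for.
public class TypeNameCheck {

    // Returns the query-time type names that have no exact
    // (case-sensitive) match among the catalog type names.
    static List<String> unmatched(List<String> catalogNames, List<String> queryNames) {
        List<String> missing = new ArrayList<>();
        for (String name : queryNames) {
            if (!catalogNames.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws Exception {
        // With a live connection, the two lists would be filled like this
        // (URL, dataset, and table are placeholders):
        //
        // Connection conn = DriverManager.getConnection("jdbc:cdata:googlebigquery:...");
        // List<String> catalogNames = new ArrayList<>();
        // ResultSet cols = conn.getMetaData()
        //         .getColumns(null, "my_dataset", "my_table", null);
        // while (cols.next()) catalogNames.add(cols.getString("TYPE_NAME"));
        //
        // List<String> queryNames = new ArrayList<>();
        // ResultSetMetaData md = conn.createStatement()
        //         .executeQuery("SELECT * FROM my_dataset.my_table LIMIT 1")
        //         .getMetaData();
        // for (int i = 1; i <= md.getColumnCount(); i++)
        //     queryNames.add(md.getColumnTypeName(i));

        // Illustrative values only: catalog says STRING, query says VARCHAR.
        List<String> catalogNames = List.of("STRING", "INT64");
        List<String> queryNames = List.of("VARCHAR", "INT64");

        // prints: Unmapped query type names: [VARCHAR]
        System.out.println("Unmapped query type names: " + unmatched(catalogNames, queryNames));
    }
}
```

If the printed list is non-empty, adding those names as additional source entries in the ARP file is the first thing I would try.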

Is there a reason you must use the CData driver? Have you tried @panomike’s ARP file with Google’s official JDBC driver (https://cloud.google.com/bigquery/providers/simba-drivers)?

Hi @panomike
I am trying this with Dremio 17.x (17.0.0-202107060524010627-31b5222b) and getting this compilation error:

org.apache.calcite.sql.SqlAbstractStringLiteral is not public in org.apache.calcite.sql; cannot be accessed from outside package

Can you please assist on how to fix this? I also tried the dependency below, but to no avail:

<dependency>
  <groupId>org.apache.calcite</groupId>
  <artifactId>calcite-core</artifactId>
  <version>1.16.0</version>
</dependency>