Creating BigQuery source using ARP framework, failure in reading records

I am trying to create a Google BigQuery source in Dremio using CData’s BigQuery JDBC driver.

So far everything seems to work:

  • The source gets created successfully
  • I can see my datasets and tables from BigQuery in Dremio (with their field names, etc.)

But as soon as I open the dataset to view the records, I encounter this error:

com.dremio.common.exceptions.UserException: IllegalArgumentException
at com.dremio.common.exceptions.UserException$Builder.build(UserException.java:802)
at com.dremio.sabot.driver.SmartOp.contextualize(SmartOp.java:140)
at com.dremio.sabot.driver.SmartOp$SmartProducer.setup(SmartOp.java:567)
at com.dremio.sabot.driver.Pipe$SetupVisitor.visitProducer(Pipe.java:79)
at com.dremio.sabot.driver.Pipe$SetupVisitor.visitProducer(Pipe.java:63)
at com.dremio.sabot.driver.SmartOp$SmartProducer.accept(SmartOp.java:533)
at com.dremio.sabot.driver.StraightPipe.setup(StraightPipe.java:102)
at com.dremio.sabot.driver.StraightPipe.setup(StraightPipe.java:102)
at com.dremio.sabot.driver.StraightPipe.setup(StraightPipe.java:102)
at com.dremio.sabot.driver.StraightPipe.setup(StraightPipe.java:102)
at com.dremio.sabot.driver.StraightPipe.setup(StraightPipe.java:102)
at com.dremio.sabot.driver.StraightPipe.setup(StraightPipe.java:102)
at com.dremio.sabot.driver.Pipeline.setup(Pipeline.java:68)
at com.dremio.sabot.exec.fragment.FragmentExecutor.setupExecution(FragmentExecutor.java:388)
at com.dremio.sabot.exec.fragment.FragmentExecutor.run(FragmentExecutor.java:270)
at com.dremio.sabot.exec.fragment.FragmentExecutor.access$1200(FragmentExecutor.java:92)
at com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run(FragmentExecutor.java:674)
at com.dremio.sabot.task.AsyncTaskWrapper.run(AsyncTaskWrapper.java:104)
at com.dremio.sabot.task.slicing.SlicingThread.mainExecutionLoop(SlicingThread.java:226)
at com.dremio.sabot.task.slicing.SlicingThread.run(SlicingThread.java:156)
Caused by: java.lang.IllegalArgumentException: null
	at com.google.common.base.Preconditions.checkArgument(Preconditions.java:128)
	at com.dremio.exec.store.jdbc.JdbcRecordReader.checkSchemaConsistency(JdbcRecordReader.java:333)
	at com.dremio.exec.store.jdbc.JdbcRecordReader.setup(JdbcRecordReader.java:213)
	at com.dremio.exec.store.CoercionReader.setup(CoercionReader.java:125)
	at com.dremio.sabot.op.scan.ScanOperator.setupReaderAsCorrectUser(ScanOperator.java:222)
	at com.dremio.sabot.op.scan.ScanOperator.setupReader(ScanOperator.java:195)
	at com.dremio.sabot.op.scan.ScanOperator.setup(ScanOperator.java:181)
	at com.dremio.sabot.driver.SmartOp$SmartProducer.setup(SmartOp.java:563)
	... 17 common frames omitted

These are my data type mappings:

data_types:
  mappings:
    # Manually Configured Data Types Mappings Section.
    - source:
        name: "INTEGER"
      dremio:
        name: "BIGINT"
      required_cast_arguments: "none"
    - source:
        name: "NUMERIC"
      dremio:
        name: "DOUBLE"
      required_cast_arguments: "precision_scale"
    - source:
        name: "DATE"
      dremio:
        name: "DATE"
      required_cast_arguments: "none"
    - source:
        name: "STRING"
      dremio:
        name: "VARCHAR"
      required_cast_arguments: "none"

Mappings for the driver that I am using:
http://cdn.cdata.com/help/DBE/jdbc/pg_datatypemapping.htm

What am I missing here? Please advise.

Thanks in advance,
Farhan.

Hey Farhan,

We have a BigQuery connector that we’ve got partially working - we’re going to be publishing it on GitHub later today.

It utilizes the official Simba JDBC connector published by Google.

I’ll update this post with a link once the repo is live.

-Mike

Here is a link to our BigQuery connector code. It’s still very much a work in progress, but we can currently issue queries to BigQuery and get the results in Dremio.

Most of the work left is in the ARP setup, and configuring all the pushdowns. Any and all feedback is appreciated.

Thanks @panomike,

This should be helpful.

But I am trying to make this work for our own internal use cases, and I am still stuck here.

If I am able to solve this, it will open up many more similar use cases for us, where we create additional data sources using ARP not just for BigQuery but for other technologies too.

Since you have worked on a similar scenario, do you think this problem could be caused by incorrect data type mappings?

Hi @balaji.ramaswamy,

We are still getting this error. Any idea where I am going wrong here?

Hi @Farhan,

It’s possible that the CData JDBC driver returns different type names when you execute queries than it does when you request catalog information.

In your ARP file, can you try adding data type entries where the source name is the CData type name in addition to your already existing set of BigQuery type names?

What I’m thinking is your query is returning only types that failed to map due to type name inconsistencies.
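For example (a sketch only; I’m assuming here that the CData driver reports VARCHAR at query time while BigQuery’s catalog reports STRING), both names can be mapped to the same Dremio type:

```yaml
data_types:
  mappings:
    # Type name reported by the BigQuery catalog
    - source:
        name: "STRING"
      dremio:
        name: "VARCHAR"
    # Possible type name reported by the CData driver at query time
    - source:
        name: "VARCHAR"
      dremio:
        name: "VARCHAR"
```

The same pattern would apply to any other type whose catalog and query-time names differ.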

Thanks @jduong.

I tried adding the extra mapping with source type STRING (CData) and Dremio type VARCHAR, but it didn’t work. I am querying a table in BigQuery with just two columns, both of type STRING.

These are my data type mappings:

data_types:
  mappings:
    # Manually Configured Data Types Mappings Section.
    #------------Numeric types--------------#
    - source:
        name: "NUMERIC"
        max_precision: 38
        max_scale: 37
      required_cast_args: "precision_scale"
      dremio:
        name: "DECIMAL"
    - source:
        name: "INT64"
      dremio:
        name: "bigint"
    - source:
        name: "BIGINT"
      dremio:
        name: "bigint"
    - source:
        name: "FLOAT64"
      dremio:
        name: "double"
    - source:
        name: "DOUBLE"
      dremio:
        name: "double"

    #------------String types--------------#
    - source:
        name: "VARCHAR"
        max_precision: 16777216
        literal_length_limit: 16777216
      required_cast_args: "precision"
      dremio:
        name: "varchar"
    - source:
        name: "STRING"
      dremio:
        name: "varchar"

I am only setting the numeric and string types for now. Will this work? Or does it require all data type mappings to be set beforehand?

Do you think my data type mappings are incorrect? These are my references:
http://cdn.cdata.com/help/DBE/jdbc/pg_datatypemapping.htm
https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#geography_type
https://docs.dremio.com/sql-reference/data-types.html

I took the idea from @panomike’s bigquery-arp.yml, which uses Simba’s JDBC driver.

Hi @Farhan,

I don’t have the CData driver, so I’m not sure what the problem is. From examining the stack trace, I believe the driver is returning different type names from the JDBC ResultSetMetaData object than what appears in your ARP file (and also different from the type names reported by DatabaseMetaData.getColumns()).
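One way to confirm this (a sketch only; the JDBC URL, dataset, and table names below are placeholders, and the populated lists are illustrative values, not real driver output) is to compare the type names from both metadata paths and print the ones that don’t line up:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: compare the type names a JDBC driver reports through
// DatabaseMetaData.getColumns() (used when the schema is discovered)
// against ResultSetMetaData.getColumnTypeName() (used when rows are read).
// Query-time names with no catalog match are the ones the ARP file
// would also need entries for.
public class TypeNameCheck {

    // Returns the query-time type names that have no exact
    // (case-sensitive) match among the catalog type names.
    static List<String> unmatched(List<String> catalogNames, List<String> queryNames) {
        List<String> missing = new ArrayList<>();
        for (String name : queryNames) {
            if (!catalogNames.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws Exception {
        // With a live connection, the two lists would be filled like this
        // (URL, dataset, and table are placeholders):
        //
        // Connection conn = DriverManager.getConnection("jdbc:cdata:googlebigquery:...");
        // List<String> catalogNames = new ArrayList<>();
        // ResultSet cols = conn.getMetaData()
        //         .getColumns(null, "my_dataset", "my_table", null);
        // while (cols.next()) catalogNames.add(cols.getString("TYPE_NAME"));
        //
        // List<String> queryNames = new ArrayList<>();
        // ResultSetMetaData md = conn.createStatement()
        //         .executeQuery("SELECT * FROM my_dataset.my_table LIMIT 1")
        //         .getMetaData();
        // for (int i = 1; i <= md.getColumnCount(); i++)
        //     queryNames.add(md.getColumnTypeName(i));

        // Illustrative values only: catalog says STRING, query says VARCHAR.
        List<String> catalogNames = List.of("STRING", "INT64");
        List<String> queryNames = List.of("VARCHAR", "INT64");

        // prints: Unmapped query type names: [VARCHAR]
        System.out.println("Unmapped query type names: " + unmatched(catalogNames, queryNames));
    }
}
```

If the printed list is non-empty, adding those names as additional source entries in the ARP file is the first thing I would try.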

Is there a reason you must use the CData driver? Have you tried @panomike’s ARP file with Google’s official JDBC driver (https://cloud.google.com/bigquery/providers/simba-drivers)?

Hi @panomike
I am trying this with Dremio 17.x (17.0.0-202107060524010627-31b5222b) and getting this compilation error:

org.apache.calcite.sql.SqlAbstractStringLiteral is not public in org.apache.calcite.sql; cannot be accessed from outside package

Can you please assist on how to fix this? I also tried the dependency below, but to no avail:

<dependency>
  <groupId>org.apache.calcite</groupId>
  <artifactId>calcite-core</artifactId>
  <version>1.16.0</version>
</dependency>