No suitable driver error

Hi,
I am trying to connect Dremio to Spark using Python and create a DataFrame, but this error pops up every time.
To run it, I used spark-submit with the JDBC jar file in the '--jars' option.
I'm pretty sure there is some basic mistake in the code. Here's the code:

import pyodbc, pandas
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext()
sqlContext = SQLContext(sc)

host = '127.0.0.1'
port = 31010
uid = 'username'
pwd = 'password'
driver = '/opt/dremio-odbc/lib64/libdrillodbc_sb64.so'  # I couldn't get the DSN to work, but this does.

con = pyodbc.connect(
    "Driver={};ConnectionType=Direct;HOST={};PORT={};AuthenticationType=Plain;UID={};PWD={}"
    .format(driver, host, port, uid, pwd),
    autocommit=True)

df0 = (sqlContext.read.format("jdbc")
       .option("url", "jdbc:dremio:direct=127.0.0.1:31010")
       .option("dbtable", """'@username'.'spacename.datasetname'""")
       .option("user", "username")
       .option("password", "password")
       .load())

ERROR MESSAGE:
Traceback (most recent call last):
File "dremio_test.py", line 29, in <module>
.option("password", "password")
File "/usr/local/spark-2.3.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 172, in load
File "/usr/local/spark-2.3.1-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/usr/local/spark-2.3.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/usr/local/spark-2.3.1-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o36.load.
: java.sql.SQLException: No suitable driver
at java.sql.DriverManager.getDriver(DriverManager.java:315)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions$$anonfun$7.apply(JDBCOptions.scala:85)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions$$anonfun$7.apply(JDBCOptions.scala:85)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:84)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:35)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:34)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:340)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:164)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)

I tried similar code in Scala too, but got the same error.

Please help 🙂

Seems like the error is coming from the df0 declaration. At a glance, there is no need to make a JDBC call when you are already setting up an ODBC connection. Maybe this documentation will help you with a sample connection via Python: https://docs.dremio.com/client-applications/python.html
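For example, here is a minimal sketch of that page's pyodbc approach, reusing the driver path and credentials placeholders from your snippet. The `connect`/`read_sql` lines need a reachable Dremio coordinator and the pyodbc package installed, so they are shown commented out:

```python
host, port = "127.0.0.1", 31010
uid, pwd = "username", "password"
driver = "/opt/dremio-odbc/lib64/libdrillodbc_sb64.so"

# Build the same DSN-less connection string used in your snippet.
conn_str = (
    "Driver={};ConnectionType=Direct;HOST={};PORT={};"
    "AuthenticationType=Plain;UID={};PWD={}"
).format(driver, host, port, uid, pwd)

# These calls need a live Dremio instance, so they are commented out here:
# import pyodbc
# import pandas
# con = pyodbc.connect(conn_str, autocommit=True)
# df = pandas.read_sql('SELECT * FROM "spacename"."datasetname"', con)
```

The query result arrives as a pandas DataFrame; if you ultimately need it in Spark, `sqlContext.createDataFrame(df)` can convert it, though that routes all data through the driver process.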

Thanks for the quick response.

The program in the above link reads the dataset using pandas, but what I'm trying to do is the same thing using PySpark. Is there any way to do it with PySpark?

In any case, I also tried using pandas, which gives another error:

Traceback (most recent call last):
File "/home/arjun/anaconda3/envs/test-env/lib/python3.6/site-packages/pandas/io/sql.py", line 1378, in execute
cur.execute(*args)
pyodbc.Error: ('HY000', '[HY000] [Dremio][Connector] (1040) Dremio failed to execute the query: SELECT * FROM product_budget.budget18\n[30034]Query execution error. Details:[ \nSYSTEM ERROR: CompileException: Line 64, Column 30: No applicable constructor/method found for actual parameters "org.apache.arrow.vector.holders.UnionHolder"; candidates are: "public void com.dremio.exec.vector.complex.fn.JsonWriter.write(org.apache.arrow.vector.complex.reader.FieldReader) t…[see log] (1040) (SQLExecDirectW)')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/arjun/PycharmProjects/test_stuff/dremio_test.py", line 22, in <module>
dataframe = pandas.read_sql(sql, con)
File "/home/arjun/anaconda3/envs/test-env/lib/python3.6/site-packages/pandas/io/sql.py", line 381, in read_sql
chunksize=chunksize)
File "/home/arjun/anaconda3/envs/test-env/lib/python3.6/site-packages/pandas/io/sql.py", line 1413, in read_query
cursor = self.execute(*args)
File "/home/arjun/anaconda3/envs/test-env/lib/python3.6/site-packages/pandas/io/sql.py", line 1390, in execute
raise_with_traceback(ex)
File "/home/arjun/anaconda3/envs/test-env/lib/python3.6/site-packages/pandas/compat/__init__.py", line 403, in raise_with_traceback
raise exc.with_traceback(traceback)
File "/home/arjun/anaconda3/envs/test-env/lib/python3.6/site-packages/pandas/io/sql.py", line 1378, in execute
cur.execute(*args)
pandas.io.sql.DatabaseError: Execution failed on sql 'SELECT * FROM product_budget.budget18': ('HY000', '[HY000] [Dremio][Connector] (1040) Dremio failed to execute the query: SELECT * FROM spacename.datasetname\n[30034]Query execution error. Details:[ \nSYSTEM ERROR: CompileException: Line 64, Column 30: No applicable constructor/method found for actual parameters "org.apache.arrow.vector.holders.UnionHolder"; candidates are: "public void com.dremio.exec.vector.complex.fn.JsonWriter.write(org.apache.arrow.vector.complex.reader.FieldReader) t…[see log] (1040) (SQLExecDirectW)')
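A note on the original "No suitable driver" error: java.sql.DriverManager reports that when no registered JDBC driver claims the URL, which usually means the Dremio JDBC jar is missing from the Spark driver's classpath, or the driver class was never named. Passing the jar to spark-submit via both `--jars` and `--driver-class-path`, and setting the `driver` option explicitly, is the usual fix. Below is a sketch of the read options, assuming the Dremio driver class name `com.dremio.jdbc.Driver`; the `load()` call needs a running Spark session and a live Dremio coordinator, so it is commented out:

```python
# spark-submit --jars dremio-jdbc-driver.jar \
#              --driver-class-path dremio-jdbc-driver.jar dremio_test.py

jdbc_options = {
    "url": "jdbc:dremio:direct=127.0.0.1:31010",
    # Name the class explicitly so DriverManager can locate it:
    "driver": "com.dremio.jdbc.Driver",
    # Dremio quotes identifiers with double quotes:
    "dbtable": '"@username"."spacename.datasetname"',
    "user": "username",
    "password": "password",
}

# Needs a live cluster, so commented out here:
# df0 = sqlContext.read.format("jdbc").options(**jdbc_options).load()
```

If the driver class still cannot be found on the executors, adding the jar path to `spark.executor.extraClassPath` as well is worth trying.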