Hi,
I am trying to connect dremio to spark using python and create a dataframe, but this error pops up every time.
To run, I used a spark-submit with the jdbc jar file in the ‘–jars’ option.
Pretty sure there is some basic mistake here in the code. Here’s the code.
import pyodbc, pandas
from pyspark import SparkContext
sc = SparkContext()
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
host = ‘127.0.0.1’
port = 31010
uid = ‘username’
pwd = ‘password’
driver = ‘/opt/dremio-odbc/lib64/libdrillodbc_sb64.so’ #I couldnt get the DSN to work, but this works.
con = pyodbc.connect(“Driver={};ConnectionType=Direct;HOST={};PORT={};AuthenticationType=Plain;UID={};PWD={}”.format(driver, host, port, uid, pwd), autocommit=True)
df0 = sqlContext.read.format(“jdbc”).option(“url”, “jdbc:dremio:direct=127.0.0.1:31010”).option(“dbtable”, “”"’@username’.‘spacename.datasetname’""").option(“user”, “username”).option(“password”, “password”).load()
ERROR MESSAGE:
raceback (most recent call last):
File “dremio_test.py”, line 29, in
.option(“password”, “password”)
File “/usr/local/spark-2.3.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/readwriter.py”, line 172, in load
File “/usr/local/spark-2.3.1-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py”, line 1257, in call
File “/usr/local/spark-2.3.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py”, line 63, in deco
File “/usr/local/spark-2.3.1-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py”, line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o36.load.
: java.sql.SQLException: No suitable driver
at java.sql.DriverManager.getDriver(DriverManager.java:315)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions$$anonfun$7.apply(JDBCOptions.scala:85)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions$$anonfun$7.apply(JDBCOptions.scala:85)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.(JDBCOptions.scala:84)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.(JDBCOptions.scala:35)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:34)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:340)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:164)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
I tried similar code in scala too. But same error.
PLS HELP