Getting Dremio Server connected to R (dbplyr, dplyr) on macOS using ODBC driver

Hello all! I felt the documentation was limited for getting Dremio Server connected to R using ODBC drivers for macOS, so I figured out how to get it working and created an R Markdown walkthrough for anyone else who is interested!

https://cojamalo.github.io/macOS_Dremio_R/guide.html

Open to any comments or feedback!

Hey Connor, thanks for posting this! I’m not an R expert, but this looks helpful.

Do you think there’s anything in this tutorial that we need to change or improve?

Hey Connor, thanks for posting this! I’m not an R expert, but this looks helpful.

Do you think there’s anything in this tutorial that we need to change or improve?

Hi Kelly!

Thank you for reaching out. Yes, I do believe both the tutorial you linked
and the Dremio documentation section for R that shares its code are
deficient and not complete. Here are my concerns/feedback:

  1. In the code you use to connect to Dremio, you use the “RODBC” package.
    From what I understand, the “odbc” package has greater support from RStudio
    is a superior method of connecting to databases. For instance, in the
    benchmarking section in the odbc package documentation, it says: "The
    odbc package
    is often much faster than the existing RODBC and DBI compatible
    RODBCDBI packages."
    Moreover, RStudio gives a tutorial using odbc as an example of how to
    connect to a database see
    https://rviews.rstudio.com/2017/05/17/databases-using-r/.

  2. Your tutorial also shows the connection using the "Dremio Connector"
    driver that is only available for Windows. R is cross-platform and, thus,
    if your tutorial only mentions the Dremio Connector software, it leaves
    Linux and Mac R users out to dry. This is another reason I went the
    odbc route. Having the odbc driver installed in addition to unixODBC
    installed on my Mac allowed me to connect to Dremio using the odbc package
    R.

  3. These two links are dead in your documentation for R: Install the Dremio
    Connector (ODBC)
    or Dremio JDBC Driver.

Feel free to scavenge or copy any content from my walkthrough
https://cojamalo.github.io/macOS_Dremio_R/guide.html. The important steps
are:

  1. Installing unixODBC
  2. Installing the Dremio ODBC driver
  3. Making sure the Dremio ODBC driver is listed in the unixODBC .ini files
  4. Install latest version of odbc package
  5. Establish connection with DBI
  6. Use your Dremio data with dplyr functions

Let me know if you have any questions!

Connor

1 Like

Super helpful food for thought. We will process and come back to you with any questions.

For other R users out there, love to hear from you as well.

Thanks again!

I personally use RJDBC to connect R with Dremio… Less configuration than ODBC

dremio_drv <- JDBC(“com.dremio.jdbc.Driver”,"/dremio/dremio-community-1.2.2/jars/jdbc-driver/dremio-jdbc-driver-1.2.2.jar", identifier.quote="`")

dremio_conn <- dbConnect(dremio_drv, “jdbc:dremio:zk=myserver:myport”, user= “???”, password = ‘???’ )

sqlStatement = “select * from ???”

result_df = RJDBC::dbGetQuery(dremio_conn, sqlStatement)

This is really great, thanks cojamalo! Given dplyr’s usefulness in keeping data manipulation on the server side, this is very useful for dealing with large datasets. Another useful tip, shown using RJDBC, below, but can equally be done with odbc, is setting up VDS programmatically - useful to rebuild a Dremio service from a code repo:

# JDBC connection
drv <- JDBC("com.dremio.jdbc.Driver",
            "/opt/dremio/jars/jdbc-driver/dremio-jdbc-driver-1.2.2-201710100154510864-d40e31c.jar",
            identifier.quote="'")
conn <- dbConnect(drv,
                  "jdbc:dremio:direct://<dremio host>:31010",
                  user = Sys.getenv("DREMIO_USERNAME"),
                  password = Sys.getenv("DREMIO_PASSWORD"))

# create virtual data sets programatically
sql_some_vds <- glue::glue_sql("
  CREATE VDS {`vds_path`}.some_vds AS (
    SELECT *
    FROM <source>.some_table
  )", 
  vds_path = "<some_space>",
  .con = conn)
vds_some_vds <- dbSendQuery(conn, sql_some_vds)

I am trying to connect via RJDBC to Dremio, but unfortunately it is not working out. How can I add the ssl and disableHostVerification parameter?

# JDBC connection
drv <- JDBC("com.dremio.jdbc.Driver",
        "dremio-jdbc-driver-3.1.4/dremio-jdbc-driver-3.1.4-201902210617130445-0d64760.jar",
        identifier.quote="'")

server <- "servername"

conn         <- dbConnect(drv, paste0("jdbc:dremio:direct=", server), 
                      user = "user_name",
                      password = "pwd",
                      ssl = "true",
                      disableHostVerification ="true")