Hi, all. First time using the new Arrow Flight driver to connect to Dremio instead of ODBC and I’m having some problems. Hoping someone can help me understand what’s going on.
I was able to download and run the example script, no problem:
$ ./example.py -host data.dremio.cloud -port 443 -tls -user '$token' -pat $(cat ~/Documents/pwvault/dremio_pat.txt) -query 'SELECT * FROM "Jeremy Experiments"."UJJ-QuestionAnswer_decoding"'
[INFO] Enabling TLS connection
[INFO] Trusted certificates provided
[INFO] Authentication skipped until first request
[INFO] Query: SELECT * FROM "Jeremy Experiments"."UJJ-QuestionAnswer_decoding"
[INFO] GetSchema was successful
[INFO] Schema: <pyarrow._flight.SchemaResult object at 0x7f98cc5beb90>
[INFO] GetFlightInfo was successful
[INFO] Ticket: <Ticket b'\n@SELECT * FROM "Jeremy Experiments"."UJJ-QuestionAnswer_decoding"\x12Z\nX\n@SELECT * FROM "Jeremy Experiments"."UJJ-QuestionAnswer_decoding"\x10\x04\x1a\x12\t\xca\xef\xdd\xa9\x94\xea0\x1d\x11\x00hz\xdb/\xb1\x1f\x95'>
[INFO] Reading query results from Dremio
QuestionId Answer Decoded_Answer LanguageId … Sequence AnsPrecode ShortAnswer AnswerDesc
[data redacted ]
[247892 rows x 9 columns]
But I’m not able to import this script and use these functions, either in a Jupyter Notebook (my preference) or from an interactive Python session. The Jupyter Notebook kernel crashes when it tries to run the connect function, with no output or error messages. From an interactive Python session, running the same code, I get:
$ python
Python 3.8.10 (default, Mar 15 2022, 12:22:08)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> import certifi
>>> # Downloaded from https://github.com/dremio-hub/arrow-flight-client-examples/blob/main/python/example.py
>>> import example
>>> # Confirmed these parameters by printing some debugging output from example.py
>>> hostname = 'data.dremio.cloud'
>>> port = 443
>>> username = '$token'
>>> password = 'dremio123' # default; this setup will use PAT instead
>>> with open('/home/jeremy/Documents/pwvault/dremio_pat.txt','r') as pat_file:
...     pat_or_auth_token = pat_file.read()
...
>>> tls = True
>>> trusted_certificates = certifi.where() # retrieves a site certificate, cacert.pem
>>> disable_server_verification = False
>>> engine = None
>>> session_properties = None
>>> query = 'SELECT * FROM "Jeremy Experiments"."UJJ-QuestionAnswer_decoding"'
>>> # Connect to Dremio Arrow Flight server endpoint.
>>> example.connect_to_dremio_flight_server_endpoint(hostname, port, username, password,
...     query, tls, trusted_certificates,
...     disable_server_verification, pat_or_auth_token,
...     engine, session_properties)
[INFO] Enabling TLS connection
[INFO] Trusted certificates provided
[INFO] Authentication skipped until first request
[INFO] Query: SELECT * FROM "Jeremy Experiments"."UJJ-QuestionAnswer_decoding"
E0713 13:54:39.115737425 4281 call.cc:783] validate_metadata: {"created":"@1657738479.115716429","description":"Illegal header value","file":"/opt/vcpkg/buildtrees/grpc/src/85a295989c-6cf7bf442d.clean/src/core/lib/surface/validate_metadata.cc","file_line":55,"offset":71,"raw_bytes":"42 65 61 72 65 72 20 57 64 69 7a 68 48 58 70 51 63 69 4f 68 47 61 75 34 6e 71 35 6d 49 4b 70 65 69 7a 4f 39 6a 65 46 35 65 34 6e 67 69 75 32 45 36 4e 52 57 62 53 32 4d 39 69 34 31 51 55 6e 45 70 6f 66 4e 41 3d 3d 0a 'Bearer WdizhHXpQciOhGau4nq5mIKpeizO9jeF5e4ngiu2E6NRWbS2M9i41QUnEpofNA==.'\u0000"}
E0713 13:54:39.115793665 4281 call_op_set.h:980] assertion failed: false
Aborted
Does it not like how I’m defining the variables directly in the interactive version vs. the example script, which is parsing values from the command line? Otherwise I’m lost…
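One detail in the error output may be relevant: the raw_bytes value ends in 0a, which is a newline character. A shell $(cat ...) substitution drops trailing newlines, whereas pat_file.read() returns the file contents as-is, so the two runs may not be passing the same token. The sketch below decodes the tail of raw_bytes and shows a hypothetical read_token helper (not part of example.py) that strips surrounding whitespace. Whether this resolves the abort is an assumption that has not been tested against Dremio.

```python
# Last five bytes of the "raw_bytes" field from the gRPC error message.
raw_tail = "4e 41 3d 3d 0a"
decoded = bytes.fromhex(raw_tail.replace(" ", ""))
print(decoded)  # b'NA==\n' (the final byte, 0x0a, is a newline)


def read_token(path):
    """Hypothetical helper: read a token file and strip surrounding
    whitespace, including any trailing newline left by an editor."""
    with open(path, "r") as token_file:
        return token_file.read().strip()
```

In the interactive session this would be used as pat_or_auth_token = read_token('/home/jeremy/Documents/pwvault/dremio_pat.txt') in place of the bare pat_file.read() call.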