Is there an option to configure a Google Cloud Storage source to use a proxy?
I couldn’t find a configuration for GCS similar to the one available for S3 (Dremio).
We would like to run Dremio on premises with connections to both S3 and GCS storage, so that we can query files from both locations.
Hi @balaji.ramaswamy, thanks for the response. I saw this tab and tried the property fs.gs.proxy.address, which should be correct for the GCS connector. Unfortunately, this property doesn’t work in Dremio.
Could you advise which configuration should be added to these properties to make the proxy work for GCP?
@balaji.ramaswamy We tried this (fs.gs.proxy.host and fs.gs.proxy.port), but this configuration doesn’t work either: we get the same UnknownHostException: oauth2.googleapis.com error.
We tested a different approach using the -Dhttps.proxyHost properties in dremio-env. This works, but it requires an ugly hack that we would like to avoid.
We tested versions 19.1.0-202111160130570172-0ee00450 and 23.1.0-202211250136090978-a79618c7
So could you check why the standard properties don’t work and why there is no documentation for GCP proxy configuration?
Hi @balaji.ramaswamy, for the GCS proxy, none of the properties set in the Advanced Options tab work. The only working solution for us is to add this line to dremio-env:
DREMIO_JAVA_EXTRA_OPTS="-Dhttp.nonProxyHosts=localhost|127.0.0.1|*.S3.domain|*.localHost.domain -Dhttp.proxyHost=… -Dhttp.proxyPort=… -Dhttps.proxyHost=… -Dhttps.proxyPort=…"
Because these -Dhttp.proxyHost properties apply to all of Dremio’s connections, we had to add -Dhttp.nonProxyHosts to disable the proxy for S3. In addition, Dremio makes some internal calls, so we had to add our local host domain to -Dhttp.nonProxyHosts as well.
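For readability, the same dremio-env setting could be assembled from a few variables. This is only a sketch: PROXY_HOST, PROXY_PORT and their values are hypothetical placeholders that do not come from this thread, and the non-proxy patterns are copied as written above. It also assumes, as described above, that the JVM-wide proxy properties are acceptable for the whole Dremio process.

```shell
# Hypothetical placeholder values (assumptions); substitute your own proxy.
PROXY_HOST="proxy.example.internal"
PROXY_PORT="3128"

# Hosts that must bypass the proxy: loopback, the S3 endpoint domain, and the
# local domain used for Dremio's internal calls (patterns as given above).
# The JVM's HTTPS handler honours http.nonProxyHosts too, so one list covers both.
NON_PROXY_HOSTS="localhost|127.0.0.1|*.S3.domain|*.localHost.domain"

# Build the option string step by step; double quotes keep | and * literal.
DREMIO_JAVA_EXTRA_OPTS="-Dhttp.nonProxyHosts=${NON_PROXY_HOSTS}"
DREMIO_JAVA_EXTRA_OPTS="${DREMIO_JAVA_EXTRA_OPTS} -Dhttp.proxyHost=${PROXY_HOST} -Dhttp.proxyPort=${PROXY_PORT}"
DREMIO_JAVA_EXTRA_OPTS="${DREMIO_JAVA_EXTRA_OPTS} -Dhttps.proxyHost=${PROXY_HOST} -Dhttps.proxyPort=${PROXY_PORT}"
```

Keeping the bypass list in its own variable makes it easier to see which hosts skip the proxy when more domains have to be added later.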
Unfortunately, this proxy configuration was not picked up during asynchronous access to Parquet files, so to make it work we had to disable that feature (Enable asynchronous access when possible), which has a big performance impact when reading Parquet files.
After analysing the Dremio code, we noticed that the AsyncHttpClient library is used for asynchronous connections, and that in this library’s default configuration, reading the standard proxy properties is disabled. To enable it, you can edit the ahc-default.properties file, located in the org/asynchttpclient/config folder inside /dremio/jars/3rdparty/async-http-client-2.7.0.jar, and set org.asynchttpclient.useProxyProperties to true.
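Before editing anything, it may help to check what the bundled defaults file actually contains. The helper below is a hypothetical sketch (the function name ahc_proxy_setting is made up for illustration); it assumes unzip and grep are installed and only reads from the jar, without modifying it.

```shell
# Print the useProxyProperties line from the AsyncHttpClient defaults
# bundled inside the given jar. Read-only: the jar is not changed.
ahc_proxy_setting() {
  jar_path="$1"
  if [ ! -f "$jar_path" ]; then
    echo "jar not found: $jar_path" >&2
    return 1
  fi
  # unzip -p streams a single archive member to stdout.
  unzip -p "$jar_path" org/asynchttpclient/config/ahc-default.properties \
    | grep 'useProxyProperties'
}
```

With the path from this thread, the call would be: ahc_proxy_setting /dremio/jars/3rdparty/async-http-client-2.7.0.jar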
All these changes allow Dremio to work correctly with GCS and S3, but they require an ugly hack. Could you check whether there is a simpler working solution to this proxy issue with GCS?
For me, it also works to specify the ahc-default.properties parameter as an additional option in DREMIO_JAVA_EXTRA_OPTS:
DREMIO_JAVA_EXTRA_OPTS="-Dorg.asynchttpclient.useProxyProperties=true -Dhttp.nonProxyHosts=localhost|127.0.0.1|*.S3.domain|*.localHost.domain -Dhttp.proxyHost=… -Dhttp.proxyPort=… -Dhttps.proxyHost=… -Dhttps.proxyPort=…"