How to connect Dremio to GCS storage through a proxy

Is there an option to configure a Google Cloud Storage source to use a proxy?

I couldn’t find a configuration for GCS similar to the one available for S3 (Dremio).
We would like to run Dremio on premises, connected to both S3 and GCS storage, so that we can query files from both locations.

@Pawel These are added in the source’s advanced tab (add property), which should be available in all sources

Hi @balaji.ramaswamy, thanks for the response. I saw this tab and tried the property fs.gs.proxy.address, which should be correct for the GCS connector. Unfortunately this flag doesn’t work in Dremio.
Could you advise what configuration should be added to these properties to make the proxy work for GCP?

@Pawel You would need a host and port, and an optional username/password:

fs.gs.proxy.host and fs.gs.proxy.port 

@balaji.ramaswamy We tried this (fs.gs.proxy.host and fs.gs.proxy.port), but this configuration doesn’t work either: we get the same UnknownHostException: oauth2.googleapis.com error.

We tested a different approach, with -Dhttps.proxyHost properties in dremio-env. This works, but it requires an ugly hack that we would like to avoid.

We tested versions 19.1.0-202111160130570172-0ee00450 and 23.1.0-202211250136090978-a79618c7

So could you check why the standard properties don’t work, and why there is no documentation for GCP proxy configuration?

@Pawel I have seen that config before. Are you able to send the exact dremio-env line or file? Let me check whether that is the right one.

Hi @balaji.ramaswamy, for the GCS proxy, none of the properties set in the Advanced Options tab work. The only working solution for us is to add this line to dremio-env:
DREMIO_JAVA_EXTRA_OPTS="-Dhttp.nonProxyHosts=localhost|127.0.0.1|*.S3.domain|*.localHost.domain -Dhttp.proxyHost=… -Dhttp.proxyPort=… -Dhttps.proxyHost=… -Dhttps.proxyPort=…"

Because the -Dhttp.proxyHost properties apply to all of Dremio’s connections, we had to add -Dhttp.nonProxyHosts to disable the proxy for S3. Dremio also makes some internal calls, so we had to add our local host domain to -Dhttp.nonProxyHosts as well.
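The line above can be easier to maintain when it is assembled from variables. The following is only a sketch of such a dremio-env fragment: the proxy host, port, and domain names are hypothetical placeholders, not values from this thread, and the variable names PROXY_HOST, PROXY_PORT, and NON_PROXY_HOSTS are made up for illustration. Note that the JVM’s HTTPS handler also honours http.nonProxyHosts, so one bypass list covers both protocols.

```shell
# Sketch of a dremio-env fragment. All values below are placeholders
# (assumptions for illustration); replace them with your own.
PROXY_HOST="proxy.example.internal"   # hypothetical proxy host
PROXY_PORT="3128"                     # hypothetical proxy port

# Hosts that must bypass the proxy, separated by "|"; "*" is a wildcard.
# Covers localhost, a hypothetical S3 endpoint domain, and a hypothetical
# local domain used for Dremio's internal calls.
NON_PROXY_HOSTS="localhost|127.0.0.1|*.s3.example.internal|*.example.internal"

# Double quotes keep "|" and "*" from being interpreted by the shell here.
DREMIO_JAVA_EXTRA_OPTS="-Dhttp.nonProxyHosts=${NON_PROXY_HOSTS} -Dhttp.proxyHost=${PROXY_HOST} -Dhttp.proxyPort=${PROXY_PORT} -Dhttps.proxyHost=${PROXY_HOST} -Dhttps.proxyPort=${PROXY_PORT}"
```

Any host that matches the bypass list is contacted directly; everything else, including oauth2.googleapis.com, goes through the proxy.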

Unfortunately, this proxy configuration was not picked up during asynchronous access to Parquet files, so to make it work we had to disable that feature (Enable asynchronous access when possible), which has a big performance impact when reading Parquet files.

After analysing the Dremio code, we noticed that the AsyncHttpClient library is used for asynchronous connections, and that in its default configuration reading the standard proxy properties is disabled. To enable it, you can edit the ahc-default.properties file located in /dremio/jars/3rdparty/async-http-client-2.7.0.jar, in the folder org/asynchttpclient/config, and set org.asynchttpclient.useProxyProperties to true.
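The edit to the properties file could be scripted roughly as follows. This is a sketch only: the function name enable_ahc_proxy_properties is hypothetical, and it assumes ahc-default.properties has already been extracted from the jar to a regular file. Extracting the file and writing it back into the jar (for example with a zip tool or the JDK jar tool) is left out, since that depends on what is installed.

```shell
# Set org.asynchttpclient.useProxyProperties=true in an extracted copy of
# ahc-default.properties. Appends the key if the file does not define it.
enable_ahc_proxy_properties() {
  props_file="$1"
  key="org.asynchttpclient.useProxyProperties"
  if grep -q "^${key}=" "$props_file"; then
    # Rewrite the existing line via a temp file (avoids non-portable sed -i).
    sed "s/^${key}=.*/${key}=true/" "$props_file" > "${props_file}.tmp" &&
      mv "${props_file}.tmp" "$props_file"
  else
    printf '%s=true\n' "$key" >> "$props_file"
  fi
}
```

It would be called as enable_ahc_proxy_properties path/to/ahc-default.properties. Other keys in the file are left untouched.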

All these changes allow Dremio to work correctly with GCS and S3, but they require an ugly hack. Could you check whether there is a simpler working solution to this proxy issue with GCS?


Thanks for the detailed explanation, let me check this and get back to you

For me it also works to specify the ahc-default.properties parameter as an additional option in DREMIO_JAVA_EXTRA_OPTS:
DREMIO_JAVA_EXTRA_OPTS="-Dorg.asynchttpclient.useProxyProperties=true -Dhttp.nonProxyHosts=localhost|127.0.0.1|*.S3.domain|*.localHost.domain -Dhttp.proxyHost=… -Dhttp.proxyPort=… -Dhttps.proxyHost=… -Dhttps.proxyPort=…"
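If dremio-env already sets other JVM options, the AsyncHttpClient flag can be appended instead of rewriting the whole line. A minimal sketch, assuming it is placed after any existing DREMIO_JAVA_EXTRA_OPTS assignment in dremio-env; the helper variable AHC_FLAG is a made-up name:

```shell
# Append the AsyncHttpClient proxy flag to whatever DREMIO_JAVA_EXTRA_OPTS
# already contains, unless the flag is already present.
AHC_FLAG="-Dorg.asynchttpclient.useProxyProperties=true"
case " ${DREMIO_JAVA_EXTRA_OPTS:-} " in
  *" ${AHC_FLAG} "*) ;;  # already set, nothing to do
  *) DREMIO_JAVA_EXTRA_OPTS="${DREMIO_JAVA_EXTRA_OPTS:-} ${AHC_FLAG}" ;;
esac
```

This avoids modifying the jar, so the setting should not be lost when the jar is replaced by an upgrade.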