I keep running into this error when querying parquet from AWS S3:
Unable to execute HTTP request: Timeout waiting for connection from pool
It’s not against all data sources, but enough to cause pain.
From the research I’ve done, it looks like I need to make sure the
fs.s3a.connection.maximum connection property is set to a high number. We currently have it set to
100000. I could easily raise the value to a million, but before I reset all our S3 data sources, I want to rule out something else going on.
Is anyone else seeing this kind of behavior? Do I really need to raise the connection property to a million? And if I do, is there a better way to apply the change than through the GUI? Rebuilding all the data sources is a pain.
Any help is appreciated. Thank you!
fs.s3a.connection.maximum = 100000 should be sufficient. Can you try increasing
fs.s3a.threads.max too? Just make sure that
fs.s3a.connection.maximum stays at least as large as fs.s3a.threads.max.
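For reference, both properties can be expressed in core-site.xml form like this (the threads value below is only an illustrative placeholder, not a recommendation; keep it below the connection maximum):

```xml
<configuration>
  <property>
    <name>fs.s3a.connection.maximum</name>
    <value>100000</value>
  </property>
  <property>
    <name>fs.s3a.threads.max</name>
    <!-- Illustrative value only; keep this below fs.s3a.connection.maximum -->
    <value>512</value>
  </property>
</configuration>
```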
Also, I’m curious why you say “rebuilding all the data sources is a pain”? It should be a quick and easy update.
Are the Dremio executors running on AWS as EC2 instances? If so, you may need to add this parameter to core-site.xml, drop the file under $DREMIO_HOME/conf on all executors, and restart them:
<property>
  <name>fs.s3a.connection.maximum</name>
  <value>100000</value>
</property>
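Copying that file out by hand gets tedious with many executors. Here is a small shell sketch of one way to script it; the host names are placeholders, the restart command depends on how Dremio was installed on your nodes, and DRY_RUN=1 only prints the commands it would run:

```shell
# Sketch: push the updated core-site.xml to every executor and restart Dremio.
# executor1/executor2 are placeholder host names; adjust for your cluster.
DRY_RUN=1
DREMIO_HOME="${DREMIO_HOME:-/opt/dremio}"   # assumed install location

run() {
  # With DRY_RUN set, just echo the command instead of executing it.
  if [ -n "$DRY_RUN" ]; then echo "+ $*"; else "$@"; fi
}

for host in executor1 executor2; do
  run scp core-site.xml "dremio@$host:$DREMIO_HOME/conf/core-site.xml"
  # Restart command is an assumption; use whatever manages Dremio on your hosts.
  run ssh "dremio@$host" "sudo systemctl restart dremio"
done
```

Unset DRY_RUN once the printed commands look right for your environment.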
@anthony, I meant to say “Data Sets”. We have a ton of data locations in S3 Buckets and when I change any setting on the S3 “Data Source” it drops all the defined “Data Sets”. The only way I know how to create “Data Sets” is to navigate to each folder in the interface. So, unless there is a different way to define “Data Sets” I’ll have to do that through the UI when I update the S3 “Data Source”.
@balaji.ramaswamy, does that mean that the S3 Data Source setup on the Dremio GUI doesn’t propagate those settings to all the executors in the cluster?
@balaji.ramaswamy and @anthony I wanted to give you an update on the changes I have made so far while trying to fix the problem.
- I have NOT updated the S3 Source through the UI yet, because I was hoping @anthony could fill me in on a better way to create DataSets so I don’t have to go to each one and recreate them when I update the properties of the S3 Source.
- I updated the core-site.xml files on the master and all the executors with the change @balaji.ramaswamy suggested, and I was able to set up the dataset on the problem parquet location. I had assumed that the properties entered through the UI for the S3 Source propagated to all the executors, but after making the change via core-site.xml, I now assume that is not true. Are the GUI S3 Source properties only used for UI operations? Could you guys shed some light on that?
Thanks for the response @balaji.ramaswamy. I think I’m confused by the terminology.
1.) When we create a Data Set in S3, we use the UI to navigate to the location in an S3 bucket and click the Action button to convert the folder to a Data Set. I’m not sure if that’s a Physical Data Set or a Virtual Data Set. I guess I see them as Purple or Green? In this case, to create Purple Data sets, we have to do the navigation thing in the UI.
When we change metadata in the S3 Source configuration, it clears all the purple Data Sets defined in that source. So, setting them back up is a pain. (I’m sorry guys, that’s the best way I know how to explain it.)
2.) Yes, that makes sense @balaji.ramaswamy!
Thanks for confirming on #2
On #1, I am now on the same page.
Initially, when you promote an S3 folder, it is a PDS (Purple).
You can do transformations on a PDS and save the result as a VDS (Green).
You can promote via the UI, or through the REST API and script it all.
Promote to PDS Via REST API
REST API Reference
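To sketch what scripting the promotion might look like, here is a minimal Python example against the Dremio v3 catalog API. The coordinator URL, token, and folder path are placeholders, and the exact payload shape and auth header should be checked against the REST API reference linked above:

```python
import json
import urllib.parse
import urllib.request

DREMIO_URL = "http://localhost:9047"    # placeholder coordinator URL
TOKEN = "<personal-access-token>"       # placeholder auth token

def promotion_payload(folder_id, path):
    # Body for promoting a folder of Parquet files to a physical dataset (PDS).
    return {
        "entityType": "dataset",
        "id": folder_id,
        "path": path,                   # e.g. ["s3source", "bucket", "folder"]
        "type": "PHYSICAL_DATASET",
        "format": {"type": "Parquet"},
    }

def catalog_url(path):
    # POST target is /api/v3/catalog/{id}, with the folder id URL-encoded.
    folder_id = "dremio:/" + "/".join(path)
    return DREMIO_URL + "/api/v3/catalog/" + urllib.parse.quote(folder_id, safe="")

def promote(path):
    folder_id = "dremio:/" + "/".join(path)
    body = json.dumps(promotion_payload(folder_id, path)).encode()
    req = urllib.request.Request(
        catalog_url(path),
        data=body,
        headers={
            # Dremio's token header scheme; verify against your version's docs.
            "Authorization": "_dremio" + TOKEN,
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Looping `promote()` over a list of folder paths would let you re-promote all the PDSs after a source update instead of clicking through each folder in the UI.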
Kindly let me know if you have any other questions
That does make sense @balaji.ramaswamy, thank you so much for all your help!