Dremio API 104 closed by remote host error

While trying to kick off a series of API calls from different Docker containers, I’m getting a 104 error: ('Connection aborted.', error(104, 'Connection reset by peer'))

I can run the API calls from one container, but the instant I kick off the second group of calls, which queries a different table, both sets of code fail because the connection is closed by the host. I even tried creating separate user accounts for the API calls. Any help here would be great.

Hi @R_Dirden

Do you see any jobs for this on the Jobs page in the Dremio UI? If yes, can you please send us the profile? If not, kindly send us the server.log.

Dremio Logs
Share a Query Profile

Thanks
@balaji.ramaswamy

The server.log file is not being updated. Also, there’s no failed query to pull a profile from, because the call never makes it through to Dremio. Is there some type of setting that governs API calls from different sessions?

Hi,

If the logs are not being updated, then it sounds like a networking issue. Can you ping the Dremio UI and the first container from the second container?
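
One way to run the reachability check suggested above from inside each container is a plain TCP connect test. This is only a minimal sketch: the function name can_connect, the host name "dremio", and port 9047 (Dremio's usual web/REST port) are assumptions rather than details from this thread, and the code is written for Python 3.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened.

    Hypothetical helper: it only checks that the port accepts connections,
    not that the REST API behind it is healthy.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


if __name__ == "__main__":
    # "dremio" and 9047 are placeholders; substitute the real host and port.
    print(can_connect("dremio", 9047))
```

If this returns True from both containers while the parallel runs still fail, basic connectivity is probably not the problem, and the reset is more likely tied to load or connection handling.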

I’m able to kick off the code and continuously submit queries from either of the 2 containers individually. They only fail when I run the code from both containers in parallel. So the 2 containers can definitely talk to Dremio and actively submit and receive data via the API.

On the coordinator, under Node Activity, do you see both nodes?

I’m using a single-node Dremio instance, and I’m accessing that instance from 2 separate Python containers.

Hi @R_Dirden

We would need Dremio to be running on the containers if they need to execute queries… We are a little confused by your architecture. Can you please explain?

Thanks
@balaji.ramaswamy

Dremio 3.0 is running as a single node within a container. I also have 2 idle Python containers that I use to start and stop processes. If I go into either of the 2 Python containers and start a process that uses Dremio’s API, it works without any issues. However, if I go into the container that’s not running anything and start a process while the first Python process is still running, both fail immediately. I’m not sure why they run without issues individually but fail when I run them at the same time.

Hi @R_Dirden

When you say "if I go into either of the 2 python containers and start a process that use Dremio’s API", do you mean running a query?

Thanks
@balaji.ramaswamy

@R_Dirden

Can you give us the exact HTTP response when it’s failing? Is there perhaps a proxy in front of the Dremio instance?

Container 1 is running, then fails almost immediately after container 2 starts:
Status {u’queueId’: u’LARGE’, u’jobState’: u’RUNNING’, u’resourceSchedulingStartedAt’: u’2019-01-15T17:31:55.858Z’, u’errorMessage’: u’’, u’queryType’: u’REST’, u’rowCount’: 0, u’resourceSchedulingEndedAt’: u’2019-01-15T17:31:56.606Z’, u’startedAt’: u’2019-01-15T17:31:54.139Z’, u’queueName’: u’LARGE’}
Status {u’queueId’: u’LARGE’, u’jobState’: u’RUNNING’, u’resourceSchedulingStartedAt’: u’2019-01-15T17:31:55.858Z’, u’errorMessage’: u’’, u’queryType’: u’REST’, u’rowCount’: 0, u’resourceSchedulingEndedAt’: u’2019-01-15T17:31:56.606Z’, u’startedAt’: u’2019-01-15T17:31:54.139Z’, u’queueName’: u’LARGE’}
Status {u’queueId’: u’LARGE’, u’jobState’: u’RUNNING’, u’resourceSchedulingStartedAt’: u’2019-01-15T17:31:55.858Z’, u’errorMessage’: u’’, u’queryType’: u’REST’, u’rowCount’: 0, u’resourceSchedulingEndedAt’: u’2019-01-15T17:31:56.606Z’, u’startedAt’: u’2019-01-15T17:31:54.139Z’, u’queueName’: u’LARGE’}
Traceback (most recent call last):
  File "Correlation.py", line 171, in <module>
    qryRowCount, qryRowArray = checkRunStatus(arrJobList)
  File "Correlation.py", line 70, in checkRunStatus
    response = requests.request("GET", urlStatus + sJobId, headers=headers)
  File "/usr/local/lib/python2.7/site-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/adapters.py", line 490, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', error(104, 'Connection reset by peer'))

Container 2 fails after making a few successful calls:
Status {u’queueId’: u’LARGE’, u’jobState’: u’RUNNING’, u’resourceSchedulingStartedAt’: u’2019-01-15T17:31:24.609Z’, u’errorMessage’: u’’, u’queryType’: u’REST’, u’rowCount’: 0, u’resourceSchedulingEndedAt’: u’2019-01-15T17:31:25.275Z’, u’startedAt’: u’2019-01-15T17:31:24.314Z’, u’queueName’: u’LARGE’}
Status {u’queueId’: u’LARGE’, u’jobState’: u’RUNNING’, u’resourceSchedulingStartedAt’: u’2019-01-15T17:31:24.609Z’, u’errorMessage’: u’’, u’queryType’: u’REST’, u’rowCount’: 0, u’resourceSchedulingEndedAt’: u’2019-01-15T17:31:25.275Z’, u’startedAt’: u’2019-01-15T17:31:24.314Z’, u’queueName’: u’LARGE’}
Status {u’queueId’: u’LARGE’, u’jobState’: u’RUNNING’, u’resourceSchedulingStartedAt’: u’2019-01-15T17:31:24.609Z’, u’errorMessage’: u’’, u’queryType’: u’REST’, u’rowCount’: 0, u’resourceSchedulingEndedAt’: u’2019-01-15T17:31:25.275Z’, u’startedAt’: u’2019-01-15T17:31:24.314Z’, u’queueName’: u’LARGE’}
Status {u’queueId’: u’LARGE’, u’jobState’: u’RUNNING’, u’resourceSchedulingStartedAt’: u’2019-01-15T17:31:24.609Z’, u’errorMessage’: u’’, u’queryType’: u’REST’, u’rowCount’: 0, u’resourceSchedulingEndedAt’: u’2019-01-15T17:31:25.275Z’, u’startedAt’: u’2019-01-15T17:31:24.314Z’, u’queueName’: u’LARGE’}
Status {u’queueId’: u’LARGE’, u’jobState’: u’RUNNING’, u’resourceSchedulingStartedAt’: u’2019-01-15T17:31:24.609Z’, u’errorMessage’: u’’, u’queryType’: u’REST’, u’rowCount’: 0, u’resourceSchedulingEndedAt’: u’2019-01-15T17:31:25.275Z’, u’startedAt’: u’2019-01-15T17:31:24.314Z’, u’queueName’: u’LARGE’}
Traceback (most recent call last):
  File "Correlation2.py", line 89, in <module>
    tagCount, tagRowCounts = checkRunStatus(arrTagListId)
  File "Correlation2.py", line 70, in checkRunStatus
    response = requests.request("GET", urlStatus + sJobId, headers=headers)
  File "/usr/local/lib/python2.7/site-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/adapters.py", line 490, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', error(104, 'Connection reset by peer'))

On the Jobs page in the Dremio UI, are any failed jobs listed? If not, then it looks like a networking setup issue, since the queries are not even arriving at Dremio.

Yeah, I believe it may be some sort of networking issue as well. I may just be making way too many calls back to back.
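
If the back-to-back status calls are the trigger, one thing to try is pacing the polling loop and retrying after a reset instead of crashing. The sketch below is not the original checkRunStatus code and is not a confirmed fix. The name poll_until_done, the fetch_status callable, and the list of terminal job states are assumptions for illustration. It is written for Python 3, while the tracebacks above come from Python 2.7.

```python
import time


def poll_until_done(fetch_status, job_id, interval=1.0, max_retries=5,
                    terminal_states=("COMPLETED", "FAILED", "CANCELED"),
                    retry_on=(OSError,), sleep=time.sleep):
    """Poll a job's status at a fixed interval, retrying on connection errors.

    fetch_status is a hypothetical callable that takes a job id and returns
    the decoded status dict (the kind printed as "Status {...}" above).
    terminal_states is an assumed list; adjust it to the states your Dremio
    version actually reports.
    """
    failures = 0
    while True:
        try:
            status = fetch_status(job_id)
        except retry_on:
            # requests.exceptions.ConnectionError derives from IOError/OSError,
            # so the default retry_on also covers the error in the traceback.
            failures += 1
            if failures > max_retries:
                raise
            # Exponential backoff: wait longer after each consecutive failure.
            sleep(interval * 2 ** failures)
            continue
        failures = 0
        if status.get("jobState") in terminal_states:
            return status
        # Pause between polls so the server is not hit in a tight loop.
        sleep(interval)
```

Here fetch_status could simply wrap the existing requests.request("GET", urlStatus + sJobId, headers=headers) call and return the parsed JSON. Passing sleep as a parameter is only there to make the loop easy to test without real waiting.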