Hey Guys
In short:
We encountered the “EOFException: Unexpected end of input stream.” error when we tried to query the aws load balancer access log s3 bucket. It looks like there are 0 byte *.log.gz files in there which might be tripping dremio up. It would be good if dremio could handle this error.
In more detail:
We managed to reproduce this error in the following dremio versions:
- 21.1.1
- 20.1.0
The error appears in the dremio ui like this:
In the server.log file, it appears something like this:
2022-05-17 23:03:01,546 [Fabric-RPC-Offload46] INFO c.d.exec.work.foreman.AttemptManager - 1d7be11c-a461-5b08-e129-23445906cf00: State change requested RUNNING --> FAILED, Exception com.dremio.common.exceptions.UserRemoteException: SYSTEM ERROR: EOFException: Unexpected end of input stream
SqlOperatorImpl TEXT_SUB_SCAN
Location 2:0:6
Fragment 2:0
[Error Id: c417bf74-0807-48ea-a0d2-a44131fa48b3 on ip-172-18-61-130.ec2.internal:0]
(com.dremio.common.exceptions.ExecutionSetupException) Failure while setting up text reader for file dremioS3:/XXXBucketNameXXX/XXXLoadBalancerNameXXX/AWSLogs/XXXAwsAccountIdXXX/elasticloadbalancing/us-east-1/2021/05/15/XXXAwsAccountIdXXX_elasticloadbalancing_us-east-1_app.XXXLoadBalancerNameXXX.XXXSomeIdLookingHexNumXXX_20210515T1950Z_34.234.214.74_1i9ez6hk.log.gz
com.dremio.exec.store.easy.text.compliant.CompliantTextRecordReader.setup():148
com.dremio.exec.store.dfs.implicit.AdditionalColumnsRecordReader.setup():75
com.dremio.sabot.op.scan.ScanOperator.setupReaderAsCorrectUser():311
com.dremio.sabot.op.scan.ScanOperator.setupReader():302
com.dremio.sabot.op.scan.ScanOperator.outputData():355
com.dremio.sabot.driver.SmartOp$SmartProducer.outputData():521
com.dremio.sabot.driver.StraightPipe.pump():56
com.dremio.sabot.driver.Pipeline.doPump():111
com.dremio.sabot.driver.Pipeline.pumpOnce():101
com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run():371
com.dremio.sabot.exec.fragment.FragmentExecutor.run():308
com.dremio.sabot.exec.fragment.FragmentExecutor.access$1600():95
com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run():773
com.dremio.sabot.task.AsyncTaskWrapper.run():120
com.dremio.sabot.task.slicing.SlicingThread.mainExecutionLoop():243
com.dremio.sabot.task.slicing.SlicingThread.run():171
Caused By (java.io.EOFException) Unexpected end of input stream
org.apache.hadoop.io.compress.DecompressorStream.decompress():165
org.apache.hadoop.io.compress.DecompressorStream.read():105
com.dremio.exec.hadoop.CompressionInputStreamWrapper.read():51
com.dremio.exec.hadoop.CompressionInputStreamWrapper.readDirectBuffer():80
com.dremio.exec.hadoop.CompressionInputStreamWrapper.read():59
com.dremio.exec.store.easy.text.compliant.TextInput.read():185
com.dremio.exec.store.easy.text.compliant.TextInput.updateBuffer():202
com.dremio.exec.store.easy.text.compliant.TextInput.start():141
com.dremio.exec.store.easy.text.compliant.TextReader.start():360
com.dremio.exec.store.easy.text.compliant.CompliantTextRecordReader.readFirstLineForColumnNames():172
com.dremio.exec.store.easy.text.compliant.CompliantTextRecordReader.generateColumnNames():251
com.dremio.exec.store.easy.text.compliant.CompliantTextRecordReader.setup():121
com.dremio.exec.store.dfs.implicit.AdditionalColumnsRecordReader.setup():75
com.dremio.sabot.op.scan.ScanOperator.setupReaderAsCorrectUser():311
com.dremio.sabot.op.scan.ScanOperator.setupReader():302
com.dremio.sabot.op.scan.ScanOperator.outputData():355
com.dremio.sabot.driver.SmartOp$SmartProducer.outputData():521
com.dremio.sabot.driver.StraightPipe.pump():56
com.dremio.sabot.driver.Pipeline.doPump():111
com.dremio.sabot.driver.Pipeline.pumpOnce():101
com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run():371
com.dremio.sabot.exec.fragment.FragmentExecutor.run():308
com.dremio.sabot.exec.fragment.FragmentExecutor.access$1600():95
com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run():773
com.dremio.sabot.task.AsyncTaskWrapper.run():120
com.dremio.sabot.task.slicing.SlicingThread.mainExecutionLoop():243
com.dremio.sabot.task.slicing.SlicingThread.run():171
2022-05-17 23:03:01,549 [FABRIC-rpc-event-queue] INFO c.dremio.sabot.exec.FragmentHandler - set cancel for fragment 1:1
Here is an excerpt from server.log from one of the other nodes:
2022-05-17 23:03:00,912 [e0 - 1d7be11c-a461-5b08-e129-23445906cf00:frag:2:0] ERROR com.dremio.sabot.driver.SmartOp - EOFException: Unexpected end of input stream
com.dremio.common.exceptions.UserException: EOFException: Unexpected end of input stream
at com.dremio.common.exceptions.UserException$Builder.build(UserException.java:885)
at com.dremio.sabot.driver.SmartOp.contextualize(SmartOp.java:140)
at com.dremio.sabot.driver.SmartOp$SmartProducer.outputData(SmartOp.java:523)
at com.dremio.sabot.driver.StraightPipe.pump(StraightPipe.java:56)
at com.dremio.sabot.driver.Pipeline.doPump(Pipeline.java:111)
at com.dremio.sabot.driver.Pipeline.pumpOnce(Pipeline.java:101)
at com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run(FragmentExecutor.java:371)
at com.dremio.sabot.exec.fragment.FragmentExecutor.run(FragmentExecutor.java:308)
at com.dremio.sabot.exec.fragment.FragmentExecutor.access$1600(FragmentExecutor.java:95)
at com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run(FragmentExecutor.java:773)
at com.dremio.sabot.task.AsyncTaskWrapper.run(AsyncTaskWrapper.java:120)
at com.dremio.sabot.task.slicing.SlicingThread.mainExecutionLoop(SlicingThread.java:243)
at com.dremio.sabot.task.slicing.SlicingThread.run(SlicingThread.java:171)
Caused by: com.dremio.common.exceptions.ExecutionSetupException: Failure while setting up text reader for file dremioS3:/XXXBucketNameXXX/XXXLoadBalancerNameXXX/AWSLogs/XXXAwsAccountIdXXX/elasticloadbalancing/us-east-1/2021/05/15/XXXAwsAccountIdXXX_elasticloadbalancing_us-east-1_app.XXXLoadBalancerNameXXX.XXXSomeIdLookingHexNumXXX_20210515T1950Z_34.234.214.74_1i9ez6hk.log.gz
at com.dremio.exec.store.easy.text.compliant.CompliantTextRecordReader.setup(CompliantTextRecordReader.java:148)
at com.dremio.exec.store.dfs.implicit.AdditionalColumnsRecordReader.setup(AdditionalColumnsRecordReader.java:75)
at com.dremio.sabot.op.scan.ScanOperator.setupReaderAsCorrectUser(ScanOperator.java:311)
at com.dremio.sabot.op.scan.ScanOperator.setupReader(ScanOperator.java:302)
at com.dremio.sabot.op.scan.ScanOperator.outputData(ScanOperator.java:355)
at com.dremio.sabot.driver.SmartOp$SmartProducer.outputData(SmartOp.java:521)
... 10 common frames omitted
Caused by: java.io.EOFException: Unexpected end of input stream
at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:165)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:105)
at com.dremio.exec.hadoop.CompressionInputStreamWrapper.read(CompressionInputStreamWrapper.java:51)
at com.dremio.exec.hadoop.CompressionInputStreamWrapper.readDirectBuffer(CompressionInputStreamWrapper.java:80)
at com.dremio.exec.hadoop.CompressionInputStreamWrapper.read(CompressionInputStreamWrapper.java:59)
at com.dremio.exec.store.easy.text.compliant.TextInput.read(TextInput.java:185)
at com.dremio.exec.store.easy.text.compliant.TextInput.updateBuffer(TextInput.java:202)
at com.dremio.exec.store.easy.text.compliant.TextInput.start(TextInput.java:141)
at com.dremio.exec.store.easy.text.compliant.TextReader.start(TextReader.java:360)
at com.dremio.exec.store.easy.text.compliant.CompliantTextRecordReader.readFirstLineForColumnNames(CompliantTextRecordReader.java:172)
at com.dremio.exec.store.easy.text.compliant.CompliantTextRecordReader.generateColumnNames(CompliantTextRecordReader.java:251)
at com.dremio.exec.store.easy.text.compliant.CompliantTextRecordReader.setup(CompliantTextRecordReader.java:121)
... 15 common frames omitted
2022-05-17 23:03:01,553 [FABRIC-rpc-event-queue] INFO c.dremio.sabot.exec.FragmentHandler - set cancel for fragment 0:0
I guess the log messages look slightly different depending on what role the node fulfilled for the query.
Looking into the specific s3 file referenced in the error message, it looks like the file is a *.log.gz file 0 bytes in size:
— image removed —
— sorry it looks like i’m only allowed to attach one image per post as I am new forum user —
Initially when I pointed dremio at the s3 bucket, even though the log files are zipped up in the log.gz format, it looks like dremio transparently unzipped the gz file and inspected the contents. As a guess, dremio is not handling the edge case that the log.gz file is an empty file and the query process is crashing out.
Before anyone makes a suggestion, please consider the following preemptive replies:
- I cannot change the way these log.gz files are produced as the access log archiving process is automatically done by aws.
- there is no chance of getting aws to change their access log output format or behaviour as aws wants everyone to us their AWS Athena for access log queries and athena handles this scenario without any issues.
- there is no chance of us preprocessing these log files to remove the 0 byte log gz files as editing the contents of these access log s3 buckets violates the sanctity of these audit orientated tools
Please address this issue as it might be affecting more customers than the ones actually bothering to write up a post.
Thanks