"EOFException: Unexpected end of input stream" when trying to query aws load balancer access logs due to 0 byte log.gz file

Hey Guys

In short:
We encountered the “EOFException: Unexpected end of input stream.” error when we tried to query the aws load balancer access log s3 bucket. It looks like there are 0 byte *.log.gz files in there which might be tripping dremio up. It would be good if dremio could handle this error.

In more detail:
We managed to reproduce this error in the following dremio versions:

  • 21.1.1
  • 20.1.0

The error appears in the dremio ui like this:
image

In the server.log file, it appears something like this:

2022-05-17 23:03:01,546 [Fabric-RPC-Offload46] INFO  c.d.exec.work.foreman.AttemptManager - 1d7be11c-a461-5b08-e129-23445906cf00: State change requested RUNNING --> FAILED, Exception com.dremio.common.exceptions.UserRemoteException: SYSTEM ERROR: EOFException: Unexpected end of input stream

SqlOperatorImpl TEXT_SUB_SCAN
Location 2:0:6
Fragment 2:0

[Error Id: c417bf74-0807-48ea-a0d2-a44131fa48b3 on ip-172-18-61-130.ec2.internal:0]

  (com.dremio.common.exceptions.ExecutionSetupException) Failure while setting up text reader for file dremioS3:/XXXBucketNameXXX/XXXLoadBalancerNameXXX/AWSLogs/XXXAwsAccountIdXXX/elasticloadbalancing/us-east-1/2021/05/15/XXXAwsAccountIdXXX_elasticloadbalancing_us-east-1_app.XXXLoadBalancerNameXXX.XXXSomeIdLookingHexNumXXX_20210515T1950Z_34.234.214.74_1i9ez6hk.log.gz
    com.dremio.exec.store.easy.text.compliant.CompliantTextRecordReader.setup():148
    com.dremio.exec.store.dfs.implicit.AdditionalColumnsRecordReader.setup():75
    com.dremio.sabot.op.scan.ScanOperator.setupReaderAsCorrectUser():311
    com.dremio.sabot.op.scan.ScanOperator.setupReader():302
    com.dremio.sabot.op.scan.ScanOperator.outputData():355
    com.dremio.sabot.driver.SmartOp$SmartProducer.outputData():521
    com.dremio.sabot.driver.StraightPipe.pump():56
    com.dremio.sabot.driver.Pipeline.doPump():111
    com.dremio.sabot.driver.Pipeline.pumpOnce():101
    com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run():371
    com.dremio.sabot.exec.fragment.FragmentExecutor.run():308
    com.dremio.sabot.exec.fragment.FragmentExecutor.access$1600():95
    com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run():773
    com.dremio.sabot.task.AsyncTaskWrapper.run():120
    com.dremio.sabot.task.slicing.SlicingThread.mainExecutionLoop():243
    com.dremio.sabot.task.slicing.SlicingThread.run():171
  Caused By (java.io.EOFException) Unexpected end of input stream
    org.apache.hadoop.io.compress.DecompressorStream.decompress():165
    org.apache.hadoop.io.compress.DecompressorStream.read():105
    com.dremio.exec.hadoop.CompressionInputStreamWrapper.read():51
    com.dremio.exec.hadoop.CompressionInputStreamWrapper.readDirectBuffer():80
    com.dremio.exec.hadoop.CompressionInputStreamWrapper.read():59
    com.dremio.exec.store.easy.text.compliant.TextInput.read():185
    com.dremio.exec.store.easy.text.compliant.TextInput.updateBuffer():202
    com.dremio.exec.store.easy.text.compliant.TextInput.start():141
    com.dremio.exec.store.easy.text.compliant.TextReader.start():360
    com.dremio.exec.store.easy.text.compliant.CompliantTextRecordReader.readFirstLineForColumnNames():172
    com.dremio.exec.store.easy.text.compliant.CompliantTextRecordReader.generateColumnNames():251
    com.dremio.exec.store.easy.text.compliant.CompliantTextRecordReader.setup():121
    com.dremio.exec.store.dfs.implicit.AdditionalColumnsRecordReader.setup():75
    com.dremio.sabot.op.scan.ScanOperator.setupReaderAsCorrectUser():311
    com.dremio.sabot.op.scan.ScanOperator.setupReader():302
    com.dremio.sabot.op.scan.ScanOperator.outputData():355
    com.dremio.sabot.driver.SmartOp$SmartProducer.outputData():521
    com.dremio.sabot.driver.StraightPipe.pump():56
    com.dremio.sabot.driver.Pipeline.doPump():111
    com.dremio.sabot.driver.Pipeline.pumpOnce():101
    com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run():371
    com.dremio.sabot.exec.fragment.FragmentExecutor.run():308
    com.dremio.sabot.exec.fragment.FragmentExecutor.access$1600():95
    com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run():773
    com.dremio.sabot.task.AsyncTaskWrapper.run():120
    com.dremio.sabot.task.slicing.SlicingThread.mainExecutionLoop():243
    com.dremio.sabot.task.slicing.SlicingThread.run():171

2022-05-17 23:03:01,549 [FABRIC-rpc-event-queue] INFO  c.dremio.sabot.exec.FragmentHandler - set cancel for fragment 1:1

Here is an excerpt from server.log from one of the other nodes:

2022-05-17 23:03:00,912 [e0 - 1d7be11c-a461-5b08-e129-23445906cf00:frag:2:0] ERROR com.dremio.sabot.driver.SmartOp - EOFException: Unexpected end of input stream
com.dremio.common.exceptions.UserException: EOFException: Unexpected end of input stream
        at com.dremio.common.exceptions.UserException$Builder.build(UserException.java:885)
        at com.dremio.sabot.driver.SmartOp.contextualize(SmartOp.java:140)
        at com.dremio.sabot.driver.SmartOp$SmartProducer.outputData(SmartOp.java:523)
        at com.dremio.sabot.driver.StraightPipe.pump(StraightPipe.java:56)
        at com.dremio.sabot.driver.Pipeline.doPump(Pipeline.java:111)
        at com.dremio.sabot.driver.Pipeline.pumpOnce(Pipeline.java:101)
        at com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run(FragmentExecutor.java:371)
        at com.dremio.sabot.exec.fragment.FragmentExecutor.run(FragmentExecutor.java:308)
        at com.dremio.sabot.exec.fragment.FragmentExecutor.access$1600(FragmentExecutor.java:95)
        at com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run(FragmentExecutor.java:773)
        at com.dremio.sabot.task.AsyncTaskWrapper.run(AsyncTaskWrapper.java:120)
        at com.dremio.sabot.task.slicing.SlicingThread.mainExecutionLoop(SlicingThread.java:243)
        at com.dremio.sabot.task.slicing.SlicingThread.run(SlicingThread.java:171)
Caused by: com.dremio.common.exceptions.ExecutionSetupException: Failure while setting up text reader for file dremioS3:/XXXBucketNameXXX/XXXLoadBalancerNameXXX/AWSLogs/XXXAwsAccountIdXXX/elasticloadbalancing/us-east-1/2021/05/15/XXXAwsAccountIdXXX_elasticloadbalancing_us-east-1_app.XXXLoadBalancerNameXXX.XXXSomeIdLookingHexNumXXX_20210515T1950Z_34.234.214.74_1i9ez6hk.log.gz
        at com.dremio.exec.store.easy.text.compliant.CompliantTextRecordReader.setup(CompliantTextRecordReader.java:148)
        at com.dremio.exec.store.dfs.implicit.AdditionalColumnsRecordReader.setup(AdditionalColumnsRecordReader.java:75)
        at com.dremio.sabot.op.scan.ScanOperator.setupReaderAsCorrectUser(ScanOperator.java:311)
        at com.dremio.sabot.op.scan.ScanOperator.setupReader(ScanOperator.java:302)
        at com.dremio.sabot.op.scan.ScanOperator.outputData(ScanOperator.java:355)
        at com.dremio.sabot.driver.SmartOp$SmartProducer.outputData(SmartOp.java:521)
        ... 10 common frames omitted
Caused by: java.io.EOFException: Unexpected end of input stream
        at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:165)
        at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:105)
        at com.dremio.exec.hadoop.CompressionInputStreamWrapper.read(CompressionInputStreamWrapper.java:51)
        at com.dremio.exec.hadoop.CompressionInputStreamWrapper.readDirectBuffer(CompressionInputStreamWrapper.java:80)
        at com.dremio.exec.hadoop.CompressionInputStreamWrapper.read(CompressionInputStreamWrapper.java:59)
        at com.dremio.exec.store.easy.text.compliant.TextInput.read(TextInput.java:185)
        at com.dremio.exec.store.easy.text.compliant.TextInput.updateBuffer(TextInput.java:202)
        at com.dremio.exec.store.easy.text.compliant.TextInput.start(TextInput.java:141)
        at com.dremio.exec.store.easy.text.compliant.TextReader.start(TextReader.java:360)
        at com.dremio.exec.store.easy.text.compliant.CompliantTextRecordReader.readFirstLineForColumnNames(CompliantTextRecordReader.java:172)
        at com.dremio.exec.store.easy.text.compliant.CompliantTextRecordReader.generateColumnNames(CompliantTextRecordReader.java:251)
        at com.dremio.exec.store.easy.text.compliant.CompliantTextRecordReader.setup(CompliantTextRecordReader.java:121)
        ... 15 common frames omitted
2022-05-17 23:03:01,553 [FABRIC-rpc-event-queue] INFO  c.dremio.sabot.exec.FragmentHandler - set cancel for fragment 0:0

I guess the log messages look slightly different depending on what role the node fulfilled for the query.

Looking into the specific s3 file referenced in the error message, it looks like the file is a *.log.gz file 0 bytes in size:
— image removed —
— sorry it looks like i’m only allowed to attach one image per post as I am new forum user —

Initially when I pointed dremio at the s3 bucket, even though the log files are zipped up in the log.gz format, it looks like dremio transparently unzipped the gz file and inspected the contents. As a guess, dremio is not handling the edge case that the log.gz file is an empty file and the query process is crashing out.

Before anyone makes a suggestion, please consider the following preemptive replies:

  • I cannot change the way these log.gz files are produced as the access log archiving process is automatically done by aws.
  • there is no chance of getting aws to change their access log output format or behaviour as aws wants everyone to us their AWS Athena for access log queries and athena handles this scenario without any issues.
  • there is no chance of us preprocessing these log files to remove the 0 byte log gz files as editing the contents of these access log s3 buckets violates the sanctity of these audit orientated tools

Please address this issue as it might be affecting more customers than the ones actually bothering to write up a post.

Thanks