How to create a datasource from an S3 folder with mixed storage classes?

Our older data automatically moves to archival S3 storage classes, but the folders, of course, remain in the same location. When I try to create a datasource from the folder, I get errors like:

Failure reading JSON file  ... com.amazonaws.services.s3.model.AmazonS3Exception: The operation is not valid for the object's storage class 

Is there any way to create a datasource that ignores files in infrequent-access storage classes, or to pre-filter them?

(I’m not quite sure why an infrequent-access storage class should cause an error in the first place, though.)

Hi @memelet,

If you added the source successfully but periodically get this error when accessing the directory after its contents have been archived, you could try refreshing its metadata in Dremio.

In the SQL editor (or via the REST API), try:

ALTER PDS <path.to.directory.in.Dremio> REFRESH METADATA

It fails when initially creating the datasource.

@memelet, can you try adding the source and then attach the Dremio server.log to this ticket? The full stack trace of the error would be useful.

Sure @ben. (And thanks for the help!)

I think these are all the relevant bits:

2019-02-04 00:28:33,866 [qtp1124973744-654] INFO  c.d.e.s.easy.json.JSONRecordReader - User Error Occurred [ErrorId: c654a694-5be8-460e-a779-ac85c2865aa8]
com.dremio.common.exceptions.UserException: Failure reading JSON file - s3a://systeminsights-connectdata/production/connect-plants-acutec-acutec/acutec/acutec/acutec_DT2030_M127B/2017-03-13/1_0_00000000003524693532: Reopen at position 0 on s3a://systeminsights-connectdata/production/connect-plants-acutec-acutec/acutec/acutec/acutec_DT2030_M127B/2017-03-13/1_0_00000000003524693532: com.amazonaws.services.s3.model.AmazonS3Exception: The operation is not valid for the object's storage class (Service: Amazon S3; Status Code: 403; Error Code: InvalidObjectState; Request ID: DF23E1C3A6BECC88), S3 Extended Request ID: 8rLZAOacBZ4gQQmPFtEEkM8UkxJ6qcajNsqeE7urBpZf4k+PKDSdO8qXNcDA5sasj6Qj79L96Dg=
	at com.dremio.common.exceptions.UserException$Builder.build(UserException.java:746) ~[dremio-common-3.1.1-201901281837360699-30c9d74.jar:3.1.1-201901281837360699-30c9d74]
	at com.dremio.exec.store.easy.json.JSONRecordReader.handleAndRaise(JSONRecordReader.java:180) [dremio-sabot-kernel-3.1.1-201901281837360699-30c9d74.jar:3.1.1-201901281837360699-30c9d74]
	at com.dremio.exec.store.easy.json.JSONRecordReader.setup(JSONRecordReader.java:146) [dremio-sabot-kernel-3.1.1-201901281837360699-30c9d74.jar:3.1.1-201901281837360699-30c9d74]
	at com.dremio.dac.model.sources.FormatTools.getData(FormatTools.java:327) [dremio-dac-backend-3.1.1-201901281837360699-30c9d74.jar:3.1.1-201901281837360699-30c9d74]
	at com.dremio.dac.model.sources.FormatTools.previewData(FormatTools.java:261) [dremio-dac-backend-3.1.1-201901281837360699-30c9d74.jar:3.1.1-201901281837360699-30c9d74]
	at com.dremio.dac.resource.SourceResource.previewFolderFormat(SourceResource.java:309) [dremio-dac-backend-3.1.1-201901281837360699-30c9d74.jar:3.1.1-201901281837360699-30c9d74]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_181]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_181]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_181]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_181]
	at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81) [jersey-server-2.25.1.jar:na]
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144) [jersey-server-2.25.1.jar:na]
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161) [jersey-server-2.25.1.jar:na]
	at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:205) [jersey-server-2.25.1.jar:na]
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99) [jersey-server-2.25.1.jar:na]
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389) [jersey-server-2.25.1.jar:na]
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347) [jersey-server-2.25.1.jar:na]
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102) [jersey-server-2.25.1.jar:na]
	at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:326) [jersey-server-2.25.1.jar:na]
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271) [jersey-common-2.25.1.jar:na]
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267) [jersey-common-2.25.1.jar:na]
	at org.glassfish.jersey.internal.Errors.process(Errors.java:315) [jersey-common-2.25.1.jar:na]
	at org.glassfish.jersey.internal.Errors.process(Errors.java:297) [jersey-common-2.25.1.jar:na]
	at org.glassfish.jersey.internal.Errors.process(Errors.java:267) [jersey-common-2.25.1.jar:na]
	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317) [jersey-common-2.25.1.jar:na]
	at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305) [jersey-server-2.25.1.jar:na]
	at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154) [jersey-server-2.25.1.jar:na]
	at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:473) [jersey-container-servlet-core-2.25.1.jar:na]
	at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427) [jersey-container-servlet-core-2.25.1.jar:na]
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388) [jersey-container-servlet-core-2.25.1.jar:na]
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341) [jersey-container-servlet-core-2.25.1.jar:na]
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228) [jersey-container-servlet-core-2.25.1.jar:na]
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:812) [jetty-servlet-9.2.26.v20180806.jar:9.2.26.v20180806]
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1669) [jetty-servlet-9.2.26.v20180806.jar:9.2.26.v20180806]
	at org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:83) [jetty-servlets-9.2.26.v20180806.jar:9.2.26.v20180806]
	at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:301) [jetty-servlets-9.2.26.v20180806.jar:9.2.26.v20180806]
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) [jetty-servlet-9.2.26.v20180806.jar:9.2.26.v20180806]
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) [jetty-servlet-9.2.26.v20180806.jar:9.2.26.v20180806]
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127) [jetty-server-9.2.26.v20180806.jar:9.2.26.v20180806]
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) [jetty-servlet-9.2.26.v20180806.jar:9.2.26.v20180806]
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061) [jetty-server-9.2.26.v20180806.jar:9.2.26.v20180806]
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) [jetty-server-9.2.26.v20180806.jar:9.2.26.v20180806]
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) [jetty-server-9.2.26.v20180806.jar:9.2.26.v20180806]
	at org.eclipse.jetty.server.handler.RequestLogHandler.handle(RequestLogHandler.java:95) [jetty-server-9.2.26.v20180806.jar:9.2.26.v20180806]
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) [jetty-server-9.2.26.v20180806.jar:9.2.26.v20180806]
	at org.eclipse.jetty.server.Server.handle(Server.java:499) [jetty-server-9.2.26.v20180806.jar:9.2.26.v20180806]
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311) [jetty-server-9.2.26.v20180806.jar:9.2.26.v20180806]
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:258) [jetty-server-9.2.26.v20180806.jar:9.2.26.v20180806]
	at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544) [jetty-io-9.2.26.v20180806.jar:9.2.26.v20180806]
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635) [jetty-util-9.2.26.v20180806.jar:9.2.26.v20180806]
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555) [jetty-util-9.2.26.v20180806.jar:9.2.26.v20180806]
	at java.lang.Thread.run(Thread.java:748) [na:1.8.0_181]
Caused by: java.nio.file.AccessDeniedException: s3a://systeminsights-connectdata/production/connect-plants-acutec-acutec/acutec/acutec/acutec_DT2030_M127B/2017-03-13/1_0_00000000003524693532: Reopen at position 0 on s3a://systeminsights-connectdata/production/connect-plants-acutec-acutec/acutec/acutec/acutec_DT2030_M127B/2017-03-13/1_0_00000000003524693532: com.amazonaws.services.s3.model.AmazonS3Exception: The operation is not valid for the object's storage class (Service: Amazon S3; Status Code: 403; Error Code: InvalidObjectState; Request ID: DF23E1C3A6BECC88), S3 Extended Request ID: 8rLZAOacBZ4gQQmPFtEEkM8UkxJ6qcajNsqeE7urBpZf4k+PKDSdO8qXNcDA5sasj6Qj79L96Dg=
	at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:158) ~[hadoop-aws-2.8.3.jar:na]
	at org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:155) ~[hadoop-aws-2.8.3.jar:na]
	at org.apache.hadoop.fs.s3a.S3AInputStream.lazySeek(S3AInputStream.java:281) ~[hadoop-aws-2.8.3.jar:na]
	at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:364) ~[hadoop-aws-2.8.3.jar:na]
	at java.io.DataInputStream.read(DataInputStream.java:149) ~[na:1.8.0_181]
	at com.dremio.exec.store.dfs.FSDataInputStreamWrapper$WrappedInputStream.read(FSDataInputStreamWrapper.java:250) ~[dremio-sabot-kernel-3.1.1-201901281837360699-30c9d74.jar:3.1.1-201901281837360699-30c9d74]
	at java.io.DataInputStream.read(DataInputStream.java:149) ~[na:1.8.0_181]
	at com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.ensureLoaded(ByteSourceJsonBootstrapper.java:522) ~[jackson-core-2.9.7.jar:2.9.7]
	at com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.detectEncoding(ByteSourceJsonBootstrapper.java:129) ~[jackson-core-2.9.7.jar:2.9.7]
	at com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.constructParser(ByteSourceJsonBootstrapper.java:246) ~[jackson-core-2.9.7.jar:2.9.7]
	at com.fasterxml.jackson.core.JsonFactory._createParser(JsonFactory.java:1315) ~[jackson-core-2.9.7.jar:2.9.7]
	at com.fasterxml.jackson.core.JsonFactory.createParser(JsonFactory.java:820) ~[jackson-core-2.9.7.jar:2.9.7]
	at com.dremio.exec.store.easy.json.reader.BaseJsonProcessor.setSource(BaseJsonProcessor.java:38) ~[dremio-sabot-kernel-3.1.1-201901281837360699-30c9d74.jar:3.1.1-201901281837360699-30c9d74]
	at com.dremio.exec.vector.complex.fn.JsonReader.setSource(JsonReader.java:162) ~[dremio-sabot-kernel-3.1.1-201901281837360699-30c9d74.jar:3.1.1-201901281837360699-30c9d74]
	at com.dremio.exec.store.easy.json.JSONRecordReader.setupParser(JSONRecordReader.java:152) [dremio-sabot-kernel-3.1.1-201901281837360699-30c9d74.jar:3.1.1-201901281837360699-30c9d74]
	at com.dremio.exec.store.easy.json.JSONRecordReader.setup(JSONRecordReader.java:144) [dremio-sabot-kernel-3.1.1-201901281837360699-30c9d74.jar:3.1.1-201901281837360699-30c9d74]
	... 49 common frames omitted
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: The operation is not valid for the object's storage class (Service: Amazon S3; Status Code: 403; Error Code: InvalidObjectState; Request ID: DF23E1C3A6BECC88)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1588) ~[aws-java-sdk-core-1.11.156.jar:na]
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1258) ~[aws-java-sdk-core-1.11.156.jar:na]
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1030) ~[aws-java-sdk-core-1.11.156.jar:na]
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:742) ~[aws-java-sdk-core-1.11.156.jar:na]
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:716) ~[aws-java-sdk-core-1.11.156.jar:na]
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699) ~[aws-java-sdk-core-1.11.156.jar:na]
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667) ~[aws-java-sdk-core-1.11.156.jar:na]
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649) ~[aws-java-sdk-core-1.11.156.jar:na]
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513) ~[aws-java-sdk-core-1.11.156.jar:na]
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4221) ~[aws-java-sdk-s3-1.11.156.jar:na]
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4168) ~[aws-java-sdk-s3-1.11.156.jar:na]
	at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1378) ~[aws-java-sdk-s3-1.11.156.jar:na]
	at org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:148) ~[hadoop-aws-2.8.3.jar:na]
	... 63 common frames omitted
10.0.16.166 - - [04/Feb/2019:00:28:33 +0000] "POST /apiv2/source/s3%20connectdata/folder_preview/systeminsights-connectdata/production/connect-plants-acutec-acutec/acutec/

Hi @memelet,

I reproduced this error by putting one object in my S3 directory into the Glacier storage class. Dremio seems fine with the other storage classes.
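To check in advance whether a single object will be readable, the object's metadata from a HEAD request is enough. Below is a minimal sketch, assuming the input is a dict shaped like boto3's `head_object` response (`StorageClass` is omitted for STANDARD objects, and `Restore` carries the `x-amz-restore` header once a restore has been requested). The helper name `is_readable` is hypothetical, and the set of archive classes reflects the assumption that only GLACIER and DEEP_ARCHIVE objects must be restored before a GET, which matches what was observed here.

```python
# Storage classes whose objects must be restored before a GET succeeds.
# Assumption: STANDARD_IA, ONEZONE_IA, etc. are readable directly, which
# matches the observation that only Glacier objects break Dremio.
ARCHIVE_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}


def is_readable(head: dict) -> bool:
    """Return True if a GET on the object described by `head` should succeed.

    `head` is assumed to look like boto3's head_object() response:
    StorageClass is absent for STANDARD objects, and Restore holds the
    x-amz-restore header value when a restore has been requested.
    """
    storage_class = head.get("StorageClass", "STANDARD")
    if storage_class not in ARCHIVE_CLASSES:
        return True
    # A finished restore looks like: ongoing-request="false", expiry-date="..."
    restore = head.get("Restore", "")
    return 'ongoing-request="false"' in restore


print(is_readable({"StorageClass": "STANDARD_IA"}))  # True
print(is_readable({"StorageClass": "GLACIER"}))      # False
```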

Your error complains about one file in particular:

s3a://systeminsights-connectdata/production/connect-plants-acutec-acutec/acutec/acutec/acutec_DT2030_M127B/2017-03-13/1_0_00000000003524693532: Reopen at position 0 on s3a://systeminsights-connectdata/production/connect-plants-acutec-acutec/acutec/acutec/acutec_DT2030_M127B/2017-03-13/1_0_00000000003524693532

Can you log in to your AWS S3 console and tell us the storage class of that specific file?

Meanwhile, I’ll see if there’s a possible workaround (beyond just removing the file).

Hey @ben!

That file’s storage class is Glacier, which explains why Dremio can’t read it.

Ideally, we could exclude S3 files in the Glacier storage class when creating a datasource from a folder. There is no way we can remove the file.
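Until Dremio can skip such objects, one way to pre-filter is to find the folders that contain no Glacier objects at all and promote only those. This is a minimal sketch, assuming the listing comes from S3 ListObjectsV2 (for example, boto3's `list_objects_v2` paginator, whose `Contents` entries carry `Key` and `StorageClass`) and that only GLACIER and DEEP_ARCHIVE objects are unreadable. `split_folders` is a hypothetical helper, not a Dremio feature.

```python
from collections import defaultdict
from posixpath import dirname

# Assumption: GLACIER and DEEP_ARCHIVE objects are the ones that fail with
# InvalidObjectState; other storage classes are read normally.
ARCHIVE_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}


def split_folders(contents):
    """Group a listing by parent folder and report which folders are safe.

    `contents` is assumed to be a list of dicts like the "Contents" entries
    returned by S3 ListObjectsV2 (each with "Key" and "StorageClass").
    Returns (clean, archived): sorted folder prefixes with no archived
    objects, and sorted folder prefixes containing at least one.
    """
    has_archived = defaultdict(bool)
    for obj in contents:
        folder = dirname(obj["Key"])
        is_archived = obj.get("StorageClass", "STANDARD") in ARCHIVE_CLASSES
        has_archived[folder] = has_archived[folder] or is_archived
    clean = sorted(f for f, a in has_archived.items() if not a)
    archived = sorted(f for f, a in has_archived.items() if a)
    return clean, archived
```

Only the folders in `clean` would be candidates for promotion. Any folder with even one archived object is reported as archived, because a single Glacier file is enough to make the preview fail.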

Hi @ben, is there any possible workaround here?

I’m trying to add an S3 directory that has lifecycle rules set up so that older data is transitioned to Glacier.

I was trying to see if there’s some way to create a virtual dataset over the folder that only includes the last N days of data (the folder is partitioned, so I think I could do this with dir0, dir1, etc.). However, it doesn’t seem possible to do this without first creating a dataset from the whole folder, which doesn’t work because Dremio tries to access the Glacier files.
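If the date folders follow the `YYYY-MM-DD` naming visible in the stack trace (for example `2017-03-13`), the folder names covering the last N days are straightforward to compute. Those names could be used to promote recent folders individually, or to scope a copy. This is a small sketch. The function names are hypothetical, and it assumes that every date folder uses exactly this ISO format.

```python
from datetime import date, timedelta


def recent_date_folders(today: date, n_days: int) -> list:
    """Return the YYYY-MM-DD folder names for the last `n_days` days.

    Includes `today` and is ordered oldest first. Assumes date folders are
    named with ISO dates, like the 2017-03-13 folder in the stack trace.
    """
    if n_days < 1:
        raise ValueError("n_days must be at least 1")
    return [
        (today - timedelta(days=offset)).isoformat()
        for offset in range(n_days - 1, -1, -1)
    ]


def is_recent(folder_name: str, today: date, n_days: int) -> bool:
    """True if an ISO-dated folder falls within the last `n_days` days."""
    folder_date = date.fromisoformat(folder_name)
    return today - timedelta(days=n_days - 1) <= folder_date <= today
```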

@bknk

Can’t we move all the S3 folders under one folder in S3 and just promote that?

I’m trying to avoid moving these files in S3; there are a couple hundred terabytes of them in their existing directory structure. Is my only option to change our archive process going forward, so that files are moved to a new path before their storage class is changed to Glacier?
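If the archive process is changed going forward, the idea would be to move objects into a separate prefix that Dremio never points at, and to scope the Glacier lifecycle rule to that prefix only. That way the promoted folder never contains archived objects. This is a minimal sketch under several assumptions: the `archive/` prefix, the rule ID, and the helper names are hypothetical, and the layout assumes the date folder is the second-to-last path segment. The rule dict is meant to match the `LifecycleConfiguration` argument of boto3's `put_bucket_lifecycle_configuration`, but check it against the S3 documentation before applying it to a real bucket.

```python
from datetime import date

ARCHIVE_PREFIX = "archive/"  # hypothetical destination prefix


def relocation_plan(keys, cutoff: date, archive_prefix: str = ARCHIVE_PREFIX):
    """Map keys whose YYYY-MM-DD folder is older than `cutoff` to a new key.

    Returns a list of (source_key, destination_key) pairs. S3 has no rename,
    so each pair would be a copy followed by a delete of the source.
    Assumes the date folder is the second-to-last path segment, as in
    .../acutec_DT2030_M127B/2017-03-13/<file>.
    """
    plan = []
    for key in keys:
        parts = key.split("/")
        if len(parts) < 2:
            continue
        try:
            folder_date = date.fromisoformat(parts[-2])
        except ValueError:
            continue  # not a dated folder; leave it where it is
        if folder_date < cutoff:
            plan.append((key, archive_prefix + key))
    return plan


def glacier_rule(archive_prefix: str = ARCHIVE_PREFIX, days: int = 1) -> dict:
    """Lifecycle configuration that transitions only `archive_prefix` to GLACIER.

    The dict follows the shape boto3's put_bucket_lifecycle_configuration
    expects for its LifecycleConfiguration argument.
    """
    return {
        "Rules": [
            {
                "ID": "archive-to-glacier",
                "Filter": {"Prefix": archive_prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }
```

Keeping the transition rule on a prefix that is never promoted in Dremio keeps the promoted folder readable, at the cost of one copy per object when it ages out.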