lvhuyen
September 8, 2022, 7:39am
1
Hi,
I have some Iceberg tables in AWS Glue, created either by Spark or Athena.
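For context, creating such a table from Spark typically looks something like the sketch below, assuming a Spark session whose glue_catalog is configured as an Iceberg catalog backed by Glue; the catalog, database, table, and column names here are placeholders, not taken from our setup:

-- Spark SQL sketch only: assumes spark.sql.catalog.glue_catalog is configured as an
-- Iceberg catalog using org.apache.iceberg.aws.glue.GlueCatalog with an S3 warehouse.
-- All names below are placeholders.
CREATE TABLE glue_catalog.my_db.my_iceberg_table (
  id BIGINT,
  event_time TIMESTAMP,
  payload STRING
)
USING iceberg;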
In Dremio AWS Edition (22.1.1-20220823) I tried to query those Glue tables, and got this error:
getFileStatus on s3://<our_s3_bucket>/<our_table_s3_prefix>/metadata/00000-d9cb0fa7-5ee5-43ab-bf24-af8e64eb891f.metadata.json: com.amazonaws.SdkClientException: Unable to unmarshall response (com.ctc.wstx.stax.WstxInputFactory cannot be cast to javax.xml.stream.XMLInputFactory). Response Code: 200, Response Text: OK: Unable to unmarshall response (com.ctc.wstx.stax.WstxInputFactory cannot be cast to javax.xml.stream.XMLInputFactory). Response Code: 200, Response Text: OK
On the Datasets tab, when clicking on the table name to see the details, I got the message “Cannot provide more information about this dataset”.
Legacy Glue external tables (using parquet only) could still be queried without any issues.
When I tried to view the S3 folder for my table in an S3 source, I could preview the data, which was recognised as being in Iceberg format. But when I clicked “Save”, I got the error “Failed to get iceberg metadata”. I guess it’s facing the same issue as with the Glue table.
I also tried to create a table using this query:
create table test_dremio as select * from test_legacy
and got a similar error:
getFileStatus on s3://<our_bucket>/<warehouse_base_prefix>/test_dremio/metadata/0942d9f3-181e-4ab3-9d6c-47a7653cae18.avro: com.amazonaws.SdkClientException: Unable to unmarshall response (com.ctc.wstx.stax.WstxInputFactory cannot be cast to javax.xml.stream.XMLInputFactory). Response Code: 200, Response Text: OK: Unable to unmarshall response (com.ctc.wstx.stax.WstxInputFactory cannot be cast to javax.xml.stream.XMLInputFactory). Response Code: 200, Response Text: OK
Could someone please help?
Thanks!
jdlong
November 28, 2022, 8:02pm
2
I just tried writing a table to Iceberg format using Athena, then reading it with Dremio, and I get a very similar error.
balaji.ramaswamy
3
@jdlong Any chance you are able to share the job profile?
jdlong
November 29, 2022, 3:08pm
4
hey @balaji.ramaswamy
Interestingly, this morning I re-ran it and the error is different… so things are moving underneath me a bit. Here’s the new error:
and here’s the verbose error from the job profile (is that what you wanted?)
SYSTEM ERROR: RuntimeException: native zStandard library not available: this version of libhadoop was built without zstd support.
SqlOperatorImpl TABLE_FUNCTION
Location 0:0:6
Fragment 0:0
[Error Id: c3aa1479-aa1a-47c4-895a-a849052f927c on 10.134.16.182:0]
(com.dremio.common.exceptions.ExecutionSetupException) java.io.IOException: Failed to set up column readers
com.dremio.extra.exec.store.dfs.parquet.ParquetVectorizedReader.setup():441
com.dremio.exec.store.parquet.UnifiedParquetReader.setup():252
com.dremio.exec.store.parquet.TransactionalTableParquetReader.setupCurrentReader():216
com.dremio.exec.store.parquet.TransactionalTableParquetReader.setup():190
com.dremio.exec.store.parquet.ParquetCoercionReader.setup():81
com.dremio.exec.store.dfs.implicit.AdditionalColumnsRecordReader.setup():75
com.dremio.exec.store.parquet.ScanTableFunction.setupNextReader():167
com.dremio.exec.store.parquet.ScanTableFunction.startRow():155
com.dremio.sabot.op.tablefunction.TableFunctionOperator.outputData():96
com.dremio.sabot.driver.SmartOp$SmartSingleInput.outputData():193
com.dremio.sabot.driver.StraightPipe.pump():56
com.dremio.sabot.driver.Pipeline.doPump():111
com.dremio.sabot.driver.Pipeline.pumpOnce():101
com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run():418
com.dremio.sabot.exec.fragment.FragmentExecutor.run():355
com.dremio.sabot.exec.fragment.FragmentExecutor.access$1600():97
com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run():820
com.dremio.sabot.task.AsyncTaskWrapper.run():120
com.dremio.sabot.task.slicing.SlicingThread.mainExecutionLoop():247
com.dremio.sabot.task.slicing.SlicingThread.run():171
Caused By (java.io.IOException) Failed to set up column readers
com.dremio.parquet.reader.SimpleReader.newRowGroupReader():89
com.dremio.parquet.reader.SimpleReader.newRowGroupReader():34
com.dremio.extra.exec.store.dfs.parquet.ParquetVectorizedReader.setup():437
com.dremio.exec.store.parquet.UnifiedParquetReader.setup():252
com.dremio.exec.store.parquet.TransactionalTableParquetReader.setupCurrentReader():216
com.dremio.exec.store.parquet.TransactionalTableParquetReader.setup():190
com.dremio.exec.store.parquet.ParquetCoercionReader.setup():81
com.dremio.exec.store.dfs.implicit.AdditionalColumnsRecordReader.setup():75
com.dremio.exec.store.parquet.ScanTableFunction.setupNextReader():167
com.dremio.exec.store.parquet.ScanTableFunction.startRow():155
com.dremio.sabot.op.tablefunction.TableFunctionOperator.outputData():96
com.dremio.sabot.driver.SmartOp$SmartSingleInput.outputData():193
com.dremio.sabot.driver.StraightPipe.pump():56
com.dremio.sabot.driver.Pipeline.doPump():111
com.dremio.sabot.driver.Pipeline.pumpOnce():101
com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run():418
com.dremio.sabot.exec.fragment.FragmentExecutor.run():355
com.dremio.sabot.exec.fragment.FragmentExecutor.access$1600():97
com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run():820
com.dremio.sabot.task.AsyncTaskWrapper.run():120
com.dremio.sabot.task.slicing.SlicingThread.mainExecutionLoop():247
com.dremio.sabot.task.slicing.SlicingThread.run():171
Caused By (org.apache.parquet.hadoop.DirectCodecFactory.DirectCodecPool.ParquetCompressionCodecException) Error creating compression codec pool.
org.apache.parquet.hadoop.DirectCodecFactory$DirectCodecPool$CodecPool.<init>():441
org.apache.parquet.hadoop.DirectCodecFactory$DirectCodecPool$CodecPool.<init>():371
org.apache.parquet.hadoop.DirectCodecFactory$DirectCodecPool.codec():473
org.apache.parquet.hadoop.DirectCodecFactory.createDecompressor():144
org.apache.parquet.hadoop.CodecFactory.getDecompressor():199
org.apache.parquet.hadoop.CodecFactory.getDecompressor():42
com.dremio.parquet.pages.BaseReaderIterator.<init>():54
com.dremio.parquet.pages.IncrementalPageReaderIterator.<init>():47
com.dremio.parquet.pages.FSPageReader.openPageIterator():67
com.dremio.parquet.reader.SimpleReader.getPrimitiveColumnReader():123
com.dremio.parquet.reader.SimpleReader.getColumnReader():100
com.dremio.parquet.reader.SimpleReader.newRowGroupReader():77
com.dremio.parquet.reader.SimpleReader.newRowGroupReader():34
com.dremio.extra.exec.store.dfs.parquet.ParquetVectorizedReader.setup():437
com.dremio.exec.store.parquet.UnifiedParquetReader.setup():252
com.dremio.exec.store.parquet.TransactionalTableParquetReader.setupCurrentReader():216
com.dremio.exec.store.parquet.TransactionalTableParquetReader.setup():190
com.dremio.exec.store.parquet.ParquetCoercionReader.setup():81
com.dremio.exec.store.dfs.implicit.AdditionalColumnsRecordReader.setup():75
com.dremio.exec.store.parquet.ScanTableFunction.setupNextReader():167
com.dremio.exec.store.parquet.ScanTableFunction.startRow():155
com.dremio.sabot.op.tablefunction.TableFunctionOperator.outputData():96
com.dremio.sabot.driver.SmartOp$SmartSingleInput.outputData():193
com.dremio.sabot.driver.StraightPipe.pump():56
com.dremio.sabot.driver.Pipeline.doPump():111
com.dremio.sabot.driver.Pipeline.pumpOnce():101
com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run():418
com.dremio.sabot.exec.fragment.FragmentExecutor.run():355
com.dremio.sabot.exec.fragment.FragmentExecutor.access$1600():97
com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run():820
com.dremio.sabot.task.AsyncTaskWrapper.run():120
com.dremio.sabot.task.slicing.SlicingThread.mainExecutionLoop():247
com.dremio.sabot.task.slicing.SlicingThread.run():171
Caused By (java.lang.RuntimeException) native zStandard library not available: this version of libhadoop was built without zstd support.
org.apache.hadoop.io.compress.ZStandardCodec.checkNativeCodeLoaded():65
org.apache.hadoop.io.compress.ZStandardCodec.createCompressor():164
org.apache.parquet.hadoop.DirectCodecFactory$DirectCodecPool$CodecPool$1.makeObject():385
org.apache.commons.pool.impl.GenericObjectPool.borrowObject():1188
org.apache.parquet.hadoop.DirectCodecFactory$DirectCodecPool$CodecPool.<init>():389
org.apache.parquet.hadoop.DirectCodecFactory$DirectCodecPool$CodecPool.<init>():371
org.apache.parquet.hadoop.DirectCodecFactory$DirectCodecPool.codec():473
org.apache.parquet.hadoop.DirectCodecFactory.createDecompressor():144
org.apache.parquet.hadoop.CodecFactory.getDecompressor():199
org.apache.parquet.hadoop.CodecFactory.getDecompressor():42
com.dremio.parquet.pages.BaseReaderIterator.<init>():54
com.dremio.parquet.pages.IncrementalPageReaderIterator.<init>():47
com.dremio.parquet.pages.FSPageReader.openPageIterator():67
com.dremio.parquet.reader.SimpleReader.getPrimitiveColumnReader():123
com.dremio.parquet.reader.SimpleReader.getColumnReader():100
com.dremio.parquet.reader.SimpleReader.newRowGroupReader():77
com.dremio.parquet.reader.SimpleReader.newRowGroupReader():34
com.dremio.extra.exec.store.dfs.parquet.ParquetVectorizedReader.setup():437
com.dremio.exec.store.parquet.UnifiedParquetReader.setup():252
com.dremio.exec.store.parquet.TransactionalTableParquetReader.setupCurrentReader():216
com.dremio.exec.store.parquet.TransactionalTableParquetReader.setup():190
com.dremio.exec.store.parquet.ParquetCoercionReader.setup():81
com.dremio.exec.store.dfs.implicit.AdditionalColumnsRecordReader.setup():75
com.dremio.exec.store.parquet.ScanTableFunction.setupNextReader():167
com.dremio.exec.store.parquet.ScanTableFunction.startRow():155
com.dremio.sabot.op.tablefunction.TableFunctionOperator.outputData():96
com.dremio.sabot.driver.SmartOp$SmartSingleInput.outputData():193
com.dremio.sabot.driver.StraightPipe.pump():56
com.dremio.sabot.driver.Pipeline.doPump():111
com.dremio.sabot.driver.Pipeline.pumpOnce():101
com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run():418
com.dremio.sabot.exec.fragment.FragmentExecutor.run():355
com.dremio.sabot.exec.fragment.FragmentExecutor.access$1600():97
com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run():820
com.dremio.sabot.task.AsyncTaskWrapper.run():120
com.dremio.sabot.task.slicing.SlicingThread.mainExecutionLoop():247
com.dremio.sabot.task.slicing.SlicingThread.run():171
SqlOperatorImpl TABLE_FUNCTION
Location 0:0:6
Fragment 0:0
com.dremio.extra.exec.store.dfs.parquet.ParquetVectorizedReader(ParquetVectorizedReader.java:441)
com.dremio.exec.store.parquet.UnifiedParquetReader(UnifiedParquetReader.java:252)
com.dremio.exec.store.parquet.TransactionalTableParquetReader(TransactionalTableParquetReader.java:216)
com.dremio.exec.store.parquet.TransactionalTableParquetReader(TransactionalTableParquetReader.java:190)
com.dremio.exec.store.parquet.ParquetCoercionReader(ParquetCoercionReader.java:81)
com.dremio.exec.store.dfs.implicit.AdditionalColumnsRecordReader(AdditionalColumnsRecordReader.java:75)
com.dremio.exec.store.parquet.ScanTableFunction(ScanTableFunction.java:167)
com.dremio.exec.store.parquet.ScanTableFunction(ScanTableFunction.java:155)
com.dremio.sabot.op.tablefunction.TableFunctionOperator(TableFunctionOperator.java:96)
com.dremio.sabot.driver.SmartOp$SmartSingleInput(SmartOp.java:193)
com.dremio.sabot.driver.StraightPipe(StraightPipe.java:56)
com.dremio.sabot.driver.Pipeline(Pipeline.java:111)
com.dremio.sabot.driver.Pipeline(Pipeline.java:101)
com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper(FragmentExecutor.java:418)
com.dremio.sabot.exec.fragment.FragmentExecutor(FragmentExecutor.java:355)
com.dremio.sabot.exec.fragment.FragmentExecutor(FragmentExecutor.java:97)
com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl(FragmentExecutor.java:820)
com.dremio.sabot.task.AsyncTaskWrapper(AsyncTaskWrapper.java:120)
com.dremio.sabot.task.slicing.SlicingThread(SlicingThread.java:247)
com.dremio.sabot.task.slicing.SlicingThread(SlicingThread.java:171)
it looks like I can’t attach the profile JSON files directly, so if you want one of those, let me know and I’ll paste in the text
FWIW, the table was created in Athena with compression set to SNAPPY.
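For reference, the CTAS was along these lines. This is a sketch: the source table name is a placeholder, and write_compression as the property controlling the Parquet codec is my reading of Athena’s Iceberg docs rather than something verified here:

-- Athena (engine v3) Iceberg CTAS sketch. The target table and location match the
-- error messages below in this post; the source table is hypothetical.
CREATE TABLE jal_testing_db.jal_testing_df1_iceberg
WITH (
  table_type = 'ICEBERG',
  is_external = false,
  location = 's3://rnr-datalabs/scratchpad/jal/df1_iceberg/',
  write_compression = 'SNAPPY'  -- the second attempt swapped this for 'NONE'
)
AS SELECT * FROM jal_testing_db.df1_source;  -- hypothetical source table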
I also tried with compression set to NONE and got a different error; see below:
IO_EXCEPTION ERROR: getFileStatus on s3://rnr-datalabs/scratchpad/jal/df1_iceberg/metadata/00001-9be51136-abf1-45d3-9165-7af2cbc1619b.metadata.json: com.amazonaws.SdkClientException: Unable to unmarshall response (com.ctc.wstx.stax.WstxInputFactory cannot be cast to javax.xml.stream.XMLInputFactory). Response Code: 200, Response Text: OK: Unable to unmarshall response (com.ctc.wstx.stax.WstxInputFactory cannot be cast to javax.xml.stream.XMLInputFactory). Response Code: 200, Response Text: OK
SQL Query select * from "AWS_Glue_Catalog_DataLab"."jal_testing_db"."jal_testing_df1_iceberg"
(org.apache.hadoop.fs.s3a.AWSClientIOException) getFileStatus on s3://rnr-datalabs/scratchpad/jal/df1_iceberg/metadata/00001-9be51136-abf1-45d3-9165-7af2cbc1619b.metadata.json: com.amazonaws.SdkClientException: Unable to unmarshall response (com.ctc.wstx.stax.WstxInputFactory cannot be cast to javax.xml.stream.XMLInputFactory). Response Code: 200, Response Text: OK: Unable to unmarshall response (com.ctc.wstx.stax.WstxInputFactory cannot be cast to javax.xml.stream.XMLInputFactory). Response Code: 200, Response Text: OK
org.apache.hadoop.fs.s3a.S3AUtils.translateException():128
org.apache.hadoop.fs.s3a.S3AUtils.translateException():101
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus():1571
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus():117
com.dremio.exec.store.hive.exec.dfs.DremioHadoopFileSystemWrapper.getFileAttributes():235
com.dremio.exec.store.iceberg.DremioFileIO.newInputFile():116
org.apache.iceberg.TableMetadataParser.read():245
org.apache.iceberg.BaseMetastoreTableOperations.lambda$refreshFromMetadataLocation$0():171
org.apache.iceberg.BaseMetastoreTableOperations.lambda$refreshFromMetadataLocation$1():185
org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry():404
org.apache.iceberg.util.Tasks$Builder.runSingleThreaded():214
org.apache.iceberg.util.Tasks$Builder.run():198
org.apache.iceberg.util.Tasks$Builder.run():190
org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation():185
org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation():170
org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation():165
com.dremio.exec.store.hive.iceberg.IcebergHiveTableOperations.doRefresh():44
org.apache.iceberg.BaseMetastoreTableOperations.refresh():95
org.apache.iceberg.BaseTable.refresh():59
com.dremio.exec.store.hive.metadata.HiveMetadataUtils.getTableMetadataFromIceberg():560
com.dremio.exec.store.hive.metadata.HiveMetadataUtils.getTableMetadata():536
com.dremio.exec.store.hive.HiveStoragePlugin.listPartitionChunks():1290
com.dremio.plugins.awsglue.store.AWSGlueStoragePlugin.listPartitionChunks():653
com.dremio.exec.catalog.DatasetSaverImpl.saveUsingV1Flow():248
com.dremio.exec.catalog.DatasetSaverImpl.save():121
com.dremio.exec.catalog.DatasetSaverImpl.save():143
com.dremio.exec.catalog.EnterpriseDatasetSaver.save():83
com.dremio.exec.catalog.DatasetManager.getTableFromPlugin():373
com.dremio.exec.catalog.DatasetManager.getTable():215
com.dremio.exec.catalog.CatalogImpl.getTableHelper():472
com.dremio.exec.catalog.CatalogImpl.getTable():225
com.dremio.exec.catalog.CatalogImpl.getTableForQuery():500
com.dremio.exec.catalog.EnterpriseCatalogImpl.getTableForQuery():260
com.dremio.exec.catalog.SourceAccessChecker.lambda$getTableForQuery$4():133
com.dremio.exec.catalog.SourceAccessChecker.getIfVisible():97
com.dremio.exec.catalog.SourceAccessChecker.getTableForQuery():133
com.dremio.exec.catalog.DelegatingCatalog.getTableForQuery():110
com.dremio.exec.catalog.CachingCatalog.getTableForQuery():106
com.dremio.exec.catalog.DremioCatalogReader.getTable():102
com.dremio.exec.catalog.DremioCatalogReader.getTable():79
org.apache.calcite.sql.validate.DremioEmptyScope.resolveTable():44
org.apache.calcite.sql.validate.DremioEmptyScope.resolveTable():34
org.apache.calcite.sql.validate.DelegatingScope.resolveTable():203
org.apache.calcite.sql.validate.IdentifierNamespace.resolveImpl():105
org.apache.calcite.sql.validate.IdentifierNamespace.validateImpl():177
org.apache.calcite.sql.validate.AbstractNamespace.validate():84
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():975
org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():956
org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom():3147
org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom():3132
org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect():3399
org.apache.calcite.sql.validate.SelectNamespace.validateImpl():60
org.apache.calcite.sql.validate.AbstractNamespace.validate():84
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():975
org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():956
org.apache.calcite.sql.SqlSelect.validate():242
org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression():931
com.dremio.exec.planner.sql.SqlValidatorImpl.validate():111
com.dremio.exec.planner.sql.SqlValidatorAndToRelContext.validate():81
com.dremio.exec.planner.sql.handlers.PrelTransformer.validateNode():204
com.dremio.exec.planner.sql.handlers.PrelTransformer.validateAndConvert():186
com.dremio.exec.planner.sql.handlers.PrelTransformer.validateAndConvert():178
com.dremio.exec.planner.sql.handlers.query.NormalHandler.getPlan():67
com.dremio.exec.planner.sql.handlers.commands.HandlerToExec.plan():59
com.dremio.exec.work.foreman.AttemptManager.plan():494
com.dremio.exec.work.foreman.AttemptManager.lambda$run$4():392
com.dremio.service.commandpool.ReleasableBoundCommandPool.lambda$getWrappedCommand$3():138
com.dremio.service.commandpool.CommandWrapper.run():62
com.dremio.context.RequestContext.run():95
com.dremio.common.concurrent.ContextMigratingExecutorService.lambda$decorate$3():199
com.dremio.common.concurrent.ContextMigratingExecutorService$ComparableRunnable.run():180
java.util.concurrent.Executors$RunnableAdapter.call():511
java.util.concurrent.FutureTask.run():266
java.util.concurrent.ThreadPoolExecutor.runWorker():1149
java.util.concurrent.ThreadPoolExecutor$Worker.run():624
java.lang.Thread.run():750
Caused By (com.amazonaws.SdkClientException) Unable to unmarshall response (com.ctc.wstx.stax.WstxInputFactory cannot be cast to javax.xml.stream.XMLInputFactory). Response Code: 200, Response Text: OK
com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse():1738
com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleSuccessResponse():1434
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest():1356
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper():1139
com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute():796
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer():764
com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute():738
com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500():698
com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute():680
com.amazonaws.http.AmazonHttpClient.execute():544
com.amazonaws.http.AmazonHttpClient.execute():524
com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.doInvoke():1719
com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.invoke():1686
com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.invoke():1675
com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.executeAssumeRole():589
com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.assumeRole():561
com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider.newSession():321
com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider.access$000():37
com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider$1.call():76
com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider$1.call():73
com.amazonaws.auth.RefreshableTask.refreshValue():257
com.amazonaws.auth.RefreshableTask.blockingRefresh():213
com.amazonaws.auth.RefreshableTask.getValue():154
com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider.getCredentials():299
com.dremio.plugins.s3.store.STSCredentialProviderV1.getCredentials():95
org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials():123
com.amazonaws.http.AmazonHttpClient$RequestExecutor.getCredentialsFromContext():1251
com.amazonaws.http.AmazonHttpClient$RequestExecutor.runBeforeRequestHandlers():827
com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute():777
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer():764
com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute():738
com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500():698
com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute():680
com.amazonaws.http.AmazonHttpClient.execute():544
com.amazonaws.http.AmazonHttpClient.execute():524
com.amazonaws.services.s3.AmazonS3Client.invoke():5054
com.amazonaws.services.s3.AmazonS3Client.invoke():5000
com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata():1335
com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata():1309
org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata():904
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus():1553
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus():117
com.dremio.exec.store.hive.exec.dfs.DremioHadoopFileSystemWrapper.getFileAttributes():235
com.dremio.exec.store.iceberg.DremioFileIO.newInputFile():116
org.apache.iceberg.TableMetadataParser.read():245
org.apache.iceberg.BaseMetastoreTableOperations.lambda$refreshFromMetadataLocation$0():171
org.apache.iceberg.BaseMetastoreTableOperations.lambda$refreshFromMetadataLocation$1():185
org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry():404
org.apache.iceberg.util.Tasks$Builder.runSingleThreaded():214
org.apache.iceberg.util.Tasks$Builder.run():198
org.apache.iceberg.util.Tasks$Builder.run():190
org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation():185
org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation():170
org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation():165
com.dremio.exec.store.hive.iceberg.IcebergHiveTableOperations.doRefresh():44
org.apache.iceberg.BaseMetastoreTableOperations.refresh():95
org.apache.iceberg.BaseTable.refresh():59
com.dremio.exec.store.hive.metadata.HiveMetadataUtils.getTableMetadataFromIceberg():560
com.dremio.exec.store.hive.metadata.HiveMetadataUtils.getTableMetadata():536
com.dremio.exec.store.hive.HiveStoragePlugin.listPartitionChunks():1290
com.dremio.plugins.awsglue.store.AWSGlueStoragePlugin.listPartitionChunks():653
com.dremio.exec.catalog.DatasetSaverImpl.saveUsingV1Flow():248
com.dremio.exec.catalog.DatasetSaverImpl.save():121
com.dremio.exec.catalog.DatasetSaverImpl.save():143
com.dremio.exec.catalog.EnterpriseDatasetSaver.save():83
com.dremio.exec.catalog.DatasetManager.getTableFromPlugin():373
com.dremio.exec.catalog.DatasetManager.getTable():215
com.dremio.exec.catalog.CatalogImpl.getTableHelper():472
com.dremio.exec.catalog.CatalogImpl.getTable():225
com.dremio.exec.catalog.CatalogImpl.getTableForQuery():500
com.dremio.exec.catalog.EnterpriseCatalogImpl.getTableForQuery():260
com.dremio.exec.catalog.SourceAccessChecker.lambda$getTableForQuery$4():133
com.dremio.exec.catalog.SourceAccessChecker.getIfVisible():97
com.dremio.exec.catalog.SourceAccessChecker.getTableForQuery():133
com.dremio.exec.catalog.DelegatingCatalog.getTableForQuery():110
com.dremio.exec.catalog.CachingCatalog.getTableForQuery():106
com.dremio.exec.catalog.DremioCatalogReader.getTable():102
com.dremio.exec.catalog.DremioCatalogReader.getTable():79
org.apache.calcite.sql.validate.DremioEmptyScope.resolveTable():44
org.apache.calcite.sql.validate.DremioEmptyScope.resolveTable():34
org.apache.calcite.sql.validate.DelegatingScope.resolveTable():203
org.apache.calcite.sql.validate.IdentifierNamespace.resolveImpl():105
org.apache.calcite.sql.validate.IdentifierNamespace.validateImpl():177
org.apache.calcite.sql.validate.AbstractNamespace.validate():84
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():975
org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():956
org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom():3147
org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom():3132
org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect():3399
org.apache.calcite.sql.validate.SelectNamespace.validateImpl():60
org.apache.calcite.sql.validate.AbstractNamespace.validate():84
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():975
org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():956
org.apache.calcite.sql.SqlSelect.validate():242
org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression():931
com.dremio.exec.planner.sql.SqlValidatorImpl.validate():111
com.dremio.exec.planner.sql.SqlValidatorAndToRelContext.validate():81
com.dremio.exec.planner.sql.handlers.PrelTransformer.validateNode():204
com.dremio.exec.planner.sql.handlers.PrelTransformer.validateAndConvert():186
com.dremio.exec.planner.sql.handlers.PrelTransformer.validateAndConvert():178
com.dremio.exec.planner.sql.handlers.query.NormalHandler.getPlan():67
com.dremio.exec.planner.sql.handlers.commands.HandlerToExec.plan():59
com.dremio.exec.work.foreman.AttemptManager.plan():494
com.dremio.exec.work.foreman.AttemptManager.lambda$run$4():392
com.dremio.service.commandpool.ReleasableBoundCommandPool.lambda$getWrappedCommand$3():138
com.dremio.service.commandpool.CommandWrapper.run():62
com.dremio.context.RequestContext.run():95
com.dremio.common.concurrent.ContextMigratingExecutorService.lambda$decorate$3():199
com.dremio.common.concurrent.ContextMigratingExecutorService$ComparableRunnable.run():180
java.util.concurrent.Executors$RunnableAdapter.call():511
java.util.concurrent.FutureTask.run():266
java.util.concurrent.ThreadPoolExecutor.runWorker():1149
java.util.concurrent.ThreadPoolExecutor$Worker.run():624
java.lang.Thread.run():750
Caused By (java.lang.ClassCastException) com.ctc.wstx.stax.WstxInputFactory cannot be cast to javax.xml.stream.XMLInputFactory
javax.xml.stream.XMLInputFactory.newInstance():-1
com.amazonaws.util.XmlUtils.createXmlInputFactory():63
com.amazonaws.util.XmlUtils.access$000():27
com.amazonaws.util.XmlUtils$1.initialValue():36
com.amazonaws.util.XmlUtils$1.initialValue():33
java.lang.ThreadLocal.setInitialValue():195
java.lang.ThreadLocal.get():172
com.amazonaws.util.XmlUtils.getXmlInputFactory():54
com.amazonaws.http.StaxResponseHandler.handle():94
com.amazonaws.http.StaxResponseHandler.handle():42
com.amazonaws.http.response.AwsResponseHandlerAdapter.handle():69
com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse():1714
com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleSuccessResponse():1434
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest():1356
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper():1139
com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute():796
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer():764
com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute():738
com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500():698
com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute():680
com.amazonaws.http.AmazonHttpClient.execute():544
com.amazonaws.http.AmazonHttpClient.execute():524
com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.doInvoke():1719
com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.invoke():1686
com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.invoke():1675
...
(output truncated for forum constraints)
dch
December 12, 2022, 9:05pm
5
Did we get any further here?
What else is running on the system?
We are looking at a couple of similar issues and suspect library conflicts and/or startup argument overrides.
@jdlong @balaji.ramaswamy I’m also encountering similar issues:
ServiceConfigurationError: javax.xml.stream.XMLInputFactory: com.ctc.wstx.stax.WstxInputFactory not a subtype
Did you have any luck solving it?
@balaji.ramaswamy I’ve sent you the query profile in private
dch
January 5, 2023, 5:01pm
7
Our Dev team is zeroing in on the problem. I am optimistic we can find a solution within a couple of weeks (plus some time to figure out the release vehicle).
Sorry and stay tuned!
danh
January 5, 2023, 5:24pm
8
Hey folks,
I am part of the Dev team here at Dremio that has been looking into these errors with Glue Iceberg tables. If you are hitting something like javax.xml.stream.XMLInputFactory: Provider com.ctc.wstx.stax.WstxInputFactory not a subtype or Unable to unmarshall response (com.ctc.wstx.stax.WstxInputFactory cannot be cast to javax.xml.stream.XMLInputFactory), the steps below may serve as a workaround.
I have seen this workaround function on v22.1.1 and above.
1. Create your Glue source that contains the Iceberg tables, with your Access key, Secret key, and the IAM role you are assuming.
2. Create an S3 source where you can format at least one file or folder (I’ve been using the same authentication method as my Glue source, but that doesn’t appear to be necessary from my testing).
3. Go to the SQL Runner.
4. Change the context to this S3 source, create a query against your Glue source, and run it, as sketched below.
Once you have done these steps, you should be able to query your Glue source normally using it as the context. See the screenshots below for the error, workaround, and working states.
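For illustration, step 4 boils down to running something like the following while the S3 source (not the Glue source) is selected as the context in the SQL Runner; the fully qualified table name here is borrowed from jdlong’s query earlier in this thread:

-- The SQL Runner context is the S3 source; the FROM clause still points at the
-- Glue source.
SELECT *
FROM "AWS_Glue_Catalog_DataLab"."jal_testing_db"."jal_testing_df1_iceberg"
LIMIT 10;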
I hope this helps some of you while we continue to focus on a quality fix.
jdlong
January 6, 2023, 12:24am
9
Yeah, I’m stuck at the same error now too.
getFileStatus on s3://xxxxx/jal/df1_iceberg/metadata/00001-727a2396-6e6b-4378-9377-fc03e91e7501.metadata.json: com.amazonaws.SdkClientException: Unable to unmarshall response (com.ctc.wstx.stax.WstxInputFactory cannot be cast to javax.xml.stream.XMLInputFactory). Response Code: 200, Response Text: OK: Unable to unmarshall response (com.ctc.wstx.stax.WstxInputFactory cannot be cast to javax.xml.stream.XMLInputFactory). Response Code: 200, Response Text: OK
jdlong
January 6, 2023, 12:32am
10
hey I really appreciate the ideas and I know you all are hard at work here.
we’re on 22.1.1-202208230402290397-a7010f28
so I thought the workaround might help me. However, I’m not able to get it right. In your example above, is GlueSourceRepro a database in the Glue Catalog? Or is it an S3 bucket?
In my situation I want to point Dremio at a Glue Catalog database and have the Iceberg tables in that database queryable by Dremio. So I changed my context to the Glue Catalog database. No dice.
danh
January 6, 2023, 4:00pm
11
In my scenario, GlueSourceRepro is a Glue Database with an Iceberg table where the information is stored in an S3 bucket. It was created using these steps: Build an Apache Iceberg data lake using Amazon Athena, Amazon EMR, and AWS Glue | AWS Big Data Blog
The S3 source simply uses the same credentials, with a single file that I formatted.
jdlong
January 6, 2023, 5:15pm
12
What’s the core issue here? I don’t understand. Is it one of permissions? Of context? Something else?
dch
January 6, 2023, 5:20pm
13
Java’s classloader snafus