Unable to create/read Glue Iceberg tables

Hi,

I have some Iceberg tables in AWS Glue, created either by Spark or Athena.
In Dremio AWS Edition (22.1.1-20220823) I tried to query those Glue tables, and got this error:

getFileStatus on s3://<our_s3_bucket>/<our_table_s3_prefix>/metadata/00000-d9cb0fa7-5ee5-43ab-bf24-af8e64eb891f.metadata.json: com.amazonaws.SdkClientException: Unable to unmarshall response (com.ctc.wstx.stax.WstxInputFactory cannot be cast to javax.xml.stream.XMLInputFactory). Response Code: 200, Response Text: OK: Unable to unmarshall response (com.ctc.wstx.stax.WstxInputFactory cannot be cast to javax.xml.stream.XMLInputFactory). Response Code: 200, Response Text: OK

On the Datasets tab, when clicking the table name to see its details, I got the message “Cannot provide more information about this dataset.”
Legacy Glue external tables (Parquet only) could still be queried without any issues.

When I tried to view the table’s S3 folder through an S3 source, I could preview the data, which was recognised as Iceberg format. But when I clicked “Save”, I got the error “Failed to get iceberg metadata”. I guess it’s hitting the same issue as the Glue table.

I also tried to create a table using this query:

create table test_dremio as select * from test_legacy 

and got a similar error:

getFileStatus on s3://<our_bucket>/<warehouse_base_prefix>/test_dremio/metadata/0942d9f3-181e-4ab3-9d6c-47a7653cae18.avro: com.amazonaws.SdkClientException: Unable to unmarshall response (com.ctc.wstx.stax.WstxInputFactory cannot be cast to javax.xml.stream.XMLInputFactory). Response Code: 200, Response Text: OK: Unable to unmarshall response (com.ctc.wstx.stax.WstxInputFactory cannot be cast to javax.xml.stream.XMLInputFactory). Response Code: 200, Response Text: OK

Could someone please help?

Thanks!

I just tried writing a table to Iceberg format using Athena and then reading it with Dremio, and I get a very similar error:

[screenshot of the error]

@jdlong Any chance you are able to share the job profile?

hey @balaji.ramaswamy

Interestingly, this morning I re-ran it and the error is different… so things are moving underneath me a bit. Here’s the new error:

and here’s the verbose error from the job profile (is that what you wanted?)

SYSTEM ERROR: RuntimeException: native zStandard library not available: this version of libhadoop was built without zstd support.

SqlOperatorImpl TABLE_FUNCTION
Location 0:0:6
Fragment 0:0

[Error Id: c3aa1479-aa1a-47c4-895a-a849052f927c on 10.134.16.182:0]

  (com.dremio.common.exceptions.ExecutionSetupException) java.io.IOException: Failed to set up column readers
    com.dremio.extra.exec.store.dfs.parquet.ParquetVectorizedReader.setup():441
    com.dremio.exec.store.parquet.UnifiedParquetReader.setup():252
    com.dremio.exec.store.parquet.TransactionalTableParquetReader.setupCurrentReader():216
    com.dremio.exec.store.parquet.TransactionalTableParquetReader.setup():190
    com.dremio.exec.store.parquet.ParquetCoercionReader.setup():81
    com.dremio.exec.store.dfs.implicit.AdditionalColumnsRecordReader.setup():75
    com.dremio.exec.store.parquet.ScanTableFunction.setupNextReader():167
    com.dremio.exec.store.parquet.ScanTableFunction.startRow():155
    com.dremio.sabot.op.tablefunction.TableFunctionOperator.outputData():96
    com.dremio.sabot.driver.SmartOp$SmartSingleInput.outputData():193
    com.dremio.sabot.driver.StraightPipe.pump():56
    com.dremio.sabot.driver.Pipeline.doPump():111
    com.dremio.sabot.driver.Pipeline.pumpOnce():101
    com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run():418
    com.dremio.sabot.exec.fragment.FragmentExecutor.run():355
    com.dremio.sabot.exec.fragment.FragmentExecutor.access$1600():97
    com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run():820
    com.dremio.sabot.task.AsyncTaskWrapper.run():120
    com.dremio.sabot.task.slicing.SlicingThread.mainExecutionLoop():247
    com.dremio.sabot.task.slicing.SlicingThread.run():171
  Caused By (java.io.IOException) Failed to set up column readers
    com.dremio.parquet.reader.SimpleReader.newRowGroupReader():89
    com.dremio.parquet.reader.SimpleReader.newRowGroupReader():34
    com.dremio.extra.exec.store.dfs.parquet.ParquetVectorizedReader.setup():437
    com.dremio.exec.store.parquet.UnifiedParquetReader.setup():252
    com.dremio.exec.store.parquet.TransactionalTableParquetReader.setupCurrentReader():216
    com.dremio.exec.store.parquet.TransactionalTableParquetReader.setup():190
    com.dremio.exec.store.parquet.ParquetCoercionReader.setup():81
    com.dremio.exec.store.dfs.implicit.AdditionalColumnsRecordReader.setup():75
    com.dremio.exec.store.parquet.ScanTableFunction.setupNextReader():167
    com.dremio.exec.store.parquet.ScanTableFunction.startRow():155
    com.dremio.sabot.op.tablefunction.TableFunctionOperator.outputData():96
    com.dremio.sabot.driver.SmartOp$SmartSingleInput.outputData():193
    com.dremio.sabot.driver.StraightPipe.pump():56
    com.dremio.sabot.driver.Pipeline.doPump():111
    com.dremio.sabot.driver.Pipeline.pumpOnce():101
    com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run():418
    com.dremio.sabot.exec.fragment.FragmentExecutor.run():355
    com.dremio.sabot.exec.fragment.FragmentExecutor.access$1600():97
    com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run():820
    com.dremio.sabot.task.AsyncTaskWrapper.run():120
    com.dremio.sabot.task.slicing.SlicingThread.mainExecutionLoop():247
    com.dremio.sabot.task.slicing.SlicingThread.run():171
  Caused By (org.apache.parquet.hadoop.DirectCodecFactory.DirectCodecPool.ParquetCompressionCodecException) Error creating compression codec pool.
    org.apache.parquet.hadoop.DirectCodecFactory$DirectCodecPool$CodecPool.<init>():441
    org.apache.parquet.hadoop.DirectCodecFactory$DirectCodecPool$CodecPool.<init>():371
    org.apache.parquet.hadoop.DirectCodecFactory$DirectCodecPool.codec():473
    org.apache.parquet.hadoop.DirectCodecFactory.createDecompressor():144
    org.apache.parquet.hadoop.CodecFactory.getDecompressor():199
    org.apache.parquet.hadoop.CodecFactory.getDecompressor():42
    com.dremio.parquet.pages.BaseReaderIterator.<init>():54
    com.dremio.parquet.pages.IncrementalPageReaderIterator.<init>():47
    com.dremio.parquet.pages.FSPageReader.openPageIterator():67
    com.dremio.parquet.reader.SimpleReader.getPrimitiveColumnReader():123
    com.dremio.parquet.reader.SimpleReader.getColumnReader():100
    com.dremio.parquet.reader.SimpleReader.newRowGroupReader():77
    com.dremio.parquet.reader.SimpleReader.newRowGroupReader():34
    com.dremio.extra.exec.store.dfs.parquet.ParquetVectorizedReader.setup():437
    com.dremio.exec.store.parquet.UnifiedParquetReader.setup():252
    com.dremio.exec.store.parquet.TransactionalTableParquetReader.setupCurrentReader():216
    com.dremio.exec.store.parquet.TransactionalTableParquetReader.setup():190
    com.dremio.exec.store.parquet.ParquetCoercionReader.setup():81
    com.dremio.exec.store.dfs.implicit.AdditionalColumnsRecordReader.setup():75
    com.dremio.exec.store.parquet.ScanTableFunction.setupNextReader():167
    com.dremio.exec.store.parquet.ScanTableFunction.startRow():155
    com.dremio.sabot.op.tablefunction.TableFunctionOperator.outputData():96
    com.dremio.sabot.driver.SmartOp$SmartSingleInput.outputData():193
    com.dremio.sabot.driver.StraightPipe.pump():56
    com.dremio.sabot.driver.Pipeline.doPump():111
    com.dremio.sabot.driver.Pipeline.pumpOnce():101
    com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run():418
    com.dremio.sabot.exec.fragment.FragmentExecutor.run():355
    com.dremio.sabot.exec.fragment.FragmentExecutor.access$1600():97
    com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run():820
    com.dremio.sabot.task.AsyncTaskWrapper.run():120
    com.dremio.sabot.task.slicing.SlicingThread.mainExecutionLoop():247
    com.dremio.sabot.task.slicing.SlicingThread.run():171
  Caused By (java.lang.RuntimeException) native zStandard library not available: this version of libhadoop was built without zstd support.
    org.apache.hadoop.io.compress.ZStandardCodec.checkNativeCodeLoaded():65
    org.apache.hadoop.io.compress.ZStandardCodec.createCompressor():164
    org.apache.parquet.hadoop.DirectCodecFactory$DirectCodecPool$CodecPool$1.makeObject():385
    org.apache.commons.pool.impl.GenericObjectPool.borrowObject():1188
    org.apache.parquet.hadoop.DirectCodecFactory$DirectCodecPool$CodecPool.<init>():389
    org.apache.parquet.hadoop.DirectCodecFactory$DirectCodecPool$CodecPool.<init>():371
    org.apache.parquet.hadoop.DirectCodecFactory$DirectCodecPool.codec():473
    org.apache.parquet.hadoop.DirectCodecFactory.createDecompressor():144
    org.apache.parquet.hadoop.CodecFactory.getDecompressor():199
    org.apache.parquet.hadoop.CodecFactory.getDecompressor():42
    com.dremio.parquet.pages.BaseReaderIterator.<init>():54
    com.dremio.parquet.pages.IncrementalPageReaderIterator.<init>():47
    com.dremio.parquet.pages.FSPageReader.openPageIterator():67
    com.dremio.parquet.reader.SimpleReader.getPrimitiveColumnReader():123
    com.dremio.parquet.reader.SimpleReader.getColumnReader():100
    com.dremio.parquet.reader.SimpleReader.newRowGroupReader():77
    com.dremio.parquet.reader.SimpleReader.newRowGroupReader():34
    com.dremio.extra.exec.store.dfs.parquet.ParquetVectorizedReader.setup():437
    com.dremio.exec.store.parquet.UnifiedParquetReader.setup():252
    com.dremio.exec.store.parquet.TransactionalTableParquetReader.setupCurrentReader():216
    com.dremio.exec.store.parquet.TransactionalTableParquetReader.setup():190
    com.dremio.exec.store.parquet.ParquetCoercionReader.setup():81
    com.dremio.exec.store.dfs.implicit.AdditionalColumnsRecordReader.setup():75
    com.dremio.exec.store.parquet.ScanTableFunction.setupNextReader():167
    com.dremio.exec.store.parquet.ScanTableFunction.startRow():155
    com.dremio.sabot.op.tablefunction.TableFunctionOperator.outputData():96
    com.dremio.sabot.driver.SmartOp$SmartSingleInput.outputData():193
    com.dremio.sabot.driver.StraightPipe.pump():56
    com.dremio.sabot.driver.Pipeline.doPump():111
    com.dremio.sabot.driver.Pipeline.pumpOnce():101
    com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run():418
    com.dremio.sabot.exec.fragment.FragmentExecutor.run():355
    com.dremio.sabot.exec.fragment.FragmentExecutor.access$1600():97
    com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run():820
    com.dremio.sabot.task.AsyncTaskWrapper.run():120
    com.dremio.sabot.task.slicing.SlicingThread.mainExecutionLoop():247
    com.dremio.sabot.task.slicing.SlicingThread.run():171


SqlOperatorImpl TABLE_FUNCTION
Location 0:0:6
Fragment 0:0

com.dremio.extra.exec.store.dfs.parquet.ParquetVectorizedReader(ParquetVectorizedReader.java:441)
com.dremio.exec.store.parquet.UnifiedParquetReader(UnifiedParquetReader.java:252)
com.dremio.exec.store.parquet.TransactionalTableParquetReader(TransactionalTableParquetReader.java:216)
com.dremio.exec.store.parquet.TransactionalTableParquetReader(TransactionalTableParquetReader.java:190)
com.dremio.exec.store.parquet.ParquetCoercionReader(ParquetCoercionReader.java:81)
com.dremio.exec.store.dfs.implicit.AdditionalColumnsRecordReader(AdditionalColumnsRecordReader.java:75)
com.dremio.exec.store.parquet.ScanTableFunction(ScanTableFunction.java:167)
com.dremio.exec.store.parquet.ScanTableFunction(ScanTableFunction.java:155)
com.dremio.sabot.op.tablefunction.TableFunctionOperator(TableFunctionOperator.java:96)
com.dremio.sabot.driver.SmartOp$SmartSingleInput(SmartOp.java:193)
com.dremio.sabot.driver.StraightPipe(StraightPipe.java:56)
com.dremio.sabot.driver.Pipeline(Pipeline.java:111)
com.dremio.sabot.driver.Pipeline(Pipeline.java:101)
com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper(FragmentExecutor.java:418)
com.dremio.sabot.exec.fragment.FragmentExecutor(FragmentExecutor.java:355)
com.dremio.sabot.exec.fragment.FragmentExecutor(FragmentExecutor.java:97)
com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl(FragmentExecutor.java:820)
com.dremio.sabot.task.AsyncTaskWrapper(AsyncTaskWrapper.java:120)
com.dremio.sabot.task.slicing.SlicingThread(SlicingThread.java:247)
com.dremio.sabot.task.slicing.SlicingThread(SlicingThread.java:171)

It looks like I can’t attach the profile JSON files directly, so if you want one, let me know and I’ll paste its text in.

FWIW, the table was created in Athena with compression set to SNAPPY.
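For context, here is a minimal sketch of the kind of Athena CTAS that produces such a table. The database, table, and location names are placeholders, and the WITH properties reflect my understanding of Athena’s Iceberg CTAS syntax rather than the exact statement used:

-- Hypothetical names throughout; write_compression is where SNAPPY (or NONE) is set.
CREATE TABLE my_db.my_iceberg_table
WITH (
  table_type = 'ICEBERG',
  location = 's3://my-bucket/path/to/table/',
  is_external = false,
  write_compression = 'SNAPPY'
) AS
SELECT * FROM my_db.my_source_table;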

I also tried with compression set to NONE and got a different error… see below:

IO_EXCEPTION ERROR: getFileStatus on s3://rnr-datalabs/scratchpad/jal/df1_iceberg/metadata/00001-9be51136-abf1-45d3-9165-7af2cbc1619b.metadata.json: com.amazonaws.SdkClientException: Unable to unmarshall response (com.ctc.wstx.stax.WstxInputFactory cannot be cast to javax.xml.stream.XMLInputFactory). Response Code: 200, Response Text: OK: Unable to unmarshall response (com.ctc.wstx.stax.WstxInputFactory cannot be cast to javax.xml.stream.XMLInputFactory). Response Code: 200, Response Text: OK

SQL Query select * from "AWS_Glue_Catalog_DataLab"."jal_testing_db"."jal_testing_df1_iceberg"


  (org.apache.hadoop.fs.s3a.AWSClientIOException) getFileStatus on s3://rnr-datalabs/scratchpad/jal/df1_iceberg/metadata/00001-9be51136-abf1-45d3-9165-7af2cbc1619b.metadata.json: com.amazonaws.SdkClientException: Unable to unmarshall response (com.ctc.wstx.stax.WstxInputFactory cannot be cast to javax.xml.stream.XMLInputFactory). Response Code: 200, Response Text: OK: Unable to unmarshall response (com.ctc.wstx.stax.WstxInputFactory cannot be cast to javax.xml.stream.XMLInputFactory). Response Code: 200, Response Text: OK
    org.apache.hadoop.fs.s3a.S3AUtils.translateException():128
    org.apache.hadoop.fs.s3a.S3AUtils.translateException():101
    org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus():1571
    org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus():117
    com.dremio.exec.store.hive.exec.dfs.DremioHadoopFileSystemWrapper.getFileAttributes():235
    com.dremio.exec.store.iceberg.DremioFileIO.newInputFile():116
    org.apache.iceberg.TableMetadataParser.read():245
    org.apache.iceberg.BaseMetastoreTableOperations.lambda$refreshFromMetadataLocation$0():171
    org.apache.iceberg.BaseMetastoreTableOperations.lambda$refreshFromMetadataLocation$1():185
    org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry():404
    org.apache.iceberg.util.Tasks$Builder.runSingleThreaded():214
    org.apache.iceberg.util.Tasks$Builder.run():198
    org.apache.iceberg.util.Tasks$Builder.run():190
    org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation():185
    org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation():170
    org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation():165
    com.dremio.exec.store.hive.iceberg.IcebergHiveTableOperations.doRefresh():44
    org.apache.iceberg.BaseMetastoreTableOperations.refresh():95
    org.apache.iceberg.BaseTable.refresh():59
    com.dremio.exec.store.hive.metadata.HiveMetadataUtils.getTableMetadataFromIceberg():560
    com.dremio.exec.store.hive.metadata.HiveMetadataUtils.getTableMetadata():536
    com.dremio.exec.store.hive.HiveStoragePlugin.listPartitionChunks():1290
    com.dremio.plugins.awsglue.store.AWSGlueStoragePlugin.listPartitionChunks():653
    com.dremio.exec.catalog.DatasetSaverImpl.saveUsingV1Flow():248
    com.dremio.exec.catalog.DatasetSaverImpl.save():121
    com.dremio.exec.catalog.DatasetSaverImpl.save():143
    com.dremio.exec.catalog.EnterpriseDatasetSaver.save():83
    com.dremio.exec.catalog.DatasetManager.getTableFromPlugin():373
    com.dremio.exec.catalog.DatasetManager.getTable():215
    com.dremio.exec.catalog.CatalogImpl.getTableHelper():472
    com.dremio.exec.catalog.CatalogImpl.getTable():225
    com.dremio.exec.catalog.CatalogImpl.getTableForQuery():500
    com.dremio.exec.catalog.EnterpriseCatalogImpl.getTableForQuery():260
    com.dremio.exec.catalog.SourceAccessChecker.lambda$getTableForQuery$4():133
    com.dremio.exec.catalog.SourceAccessChecker.getIfVisible():97
    com.dremio.exec.catalog.SourceAccessChecker.getTableForQuery():133
    com.dremio.exec.catalog.DelegatingCatalog.getTableForQuery():110
    com.dremio.exec.catalog.CachingCatalog.getTableForQuery():106
    com.dremio.exec.catalog.DremioCatalogReader.getTable():102
    com.dremio.exec.catalog.DremioCatalogReader.getTable():79
    org.apache.calcite.sql.validate.DremioEmptyScope.resolveTable():44
    org.apache.calcite.sql.validate.DremioEmptyScope.resolveTable():34
    org.apache.calcite.sql.validate.DelegatingScope.resolveTable():203
    org.apache.calcite.sql.validate.IdentifierNamespace.resolveImpl():105
    org.apache.calcite.sql.validate.IdentifierNamespace.validateImpl():177
    org.apache.calcite.sql.validate.AbstractNamespace.validate():84
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():975
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():956
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom():3147
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom():3132
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect():3399
    org.apache.calcite.sql.validate.SelectNamespace.validateImpl():60
    org.apache.calcite.sql.validate.AbstractNamespace.validate():84
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():975
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():956
    org.apache.calcite.sql.SqlSelect.validate():242
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression():931
    com.dremio.exec.planner.sql.SqlValidatorImpl.validate():111
    com.dremio.exec.planner.sql.SqlValidatorAndToRelContext.validate():81
    com.dremio.exec.planner.sql.handlers.PrelTransformer.validateNode():204
    com.dremio.exec.planner.sql.handlers.PrelTransformer.validateAndConvert():186
    com.dremio.exec.planner.sql.handlers.PrelTransformer.validateAndConvert():178
    com.dremio.exec.planner.sql.handlers.query.NormalHandler.getPlan():67
    com.dremio.exec.planner.sql.handlers.commands.HandlerToExec.plan():59
    com.dremio.exec.work.foreman.AttemptManager.plan():494
    com.dremio.exec.work.foreman.AttemptManager.lambda$run$4():392
    com.dremio.service.commandpool.ReleasableBoundCommandPool.lambda$getWrappedCommand$3():138
    com.dremio.service.commandpool.CommandWrapper.run():62
    com.dremio.context.RequestContext.run():95
    com.dremio.common.concurrent.ContextMigratingExecutorService.lambda$decorate$3():199
    com.dremio.common.concurrent.ContextMigratingExecutorService$ComparableRunnable.run():180
    java.util.concurrent.Executors$RunnableAdapter.call():511
    java.util.concurrent.FutureTask.run():266
    java.util.concurrent.ThreadPoolExecutor.runWorker():1149
    java.util.concurrent.ThreadPoolExecutor$Worker.run():624
    java.lang.Thread.run():750
  Caused By (com.amazonaws.SdkClientException) Unable to unmarshall response (com.ctc.wstx.stax.WstxInputFactory cannot be cast to javax.xml.stream.XMLInputFactory). Response Code: 200, Response Text: OK
    com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse():1738
    com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleSuccessResponse():1434
    com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest():1356
    com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper():1139
    com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute():796
    com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer():764
    com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute():738
    com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500():698
    com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute():680
    com.amazonaws.http.AmazonHttpClient.execute():544
    com.amazonaws.http.AmazonHttpClient.execute():524
    com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.doInvoke():1719
    com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.invoke():1686
    com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.invoke():1675
    com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.executeAssumeRole():589
    com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.assumeRole():561
    com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider.newSession():321
    com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider.access$000():37
    com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider$1.call():76
    com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider$1.call():73
    com.amazonaws.auth.RefreshableTask.refreshValue():257
    com.amazonaws.auth.RefreshableTask.blockingRefresh():213
    com.amazonaws.auth.RefreshableTask.getValue():154
    com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider.getCredentials():299
    com.dremio.plugins.s3.store.STSCredentialProviderV1.getCredentials():95
    org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials():123
    com.amazonaws.http.AmazonHttpClient$RequestExecutor.getCredentialsFromContext():1251
    com.amazonaws.http.AmazonHttpClient$RequestExecutor.runBeforeRequestHandlers():827
    com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute():777
    com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer():764
    com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute():738
    com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500():698
    com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute():680
    com.amazonaws.http.AmazonHttpClient.execute():544
    com.amazonaws.http.AmazonHttpClient.execute():524
    com.amazonaws.services.s3.AmazonS3Client.invoke():5054
    com.amazonaws.services.s3.AmazonS3Client.invoke():5000
    com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata():1335
    com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata():1309
    org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata():904
    org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus():1553
    org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus():117
    com.dremio.exec.store.hive.exec.dfs.DremioHadoopFileSystemWrapper.getFileAttributes():235
    com.dremio.exec.store.iceberg.DremioFileIO.newInputFile():116
    org.apache.iceberg.TableMetadataParser.read():245
    org.apache.iceberg.BaseMetastoreTableOperations.lambda$refreshFromMetadataLocation$0():171
    org.apache.iceberg.BaseMetastoreTableOperations.lambda$refreshFromMetadataLocation$1():185
    org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry():404
    org.apache.iceberg.util.Tasks$Builder.runSingleThreaded():214
    org.apache.iceberg.util.Tasks$Builder.run():198
    org.apache.iceberg.util.Tasks$Builder.run():190
    org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation():185
    org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation():170
    org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation():165
    com.dremio.exec.store.hive.iceberg.IcebergHiveTableOperations.doRefresh():44
    org.apache.iceberg.BaseMetastoreTableOperations.refresh():95
    org.apache.iceberg.BaseTable.refresh():59
    com.dremio.exec.store.hive.metadata.HiveMetadataUtils.getTableMetadataFromIceberg():560
    com.dremio.exec.store.hive.metadata.HiveMetadataUtils.getTableMetadata():536
    com.dremio.exec.store.hive.HiveStoragePlugin.listPartitionChunks():1290
    com.dremio.plugins.awsglue.store.AWSGlueStoragePlugin.listPartitionChunks():653
    com.dremio.exec.catalog.DatasetSaverImpl.saveUsingV1Flow():248
    com.dremio.exec.catalog.DatasetSaverImpl.save():121
    com.dremio.exec.catalog.DatasetSaverImpl.save():143
    com.dremio.exec.catalog.EnterpriseDatasetSaver.save():83
    com.dremio.exec.catalog.DatasetManager.getTableFromPlugin():373
    com.dremio.exec.catalog.DatasetManager.getTable():215
    com.dremio.exec.catalog.CatalogImpl.getTableHelper():472
    com.dremio.exec.catalog.CatalogImpl.getTable():225
    com.dremio.exec.catalog.CatalogImpl.getTableForQuery():500
    com.dremio.exec.catalog.EnterpriseCatalogImpl.getTableForQuery():260
    com.dremio.exec.catalog.SourceAccessChecker.lambda$getTableForQuery$4():133
    com.dremio.exec.catalog.SourceAccessChecker.getIfVisible():97
    com.dremio.exec.catalog.SourceAccessChecker.getTableForQuery():133
    com.dremio.exec.catalog.DelegatingCatalog.getTableForQuery():110
    com.dremio.exec.catalog.CachingCatalog.getTableForQuery():106
    com.dremio.exec.catalog.DremioCatalogReader.getTable():102
    com.dremio.exec.catalog.DremioCatalogReader.getTable():79
    org.apache.calcite.sql.validate.DremioEmptyScope.resolveTable():44
    org.apache.calcite.sql.validate.DremioEmptyScope.resolveTable():34
    org.apache.calcite.sql.validate.DelegatingScope.resolveTable():203
    org.apache.calcite.sql.validate.IdentifierNamespace.resolveImpl():105
    org.apache.calcite.sql.validate.IdentifierNamespace.validateImpl():177
    org.apache.calcite.sql.validate.AbstractNamespace.validate():84
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():975
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():956
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom():3147
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom():3132
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect():3399
    org.apache.calcite.sql.validate.SelectNamespace.validateImpl():60
    org.apache.calcite.sql.validate.AbstractNamespace.validate():84
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():975
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():956
    org.apache.calcite.sql.SqlSelect.validate():242
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression():931
    com.dremio.exec.planner.sql.SqlValidatorImpl.validate():111
    com.dremio.exec.planner.sql.SqlValidatorAndToRelContext.validate():81
    com.dremio.exec.planner.sql.handlers.PrelTransformer.validateNode():204
    com.dremio.exec.planner.sql.handlers.PrelTransformer.validateAndConvert():186
    com.dremio.exec.planner.sql.handlers.PrelTransformer.validateAndConvert():178
    com.dremio.exec.planner.sql.handlers.query.NormalHandler.getPlan():67
    com.dremio.exec.planner.sql.handlers.commands.HandlerToExec.plan():59
    com.dremio.exec.work.foreman.AttemptManager.plan():494
    com.dremio.exec.work.foreman.AttemptManager.lambda$run$4():392
    com.dremio.service.commandpool.ReleasableBoundCommandPool.lambda$getWrappedCommand$3():138
    com.dremio.service.commandpool.CommandWrapper.run():62
    com.dremio.context.RequestContext.run():95
    com.dremio.common.concurrent.ContextMigratingExecutorService.lambda$decorate$3():199
    com.dremio.common.concurrent.ContextMigratingExecutorService$ComparableRunnable.run():180
    java.util.concurrent.Executors$RunnableAdapter.call():511
    java.util.concurrent.FutureTask.run():266
    java.util.concurrent.ThreadPoolExecutor.runWorker():1149
    java.util.concurrent.ThreadPoolExecutor$Worker.run():624
    java.lang.Thread.run():750
  Caused By (java.lang.ClassCastException) com.ctc.wstx.stax.WstxInputFactory cannot be cast to javax.xml.stream.XMLInputFactory
    javax.xml.stream.XMLInputFactory.newInstance():-1
    com.amazonaws.util.XmlUtils.createXmlInputFactory():63
    com.amazonaws.util.XmlUtils.access$000():27
    com.amazonaws.util.XmlUtils$1.initialValue():36
    com.amazonaws.util.XmlUtils$1.initialValue():33
    java.lang.ThreadLocal.setInitialValue():195
    java.lang.ThreadLocal.get():172
    com.amazonaws.util.XmlUtils.getXmlInputFactory():54
    com.amazonaws.http.StaxResponseHandler.handle():94
    com.amazonaws.http.StaxResponseHandler.handle():42
    com.amazonaws.http.response.AwsResponseHandlerAdapter.handle():69
    com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse():1714
    com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleSuccessResponse():1434
    com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest():1356
    com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper():1139
    com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute():796
    com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer():764
    com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute():738
    com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500():698
    com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute():680
    com.amazonaws.http.AmazonHttpClient.execute():544
    com.amazonaws.http.AmazonHttpClient.execute():524
    com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.doInvoke():1719
    com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.invoke():1686
    com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.invoke():1675
    ...

(output truncated for forum constraints)

Did we get any further here?
What else is running on the system?
We are looking at a couple of similar issues and suspect library conflicts and/or startup argument overrides.

@jdlong @balaji.ramaswamy I’m also encountering similar issues:

ServiceConfigurationError: javax.xml.stream.XMLInputFactory: com.ctc.wstx.stax.WstxInputFactory not a subtype. Did you have any luck solving it?

@balaji.ramaswamy I’ve sent you the query profile in private

Our Dev team is zeroing in on the problem. I am optimistic we can find a solution within a couple of weeks (plus some time to figure out the release vehicle).
Sorry and stay tuned!

Hey folks–

I am part of the Dev team here at Dremio that has been looking into these errors with Glue Iceberg tables. If you are hitting something like “javax.xml.stream.XMLInputFactory: Provider com.ctc.wstx.stax.WstxInputFactory not a subtype” or “Unable to unmarshall response (com.ctc.wstx.stax.WstxInputFactory cannot be cast to javax.xml.stream.XMLInputFactory)”, the steps below may serve as a workaround.

I have seen this workaround work on v22.1.1 and above.

  1. Create your Glue source containing the Iceberg tables, using your access key, secret key, and the IAM role you are assuming
  2. Create an S3 source where you can format at least one file or folder (I’ve been using the same authentication method as my Glue source, but from my testing that doesn’t appear to be necessary)
  3. Go to the SQL Runner
  4. Change the context to this S3 source, create a query against your Glue source, and run it (see the sketch below)

Once you have done these steps, you should be able to query your Glue source normally using it as the context. See the screenshots below for the error, workaround, and working states.
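To make step 4 concrete, here is a rough SQL sketch. The source and table names are hypothetical; the key point is that the SQL Runner context is the S3 source from step 2, while the query itself still targets the Glue source explicitly:

-- Context (set in the SQL Runner): the S3 source from step 2.
-- "AWS_Glue_Catalog", "my_db", and "my_iceberg_table" are placeholder names.
SELECT * FROM "AWS_Glue_Catalog"."my_db"."my_iceberg_table" LIMIT 10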



I hope this helps some of you while we continue to focus on a quality fix.


Yeah, I’m stuck at the same error now too.

getFileStatus on s3://xxxxx/jal/df1_iceberg/metadata/00001-727a2396-6e6b-4378-9377-fc03e91e7501.metadata.json: com.amazonaws.SdkClientException: Unable to unmarshall response (com.ctc.wstx.stax.WstxInputFactory cannot be cast to javax.xml.stream.XMLInputFactory). Response Code: 200, Response Text: OK: Unable to unmarshall response (com.ctc.wstx.stax.WstxInputFactory cannot be cast to javax.xml.stream.XMLInputFactory). Response Code: 200, Response Text: OK

Hey, I really appreciate the ideas, and I know you all are hard at work here.

We’re on 22.1.1-202208230402290397-a7010f28, so I thought the workaround might help me. However, I’m not able to get it right. In your example above, is GlueSourceRepro a database in the Glue Catalog, or is it an S3 bucket?

In my situation I want to point Dremio at a Glue Catalog database and have Dremio query the Iceberg tables in that database. So I changed my context to the Glue Catalog database. No dice:

In my scenario, GlueSourceRepro is a Glue database with an Iceberg table whose data is stored in an S3 bucket. It was created using these steps: Build an Apache Iceberg data lake using Amazon Athena, Amazon EMR, and AWS Glue | AWS Big Data Blog

The S3 source simply uses the same credentials, with a single file that I formatted.

What’s the core issue here? I don’t understand. Is it one of permissions? Of context? Something else?

Java’s classloader snafus :frowning: (the AWS SDK resolves javax.xml.stream.XMLInputFactory via the JDK factory lookup, and when Woodstox’s WstxInputFactory is loaded against a different copy of the API than the caller’s, the cast or “not a subtype” check fails)