IllegalStateException when selecting from Elasticsearch source containing GeoJSON

When I query an Elasticsearch index that contains GeoJSON, the query fails with the error below. The field coordinates refers to a key within a GeoJSON object. An example of GeoJSON and coordinates can be found here: http://wiki.geojson.org/GeoJSON_draft_version_6#MultiPolygon

Is there any way to get Dremio to ignore this field? I don’t need its values for the task I’m working on.

 DATA_READ ERROR: IllegalStateException

Line  1
Column  1485
Field  coordinates
SQL Query SELECT id
FROM "Elastic Search Production".fields.field


  (java.lang.IllegalStateException) null
    com.dremio.plugins.elastic.execution.WriteHolders$InvalidWriteHolder.writeList():58
    com.dremio.plugins.elastic.execution.FieldReadDefinition.writeList():126
    com.dremio.plugins.elastic.execution.ElasticsearchJsonReader.writeDeclaredList():356
    com.dremio.plugins.elastic.execution.ElasticsearchJsonReader.writeDeclaredList():348
    com.dremio.plugins.elastic.execution.ElasticsearchJsonReader.writeDeclaredList():348
    com.dremio.plugins.elastic.execution.ElasticsearchJsonReader.writeDeclaredList():348
    com.dremio.plugins.elastic.execution.ElasticsearchJsonReader.writeDeclaredMap():311
    com.dremio.plugins.elastic.execution.ElasticsearchJsonReader.writeDeclaredMap():324
    com.dremio.plugins.elastic.execution.ElasticsearchJsonReader.writeDeclaredMap():324
    com.dremio.plugins.elastic.execution.ElasticsearchJsonReader.writeToVector():214
    com.dremio.plugins.elastic.execution.ElasticsearchJsonReader.write():182
    com.dremio.plugins.elastic.execution.ElasticsearchRecordReader.next():351
    com.dremio.plugins.elastic.ElasticTableBuilder.getSampledSchema():322
    com.dremio.plugins.elastic.ElasticTableBuilder.populate():190
    com.dremio.plugins.elastic.ElasticTableBuilder.buildIfNecessary():161
    com.dremio.plugins.elastic.ElasticTableBuilder.getDataset():142
    com.dremio.exec.catalog.DatasetManager.getTableFromPlugin():297
    com.dremio.exec.catalog.DatasetManager.getTable():190
    com.dremio.exec.catalog.CatalogImpl.getTable():134
    com.dremio.exec.catalog.DelegatingCatalog.getTable():57
    com.dremio.exec.catalog.CachingCatalog.getTable():66
    com.dremio.exec.catalog.DremioCatalogReader.getTable():79
    com.dremio.exec.catalog.DremioCatalogReader.getTable():65
    org.apache.calcite.sql.validate.EmptyScope.getTableNamespace():71
    org.apache.calcite.sql.validate.DelegatingScope.getTableNamespace():189
    org.apache.calcite.sql.validate.IdentifierNamespace.validateImpl():104
    org.apache.calcite.sql.validate.AbstractNamespace.validate():84
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():910
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():891
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom():2859
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom():2844
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect():3077
    org.apache.calcite.sql.validate.SelectNamespace.validateImpl():60
    org.apache.calcite.sql.validate.AbstractNamespace.validate():84
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():910
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():891
    org.apache.calcite.sql.SqlSelect.validate():208
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression():866
    org.apache.calcite.sql.validate.SqlValidatorImpl.validate():577
    com.dremio.exec.planner.sql.SqlConverter.validate():168
    com.dremio.exec.planner.sql.handlers.PrelTransformer.validateNode():176
    com.dremio.exec.planner.sql.handlers.PrelTransformer.validateAndConvert():165
    com.dremio.exec.planner.sql.handlers.PrelTransformer.validateAndConvert():161
    com.dremio.exec.planner.sql.handlers.query.NormalHandler.getPlan():43
    com.dremio.exec.planner.sql.handlers.commands.HandlerToExec.plan():66
    com.dremio.exec.work.foreman.AttemptManager.run():293
    java.util.concurrent.ThreadPoolExecutor.runWorker():1149
    java.util.concurrent.ThreadPoolExecutor$Worker.run():624
    java.lang.Thread.run():748

Can you confirm what version of ES you are on?

The ES version is 5.0.2

"version": {
  "number": "5.0.2",
  "build_hash": "f6b4951",
  "build_date": "2016-11-24T10:07:18.101Z",
  "build_snapshot": false,
  "lucene_version": "6.2.1"}

Hi @Aaron_Santos,

Can you verify that the contents of the field are valid for all rows?

Could you send a sample of the data and mapping for that field?

The mapping for this part of the doc looks like this

"boundary": {
    "type": "geo_shape"
}

An example value is

{
   "type": "MultiPolygon",
   "coordinates": [
       [
           [ [102.0, 2.0], [103.0, 2.0], [103.0, 3.0], [102.0, 3.0], [102.0, 2.0] ]
       ],
       [
           [ [100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0] ],
           [ [100.2, 0.2], [100.8, 0.2], [100.8, 0.8], [100.2, 0.8], [100.2, 0.2] ]
       ]
   ]
}

The value is just an example because the geo location contains personally identifiable information, but it is representative of what we store.

In some cases, boundary values may be empty, i.e.:

{
   "type": "MultiPolygon",
   "coordinates": []
}
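For readers unfamiliar with the shape of this field: a MultiPolygon's coordinates value is four arrays deep (polygons, then rings, then positions, then numbers), while the empty case above is a single empty array with no inner structure at all. The following Python sketch measures that nesting; nesting_depth is a hypothetical helper written for illustration, not part of Dremio or Elasticsearch.

```python
from typing import Any


def nesting_depth(value: Any) -> int:
    """Return how many list levels wrap the innermost values.

    A non-list (e.g. a float) has depth 0. An empty list counts as one
    level, because there is nothing inside it to inspect.
    """
    if not isinstance(value, list):
        return 0
    if not value:
        return 1
    return 1 + max(nesting_depth(item) for item in value)


# The example MultiPolygon and the empty MultiPolygon from this thread.
multipolygon = {
    "type": "MultiPolygon",
    "coordinates": [
        [
            [[102.0, 2.0], [103.0, 2.0], [103.0, 3.0], [102.0, 3.0], [102.0, 2.0]]
        ],
        [
            [[100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0]],
            [[100.2, 0.2], [100.8, 0.2], [100.8, 0.8], [100.2, 0.8], [100.2, 0.2]],
        ],
    ],
}
empty = {"type": "MultiPolygon", "coordinates": []}

print(nesting_depth(multipolygon["coordinates"]))  # 4: polygons > rings > positions > numbers
print(nesting_depth(empty["coordinates"]))         # 1: inner depth cannot be inferred
```

The point of the comparison is only that the same field can appear with very different array depths across documents; the sketch does not claim to reproduce how Dremio's reader handles either case.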

Determining the validity of geospatial values is an interesting topic. There are 4.1M documents in this index, so it's not something I can check by hand. As far as I know, every boundary is valid GeoJSON, and coordinate values are in lat,lon with appropriate value ranges. However, I'm not certain that the values are free of flaws like self-intersections, duplicate rings, etc. Let me know if you need more specifics. :slight_smile:
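With millions of documents, a scripted pass is the practical way to catch structural problems. Below is a minimal Python sketch, assuming each document's boundary value has already been exported and parsed into a dict (the export step, e.g. a scroll over the index, is not shown). check_multipolygon is a hypothetical helper. It checks only the type name, that each ring has at least four positions, that each ring is closed, and coordinate ranges; it assumes the [longitude, latitude] order that the GeoJSON specification uses, so if the data is really stored latitude-first, the two range checks need to be swapped. It does not detect self-intersections or duplicate rings.

```python
from typing import Any, List


def check_multipolygon(geometry: Any) -> List[str]:
    """Return basic structural problems found in a GeoJSON MultiPolygon.

    Assumes [lon, lat] position order. Empty geometries are reported so
    they can be counted separately. Self-intersections are NOT checked.
    """
    if not isinstance(geometry, dict) or geometry.get("type") != "MultiPolygon":
        return ["not a MultiPolygon object"]
    coords = geometry.get("coordinates")
    if not isinstance(coords, list):
        return ["coordinates is not an array"]
    if not coords:
        return ["coordinates is empty"]

    problems: List[str] = []
    for p, polygon in enumerate(coords):
        if not isinstance(polygon, list) or not polygon:
            problems.append(f"polygon {p}: no rings")
            continue
        for r, ring in enumerate(polygon):
            # A closed linear ring needs at least 4 positions (first == last).
            if not isinstance(ring, list) or len(ring) < 4:
                problems.append(f"polygon {p} ring {r}: fewer than 4 positions")
                continue
            if ring[0] != ring[-1]:
                problems.append(f"polygon {p} ring {r}: not closed")
            for i, pos in enumerate(ring):
                if not (isinstance(pos, list) and len(pos) >= 2
                        and all(isinstance(v, (int, float)) for v in pos[:2])):
                    problems.append(f"polygon {p} ring {r} position {i}: malformed")
                    continue
                lon, lat = pos[0], pos[1]
                if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
                    problems.append(f"polygon {p} ring {r} position {i}: out of range")
    return problems


example = {
    "type": "MultiPolygon",
    "coordinates": [[[[102.0, 2.0], [103.0, 2.0], [103.0, 3.0], [102.0, 3.0], [102.0, 2.0]]]],
}
print(check_multipolygon(example))  # []
```

Returning a list of messages rather than raising on the first error makes it easy to tally problem types across the whole index.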

The geo_shape type is supported by Dremio: https://docs.dremio.com/data-sources/elasticsearch.html

Would you happen to have multiple fields with the same name (maybe different case) in the same index?

We do. One example is another field named boundary that exists at a different location in documents in this index. Its mapping is:

"boundary": {
  "type": "keyword"
}

Could that duplication of fields with the same name but different locations and mappings cause this issue?

We currently don't support multiple fields with the same name within the same index; it is something we are scoping. Can you try another index that has a geo_shape field but no duplicated field names?
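To find such duplicates without inspecting documents by hand, one option is to scan the index mapping. Elasticsearch returns it from GET /<index>/_mapping; in 5.x the field definitions sit under the index name, then "mappings", then the type name, then "properties". The Python sketch below takes that "properties" dict and groups fields whose last path segment matches case-insensitively. collect_fields and duplicate_names are hypothetical helpers, the metadata path in the example mapping is invented for illustration, and multi-fields (the "fields" key) are ignored. The script only lists candidates; it does not check what Dremio does with them.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple

Field = Tuple[str, str]  # (dotted path, mapping type)


def collect_fields(properties: Dict[str, Any], prefix: str = "",
                   out: Optional[List[Field]] = None) -> List[Field]:
    """Walk an Elasticsearch "properties" block and return (path, type) pairs."""
    if out is None:
        out = []
    for name, spec in properties.items():
        path = f"{prefix}.{name}" if prefix else name
        if "properties" in spec:
            # Object or nested field: recurse into its sub-fields.
            collect_fields(spec["properties"], path, out)
        else:
            out.append((path, spec.get("type", "object")))
    return out


def duplicate_names(properties: Dict[str, Any]) -> Dict[str, List[Field]]:
    """Group fields whose last path segment matches, ignoring case."""
    groups: Dict[str, List[Field]] = defaultdict(list)
    for path, ftype in collect_fields(properties):
        groups[path.rsplit(".", 1)[-1].lower()].append((path, ftype))
    return {name: fields for name, fields in groups.items() if len(fields) > 1}


# Hypothetical mapping mirroring this thread: boundary appears twice.
mapping = {
    "boundary": {"type": "geo_shape"},
    "metadata": {"properties": {"boundary": {"type": "keyword"}}},
    "id": {"type": "keyword"},
}
print(duplicate_names(mapping))
# {'boundary': [('boundary', 'geo_shape'), ('metadata.boundary', 'keyword')]}
```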

Good idea. I was able to pull up another index that contains polygons just fine. Thanks for taking the time to find a root cause and explain how it works.
