Kerberized HDP: HDFS works, Hive doesn't

Dear community,

I am currently trying to set up a connection between Dremio and my Hadoop cluster (Hortonworks HDP 2.6.5.0-292). Even though Dremio is installed directly on the head node, I can’t seem to get the two to work with one another.
After some experimenting, I set up symlinks from the cluster’s .xml config files into the Dremio config folder, which got me access to HDFS. Sadly, this doesn’t extend to Hive: I still get the “Failure while configuring source” error message with no further explanation. Every time I try, the server log only gives me a hexadecimal error ID (different each time), but nothing else.

Any ideas what I might be doing wrong or what I might be missing? Thank you very much in advance.

The modified segments of my dremio.conf read as follows:

services: {
  coordinator.enabled: true,
  coordinator.master.enabled: true,
  coordinator.master.embedded-zookeeper.enabled: false,
  executor.enabled: true,
  kerberos.principal: "admin/admin@COMPUTE.INTERNAL",
  kerberos.keytab.file.path: "/etc/admin.keytab"
}
zookeeper: "10.104.180.222:2181,10.104.180.140:2181,10.104.180.106:2181"

I have symlinked the following files into the /opt/dremio/conf directory (rough commands below the list):

core-site.xml
hdfs-site.xml
yarn-site.xml
hive-site.xml
mapred-site.xml
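
For reference, the links were created with plain ln -s commands, roughly as follows (a sketch; the source paths are assumed to be the HDP client-config defaults under /etc/hadoop/conf and /etc/hive/conf):

# symlink the cluster client configs into Dremio's conf directory
ln -s /etc/hadoop/conf/core-site.xml   /opt/dremio/conf/core-site.xml
ln -s /etc/hadoop/conf/hdfs-site.xml   /opt/dremio/conf/hdfs-site.xml
ln -s /etc/hadoop/conf/yarn-site.xml   /opt/dremio/conf/yarn-site.xml
ln -s /etc/hadoop/conf/mapred-site.xml /opt/dremio/conf/mapred-site.xml
ln -s /etc/hive/conf/hive-site.xml     /opt/dremio/conf/hive-site.xml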

The output in the server.log reads as follows:

2018-06-16 14:07:30,602 [Plugin Startup: Ultron Hive] INFO c.d.e.store.hive.HiveStoragePlugin - Hive Metastore SASL enabled. Kerberos principal: admin/admin@COMPUTE.INTERNAL
2018-06-16 14:08:48,345 [catalog-source-synchronization] WARN c.d.exec.catalog.CatalogServiceImpl - Failure while synchronizing sources.
2018-06-16 14:09:30,604 [qtp1545629340-107] INFO c.d.exec.catalog.CatalogServiceImpl - User Error Occurred [ErrorId: 30ff6d60-38ab-42ab-a8d2-81a268353106]
com.dremio.common.exceptions.UserException: Failure while configuring source [Ultron Hive]
at com.dremio.common.exceptions.UserException$Builder.build(UserException.java:746) ~[dremio-common-2.0.5-201806021755080191-767cfb5.jar:2.0.5-201806021755080191-767cfb5]
at com.dremio.exec.catalog.CatalogServiceImpl.createOrUpdateSource(CatalogServiceImpl.java:629) [dremio-sabot-kernel-2.0.5-201806021755080191-767cfb5.jar:2.0.5-201806021755080191-767cfb5]
at com.dremio.exec.catalog.CatalogServiceImpl.createSource(CatalogServiceImpl.java:376) [dremio-sabot-kernel-2.0.5-201806021755080191-767cfb5.jar:2.0.5-201806021755080191-767cfb5]
at com.dremio.exec.catalog.CatalogServiceImpl.access$600(CatalogServiceImpl.java:100) [dremio-sabot-kernel-2.0.5-201806021755080191-767cfb5.jar:2.0.5-201806021755080191-767cfb5]
at com.dremio.exec.catalog.CatalogServiceImpl$SourceModifier.createSource(CatalogServiceImpl.java:946) [dremio-sabot-kernel-2.0.5-201806021755080191-767cfb5.jar:2.0.5-201806021755080191-767cfb5]
at com.dremio.exec.catalog.CatalogImpl.createSource(CatalogImpl.java:529) [dremio-sabot-kernel-2.0.5-201806021755080191-767cfb5.jar:2.0.5-201806021755080191-767cfb5]
at com.dremio.exec.catalog.DelegatingCatalog.createSource(DelegatingCatalog.java:182) [dremio-sabot-kernel-2.0.5-201806021755080191-767cfb5.jar:2.0.5-201806021755080191-767cfb5]
at com.dremio.dac.service.source.SourceService.registerSourceWithRuntime(SourceService.java:147) [dremio-dac-backend-2.0.5-201806021755080191-767cfb5.jar:2.0.5-201806021755080191-767cfb5]
at com.dremio.dac.service.source.SourceService.registerSourceWithRuntime(SourceService.java:138) [dremio-dac-backend-2.0.5-201806021755080191-767cfb5.jar:2.0.5-201806021755080191-767cfb5]

Did you have a look at running Dremio as a YARN app?

https://docs.dremio.com/deployment/yarn-hadoop.html

This might make your deployment more seamless overall.

I’ll let others chime in on specific errors you are seeing.

Thank you very much for the advice. I changed the setup to match the instructions as far as possible and got a step further. The new error message reads as follows:

INFO c.d.exec.catalog.CatalogServiceImpl - User Error Occurred [ErrorId: 9da52059-80a4-4538-9a74-756e14488a93]
com.dremio.common.exceptions.UserException: Failure while configuring source [Ultron Hive]
[…]
Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: Peer indicated failure: GSS initiate failed

The odd part is that this looks like a Kerberos error. However, accessing HDFS, which uses the same Kerberos principal and KDC, works like a charm. The principal I set up should have full access to everything, and while exploring HDFS I never encountered a single Access Denied message or the like. Any ideas? Thanks in advance.
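
For reference, the principal and keytab can be sanity-checked from the shell roughly like this (assuming the standard MIT Kerberos client tools are installed):

# show the principals, key versions and enctypes stored in the keytab
klist -kte /etc/admin.keytab
# obtain a ticket non-interactively with that keytab, then list it
kinit -kt /etc/admin.keytab admin/admin@COMPUTE.INTERNAL
klist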

A silly question, but can you verify that the Hive metastore service is up and running properly? It is a separate service from the regular HiveServer, and it is the one Dremio uses.
If it is, can you check the Hive metastore logs to see whether there are any more specific errors there?

I can confirm that Hive Metastore is up and running. According to Ambari, all Hive services are online, and the regular queries I tested are working fine.
The hivemetastore.log gives me the following:

2018-06-17 14:58:36,024 ERROR [pool-7-thread-199]: transport.TSaslTransport (TSaslTransport.java:open(315)) - SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: Checksum failed)]
[…]
Caused by: GSSException: Failure unspecified at GSS-API level (Mechanism level: Checksum failed)
[…]
Caused by: KrbException: Checksum failed
[…]
Caused by: java.security.GeneralSecurityException: Checksum failed

ERROR [pool-7-thread-199]: server.TThreadPoolServer (TThreadPoolServer.java:run(297)) - Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: GSS initiate failed
[…]
Caused by: org.apache.thrift.transport.TTransportException: GSS initiate failed
[…]
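
From what I can gather, a “Checksum failed” at the GSS level usually means the metastore could not verify the client ticket against its own key, for example when a keytab has been regenerated and the key version numbers no longer match. Roughly, the versions can be compared like this (assuming MIT Kerberos tools and the HDP default keytab path on the metastore host):

# key versions (KVNO) and enctypes stored in the Hive service keytab
klist -kte /etc/security/keytabs/hive.service.keytab
# key version the KDC currently holds for the metastore principal (needs a valid TGT first)
kvno hive/ip-10-104-180-140.eu-central-1.compute.internal@COMPUTE.INTERNAL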

Another silly question: did you enable SASL in the UI? It’s an option when creating the connection.

Yes, SASL is enabled. Settings in the UI are as follows:


I tried both the IP and the FQDN for the host; the result is the same.

Hey there,

Would you mind sharing your core-site.xml, as well as any server.out and server.log files?

Thanks

Christy

It’s also worth noting that the Kerberos keytab file needs to be readable by the dremio user.

No problem. Here’s my core-site.xml. As for the keytab file, the permissions are set to 777, so the dremio user should be able to read it fine.

  <configuration>
    
    <property>
      <name>fs.azure.user.agent.prefix</name>
      <value>User-Agent: APN/1.0 Hortonworks/1.0 HDP/2.6.5.0-292</value>
    </property>
    
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://ip-10-104-180-222.eu-central-1.compute.internal:8020</value>
      <final>true</final>
    </property>
    
    <property>
      <name>fs.s3a.fast.upload</name>
      <value>true</value>
    </property>
    
    <property>
      <name>fs.s3a.fast.upload.buffer</name>
      <value>disk</value>
    </property>
    
    <property>
      <name>fs.s3a.multipart.size</name>
      <value>67108864</value>
    </property>
    
    <property>
      <name>fs.s3a.user.agent.prefix</name>
      <value>User-Agent: APN/1.0 Hortonworks/1.0 HDP/2.6.5.0-292</value>
    </property>
    
    <property>
      <name>fs.trash.interval</name>
      <value>5</value>
    </property>
    
    <property>
      <name>ha.failover-controller.active-standby-elector.zk.op.retries</name>
      <value>120</value>
    </property>
    
    <property>
      <name>ha.zookeeper.acl</name>
      <value>sasl:nn:rwcda</value>
    </property>
    
    <property>
      <name>hadoop.custom-extensions.root</name>
      <value>/hdp/ext/2.6/hadoop</value>
    </property>
    
    <property>
      <name>hadoop.http.authentication.simple.anonymous.allowed</name>
      <value>true</value>
    </property>
    
    <property>
      <name>hadoop.proxyuser.ambari-server-ultron.groups</name>
      <value>*</value>
    </property>
    
    <property>
      <name>hadoop.proxyuser.ambari-server-ultron.hosts</name>
      <value>ip-10-104-180-222.eu-central-1.compute.internal</value>
    </property>
    
    <property>
      <name>hadoop.proxyuser.dremio.groups</name>
      <value>*</value>
    </property>
    
    <property>
      <name>hadoop.proxyuser.dremio.hosts</name>
      <value>*</value>
    </property>
    
    <property>
      <name>hadoop.proxyuser.hcat.groups</name>
      <value>*</value>
    </property>
    
    <property>
      <name>hadoop.proxyuser.hcat.hosts</name>
      <value>*</value>
    </property>
    
    <property>
      <name>hadoop.proxyuser.hdfs.groups</name>
      <value>*</value>
    </property>
    
    <property>
      <name>hadoop.proxyuser.hdfs.hosts</name>
      <value>*</value>
    </property>
    
    <property>
      <name>hadoop.proxyuser.hive.groups</name>
      <value>*</value>
    </property>
    
    <property>
      <name>hadoop.proxyuser.hive.hosts</name>
      <value>*</value>
    </property>
    
    <property>
      <name>hadoop.proxyuser.HTTP.groups</name>
      <value>users</value>
    </property>
    
    <property>
      <name>hadoop.proxyuser.HTTP.hosts</name>
      <value>ip-10-104-180-140.eu-central-1.compute.internal,ip-10-104-180-222.eu-central-1.compute.internal</value>
    </property>
    
    <property>
      <name>hadoop.proxyuser.yarn.groups</name>
      <value>*</value>
    </property>
    
    <property>
      <name>hadoop.proxyuser.yarn.hosts</name>
      <value>*</value>
    </property>
    
    <property>
      <name>hadoop.security.auth_to_local</name>
      <value>RULE:[1:$1@$0](ambari-qa-ultron@COMPUTE.INTERNAL)s/.*/ambari-qa/
RULE:[1:$1@$0](hbase-ultron@COMPUTE.INTERNAL)s/.*/hbase/
RULE:[1:$1@$0](hdfs-ultron@COMPUTE.INTERNAL)s/.*/hdfs/
RULE:[1:$1@$0](.*@COMPUTE.INTERNAL)s/@.*//
RULE:[2:$1@$0](activity_analyzer@COMPUTE.INTERNAL)s/.*/activity_analyzer/
RULE:[2:$1@$0](activity_explorer@COMPUTE.INTERNAL)s/.*/activity_explorer/
RULE:[2:$1@$0](amshbase@COMPUTE.INTERNAL)s/.*/ams/
RULE:[2:$1@$0](amszk@COMPUTE.INTERNAL)s/.*/ams/
RULE:[2:$1@$0](dn@COMPUTE.INTERNAL)s/.*/hdfs/
RULE:[2:$1@$0](hbase@COMPUTE.INTERNAL)s/.*/hbase/
RULE:[2:$1@$0](hive@COMPUTE.INTERNAL)s/.*/hive/
RULE:[2:$1@$0](jhs@COMPUTE.INTERNAL)s/.*/mapred/
RULE:[2:$1@$0](knox@COMPUTE.INTERNAL)s/.*/knox/
RULE:[2:$1@$0](nfs@COMPUTE.INTERNAL)s/.*/hdfs/
RULE:[2:$1@$0](nm@COMPUTE.INTERNAL)s/.*/yarn/
RULE:[2:$1@$0](nn@COMPUTE.INTERNAL)s/.*/hdfs/
RULE:[2:$1@$0](rm@COMPUTE.INTERNAL)s/.*/yarn/
RULE:[2:$1@$0](yarn@COMPUTE.INTERNAL)s/.*/yarn/
DEFAULT</value>
    </property>
    
    <property>
      <name>hadoop.security.authentication</name>
      <value>kerberos</value>
    </property>
    
    <property>
      <name>hadoop.security.authorization</name>
      <value>true</value>
    </property>
    
    <property>
      <name>io.compression.codecs</name>
      <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
    </property>
    
    <property>
      <name>io.file.buffer.size</name>
      <value>131072</value>
    </property>
    
    <property>
      <name>io.serializations</name>
      <value>org.apache.hadoop.io.serializer.WritableSerialization</value>
    </property>
    
    <property>
      <name>ipc.client.connect.max.retries</name>
      <value>50</value>
    </property>
    
    <property>
      <name>ipc.client.connection.maxidletime</name>
      <value>30000</value>
    </property>
    
    <property>
      <name>ipc.client.idlethreshold</name>
      <value>8000</value>
    </property>
    
    <property>
      <name>ipc.server.tcpnodelay</name>
      <value>true</value>
    </property>
    
    <property>
      <name>mapreduce.jobtracker.webinterface.trusted</name>
      <value>false</value>
    </property>
    
    <property>
      <name>net.topology.script.file.name</name>
      <value>/etc/hadoop/conf/topology_script.py</value>
    </property>
    
  </configuration>

Hey there,

There is a similar thread here:

Would you mind following the instructions there and reporting back the extra log details?

Thanks

Can you also share hive-site.xml please?

Christy: I tried both of your pointers. Switching users doesn’t change the error message. The confusing part is that HDFS works fine, regardless of which user I start the service as.
As for the Java options, I get the following error message while starting the server:
-Dsun.security.spnego.debug=true: command not found
No change in the log output.
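
I suspect the flag needs to go into the Dremio environment file rather than onto the shell command line; something like the following might be the way to pass it (assuming DREMIO_JAVA_SERVER_EXTRA_OPTS in conf/dremio-env is the right hook in this Dremio version):

# /opt/dremio/conf/dremio-env
DREMIO_JAVA_SERVER_EXTRA_OPTS="-Dsun.security.krb5.debug=true -Dsun.security.spnego.debug=true"

After a restart, the Kerberos debug output should then show up in server.out.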

Anthony: here’s the hive-site.xml.

<property>
  <name>ambari.hive.db.schema.name</name>
  <value>hive</value>
</property>

<property>
  <name>atlas.hook.hive.maxThreads</name>
  <value>1</value>
</property>

<property>
  <name>atlas.hook.hive.minThreads</name>
  <value>1</value>
</property>

<property>
  <name>datanucleus.autoCreateSchema</name>
  <value>false</value>
</property>

<property>
  <name>datanucleus.cache.level2.type</name>
  <value>none</value>
</property>

<property>
  <name>datanucleus.fixedDatastore</name>
  <value>true</value>
</property>

<property>
  <name>hive.auto.convert.join</name>
  <value>true</value>
</property>

<property>
  <name>hive.auto.convert.join.noconditionaltask</name>
  <value>true</value>
</property>

<property>
  <name>hive.auto.convert.join.noconditionaltask.size</name>
  <value>858993459</value>
</property>

<property>
  <name>hive.auto.convert.sortmerge.join</name>
  <value>false</value>
</property>

<property>
  <name>hive.auto.convert.sortmerge.join.to.mapjoin</name>
  <value>false</value>
</property>

<property>
  <name>hive.cbo.enable</name>
  <value>true</value>
</property>

<property>
  <name>hive.cli.print.header</name>
  <value>false</value>
</property>

<property>
  <name>hive.cluster.delegation.token.store.class</name>
  <value>org.apache.hadoop.hive.thrift.ZooKeeperTokenStore</value>
</property>

<property>
  <name>hive.cluster.delegation.token.store.zookeeper.connectString</name>
  <value>ip-10-104-180-106.eu-central-1.compute.internal:2181,ip-10-104-180-140.eu-central-1.compute.internal:2181,ip-10-104-180-222.eu-central-1.compute.internal:2181</value>
</property>

<property>
  <name>hive.cluster.delegation.token.store.zookeeper.znode</name>
  <value>/hive/cluster/delegation</value>
</property>

<property>
  <name>hive.compactor.abortedtxn.threshold</name>
  <value>1000</value>
</property>

<property>
  <name>hive.compactor.check.interval</name>
  <value>300L</value>
</property>

<property>
  <name>hive.compactor.delta.num.threshold</name>
  <value>10</value>
</property>

<property>
  <name>hive.compactor.delta.pct.threshold</name>
  <value>0.1f</value>
</property>

<property>
  <name>hive.compactor.initiator.on</name>
  <value>false</value>
</property>

<property>
  <name>hive.compactor.worker.threads</name>
  <value>0</value>
</property>

<property>
  <name>hive.compactor.worker.timeout</name>
  <value>86400L</value>
</property>

<property>
  <name>hive.compute.query.using.stats</name>
  <value>true</value>
</property>

<property>
  <name>hive.conf.restricted.list</name>
  <value>hive.security.authenticator.manager,hive.security.authorization.manager,hive.users.in.admin.role</value>
</property>

<property>
  <name>hive.convert.join.bucket.mapjoin.tez</name>
  <value>false</value>
</property>

<property>
  <name>hive.default.fileformat</name>
  <value>TextFile</value>
</property>

<property>
  <name>hive.default.fileformat.managed</name>
  <value>TextFile</value>
</property>

<property>
  <name>hive.enforce.bucketing</name>
  <value>true</value>
</property>

<property>
  <name>hive.enforce.sorting</name>
  <value>true</value>
</property>

<property>
  <name>hive.enforce.sortmergebucketmapjoin</name>
  <value>true</value>
</property>

<property>
  <name>hive.exec.compress.intermediate</name>
  <value>false</value>
</property>

<property>
  <name>hive.exec.compress.output</name>
  <value>false</value>
</property>

<property>
  <name>hive.exec.dynamic.partition</name>
  <value>true</value>
</property>

<property>
  <name>hive.exec.dynamic.partition.mode</name>
  <value>strict</value>
</property>

<property>
  <name>hive.exec.failure.hooks</name>
  <value>org.apache.hadoop.hive.ql.hooks.ATSHook</value>
</property>

<property>
  <name>hive.exec.max.created.files</name>
  <value>100000</value>
</property>

<property>
  <name>hive.exec.max.dynamic.partitions</name>
  <value>5000</value>
</property>

<property>
  <name>hive.exec.max.dynamic.partitions.pernode</name>
  <value>2000</value>
</property>

<property>
  <name>hive.exec.orc.compression.strategy</name>
  <value>SPEED</value>
</property>

<property>
  <name>hive.exec.orc.default.compress</name>
  <value>ZLIB</value>
</property>

<property>
  <name>hive.exec.orc.default.stripe.size</name>
  <value>67108864</value>
</property>

<property>
  <name>hive.exec.orc.encoding.strategy</name>
  <value>SPEED</value>
</property>

<property>
  <name>hive.exec.parallel</name>
  <value>false</value>
</property>

<property>
  <name>hive.exec.parallel.thread.number</name>
  <value>8</value>
</property>

<property>
  <name>hive.exec.post.hooks</name>
  <value>org.apache.hadoop.hive.ql.hooks.ATSHook</value>
</property>

<property>
  <name>hive.exec.pre.hooks</name>
  <value>org.apache.hadoop.hive.ql.hooks.ATSHook</value>
</property>

<property>
  <name>hive.exec.reducers.bytes.per.reducer</name>
  <value>67108864</value>
</property>

<property>
  <name>hive.exec.reducers.max</name>
  <value>1009</value>
</property>

<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/hive</value>
</property>

<property>
  <name>hive.exec.submit.local.task.via.child</name>
  <value>true</value>
</property>

<property>
  <name>hive.exec.submitviachild</name>
  <value>false</value>
</property>

<property>
  <name>hive.execution.engine</name>
  <value>mr</value>
</property>

<property>
  <name>hive.fetch.task.aggr</name>
  <value>false</value>
</property>

<property>
  <name>hive.fetch.task.conversion</name>
  <value>more</value>
</property>

<property>
  <name>hive.fetch.task.conversion.threshold</name>
  <value>1073741824</value>
</property>

<property>
  <name>hive.limit.optimize.enable</name>
  <value>true</value>
</property>

<property>
  <name>hive.limit.pushdown.memory.usage</name>
  <value>0.04</value>
</property>

<property>
  <name>hive.map.aggr</name>
  <value>true</value>
</property>

<property>
  <name>hive.map.aggr.hash.force.flush.memory.threshold</name>
  <value>0.9</value>
</property>

<property>
  <name>hive.map.aggr.hash.min.reduction</name>
  <value>0.5</value>
</property>

<property>
  <name>hive.map.aggr.hash.percentmemory</name>
  <value>0.5</value>
</property>

<property>
  <name>hive.mapjoin.bucket.cache.size</name>
  <value>10000</value>
</property>

<property>
  <name>hive.mapjoin.optimized.hashtable</name>
  <value>true</value>
</property>

<property>
  <name>hive.mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>

<property>
  <name>hive.merge.mapfiles</name>
  <value>true</value>
</property>

<property>
  <name>hive.merge.mapredfiles</name>
  <value>false</value>
</property>

<property>
  <name>hive.merge.orcfile.stripe.level</name>
  <value>true</value>
</property>

<property>
  <name>hive.merge.rcfile.block.level</name>
  <value>true</value>
</property>

<property>
  <name>hive.merge.size.per.task</name>
  <value>256000000</value>
</property>

<property>
  <name>hive.merge.smallfiles.avgsize</name>
  <value>16000000</value>
</property>

<property>
  <name>hive.merge.tezfiles</name>
  <value>false</value>
</property>

<property>
  <name>hive.metastore.authorization.storage.checks</name>
  <value>false</value>
</property>

<property>
  <name>hive.metastore.cache.pinobjtypes</name>
  <value>Table,Database,Type,FieldSchema,Order</value>
</property>

<property>
  <name>hive.metastore.client.connect.retry.delay</name>
  <value>5s</value>
</property>

<property>
  <name>hive.metastore.client.socket.timeout</name>
  <value>1800s</value>
</property>

<property>
  <name>hive.metastore.connect.retries</name>
  <value>24</value>
</property>

<property>
  <name>hive.metastore.execute.setugi</name>
  <value>true</value>
</property>

<property>
  <name>hive.metastore.failure.retries</name>
  <value>24</value>
</property>

<property>
  <name>hive.metastore.kerberos.keytab.file</name>
  <value>/etc/security/keytabs/hive.service.keytab</value>
</property>

<property>
  <name>hive.metastore.kerberos.principal</name>
  <value>hive/_HOST@COMPUTE.INTERNAL</value>
</property>

<property>
  <name>hive.metastore.pre.event.listeners</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener</value>
</property>

<property>
  <name>hive.metastore.sasl.enabled</name>
  <value>true</value>
</property>

<property>
  <name>hive.metastore.server.max.threads</name>
  <value>100000</value>
</property>

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://ip-10-104-180-140.eu-central-1.compute.internal:9083</value>
</property>

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/apps/hive/warehouse</value>
</property>

<property>
  <name>hive.optimize.bucketmapjoin</name>
  <value>true</value>
</property>

<property>
  <name>hive.optimize.bucketmapjoin.sortedmerge</name>
  <value>false</value>
</property>

<property>
  <name>hive.optimize.constant.propagation</name>
  <value>true</value>
</property>

<property>
  <name>hive.optimize.index.filter</name>
  <value>true</value>
</property>

<property>
  <name>hive.optimize.metadataonly</name>
  <value>true</value>
</property>

<property>
  <name>hive.optimize.null.scan</name>
  <value>true</value>
</property>

<property>
  <name>hive.optimize.reducededuplication</name>
  <value>true</value>
</property>

<property>
  <name>hive.optimize.reducededuplication.min.reducer</name>
  <value>4</value>
</property>

<property>
  <name>hive.optimize.sort.dynamic.partition</name>
  <value>false</value>
</property>

<property>
  <name>hive.orc.compute.splits.num.threads</name>
  <value>10</value>
</property>

<property>
  <name>hive.orc.splits.include.file.footer</name>
  <value>false</value>
</property>

<property>
  <name>hive.prewarm.enabled</name>
  <value>false</value>
</property>

<property>
  <name>hive.prewarm.numcontainers</name>
  <value>3</value>
</property>

<property>
  <name>hive.security.authenticator.manager</name>
  <value>org.apache.hadoop.hive.ql.security.ProxyUserAuthenticator</value>
</property>

<property>
  <name>hive.security.authorization.enabled</name>
  <value>false</value>
</property>

<property>
  <name>hive.security.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdConfOnlyAuthorizerFactory</value>
</property>

<property>
  <name>hive.security.metastore.authenticator.manager</name>
  <value>org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator</value>
</property>

<property>
  <name>hive.security.metastore.authorization.auth.reads</name>
  <value>true</value>
</property>

<property>
  <name>hive.security.metastore.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
</property>

<property>
  <name>hive.server2.allow.user.substitution</name>
  <value>true</value>
</property>

<property>
  <name>hive.server2.authentication</name>
  <value>KERBEROS</value>
</property>

<property>
  <name>hive.server2.authentication.kerberos.keytab</name>
  <value>/etc/security/keytabs/hive.service.keytab</value>
</property>

<property>
  <name>hive.server2.authentication.kerberos.principal</name>
  <value>hive/_HOST@COMPUTE.INTERNAL</value>
</property>

<property>
  <name>hive.server2.authentication.spnego.keytab</name>
  <value>/etc/security/keytabs/spnego.service.keytab</value>
</property>

<property>
  <name>hive.server2.authentication.spnego.principal</name>
  <value>HTTP/_HOST@COMPUTE.INTERNAL</value>
</property>

<property>
  <name>hive.server2.enable.doAs</name>
  <value>true</value>
</property>

<property>
  <name>hive.server2.logging.operation.enabled</name>
  <value>true</value>
</property>

<property>
  <name>hive.server2.logging.operation.log.location</name>
  <value>/tmp/hive/operation_logs</value>
</property>

<property>
  <name>hive.server2.max.start.attempts</name>
  <value>5</value>
</property>

<property>
  <name>hive.server2.support.dynamic.service.discovery</name>
  <value>true</value>
</property>

<property>
  <name>hive.server2.table.type.mapping</name>
  <value>CLASSIC</value>
</property>

<property>
  <name>hive.server2.tez.default.queues</name>
  <value>default</value>
</property>

<property>
  <name>hive.server2.tez.initialize.default.sessions</name>
  <value>false</value>
</property>

<property>
  <name>hive.server2.tez.sessions.per.default.queue</name>
  <value>1</value>
</property>

<property>
  <name>hive.server2.thrift.http.path</name>
  <value>cliservice</value>
</property>

<property>
  <name>hive.server2.thrift.http.port</name>
  <value>10001</value>
</property>

<property>
  <name>hive.server2.thrift.max.worker.threads</name>
  <value>500</value>
</property>

<property>
  <name>hive.server2.thrift.port</name>
  <value>10000</value>
</property>

<property>
  <name>hive.server2.thrift.sasl.qop</name>
  <value>auth</value>
</property>

<property>
  <name>hive.server2.transport.mode</name>
  <value>binary</value>
</property>

<property>
  <name>hive.server2.use.SSL</name>
  <value>false</value>
</property>

<property>
  <name>hive.server2.zookeeper.namespace</name>
  <value>hiveserver2</value>
</property>

<property>
  <name>hive.smbjoin.cache.rows</name>
  <value>10000</value>
</property>

<property>
  <name>hive.start.cleanup.scratchdir</name>
  <value>false</value>
</property>

<property>
  <name>hive.stats.autogather</name>
  <value>true</value>
</property>

<property>
  <name>hive.stats.dbclass</name>
  <value>fs</value>
</property>

<property>
  <name>hive.stats.fetch.column.stats</name>
  <value>true</value>
</property>

<property>
  <name>hive.stats.fetch.partition.stats</name>
  <value>true</value>
</property>

<property>
  <name>hive.support.concurrency</name>
  <value>false</value>
</property>

<property>
  <name>hive.tez.auto.reducer.parallelism</name>
  <value>true</value>
</property>

<property>
  <name>hive.tez.container.size</name>
  <value>3072</value>
</property>

<property>
  <name>hive.tez.cpu.vcores</name>
  <value>-1</value>
</property>

<property>
  <name>hive.tez.dynamic.partition.pruning</name>
  <value>true</value>
</property>

<property>
  <name>hive.tez.dynamic.partition.pruning.max.data.size</name>
  <value>104857600</value>
</property>

<property>
  <name>hive.tez.dynamic.partition.pruning.max.event.size</name>
  <value>1048576</value>
</property>

<property>
  <name>hive.tez.input.format</name>
  <value>org.apache.hadoop.hive.ql.io.HiveInputFormat</value>
</property>

<property>
  <name>hive.tez.java.opts</name>
  <value>-server -Djava.net.preferIPv4Stack=true -XX:NewRatio=8 -XX:+UseNUMA -XX:+UseG1GC -XX:+ResizeTLAB -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps</value>
</property>

<property>
  <name>hive.tez.log.level</name>
  <value>INFO</value>
</property>

<property>
  <name>hive.tez.max.partition.factor</name>
  <value>2.0</value>
</property>

<property>
  <name>hive.tez.min.partition.factor</name>
  <value>0.25</value>
</property>

<property>
  <name>hive.tez.smb.number.waves</name>
  <value>0.5</value>
</property>

<property>
  <name>hive.txn.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager</value>
</property>

<property>
  <name>hive.txn.max.open.batch</name>
  <value>1000</value>
</property>

<property>
  <name>hive.txn.timeout</name>
  <value>300</value>
</property>

<property>
  <name>hive.user.install.directory</name>
  <value>/user/</value>
</property>

<property>
  <name>hive.vectorized.execution.enabled</name>
  <value>true</value>
</property>

<property>
  <name>hive.vectorized.execution.reduce.enabled</name>
  <value>false</value>
</property>

<property>
  <name>hive.vectorized.groupby.checkinterval</name>
  <value>4096</value>
</property>

<property>
  <name>hive.vectorized.groupby.flush.percent</name>
  <value>0.1</value>
</property>

<property>
  <name>hive.vectorized.groupby.maxentries</name>
  <value>100000</value>
</property>

<property>
  <name>hive.warehouse.subdir.inherit.perms</name>
  <value>true</value>
</property>

<property>
  <name>hive.zookeeper.client.port</name>
  <value>2181</value>
</property>

<property>
  <name>hive.zookeeper.namespace</name>
  <value>hive_zookeeper_namespace</value>
</property>

<property>
  <name>hive.zookeeper.quorum</name>
  <value>ip-10-104-180-106.eu-central-1.compute.internal:2181,ip-10-104-180-140.eu-central-1.compute.internal:2181,ip-10-104-180-222.eu-central-1.compute.internal:2181</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://ip-10-104-180-140.eu-central-1.compute.internal/hive?createDatabaseIfNotExist=true</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>

I am using HDP/HDF version 3.1 (Hortonworks). I have looked at both the YARN-based installation and the standalone one. What is the recommended setup in a large-scale setting, keeping in mind both technology isolation and performance? Any recommendations?

@datageek

If it is a large-scale setting and most of your data is in on-prem Hadoop, then using YARN to deploy the Dremio executors on the Hadoop data nodes would be your best option. Also, the coordinator needs to be on an edge node and not on a data node.
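
In that layout, the edge-node dremio.conf looks roughly like the one shown earlier in this thread, just with the executor role switched off (a sketch, not a complete config):

services: {
  coordinator.enabled: true,
  coordinator.master.enabled: true,
  # executors are provisioned on the data nodes through YARN instead
  executor.enabled: false
}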

Thanks
@balaji.ramaswamy