Failure to connect to hive metastore in EMR


nel...@leantaas.com

Aug 17, 2017, 11:47:09 AM
to Kylo Community
I am running an edge node that connects to an EMR cluster. Creating tables in Hive works. However, when I run the Verify and Split process, I see in the logs that it is failing to connect to the Hive metastore.
I took hive-site.xml from the EMR cluster, and it looks like the Hive server is running on EMR.

Could you please recommend how to troubleshoot this issue?

Thank you,
Nella

Greg Hart

Aug 17, 2017, 12:40:13 PM
to Kylo Community
Hi Nella,

Can you test whether the issue is with Spark / EMR or with Kylo?
spark-shell
> sqlContext.sql("use default")
> sqlContext.sql("show tables").collect()


nel...@leantaas.com

Aug 17, 2017, 3:42:06 PM
to Kylo Community
When I run spark-shell, I get this error:

Required executor memory (4608+460 MB) is above the max threshold (3072 MB) of this cluster! Please check the values of 'yarn.scheduler.maximum-allocation-mb' and/or 'yarn.nodemanager.resource.memory-mb'

spark-submit works correctly, though. I have not been able to figure out how to change the configuration. Could you please recommend what to do about this one?

Thank you,
Nella

Greg Hart

Aug 17, 2017, 4:05:40 PM
to Kylo Community
Try this:
spark-shell --master local
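
Another option, if you want the shell to keep using YARN, is to request less memory than the cluster's 3 GB cap (a sketch; the exact values depend on your cluster):

```shell
# Stay under yarn.scheduler.maximum-allocation-mb (3072 MB in the error above);
# the requested executor memory plus overhead must fit within that limit.
spark-shell --driver-memory 1g --executor-memory 1g
```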

nel...@leantaas.com

Aug 17, 2017, 4:45:53 PM
to Kylo Community
spark-shell --master local
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
WARN hive.metastore: Failed to connect to the MetaStore Server...

Greg Hart

Aug 17, 2017, 4:51:24 PM
to Kylo Community
Hi Nella,

That looks like an EMR error. You can try asking in the AWS EMR forums or contacting Think Big Analytics for additional support options.

Once the EMR issue is resolved then I can help you ensure that the 'Verify and Split Records' Spark job runs correctly.

nel...@leantaas.com

Aug 18, 2017, 5:40:34 PM
to Kylo Community
Greg,

Thank you!! I have created a new EMR cluster, and it seems that spark-shell --master local works so far.

I have repointed the edge node to this new cluster and created the trust and tunnel. Now the Register Tables step is failing with this error:

Cannot create PoolableConnectionFactory (Could not open client transport with JDBC Uri: jdbc:hive2://localhost:10000/default: java.net.ConnectException: Connection refused (Connection refused))

It used to work with the previous EMR cluster. Could you please help me find the cause of this issue?

Thank you,
Nella

Greg Hart

Aug 18, 2017, 6:05:59 PM
to Kylo Community
Hi Nella,

This is a known issue and the work-around is to restart the Hive Thrift Service controller service whenever HiveServer2 is restarted. It should be fixed in the upcoming Kylo v0.8.3 release.
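
To confirm whether HiveServer2 is reachable at all before restarting the controller service, a quick check might be (a sketch; beeline ships with Hive, and the host and port are taken from the error above):

```shell
# Is anything listening on the HiveServer2 port?
netstat -tln | grep 10000
# Try an actual JDBC connection the same way Kylo does.
beeline -u "jdbc:hive2://localhost:10000/default" -e "show databases;"
```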

nel...@leantaas.com

Aug 18, 2017, 6:54:38 PM
to Kylo Community
Thank you so much!!!

It fixed the issue with table registration.

Now I am getting an error in Validate and Split:

org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'init21_valid' not found in database 'website';

However, I can see in Hive that this table has been created.

Greg Hart

Aug 18, 2017, 7:58:09 PM
to Kylo Community
Hi Nella,

That message usually indicates that Spark cannot find your hive-site.xml configuration. You can try adding your hive-site.xml to the 'Extra Files' property of the 'Validate and Split Records' processor in NiFi, and also copying it to your $SPARK_HOME/conf/ folder.
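
Copying the file on the edge node might look like this (paths are illustrative; adjust to wherever EMR placed your hive-site.xml):

```shell
# Make the Spark shell aware of the Hive metastore configuration.
cp /etc/hive/conf/hive-site.xml "$SPARK_HOME/conf/"
# Alternatively, keep a single source of truth with a symlink:
# ln -s /etc/hive/conf/hive-site.xml "$SPARK_HOME/conf/hive-site.xml"
```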

nel...@leantaas.com

Aug 21, 2017, 4:24:56 PM
to Kylo Community
Thank you!!

I have created a soft link in the SPARK_HOME directory and it worked.

Now that I have verified the Hive tables were created, I would like to see them in the Data Wrangler.

When I go to the Tables page, I get an 'unexpected error' message at the bottom.

In the logs I see exception:

org.springframework.jdbc.support.MetaDataAccessException: Could not get Connection for extracting meta data; nested exception is org.springframework.jdbc.CannotGetJdbcConnectionException: Could not get JDBC Connection; nested exception is java.sql.SQLSyntaxErrorException: Could not connect: Unknown database 'hive'

In MySQL on the EMR cluster, I can't see the hive database when I log in as hadoop. When I log in as root, I can see it.

Thank you,
Nella

Greg Hart

Aug 21, 2017, 5:46:21 PM
to Kylo Community
Hi Nella,

The Tables page requires access to the Hive metastore. You should set the following properties in /opt/kylo/kylo-services/conf/application.properties:
hive.metastore.datasource.driverClassName=org.mariadb.jdbc.Driver
hive.metastore.datasource.url=jdbc:mysql://localhost:3306/hive
hive.metastore.datasource.username=root
hive.metastore.datasource.password=hadoop

These are just sample values. You'll need to determine the correct values for your environment from your Hive configuration.
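
One place to find the correct values is the metastore section of hive-site.xml on the EMR master node (a sketch; the javax.jdo.option.* keys are the standard Hive metastore connection properties):

```shell
# The metastore JDBC URL and credentials live in hive-site.xml.
grep -A 1 'javax.jdo.option.ConnectionURL' /etc/hive/conf/hive-site.xml
grep -A 1 'javax.jdo.option.ConnectionUserName' /etc/hive/conf/hive-site.xml
```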

nel...@leantaas.com

Aug 21, 2017, 7:25:46 PM
to Kylo Community
I have repointed the properties to MySQL on the EMR cluster:

hive.metastore.datasource.driverClassName=org.mariadb.jdbc.Driver
hive.metastore.datasource.url=jdbc:mysql://emr-internal-ip:3306/hive
hive.metastore.datasource.username=hive
hive.metastore.datasource.password=hive-password
hive.metastore.datasource.validationQuery=SELECT 1
hive.metastore.datasource.testOnBorrow=true

I have created hive user in MariaDB on EMR and granted all privileges.
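
The user and grant described above might look like this (a sketch; the user name, host pattern, and password are assumptions for illustration):

```shell
# Run on the EMR master node as the MariaDB root user.
mysql -u root <<'SQL'
CREATE USER 'hive'@'%' IDENTIFIED BY 'hive-password';
GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%';
FLUSH PRIVILEGES;
SQL
```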

I can see tables now!!!!

When I go to Visual Query and execute a select * statement, though, it fails. I can also see in the Kylo log that the Spark service failed, since it printed multiple messages like:

22:56:46,079 Client.logInfo: Application report for application_1503078848312_0015 (state: ACCEPTED)

The ValidateAndSplit job is working, though.

Is there anything I can change in the Spark parameters?

Greg Hart

Aug 21, 2017, 7:55:02 PM
to Kylo Community
Hi Nella,

Do you see any messages that say RUNNING instead of ACCEPTED? If not, that might indicate that Spark is requesting more memory than your cluster allows. You can try setting these properties in /opt/kylo/kylo-services/conf/spark.properties and restarting kylo-spark-shell:
spark.driver.memory=512m
spark.executor.memory=512m
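
You can also watch the application's state directly on the cluster (a sketch using the standard YARN CLI):

```shell
# If the shell's application never leaves ACCEPTED, YARN cannot
# satisfy its resource request.
yarn application -list -appStates ACCEPTED,RUNNING
```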


nel...@leantaas.com

Aug 21, 2017, 8:35:43 PM
to Kylo Community
I have changed the resources but it is still failing :(

Is there anything else that can be done or tested in this case? How many resources does the EMR cluster need for the Data Wrangler to run?

Thank you,
Nella

Greg Hart

Aug 22, 2017, 2:52:30 PM
to Kylo Community
Hi Nella,

The Data Wrangler uses the default values set by your cluster configuration. You can also try commenting out the spark.shell.server.* properties and setting spark.shell.master in spark.properties:
#spark.shell.server.host = localhost
#spark.shell.server.port = 8450
spark.shell.master = local

You'll need to stop kylo-spark-shell and restart kylo-services.
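
With the standard Kylo service scripts, that sequence might look like this (a sketch; the service names assume a default Kylo install):

```shell
service kylo-spark-shell stop
service kylo-services restart
```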

nel...@leantaas.com

Aug 28, 2017, 6:00:02 PM
to Kylo Community
Thank you!!

It worked.

I am reconfiguring the setup with the latest Kylo, using the document in JIRA and my notes from the previous installation.

The feed is running in NiFi, although I can't see its progress in Kylo:

ERROR DefaultMessageListenerContainer-1:DefaultMessageListenerContainer:941 - Could not refresh JMS Connection for destination 'thinkbig.provenance-event-stats' - retrying using FixedBackOff{interval=5000, currentAttempts=60, maxAttempts=unlimited}. Cause: Error while attempting to add new Connection to the pool; nested exception is javax.jms.JMSException: Could not connect to broker URL: tcp://localhost:61616. Reason: java.net.ConnectException: Connection refused (Connection refused)

Could you please help to understand what might be going wrong?

Thank you,
Nella

Greg Hart

Aug 28, 2017, 7:55:26 PM
to Kylo Community
Hi Nella,

This message indicates that NiFi is unable to communicate with your ActiveMQ server. Please ensure that your ActiveMQ server is running and that jms.activemq.broker.url in /opt/nifi/ext-config/config.properties properly points to your ActiveMQ server.
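
A quick sanity check might be (a sketch, assuming ActiveMQ runs on its default port 61616 on the same host as NiFi):

```shell
# Is the broker listening on the default OpenWire port?
netstat -tln | grep 61616
# Confirm the broker URL NiFi is configured to use.
grep jms.activemq.broker.url /opt/nifi/ext-config/config.properties
```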