Enable SSL

pk.4...@gmail.com

Aug 24, 2020, 2:59:27 PM
to MR3
Hi Sungwoo,

I followed the directions from the following pages:

Create keystores and self-signed certs:
--------------------------------------
$ ls -la key/
total 80
drwxr-xr-x  10 pkumar  staff   320 Aug  5 21:39 .
drwxr-xr-x   4 pkumar  staff   128 Aug  6 07:11 ..
-rw-------   1 pkumar  staff    56 Aug  6 06:37 .hivemr3-ssl-certificate.jceks.crc
-rw-------   1 pkumar  staff    56 Aug  5 20:34 .mr3-ssl.jceks.crc
-rw-------   1 pkumar  staff  5876 Aug  6 06:37 hivemr3-ssl-certificate.jceks
-rw-r--r--   1 pkumar  staff  2198 Aug  6 06:37 hivemr3-ssl-certificate.jks
-rw-r--r--   1 pkumar  staff   835 Aug  5 20:44 mr3-ssl.cert
-rw-------   1 pkumar  staff  5876 Aug  5 20:34 mr3-ssl.jceks
-rw-r--r--   1 pkumar  staff  2198 Aug  5 20:33 mr3-ssl.jks
-rw-r--r--   1 pkumar  staff  1188 Aug  5 20:48 mr3-ssl.pem
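For reference, a minimal sketch of how such a keystore and self-signed certificate can be generated with keytool (alias, CN, validity, and passwords are illustrative; in MR3, generate-hivemr3-ssl.sh automates these steps):

$ keytool -genkeypair -alias mr3-ssl -keyalg RSA -keysize 2048 -validity 365 \
    -dname "CN=<hs2-host>" -keystore mr3-ssl.jks -storepass <password> -keypass <password>
$ keytool -exportcert -alias mr3-ssl -keystore mr3-ssl.jks -storepass <password> \
    -rfc -file mr3-ssl.cert
$ keytool -importkeystore -srckeystore mr3-ssl.jks -destkeystore mr3-ssl.jceks \
    -deststoretype jceks -srcstorepass <password> -deststorepass <password>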

I have updated hive-site.xml and core-site.xml (attached here; only the diff).
I have a few questions:
  1. In hive-site.xml:
    1. I see the property hive.server2.keystore.password for HS2 only, not for Metastore.
    2. The value is set to "_": don't I have to replace it with the actual password (used while generating the cert and keystore)?
  2. I am not planning to use SSL on Postgres. Do I have to do the following?
    1. Copied from the doc for enabling SSL on Metastore: "The user should make a copy of the certificate file for connecting to the MySQL database for Metastore and set MR3_METASTORE_MYSQL_CERTIFICATE in kubernetes/config-run.sh to point to the copy."
  3. In fact, I was not planning to enable SSL on Metastore - terminating TLS at HiveServer2 (HS2). It looks like it will be easier to enable it for both HS2 and Metastore.
  4. In hive-site.xml, for the "javax.jdo.option.ConnectionURL" property, will the query param "createDatabaseIfNotExist=true" also work for Postgres? I have seen it in your doc for MySQL.
  5. I did not see any keystore/cert for Beeline (as mentioned in the doc). Did I miss something while generating the cert/truststore/keystore? Or can we use the HS2 files?
Let me know your thoughts on the above questions and the settings in the XML files. This is the last mile/step to complete the deployment of HiveMR3 - I am very excited (fingers crossed).

Thanks,
- Praveen.
 
temp-ssl-hive-site.xml
temp-ssl-core-site.xml

pk.4...@gmail.com

Aug 24, 2020, 4:59:51 PM
to MR3
I tried the above, and HS2 fails to launch with the following error:
Caused by: java.io.IOException: Could not load file: /opt/mr3-run/key/hivemr3-ssl-certificate.jks

The detailed log is attached.

Here is the listing:
bash-4.2$ ls -la /opt/mr3-run/key
total 4
drwxrwxrwt 3 root root   80 Aug 24 20:29 .
drwxr-xr-x 1 root root 4096 Aug 24 20:29 ..
drwxr-xr-x 2 root root   40 Aug 24 20:29 ..2020_08_24_20_29_33.060511995
lrwxrwxrwx 1 root root   31 Aug 24 20:29 ..data -> ..2020_08_24_20_29_33.060511995
bash-4.2$
bash-4.2$
bash-4.2$ ls -la /opt/mr3-run/key/..data
lrwxrwxrwx 1 root root 31 Aug 24 20:29 /opt/mr3-run/key/..data -> ..2020_08_24_20_29_33.060511995

bash-4.2$ ls -ll /opt/mr3-run/key/..data
lrwxrwxrwx 1 root root 31 Aug 24 20:29 /opt/mr3-run/key/..data -> ..2020_08_24_20_29_33.060511995

bash-4.2$ ls -ll /opt/mr3-run/key/..2020_08_24_20_29_33.060511995/
total 0
bash-4.2$ ls -ll /opt/mr3-run/key/..2020_08_24_20_29_33.060511995/


Any ideas?
I will try again after deleting all resources, like Secrets and ConfigMaps.

Thanks.
2020-08-24-hs2-ssl-error-could-not-load-cert-file.txt

pk.4...@gmail.com

Aug 24, 2020, 7:47:48 PM
to MR3
I made a little progress (the keys are now getting copied correctly to HS2). Now Metastore is throwing an error:
 "Error getting metastore password: Configuration problem with provider path".

The detailed log file from Metastore is attached.

I did the following:
  1. Deleted the deployments and all resources.
  2. I had missed setting hive.createSecret=true; setting it created the Secret correctly, and now I see all items in key/ on HS2.
  3. I provided the keystore password in these hive-site.xml properties:
    1. hive.server2.keystore.password
    2. hive.metastore.keystore.password (this is not required, but I added it myself the second time to try to fix the password issue in Metastore)

Going to do the following again:
  1. I will go through the docs again to find where to provide KEYSTORE_PASSWORD.
  2. List the keystore and truststore to check whether the password is valid (e.g., with keytool -list, as sketched below).
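A minimal check, assuming keytool is available (path and password placeholder as in my setup):

$ keytool -list -keystore key/hivemr3-ssl-certificate.jks -storepass <password>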

short.ms-ssl-ps-verififcation-failed.txt

pk.4...@gmail.com

Aug 24, 2020, 8:14:11 PM
to MR3
I was able to fix the Metastore issue; I had to add the following in env.sh:
  1. HIVE_SERVER2_SSL_TRUSTSTOREPASS=<password>
  2. export HADOOP_CREDSTORE_PASSWORD=<password>
HS2 seems to be stuck, and I don't see any mr3-master pod/container. The log of HS2 is attached.

Do I need to provide any settings to run-master/worker.sh? I am using the Helm chart to deploy HiveMR3.

Thanks,
- Praveen
hs2-ssl-stuck??-log.txt

Sungwoo Park

Aug 24, 2020, 9:23:40 PM
to MR3
Hi,

1. For running with Helm, env-secret.sh is a better place for defining HIVE_SERVER2_SSL_TRUSTSTOREPASS and HADOOP_CREDSTORE_PASSWORD. This is because env-secret.sh is mounted as a Secret whereas env.sh is not.

HIVE_SERVER2_SSL_TRUSTSTOREPASS=8ecead69-2f02-4177-a988-9c0daee15980
export HADOOP_CREDSTORE_PASSWORD=8ecead69-2f02-4177-a988-9c0daee15980

2. For some reason, HiveServer2 does not start (so DAGAppMaster is not created). From the log, it is hard to tell what is going on, so to find out why HiveServer2 gets stuck, we could set the logging level to DEBUG in kubernetes/conf/hive-log4j2.properties. (The best way to locate where HiveServer2 gets stuck is to run jstack and analyze all the stack traces.) Let me also try to figure out why this behavior occurs (I haven't seen it before).
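For example, assuming a JDK (and thus jstack) is available inside the HiveServer2 container (pod name and pid are illustrative):

$ kubectl exec -n hivemr3 hivemr3-hiveserver2-xxxxx -- ps -ef | grep java    # find the pid of the HiveServer2 JVM
$ kubectl exec -n hivemr3 hivemr3-hiveserver2-xxxxx -- jstack <pid> > hs2-stacks.txt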

Cheers,

--- Sungwoo 

Sungwoo Park

Aug 24, 2020, 10:09:44 PM
to MR3
Going to do the following again:
  1. I will go through the docs again to find where to provide KEYSTORE_PASSWORD.
  2. List the keystore and truststore to check whether the password is valid.
For running with Helm, the generated password is set only in config-run.sh and env-secret.sh:

./helm/hive/env-secret.sh:HIVE_SERVER2_SSL_TRUSTSTOREPASS=8ecead69-2f02-4177-a988-9c0daee15980
./helm/hive/env-secret.sh:export HADOOP_CREDSTORE_PASSWORD=8ecead69-2f02-4177-a988-9c0daee15980
./config-run.sh:MR3_SSL_KEYSTORE_PASSWORD=8ecead69-2f02-4177-a988-9c0daee15980

If all else fails, could you reset the settings (to make sure that MR3 starts okay without SSL) and follow the instructions in 'Enabling SSL' in https://mr3docs.datamonad.com/docs/k8s/helm/run-metastore/? This page contains all the necessary changes and pointers to other related pages (which have been tested).

Minor comments:
1. hive.server2.keystore.password can be set to _. Setting it to the password seems okay, though.
2. No need to set hive.metastore.keystore.password in hive-site.xml.

Cheers,

--- Sungwoo

Sungwoo Park

Aug 24, 2020, 10:28:42 PM
to MR3
I have updated hive-site.xml and core-site.xml (attached here; only the diff).
I have a few questions:
  1. In hive-site.xml:
    1. I see the property hive.server2.keystore.password for HS2 only, not for Metastore.
    2. The value is set to "_": don't I have to replace it with the actual password (used while generating the cert and keystore)?
No, you don't have to. However, from my own testing, it is okay to use the actual password.
  2. I am not planning to use SSL on Postgres. Do I have to do the following?
    1. Copied from the doc for enabling SSL on Metastore: "The user should make a copy of the certificate file for connecting to the MySQL database for Metastore and set MR3_METASTORE_MYSQL_CERTIFICATE in kubernetes/config-run.sh to point to the copy."
No, you don't have to.
  3. In fact, I was not planning to enable SSL on Metastore - terminating TLS at HiveServer2 (HS2). It looks like it will be easier to enable it for both HS2 and Metastore.
I would suggest enabling SSL for both HiveServer2 and Metastore. One might be able to configure them differently, but it seems like an unusual setting.
  4. In hive-site.xml, for the "javax.jdo.option.ConnectionURL" property, will the query param "createDatabaseIfNotExist=true" also work for Postgres? I have seen it in your doc for MySQL.
I suppose so, but I have not tested it myself.
  5. I did not see any keystore/cert for Beeline (as mentioned in the doc). Did I miss something while generating the cert/truststore/keystore? Or can we use the HS2 files?
Beeline should use its own KeyStore file. Please see "As HiveServer2 runs with SSL enabled, Beeline should use its own KeyStore file..." in https://mr3docs.datamonad.com/docs/k8s/helm/run-metastore/. This is how I run Beeline with SSL using run-beeline.sh included in MR3.

./hive/hive/run-beeline.sh --ssl /home/gitlab-runner/mr3-run/kubernetes/beeline-ssl.jks

Cheers,

-- Sungwoo

pk.4...@gmail.com

Aug 24, 2020, 10:48:17 PM
to MR3
Please see my comments inline. Thanks.


On Monday, August 24, 2020 at 7:09:44 PM UTC-7 Sungwoo Park wrote:
Going to do followings again:
  1. I will go through docs again to find where to provide KEYSTORE_PASSWORD 
  2. List the keystore and trustore to check if password is valid or not
For running with Helm, the generated password is set only in config-run.sh and env-secret.sh:

./helm/hive/env-secret.sh:HIVE_SERVER2_SSL_TRUSTSTOREPASS=8ecead69-2f02-4177-a988-9c0daee15980
./helm/hive/env-secret.sh:export HADOOP_CREDSTORE_PASSWORD=8ecead69-2f02-4177-a988-9c0daee15980
./config-run.sh:MR3_SSL_KEYSTORE_PASSWORD=8ecead69-2f02-4177-a988-9c0daee15980

If I understand it correctly:
  1. config-run.sh is used to perform SSL operations (creating the keystore, truststore, self-signed certs, etc., which are copied to key/ and eventually used in the HS2 deployment). config-run.sh is not used in the deployment. If I need to provide config-run.sh (with MR3_SSL_KEYSTORE_PASSWORD=<password>), where should I copy it in the Helm chart? FYI: I have copied (customizable via Java properties) hive-setup.sh and run-master/worker.sh into the Docker image, which means they are final and remain the same for all our clusters. I do not have the option to bundle a config-run.sh with a hard-coded password, because each cluster will have a different password. Our clusters:
    1. do not share HS2, Metastore, etc. - each is a completely independent HS2 service, one cluster per VPC.
    2. are customized via Secrets, ConfigMaps, and/or Java properties.
  2. env.sh (or, better/safer, env-secret.sh) has the settings for the password.

If all else fails, could you reset the settings (to make sure that MR3 starts okay without SSL) and follow the instructions in 'Enabling SSL' in https://mr3docs.datamonad.com/docs/k8s/helm/run-metastore/? This page contains all the necessary changes and pointers to other related pages (which have been tested).
I am about to start again, first without SSL, then with SSL. I will try again with a new cluster.

Minor comments:
1. hive.server2.keystore.password can be set to _. Setting it to the password seems okay, though.

If the password is in env-secret.sh, I will reset it back to hive.server2.keystore.password=_ to keep it secure.

2. No need to set hive.metastore.keystore.password in hive-site.xml.

Yes
 

Cheers,

--- Sungwoo

Sungwoo Park

Aug 24, 2020, 11:05:48 PM
to MR3
If I understand it correctly:
  1. config-run.sh is used to perform SSL operations (creating the keystore, truststore, self-signed certs, etc., which are copied to key/ and eventually used in the HS2 deployment). config-run.sh is not used in the deployment. If I need to provide config-run.sh (with MR3_SSL_KEYSTORE_PASSWORD=<password>), where should I copy it in the Helm chart? FYI: I have copied (customizable via Java properties) hive-setup.sh and run-master/worker.sh into the Docker image, which means they are final and remain the same for all our clusters. I do not have the option to bundle a config-run.sh with a hard-coded password, because each cluster will have a different password. Our clusters:
    1. do not share HS2, Metastore, etc. - each is a completely independent HS2 service, one cluster per VPC.
    2. are customized via Secrets, ConfigMaps, and/or Java properties.
  2. env.sh (or, better/safer, env-secret.sh) has the settings for the password.
1. config-run.sh is not used in the deployment. 

--> Correct. We read it only when executing 'kubernetes/run-hive.sh --generate-truststore'. When using Helm, we don't need it anymore after executing 'kubernetes/run-hive.sh --generate-truststore'.

2. If you need to pass anything secret, env-secret.sh should be preferred to env.sh.

Cheers,

--- Sungwoo

pk.4...@gmail.com

Aug 24, 2020, 11:29:27 PM
to MR3
I tried again without SSL and it worked (HS2, Metastore, and Master are all up and running - I did not run a query, but I am sure it would work).

I compared the HS2 logs (with and without SSL). The stuck HS2 did not log any message after "Created HDFS directory: s3a://<bucket-name>/work-dir/users/hive/hive/5390b00c-7b53-48dd-8c63-4a688a28cfbe" (inclusive).

I have attached a log from HS2 (w/o SSL) after removing sensitive info. HS2 (w/ SSL) was stuck after printing line #1 (the rest are present only in the HS2 w/o SSL log). FYI: we use a custom PAM and do not use Kerberos.

Was it stuck accessing/creating a directory on S3? Or is the custom authentication/authorization the issue? Thought I would share this info in case you have any ideas/suggestions/recommendations.

I will also disable our custom PAM to check whether that's an issue.

Thanks.

hs2-no-ssl-log-diff-from-ssl-one.txt

Sungwoo Park

Aug 24, 2020, 11:55:21 PM
to MR3
If line #2 is not printed, I guess it is related to authentication.

2020-08-25T03:01:36,589 INFO [main] Configuration.deprecation: fs.s3a.server-side-encryption-key is deprecated. Instead, use fs.s3a.server-side-encryption.key

You can get a lot more info by setting the logging level to DEBUG in hive-log4j2.properties in kubernetes/conf/. So, could you try logging level DEBUG and see what happens? You can ignore errors with stack traces, e.g.:

2020-08-25T03:44:30,019 DEBUG [main] beanutils.FluentPropertyBeanIntrospector: Exception is:
java.beans.IntrospectionException: bad write method arg count: public final void org.apache.commons.configuration2.AbstractConfiguration.setProperty(java.lang.String,java.lang.Object)

It may be that the problem is specific to Amazon AWS, in which case I will try everything sometime later.

Cheers,

--- Sungwoo

pk.4...@gmail.com

Aug 25, 2020, 12:30:59 AM
to MR3
I tried the DEBUG log level, but it did not log anything at DEBUG. I checked inside the HS2 container as well, and it was set correctly. Did I miss something?

conf/hive-log4j2.properties

# list of properties
property.hive.log.level = DEBUG
property.hive.perflogger.log.level = DEBUG


Everything was logged at INFO only:
--------------------
2020-08-25T04:19:27,255  INFO [main] SessionState: Hive Session ID = d9c6cb85-06c7-47f2-a9b7-40e75a2599c7
2020-08-25T04:19:27,890  INFO [main] beanutils.FluentPropertyBeanIntrospector: Error when creating PropertyDescriptor for public final void org.apache.commons.configuration2.AbstractConfiguration.setProperty(java.lang.String,java.lang.Object)! Ignoring this property.
2020-08-25T04:19:27,931  INFO [main] impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2020-08-25T04:19:27,959  INFO [main] impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2020-08-25T04:19:27,959  INFO [main] impl.MetricsSystemImpl: s3a-file-system metrics system started.      [LAST MESSAGE -  HS2 stuck here]
++ cleanup_child
+++ jobs -p
++ local pid=26
++ [[ 26 != '' ]]
++ kill 26
++ ps -p 26
++ echo 'Waiting for process 26 to stop...'
  PID TTY          TIME CMD
   26 ?        00:00:08 java
Waiting for process 26 to stop...
++ sleep 1
2020-08-25T04:21:57,631  INFO [shutdown-hook-0] server.HiveServer2: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down HiveServer2 at hivemr3-hiveserver2-4zxrj/100.121.124.199
************************************************************/
++ ps -p 26
++ echo 'Process 26 stopped.'
  PID TTY          TIME CMD
Process 26 stopped.
----------

Sungwoo Park

Aug 25, 2020, 12:34:14 AM
to MR3
You could just replace all occurrences of INFO with DEBUG in hive-log4j2.properties. Then, it should print DEBUG messages.

gitlab-runner@orange1:~/mr3-run/kubernetes/conf$ grep INFO hive-log4j2.properties
status = INFO
property.hive.log.level = INFO
property.hive.perflogger.log.level = INFO
filter.threshold.level = INFO
logger.DataNucleus.level = INFO
logger.Datastore.level = INFO
logger.JPOX.level = INFO
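For example, a blanket substitution over the fields shown above (back up the file first; sed -i edits in place):

$ cp conf/hive-log4j2.properties conf/hive-log4j2.properties.bak
$ sed -i 's/INFO/DEBUG/g' conf/hive-log4j2.properties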

Cheers,

--- Sungwoo

pk.4...@gmail.com

Aug 25, 2020, 12:42:58 AM
to MR3
fyi: 
I tried again after disabling custom authentication - the results were the same - HS2 was stuck at the same place.

I could not use jstack: it's not available on HS2, and I am not very familiar with it - it's installed on my laptop, but I did not know how to use it.

Thanks.

pk.4...@gmail.com

Aug 25, 2020, 12:53:40 AM
to MR3

Sungwoo Park

Aug 25, 2020, 1:22:52 AM
to MR3
1. If the problem is specific to accessing Amazon S3 with SSL enabled, I think it can be useful.

2. The log with DEBUG might reveal something that we do not know yet.

Let me also try again with MinIO.

Cheers,

--- Sungwoo

Sungwoo Park

Aug 25, 2020, 4:44:49 AM
to MR3
Hi Praveen,

I reproduced the problem and fixed it in my local cluster. The problem is due to using SSL for connecting to S3 without providing a certificate. If you set the logging level to DEBUG, you can see messages like:

Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: null; S3 Extended Request ID: null)

If you set the logging level to INFO, nothing is printed and HiveServer2 just gets stuck.

Suppose that you want to use SSL for connecting to S3. Then, you would set in conf/core-site.xml:

<property>
  <name>fs.s3a.connection.ssl.enabled</name>
  <value>true</value>
</property>

<property>
  <name>fs.s3a.endpoint</name>
  <value>https://orange0:9000</value>
</property>

Then, you have to import the public certificate provided by the S3 server into kubernetes/key/hivemr3-ssl-certificate.jks. This is easy to do with 'kubernetes/run-hive.sh --generate-truststore', as follows:

1. In kubernetes/config-run.sh, introduce a new variable, e.g.,  MR3_S3_CERTIFICATE:

MR3_S3_CERTIFICATE=/home/gitlab-runner/mr3-run/kubernetes/minio-public.crt

2. In import_certificates(), expand the variable 'certificates'.

function import_certificates {
    keystore=$1

    certificates="$MR3_RANGER_MYSQL_CERTIFICATE $MR3_METASTORE_MYSQL_CERTIFICATE $MR3_KMS_CERTIFICATE $MR3_S3_CERTIFICATE"

3. Run 'kubernetes/run-hive.sh --generate-truststore'.

Now the hivemr3-ssl-certificate.jks contains the public certificate, so the connection should be okay.
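Equivalently, you can import the certificate into the truststore manually with keytool (alias and paths are illustrative):

keytool -importcert -noprompt -alias s3-public -file /home/gitlab-runner/mr3-run/kubernetes/minio-public.crt -keystore kubernetes/key/hivemr3-ssl-certificate.jks -storepass <password>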

I have not tested the combination of AWS S3 + SSL yet, but the logic should be the same. If you would like to disable SSL for S3, you can set fs.s3a.connection.ssl.enabled to false (which does not affect SSL for HiveServer2 and Metastore).

If you have other questions, please let me know.

Cheers,

--- Sungwoo


pk.4...@gmail.com

Aug 25, 2020, 5:58:40 AM
to MR3
Thanks Sungwoo. 

I am going to give it a try. The issue can be resolved in two ways. A few questions:
  1. Short and easy route: disable SSL for S3 by setting fs.s3a.connection.ssl.enabled = false
    1. What will be the security implications of this - assuming the only component reachable by external clients is HS2?
  2. Long approach:
    1. Import the public cert of AWS S3 into the truststore.
    2. Set fs.s3a.endpoint = https://orange0:9000. What is https://orange0:9000? The Metastore host and port, or HS2:port?

Sungwoo Park

Aug 25, 2020, 6:26:02 AM
to MR3
  1. Short and easy route: disable SSL for S3 by setting fs.s3a.connection.ssl.enabled = false
    1. What will be the security implications of this - assuming the only component reachable by external clients is HS2?
As far as I can tell, there is no security issue here (other than the traffic not being encrypted), because you can prevent external clients from directly accessing S3 by setting IAM roles appropriately. Furthermore, if authorization is handled by Ranger, there are practically no bad actions that rogue users can take.

  2. Long approach:
    1. Import the public cert of AWS S3 into the truststore.
    2. Set fs.s3a.endpoint = https://orange0:9000. What is https://orange0:9000? The Metastore host and port, or HS2:port?
fs.s3a.endpoint should be set if the user wants to access a custom S3-compatible storage. For accessing AWS S3, it does not need to be set. This is briefly mentioned in https://mr3docs.datamonad.com/docs/k8s/advanced/access-s3/.

Let me test S3 + SSL on AWS EKS sometime.

Cheers,

--- Sungwoo
 

pk.4...@gmail.com

Aug 25, 2020, 6:38:34 AM
to MR3
I tried option 1 (setting fs.s3a.connection.ssl.enabled = false). I see the HS2 server opened an SSL connection to Metastore successfully, but DAGAppMaster is failing to launch with "Failed to run DAGAppMaster" (log attached). It appears the password has not been copied to the DAGAppMaster pod/container. Any idea why? I did not see in the doc whether I have to provide the keystore/password to the master.

short exception log in master :
-------------------------------------------------
Failed to run DAGAppMaster
Caused by: java.io.IOException: Cannot find password option fs.s3a.bucket.<bucket-name>.fs.s3a.server-side-encryption-algorithm
Caused by: java.io.IOException: Configuration problem with provider path.
Caused by: java.io.IOException: Keystore was tampered with, or password was incorrect
Caused by: java.security.UnrecoverableKeyException: Password verification failed

HS2 log
---------
2020-08-25T10:18:19,374  INFO [NotificationEventPoll 0] metastore.HiveMetaStoreClient: Opened an SSL connection to metastore, current connections: 2
2020-08-25T10:18:19,408  INFO [NotificationEventPoll 0] metastore.HiveMetaStoreClient: Connected to metastore.
2020-08-25T10:18:19,408  INFO [NotificationEventPoll 0] metastore.RetryingMetaStoreClient: RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ugi=hive (auth:SIMPLE) retries=24 delay=5 lifetime=0
2020-08-25T10:18:26,379  INFO [main] ipc.Client: Retrying connect to server: service-master-8663-0.pkumar.svc.cluster.local/100.65.3.220:80. Already tried 2 time(s); maxRetries=45


I am going to try option 2.

pk.4...@gmail.com

Aug 25, 2020, 6:49:52 AM
to MR3
Thanks Sungwoo. In that case, I will use option 1 for now to complete the integration and come back to option 2 (if the security issue becomes a concern for IT or users/customers).

Do I need to export ATS_SECRET_KEY=<password> in env-secret.sh to fix the DAGAppMaster issue mentioned in the above thread? I am using only:
1. HS2   with SSL 
2. Metastore  w/ SSL
3. Postgres w/o SSL

Not using:
1. ATS or Timeline server
2. Ranger
3. KMS

Sungwoo Park

Aug 25, 2020, 6:59:01 AM
to MR3
Do I need to export ATS_SECRET_KEY=<password> in env-secret.sh to fix the DAGAppMaster issue mentioned in the above thread? I am using only:
1. HS2   with SSL 
2. Metastore  w/ SSL
3. Postgres w/o SSL

ATS_SECRET_KEY is unnecessary because you don't use ATS.

There is still a problem with option 2; let me update MR3docs after fixing the issue.

--- Sungwoo

pk.4...@gmail.com

Aug 25, 2020, 7:07:47 AM
to MR3
Thanks.

How about option 1? DAGAppMaster is failing to start - it appears the keystore password is not available to the container.

Sungwoo Park

Aug 25, 2020, 7:22:17 AM
to MR3
<property>
  <name>fs.s3a.server-side-encryption-algorithm</name>
  <value>AES256</value>
</property>  

Could you try the above setting in core-site.xml? I haven't reproduced the problem yet, and I'll try this later. I think this is about configuring the Hadoop library (via core-site.xml) to access S3.

Cheers,

--- Sungwoo

pk.4...@gmail.com

Aug 25, 2020, 7:33:22 AM
to MR3
Thanks Sungwoo.
  1. fs.s3a.server-side-encryption-algorithm is AES256 (always) in core-site.xml.
  2. I could not enable DEBUG level logging. I tried:
    1. Setting LOG_LEVEL=DEBUG in the env.sh file
    2. Setting log.level=DEBUG in the conf/hive-log4j2.properties file

pk.4...@gmail.com

Aug 25, 2020, 7:49:50 AM
to MR3
Fixed the issue with DAGAppMaster. Somehow, I had lost the entry for HADOOP_CREDSTORE_PASSWORD (mr3.am.launch.env and mr3.container.launch.env) from conf/core-site.xml. All services are up and running. I see lots of handshake-related errors, as seen in the attached log. Is it related to the same "no data or no sasl" from the Thrift server for the liveness/readiness probes?
  1. Going to test the client connecting to the SSL-enabled HS2.
  2. Try option 2 after #1 - will wait for updated instructions from you, as you mentioned "There is still a problem with option 2".
I will keep you posted.

Thanks,


hs2-ssl-log-remote-host-closed.txt

Sungwoo Park

Aug 25, 2020, 8:19:30 AM
to MR3
2-1. Currently LOG_LEVEL in kubernetes/env.sh is ignored (https://mr3docs.datamonad.com/docs/k8s/known-issues/).

2-2. An update to kubernetes/conf/hive-log4j2.properties should be reflected in the log file of HiveServer2.

gitlab-runner@orange1:~/mr3-run/kubernetes$ grep DEBUG conf/hive-log4j2.properties
status = DEBUG
property.hive.log.level = DEBUG
property.hive.perflogger.log.level = DEBUG
filter.threshold.level = DEBUG
logger.DataNucleus.level = DEBUG
logger.Datastore.level = DEBUG
logger.JPOX.level = DEBUG

Cheers,

-- Sungwoo

Sungwoo Park

Aug 25, 2020, 8:48:37 AM
to MR3
Yes, I think it is related to liveness/readiness probes (especially if the same message is printed at a regular interval). It will be printed as a DEBUG message if you use MR3:1.2-SNAPSHOT.

Cheers,

--- Sungwoo

pk.4...@gmail.com

Aug 25, 2020, 9:03:32 AM
to MR3
Thanks Sungwoo.

  1. To fix the SSL handshake error (it's polluting the log much faster than "no data or no sasl" and making it difficult to troubleshoot any issue), do I need to set something like "-Dhttps.protocols=TLSv1.1,TLSv1.2" (if yes, where - in hive-setup.sh, and in run-master/worker.sh too)? https://stackoverflow.com/questions/21245796/javax-net-ssl-sslhandshakeexception-remote-host-closed-connection-during-handsh/22629008
  2. The JDBC client (external) is failing to connect to the SSL-enabled HS2 host.
    1. The error is "Could not open client transport with JDBC Uri: jdbc:hive2://<HS2-host>:9852/default; Invalid status 21" (Details - Type: java.sql.SQLException (SQL State: 08S01)). The HS2 server log shows the expected message "Unrecognized SSL message, plaintext connection?".
    2. Added the SSL-related info to the connection URI. Now I am getting a different error: "Could not open client transport with JDBC Uri: jdbc:hive2://<HS2-host>:9852/default;ssl=true;sslTrustStore=/Users/pkumar/ws/keystore/dev;trustStorePassword=<password>: Error creating the transport". There is no activity in the HS2 log, which suggests the URI is not correct - but I checked the format and it looks fine to me. I suspect the LoadBalancer is the issue here and is not forwarding to HS2. Any idea?
    3. I will try creating a truststore for Beeline and test it. How do I copy the truststore file to the HS2 server?

Sungwoo Park

Aug 25, 2020, 9:32:52 AM
to pk.4...@gmail.com, MR3

  1. To fix the SSL handshake error (it's polluting log way faster than "no data or no sasl" and making it difficult to troubleshoot any issue), do I need to set something like "-Dhttps.protocols=TLSv1.1,TLSv1.2" (if yes where in hive-setup.sh and run-master/worker.sh too) ? https://stackoverflow.com/questions/21245796/javax-net-ssl-sslhandshakeexception-remote-host-closed-connection-during-handsh/22629008
Do you use the new Docker image (MR3:1.2-SNAPSHOT)? The message is similar to "no data or no sasl", but it is printed when SSL is used. You can see how to suppress the message at:


  2. The JDBC client (external) is failing to connect to the SSL-enabled HS2 host.
    1. The error is "Could not open client transport with JDBC Uri: jdbc:hive2://<HS2-host>:9852/default; Invalid status 21" (Details - Type: java.sql.SQLException (SQL State: 08S01)). The HS2 server log shows the expected message "Unrecognized SSL message, plaintext connection?".
    2. Added the SSL-related info to the connection URI. Now I am getting a different error: "Could not open client transport with JDBC Uri: jdbc:hive2://<HS2-host>:9852/default;ssl=true;sslTrustStore=/Users/pkumar/ws/keystore/dev;trustStorePassword=<password>: Error creating the transport". There is no activity in the HS2 log, which suggests the URI is not correct - but I checked the format and it looks fine to me. I suspect the LoadBalancer is the issue here and is not forwarding to HS2. Any idea?

Not sure about this. I have seen a similar message (Error creating the transport) when I made the mistake of not setting hadoop.security.credential.provider.path. As there is no activity in the HS2 log, I guess it is a configuration problem on the client side. (Is /Users/pkumar/ws/keystore/dev your TrustStore file? It should point to a file.)
 
    3. I will try creating a truststore for Beeline and test it. How do I copy the truststore file to the HS2 server?
You don't have to, because the admin user distributes a TrustStore file and HiveServer2 already knows about it. It is explained at the end of the following page:


Cheers,

--- Sungwoo 

 

pk.4...@gmail.com

Aug 25, 2020, 10:01:47 AM
to MR3
Thanks Sungwoo.
  1. My bad - I was looking for truststore.jks as the truststore filename. Changed it to mr3-ssl.jks and it worked. generate-hivemr3-ssl.sh generates "mr3-ssl.jks" as the truststore file, correct?
  2. I will tear down everything and try option 1 again before integrating the rest of the stuff.
  3. I will try option 2 after integrating the rest of the stuff (Python clients etc.), assuming internal clients (Python-based) will connect smoothly using the SSL info.
Thanks a ton for the help and support. I will provide a short list of the steps I took to enable SSL later.

Sungwoo Park

Aug 25, 2020, 12:15:32 PM
to MR3
Hello,

1.
generate-hivemr3-ssl.sh generates mr3-ssl.jks.

2.
The remaining problem was that ContainerWorker Pods could not connect to S3 with SSL. This can be solved by defining javax.net.ssl.trustStore and javax.net.ssl.trustStoreType in mr3.container.launch.cmd-opts in kubernetes/conf/mr3-site.xml:

<property>
  <name>mr3.container.launch.cmd-opts</name>
  <value>-XX:+AlwaysPreTouch -Xss512k -XX:+UseG1GC -XX:TLABSize=8m -XX:+ResizeTLAB -XX:+UseNUMA -XX:+AggressiveOpts -XX:InitiatingHeapOccupancyPercent=40 -XX:G1ReservePercent=20 -XX:MaxGCPauseMillis=200 -XX:MetaspaceSize=1024m -server -Djava.net.preferIPv4Stack=true -XX:NewRatio=8 -Dlog4j.configurationFile=k8s-mr3-container-log4j2.properties -Djavax.net.ssl.trustStore=/opt/mr3-run/key/hivemr3-ssl-certificate.jks -Djavax.net.ssl.trustStoreType=jks</value>
  <!-- -XX:SoftRefLRUPolicyMSPerMB=25 -->
</property>

In the MR3 distribution, mr3.am.launch.cmd-opts already defines javax.net.ssl.trustStore and javax.net.ssl.trustStoreType, so DAGAppMaster has no problem with accessing S3 with SSL.

3.
I have not tested accessing S3 with SSL on Amazon AWS. Let me test it sometime and update MR3docs.

4.
If your clients want to encrypt all data traffic, you should also enable secure shuffle. For more details, please see:

Cheers,

--- Sungwoo

pk.4...@gmail.com

Aug 25, 2020, 10:56:49 PM
to MR3
Hi Sungwoo,

1.

I encountered another problem after enabling SSL (similar to the S3 one). We use a custom authentication/authorization manager (PAM - pluggable auth module), which accesses another REST API service via https://foo.com/<end-point>. Now it appears it is not finding the cert for this service and is throwing errors (especially the first 4 lines of the attached log):
2020-08-25T23:18:24,622  INFO [HiveServer2-Handler-Pool: Thread-36] auth.PkumarAPIService: Calling endpoint:
2020-08-25T23:18:24,622  INFO [HiveServer2-Handler-Pool: Thread-36] auth.PkumarAPIService: Calling API with URL: https://API.dev.pkumar-test.com/v1/auth?generate-tokens=false
2020-08-25T23:18:24,636 ERROR [HiveServer2-Handler-Pool: Thread-36] transport.TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: Error validating the login

I thought SASL negotiation was used only for Kerberos - why is it happening here? Do I need to disable something? FYI: it works without SSL.

It could be an issue similar to S3 (I may have to import the cert here too - which was not needed on HS2 without SSL) - line #15:
"Caused by: javax.security.sasl.AuthenticationException: General I/O Exception. General I/O Exception. sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target"

2.
My Python client service also cannot connect; it returns a "TTransportException: TSocket read 0 bytes" error. Any idea how to provide the SSL info? I thought it should work, but it did not.
Connection URL in Python (using SQLAlchemy / PyHive):
After SSL:  hive://hive:<password>@<hs2-host>:9852/default;ssl=true;sslTrustStore=/Users/pkumar/keystore/dev/mr3-ssl.jks;trustStorePassword=<password>;
Before SSL: hive://hive:<password>@<hs2-host>:9852/default;

3. Both work (before- and after-SSL URLs).
Connection URL in the JDBC client (DBVisualizer):
After SSL:  jdbc:hive2://<hs2-host>:9852/default;ssl=true;sslTrustStore=/Users/pkumar/keystore/dev/mr3-ssl.jks;trustStorePassword=<password>;
Before SSL: jdbc:hive2://<hs2-host>:9852/default;

userId = hive
password = <hive-password>


Is it possible to terminate SSL at HS2 (with everything else non-SSL), or is it too much effort? Any other ideas?

In the meantime, I am going to import the API service's cert (it uses an internal DNS/hostname), so I am not sure whether that will work or not.

Thanks,

- Praveen.
hs2-auth-mesh-url.txt

pk.4...@gmail.com

Aug 25, 2020, 10:58:54 PM
to MR3
BTW, the Python service client is failing, and here is the HS2 log:

2020-08-26T02:25:32,460 ERROR [HiveServer2-Handler-Pool: Thread-30] server.TThreadPoolServer: Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection?
at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219) ~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:269) ~[hive-exec-3.1.2.jar:3.1.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_232]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_232]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
Caused by: org.apache.thrift.transport.TTransportException: javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection?



Sungwoo Park

Aug 26, 2020, 1:55:32 AM
to MR3
1.

I encountered another problem after enabling SSL (similar to the S3 one). We use a custom authentication/authorization manager (PAM - pluggable auth module), which accesses another REST API service via https://foo.com/<end-point>. Now it appears it is not finding the cert for this service and is throwing errors (especially the first 4 lines of the attached log):
2020-08-25T23:18:24,622  INFO [HiveServer2-Handler-Pool: Thread-36] auth.PkumarAPIService: Calling endpoint:
2020-08-25T23:18:24,622  INFO [HiveServer2-Handler-Pool: Thread-36] auth.PkumarAPIService: Calling API with URL: https://API.dev.pkumar-test.com/v1/auth?generate-tokens=false
2020-08-25T23:18:24,636 ERROR [HiveServer2-Handler-Pool: Thread-36] transport.TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: Error validating the login

I thought SASL negotiation was used only for Kerberos - why is it happening here? Do I need to disable something? FYI: it works without SSL.

It could be an issue similar to S3 (I may have to import the cert here too - which was not needed on HS2 without SSL) - line #15:
"Caused by: javax.security.sasl.AuthenticationException: General I/O Exception. General I/O Exception. sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target"

HiveServer2 adds these Java options when it uses SSL:

    -Djavax.net.ssl.trustStore=/opt/mr3-run/key/hivemr3-ssl-certificate.jks -Djavax.net.ssl.trustStoreType=jks -Djavax.net.ssl.trustStorePassword=8ecead69-2f02-4177-a988-9c0daee15980

My guess is that when it contacts https://foo.com, HiveServer2 tries to find the certificate in hivemr3-ssl-certificate.jks, but in your case the certificate is missing from hivemr3-ssl-certificate.jks.

You can manually add the certificate to hivemr3-ssl-certificate.jks (before executing HiveServer2), or update import_certificates() in kubernetes/config-run.sh and run 'kubernetes/run-hive.sh --generate-truststore' (see my previous message).

So, my question is: can you obtain a certificate for https://foo.com?
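If you don't already have the certificate file, one common way to retrieve a server's public certificate is with openssl (host and port are illustrative; I have not tested this against your service):

openssl s_client -connect foo.com:443 -servername foo.com -showcerts </dev/null | openssl x509 -outform PEM > foo-com.pem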
 

2.
My Python client service also cannot connect; it returns a "TTransportException: TSocket read 0 bytes" error. Any idea how to provide the SSL info? I thought it should work, but it did not.
Connection URL in Python (using SQLAlchemy / PyHive):
After SSL:  hive://hive:<password>@<hs2-host>:9852/default;ssl=true;sslTrustStore=/Users/pkumar/keystore/dev/mr3-ssl.jks;trustStorePassword=<password>;
Before SSL: hive://hive:<password>@<hs2-host>:9852/default;

Not sure about this, but what happens if you remove 'hive:<password>' from the URL? I think it should be unnecessary. Also, check whether HiveServer2 prints any message, to see whether it is a client-side issue.
 
3. Both work (before- and after-SSL URLs).
Connection URL in the JDBC client (DBVisualizer):
After SSL:  jdbc:hive2://<hs2-host>:9852/default;ssl=true;sslTrustStore=/Users/pkumar/keystore/dev/mr3-ssl.jks;trustStorePassword=<password>;
Before SSL: jdbc:hive2://<hs2-host>:9852/default;

userId = hive
password = <hive-password>


Is it possible to terminate SSL at HS2 (with everything else non-SSL), or is it too much effort? Any other ideas?

Let me look into this issue. (At the moment, I don't know how.) So, your goal is to enable SSL only for the connection from the client to HiveServer2, right?

Cheers,

--- Sungwoo
 

pk.4...@gmail.com

Aug 26, 2020, 2:10:02 AM
to MR3
Please see my reply inline. Thanks.

On Tuesday, August 25, 2020 at 10:55:32 PM UTC-7 Sungwoo Park wrote:
1.

I encountered another problem after enabling SSL (similar to the S3 one). We use a custom authentication/authorization manager (PAM - pluggable auth module), which accesses another REST API service via https://foo.com/<end-point>. Now it appears it is not finding the cert for this service and is throwing errors (especially the first 4 lines of the attached log):
2020-08-25T23:18:24,622  INFO [HiveServer2-Handler-Pool: Thread-36] auth.PkumarAPIService: Calling endpoint:
2020-08-25T23:18:24,622  INFO [HiveServer2-Handler-Pool: Thread-36] auth.PkumarAPIService: Calling API with URL: https://API.dev.pkumar-test.com/v1/auth?generate-tokens=false
2020-08-25T23:18:24,636 ERROR [HiveServer2-Handler-Pool: Thread-36] transport.TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: Error validating the login

I thought SASL negotiation was used only for Kerberos - why is it happening here? Do I need to disable something? FYI: it works without SSL.

It could be an issue similar to S3 (I may have to import the cert here too - which was not needed on HS2 without SSL) - line #15:
"Caused by: javax.security.sasl.AuthenticationException: General I/O Exception. General I/O Exception. sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target"

HiveServer2 adds these Java options when it uses SSL:

    -Djavax.net.ssl.trustStore=/opt/mr3-run/key/hivemr3-ssl-certificate.jks -Djavax.net.ssl.trustStoreType=jks -Djavax.net.ssl.trustStorePassword=8ecead69-2f02-4177-a988-9c0daee15980

My guess is that when it contacts https://foo.com, HiveServer2 tries to find the certificate in hivemr3-ssl-certificate.jks, but in your case the certificate is missing from hivemr3-ssl-certificate.jks.

You can manually add the certificate to hivemr3-ssl-certificate.jks (before executing HiveServer2), or update import_certificates() in kubernetes/config-run.sh and run 'kubernetes/run-hive.sh --generate-truststore' (see my previous message).

So, my question is: can you obtain a certificate for https://foo.com?

Yes, I am planning to add it. But I think it's not going to help me, after finding that PyHive does not support SSL (my Python client service uses PyHive). Our Python service client:
  1. uses PyHive to ingest data into the Metastore.
  2. uses the internal HS2 service (a 2nd instance of HS2), and I am planning not to use SSL here (as long as it does not delay the release). But both internal and external HS2 services share the Metastore (which is now SSL-enabled unless we terminate SSL at HS2).
 
 

2.
My Python client service also cannot connect; it returns a "TTransportException: TSocket read 0 bytes" error. Any idea how to provide the SSL info? I thought it should work, but it did not.
Connection URL in Python (using SQLAlchemy / PyHive):
After SSL:  hive://hive:<password>@<hs2-host>:9852/default;ssl=true;sslTrustStore=/Users/pkumar/keystore/dev/mr3-ssl.jks;trustStorePassword=<password>;
Before SSL: hive://hive:<password>@<hs2-host>:9852/default;

Not sure about this, but what happens if you remove 'hive:<password>' from the URL? I think it should be unnecessary. Also, check whether HiveServer2 prints any message, to see whether it is a client-side issue.

3. Both work (before- and after-SSL URLs).
Connection URL in the JDBC client (DBVisualizer):
After SSL:  jdbc:hive2://<hs2-host>:9852/default;ssl=true;sslTrustStore=/Users/pkumar/keystore/dev/mr3-ssl.jks;trustStorePassword=<password>;
Before SSL: jdbc:hive2://<hs2-host>:9852/default;

userId = hive
password = <hive-password>


Is it possible to terminate SSL at HS2 (with everything else non-SSL), or is it too much effort? Any other ideas?

Let me look into this issue. (At the moment, I don't know how.) So your goal is to enable SSL only for the connection from the client to HiveServer2, right?

Correct - it would be of immense help if we can terminate SSL at HS2. We can use the current working cluster (the Python service client works, our custom auth/authorization works, and two HS2 servers work - one for external and another for internal use, sharing the Metastore). In this working cluster everything is non-SSL and working fine. If we can terminate TLS at HS2, it will avoid all the other changes - which are a lot. I went through at least 50 search results trying to make our Python service work with SSL and PyHive/SQLAlchemy and did not find any working solution.

Thanks a lot. I have always received a very prompt reply - in this case I am requesting the same, if possible, as we are blocked. I cannot thank you enough for your support and help.
 

Cheers,

--- Sungwoo
 

Sungwoo Park

Aug 26, 2020, 4:18:39 AM
to MR3
Yes, I am planning to add it. But I think it's not going to help me, after finding that PyHive does not support SSL (my Python client service uses PyHive). Our Python service client:
  1. uses PyHive to ingest data into the Metastore.
  2. uses the internal HS2 service (a 2nd instance of HS2), and I am planning not to use SSL here (as long as it does not delay the release). But both internal and external HS2 services share the Metastore (which is now SSL-enabled unless we terminate SSL at HS2).
Let me look into this issue. (At the moment, I don't know how.) So your goal is to enable SSL only for the connection from the client to HiveServer2, right?

Correct - it would be of immense help if we can terminate SSL at HS2. We can use the current working cluster (the Python service client works, our custom auth/authorization works, and two HS2 servers work - one for external and another for internal use, sharing the Metastore). In this working cluster everything is non-SSL and working fine. If we can terminate TLS at HS2, it will avoid all the other changes - which are a lot. I went through at least 50 search results trying to make our Python service work with SSL and PyHive/SQLAlchemy and did not find any working solution.

Let me first explain a solution where we do not use a custom authentication/authorization manager. I have tested this solution in a local cluster where S3 runs without SSL.

1. We assume that S3 runs without SSL, so in core-site.xml, we should have:

fs.s3a.connection.ssl.enabled = false

For AWS S3, fs.s3a.endpoint does not have to be set.

2. In config-run.sh, we set ENABLE_SSL=true because HiveServer2 needs SSL.

3. We will run Metastore without SSL and without TrustStore. This requires a minor change to metastore-service.sh, and you should build a new Docker image. (You can reuse the existing Docker image, but it's more complicated.)

Set hive.metastore.use.SSL to false in hive-site.xml.

Then, add to hive/hive/metastore-service.sh:main() as follows:

function main {
    unset HIVE_SERVER2_SSL_TRUSTSTORE   # disable SSL for Metastore

    hive_setup_parse_args_common $@
    parse_args $REMAINING_ARGS
    metastore_service_init

Now, MetaStore runs without SSL and without knowing anything about TrustStore.

4. Do not let DAGAppMaster and ContainerWorker know about the TrustStore. This is actually unnecessary, but it makes sure that MR3 runs okay without SSL.

In mr3-site.xml, remove -Djavax.net.ssl.trustStore=… from mr3.am.launch.cmd-opts and mr3.container.launch.cmd-opts:

<property>
  <name>mr3.am.launch.cmd-opts</name>
  <value>-server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -XX:+UseNUMA -XX:+UseG1GC -XX:+ResizeTLAB -Dlog4j.configurationFile=k8s-mr3-container-log4j2.properties</value>
</property>

<property>
  <name>mr3.container.launch.cmd-opts</name>
  <value>-server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -XX:+UseNUMA -XX:+UseG1GC -XX:+ResizeTLAB -Dlog4j.configurationFile=k8s-mr3-container-log4j2.properties</value>
</property>

Now, DAGAppMaster and ContainerWorker know nothing about TrustStore.

With these settings, everything runs without SSL except HiveServer2. For example, in order to connect Beeline to HiveServer2, you have to pass beeline-ssl.jks as usual.

./hive/hive/run-beeline.sh --ssl /home/gitlab-runner/mr3-run/kubernetes/beeline-ssl.jks

For your case, the only remaining problem is figuring out how to connect to https://foo.com. I can think of two solutions.

1) Retrieve the certificate for https://foo.com and merge it into key/hivemr3-ssl-certificate.jks, as described in my previous message. In my opinion, this is the preferred way if you would like to run MR3 in production environments.

2) Extend your custom module so that it ignores javax.net.ssl.trustStore. The fix will depend on the language used to implement the module. (What I am confused about is how you managed to connect to https://foo.com without SSL previously.)

Let me know if this would work for you.

Cheers,

--- Sungwoo

pk.4...@gmail.com

Aug 26, 2020, 11:18:20 AM
to MR3
Thanks Sungwoo - I wanted to update you on the current state after making the above changes.

In short, all services are up and running. I can connect to SSL-enabled HS2 service using JDBC client.

But the internal Python client is failing to connect to the 2nd HS2 service (where SSL has been disabled). From the 2nd HS2 instance's log, it appears that it is still trying to use SSL and is throwing an error. I have attached the log. I am debugging to check why SSL is being used on the 2nd instance. Any idea how to quickly check? Java system properties?

Here are the changes I made as per your recommendations:
1. In metastore-service.sh:main(), I also added the following for better visibility:
echo -e "\n#  ----- ignoring HIVE_SERVER2_SSL_TRUSTSTORE #\n" >&2

2. I did not find "-Djavax.net.ssl.trustStore" in mr3-site.xml. The property is defined as seen below:
<property>
  <name>mr3.am.launch.cmd-opts</name>
  <value>-server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -XX:+UseNUMA -XX:+UseG1GC -XX:+ResizeTLAB -Dlog4j.configurationFile=k8s-mr3-container-log4j2.properties -Djavax.security.auth.useSubjectCredsOnly=false -Djava.security.auth.login.config=/opt/mr3-run/conf/jgss.conf -Djava.security.krb5.conf=/opt/mr3-run/conf/krb5.conf -Dsun.security.jgss.debug=true</value>
</property>

3. Our custom (auth/authentication) module is in Java. I am thinking that, in non-SSL mode, HS2 managed to connect to https://foo.com because it was downloading the cert during the handshake? Maybe? Not sure.
I am not an expert in Java, but I am thinking the following should unset "javax.net.ssl.trustStore":
  3.1 System.getProperties().remove("javax.net.ssl.trustStore") or

  3.2 System.getProperties().remove("javax.net.ssl.trustStore", '')

  What do you recommend?
hs2-log-from-2nd-hs2-instance.log

pk.4...@gmail.com

Aug 26, 2020, 11:20:02 AM
to MR3
3.2 is System.setProperty("javax.net.ssl.trustStore", "")

Sungwoo Park

Aug 26, 2020, 12:15:11 PM
to MR3
But the internal Python client is failing to connect to the 2nd HS2 service (where SSL has been disabled). From the 2nd HS2 instance's log, it appears that it is still trying to use SSL and is throwing an error. I have attached the log. I am debugging to check why SSL is being used on the 2nd instance. Any idea how to quickly check? Java system properties?

For running the second HS2 instance, you should create a different Secret/ConfigMap. You can quickly check by inspecting hive-site.xml and env.sh inside the HS2 Pod.

$ kubectl exec -it -n hivemr3 hivemr3-hiveserver2-fcph7 -- /bin/bash
root@hivemr3-hiveserver2-fcph7:/opt/mr3-run/hive# grep -A1 hive.server2.use.SSL /opt/mr3-run/conf/hive-site.xml
  <name>hive.server2.use.SSL</name>
  <value>true</value>
root@hivemr3-hiveserver2-fcph7:/opt/mr3-run/hive# grep HIVE_SERVER2_SSL_TRUSTSTORE /opt/mr3-run/env.sh
HIVE_SERVER2_SSL_TRUSTSTORE=$KEYTAB_MOUNT_DIR/hivemr3-ssl-certificate.jks
HIVE_SERVER2_SSL_TRUSTSTORETYPE=jks
HIVE_SERVER2_SSL_TRUSTSTOREPASS=8ecead69-2f02-4177-a988-9c0daee15980

If hive.server2.use.SSL is set to true, I think you are reusing the ConfigMap object from the first HS2 instance.
 
2. I did not find "-Djavax.net.ssl.trustStore" in mr3-site.xml. The property is defined as seen below:
<property>
  <name>mr3.am.launch.cmd-opts</name>
  <value>-server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -XX:+UseNUMA -XX:+UseG1GC -XX:+ResizeTLAB -Dlog4j.configurationFile=k8s-mr3-container-log4j2.properties -Djavax.security.auth.useSubjectCredsOnly=false -Djava.security.auth.login.config=/opt/mr3-run/conf/jgss.conf -Djava.security.krb5.conf=/opt/mr3-run/conf/krb5.conf -Dsun.security.jgss.debug=true</value>
</property>

If you don't find -Djavax.net.ssl.trustStore in mr3-site.xml, that's alright.
 
3. Our custom (auth/authentication) module is in Java. I am thinking that, in non-SSL mode, HS2 managed to connect to https://foo.com because it was downloading the cert during the handshake? Maybe? Not sure.
I am not an expert in Java, but I am thinking the following should unset "javax.net.ssl.trustStore":
  3.1 System.getProperties().remove("javax.net.ssl.trustStore") or

  3.2 System.getProperties().remove("javax.net.ssl.trustStore", '')

Unfortunately this is not something I can give a suggestion about. However, I would try this plan: 1) set hive.server2.use.SSL to false (so that HiveServer2 does not use SSL); 2) set HIVE_SERVER2_SSL_TRUSTSTORE so that HiveServer2 uses the TrustStore if it needs to; 3) merge the certificate into hivemr3-ssl-certificate.jks. (Please note that I have not tested this plan.)

Q. Do you also use the custom module in the HiveServer2 instance with SSL enabled? If so, does it connect to https://foo.com without any problem?

Q. If you could somehow get your Python client to connect to HiveServer2 with SSL enabled, would it solve your problem as well?

--- Sungwoo
 

pk.4...@gmail.com

Aug 26, 2020, 1:32:19 PM
to MR3
Hi Sungwoo,

To make our life easier, I am thinking of adding an nginx server in front of HS2 and keeping HS2 non-SSL - an nginx server which routes everything to HS2. Let me know if that will work and whether I have to be watchful of anything. Thanks.

Thanks,
- Praveen.

Sungwoo Park

Aug 26, 2020, 2:24:49 PM
to MR3
To make our life easier, I am thinking of adding an nginx server in front of HS2 and keeping HS2 non-SSL - an nginx server which routes everything to HS2. Let me know if that will work and whether I have to be watchful of anything. Thanks.

I am not knowledgeable about setting up nginx, so I cannot really make a suggestion here. If you decide to try nginx, I may not be able to reproduce in our local cluster any technical problems you encounter.

I am testing Python clients for HiveServer2. Here are my findings:

1) I can connect Python clients to HiveServer2 with SSL enabled. Queries are running okay. HiveServer2 uses HIVE_SERVER2_AUTHENTICATION=NONE.
2) Ironically it seems hard to connect Python clients to HiveServer2 with SSL disabled. Other users on the internet also seem to have this problem.

Does your internal HiveServer2 use the custom authentication/authorization module? If it uses HIVE_SERVER2_AUTHENTICATION=NONE, we could use Python clients after enabling SSL for it. I am going to create a new page in MR3docs on how to use Python clients, so you can see whether this helps solve your problem.

Cheers,

-- Sungwoo


pk.4...@gmail.com

Aug 26, 2020, 4:24:02 PM
to MR3
Hi Sungwoo,

Which driver did you use for the Python client? I am using PyHive (https://pypi.org/project/PyHive/) with the following Python 3 based versions:
PyHive==0.6.1
SQLAlchemy==1.3.11
sqlalchemy-utils==0.36.8
sasl==0.2.1
thrift==0.13.0
thrift-sasl==0.3.0


Sungwoo Park

Aug 26, 2020, 8:16:31 PM
to MR3
I used impyla and created a Docker image. I tested these combinations:

1) HIVE_SERVER2_AUTHENTICATION=NONE, hive.server2.use.SSL=true --> okay (auth_mechanism='PLAIN')
2) HIVE_SERVER2_AUTHENTICATION=KERBEROS, hive.server2.use.SSL=true --> okay (auth_mechanism='GSSAPI')
3) HIVE_SERVER2_AUTHENTICATION=NONE, hive.server2.use.SSL=false --> does not work

=== Dockerfile

FROM python:2

ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y libsasl2-dev libsasl2-2 libsasl2-modules-gssapi-mit

RUN pip install pandas 'six==1.12.0' 'bit_array==0.1.0' 'thrift==0.9.3' 'thrift_sasl==0.2.1' 'sasl==0.2.1' 'impyla==0.13.8'

ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get install -y krb5-user

COPY krb5.conf /etc/krb5.conf

RUN mkdir -p /opt/impyla

WORKDIR /opt/impyla

CMD ["python", "/opt/impyla/run.py"]

=== Sample Python program:

import os
import pandas
from impala.dbapi import connect
from impala.util import as_pandas

os.system("kinit -kt /opt/impyla/hive.keytab hive@PL")
os.system("klist")

# "CN=gold7" in generate-hivemr3-ssl.sh
os.system("echo '192.168.10.1    gold7' >> /etc/hosts")

conn = connect(host='gold7',
  port=9852,
  database='tpcds_bin_partitioned_orc_1000',
  auth_mechanism='GSSAPI',
  use_ssl='true',
  ca_cert='/opt/impyla/mr3-ssl.pem',
  kerberos_service_name='root',
  user='gitlab-runner',
  password='gitlab-runner')

cursor = conn.cursor()

#cursor.execute('select * from call_center')
#tables = as_pandas(cursor)

cursor.execute("select  i_item_desc \
      ,i_category \
      ,i_class \
      ,i_current_price \
      ,i_item_id \
      ,sum(ws_ext_sales_price) as itemrevenue \
      ,sum(ws_ext_sales_price)*100/sum(sum(ws_ext_sales_price)) over \
          (partition by i_class) as revenueratio \
from \
  web_sales \
      ,item \
      ,date_dim \
where \
  ws_item_sk = i_item_sk \
    and i_category in ('Jewelry', 'Sports', 'Books') \
    and ws_sold_date_sk = d_date_sk \
  and d_date between cast('2001-01-12' as date) \
        and (cast('2001-01-12' as date) + interval '30' days) \
group by \
  i_item_id \
        ,i_item_desc \
        ,i_category \
        ,i_class \
        ,i_current_price \
order by \
  i_category \
        ,i_class \
        ,i_item_id \
        ,i_item_desc \
        ,revenueratio \
limit 100")

tables = as_pandas(cursor)

pandas.set_option("display.max_rows", None, "display.max_columns", 100)
print(tables)

Sungwoo Park

unread,
Aug 27, 2020, 3:32:57 AM8/27/20
to MR3
With hive.server2.use.SSL=false, we can use pyhive, both with Kerberos and without Kerberos.

This is part of the Dockerfile for installing pyhive:

FROM python:2

ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y libsasl2-dev libsasl2-2 libsasl2-modules-gssapi-mit

RUN pip install pandas 'six==1.12.0' 'bit_array==0.1.0' 'thrift==0.10.0' 'thrift_sasl==0.3.0' 'sasl==0.2.1' 'pyhive==0.6.3'
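
A minimal pyhive connection sketch for both cases (host, port, and the Kerberos service name 'hive' are placeholders/assumptions; for the Kerberos case a valid ticket from kinit is required):

from pyhive import hive

# Without Kerberos (HIVE_SERVER2_AUTHENTICATION=NONE):
conn = hive.connect(host='<host>', port=9852, username='<user>')

# With Kerberos (HIVE_SERVER2_AUTHENTICATION=KERBEROS):
conn = hive.connect(host='<host>', port=9852,
                    auth='KERBEROS', kerberos_service_name='hive')

cursor = conn.cursor()
cursor.execute('show databases')
print(cursor.fetchall())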

Cheers,

--- Sungwoo


Sungwoo Park

unread,
Aug 27, 2020, 7:32:45 AM8/27/20
to MR3
I added a new page 'Accessing from Python' to MR3docs. Hope it helps.

 
Cheers,

--- Sungwoo

pk.4...@gmail.com

unread,
Aug 27, 2020, 7:55:06 AM8/27/20
to MR3
Thank you, Sungwoo.

  1. PyHive works fine with HS2 (non-SSL) and auth=NONE. The issue was with auth=CUSTOM.
    1. PyHive was throwing an error saying that a password cannot be provided if auth=NONE is used: either change the Hive authentication to NONE or do not include a password. The connection was never initiated because of this check in PyHive.
    2. It means PyHive cannot be used with SSL enabled.
  2. Impyla: I noticed you used Kerberos. Will it work without Kerberos? I will give it a try without Kerberos (probably with "PLAIN" as the auth mechanism) and check how it works with our "CUSTOM" auth.
Thanks for the docs link.

Sungwoo Park

unread,
Aug 27, 2020, 8:08:32 AM8/27/20
to MR3
  1. PyHive works fine with HS2 (non-SSL) and auth=NONE. The issue was with auth=CUSTOM.
    1. PyHive was throwing an error saying that a password cannot be provided if auth=NONE is used: either change the Hive authentication to NONE or do not include a password. The connection was never initiated because of this check in PyHive.
    2. It means PyHive cannot be used with SSL enabled.
In the source code at https://github.com/dropbox/PyHive/blob/master/pyhive/hive.py, it seems that you can pass a password with auth=CUSTOM.

        if (password is not None) != (auth in ('LDAP', 'CUSTOM')):
            raise ValueError("Password should be set if and only if in LDAP or CUSTOM mode; "
                             "Remove password or use one of those modes")

My understanding was that PyHive does not work with SSL because you cannot specify the certificate file in pyhive.connect() -- there is no parameter like ca_cert or use_ssl.
 
  2. Impyla: I noticed you used Kerberos. Will it work without Kerberos? I will give it a try without Kerberos (probably with "PLAIN" as the auth mechanism) and check how it works with our "CUSTOM" auth.
Yes, it works (I tested it in our local cluster).

Cheers,

--- Sungwoo

pk.4...@gmail.com

unread,
Aug 27, 2020, 9:16:49 AM8/27/20
to MR3
Thank you, Sungwoo.

I will use impyla and give it a try. I read somewhere in a forum (while searching for a solution for PyHive with SSL) that impyla does not support rollback/commit. I will verify this when I use it.

On a side note, I found this article: https://stackoverflow.com/questions/21245796/javax-net-ssl-sslhandshakeexception-remote-host-closed-connection-during-handsh/22629008 Will this prevent the error (Remote host closed connection during handshake) related to liveness/readiness probes when SSL is enabled? I have not had time to try the snapshot image yet, as I have to get the diff and make the corresponding changes; hopefully by this weekend I will be able to give it a try.

Thanks.

pk.4...@gmail.com

unread,
Aug 27, 2020, 10:01:15 AM8/27/20
to MR3
FYI: impyla follows the Impala DB-API, where rollback/commit raises NotSupportedError.

pk.4...@gmail.com

unread,
Aug 27, 2020, 10:10:42 AM8/27/20
to MR3
Thanks Sungwoo,

I am getting "SSLError: [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1076)" 

On searching, I found it is related to a mismatch between secure and non-secure connection attempts.

I verified that the host, port, and username/password are correct, as I can connect via a JDBC client.

--- python test driver code
---------------------------
from impala.dbapi import connect as impala_connect
from impala.util import as_pandas

ca_cert = '<location>/dev/mr3-ssl.pem'
use_ssl = True   # I had tried both 'true' and True

conn_impyla = impala_connect(
    host=<host>,
    port=9852,
    database='default',
    auth_mechanism='PLAIN',
    user='<user>',
    password='<password>',
    use_ssl=use_ssl,
    ca_cert=ca_cert
)
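
To check whether the port actually speaks TLS, something like this can help (host/port are placeholders):

# If the server side is plaintext, the handshake fails here, which is
# consistent with WRONG_VERSION_NUMBER on a TLS client.
openssl s_client -connect <host>:9852 -CAfile mr3-ssl.pem </dev/null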

Let me try using a different ConfigMap for the 2nd instance of HS2. I have designed the Helm chart in a way where both instances can have their own Secrets and ConfigMaps. That way, the 2nd HS2 and Metastore will be non-SSL.

I will revisit changing the driver library (to impyla or ibis) after the integration is completed and working.

Sungwoo Park

unread,
Aug 27, 2020, 10:55:21 AM8/27/20
to MR3
I have just tested two settings again:

1) hive.server2.use.SSL=true, hive.metastore.use.SSL=true, HIVE_SERVER2_AUTHENTICATION=NONE
2) hive.server2.use.SSL=true, hive.metastore.use.SSL=false, HIVE_SERVER2_AUTHENTICATION=NONE

The following code works for both settings:

conn = connect(host='gold7',
  port=9852,
  database='tpcds_bin_partitioned_orc_1000',
  auth_mechanism='PLAIN',
  use_ssl='true',
  ca_cert='/opt/impyla/mr3-ssl.pem', user='gitlab-runner', password='gitlab-runner')

I have seen 'SSLError: [SSL: WRONG_VERSION_NUMBER] wrong version number', and I think it was when I installed dependencies of the wrong version for pyhive. (Unfortunately I didn't keep a record of when this occurs.) This problem seems tricky, and I didn't find a clear solution on the internet.

For running impyla, do you use a Docker container (to make sure that we are in sync with the version of every dependency)?

Cheers,

--- Sungwoo

pk.4...@gmail.com

unread,
Aug 27, 2020, 12:23:44 PM8/27/20
to MR3
Thank you, Sungwoo - that's a promising result.

Yeah, my environment/settings and use cases are different:
1. I use CUSTOM for HS2 authentication: we have an in-house Java plugin (for RBAC) that serves as the HS2 authenticator and authorizer.
2. I did not run it in Docker. Running it inside Docker will work for deployment, as seen from your test results, but our Python module is a service that curates data for Metastore and then ingests it (using commits/rollbacks). The developers need to run it in their own environments (mostly Mac based).

The simplest approach (to complete the integration ASAP) is:
1. TLS terminates at the 1st HS2 instance.
2. HS2 (2nd instance for internal clients) uses its own ConfigMaps (to avoid the SSL issue).
3. The remaining components (Metastore, DAGAppMaster, and ContainerWorkers) are not SSL-enabled.

Later, we will revisit the driver issue (hopefully right after the integration of HiveMR3) because PyHive needs to be replaced with impyla or some other driver that supports transactions.

pk.4...@gmail.com

unread,
Aug 28, 2020, 1:41:41 AM8/28/20
to MR3
Hello Sungwoo,

I followed the simplest approach as mentioned above:
1. TLS terminates at the 1st HS2 instance.
2. HS2 (2nd instance for internal clients) uses its own ConfigMaps (to avoid the SSL issue).
3. The remaining components (Metastore, DAGAppMaster, and ContainerWorkers) are not SSL-enabled.

Now, all pods and services are working fine. Our Python service can curate data and ingest it into Metastore. Simple as well as complex queries (where worker Pods are launched for map/reduce) return results fine. I still have to resolve the issue with our custom auth endpoint https://foo.com. I have two options (as suggested above), or rather three options:
1) Retrieve the certificate for https://foo.com and merge it into key/hivemr3-ssl-certificate.jks, as described in my previous message.
2) Extend the custom Java module so that it ignores javax.net.ssl.trustStore.

The third option I am thinking of is to use the same cert for HiveServer2's keystore/truststore (as both are our services: https://foo.com and the HS2 host). This would fulfill the need of #1 above too. Do you think it will work? If yes, how can it be done? I foresee one problem with this approach: our certs are passwordless, while the Hadoop self-signed certificate tool requires a password (it generates its own if not provided). Can this requirement be skipped? If yes, then I just need to import our cert into the KeyStore and populate the TrustStore. Correct? I will try it in a few hours - wanted to get your opinion about it.
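
For reference, importing our existing cert as a trusted entry would look roughly like this (alias, file name, and password are placeholders; note that keytool always requires a store password for the JKS file itself, even if the certificate has none):

# Add an existing certificate as a trusted entry in the KeyStore/TrustStore
keytool -importcert -noprompt \
  -alias foo-com \
  -file foo-com.crt \
  -keystore key/hivemr3-ssl-certificate.jks \
  -storepass <keystore-password>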

Last question, about debugging: when I started testing nodeSelector for workers/master, I noticed (because of typos in the label) that the log kept printing the following message continuously. Obviously, the Pod could not be scheduled, but there was nothing to indicate where and why it was unable to create mappers/reducers. Is there a way (other than the debug log level, which requires a restart) to get more info about it?

    2020-08-28T03:13:43,995  INFO [HiveServer2-Background-Pool: Thread-46] SessionState: Map 1: -/- Map 2: 0(+1)/1
    Map 1: -/- Map 2: 0(+1)/1
    2020-08-28T03:13:47,000  INFO [HiveServer2-Background-Pool: Thread-46] SessionState: Map 1: -/- Map 2: 0(+1)/1
    Map 1: -/- Map 2: 0(+1)/1

Thanks a lot - very close to the finish line.

- Praveen.

Sungwoo Park

unread,
Aug 28, 2020, 3:18:46 AM8/28/20
to MR3
Now, all pods and services are working fine. Our Python service can curate data and ingest it into Metastore. Simple as well as complex queries (where worker Pods are launched for map/reduce) return results fine. I still have to resolve the issue with our custom auth endpoint https://foo.com. I have two options (as suggested above), or rather three options:
1) Retrieve the certificate for https://foo.com and merge it into key/hivemr3-ssl-certificate.jks, as described in my previous message.
2) Extend the custom Java module so that it ignores javax.net.ssl.trustStore.

The third option I am thinking of is to use the same cert for HiveServer2's keystore/truststore (as both are our services: https://foo.com and the HS2 host). This would fulfill the need of #1 above too. Do you think it will work? If yes, how can it be done? I foresee one problem with this approach: our certs are passwordless, while the Hadoop self-signed certificate tool requires a password (it generates its own if not provided). Can this requirement be skipped? If yes, then I just need to import our cert into the KeyStore and populate the TrustStore. Correct? I will try it in a few hours - wanted to get your opinion about it.

As I don't have the means to test your idea, I can only make the following comment. After creating the KeyStore file mr3-ssl.jks, you have created mr3-ssl.cert and mr3-ssl.pem. If I understand your approach correctly, you could try using mr3-ssl.cert and mr3-ssl.pem to set up the server at https://foo.com:

- If you run an Apache server, you could use mr3-ssl.cert or mr3-ssl.pem (or both) to set it up.
- If you run nginx, I think you should convert mr3-ssl.pem to an nginx key.

If you try your approach, please share the result, as I would like to learn more about this issue.

Last question, about debugging: when I started testing nodeSelector for workers/master, I noticed (because of typos in the label) that the log kept printing the following message continuously. Obviously, the Pod could not be scheduled, but there was nothing to indicate where and why it was unable to create mappers/reducers. Is there a way (other than the debug log level, which requires a restart) to get more info about it?

    2020-08-28T03:13:43,995  INFO [HiveServer2-Background-Pool: Thread-46] SessionState: Map 1: -/- Map 2: 0(+1)/1
    Map 1: -/- Map 2: 0(+1)/1
    2020-08-28T03:13:47,000  INFO [HiveServer2-Background-Pool: Thread-46] SessionState: Map 1: -/- Map 2: 0(+1)/1
    Map 1: -/- Map 2: 0(+1)/1

HiveServer2 has no way of telling why mappers/reducers fail to launch because it just receives and reports DAG states and error messages from the execution engine, which is MR3 in your case. If something like this happens, you should analyze the DAGAppMaster log, the state of the cluster, and your configuration to find out why, because MR3 cannot tell why the K8s cluster refuses to create worker Pods. For example, MR3 does not know if the K8s cluster has already run out of resources due to external Docker containers.
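
On the K8s side, standard commands like these usually reveal why a Pod stays unscheduled (namespace and Pod names are placeholders):

# A nodeSelector that matches no node shows up as a FailedScheduling event.
kubectl get events -n <namespace> --sort-by=.metadata.creationTimestamp
kubectl describe pod <containerworker-pod> -n <namespace>

# The DAGAppMaster log records why worker Pods are not created.
kubectl logs <dagappmaster-pod> -n <namespace>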

Cheers,

--- Sungwoo

pk.4...@gmail.com

unread,
Aug 28, 2020, 10:33:21 PM8/28/20
to MR3
Hi Sungwoo,

After importing the certs for https://foo.com, everything is working as expected. Thanks a lot for the help.

pk.4...@gmail.com

unread,
Sep 4, 2020, 1:11:39 PM9/4/20
to MR3
Hi Sungwoo,

A few questions:
1. We have a wildcard cert (say *.foo.com) and we will deploy HS2 using this cert (LoadBalancer service hostname mr3.foo.com). I would like to use our cert in kubernetes/generate-hivemr3-ssl.sh without generating a new self-signed cert. How can I change the script to import our cert instead of generating one?
2. I noticed in one of your examples above that you showed how to import a cert (MR3_S3_CERTIFICATE=/home/gitlab-runner/mr3-run/kubernetes/minio-public.crt). I had imported the public certificate provided by the S3 server into kubernetes/key/hivemr3-ssl-certificate.jks. It did not work. Did I use the wrong public cert? I downloaded the cert from here: https://docs.cloudera.com/documentation/enterprise/5-15-x/topics/sg_aws_security.html#concept_vnr_y1b_kdb
Thanks.

Sungwoo Park

unread,
Sep 4, 2020, 1:46:16 PM9/4/20
to MR3
1. We have a wildcard cert (say *.foo.com) and we will deploy HS2 using this cert (LoadBalancer service hostname mr3.foo.com). I would like to use our cert in kubernetes/generate-hivemr3-ssl.sh without generating a new self-signed cert. How can I change the script to import our cert instead of generating one?
I don't know how to pass certificates to HiveServer2 without creating a KeyStore. This is how the scripts work:

1. generate-hivemr3-ssl.sh creates a KeyStore mr3-ssl.jks (and a Credential file mr3-ssl.jceks).
2. run-hive.sh creates a copy of mr3-ssl.jks in key/hivemr3-ssl-certificate.jks and imports all certificates into it. The list of certificates is found in import_certificates() in config-run.sh.

    certificates="$MR3_RANGER_MYSQL_CERTIFICATE $MR3_METASTORE_MYSQL_CERTIFICATE $MR3_KMS_CERTIFICATE $MR3_S3_CERTIFICATE"

You can edit this list to add/remove certificates to be added to key/hivemr3-ssl-certificate.jks. So, as far as I know, you need to create a KeyStore in any case. I think creating a KeyStore in this way should not affect the way HiveServer2 works.
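
If you want the KeyStore to contain your existing wildcard key pair instead of a generated one, a sketch like the following might work (file names, alias, and password are placeholders; I have not tested this against generate-hivemr3-ssl.sh):

# Bundle the existing certificate and private key into PKCS12,
# then convert the bundle into a JKS KeyStore.
openssl pkcs12 -export \
  -in star.foo.com.crt -inkey star.foo.com.key \
  -name hivemr3 -out mr3-ssl.p12 -passout pass:<password>
keytool -importkeystore \
  -srckeystore mr3-ssl.p12 -srcstoretype PKCS12 -srcstorepass <password> \
  -destkeystore mr3-ssl.jks -deststorepass <password>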
     
2. I noticed in one of your examples above that you showed how to import a cert (MR3_S3_CERTIFICATE=/home/gitlab-runner/mr3-run/kubernetes/minio-public.crt). I had imported the public certificate provided by the S3 server into kubernetes/key/hivemr3-ssl-certificate.jks. It did not work. Did I use the wrong public cert? I downloaded the cert from here: https://docs.cloudera.com/documentation/enterprise/5-15-x/topics/sg_aws_security.html#concept_vnr_y1b_kdb

I am not sure if it is the wrong certificate. Is everything okay if you set fs.s3a.connection.ssl.enabled to false?
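
That is, in core-site.xml (this only disables SSL for the S3A connector, as a way to isolate the certificate problem):

<property>
  <name>fs.s3a.connection.ssl.enabled</name>
  <value>false</value>
</property>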

Cheers,

--- Sungwoo
