I am going to do the following again:
- Go through the docs again to find where to provide KEYSTORE_PASSWORD
- List the keystore and truststore to check whether the password is valid
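The password check in the second step can be scripted. The following is only a minimal sketch, assuming `keytool` from the JDK is on the PATH; the helper names and the `KEYSTORE_PASSWORD` environment variable are illustrative choices, not something MR3 defines.

```python
import os
import subprocess


def keytool_list_cmd(store_path, store_type="jks", password_env="KEYSTORE_PASSWORD"):
    """Build the keytool command that lists the entries of a keystore or truststore.

    keytool reads the password from the environment variable named by
    password_env, so the password does not appear in the process list.
    """
    return [
        "keytool", "-list",
        "-keystore", store_path,
        "-storetype", store_type,
        "-storepass:env", password_env,
    ]


def password_is_valid(store_path, password, store_type="jks",
                      password_env="KEYSTORE_PASSWORD"):
    """Return True if keytool can open the store with the given password."""
    env = dict(os.environ)
    env[password_env] = password
    result = subprocess.run(
        keytool_list_cmd(store_path, store_type, password_env),
        env=env, capture_output=True, text=True,
    )
    # keytool exits with a non-zero status when the password is wrong
    return result.returncode == 0
```

For example, `password_is_valid("kubernetes/key/hivemr3-ssl-certificate.jks", "<password>")` would tell whether the truststore opens with that password.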
I have updated hive-site.xml and core-site.xml (only the diff is attached here).
I have a few questions:
- In hive-site.xml:
  - I see the property hive.server2.keystore.password for HS2 only, not for Metastore.
  - Its value is set to "_". Don't I have to replace it with the actual password (the one used while generating the cert and keystore)?
- I am not planning to use SSL on Postgres. Do I have to do the following?
  - Copied from the doc on enabling SSL on Metastore: "The user should make a copy of the certificate file for connecting to the MySQL database for Metastore and set MR3_METASTORE_MYSQL_CERTIFICATE in kubernetes/config-run.sh to point to the copy."
  - In fact, I was not planning to enable SSL on Metastore; I wanted to terminate TLS at HiveServer2 (HS2). It looks like it will be easier to enable it for both HS2 and Metastore.
- In hive-site.xml, for the "javax.jdo.option.ConnectionURL" property, will the query parameter "createDatabaseIfNotExist=true" also work for Postgres? I have seen it in your doc for MySQL.
- I did not see any keystore/cert for the Beeline services (as mentioned in the doc). Did I miss something while generating the cert/truststore/keystore? Or can we use the HS2 files?
For running with Helm, the generated password is set only in config-run.sh and env-secret.sh:
./helm/hive/env-secret.sh:HIVE_SERVER2_SSL_TRUSTSTOREPASS=8ecead69-2f02-4177-a988-9c0daee15980
./helm/hive/env-secret.sh:export HADOOP_CREDSTORE_PASSWORD=8ecead69-2f02-4177-a988-9c0daee15980
./config-run.sh:MR3_SSL_KEYSTORE_PASSWORD=8ecead69-2f02-4177-a988-9c0daee15980
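In the listing above all three variables carry the same generated value. Assuming they are meant to stay identical after editing, a small check like the following sketch can catch a mismatch before deploying. The helper names are hypothetical; only the variable names come from the listing.

```python
import re
from pathlib import Path

# Variable names taken from the listing above.
PASSWORD_VARS = (
    "HIVE_SERVER2_SSL_TRUSTSTOREPASS",
    "HADOOP_CREDSTORE_PASSWORD",
    "MR3_SSL_KEYSTORE_PASSWORD",
)

# Matches `NAME=value` and `export NAME=value`; commented-out lines do not match.
ASSIGNMENT = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=(.*?)\s*$")


def read_password_vars(paths):
    """Collect NAME -> value for the password variables found in the shell files."""
    found = {}
    for path in paths:
        for line in Path(path).read_text().splitlines():
            match = ASSIGNMENT.match(line)
            if match and match.group(1) in PASSWORD_VARS:
                found[match.group(1)] = match.group(2).strip("\"'")
    return found


def passwords_agree(found):
    """True when all three variables are present and share a single value."""
    return set(found) == set(PASSWORD_VARS) and len(set(found.values())) == 1
```

Usage would be along the lines of `passwords_agree(read_password_vars(["helm/hive/env-secret.sh", "config-run.sh"]))`.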
If all else fails, could you reset the settings (to make sure that MR3 starts okay without SSL) and follow the instructions in 'Enabling SSL' in https://mr3docs.datamonad.com/docs/k8s/helm/run-metastore/? This page contains all the necessary changes and pointers to other related pages (which have been tested).
Minor comments:
1. hive.server2.keystore.password can be set to _. Setting it to the password seems okay, though.
2. No need to set hive.metastore.keystore.password in hive-site.xml.
Cheers,
--- Sungwoo
If I understand it correctly:
- config-run.sh is used to perform SSL operations (creating the keystore and truststore, self-signing certs, etc., which are copied to key/ and eventually used in the HS2 deployment). config-run.sh itself is not used in the deployment. If I need to provide config-run.sh (with MR3_SSL_KEYSTORE_PASSWORD=<password>), where should I copy it in the Helm chart? FYI: I have copied hive-setup.sh and run-master/worker.sh (customizable via Java properties) into the Docker image, which means they are final and remain the same for all our clusters. I do not have an option to bundle a config-run.sh with a hard-coded password because each cluster will have a different password. Our clusters:
- do not share HS2, Metastore, etc.; each has a completely independent HS2 service, one cluster per VPC.
- are customized via secrets, configmaps, and/or Java properties.
- env.sh (or better/safer, env-secret.sh) has the settings for the password.
- Short and easy route: disable SSL for S3 by setting fs.s3a.connection.ssl.enabled = false
  - What will be the security implications, assuming that the only component reachable by external clients is HS2?
- Long approach:
  - Import the public cert of AWS S3 into the truststore
  - Set fs.s3a.endpoint = https://orange0:9000. What is https://orange0:9000? The Metastore host and port, or HS2:port?
Do I need to export ATS_SECRET_KEY=<password> in env-secret.sh to fix the DAGAppMaster issue mentioned in the above thread? I am using only:
1. HS2 with SSL
2. Metastore with SSL
3. Postgres without SSL
- To fix the SSL handshake error (it is polluting the log far faster than "no data or no sasl" and making it difficult to troubleshoot any issue), do I need to set something like "-Dhttps.protocols=TLSv1.1,TLSv1.2"? If yes, where: in hive-setup.sh, and in run-master/worker.sh too? https://stackoverflow.com/questions/21245796/javax-net-ssl-sslhandshakeexception-remote-host-closed-connection-during-handsh/22629008
- The (external) JDBC client is failing to connect to the SSL-enabled HS2 host.
- The error is "Could not open client transport with JDBC Uri: jdbc:hive2://<HS2-host>:9852/default; Invalid status 21" (Details - Type: java.sql.SQLException (SQL State: 08S01)). The HS2 server log shows the expected message "Unrecognized SSL message, plaintext connection?"
- I added the SSL-related info to the connection URI. Now I am getting a different error: "Could not open client transport with JDBC Uri: jdbc:hive2://<HS2-host>:9852/default;ssl=true;sslTrustStore=/Users/pkumar/ws/keystore/dev;trustStorePassword=<password>: Error creating the transport". There is no activity in the HS2 log, which suggests the URI is not correct, but I checked the format and it looks fine to me. I suspect the LoadBalancer is the issue here and is not forwarding the connection to HS2. Any idea?
- I will try creating a truststore for Beeline and test it. How do I copy the truststore file to the HS2 server?
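One way to separate a LoadBalancer problem from a JDBC URI problem is to attempt a bare TLS handshake against the same host and port, independently of the JDBC driver. This is only a diagnostic sketch using the Python standard library; `probe_tls` is a hypothetical helper, and certificate verification is deliberately switched off because the goal is just to see whether anything answers with TLS.

```python
import socket
import ssl


def probe_tls(host, port, timeout=5.0):
    """Attempt a TLS handshake and return (ok, detail).

    On success, detail is the negotiated protocol version (e.g. "TLSv1.2").
    On failure, detail names the exception, which hints at whether the
    connection was refused, timed out, or answered in plaintext.
    """
    context = ssl.create_default_context()
    context.check_hostname = False        # must be disabled before CERT_NONE
    context.verify_mode = ssl.CERT_NONE   # diagnostic only: no verification
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return True, tls.version()
    except OSError as exc:  # ssl.SSLError and timeouts are OSError subclasses
        return False, f"{type(exc).__name__}: {exc}"
```

Calling `probe_tls("<HS2-host>", 9852)` from the client machine and again from inside the cluster (against the HS2 service directly) would show whether the handshake fails only when going through the LoadBalancer.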
--
You received this message because you are subscribed to the Google Groups "MR3" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hive-mr3+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/hive-mr3/a1496b70-08a4-4c92-88b9-f5e3cb47c4c2n%40googlegroups.com.
1. I encountered another problem after enabling SSL (similar to S3). We use our custom authentication/authorization manager (PAM, pluggable auth module), which accesses another REST API service using https://foo.com/<end-point>. Now it appears that it is not finding the cert for this service and is throwing an error (especially the first 4 lines of the attached log):
2020-08-25T23:18:24,622 INFO [HiveServer2-Handler-Pool: Thread-36] auth.PkumarAPIService: Calling endpoint:Calling API with URL: https://API.dev.pkumar-test.com/v1/auth?generate-tokens=false
2020-08-25T23:18:24,622 INFO [HiveServer2-Handler-Pool: Thread-36] auth.PkumarAPIService: Calling API with URL: https://API.dev.pkumar-test.com/v1/auth?generate-tokens=false
2020-08-25T23:18:24,636 ERROR [HiveServer2-Handler-Pool: Thread-36] transport.TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: Error validating the login
I thought SASL negotiation is used only for Kerberos. Why is it happening here? Do I need to disable something? FYI: it works without SSL.
It could be an issue similar to S3 (I may have to import the cert here too, which was not needed without SSL on HS2). Line 15 of the log:
"Caused by: javax.security.sasl.AuthenticationException: General I/O Exception. General I/O Exception. sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target"

HiveServer2 adds these Java options when it uses SSL:
-Djavax.net.ssl.trustStore=/opt/mr3-run/key/hivemr3-ssl-certificate.jks -Djavax.net.ssl.trustStoreType=jks -Djavax.net.ssl.trustStorePassword=8ecead69-2f02-4177-a988-9c0daee15980
My guess is that when it contacts https://foo.com, HiveServer2 tries to find the certificate in hivemr3-ssl-certificate.jks, but in your case the certificate is missing from hivemr3-ssl-certificate.jks.
You can manually add the certificate to hivemr3-ssl-certificate.jks (before executing HiveServer2), or update import_certificates() in kubernetes/config-run.sh and run 'kubernetes/run-hive.sh --generate-truststore' (see my previous message).
So, my question is: can you obtain a certificate for https://foo.com?

2. My Python client service also cannot connect; it returns a "TTransportException: TSocket read 0 bytes" error. Any ideas on how to provide the SSL info? I thought it should work but it did not.
Connection URL in Python (using SQLAlchemy / PyHive):
After SSL: hive://hive:<password>@<hs2-host>:9852/default;ssl=true;sslTrustStore=/Users/pkumar/keystore/dev/mr3-ssl.jks;trustStorePassword=<password>;
Before SSL: hive://hive:<password>@<hs2-host>:9852/default;

Not sure about this, but what happens if you remove 'hive:<password>' in the URL because I think it should be unnecessary? Also, check if HiveServer2 prints any message to see if it is a client-side issue.

3. Both work (the after-SSL and before-SSL URLs).
Connection URL in the JDBC client (DBVisualizer):
After SSL: jdbc:hive2://<hs2-host>:9852/default;ssl=true;sslTrustStore=/Users/pkumar/keystore/dev/mr3-ssl.jks;trustStorePassword=<password>;
Before SSL: jdbc:hive2://<hs2-host>:9852/default;
userId = hive
password = <hive-password>

Is it possible to terminate SSL at HS2 (with everything else non-SSL), or is it too much effort? Any other ideas?

Let me look into this issue. (At the moment, I don't know how.) So, your goal is to enable SSL only for the connection from the client to HiveServer2, right?
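For the manual route in item 1 (adding the https://foo.com certificate to hivemr3-ssl-certificate.jks), the two steps can be sketched as follows. This is an illustration under assumptions, not the procedure used by config-run.sh: the helper names, the alias, and the `TRUSTSTORE_PASSWORD` environment variable are made up, and `ssl.get_server_certificate` returns only the server's own certificate, not the full chain.

```python
import ssl


def fetch_server_cert_pem(host, port=443):
    """Return the certificate presented by host:port, PEM-encoded (leaf only)."""
    return ssl.get_server_certificate((host, port))


def keytool_import_cmd(cert_file, alias, truststore,
                       password_env="TRUSTSTORE_PASSWORD"):
    """Build the keytool command that adds cert_file to the truststore.

    keytool reads the truststore password from the environment variable
    named by password_env.
    """
    return [
        "keytool", "-importcert", "-noprompt",
        "-alias", alias,
        "-file", cert_file,
        "-keystore", truststore,
        "-storepass:env", password_env,
    ]
```

A typical use would write `fetch_server_cert_pem("foo.com")` to a file such as `foo.pem`, then run the command from `keytool_import_cmd("foo.pem", "foo-api", "kubernetes/key/hivemr3-ssl-certificate.jks")` with `subprocess.run` before starting HiveServer2.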
Cheers,
--- Sungwoo
Yes, I am planning to add it. But I think it is not going to help me, now that I have found that PyHive does not support SSL (my Python client service uses PyHive). Our Python service client:
- uses PyHive to ingest data into Metastore.
- uses an internal HS2 service (a second instance of HS2), and I am planning not to use SSL here (as long as it does not delay the release). But both the internal and external HS2 services share Metastore (which is now SSL-enabled, unless we terminate SSL at HS2).
Let me look into this issue. (At the moment, I don't know how.) So, your goal is to enable SSL only for the connection from the client to HiveServer2, right?
Correct. It will be of immense help if we can terminate SSL at HS2. We can use the current working cluster (the Python service client works, our CUSTOM authentication/authorization works, and two HS2 servers work, one for external and another for internal use, sharing Metastore). In this working cluster everything is non-SSL and working fine. If only we can terminate TLS at HS2, it will avoid all the other changes, which are a lot. I went through at least 50 search results trying to make our Python service work with SSL and PyHive/SQLAlchemy and did not find any working solution.
But the internal Python client is failing to connect to the second HS2 service (where SSL has been disabled). From the log of the second HS2 instance, it appears that it is still trying to use SSL and throwing error "". I have attached the log. I am debugging to check why SSL is being used on the second instance. Any idea how to check it quickly? Java system properties?
2. I did not find "-Djavax.net.ssl.trustStore" in mr3-site.xml. That property has been defined, as seen below.
<property>
  <name>mr3.am.launch.cmd-opts</name>
  <value>-server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -XX:+UseNUMA -XX:+UseG1GC -XX:+ResizeTLAB -Dlog4j.configurationFile=k8s-mr3-container-log4j2.properties -Djavax.security.auth.useSubjectCredsOnly=false -Djava.security.auth.login.config=/opt/mr3-run/conf/jgss.conf -Djava.security.krb5.conf=/opt/mr3-run/conf/krb5.conf -Dsun.security.jgss.debug=true</value>
</property>
3. Our custom (authentication/authorization) module is in Java. I am thinking that in non-SSL mode HS2 managed to connect to https://foo.com because it was downloading the cert during the handshake. Maybe? Not sure.
I am not an expert on Java, and I am thinking that one of the following should unset "javax.net.ssl.trustStore":
3.1 System.getProperties().remove("javax.net.ssl.trustStore") or
3.2 System.getProperties().remove("javax.net.ssl.trustStore", "")
To make our life easier, I am thinking of adding an nginx server in front of HS2 that routes everything to HS2, keeping HS2 itself non-SSL. Let me know if that will work and if there is anything I have to watch out for. Thanks.
- PyHive works fine with HS2 (non-SSL) and auth=NONE. The issue was with auth=CUSTOM.
- PyHive was throwing an error: a password cannot be provided if auth=NONE is used; change the Hive authorization to NONE or do not include the password. The connection was never initiated because of this check in PyHive.
- It means it cannot be used with SSL enabled.
- Impyla: I noticed you used Kerberos. Will it work without Kerberos? I will give it a try without Kerberos (and probably "PLAIN" as the auth mechanism) and check how it works with our "CUSTOM" auth.
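The client-side check described in the list above can be pictured with the following sketch. It is not PyHive's code: `check_password_for_auth` is a hypothetical helper, and the rule that a password is accepted only with the LDAP or CUSTOM modes (and required there) is an assumption about PyHive's behaviour that goes slightly beyond what the error above states.

```python
# Assumption: only these modes accept (and require) a password.
PASSWORD_MODES = ("LDAP", "CUSTOM")


def check_password_for_auth(auth, password):
    """Raise ValueError when the password and the auth mode do not fit together.

    The check runs before any network activity, which is why the server
    log shows nothing when it fails.
    """
    has_password = password is not None
    needs_password = auth in PASSWORD_MODES
    if has_password != needs_password:
        raise ValueError(
            f"password given: {has_password}, but auth={auth!r}; "
            "drop the password or use one of " + ", ".join(PASSWORD_MODES)
        )
```

Under this assumption, `check_password_for_auth("NONE", "secret")` raises, while `check_password_for_auth("CUSTOM", "secret")` and `check_password_for_auth("NONE", None)` pass.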
Now all pods and services are working fine. Our Python service can curate data and ingest it into Metastore. Simple as well as complex queries (where worker nodes are launched for map/reduce) return results fine. I still have to resolve the issue with our custom auth endpoint https://foo.com. I have two options (as suggested above), or rather three:
1) Retrieve the certificate for https://foo.com and merge it to key/hivemr3-ssl-certificate.jks, as described in my previous message.
2) Extend custom Java module so that it ignores javax.net.ssl.trustStore.
3) The third option I am thinking of is to use the same cert for HiveServer2's keystore/truststore (as both https://foo.com and the HS2 host are our services). This would fulfill the need of #1 above too. Do you think it will work? If yes, how can it be done? I foresee one problem with this approach: our certs are passwordless, while the Hadoop self-signed cert tool requires a password and generates its own if none is provided. Can this requirement be skipped? If yes, then I just need to import our cert into the keystore and populate the truststore. Correct? I will try it in a few hours; I wanted to get your opinion about it first.
Last question about debugging: when I started testing nodeSelector for workers/master, I noticed (because of typos in the label) that the log was printing the following message continuously. Obviously, it was unable to get anything scheduled on a node, but there was nothing to determine where and why it was unable to create the mapper/reducer. Is there a way (other than the debug log level, which requires a restart) to get more info about it?
2020-08-28T03:13:43,995 INFO [HiveServer2-Background-Pool: Thread-46] SessionState: Map 1: -/- Map 2: 0(+1)/1
Map 1: -/- Map 2: 0(+1)/1
2020-08-28T03:13:47,000 INFO [HiveServer2-Background-Pool: Thread-46] SessionState: Map 1: -/- Map 2: 0(+1)/1
Map 1: -/- Map 2: 0(+1)/1
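One place such information usually surfaces without changing the log level is the Kubernetes side: a pod that cannot be placed has a `PodScheduled` condition with status `False` and a scheduler message. The sketch below reads that from `kubectl get pods -o json`. It assumes `kubectl` is configured for the cluster; the helper names and the namespace passed in are illustrative.

```python
import json
import subprocess


def unschedulable_pods(pod_list):
    """Return (pod name, scheduler message) for pods that could not be scheduled.

    pod_list is the parsed JSON output of `kubectl get pods -o json`.
    """
    stuck = []
    for pod in pod_list.get("items", []):
        for cond in pod.get("status", {}).get("conditions", []):
            if cond.get("type") == "PodScheduled" and cond.get("status") == "False":
                stuck.append((pod["metadata"]["name"], cond.get("message", "")))
    return stuck


def pending_pods_report(namespace):
    """Query the cluster and list the unschedulable pods in the namespace."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return unschedulable_pods(json.loads(out))
```

With a mistyped nodeSelector label, the returned message would typically say that no node matched the selector, which points at the label rather than at Hive.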
- We have a wildcard cert (say *.foo.com) and we would be deploying HS2 using this cert (LoadBalancer service hostname mr3.foo.com). I would like to use our cert in kubernetes/generate-hivemr3-ssl.sh without generating a new self-signed cert. How can I change the script to import certs but not generate them?
- I noticed that in one of your examples above you showed how to import a cert (MR3_S3_CERTIFICATE=/home/gitlab-runner/mr3-run/kubernetes/minio-public.crt). I imported the public certificate provided by the S3 server into kubernetes/key/hivemr3-ssl-certificate.jks, but it did not work. Did I use the wrong public cert? I downloaded the cert from here: https://docs.cloudera.com/documentation/enterprise/5-15-x/topics/sg_aws_security.html#concept_vnr_y1b_kdb