When asking questions, please provide the following information:
Elasticsearch version: 6.6
Search Guard: search-guard-6-6.6.1-24.1.zip
JVM: OpenJDK 8 (Java 1.8.0)
Search Guard configuration files: default, as shipped
Elasticsearch logs: included below
Other plugins: Timelion
The cluster was initially set up with Elasticsearch 6.4 and Search Guard.
The initial certificates were generated with scripts from Search Guard (from some example scripts downloaded at the time, I think):
gen_client_node_cert.sh
gen_root_ca.sh
etc
The config for the keystores/truststores was then added as follows:
searchguard.enterprise_modules_enabled: true
searchguard.restapi.roles_enabled: ["sg_all_access"]
searchguard.authcz.admin_dn:
  - "<sanitized, but works>"
#searchguard.ssl.http.clientauth_mode: REQUIRE
#searchguard.ssl.http.clientauth_mode: NONE
searchguard.ssl.http.clientauth_mode: OPTIONAL
#
# Transport layer SSL
#
searchguard.ssl.transport.enabled: true
searchguard.ssl.transport.keystore_type: JKS
searchguard.ssl.transport.keystore_filepath: /etc/elasticsearch/elk-certs/keystores/node1-keystore.jks
searchguard.ssl.transport.keystore_password: <password>
searchguard.ssl.transport.truststore_type: JKS
searchguard.ssl.transport.truststore_filepath: /etc/elasticsearch/elk-certs/keystores/truststore.jks
searchguard.ssl.transport.truststore_password: <password>
searchguard.ssl.transport.enforce_hostname_verification: false
#
# HTTP/REST layer SSL
#
#searchguard.ssl.http.enabled: false
searchguard.ssl.http.enabled: true
searchguard.ssl.http.keystore_type: JKS
searchguard.ssl.http.keystore_filepath: /etc/elasticsearch/elk-certs/keystores/node1-keystore.jks
searchguard.ssl.http.keystore_password: <password>
searchguard.ssl.http.truststore_type: JKS
searchguard.ssl.http.truststore_filepath: /etc/elasticsearch/elk-certs/keystores/truststore.jks
searchguard.ssl.http.truststore_password: <password>
searchguard.roles_mapping_resolution: BOTH
The cluster was/is running fine.
I stepped in and upgraded to ES 6.6 and the new version of searchguard with the same config. It still works fine.
******
Important: I now want to replace our current certificates with a new set. The password for the root CA key that was used during the original root CA generation has been lost, so I can't sign new certs to add more nodes. I only seem to have access to the store, which I can use to verify the current chain but nothing more.
*******
I downloaded tls-tool-1.6 and generated new certs with the following sgtlstool.sh config:
ca:
  root:
    # The distinguished name of this CA. You must specify a distinguished name.
    # example: dn: CN=root.ca.example.com,OU=CA,O=Example Com\, Inc.,DC=example,DC=com
    dn: CN=root.ca-<cluster-name>,OU=<ou>,O=<org>,DC=d1,DC=d2,DC=com
    # The size of the generated key in bits
    keysize: 2048
    # The validity of the generated certificate in days from now
    validityDays: 3650
    # Password for private key
    # Possible values:
    # - auto: automatically generated password, returned in config output;
    # - none: unencrypted private key;
    # - other values: other values are used directly as password
    pkPassword: <password>
    # The name of the generated files can be changed here
    file: root-ca.pem
    # If you have a certificate revocation list, you can specify its distribution points here
    # crlDistributionPoints: URI:https://raw.githubusercontent.com/floragunncom/unittest-assets/master/revoked.crl
defaults:
  # The validity of the generated certificate in days from now
  validityDays: 3650
  # Password for private key
  # Possible values:
  # - auto: automatically generated password, returned in config output;
  # - none: unencrypted private key;
  # - other values: other values are used directly as password
  pkPassword: <password>
  # Specifies to recognize legitimate nodes by the distinguished names
  # of the certificates. This can be a list of DNs, which can contain wildcards.
  # Furthermore, it is possible to specify regular expressions by
  # enclosing the DN in //.
  # Specification of this is optional. The tool will always include
  # the DNs of the nodes specified in the nodes section.
  #
  # Examples:
  # - "CN=*.example.com,OU=Ops,O=Example Com\\, Inc.,DC=example,DC=com"
  # - 'CN=node.other.com,OU=SSL,O=Test,L=Test,C=DE'
  # - 'CN=*.example.com,OU=SSL,O=Test,L=Test,C=DE'
  # - 'CN=elk-devcluster*'
  # - '/CN=.*regex/'
  nodesDn:
    - '/CN=node[0-9]*/i,OU=<ou>,O=<org>,DC=d1,DC=d2,DC=com'
nodes:
  # The node name is just used as name of the generated files
  - name: node1.d1.d2.com
    # The distinguished name of this node
    dn: CN=node1,OU=<OU>,O=<ORG>,DC=d1,DC=d2,DC=com
    # DNS names of this node. Several names can be specified as list
    dns:
      - node1.d1.d2.com
      - alias.d1.d2.com
    # The IP addresses of this node. Several addresses can be specified as list
    ip:
      - ip1
      - ip2
    # If you want to override the keysize, pkPassword or validityDays values from
    # the defaults, just specify them here.
  - name: node2.d1.d2.com
    # The distinguished name of this node
    dn: CN=node2,OU=<OU>,O=<ORG>,DC=d1,DC=d2,DC=com
    # DNS names of this node. Several names can be specified as list
    dns:
      - node2.d1.d2.ca
    # The IP addresses of this node. Several addresses can be specified as list
    ip: ip1
    # If you want to override the keysize, pkPassword or validityDays values from
    # the defaults, just specify them here.
This is repeated for a total of 5 nodes; all the others follow the same pattern as node2.
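To make the nodesDn rule from the config comments concrete (wildcards, or a regular expression when the DN is enclosed in //), here is a rough Python sketch. It is only an approximation, not Search Guard's actual matching code: the function name dn_matches is hypothetical, it assumes a pattern counts as a regular expression only when the whole DN both starts and ends with a slash, and the OU and O values are made-up placeholders.

```python
import re


def dn_matches(pattern: str, dn: str) -> bool:
    """Approximate the nodesDn matching rule described in the config comments.

    Assumption (not taken from the Search Guard source): a pattern is a
    regular expression only if the whole DN is enclosed in slashes;
    otherwise '*' matches any run of characters and '?' a single one.
    """
    if len(pattern) > 1 and pattern.startswith("/") and pattern.endswith("/"):
        return re.fullmatch(pattern[1:-1], dn) is not None
    # Escape everything, then turn the escaped wildcards back into regex.
    wildcard = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
    return re.fullmatch(wildcard, dn) is not None


# Placeholder DN; the OU and O values are invented for this example.
suffix = ",OU=ops,O=example,DC=d1,DC=d2,DC=com"
node_dn = "CN=node1" + suffix

patterns = [
    "CN=node1" + suffix,              # exact DN
    "CN=node*" + suffix,              # wildcard
    "/CN=node[0-9]+" + suffix + "/",  # regex: whole DN enclosed in //
    "/CN=node[0-9]*/i" + suffix,      # same shape as the nodesDn entry above
]

for p in patterns:
    print(dn_matches(p, node_dn), p)
```

Under these assumptions the last pattern does not match, because only the CN part sits between the slashes. Whether the real matcher treats it the same way is something I have not confirmed.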
I then ran sgtlstool.sh as follows:
sgtlstool.sh -ca -crt -c cluster.yml -t ./output/
The generated config snippet for node5 contained this:
searchguard.ssl.transport.pemcert_filepath: node5.d1.d2.com.pem
searchguard.ssl.transport.pemkey_filepath: node5.d1.d2.com.key
searchguard.ssl.transport.pemkey_password: <password>
searchguard.ssl.transport.pemtrustedcas_filepath: root-ca.pem
searchguard.ssl.transport.enforce_hostname_verification: false
searchguard.ssl.transport.resolve_hostname: false
searchguard.ssl.http.enabled: true
searchguard.ssl.http.pemcert_filepath: node5.d1.d2.com_http.pem
searchguard.ssl.http.pemkey_filepath: node5.d1.d2.com_http.key
searchguard.ssl.http.pemkey_password: <password>
searchguard.ssl.http.pemtrustedcas_filepath: root-ca.pem
searchguard.nodes_dn:
- /CN=node[0-9]*/i,OU=<ou>,O=<org>,DC=d1,DC=d2,DC=com
- CN=node1,OU=<ou>,O=<org>,DC=d1,DC=d2,DC=com
- CN=node2,OU=<ou>,O=<org>,DC=d1,DC=d2,DC=com
- CN=node3,OU=<ou>,O=<org>,DC=d1,DC=d2,DC=com
- CN=node4,OU=<ou>,O=<org>,DC=d1,DC=d2,DC=com
- CN=node5,OU=<ou>,O=<org>,DC=d1,DC=d2,DC=com
searchguard.authcz.admin_dn:
- CN=sgAdmin,OU=<ou>,O=<org>,DC=d1,DC=d2,DC=com
I adjusted the file paths and used this instead:
searchguard.ssl.transport.pemcert_filepath: elk-certs/node5.d1.d2.com.pem
searchguard.ssl.transport.pemkey_filepath: elk-certs/node5.d1.d2.com.key
searchguard.ssl.transport.pemkey_password: <password>
searchguard.ssl.transport.pemtrustedcas_filepath: elk-certs/root-ca.pem
searchguard.ssl.transport.enforce_hostname_verification: false
searchguard.ssl.transport.resolve_hostname: false
searchguard.ssl.http.enabled: true
searchguard.ssl.http.pemcert_filepath: elk-certs/node5.d1.d2.com_http.pem
searchguard.ssl.http.pemkey_filepath: elk-certs/node5.d1.d2.com_http.key
searchguard.ssl.http.pemkey_password: <password>
searchguard.ssl.http.pemtrustedcas_filepath: elk-certs/root-ca.pem
searchguard.nodes_dn:
- /CN=node[0-9]*/i,OU=<ou>,O=<org>,DC=d1,DC=d2,DC=com
- CN=node1,OU=<ou>,O=<org>,DC=d1,DC=d2,DC=com
- CN=node2,OU=<ou>,O=<org>,DC=d1,DC=d2,DC=com
- CN=node3,OU=<ou>,O=<org>,DC=d1,DC=d2,DC=com
- CN=node4,OU=<ou>,O=<org>,DC=d1,DC=d2,DC=com
- CN=node5,OU=<ou>,O=<org>,DC=d1,DC=d2,DC=com
searchguard.authcz.admin_dn:
- CN=sgAdmin,OU=<ou>,O=<org>,DC=d1,DC=d2,DC=com
sgtlsdiag.sh lists no problems when run against this.
I shut down all the nodes, replaced the certs, and started them back up. They didn't seem to have any problems starting up. However, after a short time, once they all started seeing the other nodes, the logs filled up with certificate_unknown errors:
[2019-03-28T20:28:38,925][ERROR][c.f.s.s.h.n.SearchGuardSSLNettyHttpServerTransport] [log1-op] SSL Problem Received fatal alert: certificate_unknown
javax.net.ssl.SSLException: Received fatal alert: certificate_unknown
at sun.security.ssl.Alerts.getSSLException(Alerts.java:208) ~[?:?]
at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1666) ~[?:?]
at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1634) ~[?:?]
at sun.security.ssl.SSLEngineImpl.recvAlert(SSLEngineImpl.java:1800) ~[?:?]
at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:1083) ~[?:?]
at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:907) ~[?:?]
at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:781) ~[?:?]
at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624) ~[?:1.8.0_181]
at io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:295) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1301) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1203) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1247) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:502) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:441) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:405) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:372) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:355) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.ssl.SslHandler.channelInactive(SslHandler.java:1054) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1429) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:947) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) [netty-common-4.1.32.Final.jar:4.1.32.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
[2019-03-28T20:28:38,925][ERROR][c.f.s.s.h.n.SearchGuardSSLNettyHttpServerTransport] [log1-op] SSL Problem Received fatal alert: certificate_unknown
javax.net.ssl.SSLException: Received fatal alert: certificate_unknown
at sun.security.ssl.Alerts.getSSLException(Alerts.java:208) ~[?:?]
at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1666) ~[?:?]
at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1634) ~[?:?]
at sun.security.ssl.SSLEngineImpl.recvAlert(SSLEngineImpl.java:1800) ~[?:?]
at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:1083) ~[?:?]
at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:907) ~[?:?]
at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:781) ~[?:?]
at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624) ~[?:1.8.0_181]
at io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:295) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1301) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1203) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1247) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:502) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:405) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:372) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:355) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.ssl.SslHandler.channelInactive(SslHandler.java:1054) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1429) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:947) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:826) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) [netty-common-4.1.32.Final.jar:4.1.32.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404) [netty-common-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:474) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) [netty-common-4.1.32.Final.jar:4.1.32.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
These are interspersed every now and then with this:
[2019-03-28T20:28:41,208][WARN ][o.e.d.z.ZenDiscovery ] [node1] not enough master nodes discovered during pinging (found [[]], but needed [2]), pinging again
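Because the stack traces are long, a small script can make it easier to see which logger reports the errors (HTTP layer or transport layer) and how often. The following is only a sketch: the regular expression is inferred from the log lines above rather than from any documented format, and the names LOG_LINE and summarize are hypothetical.

```python
import re
from collections import Counter

# Matches log headers shaped like the ones above:
# [timestamp][LEVEL][logger] [node] message
LOG_LINE = re.compile(
    r"^\[(?P<time>[^\]]+)\]"
    r"\[(?P<level>[A-Z]+)\s*\]"
    r"\[(?P<logger>[^\]]+?)\s*\]"
    r"\s+\[(?P<node>[^\]]+)\]\s+"
    r"(?P<message>.*)$"
)


def summarize(lines):
    """Count log entries per (level, logger, message); stack frames are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            counts[(match["level"], match["logger"], match["message"])] += 1
    return counts


# Sample lines copied from the log excerpt above.
sample = [
    "[2019-03-28T20:28:38,925][ERROR][c.f.s.s.h.n.SearchGuardSSLNettyHttpServerTransport] [log1-op] SSL Problem Received fatal alert: certificate_unknown",
    "javax.net.ssl.SSLException: Received fatal alert: certificate_unknown",
    "at sun.security.ssl.Alerts.getSSLException(Alerts.java:208) ~[?:?]",
    "[2019-03-28T20:28:38,925][ERROR][c.f.s.s.h.n.SearchGuardSSLNettyHttpServerTransport] [log1-op] SSL Problem Received fatal alert: certificate_unknown",
    "[2019-03-28T20:28:41,208][WARN ][o.e.d.z.ZenDiscovery ] [node1] not enough master nodes discovered during pinging (found [[]], but needed [2]), pinging again",
]

for (level, logger, message), count in summarize(sample).most_common():
    print(count, level, logger, message)
```

For a real run, the lines would come from the Elasticsearch log file on each node instead of the sample list.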
I assume I am doing something wrong, but I don't know what, and I would appreciate any help sorting it out.
Thanks.