RabbitMQ on Kubernetes with FIPS

Darren Ma

May 6, 2021, 6:16:09 PM
to rabbitmq-users
I have a RabbitMQ cluster deployed on Kubernetes. On non-FIPS systems, our RabbitMQ deployment works fine, but on FIPS systems, we are seeing RabbitMQ crash with this error:

bad argument in call to crypto:evp_generate_key_nif(x25519, undefined) in ssl_cipher:generate_client_shares/2 line 1390

For context, here are some of the lines in the rabbitmq log:

[error] <0.448.0> CRASH REPORT Process <0.448.0> with 1 neighbours crashed with reason: bad argument in call to crypto:evp_generate_key_nif(x25519, undefined) in ssl_cipher:generate_client_shares/2 line 1390

[debug] <0.271.0> Response: {error,{failed_connect,[{to_address,{"kubernetes.default.svc.cluster.local",443}},{inet,[inet],{eoptions,{{badarg,[{crypto,evp_generate_key_nif,[x25519,undefined],[]},{ssl_cipher,generate_client_shares,2,[{file,"ssl_cipher.erl"},{line,1390}]},{tls_connection,init,3,[{file,"tls_connection.erl"},{line,590}]},{gen_statem,loop_state_callback,11,[{file,"gen_statem.erl"},{line,1166}]},{tls_connection,init,1,[{file,"tls_connection.erl"},{line,153}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]},{gen_statem,call,[<0.448.0>,{start,2250},infinity]}}}}]}}

In our "advanced.config" we have tried to enable FIPS with this setting:

[
    {crypto, [{fips_mode, true}]}
].
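
(As a side note, one way to verify whether that setting takes effect is to query the crypto application on a running node; crypto:info_fips/0 is part of the OTP crypto API, and rabbitmqctl eval is just one way to invoke it.)

    # Check whether FIPS mode is actually active in the running node
    rabbitmqctl eval 'crypto:info_fips().'
    # 'enabled'        -> FIPS mode is on
    # 'not_supported'  -> the underlying OTP/OpenSSL build has no FIPS support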

We are using RabbitMQ Server 3.8.9 and Erlang 23.0.2 from:

Does anyone have suggestions?  Thanks!

Michal Kuratczyk

May 6, 2021, 7:32:12 PM
to rabbitm...@googlegroups.com
Hi,

I hadn't used FIPS until I read your question, but here are some things I found:
1. I successfully deployed a RabbitMQ cluster to GKE with FIPS enabled (using the Operator and your advanced.config to enable FIPS). I can see a similar GET line in the logs, followed by successfully returned content. I use the latest Docker image, which includes Erlang/OTP 23.3.2 and OpenSSL 1.1.1k.
2. Given the above, I'd suggest testing the latest Erlang and OpenSSL.
3. RabbitMQ doesn't implement cryptography - Erlang does. Therefore, if the latest versions still fail for you, I'd suggest reporting it to the Erlang/OTP team: https://github.com/erlang/otp/

Best,


--
Michał
RabbitMQ team

Darren Ma

May 11, 2021, 1:20:56 PM
to rabbitmq-users
Hi Michal,

Thank you so much for your reply. I tried the latest versions of RabbitMQ (3.8.15-1), Erlang (23.3.2) and OpenSSL (1.1.1g) that I could find for RHEL, but still encountered the same error as before.

I reached out to the Erlang team and am still working to understand what I need to do to get this working.

Interestingly, from what I can find, it seems that OpenSSL 1.1.1 does not support FIPS: https://github.com/openssl/openssl/issues/7582

It seems that the FIPS Object Module for OpenSSL is only supported with OpenSSL 1.0.2: https://wiki.openssl.org/index.php/FIPS_modules

I'm not sure how your RabbitMQ cluster on GKE with OpenSSL 1.1.1k worked with FIPS enabled. Do you happen to know if you had a FIPS Object Module for OpenSSL in your environment?

Thanks again.

Darren Ma

May 11, 2021, 1:25:49 PM
to rabbitmq-users
Hi Michal,

Do you also know if your container was FIPS-enabled?
E.g., does
cat /proc/sys/crypto/fips_enabled
return 1?

Thanks

Michal Kuratczyk

May 12, 2021, 5:19:24 AM
to rabbitm...@googlegroups.com
Sorry for giving you false hope: /proc/sys/crypto doesn't even exist in my container. I guess in my environment, rather than crashing on startup, it was just silently ignoring this option.

According to https://erlang.org/doc/apps/crypto/fips.html, FIPS needs to be enabled when compiling OTP, but I'm not sure that's the case for the packages you're using.

How do you build Erlang?
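
(For reference, the FIPS chapter linked above enables FIPS at OTP build time; a minimal sketch, assuming the build host has an OpenSSL with FIPS support installed:)

    # Build Erlang/OTP with FIPS support (sketch; prerequisites and paths omitted)
    ./configure --enable-fips
    make
    make install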



--
Michał

Darren Ma

May 12, 2021, 5:46:30 AM
to rabbitmq-users
Hi Michał,

No worries.  Thanks for following up.

Regarding your question, I'm not building Erlang myself. I'm pulling RabbitMQ and Erlang from the "rabbitmq" GitHub:

yum localinstall -y erlang-23.3.2-1.el8.x86_64.rpm

Based on the stacktrace, it seems like it's failing in the RabbitMQ Kubernetes peer discovery plugin ( https://www.rabbitmq.com/cluster-formation.html#peer-discovery-k8s ).

And the error seems to be in the Erlang crypto library: bad argument in call to crypto:evp_generate_key_nif(x25519, undefined) in ssl_cipher:generate_client_shares/2 line 1390

I'm wondering if I could resolve this by listing the supported ciphers in the RabbitMQ config, but I'm not sure which ciphers are supported for FIPS.
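
(For what it's worth, ciphers can be listed explicitly in rabbitmq.conf for the TLS listeners; the suite names below are purely illustrative, and which ones are FIPS-approved depends on the OpenSSL/OTP build:)

    # Illustrative only -- not a vetted FIPS cipher list
    ssl_options.ciphers.1 = ECDHE-RSA-AES256-GCM-SHA384
    ssl_options.ciphers.2 = ECDHE-RSA-AES128-GCM-SHA256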

Thanks.

Darren Ma

May 12, 2021, 6:11:09 AM
to rabbitmq-users
As an FYI, here is a response I got to my question on the Erlang GitHub: https://github.com/erlang/otp/issues/4818#issuecomment-839639031

Michal Kuratczyk

May 12, 2021, 7:09:06 AM
to rabbitm...@googlegroups.com
That response from the Erlang team doesn't bode well - seems like you'd end up with an unsupported version of OpenSSL either way...
As for the k8s peer discovery:
1. I think the problem could be solved on the Kubernetes side - if I understand correctly, FIPS limits the ciphers that can be used. In your environment, Kubernetes expects Erlang to support a cipher that is not supported. I think you could try to configure Kubernetes to use a cipher that is supported by a FIPS-enabled Erlang.
2. k8s peer discovery is not a must-have for RabbitMQ to work on Kubernetes. In fact, over the last few days we've been testing the Operator without the k8s plugin to see whether we could solve some issues. You can use this branch for testing: https://github.com/rabbitmq/cluster-operator/pull/687, or just configure your RabbitMQ similarly (statically configure cluster members). If k8s peer discovery is the only thing that breaks for you with FIPS, then I guess this could be the best solution.

Best,



--
Michał

Darren Ma

May 12, 2021, 3:01:34 PM
to rabbitmq-users
Hi Michał,
Thanks. With the config-file-based peer discovery approach, how does adding new nodes work? And what happens when nodes fail? Do the existing nodes need to be restarted to pick up the config changes with the new list of nodes?

Michal Kuratczyk

May 12, 2021, 5:36:48 PM
to rabbitm...@googlegroups.com
Peer discovery only matters during the initial cluster formation - once a cluster is formed, your peer discovery configuration is irrelevant to node restarts/failures.

If you want to add more nodes to the cluster, new nodes need to discover the existing nodes. Existing nodes don't need to do anything special.
We have another PR that we are testing, where we don't use the k8s plugin and instead just use node-0 as the well-known host that all other nodes wait for during the initial cluster formation. In this case, node-0 would need to be available for new nodes to be added. It's trivial configuration-wise - you can see it in the configmap.go of https://github.com/rabbitmq/cluster-operator/pull/689.
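
(For completeness, adding a node by hand follows the standard rabbitmqctl clustering steps; rabbit@node-0 below is just a placeholder for the existing well-known node:)

    # Run on the new node only; existing nodes stay untouched
    rabbitmqctl stop_app
    rabbitmqctl reset
    rabbitmqctl join_cluster rabbit@node-0
    rabbitmqctl start_app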





--
Michał

Darren Ma

May 14, 2021, 3:40:05 AM
to rabbitmq-users
Thanks, Michał!

I was able to get RabbitMQ to start after switching to "classic_config" for the peer discovery. However, I noticed some errors in the logs and wanted to confirm whether they are something we can ignore.

I'm using this config for my 2 node/pod RabbitMQ cluster:
    cluster_formation.peer_discovery_backend  = classic_config
    cluster_formation.classic_config.nodes.1 = rab...@icp4adeploy-rabbitmq-ha-0.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local
    cluster_formation.classic_config.nodes.2 = rab...@icp4adeploy-rabbitmq-ha-1.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local

I'm seeing this in the "rabbitmq-ha-0" log:

2021-05-14 07:16:25.028 [info] <0.271.0> Node database directory at /var/lib/rabbitmq/mnesia/rab...@icp4adeploy-rabbitmq-ha-0.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local is empty. Assuming we need to join an existing cluster or initialise from scratch...
2021-05-14 07:16:25.028 [info] <0.271.0> Configured peer discovery backend: rabbit_peer_discovery_classic_config
2021-05-14 07:16:25.028 [debug] <0.271.0> Peer discovery backend does not support initialisation
2021-05-14 07:16:25.028 [info] <0.271.0> Will try to lock with peer discovery backend rabbit_peer_discovery_classic_config
2021-05-14 07:16:25.028 [debug] <0.271.0> rabbit_peer_discovery:lock returned not_supported
2021-05-14 07:16:25.028 [info] <0.271.0> Peer discovery backend does not support locking, falling back to randomized delay
2021-05-14 07:16:25.028 [info] <0.271.0> Peer discovery backend rabbit_peer_discovery_classic_config supports registration.
2021-05-14 07:16:25.028 [debug] <0.271.0> Randomized startup delay: configured range is from 5000 to 60000 milliseconds, PRNG pick: 33446...
2021-05-14 07:16:25.028 [info] <0.271.0> Will wait for 33446 milliseconds before proceeding with registration...
2021-05-14 07:16:58.475 [info] <0.271.0> All discovered existing cluster peers: rab...@icp4adeploy-rabbitmq-ha-0.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local, rab...@icp4adeploy-rabbitmq-ha-1.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local
2021-05-14 07:16:58.475 [info] <0.271.0> Peer nodes we can cluster with: rab...@icp4adeploy-rabbitmq-ha-1.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local
2021-05-14 07:16:58.483 [warning] <0.271.0> Could not auto-cluster with node rab...@icp4adeploy-rabbitmq-ha-1.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local: {badrpc,nodedown}
2021-05-14 07:16:58.483 [error] <0.271.0> Trying to join discovered peers failed. Will retry after a delay of 500 ms, 9 retries left...
2021-05-14 07:16:58.986 [warning] <0.271.0> Could not auto-cluster with node rab...@icp4adeploy-rabbitmq-ha-1.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local: {badrpc,nodedown}
....
2021-05-14 07:17:02.509 [error] <0.271.0> Trying to join discovered peers failed. Will retry after a delay of 500 ms, 1 retries left...
2021-05-14 07:17:03.011 [warning] <0.271.0> Could not auto-cluster with node rab...@icp4adeploy-rabbitmq-ha-1.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local: {badrpc,nodedown}
2021-05-14 07:17:03.011 [error] <0.271.0> Trying to join discovered peers failed. Will retry after a delay of 500 ms, 0 retries left...
2021-05-14 07:17:03.512 [warning] <0.271.0> Could not successfully contact any node of: rab...@icp4adeploy-rabbitmq-ha-1.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local (as in Erlang distribution). Starting as a blank standalone node...


But I noticed that the cluster_status seems to show both nodes:
sh-4.4$ rabbitmqctl cluster_status
Cluster status of node rab...@icp4adeploy-rabbitmq-ha-0.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local ...

Cluster name: rab...@icp4adeploy-rabbitmq-ha-0.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local

Running Nodes
rab...@icp4adeploy-rabbitmq-ha-0.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local
rab...@icp4adeploy-rabbitmq-ha-1.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local

Listeners
Node: rab...@icp4adeploy-rabbitmq-ha-0.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Node: rab...@icp4adeploy-rabbitmq-ha-0.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rab...@icp4adeploy-rabbitmq-ha-0.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local, interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rab...@icp4adeploy-rabbitmq-ha-0.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local, interface: [::], port: 5671, protocol: amqp/ssl, purpose: AMQP 0-9-1 and AMQP 1.0 over TLS
Node: rab...@icp4adeploy-rabbitmq-ha-1.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Node: rab...@icp4adeploy-rabbitmq-ha-1.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rab...@icp4adeploy-rabbitmq-ha-1.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local, interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rab...@icp4adeploy-rabbitmq-ha-1.icp4adeploy-rabbitmq-ha-discovery.sp.svc.cluster.local, interface: [::], port: 5671, protocol: amqp/ssl, purpose: AMQP 0-9-1 and AMQP 1.0 over TLS

Is it safe to ignore the errors? Are they possibly just due to a timing issue because the 2nd RabbitMQ node/pod has not come online yet?

Thanks!

Michal Kuratczyk

May 14, 2021, 3:45:29 AM
to rabbitm...@googlegroups.com
Hi,

Yes, if all nodes start at the same time then you'll see warnings about failed attempts to contact the other nodes.
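
(If the retries run out before the second pod is up, the retry behaviour can be tuned in rabbitmq.conf; the values below are arbitrary examples:)

    # Give peers more time to appear during initial cluster formation
    cluster_formation.discovery_retry_limit = 30
    cluster_formation.discovery_retry_interval = 1000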




--
Michał