RabbitMQ Cluster failed to start

Jai Park

Apr 17, 2023, 6:19:52 PM
to rabbitmq-users
I am deploying RabbitMQ with the Cluster Operator in an OpenShift 4.8 environment. Using cluster-operator.yaml, I was able to deploy the RabbitMQ operator and a RabbitmqCluster instance successfully in my prototype environment. However, the exact same code and image failed in our production environment. The operator deployed and does not seem to have any issues, but the RabbitmqCluster instance failed to start. The only difference between the two environments is that production is air-gapped (no internet connection).

During startup, all feature flags are unchecked and the plugins fail to start.

Any help is much appreciated. I have attached the StatefulSet override YAML that I used for the deployment.

Thanks.
rabbitmq.yaml

Michal Kuratczyk

Apr 18, 2023, 12:33:26 AM
to rabbitm...@googlegroups.com
Please provide the startup logs.

--
Michał
RabbitMQ team

Jai Park

Apr 18, 2023, 3:58:54 PM
to rabbitm...@googlegroups.com
I have attached the log files. I checked the endpoints of the service, and the endpoints for RabbitMQ are listed under NotReadyAddresses. I assume that is because RabbitMQ did not start correctly.
Help is appreciated.

Thanks

rabbitmq-server-0.log
rabbit@rabbitmq-server-0.rabbitmq-nodes.rabbitmq.log

Michal Kuratczyk

Apr 18, 2023, 4:38:37 PM
to rabbitm...@googlegroups.com
One of the log files contains many TLS errors from the Kubernetes peer
discovery plugin as it attempts to connect to the Kubernetes API.
I don't know what causes this, but I'm fairly sure the TLS settings
differ between the two clusters, and that's why it works in one
environment but not the other.

Best,

--
Michał
RabbitMQ team

Jai Park

Apr 18, 2023, 5:10:07 PM
to rabbitm...@googlegroups.com
So, what do you recommend I do?

Michal Kuratczyk

Apr 18, 2023, 5:30:18 PM
to rabbitm...@googlegroups.com
Check what's different in terms of TLS configuration. Use Wireshark or
a similar tool to try to understand what's going on.
Air-gapping doesn't seem to have anything to do with the problem. TLS
does.
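
As a lighter-weight complement to a packet capture, here is a minimal sketch that performs one TLS handshake against the Kubernetes API and prints the negotiated protocol version and cipher, so the output can be compared between the two clusters. It assumes it runs in a pod that has Python 3 available (the RabbitMQ image may not), and that the pod has the usual service account CA file and KUBERNETES_SERVICE_HOST / KUBERNETES_SERVICE_PORT variables; the fallback host name is also an assumption. It uses Python's ssl module, not the Erlang TLS stack that RabbitMQ uses, so a successful handshake here only says something about the API endpoint and the network path.

```python
import os
import socket
import ssl

# Conventional in-pod location of the cluster CA (assumption: default mount).
CA_FILE = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"


def api_endpoint(env=os.environ):
    """Return (host, port) of the Kubernetes API as seen from inside a pod."""
    host = env.get("KUBERNETES_SERVICE_HOST", "kubernetes.default.svc")
    port = int(env.get("KUBERNETES_SERVICE_PORT", "443"))
    return host, port


def probe_tls(host, port, ca_file=CA_FILE, timeout=5.0):
    """Do one TLS handshake and report what was negotiated."""
    context = ssl.create_default_context(cafile=ca_file)
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            name, _protocol, bits = tls.cipher()
            return {"tls_version": tls.version(), "cipher": name, "bits": bits}


if __name__ == "__main__":
    host, port = api_endpoint()
    try:
        print(host, port, probe_tls(host, port))
    except (ssl.SSLError, OSError) as exc:
        # Covers handshake errors, a missing CA file, DNS and connect failures.
        print(host, port, "handshake failed:", exc)
```

Running the same script in both environments and diffing the printed lines is one quick way to spot a differing protocol version or cipher.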

--
Michał
RabbitMQ team

Jai Park

Apr 18, 2023, 7:15:52 PM
to rabbitm...@googlegroups.com
I checked the TLS setup on OpenShift and did not find any differences.

Michal Kuratczyk

Apr 19, 2023, 3:54:52 AM
to rabbitm...@googlegroups.com
I can only tell you that this error occurs in the Erlang runtime
during TLS key generation:
https://github.com/erlang/otp/blob/master/lib/crypto/c_src/evp.c#L136
I've never seen such an error and don't know what caused it. I'd
probably start by upgrading the image to get the latest
RabbitMQ, Erlang, and OpenSSL versions.
If that doesn't help, you can try different TLS configurations, other
ciphers, things like that, to see if it has any effect.

On the RabbitMQ side, it's a call to the Kubernetes API to get a list
of nodes: https://github.com/rabbitmq/rabbitmq-server/blob/main/deps/rabbitmq_peer_discovery_k8s/src/rabbit_peer_discovery_k8s.erl#L47
But it's been working well for a lot of people in all kinds of
environments so I doubt there's anything wrong with it code-wise. TLS
support is implemented in the Erlang runtime with crypto components
from OpenSSL.
RabbitMQ is just asking Erlang to make a request over HTTPS.

Best,

--
Michał
RabbitMQ team

Jai Park

Apr 20, 2023, 2:52:35 PM
to rabbitm...@googlegroups.com
I checked the TLS profiles; they are the same in both environments and are the out-of-the-box OpenShift settings. I upgraded Erlang to 25.2.3, but it is still not working.

Another related question: if you look at the log, all feature flags are unchecked, with no details on why. In my working environment, they are all checked during startup. Do you have any idea why they are unchecked?

Jai Park

Michal Kuratczyk

Apr 21, 2023, 2:52:23 AM
to rabbitm...@googlegroups.com
There may be a difference between a long-lived environment (upgraded from versions that didn't include these flags; new flags are not automatically enabled during an upgrade) and an environment that was deployed fresh with a recent version (an initial deployment, if there is no cluster to join, has all flags enabled).

Either way, just enable them. Feature flags are not meant to be configuration toggles, they are only there to allow a rolling upgrade. Once the upgrade is considered successful, all flags should be enabled.
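
For reference, a minimal sketch of enabling the flags from a workstation, assuming the node is running and kubectl is on the PATH (on OpenShift, oc accepts the same exec syntax). It wraps `rabbitmqctl list_feature_flags` and `rabbitmqctl enable_feature_flag all`; the pod name comes from the log file attached earlier in this thread, the helper names are hypothetical, and the parser assumes the default two-column (name, state) output.

```python
import subprocess

POD = "rabbitmq-server-0"  # pod name taken from the attached log file name


def in_pod(pod, *args):
    """Build a kubectl command that runs rabbitmqctl inside the given pod."""
    return ["kubectl", "exec", pod, "--", "rabbitmqctl", *args]


def disabled_flags(table):
    """Pick flag names whose state column reads 'disabled'.

    Assumes the default two-column (name, state) output of
    `rabbitmqctl list_feature_flags`; any other line is ignored.
    """
    names = []
    for line in table.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1] == "disabled":
            names.append(fields[0])
    return names


def enable_all(pod=POD, run=subprocess.run):
    """List flags, then enable all of them if any are still disabled."""
    listing = run(in_pod(pod, "-q", "list_feature_flags"),
                  capture_output=True, text=True, check=True).stdout
    pending = disabled_flags(listing)
    if pending:
        run(in_pod(pod, "enable_feature_flag", "all"), check=True)
    return pending
```

Calling `enable_all()` returns the names of the flags that were disabled before the call, which is handy for a quick before/after record. If the pod lives in a non-default namespace, a `-n` argument would need to be added to the kubectl command.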

Best,



--
Michał
RabbitMQ team