RabbitMQ 4.1.0 {badmatch,{error,eacces}}


Peter Drahoš

May 14, 2025, 9:14:48 AM
to rabbitmq-users
Hello,
We have an on-premises cluster with 4 nodes on Windows Server 2019 (VMware).
Versions: RabbitMQ 4.1.0 (khepri_db), Erlang 27.3.4
All queues are of type quorum.
Avg message rates: <5 /s
.NET 8 apps with EasyNetQ

I did a clean install; after a few hours these errors started to appear on some nodes (tens of occurrences in 24 h). Applications are not affected; the errors only show up in the logs.
I noticed the same errors with RabbitMQ 4.0.7 too, but only sporadically and also without any effect on the apps.

What does it mean?
Is it safe for a production environment?

2025-05-13 15:20:03.868000+02:00 [warning] <0.28945.0> queue 'BaseNotification_WhitelistQueue_Worker' in vhost 'DEV': Snapshot write process <0.226939.0> exited with {{badmatch,{error,eacces}},[{ra_snapshot,'-promote_checkpoint/2-fun-0-',5,[{file,[115,114,99,47,114,97,95,115,110,97,112,115,104,111,116,46,101,114,108]},{line,420}]}]}
2025-05-13 15:20:03.868000+02:00 [error] <0.226939.0> Error in process <0.226939.0> on node 'rabbit@vs-pro-odapl02' with exit value:
2025-05-13 15:20:03.868000+02:00 [error] <0.226939.0> {{badmatch,{error,eacces}},
2025-05-13 15:20:03.868000+02:00 [error] <0.226939.0>  [{ra_snapshot,'-promote_checkpoint/2-fun-0-',5,
2025-05-13 15:20:03.868000+02:00 [error] <0.226939.0>                [{file,"src/ra_snapshot.erl"},{line,420}]}]}

Thank you,
Peter
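
A note on reading the warning line above: the `{file,[115,114,99,...]}` term is the same file name that the later lines show as "src/ra_snapshot.erl", only printed as a list of character codes. A minimal Python sketch of that decoding (the helper name `decode_charlist` is made up for illustration):

```python
# The file name in the warning line is printed as a list of character
# codes rather than a quoted string. The numbers are copied from the log.
codes = [115, 114, 99, 47, 114, 97, 95, 115, 110, 97, 112, 115,
         104, 111, 116, 46, 101, 114, 108]


def decode_charlist(char_codes):
    """Turn a list of code points into a Python string."""
    return "".join(chr(c) for c in char_codes)


print(decode_charlist(codes))  # prints: src/ra_snapshot.erl
```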

Michal Kuratczyk

May 14, 2025, 9:31:00 AM
to rabbitm...@googlegroups.com
`eacces` just means filesystem permissions don't allow some operations. RabbitMQ expects to have full access to its data folder.
Make sure your filesystem permissions are set correctly. Also, it's common on Windows for security software to interfere with
the software that's running (rejecting "suspicious" operations such as file access...), so it could be that.



--
Michal
RabbitMQ Team


Peter Drahoš

May 15, 2025, 3:00:02 AM
to rabbitmq-users
Hello,
It's a default installation, so the Windows service runs as the Local System account. Data folder permissions are correct and these folders have been the same for years.
We have Symantec Endpoint Protection on the servers; we looked into every log and there is no information about anything being blocked or rejected.
And the files in the db folders, like snapshot.dat, have a current modified timestamp.
First I tried to upgrade an existing node from 4.0.7 to 4.1.0, and this raised the frequency of the error from 2-3 a month to tens a day. The RabbitMQ upgrade was the only change on the server.

Thanks,
Peter

Michal Kuratczyk

May 15, 2025, 3:30:39 AM
to rabbitm...@googlegroups.com
There were changes to the checkpointing/snapshotting logic, and indeed 4.1 likely promotes snapshots more often than 4.0.7 did,
so that explains why the frequency of these errors changed. As for the error itself, there's really nothing we can do here:
`eacces` means that RabbitMQ/Erlang tried to perform a filesystem operation and it was refused by the operating system.
We have a checkpoint file and we want to promote it to snapshot status. On the filesystem level, it's just a file
rename/move operation: in RabbitMQ's data folder, there is a folder for a given queue and it has "checkpoints"
and "snapshots" subfolders. This operation tries to move "checkpoints/SOME_FOLDER" to become "snapshots/SOME_FOLDER".
For some reason, your system doesn't allow that.
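
To make the description above concrete, here is a rough Python analogue of the promotion step. It is only a sketch: the real code is Erlang inside the Ra library, and the function name `promote_checkpoint` and the temporary directory layout are assumptions for illustration. The point is that the whole operation is one rename call, and a refusal by the OS surfaces as a permission error.

```python
import os
import tempfile


def promote_checkpoint(queue_dir, name):
    """Move checkpoints/<name> to snapshots/<name> with a single rename."""
    src = os.path.join(queue_dir, "checkpoints", name)
    dst = os.path.join(queue_dir, "snapshots", name)
    try:
        os.rename(src, dst)
        return True
    except PermissionError as err:
        # Python raises PermissionError for EACCES/EPERM; this is roughly
        # the counterpart of the {error,eacces} tuple in the Erlang log.
        print(f"promotion refused by the OS: {err}")
        return False


# Demo in a throwaway directory (hypothetical layout, not a real data dir).
with tempfile.TemporaryDirectory() as queue_dir:
    os.makedirs(os.path.join(queue_dir, "checkpoints", "SOME_FOLDER"))
    os.makedirs(os.path.join(queue_dir, "snapshots"))
    promoted = promote_checkpoint(queue_dir, "SOME_FOLDER")
    exists_after = os.path.isdir(
        os.path.join(queue_dir, "snapshots", "SOME_FOLDER"))
```

In a healthy environment `promoted` and `exists_after` are both true; the errors in this thread correspond to the `except` branch.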


Peter Drahoš

May 15, 2025, 7:35:29 AM
to rabbitmq-users
Hello,
We removed Symantec from 2 nodes and set permissions on the db folder to Everyone (with replace all child...). So security software and permission problems can be excluded.
The problem is still there.
I traced the snapshot folders for a queue name which appeared in the error log.
The error happened at 2025-05-15 12:38:28; the snapshot folder with the file has timestamp 2025-05-15 12:53. So the move succeeded at a later time. Is there some retry?
It seems to me that some moves are successful and some are not, so it looks like an application problem (concurrency, ...).

One more thing, just to be sure:
Before Symantec was removed, on one node there were many Event Viewer entries like this:
Event: Tamper Protection Detection
Security risk detected: C:\WINDOWS\TEMP\HANDLE64.EXE

But there are no such entries now and the problem persists, so I guess it's not related to this problem.
Does RabbitMQ use it?

Thanks,
Peter

Michal Kuratczyk

May 15, 2025, 7:50:28 AM
to rabbitm...@googlegroups.com
Yes, handle.exe is (can be) used by RabbitMQ: https://www.rabbitmq.com/docs/install-windows#handle-exe

The operation performed by RabbitMQ is quite literally an equivalent of `mv checkpoints/foo snapshots/foo`
and the operating system refuses to do that. There's no concurrency, or anything to get wrong really.
It's a call to the operating system, which the operating system refuses to perform.

We regularly hear about strange things happening on Windows, usually involving permissions, and it's almost
always some Windows security magic that decides that RabbitMQ is suspicious because it's creating files
too often or whatever. Based on the exact timing, it may allow some operations and reject some others,
hence some snapshots exist.

Karl Nilsson

May 15, 2025, 8:34:34 AM
to rabbitm...@googlegroups.com
Windows (IIRC) disallows moving files if there are file handles open, which is most likely what is causing this issue. Which process has a file handle open to this file is unclear: you'd have to use some Windows-specific tooling to see who has a handle open to this directory/file when RabbitMQ wants to move it.

To be honest, we mostly test on Linux. I would recommend, if at all possible, running RabbitMQ on Linux for production.

We can add some code to wait and retry the move operation, but there is still no guarantee of success, so the error may remain. It is mostly harmless, although it may delay segment compaction for the queue a bit.
Cheers
Karl



--
Karl Nilsson
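
The "wait and retry the move" idea mentioned above could look roughly like the following. This is a Python sketch under stated assumptions, not what RabbitMQ or Ra does: the function name `rename_with_retry`, the attempt count, and the linear backoff are all invented for illustration, and the failure is simulated with a stand-in rename function.

```python
import os
import tempfile
import time


def rename_with_retry(src, dst, attempts=5, delay=0.1, rename=os.rename):
    """Try a rename several times, sleeping between permission failures.

    Returns the number of the attempt that succeeded. Re-raises the last
    PermissionError if every attempt fails, so success is not guaranteed.
    """
    for attempt in range(1, attempts + 1):
        try:
            rename(src, dst)
            return attempt
        except PermissionError:
            if attempt == attempts:
                raise
            time.sleep(delay * attempt)  # linear backoff, arbitrary choice


# Simulate another process holding a handle during the first two attempts.
calls = {"n": 0}


def flaky_rename(src, dst):
    calls["n"] += 1
    if calls["n"] <= 2:
        raise PermissionError(13, "simulated eacces")
    os.rename(src, dst)


with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "checkpoints")
    dst = os.path.join(root, "snapshots")
    os.mkdir(src)
    used = rename_with_retry(src, dst, delay=0.01, rename=flaky_rename)
    moved = os.path.isdir(dst)
```

Here the third attempt succeeds. If whatever holds the handle never releases it, the last attempt still raises, which matches the caveat that a retry cannot guarantee success.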

Peter Drahoš

May 16, 2025, 10:14:47 AM
to rabbitmq-users
I enabled auditing (Windows advanced security settings) on the db folder and everything inside it, and searched for all events in the interval around the time the error occurred. In other words, I monitored which processes touch those folders and files.
Only 2 processes are mentioned: erl.exe and System (PID 4).

If I understand it right, the error occurs during some db cleanup and does not affect the functioning of the queue itself? So the error can be ignored?

Migration to Linux is quite complicated in our Windows-positive company.

Thanks,
Peter

Vilius Šumskas

May 16, 2025, 1:14:50 PM
to rabbitm...@googlegroups.com

Hi,

have you tried removing the handle.exe utility from the system completely to see if this fixes the problem?

--
Vilius

Peter Drahoš

May 20, 2025, 6:58:56 AM
to rabbitmq-users
Hello,
I removed handle.exe and restarted the cluster; nothing changed.

I would like to ask you for an official statement on whether version 4.1 with khepri_db is supported on Windows and is ready for production.
I'm asking because with version 3.12.6 we had serious cluster problems (sporadic and impossible to reproduce), so I see the current version with khepri_db as the only way forward.

Thanks,
Peter 

Michael Klishin

May 20, 2025, 10:26:53 AM
to rabbitmq-users
"eacces" means the node cannot access one of the files it needs to use. On Windows, this almost certainly means that a security tool or software prevents RabbitMQ from doing it.
This is not "some log noise that can be ignored", this is a serious issue NOT in RabbitMQ but rather in your environment.

Windows is a supported OS but we have never recommended it for production, in part due to such obscure effects from anti-virus and other security tools,
and because monitoring and deployment automation on Linux are much more mature and widely used.
You can use the WSL and containers to run RabbitMQ without any of the Windows-specific complications [2].

Khepri is listed as a stable feature flag in the docs and for RabbitMQ 4.2, it will be (already is in the `main` branch) the default metadata store. That suggests a certain
level of confidence from the core team. That does not, however, suggest that we support or recommend running RabbitMQ with Khepri in environments where nodes would not be able to reliably access their local files
because of security tools, which is an extremely common thing to see amongst the Windows users.

Finally, here is your official statement straight from the MPLv2 license you use RabbitMQ under [1]: the software is provided as is, without a warranty of
any kind, you assume all the risks and complications associated with using free, open source software.