Checksum errors filling up consumer server that does not host the resource


Francisco Morales

Jul 2, 2024, 10:44:45 AM
to iRODS-Chat
Hi, 

We have a situation where a failed checksum operation on the provider is filling the logs of a consumer that does not host the resource on which the checksum computation is failing.

Is there a way to disable this kind of logging, i.e., to suppress log messages on a consumer that is not directly participating in the failing operation?

Best regards, 
Francisco

Alan King

Jul 2, 2024, 11:21:21 AM
to irod...@googlegroups.com
Hi Francisco,

I'm a little confused as to how/why the logs are appearing in the consumer if it is not at all involved.

Do you know what is triggering the checksums? Is there a client connecting to the consumer in question to carry out operations which may be triggering the checksums?

Alan


--
Alan King
Senior Software Developer | iRODS Consortium

Francisco Morales

Jul 5, 2024, 8:26:13 AM
to iRODS-Chat
Hi Alan, 

I think I found the issue. There are several consumers (resource servers) connected to the provider server. From one particular consumer server, we executed an iRODS rule requesting the computation of all the missing checksums, expecting that only the operations related to this consumer and the data it hosts would be logged. Another consumer server is unable to read the data from the resource attached to it, and instead of logging the errors locally, it seems to be sending the checksum computation errors to the consumer server that launched the checksum computation rule. Does this sound like expected behavior? If so, how could we avoid this behavior?

Best regards, 
Francisco

Alan King

Jul 8, 2024, 10:54:29 AM
to irod...@googlegroups.com
Well, we do not typically "send" log messages between servers (that is, instead of logging them in the local server log). Servers perform operations and if there are issues, these are recorded in the log local to that server. If there are no logs appearing on a server on which errors occur, that could be an issue with the particular API at play, but every API should be logging errors so that we can debug issues. In other words, that is not the expected behavior.

Just so I make sure I'm understanding things right, here's what I would expect to happen. Please comment saying whether you agree or disagree:

1. The detect / fix missing checksums job is launched on server A.
2. A missing checksum is detected for some replica in a resource attached to another server B.
3. The checksum job - via a microservice or otherwise - attempts to compute a checksum for that replica from server A. This necessarily would invoke an API in the server.
4. The checksum API needs to read the data to compute the checksum, so the request is redirected from the API being executed in server A to the storage hosting the data in server B.
5. The read fails due to the server being unable to access the storage. *This results in errors being recorded in the log on server B.*
6. The call returns to server A, with failure. As is typical of all APIs in iRODS, the calling API will log those errors in the log for the local server (in this case, server A).
7. If the checksum request was made through a microservice call in a rule, additional errors could be recorded in server A's log as it processes the errors on its way back up the call stack.
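
The flow in steps 1 to 7 can be sketched as a toy model in Python. This is not iRODS code: the `Server` class, its methods, the log strings, the placeholder checksum value, and the object path are all hypothetical and only illustrate where log entries would be expected to land when the read on server B fails.

```python
from dataclasses import dataclass, field


class ReadError(Exception):
    """Raised when a server cannot read a replica from its storage."""


@dataclass
class Server:
    name: str
    readable: bool = True  # hypothetical flag: can this server reach its storage?
    log: list = field(default_factory=list)

    def read_and_checksum(self, replica):
        # Steps 4-5: the server hosting the data reads it. A failure is
        # recorded in this server's own log before it is returned.
        if not self.readable:
            self.log.append(f"read failed for {replica}")
            raise ReadError(replica)
        return f"placeholder-checksum:{replica}"  # not a real checksum

    def checksum_api(self, replica, host):
        # Steps 3 and 6: the calling server redirects to the host and, if
        # the call comes back with a failure, logs that failure locally too.
        try:
            return host.read_and_checksum(replica)
        except ReadError:
            self.log.append(f"checksum failed for {replica} on {host.name}")
            return None


server_a = Server("A")                  # launches the checksum job
server_b = Server("B", readable=False)  # hosts the replica, cannot read it
result = server_a.checksum_api("/zone/home/f.dat", server_b)
```

In this model both logs end up with one entry each: server B records the failed read and server A records the failed call.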

I think the step which is not happening in your setup is step 5 (I've bolded it and put *asterisks* around it for emphasis). Additionally, the log messages which you would have expected to see in step 5 on server B are actually appearing in the logs for server A instead.

Does that all seem correct?

Thanks for your patience in my trying to understand the issue :) We will try to reproduce this on our end once confirmed that we have a handle on the problem.

Francisco Morales

Jul 8, 2024, 11:08:54 AM
to iRODS-Chat
Hi Alan, 

The steps you describe are much more advanced than what I can assess myself. Basically, what is happening is:

1. Consumer server A executes an iRODS rule to compute all missing checksums in the zone.
2. Files that are not accessible through the S3 resource (cacheless detached) on server B appear as failed checksum operations on server A.

If I understand correctly, the failing operations indicate that server A is trying to open the files directly from the S3 bucket and failing to do so. I would have expected resource server B to be the one opening the files and computing the checksum.

Best regards, 
Francisco

Terrell Russell

Jul 8, 2024, 11:29:22 AM
to irod...@googlegroups.com
Hi Francisco,

If it is truly 'cacheless detached', then server A very well may be contacting the S3 bucket directly.

Terrell
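
To illustrate the point about 'cacheless detached', here is a toy model in Python of which server ends up talking to the S3 bucket. It assumes that in detached mode the server receiving the request handles it itself, while in an attached mode the request is redirected to the resource's host. The function name, the parameters, and the server names are hypothetical, and the real redirect logic in the S3 resource plugin may differ.

```python
def serving_server(host_mode, requesting_server, resource_host):
    """Return the name of the server that would open the S3 object (toy model)."""
    if host_mode == "cacheless_detached":
        # Assumed detached behavior: the server that received the request
        # contacts the S3 bucket directly.
        return requesting_server
    # Assumed attached behavior: the request is redirected to the server
    # named as the resource's host.
    return resource_host


detached = serving_server("cacheless_detached", "A", "B")
attached = serving_server("cacheless_attached", "A", "B")
```

Under these assumptions, a checksum job launched on server A would read from S3 on server A itself in detached mode, which would match errors showing up in server A's log.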


