Proper way to ensure every newly created replica has a checksum


tedgin...@gmail.com

May 6, 2022, 5:38:24 PM
to iRODS-Chat
Hi everyone.

We require every data object replica to have a checksum. We are currently using iRODS 4.2.8 and have convoluted rule logic attached to the dynamic PEPs pep_database_mod_data_obj_meta_post and pep_database_reg_data_obj_post. I'm not sharing the logic, because it is too convoluted and embarrassing. 

We are preparing to upgrade to 4.2.11. During preliminary testing, we found that our 4.2.8 checksum rule logic doesn't work on 4.2.11: a replica corresponding to a newly created data object does not receive a checksum.

Instead of hacking at our current convoluted mess until it works (think monkey at a typewriter in my case), I figured that someone in the iRODS community knows of a clean way to ensure that every newly created replica gets a checksum no matter what caused the replica to be created and to ensure that when a replica is modified its checksum is updated. Does anyone have a solution for this that works in 4.2.11?

Cheers,
Tony

jc...@sanger.ac.uk

May 10, 2022, 5:30:26 AM
to iRODS-Chat
Hi Tony!

I can only share our partial approach, with the caveats that:

1. it's for 4.2.7, but I know some rules were being deprecated, so if you've not looked at those they may help.
2. it's not as exhaustive as your list (which is now giving me pause, so... thanks? :-) )

#Replacement of acPostProcForPut with equivalent PEP, and msiSysChksumDataObj deprecated in favour of msiDataObjChksum
pep_api_data_obj_put_post(*INSTANCE_NAME, *COMM, *DATAOBJINP, *BUFFER, *PORTAL_OPR_OUT) {
    msiDataObjChksum(*DATAOBJINP.obj_path, "ChksumAll=", *checksumOut);
}

This notably doesn't address irsync, and probably some others...

I wonder if using variants of

acPostProcForParallelTransferReceived
acPostProcForDataCopyReceived

which we use to trigger free space updates as per the manual, would be a good idea, but I haven't tried it.

acPostProcForDataCopyReceived(*leaf_resource) {
    msi_update_unixfilesystem_resource_free_space(*leaf_resource);
}

I'd be interested to hear how you get on!

Tony Edgin

May 10, 2022, 11:34:31 PM
to irod...@googlegroups.com
Hi John.

Thanks for your suggestions. I started out using pep_api_data_obj_put_post, and as you pointed out, it doesn't cover all the ways a replica can be created. From what I recall, there are on the order of 10 API calls that look like they could create a replica. I wasn't sure, since they aren't documented. That meant I would have to attach rules to 10 API dynamic PEPs. The database dynamic PEPs seemed more promising. Through experimentation, I found that the two I mentioned above would handle all cases. I'll get over my embarrassment and share a reduced and cleaned-up form of the rules that work in iRODS 4.2.8.

pep_database_mod_data_obj_meta_post(*INSTANCE, *CONTEXT, *OUT, *DATA_OBJ_INFO, *REG_PARAM) {
  # If the data size was set in the catalog, then the final size of the replica is known
  # and the replica is in storage, so we can compute its checksum.
  if (errorcode(*REG_PARAM.dataSize) == 0) {
    # Don't checksum a replica twice.
    *logicalPath = _ipc_getObjPath(*DATA_OBJ_INFO);
    *pathVar = _ipc_mkDataObjSessVar(*logicalPath);
    if (
      if errorcode(temporaryStorage.'*pathVar') == 0
      then temporaryStorage.'*pathVar' != 'HAS CHECKSUM'
      else true
    ) {
      _ipc_chksumRepl(*logicalPath, int(*REG_PARAM.replica_number));
    }
  }
}

pep_database_reg_data_obj_post(*INSTANCE, *CONTEXT, *OUT, *DATA_OBJ_INFO) {
  # When a data object is created due to file registration, the size of the file is known.
  # Otherwise, the size isn't known yet, and iRODS will report a size of zero. Compute the
  # checksum if the size is greater than zero.
  if (int(*DATA_OBJ_INFO.data_size) > 0) {
    *logicalPath = _ipc_getObjPath(*DATA_OBJ_INFO);
    _ipc_chksumRepl(*logicalPath, int(*DATA_OBJ_INFO.replica_number));
    *pathVar = _ipc_mkDataObjSessVar(*logicalPath);
    temporaryStorage.'*pathVar' = 'HAS CHECKSUM';
  }
}
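
The three helpers called in the rules above (_ipc_getObjPath, _ipc_mkDataObjSessVar and _ipc_chksumRepl) are not included in the post. A minimal sketch of what they could look like follows. The bodies are assumptions, not the original implementation: the logical_path key, the session variable prefix, and the choice of forceChksum are all guesses made for illustration.

```
# Hypothetical helper definitions; only the names come from the rules above.

# Returns the logical path of the data object.
# Assumes the serialized data object info exposes it under the key logical_path.
_ipc_getObjPath(*DataObjInfo) = *DataObjInfo.logical_path

# Builds a per-object key for use with temporaryStorage.
# The prefix is arbitrary; it only needs to avoid clashing with other keys.
_ipc_mkDataObjSessVar(*Path) = 'ipc_data_obj_' ++ str(*Path)

# Computes the checksum of a single replica and stores it in the catalog.
# Assumes forceChksum is wanted so that a modified replica gets a fresh checksum.
_ipc_chksumRepl(*Path, *ReplNum) {
  *opts = 'forceChksum=++++replNum=' ++ str(*ReplNum);
  msiDataObjChksum(*Path, *opts, *chksum);
}
```

The option string uses the same keyword=value form as the "ChksumAll=" argument earlier in the thread, with "++++" separating the two keywords. Restricting the call to one replica number keeps the rule from recomputing checksums for replicas that were not touched.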

For iRODS 4.2.11, when a file is transferred through streaming (e.g. iput -N0), pep_database_mod_data_obj_meta_post is not called. Maybe this is a bug. The ICAT entry is created, and its size is updated in two separate SQL statements.

Cheers,
Tony



tedgin...@gmail.com

May 11, 2022, 3:47:35 PM
to iRODS-Chat
I think this is a regression bug in iRODS, so I created a bug report. Please see https://github.com/irods/irods/issues/6385.