AutoIngest conflicting with Tiering plugin?


Jan de Graaf

Apr 3, 2024, 9:31:19 AM
to iRODS-Chat
Hi,

We're setting up iRODS at the NKI and making use of both the Ingestion Plugin and the Tiering Plugin, but they seem to conflict with each other.

When storing data with iput, the tiering works as expected.
When doing a basic ingest with the ingestion framework, the tiering works as expected.
When doing an ingest with an event handler that adds metadata, the storage tiering stops working. So the tiering plugin will NOT tier the ingested data.

We've seen that when custom metadata is added via the event handler of the ingestion framework, the metadata of the tiering plugin is no longer added.
My first attempt to work around this issue was to simply add the required metadata in the event handler, but this doesn't work.

Any ideas on how these two plugins can work together? This step prevents us from going into production ;-)

Best,
Jan de Graaf
NKI

Jan de Graaf

Apr 3, 2024, 9:36:25 AM
to iRODS-Chat
Hi,

Some further digging... Just using an event handler with PUT_SYNC already causes the issue.

Best,
Jan

On Wednesday, April 3, 2024 at 15:31:19 UTC+2, Jan de Graaf wrote:

Alan King

Apr 3, 2024, 11:00:14 AM
to irod...@googlegroups.com
Hi,

I have a few questions about the setup to see if I can help...

- What do you mean by "[the] storage tiering stops working"? Do other items which were not ingested via PUT_SYNC continue to tier out/restage as expected after the PUT_SYNC ingest occurs? Or is it just that the PUT_SYNC-ingested items are not tiered out as expected?
- Does the storage tiering plugin annotate the access_time metadata on the PUT_SYNC-ingested object(s) as it should? Are the data objects created with replicas on the resources annotated for storage tiering? (One way to check is the iquest sketch below.)
- What type of storage is the ingest tool scanning? Filesystem or S3?
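
A quick way to check the access_time annotations per object is an iquest query along these lines (the collection path here is illustrative; substitute your own):

iquest "SELECT DATA_NAME, META_DATA_ATTR_VALUE, META_DATA_ATTR_UNITS WHERE COLL_NAME = '/myZone/home/rods/ingested_coll' AND META_DATA_ATTR_NAME = 'irods::access_time'"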

I'm assuming this is the latest release of the automated ingest tool, but please let us know what version it is as well as the iRODS server version.

Thanks,

Alan



--
Alan King
Senior Software Developer | iRODS Consortium

Jan de Graaf

Apr 4, 2024, 7:11:29 AM
to iRODS-Chat
Hi Alan,

Let me explain further.
We run iRODS 4.3.1 with the Tiering Plugin (4.3.1) and Ingestion Plugin (0.4.2) installed.
Data is stored on Linux folders/mounts.

Files that are ingested via the ingestion plugin are picked up by the storage tiering plugin when NOT using the event handler option.

We have this behaviour (as expected):

(rodssync) irods@p-irods-003:~/rodssync/bin$ python3 -m irods_capability_automated_ingest.irods_sync start /mntAC0212/Test3 /nkiImaging/home/rods/Test3 --synchronous --progress --ignore_cache
 [Elapsed Time: 0:00:02] |#######################| (Time:  0:00:02) count:      3 tasks: ------ failures: ------ retries: ------
(rodssync) irods@p-irods-003:~/rodssync/bin$ icd Test3
(rodssync) irods@p-irods-003:~/rodssync/bin$ ils -l
/nkiImaging/home/rods/Test3:
  rods              0 RES_IMG_Store_01;RES_IMG_Store_PT01;IMG_Store_01           47 2024-01-26.12:35 & _metadata.json
  rods              0 RES_IMG_Store_01;RES_IMG_Store_PT01;IMG_Store_01      8390588 2020-06-02.00:44 & test.tif
  rods              0 RES_IMG_Store_01;RES_IMG_Store_PT01;IMG_Store_01            0 2024-04-03.11:35 & text1.txt
  rods              0 RES_IMG_Store_01;RES_IMG_Store_PT01;IMG_Store_01            0 2024-04-03.11:35 & text2.txt
  rods              0 RES_IMG_Store_01;RES_IMG_Store_PT01;IMG_Store_01            0 2024-04-03.11:35 & text.txt
(rodssync) irods@p-irods-003:~/rodssync/bin$ imeta ls -d _metadata.json
AVUs defined for dataObj /nkiImaging/home/rods/Test3/_metadata.json:
attribute: irods::access_time
value: 1712228273
units:
(rodssync) irods@p-irods-003:~/rodssync/bin$ imeta ls -d _metadata.json
AVUs defined for dataObj /nkiImaging/home/rods/Test3/_metadata.json:
attribute: irods::access_time
value: 1712228273
units: irods::storage_tiering::migration_scheduled
(rodssync) irods@p-irods-003:~/rodssync/bin$ iqstat
id     name
99284 {"rule-engine-operation":"irods_policy_storage_tiering","storage-tier-groups":["TG_Imaging"]}
99312 {"delay_conditions":"<INST_NAME>irods_rule_engine_plugin-unified_storage_tiering-instance</INST_NAME><EF>60s REPEAT UNTIL SUCCESS OR 5 TIMES</EF><PLUSET>14s</PLUSET>","destination-resource":"RES_IMG_Store_02","group-name":"TG_Imaging","md5":"a393b6102519495d804bf4820cf0673e","object-path":"/nkiImaging/home/rods/Test3/_metadata.json","preserve-replicas":true,"rule-engine-instance-name":"irods_rule_engine_plugin-unified_storage_tiering-instance","rule-engine-operation":"irods_policy_data_movement","source-replica-number":"0","source-resource":"RES_IMG_Store_01","user-name":"rods","user-zone":"nkiImaging","verification-type":"catalog"}

Tiering rules fire, and data is being tiered... no problem so far

Files that are ingested via the ingestion plugin are NOT picked up by the storage tiering plugin when using the event handler option.
(We see that no annotations for storage tiering are made when using the ingestion event handler.)

This is where things go wrong...

(rodssync) irods@p-irods-003:~/scripts/auto_ingest$ python3 -m irods_capability_automated_ingest.irods_sync start /mntAC0212/Test3 /nkiImaging/home/rods/Test3 --synchronous --progress --ignore_cache --event_handler ingestOnly.py
 [Elapsed Time: 0:00:09] |########################| (Time:  0:00:09) count:      3 tasks: ------ failures: ------ retries: ------
(rodssync) irods@p-irods-003:~/scripts/auto_ingest$ icd Test3
(rodssync) irods@p-irods-003:~/scripts/auto_ingest$ ils -l
/nkiImaging/home/rods/Test3:
  rods              0 RES_IMG_Store_01;RES_IMG_Store_PT01;IMG_Store_01           47 2024-04-04.11:03 & _metadata.json
  rods              0 RES_IMG_Store_01;RES_IMG_Store_PT01;IMG_Store_01      8390588 2024-04-04.11:03 & test.tif
  rods              0 RES_IMG_Store_01;RES_IMG_Store_PT01;IMG_Store_01            0 2024-04-04.11:03 & text1.txt
  rods              0 RES_IMG_Store_01;RES_IMG_Store_PT01;IMG_Store_01            0 2024-04-04.11:03 & text2.txt
  rods              0 RES_IMG_Store_01;RES_IMG_Store_PT01;IMG_Store_01            0 2024-04-04.11:03 & text.txt
(rodssync) irods@p-irods-003:~/scripts/auto_ingest$ imeta ls -d _metadata.json
AVUs defined for dataObj /nkiImaging/home/rods/Test3/_metadata.json:
None
(rodssync) irods@p-irods-003:~/scripts/auto_ingest$ iqstat
id     name
99284 {"rule-engine-operation":"irods_policy_storage_tiering","storage-tier-groups":["TG_Imaging"]}
(rodssync) irods@p-irods-003:~/scripts/auto_ingest$ iqstat
id     name
99284 {"rule-engine-operation":"irods_policy_storage_tiering","storage-tier-groups":["TG_Imaging"]}
(rodssync) irods@p-irods-003:~/scripts/auto_ingest$ imeta ls -d _metadata.json
AVUs defined for dataObj /nkiImaging/home/rods/Test3/_metadata.json:
None
(rodssync) irods@p-irods-003:~/scripts/auto_ingest$ cat ingestOnly.py
from irods_capability_automated_ingest.core import Core
from irods_capability_automated_ingest.utils import Operation
from irods.meta import iRODSMeta

class event_handler(Core):

    @staticmethod
    def operation(session, meta, **options):
        return Operation.PUT_SYNC

(rodssync) irods@p-irods-003:~/scripts/auto_ingest$

Files that are stored via the IPUT command are picked up by the storage tiering plugin as expected.

(rodssync) irods@p-irods-003:/mntAC0212/Test3$ iput _metadata.json
(rodssync) irods@p-irods-003:/mntAC0212/Test3$ ils -l
/nkiImaging/home/rods:
  rods              0 RES_IMG_Store_01;RES_IMG_Store_PT01;IMG_Store_01           47 2024-04-04.11:06 & _metadata.json
  C- /nkiImaging/home/rods/test4
(rodssync) irods@p-irods-003:/mntAC0212/Test3$ imeta ls -d _metadata.json
AVUs defined for dataObj /nkiImaging/home/rods/_metadata.json:
attribute: irods::access_time
value: 1712228799
units:
(rodssync) irods@p-irods-003:/mntAC0212/Test3$ imeta ls -d _metadata.json
AVUs defined for dataObj /nkiImaging/home/rods/_metadata.json:
attribute: irods::access_time
value: 1712228799
units:
(rodssync) irods@p-irods-003:/mntAC0212/Test3$ imeta ls -d _metadata.json
AVUs defined for dataObj /nkiImaging/home/rods/_metadata.json:
attribute: irods::access_time
value: 1712228799
units: irods::storage_tiering::migration_scheduled
(rodssync) irods@p-irods-003:/mntAC0212/Test3$ iqstat
id     name
99284 {"rule-engine-operation":"irods_policy_storage_tiering","storage-tier-groups":["TG_Imaging"]}
99338 {"delay_conditions":"<INST_NAME>irods_rule_engine_plugin-unified_storage_tiering-instance</INST_NAME><EF>60s REPEAT UNTIL SUCCESS OR 5 TIMES</EF><PLUSET>18s</PLUSET>","destination-resource":"RES_IMG_Store_02","group-name":"TG_Imaging","md5":"820832b4936aa24b18fb31174922358b","object-path":"/nkiImaging/home/rods/_metadata.json","preserve-replicas":true,"rule-engine-instance-name":"irods_rule_engine_plugin-unified_storage_tiering-instance","rule-engine-operation":"irods_policy_data_movement","source-replica-number":"0","source-resource":"RES_IMG_Store_01","user-name":"rods","user-zone":"nkiImaging","verification-type":"catalog"}

Does this help to explain the behaviour we see?

It seems that using the event handler / PUT_SYNC is causing the issue; it appears to prevent a hook from firing. I don't know exactly how, and on what, the tiering plugin fires to add the metadata and trigger the tiering rules.


Best,
Jan de Graaf

On Wednesday, April 3, 2024 at 17:00:14 UTC+2, Alan King wrote:

Alan King

Apr 4, 2024, 11:15:27 AM
to irod...@googlegroups.com
Excellent, thanks for the detailed write-up. This does shed more light on the situation.

It seems that the access_time metadata is indeed not being applied when PUT_SYNC is used, despite the fact that it's going to the right resource. I don't understand why this wouldn't work, because it's using APIs which are covered by the storage tiering plugin. So it's not really clear to me what's going on.

Can you try uploading the data using a simple python-irodsclient data_objects.put() call? That is what PUT_SYNC is supposed to be using under the hood, so I'd like to understand whether this is a problem in ingest or python-irodsclient. It might also be worth trying `istream` to write some data to the resource as well, as it uses similar API endpoints to PRC's put operation.
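
A minimal PRC sketch would be something like this (the connection details are illustrative; the paths are the ones from your transcript):

from irods.session import iRODSSession

# upload a single file via python-irodsclient's put, which is what PUT_SYNC should do
with iRODSSession(host='localhost', port=1247, user='rods',
                  password='rods', zone='nkiImaging') as session:
    session.data_objects.put('/mntAC0212/Test3/_metadata.json',
                             '/nkiImaging/home/rods/Test3/_metadata.json')

And for istream, something like:

echo test | istream write /nkiImaging/home/rods/Test3/istream_test.txt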

Finally, may I ask whether there are any other rule engine plugins in place with implementations for dynamic API PEPs such as pep_api_data_obj_open_post/pep_api_data_obj_close_post? Just trying to see if there are any other "rules" at play here...


Jan de Graaf

Apr 4, 2024, 11:44:16 AM
to iRODS-Chat
What I also tried, to see if there was a workaround for this issue, was to add the needed access_time metadata via the event handler. But even though the metadata was now (manually) added via the event handler script, the tiering rules did not fire.

I also tried manually removing the tiering rule via iqdel and then re-adding it via the irule command, but even then, while the metadata is present on the file, the rule doesn't fire.

I will see if I can do a data_objects.put() with the Python client.

Further, no other rules are in play. (We do have them, but I disabled all other rules to see if they were causing the issue.)



On Thursday, April 4, 2024 at 17:15:27 UTC+2, Alan King wrote:

Alan King

Apr 4, 2024, 5:19:04 PM
to irod...@googlegroups.com
I didn't ask this before, but I was assuming that you are using storage tiering's default violating query which is time-based and checks the access_time metadata. Is that correct?

The fact that it does not annotate automatically is concerning, as this has worked for as long as the plugin and the client have existed... I will try to reproduce this and write back with what I see.

If the PRC put() does the same thing, then this may be a problem with PRC, not ingest. In that case, we can try rolling back the PRC version since the ingest tool is compatible with older versions.
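
For example, pinning the client back would be something like the following (the pin is illustrative; 1.1.9 is the default dependency of ingest 0.4.2):

pip install 'python-irodsclient==1.1.9'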

Jan de Graaf

Apr 16, 2024, 6:09:50 AM
to iRODS-Chat
Hi,

Sorry for the delay. A bit busy with things (data management in particular ;-) ).

But yes, the tiering plugin runs with default settings.

I wrote earlier that adding the tiering metadata manually via the ingestion event handler did not work. But it does! So I do have a workaround for now: during the ingest I add the required metadata for the tiering plugin (it just takes a while before the tiering plugin picks up the data for tiering #BePatient... ;-) ).
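
A rough sketch of that workaround (the post_data_obj_create callback and the meta['target'] key follow the ingest tool's examples, so double-check them against your version):

import time

from irods_capability_automated_ingest.core import Core
from irods_capability_automated_ingest.utils import Operation

class event_handler(Core):

    @staticmethod
    def operation(session, meta, **options):
        return Operation.PUT_SYNC

    @staticmethod
    def post_data_obj_create(hdlr_mod, logger, session, meta, **options):
        # manually annotate the AVU that the tiering plugin would normally add itself
        obj = session.data_objects.get(meta['target'])
        obj.metadata.add('irods::access_time', str(int(time.time())))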

I will try the PRC put this week to see if the issue is there as well.

In the meantime I just encountered another error related to the tiering plugin. Once in a while the data movement fails and a stack trace is produced. It seems that an internal query is not correct: a trailing comma is present that should not be there in the DATA_RESC_ID selection in the WHERE clause.

 {"log_category":"legacy","log_level":"error","log_message":"data movement scheduling failed - [-1107000]::[iRODS Exception:\n    file: /irods_source/lib/core/include/irods/irods_query.hpp\n    function: irods::query<RcComm>::gen_query_impl::gen_query_impl(connection_type *, int, int, const std::string &, const std::string &, int) [connection_type = RcComm]\n    line: 176\n    code: -1107000 (NO_COLUMN_NAME_FOUND)\n    message:\n        query fill failed for [SELECT RESC_ID WHERE DATA_NAME = 'selection_prefs.py' AND COLL_NAME = '/nkiImaging/home/BioImaging/2024/Backup_WKS0128/BIF/jg_bif/get_mrxs_offsets/venv/Lib/site-packages/pip/_internal/models' AND DATA_RESC_ID IN ('35077','78840','78841',)]\nstack trace:\n--------------\n 0# irods::stacktrace::dump() const in /lib/libirods_common.so.4.3.1\n 1# irods::exception::assemble_full_display_what() const in /lib/libirods_common.so.4.3.1\n 2# irods::exception::what() const in /lib/libirods_common.so.4.3.1\n 3# irods::query_processor<RcComm>::execute(irods::thread_pool&, RcComm&)::'lambda'()::operator()() in /usr/lib/irods/plugins/rule_engines/libirods_rule_engine_plugin-unified_storage_tiering.so\n 4# boost::asio::detail::executor_op<boost::asio::detail::binder0<irods::query_processor<RcComm>::execute(irods::thread_pool&, RcComm&)::'lambda'()>, std::__1::allocator<void>, boost::asio::detail::scheduler_operation>::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long) in /usr/lib/irods/plugins/rule_engines/libirods_rule_engine_plugin-unified_storage_tiering.so\n 5# boost::asio::detail::scheduler::do_run_one(boost::asio::detail::conditionally_enabled_mutex::scoped_lock&, boost::asio::detail::scheduler_thread_info&, boost::system::error_code const&) in /lib/libirods_server.so.4.3.1\n 6# boost::asio::detail::scheduler::run(boost::system::error_code&) in /lib/libirods_server.so.4.3.1\n 7# boost::asio::detail::posix_thread::func<boost::asio::thread_pool::thread_function>::run() in /lib/libirods_server.so.4.3.1\n 8# boost_asio_detail_posix_thread_function in /lib/libirods_server.so.4.3.1\n 9# 0x00007F52C097F609 in /lib/x86_64-linux-gnu/libpthread.so.0\n10# clone in /lib/x86_64-linux-gnu/libc.so.6\n\n]","request_api_name":"EXEC_RULE_EXPRESSION_AN","request_api_number":1206,"request_api_version":"d","request_client_user":"rods","request_host":"172.31.32.83","request_proxy_user":"rods","request_release_version":"rods4.3.1","server_host":"p-irods-001","server_pid":343542,"server_timestamp":"2024-04-16T09:56:00.325Z","server_type":"agent","server_zone":"nkiImaging"}
 {"log_category":"legacy","log_level":"error","log_message":"data movement scheduling failed - [-1107000]::[iRODS Exception:\n    file: /irods_source/lib/core/include/irods/irods_query.hpp\n    function: irods::query<RcComm>::gen_query_impl::gen_query_impl(connection_type *, int, int, const std::string &, const std::string &, int) [connection_type = RcComm]\n    line: 176\n    code: -1107000 (NO_COLUMN_NAME_FOUND)\n    message:\n        query fill failed for
 
 [SELECT RESC_ID WHERE DATA_NAME = 'selection_prefs.py' AND COLL_NAME = '/nkiImaging/home/BioImaging/2024/Backup_WKS0128/BIF/rh_bif/get_mrxs_offsets/venv/Lib/site-packages/pip/_internal/models' AND DATA_RESC_ID IN ('35077','78840','78841',)]
 
 \nstack trace:\n--------------\n 0# irods::stacktrace::dump() const in /lib/libirods_common.so.4.3.1\n 1# irods::exception::assemble_full_display_what() const in /lib/libirods_common.so.4.3.1\n 2# irods::exception::what() const in /lib/libirods_common.so.4.3.1\n 3# irods::query_processor<RcComm>::execute(irods::thread_pool&, RcComm&)::'lambda'()::operator()() in /usr/lib/irods/plugins/rule_engines/libirods_rule_engine_plugin-unified_storage_tiering.so\n 4# boost::asio::detail::executor_op<boost::asio::detail::binder0<irods::query_processor<RcComm>::execute(irods::thread_pool&, RcComm&)::'lambda'()>, std::__1::allocator<void>, boost::asio::detail::scheduler_operation>::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long) in /usr/lib/irods/plugins/rule_engines/libirods_rule_engine_plugin-unified_storage_tiering.so\n 5# boost::asio::detail::scheduler::do_run_one(boost::asio::detail::conditionally_enabled_mutex::scoped_lock&, boost::asio::detail::scheduler_thread_info&, boost::system::error_code const&) in /lib/libirods_server.so.4.3.1\n 6# boost::asio::detail::scheduler::run(boost::system::error_code&) in /lib/libirods_server.so.4.3.1\n 7# boost::asio::detail::posix_thread::func<boost::asio::thread_pool::thread_function>::run() in /lib/libirods_server.so.4.3.1\n 8# boost_asio_detail_posix_thread_function in /lib/libirods_server.so.4.3.1\n 9# 0x00007F52C097F609 in /lib/x86_64-linux-gnu/libpthread.so.0\n10# clone in /lib/x86_64-linux-gnu/libc.so.6\n\n]","request_api_name":"EXEC_RULE_EXPRESSION_AN","request_api_number":1206,"request_api_version":"d","request_client_user":"rods","request_host":"172.31.32.83","request_proxy_user":"rods","request_release_version":"rods4.3.1","server_host":"p-irods-001","server_pid":343542,"server_timestamp":"2024-04-16T09:56:00.327Z","server_type":"agent","server_zone":"nkiImaging"}
 {"log_category":"legacy","log_level":"error","log_message":"iRODS Exception:\n    file: /irods_plugin_source/storage_tiering.cpp\n    function: void irods::storage_tiering::migrate_violating_data_objects(rcComm_t *, const std::string &, const std::string &, const std::string &, const std::string &)\n    line: 674\n    code: -35000 (SYS_INVALID_OPR_TYPE)\n    message:\n        scheduling failed for [2] objects for query [
 
 SELECT DATA_NAME, COLL_NAME, USER_NAME, USER_ZONE, DATA_REPL_NUM WHERE META_DATA_ATTR_NAME = 'irods::access_time' AND META_DATA_ATTR_VALUE < '1713260905' AND META_DATA_ATTR_UNITS <> 'irods::storage_tiering::migration_scheduled' AND DATA_RESC_ID IN ('35069',)]
 
 \nstack trace:\n--------------\n 0# irods::stacktrace::dump() const in /lib/libirods_common.so.4.3.1\n 1# irods::exception::assemble_full_display_what() const in /lib/libirods_common.so.4.3.1\n 2# irods::exception::what() const in /lib/libirods_common.so.4.3.1\n 3# irods::log(irods::exception const&) in /lib/libirods_common.so.4.3.1\n 4# 

On Thursday, April 4, 2024 at 23:19:04 UTC+2, Alan King wrote:

Terrell Russell

Apr 16, 2024, 7:32:57 AM
to irod...@googlegroups.com
Hi Jan,

Unfortunately, in this particular case, you happen to have hit this bug...

It's the substring 'select' rather than the trailing comma; here it comes from the 'selection_prefs.py' file name and path in the query's condition strings. The trailing comma is legal/allowed. And the bug is in the GenQuery parser, not the tiering plugin.

Terrell



Jan de Graaf

Apr 16, 2024, 9:31:07 AM
to iRODS-Chat
Okay, good to know.
Will this cause the tiering plugin to not work (completely)? Or will files that are missed due to this error be resumed/retried in a later run of the plugin?

On Tuesday, April 16, 2024 at 13:32:57 UTC+2, Terrell Russell wrote:

Alan King

Apr 17, 2024, 6:31:12 PM
to irod...@googlegroups.com
If the violating query is run on a recurring basis, I think it will get picked up on the next run once the issue is resolved, yes. Our plan to fix the issue for GenQuery is to replace its usage with another parser, which we are calling GenQuery2. Unfortunately, this will only be an option starting in the next server version (4.3.2), as we don't want to tether the storage tiering plugin to the external GenQuery2 API.

You may be able to avoid this problem by adding a specific query to your zone which does the same thing as the GenQuery, but avoids the faulty parser. Let us know if you want help with that.
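
The rough shape would be the following (a sketch: the alias name is made up, the SQL is elided, and the AVU convention is taken from the storage tiering docs, so please verify against your version):

iadmin asq "<SQL equivalent of the time-based violating query>" tiering_violating_query
imeta set -R RES_IMG_Store_01 irods::storage_tiering::query tiering_violating_query specific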

Will await results on the PRC put. No rush. :)

Alan King

May 15, 2024, 9:53:31 AM
to iRODS-Chat
I finally got around to trying this out and I was not able to reproduce this issue. I think this one may just be the apostrophe problem.

For completeness, I'll document what I did here:

I used 3 basic unixfilesystem resources in a single tier group:
tier0_A:unixfilesystem
tier1_A:unixfilesystem
tier2:unixfilesystem
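
For reference, the tier group wiring was along these lines (a sketch; the group name and times are illustrative, with AVU conventions per the storage tiering docs):

imeta add -R tier0_A irods::storage_tiering::group example_group 0
imeta add -R tier1_A irods::storage_tiering::group example_group 1
imeta add -R tier2 irods::storage_tiering::group example_group 2
imeta add -R tier0_A irods::storage_tiering::time 60
imeta add -R tier1_A irods::storage_tiering::time 60

with the recurring tiering rule started with something like:

irule -r irods_rule_engine_plugin-unified_storage_tiering-instance '{"rule-engine-operation": "irods_policy_storage_tiering", "storage-tier-groups": ["example_group"]}' null ruleExecOut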

Here's the event handler I used:

from irods_capability_automated_ingest.core import Core
from irods_capability_automated_ingest.utils import Operation

class event_handler(Core):
    @staticmethod
    def operation(session, meta, **options):
        return Operation.PUT_SYNC

    @staticmethod
    def to_resource(session, meta, **options):
        return "tier0_A"


The AVUs were annotated after ingesting the data and the tier-out occurs as expected. I was using iRODS 4.3.2, storage tiering 4.3.2.0, and automated ingest 0.4.2 (with its default dependencies, including python-irodsclient 1.1.9).