irods_capability_indexing not working with irods 5.0.1

Laura Lo Gerfo

Nov 25, 2025, 11:00:56 AM
to iRODS-Chat

Dear iRODS Consortium team,

I am writing to request assistance regarding several issues we are encountering with the iRODS Indexing capability in our environment. We are currently using iRODS 5.0.1 in a Dockerized setup and we are following the instructions provided in the official documentation at https://docs.irods.org/5.0.1/capabilities/indexing/.

1. Document-type rule engine plugin: in the 5.0.1 documentation, the following rule engine plugin is still referenced in the server configuration:

{"instance_name": "irods_rule_engine_plugin-document_type-instance","plugin_name": "irods_rule_engine_plugin-document_type","plugin_specific_configuration": {} }

However, in the March 2024 development update (https://irods.org/2024/03/irods-development-update-march-2024/), it is stated that:

"the document-type rule engine plugin is no longer provided by the Indexing capability plugin and as a result, you'll need to remove the document-type rule engine plugin from your server_config.json."

This suggests that the plugin has been deprecated or removed, but the 5.0.1 documentation still includes it.
Could you please clarify whether the document-type plugin should or should not be used with iRODS 5.0.1?

2. Mapping file: in the irods_capability_indexing GitHub repository, the es_mapping.json file cannot be applied using the documented curl commands.
Elasticsearch returns an error unless I manually remove the top-level "mappings" keyword and keep only the "properties" section.

3. Indexing not working: following the documentation, I created and tagged a collection:

imkdir indexme
imeta add -C indexme irods::indexing::index metadata::metadata elasticsearch

Although the queue seems to process tasks (iqstat seems to work), Elasticsearch returns zero hits for every search query.
I am not able to get it to work.

This happens with both Elasticsearch 7.17.24 and Elasticsearch 8.12.2 versions.

Could you please confirm which Elasticsearch versions are officially supported to work with iRODS 5.0.1?

4. Logs: can you suggest how to investigate the issue, through logs or otherwise? Elasticsearch itself works, and creating an index via curl succeeds.

Thanks in advance,

Best regards,
Laura

Alan King

Nov 25, 2025, 2:14:46 PM
to irod...@googlegroups.com
Hi Laura,

Apologies about the documentation there. It looks like most if not all of the things on that page are out of date, so we will need to update it significantly. In the meantime, the advanced training slides should have more up-to-date instructions for using the Indexing plugins: https://slides.com/irods/ugm-2025-indexing

In addition to the Advanced Training slides, the README on the GitHub repo for the Indexing Capability should be a good resource as well: https://github.com/irods/irods_capability_indexing/tree/5.0.1

Take a look at those and see if you have a better time.

I will also try to address each concern specifically so that you can get past these hurdles:

> Could you please clarify whether the document-type plugin should or should not be used with iRODS 5.0.1?

You are correct that it should not be used. The document-type plugin should not be used with any version of the Indexing plugin for iRODS 5.0.1 (or with any version of the plugin after 4.3.1.0). We need to update that documentation page to reflect this. We have an issue to update that page specifically regarding the document-type plugin here: https://github.com/irods/irods_docs/issues/343

(For reference, here is the commit where the document-type mentions in the Indexing Capability's GitHub README document were removed: https://github.com/irods/irods_capability_indexing/commit/fbe8c662c045b3742f23536361490b7d35bbba78)

> 2. Mapping file: in the irods_capability_indexing GitHub repository, the es_mapping.json file cannot be applied using the documented curl commands.
> Elasticsearch returns an error unless I manually remove the top-level "mappings" keyword and keep only the "properties" section.

We use a similar JSON structure in our Advanced Training and it seems to work with 8.12.1. See: https://slides.com/irods/ugm-2025-indexing#/11. We might need to dig into this one a little bit more to find the cause of the failure.
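
One possible cause, offered as an assumption rather than a confirmed diagnosis: Elasticsearch's create-index endpoint (PUT /<index>) expects the mapping wrapped in a top-level "mappings" key, while the update-mapping endpoint (PUT /<index>/_mapping) expects the bare "properties" object. Sending one shape to the other endpoint would produce an error like the one described. A minimal Python sketch of the two body shapes follows; the helper names and the sample field are hypothetical and are not taken from es_mapping.json.

```python
import json


def body_for_create_index(mapping_doc: dict) -> dict:
    """Body for PUT /<index>: the mapping nested under "mappings"."""
    if "mappings" in mapping_doc:
        return mapping_doc
    return {"mappings": mapping_doc}


def body_for_put_mapping(mapping_doc: dict) -> dict:
    """Body for PUT /<index>/_mapping: the bare mapping, no "mappings" wrapper."""
    return mapping_doc.get("mappings", mapping_doc)


# Hypothetical mapping document; the field name is illustrative only.
sample = {"mappings": {"properties": {"example_field": {"type": "keyword"}}}}

print(json.dumps(body_for_create_index(sample)))
print(json.dumps(body_for_put_mapping(sample)))
```

Checking which of the two endpoints the curl command targets should show which shape the file needs.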

> Could you please confirm which Elasticsearch versions are officially supported to work with iRODS 5.0.1?

We don't have an official/documented/tested compatibility matrix with Elasticsearch versions, but the Indexing and Elasticsearch plugins for version 5.0.1 were tested against Elasticsearch 8.12.2 (see https://github.com/irods/irods_capability_indexing/tree/5.0.1?tab=readme-ov-file#plugin-testing). The version of Elasticsearch we used in the most recent Advanced Training for Indexing was 8.12.1: https://slides.com/irods/ugm-2025-indexing#/9

> 4. Logs: to understand the problem, can you suggest how to investigate the issue with logs or something else? Elastic and inserting an index via curl work correctly.

Any logs which the Indexing Capability's rule engine plugins emit should appear in the iRODS server log alongside other messages. It looks like it is using the "legacy" log category, unfortunately, but if it has any logs to emit whenever these errors occur, they should appear in the iRODS server log. If you're using rsyslog with the rsyslog configuration shown here (https://docs.irods.org/5.0.1/system_overview/server_log/#rsyslog-configuration), you should be able to find the iRODS server log in /var/log/irods/irods.log. Finding/Reading the Elasticsearch logs will depend on how Elasticsearch has been configured and deployed.
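
As a rough aid for the log search described above, the following Python sketch filters server log lines by log category. It assumes each line of the log file is a single JSON object carrying "log_category" and "log_message" keys; the function name and the sample lines are illustrative, not real server output.

```python
import json
from typing import Iterable, Iterator


def messages_in_category(lines: Iterable[str], category: str = "legacy") -> Iterator[dict]:
    """Yield parsed log entries whose "log_category" equals `category`."""
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if isinstance(entry, dict) and entry.get("log_category") == category:
            yield entry


# Illustrative input; in practice, pass an open file such as
# open("/var/log/irods/irods.log") if that is where the log is written.
sample_lines = [
    '{"log_category": "legacy", "log_level": "info", "log_message": "example"}',
    '{"log_category": "server", "log_level": "debug", "log_message": "other"}',
    "not json",
]

for entry in messages_in_category(sample_lines):
    print(entry["log_level"], entry["log_message"])
```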

Hopefully that helps. We will make sure that the Indexing Capability documentation is updated in the next release. Thanks for pointing that out.

Alan


--
Alan King
Senior Software Developer | iRODS Consortium

Laura Lo Gerfo

Dec 2, 2025, 3:08:30 PM
to irod...@googlegroups.com

Hi Alan,

Thank you for your instructions. I followed the slides you provided (https://slides.com/irods/ugm-2025-indexing), but I am still experiencing the same issues with the Elasticsearch indexing capability.

My setup:

- iRODS 5.0.1 running in a Docker container (Ubuntu 24.04) with the following plugins installed: irods-rule-engine-plugin-indexing, irods-rule-engine-plugin-elasticsearch
- Elasticsearch 8.12.2 running in a separate Docker container (I tried with 8.12.1 too)
- A CLI container authenticated as the rods user, from which we execute curl calls to initialize indices (the Elasticsearch indices full_text_index and metadata_index have been created successfully with the proper mappings). I can now use the JSON mapping file as specified in the slides.

Both plugins are configured in server_config.json:

{
    "instance_name": "irods_rule_engine_plugin-indexing-instance",
    "plugin_name": "irods_rule_engine_plugin-indexing",
    "plugin_specific_configuration": {
        "job_limit_per_collection_indexing_operation": 1000,
        "maximum_delay_time": 30,
        "minimum_delay_time": 1
    }
},
{
    "instance_name": "irods_rule_engine_plugin-elasticsearch-instance",
    "plugin_name": "irods_rule_engine_plugin-elasticsearch",
    "plugin_specific_configuration": {
        "bulk_count": 100,
        "hosts": [
            "http://elasticsearch:9200"
        ],
        "read_size": 4194304
    }
},

After the Docker containers are up, I used the CLI to perform the following:

imkdir indexed_collection
imeta set -C indexed_collection irods::indexing::index full_text_index::full_text elasticsearch
iput -r ./books indexed_collection/books0

The AVU metadata is correctly applied to the collection, and iqstat shows that 104 delayed jobs have been queued. However, when querying Elasticsearch, the hits array remains empty: no documents appear to have been indexed.
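
One way to confirm an empty result like this is to read the total hit count from the JSON body of an Elasticsearch _search response. The Python sketch below assumes the response has already been parsed into a dict; the function name and the sample response are illustrative.

```python
def total_hits(search_response: dict) -> int:
    """Return the total hit count from an Elasticsearch _search response."""
    total = search_response.get("hits", {}).get("total", 0)
    if isinstance(total, dict):  # Elasticsearch 7+ reports {"value": N, "relation": ...}
        return int(total.get("value", 0))
    return int(total)  # a plain integer, as older or compatibility responses return


# Illustrative response shaped like an empty search result.
empty = {"hits": {"total": {"value": 0, "relation": "eq"}, "hits": []}}
print(total_hits(empty))
```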

I don't see any errors in the legacy-category logs.

Is there a way to manually trigger the processing of the delayed queue? It seems the jobs remain pending and the documents are not being indexed even after waiting.

Best regards,

Laura

Alan King

Dec 2, 2025, 4:14:05 PM
to irod...@googlegroups.com
Hi Laura,

> It seems the jobs remain pending and the documents are not being indexed even after waiting.

This makes me wonder whether the delay server is running and properly executing rules. Can you confirm that irodsDelayServer appears in the list of processes alongside the main irodsServer process? Here's what mine looks like (iRODS 5.0.90, Ubuntu 24.04):
$ ps aux | grep "irods.*Server"
irods        742  0.3  0.0  85656 37820 ?        S    21:02   0:00 /usr/sbin/irodsServer -d
irods        757  0.0  0.0 167076 35072 ?        Sl   21:02   0:00 /usr/sbin/irodsDelayServer irods_hostname_cache_742_1764709366 irods_dns_cache_742_1764709366
irods       1114  0.0  0.0   4092  2048 pts/0    S+   21:06   0:00 grep irods.*Server
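
The process check above can also be scripted. The Python sketch below parses `ps aux`-style output and reports which iRODS server binaries appear; it assumes the standard 11-column `ps aux` layout, and the function name is hypothetical. The sample text is the output shown above.

```python
def running_irods_servers(ps_output: str) -> set:
    """Return the iRODS server binaries named in `ps aux`-style output."""
    found = set()
    for line in ps_output.splitlines():
        fields = line.split(None, 10)  # the 11th column is the full command line
        if len(fields) < 11:
            continue
        command = fields[10].split()[0]      # the executable, without arguments
        name = command.rsplit("/", 1)[-1]    # strip the directory part
        if name in ("irodsServer", "irodsDelayServer"):
            found.add(name)
    return found


sample_ps = """\
irods        742  0.3  0.0  85656 37820 ?        S    21:02   0:00 /usr/sbin/irodsServer -d
irods        757  0.0  0.0 167076 35072 ?        Sl   21:02   0:00 /usr/sbin/irodsDelayServer irods_hostname_cache_742_1764709366 irods_dns_cache_742_1764709366
irods       1114  0.0  0.0   4092  2048 pts/0    S+   21:06   0:00 grep irods.*Server
"""

servers = running_irods_servers(sample_ps)
print("delay server running:", "irodsDelayServer" in servers)
```

Matching on the executable name rather than the whole line keeps the grep process itself out of the result.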


If that looks good, please try manually running a delay rule and see if that machinery is working. I've included a minimal example at the end of this message (so as not to clog it up). Once we've confirmed that delay rules in general are working, we can assess what might be going on with indexing.

It may be helpful to share a sample of one of the indexing jobs to see what parameters are being used, just in case something has gone awry. Please be sure to exclude any sensitive information, if applicable.

Alan

---

Here's a minimal example to try:
$ irule -r irods_rule_engine_plugin-irods_rule_language-instance 'delay("<PLUSET>1s</PLUSET><INST_NAME>irods_rule_engine_plugin-irods_rule_language-instance</INST_NAME>") { writeLine("serverLog", "Delayed Execution"); }' null ruleExecOut
$ iqstat
id     name
10015  writeLine("serverLog", "Delayed Execution");  


After a little while, the rule no longer appeared in the iqstat output and this message appeared in my server log:
{
  "log_category": "legacy",
  "log_level": "info",
  "log_message": "writeLine: inString = Delayed Execution\n",
  "request_api_name": "EXEC_RULE_EXPRESSION_AN",
  "request_api_number": 1206,
  "request_api_version": "d",
  "request_client_user": "rods",
  "request_host": "172.18.0.3",
  "request_proxy_user": "rods",
  "request_release_version": "rods5.0.90",
  "server_host": "68e38719da00",
  "server_pid": 859,
  "server_timestamp": "2025-12-02T21:03:53.267Z",
  "server_type": "agent",
  "server_zone": "tempZone"
}

Laura Lo Gerfo

Dec 3, 2025, 10:20:23 AM
to irod...@googlegroups.com

Hi Alan,

ps aux | grep "irods.*Server" shows only the irodsServer process, but no irodsDelayServer process.

 

Delayed rules remain indefinitely in the queue and are never executed:


$ iqstat
id     name
10015  writeLine("serverLog", "Delayed Execution");  

 

Therefore, no delay server logs appear in the log files.

 

For additional context, server_config.json (inside the iRODS provider Docker container) contains the following delay server settings:

"delay_rule_executors": [],
"delay_server_sleep_time_in_seconds": 30,
"maximum_size_of_delay_queue_in_bytes": 0,
"migrate_delay_server_sleep_time_in_seconds": 5,
"number_of_concurrent_delay_rule_executors": 4,



This suggests irodsServer isn't launching irodsDelayServer as a child process automatically, which should happen during normal startup for iRODS 5.0.1, correct?
We use an unattended installation to install iRODS in our container.

Alan King

Dec 3, 2025, 11:20:42 AM
to irod...@googlegroups.com
Okay, this is good - we've identified the main problem, I think.

> This suggests irodsServer isn't launching irodsDelayServer as a child process automatically, which should happen during normal startup for iRODS 5.0.1, correct?

That's correct - the delay server is supposed to launch automatically. We need to make sure that the delay server "leader" hostname can resolve to the hostname that the iRODS server thinks it is using.

Please run iadmin get_delay_server_info and compare it to the value of the "host" key in the server_config.json. Here's what mine looks like:
$ iadmin get_delay_server_info
{
    "leader": "8f8942654d85",
    "successor": ""
}
$ cat /etc/irods/server_config.json | grep '"host":' --context 1
    "graceful_shutdown_timeout_in_seconds": 30,
    "host": "8f8942654d85",
    "host_access_control": {
--
        "database": {
            "host": "49ddb4d63f02",
            "name": "ICAT",


(The database "host" is not relevant; the values to compare are "leader" and the top-level "host". I included a line of context in the grep output for clarity.)

If these do not match, please modify one or the other so that they match. You may also consider implementing a host_resolution stanza in the server_config.json to map hostnames, as described here: https://docs.irods.org/5.0.1/system_overview/configuration/#host-resolution

Once done, restart the iRODS server to see whether the delay server spawns.
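
The comparison described above can be sketched in Python. This assumes the output of iadmin get_delay_server_info and the contents of server_config.json are available as JSON strings; the function name is hypothetical, and the sample configuration is trimmed to the one key the check reads.

```python
import json


def delay_leader_matches_host(delay_info_json: str, server_config_json: str) -> bool:
    """True when the delay server "leader" equals the top-level "host"."""
    leader = json.loads(delay_info_json).get("leader", "")
    host = json.loads(server_config_json).get("host", "")
    return bool(leader) and leader == host


# Sample values taken from the output shown in this message.
delay_info = '{"leader": "8f8942654d85", "successor": ""}'
server_config = '{"host": "8f8942654d85"}'  # trimmed to the one key used here

print(delay_leader_matches_host(delay_info, server_config))
```

A match here only rules out one cause; it does not show that the name resolves correctly inside the container.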

Once this is resolved, we need to make sure that the unattended install JSON input accounts for the ever-shifting hostnames in containers when setting up the iRODS server. You might consider doing something like we have for the irods_demo project: https://github.com/irods/irods_demo/blob/ef0ea55fd7e0052ff0e9cc37b10710c01c51cac8/irods_catalog_provider/unattended_install.json#L84-L91

There, "THE_HOSTNAME" is replaced later as part of starting the container the first time. There are of course other approaches, but the basic idea remains the same.
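
The placeholder idea can be illustrated with a short Python sketch. This is not how irods_demo performs the substitution; it only shows the general approach of filling in the container's hostname at first start. The function name and the template fragment are hypothetical.

```python
import socket


def fill_hostname(template_text: str, hostname: str, placeholder: str = "THE_HOSTNAME") -> str:
    """Replace every occurrence of the placeholder with the given hostname."""
    return template_text.replace(placeholder, hostname)


# Hypothetical template fragment; a real unattended-install file has many more keys.
template = '{"host": "THE_HOSTNAME"}'
print(fill_hostname(template, socket.gethostname()))
```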

Alan

Laura Lo Gerfo

Dec 3, 2025, 3:26:21 PM
to irod...@googlegroups.com

Hi Alan,

The hostnames do match in our environment:

$ cat /etc/irods/server_config.json | grep '"host":' --context 1
    "graceful_shutdown_timeout_in_seconds": 30,
    "host": "irodscp-dev1.iit.local",
    "host_access_control": {
--
        "database": {
            "host": "irods-catalog",
            "name": "ICAT",

and:

$ iadmin get_delay_server_info
{
    "leader": "irodscp-dev1.iit.local",
    "successor": ""
}

Are there any other configuration aspects or logs we should examine to troubleshoot irodsDelayServer?

Laura

Kory Draughn

Dec 3, 2025, 3:50:43 PM
to irod...@googlegroups.com
Hi Laura,

Try increasing the log level for the main server process to see what it is checking for in regard to the delay server.

To do that, set the "server" entry under "log_level" in server_config.json like so:

  "log_level": {
      ...
      "server": "debug",
      ...
  },


You can also try bumping the "delay_server" log level to see if anything appears in the log for it.
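
The log-level change described above can be sketched in Python, assuming the configuration has been loaded into a dict (for example with json.load). The function name is hypothetical and the sample configuration is trimmed to the keys discussed here.

```python
import copy


def with_log_levels(config: dict, **levels: str) -> dict:
    """Return a copy of `config` with the given "log_level" entries set."""
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    updated.setdefault("log_level", {}).update(levels)
    return updated


# Trimmed, illustrative configuration; a real server_config.json has many more keys.
config = {"log_level": {"server": "info", "delay_server": "info"}}
debug_config = with_log_levels(config, server="debug", delay_server="debug")
print(debug_config["log_level"])
```

Working on a copy makes it easy to compare the two dicts or to keep the original levels for restoring later.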

Next, reload configuration by executing the following as the service account:

  kill -HUP $(cat /var/run/irods/irods-server.pid)

That should result in the server logging messages like the following (the output below is filtered by jq so it's easier to read):

  "apply_access_time_updates: Number of access time updates before deduplication is [0]."
  "apply_access_time_updates: Checking if Agent Factory is ready to accept client requests before processing access time updates."
  "is_server_listening_for_connections: Connecting to (host, port) = ([<host>], [<port>])"
  "is_server_listening_for_connections: Connected to server. Sending HEARTBEAT message."
  "is_server_listening_for_connections: Reading response from server."
  "is_server_listening_for_connections: Received [HEARTBEAT] from server."

The goal of this is to see which host the main server process is looking for.

Can you post your server_config.json? Please mask sensitive info before posting.

Kory Draughn
Chief Technologist
iRODS Consortium

