Dear iRODS Consortium team,
I am writing to request assistance regarding several issues we are encountering with the iRODS Indexing capability in our environment. We are currently using iRODS 5.0.1 in a Dockerized setup and we are following the instructions provided in the official documentation at https://docs.irods.org/5.0.1/capabilities/indexing/.
1. Document-type rule engine plugin: the 5.0.1 documentation still lists the following rule engine plugin in the server configuration:
{"instance_name": "irods_rule_engine_plugin-document_type-instance", "plugin_name": "irods_rule_engine_plugin-document_type", "plugin_specific_configuration": {}}
However, the March 2024 development update (https://irods.org/2024/03/irods-development-update-march-2024/) states:
"the document-type rule engine plugin is no longer provided by the Indexing capability plugin and as a result, you'll need to remove the document-type rule engine plugin from your server_config.json."
This suggests that the plugin has been deprecated or removed, but the 5.0.1 documentation still includes it.
Could you please clarify whether the document-type plugin should or should not be used with iRODS 5.0.1?
2. Mapping file: in the irods_capability_indexing GitHub repository, the es_mapping.json file cannot be applied using the documented curl commands.
Elasticsearch returns an error unless I manually remove the top-level "mappings" keyword and keep only the "properties" section.
3. Indexing not working: following the documentation, I created and tagged a collection:
imkdir indexme
Although the queue seems to process tasks (iqstat appears to work), Elasticsearch returns zero hits for every search query.
I am not able to get it to work.
This happens with both Elasticsearch 7.17.24 and Elasticsearch 8.12.2.
Could you please confirm which Elasticsearch versions are officially supported to work with iRODS 5.0.1?
4. Logs: could you suggest how to investigate the issue, through logs or otherwise? Elasticsearch itself works correctly, and creating an index via curl succeeds.
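A possible explanation for item 2, assuming the documented curl command sends es_mapping.json to the put-mapping endpoint (PUT /<index>/_mapping) rather than to index creation (PUT /<index>): index creation accepts a body wrapped in a top-level "mappings" key, while the put-mapping API expects "properties" at the top level, which matches the workaround described above. The helper below is a hypothetical sketch, and the field name absolutePath in the example is illustrative, not taken from the real mapping file.

```python
import json


def mapping_body_for_put_mapping(mapping_doc: dict) -> dict:
    """Return a body suitable for PUT /<index>/_mapping.

    Index creation (PUT /<index>) accepts {"mappings": {"properties": ...}},
    but the put-mapping API expects "properties" at the top level, so a
    top-level "mappings" wrapper is stripped if present.
    """
    if "mappings" in mapping_doc:
        return mapping_doc["mappings"]
    return mapping_doc


if __name__ == "__main__":
    # Hypothetical minimal mapping; not the contents of es_mapping.json.
    doc = {"mappings": {"properties": {"absolutePath": {"type": "keyword"}}}}
    print(json.dumps(mapping_body_for_put_mapping(doc)))
```

Alternatively, the unmodified file can be sent when creating the index with PUT /<index>, since that endpoint accepts the "mappings" wrapper.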
Thanks in advance,
Best regards,
Laura
--
The Integrated Rule-Oriented Data System (iRODS) - https://irods.org
iROD-Chat: http://groups.google.com/group/iROD-Chat
---
You received this message because you are subscribed to the Google Groups "iRODS-Chat" group.
To unsubscribe from this group and stop receiving emails from it, send an email to irod-chat+...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/irod-chat/6114b009-cd6a-40c7-85a8-adcadd44a6e9n%40googlegroups.com.
Hi Alan,
Thank you for your instructions. I followed the slides you provided (https://slides.com/irods/ugm-2025-indexing), but I am still experiencing the same issues with the Elasticsearch indexing capability.
My setup:
- iRODS 5.0.1 running in a Docker container (Ubuntu 24.04) with the following plugins installed: irods-rule-engine-plugin-indexing and irods-rule-engine-plugin-elasticsearch
- Elasticsearch 8.12.2 running in a separate Docker container (I also tried 8.12.1)
- A CLI container authenticated as the rods user, from which we execute curl calls to initialize the indices (the Elasticsearch indices full_text_index and metadata_index were created successfully with the proper mappings). I can now use the JSON mapping file as specified in the slides.
Both plugins are configured in server_config.json:
{
"instance_name": "irods_rule_engine_plugin-indexing-instance",
"plugin_name": "irods_rule_engine_plugin-indexing",
"plugin_specific_configuration": {
"job_limit_per_collection_indexing_operation": 1000,
"maximum_delay_time": 30,
"minimum_delay_time": 1
}
},
{
"instance_name": "irods_rule_engine_plugin-elasticsearch-instance",
"plugin_name": "irods_rule_engine_plugin-elasticsearch",
"plugin_specific_configuration": {
"bulk_count": 100,
"hosts": [
],
"read_size": 4194304
}
},
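Since the March 2024 update says the document-type plugin should be removed, a quick check of the rule_engines list can confirm which indexing-related plugins are actually configured. This is a hypothetical sketch that assumes the plugins are listed under plugin_configuration.rule_engines in server_config.json. The hosts check is included because the excerpt above shows an empty "hosts" list; if that is also true of the real file, the Elasticsearch plugin presumably has no endpoint to send documents to.

```python
# Plugin names as they appear in the server_config.json excerpt above.
REQUIRED = {
    "irods_rule_engine_plugin-indexing",
    "irods_rule_engine_plugin-elasticsearch",
}
# Per the March 2024 development update, this plugin should be removed.
OBSOLETE = {"irods_rule_engine_plugin-document_type"}


def check_rule_engines(server_config: dict) -> list:
    """Return human-readable warnings about indexing-related plugins.

    Assumes the layout {"plugin_configuration": {"rule_engines": [...]}}.
    """
    engines = server_config.get("plugin_configuration", {}).get("rule_engines", [])
    names = {e.get("plugin_name") for e in engines}
    warnings = [f"missing plugin: {n}" for n in sorted(REQUIRED - names)]
    warnings += [f"obsolete plugin present: {n}" for n in sorted(OBSOLETE & names)]
    for e in engines:
        if e.get("plugin_name") == "irods_rule_engine_plugin-elasticsearch":
            psc = e.get("plugin_specific_configuration") or {}
            if not psc.get("hosts"):
                warnings.append("elasticsearch plugin has an empty 'hosts' list")
    return warnings
```

For example: check_rule_engines(json.load(open("/etc/irods/server_config.json"))) returns an empty list when both plugins are present, the document-type plugin is absent, and "hosts" is non-empty.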
After the Docker containers are up, I used the CLI to perform the following:
- imkdir indexed_collection
- imeta set -C indexed_collection irods::indexing::index full_text_index::full_text elasticsearch
- iput -r ./books indexed_collection/books0
The AVU metadata is correctly applied to the collection, and iqstat shows that 104 delayed jobs have been queued. However, when I query Elasticsearch, the hits array remains empty: no documents appear to have been indexed.
I don't see any errors in the logs for the legacy category.
Is there a way to manually trigger the processing of the delayed queue? It seems the jobs remain pending and the documents are not being indexed even after waiting.
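When checking whether anything has reached an index, the hits.total field of a _search response is a more direct signal than the hits array. A minimal sketch, assuming Elasticsearch 7 or later, where hits.total is an object; with 104 jobs still waiting in the delay queue, a total of 0 is presumably expected until those jobs actually run. The function name is hypothetical.

```python
def total_hits(search_response: dict) -> int:
    """Return the number of matching documents in an Elasticsearch _search response.

    Elasticsearch 7+ reports hits.total as {"value": N, "relation": "eq"};
    a "gte" relation means N is only a lower bound. A bare integer (older
    versions, or rest_total_hits_as_int=true) is also accepted.
    """
    total = search_response["hits"]["total"]
    if isinstance(total, dict):
        return int(total["value"])
    return int(total)
```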
Best regards,
Laura
Hi Alan,
ps aux | grep "irods.*Server" shows only the irodsServer process, but no irodsDelayServer process.
Delayed rules remain in the queue indefinitely and are never executed:
$ iqstat
id name
10015 writeLine("serverLog", "Delayed Execution");
Therefore, no delay server logs appear in the log files.
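The ps check above can be scripted. A minimal sketch that looks for an irodsDelayServer line in ps aux output while skipping the grep process itself; the function name is hypothetical.

```python
def delay_server_running(ps_output: str) -> bool:
    """Return True if any line of `ps aux` output mentions irodsDelayServer.

    Lines belonging to a grep command are skipped so that
    `ps aux | grep irodsDelayServer` does not count as a match.
    """
    for line in ps_output.splitlines():
        if "irodsDelayServer" in line and "grep" not in line:
            return True
    return False
```

For example: delay_server_running(subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout).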
For additional context, server_config.json inside the iRODS provider Docker container contains the following delay-related settings:
"delay_rule_executors": [],
"delay_server_sleep_time_in_seconds": 30,
"maximum_size_of_delay_queue_in_bytes": 0,
"migrate_delay_server_sleep_time_in_seconds": 5,
"number_of_concurrent_delay_rule_executors": 4,
This suggests that irodsServer isn't launching irodsDelayServer as a child process, which should happen automatically during normal startup in iRODS 5.0.1, correct?
We use an unattended installation to install iRODS in our container.
Hi Alan,
The hostnames do match in our environment:
$ cat /etc/irods/server_config.json | grep '"host":' --context 1
"graceful_shutdown_timeout_in_seconds": 30,
"host": "irodscp-dev1.iit.local",
"host_access_control": {
--
"database": {
"host": "irods-catalog",
"name": "ICAT",
and:
$ iadmin get_delay_server_info
{
"leader": "irodscp-dev1.iit.local",
"successor": ""
}
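The comparison above can also be done programmatically. A minimal sketch, assuming the outputs of cat /etc/irods/server_config.json and iadmin get_delay_server_info are passed in as strings; parsing the JSON avoids confusing the top-level "host" with database.host, which the grep output also shows. The function name is hypothetical.

```python
import json


def leader_matches_host(server_config_text: str, delay_info_text: str) -> bool:
    """Compare server_config.json's top-level "host" with the delay server leader."""
    host = json.loads(server_config_text)["host"]
    leader = json.loads(delay_info_text)["leader"]
    return host == leader
```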
Are there any other configuration aspects or logs we should examine to troubleshoot irodsDelayServer?
Laura
Hi,
Apologies for the late response and thank you for your patience.
I increased the log level for the "server" category and reloaded the configuration by recreating the containers from scratch, which reruns the installation process inside the container.
Below are the log entries from the main server process:
"log_message":"apply_access_time_updates: Number of access time updates before deduplication is [0].","server_host":"irodscp-dev1.iit.local"
"log_message":"migrate_and_launch_delay_server: Checking if Agent Factory is ready to accept client requests before launching the Delay Server.","server_host":"irodscp-dev1.iit.local"
"log_message":"is_server_listening_for_connections: Connecting to (host, port) = ([irodscp-dev1.iit.local], [1247])","server_host":"irodscp-dev1.iit.local"
"log_message":"is_server_listening_for_connections: Connected to server. Sending HEARTBEAT message.","server_host":"irodscp-dev1.iit.local"
"log_message":"is_server_listening_for_connections: Reading response from server.","server_host":"irodscp-dev1.iit.local"
"log_message":"is_server_listening_for_connections: Received [HEARTBEAT] from server.","server_host":"irodscp-dev1.iit.local"
"log_message":"apply_access_time_updates: Checking if Agent Factory is ready to accept client requests before processing access time updates.","server_host":"irodscp-dev1.iit.local"
"log_message":"is_server_listening_for_connections: Connecting to (host, port) = ([irodscp-dev1.iit.local], [1247])","server_host":"irodscp-dev1.iit.local"
"log_message":"is_server_listening_for_connections: Connected to server. Sending HEARTBEAT message.","server_host":"irodscp-dev1.iit.local"
"log_message":"is_server_listening_for_connections: Reading response from server.","server_host":"irodscp-dev1.iit.local"
"log_message":"is_server_listening_for_connections: Received [HEARTBEAT] from server.","server_host":"irodscp-dev1.iit.local"
"log_message":"apply_access_time_updates: Number of access time updates before deduplication is [0].","server_host":"irodscp-dev1.iit.local"
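The excerpt shows migrate_and_launch_delay_server completing its heartbeat check, but no later lines from it. A minimal sketch for isolating delay-server lines from a larger log, assuming each record is a JSON object on a single line, optionally preceded by a syslog prefix (the records above appear to be fragments of such objects). The matching substrings are a heuristic, not an exhaustive list of delay-server messages, and the function name is hypothetical.

```python
import json


def delay_server_messages(log_lines):
    """Yield log_message values that mention the delay server.

    Anything before the first '{' on a line (e.g. a syslog prefix) is
    discarded; lines that do not parse as JSON are skipped.
    """
    for line in log_lines:
        start = line.find("{")
        if start == -1:
            continue
        try:
            record = json.loads(line[start:])
        except json.JSONDecodeError:
            continue
        msg = record.get("log_message", "")
        lowered = msg.lower()
        if "delay server" in lowered or "delay_server" in lowered:
            yield msg
```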
I’ve also attached the current server_config.json (with sensitive information masked) for reference.
Please let me know if you’d like me to enable additional logging or check any other components.
Dear Consortium team,
We believe we may have identified the root cause of the delay server not starting properly.
In our unattended installation, we initially relied on the THE_HOSTNAME placeholder. We later replaced it with the hardcoded container name irods-catalog-provider (similar to what is done in the iRODS demo repository irods/irods_demo).
When using the hardcoded value, the delay server starts successfully.
For clarity, the THE_HOSTNAME placeholder is used during the unattended installation in the following three locations:
1. In catalog_provider_hosts:
"catalog_provider_hosts": [
"THE_HOSTNAME"
]
2. In the top-level host field:
"host": "THE_HOSTNAME"
3. In host_resolution:
"host_resolution": {
"host_entries": [
{
"address_type": "local",
"addresses": ["irods-catalog-provider", "THE_HOSTNAME"]
}
]
}
Additionally, the hostname value substituted for THE_HOSTNAME is passed via docker-compose.yaml, where it is set as the container hostname:
irods-catalog-provider:
hostname: irodscp-dev1.iit.local
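To see what the first installation actually produced, the substitution can be reproduced on the three fields listed above. This is a hypothetical helper; how the installation script actually performs the substitution is not shown in this thread.

```python
def substitute_placeholder(value, placeholder, replacement):
    """Recursively replace exact string matches of `placeholder` in a JSON-like value."""
    if isinstance(value, dict):
        return {k: substitute_placeholder(v, placeholder, replacement) for k, v in value.items()}
    if isinstance(value, list):
        return [substitute_placeholder(v, placeholder, replacement) for v in value]
    if value == placeholder:
        return replacement
    return value
```

With THE_HOSTNAME set to irodscp-dev1.iit.local, "host" and catalog_provider_hosts become irodscp-dev1.iit.local, while the first entry of "addresses" remains irods-catalog-provider.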
In the new installation (which is working), the unattended installation is configured as follows:
1. In catalog_provider_hosts:
"catalog_provider_hosts": [
"irods-catalog-provider"
]
2. In the top-level host field:
"host": "irods-catalog-provider"
3. In host_resolution:
"host_resolution": {
"host_entries": [
{
"address_type": "local",
"addresses": ["irods-catalog-provider", "THE_HOSTNAME"]
}
]
}
The docker-compose.yaml configuration remains the same:
irods-catalog-provider:
hostname: irodscp-dev1.iit.local
The resulting server_config.json is attached.
What puzzles us is that when using the THE_HOSTNAME placeholder, the generated server_config.json appears to be correct (we performed the verification Alan suggested), yet the delay server fails to start. Only when the container name is hardcoded does the service start successfully.
Do you have any suggestions on what might be causing this discrepancy, or what we should investigate further?
Thank you very much for your assistance.
Best regards,
Laura
Hi Terrel,
Thanks for the explanation. It helped clarify a point I hadn't fully understood, specifically the meaning and role of the addresses in the host_resolution stanza and how the first entry actually acts as a reference name.
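The role of the first address can be illustrated with a simplified model: any name listed in a host entry maps to that entry's first address. This is only a sketch of the behavior as described in this thread, not the iRODS implementation, and the function name is hypothetical.

```python
def reference_name(hostname, host_entries):
    """Map a hostname to the first address of the host entry that lists it.

    Simplified model: entries are checked in order, and names that appear
    in no entry are returned unchanged.
    """
    for entry in host_entries:
        addresses = entry.get("addresses", [])
        if hostname in addresses:
            return addresses[0]
    return hostname
```

Under this model, the first installation's "host" (irodscp-dev1.iit.local) maps to irods-catalog-provider, whereas in the working installation "host" already is the reference name. This only makes the difference between the two configurations visible; it does not by itself show why the delay server did not start.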
So, based on what you've described, the behavior we are observing is consistent with the intended design: we don't appear to be hitting a bug, and everything is working as expected.
Thanks for the clarification.