irods S3 plugin ARCHIVE_NAMING_POLICY configuration differs per irods server


Francisco Morales

Nov 6, 2023, 10:24:25 AM
to iRODS-Chat
Hi, 

I have two iRODS servers configured, one as provider and one as consumer. The S3 plugin is configured on both servers, and the S3 credentials are available to both.

The S3 resource is configured with ARCHIVE_NAMING_POLICY=decoupled; however, the provider server still behaves as if ARCHIVE_NAMING_POLICY=consistent were set. This can be seen by checking the S3 path of the data object: its DATA_PATH in the S3 bucket still follows the logical path of the file. This seems to be a bug in the S3 plugin.

I've managed to reproduce this issue as follows:
1. Install the iRODS provider and consumer on different servers
2. Install the S3 plugin on both servers
3. Do an iput -R S3Resource from each server
4. Check the DATA_PATH of the uploaded files with ils -L

The environment where I've tested:
OS: CentOS 7.9
root# cat /etc/*-release
CentOS Linux release 7.9.2009 (Core)
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

CentOS Linux release 7.9.2009 (Core)
CentOS Linux release 7.9.2009 (Core)

root# rpm -qa | grep -E "irods|postgres"
irods-externals-clang-runtime6.0-0-1.0-1.x86_64
postgresql15-server-15.4-1PGDG.rhel7.x86_64
irods-rule-engine-plugin-python-4.2.12.0-1.x86_64
irods-externals-boost1.67.0-0-1.0-1.x86_64
irods-runtime-4.2.12-1.x86_64
irods-sudo-microservices-4.2.12_1.0.0-1.x86_64
irods-externals-fmt6.1.2-1-1.0-1.x86_64
irods-externals-nanodbc2.13.0-1-1.0-1.x86_64
postgresql15-libs-15.4-1PGDG.rhel7.x86_64
postgresql11-libs-11.21-1PGDG.rhel7.x86_64
postgresql-odbc-09.03.0100-2.el7.x86_64
irods-resource-plugin-s3-4.2.12.0-1.x86_64
irods-externals-libarchive3.3.2-1-1.0-1.x86_64
postgresql15-contrib-15.4-1PGDG.rhel7.x86_64
irods-server-4.2.12-1.x86_64
irods-externals-libs3c0e278d2-0-1.0-1.x86_64
irods-externals-zeromq4-14.1.6-0-1.0-1.x86_64
irods-icommands-4.2.12-1.x86_64
postgresql11-odbc-16.00.0000-1PGDG.rhel7.x86_64
irods-uu-microservices-4.2.12_0.8.2-2.x86_64
irods-externals-avro1.9.0-0-1.0-1.x86_64
postgresql15-15.4-1PGDG.rhel7.x86_64
irods-database-plugin-postgres-4.2.12-1.x86_64

Thanks for your help

James, Justin Kyle

Nov 6, 2023, 10:54:01 AM
to iRODS-Chat
I'm confused by this.  The DATA_PATH should always be the <bucket_name>/<object_key> in either attached or detached mode.

The difference between attached and detached modes is that, in detached mode, requests are not re-routed to the server tagged for the resource. However, for that to work, the S3 backend must be reachable from every server that clients connect to.

From: irod...@googlegroups.com <irod...@googlegroups.com> on behalf of Francisco Morales <nido...@gmail.com>
Sent: Monday, November 6, 2023 10:24 AM
To: iRODS-Chat <irod...@googlegroups.com>
Subject: [iROD-Chat:21497] irods S3 plugin ARCHIVE_NAMING_POLICY configuration differs per irods server

Francisco Morales

Nov 7, 2023, 2:54:41 AM
to iRODS-Chat

Hi jjames, 

Are you maybe referring to the HOST_MODE setting, which can be set to e.g. cacheless_attached? For a file myfile.txt, the ARCHIVE_NAMING_POLICY setting has the following effect:
1. When set to ARCHIVE_NAMING_POLICY=consistent:
 COLL_NAME/DATA_NAME =  /zone/path/to/collection/myfile.txt
 DATA_PATH (S3) = /bucket_name/path/to/collection/myfile.txt

2. When set to ARCHIVE_NAMING_POLICY=decoupled:
 COLL_NAME/DATA_NAME =  /zone/path/to/collection/myfile.txt
 DATA_PATH (S3) = /bucket_name/<integer_id>/myfile.txt

The issue that I'm describing is that the irods provider server keeps using the consistent naming policy regardless of the actual configuration in the context parameter of the S3 resource.
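To make the two layouts easier to tell apart when checking many files, here is a minimal Python sketch that classifies a DATA_PATH based only on the two patterns listed above. The function names and the object id `10042` are hypothetical, and the real plugin may build keys differently, so treat this as an illustration rather than the plugin's actual logic.

```python
import re

# Decoupled layout as described above: /<bucket>/<integer_id>/<file name>
_DECOUPLED_RE = re.compile(r"^/(?P<bucket>[^/]+)/(?P<object_id>\d+)/(?P<name>[^/]+)$")


def consistent_path(bucket, zone, logical_path):
    """Return the DATA_PATH expected under the consistent policy.

    Assumption: the zone component of the logical path is swapped for the
    bucket name, as in the example above.
    """
    prefix = "/" + zone + "/"
    if not logical_path.startswith(prefix):
        raise ValueError("logical path is not inside zone " + zone)
    return "/" + bucket + "/" + logical_path[len(prefix):]


def classify(bucket, zone, logical_path, data_path):
    """Guess which naming policy produced data_path."""
    # Check the consistent layout first: a collection whose name is all
    # digits would otherwise also match the decoupled pattern.
    if data_path == consistent_path(bucket, zone, logical_path):
        return "consistent"
    match = _DECOUPLED_RE.match(data_path)
    file_name = logical_path.rsplit("/", 1)[-1]
    if match and match.group("bucket") == bucket and match.group("name") == file_name:
        return "decoupled"
    return "unknown"


logical = "/zone/path/to/collection/myfile.txt"
print(classify("bucket_name", "zone", logical, "/bucket_name/path/to/collection/myfile.txt"))
print(classify("bucket_name", "zone", logical, "/bucket_name/10042/myfile.txt"))  # 10042 is a made-up id
```

Running `classify` on the DATA_PATH reported by `ils -L` for a file uploaded through each server should show which server ignores the configured policy.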

Best regards, 
Francisco

Francisco Morales

Nov 8, 2023, 10:43:06 AM
to iRODS-Chat
Hi, 

As an additional detail, I have tested changing the configuration of the S3 resource. Initially, the host was the server acting as the consumer; then I changed it to the provider. Now, a file uploaded to the S3 resource via the provider follows the decoupled naming scheme as expected, but a file uploaded via the consumer follows the consistent naming scheme. Thus, the naming policy seems to be followed correctly by the server that is configured as the host of the S3 resource, while the other server seems to ignore the decoupled policy and uses the consistent naming convention.

Has anyone else experienced this behaviour?

Best regards, 
Francisco

James, Justin Kyle

Nov 8, 2023, 10:48:47 AM
to iRODS-Chat
Oh, I misread your first email.  I confused decoupled with detached.  Apologies for that.  I will test this out and see if I can reproduce it.



James, Justin Kyle

Nov 8, 2023, 11:13:36 AM
to iRODS-Chat
I was able to reproduce this issue.  I'll write an issue on it and look into it.


James, Justin Kyle

Nov 8, 2023, 11:26:15 AM
to iRODS-Chat
Here is the issue.

When a client is connected to one server and the S3 resource is attached to another server and in decoupled mode, an iput does not honor decoupled mode. It instead uses the path in iRODS for the ke...



Francisco Morales

Nov 8, 2023, 11:31:40 AM
to irod...@googlegroups.com

Francisco Morales

Jan 17, 2024, 1:59:13 AM
to iRODS-Chat
Hi, 

I've seen that the GitHub issue tracking this bug has been closed, so thanks for fixing it! Is there an expected date when an RPM package will be available to install on a RedHat-like system?

Best, 
Francisco

Francisco Morales

Jan 17, 2024, 5:48:34 AM
to iRODS-Chat
Hi James, 

I've compiled the source code for the S3 plugin following the guidelines at https://github.com/irods/irods_resource_plugin_s3 to test the patch for ARCHIVE_NAMING_POLICY=decoupled, but I'm still seeing the same behaviour as before the patch. Here are the details of what I've done:

OS: CentOS 7
irods server: 4.2.12

  • Compiled the irods s3 plugin from source code, last commit d93db0b4c1e51e14bb65e931844ff7f9f599b7aa. 
  • Generated an rpm file, with dependency: irods-externals-libs3e8457e09-0-1.0-1.x86_64
  • Installed the generated rpm
  • Created an S3 iRODS resource:
```
resource name: surfObjStore1
id: 10018
zone: yoda
type: s3
location: yodafm02.yodadtap.src.surf-hosted.nl
vault: /yodadtap
free space:
free space time: : Never
status:
info:
comment:
create time: 01705486000: 2024-01-17.10:06:40
modify time: 01705486065: 2024-01-17.10:07:45
context: S3_DEFAULT_HOSTNAME=xxxxxxxxxxx;S3_AUTH_FILE=/etc/irods/s3.keypair;S3_REGIONNAME=NL;S3_RETRY_COUNT=1;S3_WAIT_TIME_SEC=3;S3_PROTO=HTTPS;S3_ENABLE_MD5=1;ARCHIVE_NAMING_POLICY=decoupled;HOST_MODE=cacheless_attached;S3_CACHE_DIR=/s3_cache.surfObjStore1
parent:
parent context: 
```

  • Uploaded a file from the provider -> the path of the file is still in consistent mode: /bucket/home/rods/test100_icat3.bin
  • Uploaded a file from the consumer -> the path of the file is in decoupled mode (expected): /bucket/22001/test100_resc2.bin
Are there any obvious step(s) I'm missing?
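
As a quick sanity check on the configuration itself, the context string shown in the resource listing can be split into its key/value pairs. This is only a sketch that assumes the `KEY=VALUE;KEY=VALUE` shape visible in the listing above; `parse_context` is a hypothetical helper, not part of iRODS or the S3 plugin.

```python
def parse_context(context):
    """Split a resource context string of the form KEY=VALUE;KEY=VALUE into a dict."""
    settings = {}
    for item in context.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing or doubled semicolon
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError("context entry without '=': " + item)
        settings[key] = value
    return settings


# Excerpt of the context string from the resource listing above
context = (
    "S3_PROTO=HTTPS;S3_ENABLE_MD5=1;"
    "ARCHIVE_NAMING_POLICY=decoupled;HOST_MODE=cacheless_attached"
)
settings = parse_context(context)
print(settings["ARCHIVE_NAMING_POLICY"], settings["HOST_MODE"])
```

Comparing the parsed values on the provider and on the consumer would at least confirm that both servers see the same resource settings.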

James, Justin Kyle

Jan 17, 2024, 10:09:34 AM
to iRODS-Chat
Did you install the new version on both the server you are connected to and the one being redirected to? The updated code needs to be on both, as there is some necessary processing that happens before the redirect.



Francisco Morales

Jan 17, 2024, 10:27:33 AM
to iRODS-Chat
Yes, I've installed the plugin on both the provider and the consumer servers. What other details could I provide to help check whether things are configured properly?

Best, 
Francisco

James, Justin Kyle

Jan 17, 2024, 10:45:05 AM
to iRODS-Chat
I just realized that when I tested this in 4.2.12, I had the resource in detached mode so the redirect didn't happen.  Unfortunately, the testing environment doesn't support automated testing for plugins in topology so this scenario has to be manually tested at the moment.

I will have to look into this.  Sorry about that.



James, Justin Kyle

Jan 18, 2024, 11:48:01 AM
to iRODS-Chat
Francisco,

I noticed on the 4-2-stable pull request that I had changed logical_path() to physical_path() in that one when cherry-picking the code from main (4.2.12).

I fixed that and the pull request has been merged.  However, I later noticed that this fix doesn't work for parallel transfers in either 4.2.12 or 4.3.1 so I'll have to keep looking into that.

The latest pull request will be merged shortly, if you want to try it with a file smaller than 32 MiB. You might just want to wait until everything is fixed. That's up to you.
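
For picking test files, a small Python sketch of the size check might help. It assumes the 32 MiB figure mentioned above is the size at which parallel transfer starts; the actual cutoff can depend on server and client configuration, and the names here are hypothetical.

```python
import os

# 32 MiB, the figure quoted above; treated here as an assumption about
# where parallel transfer starts, not as a guaranteed constant.
PARALLEL_THRESHOLD_BYTES = 32 * 1024 * 1024


def below_threshold(size_bytes, threshold=PARALLEL_THRESHOLD_BYTES):
    """Return True if a file of this size is smaller than the threshold."""
    return size_bytes < threshold


def file_below_threshold(path, threshold=PARALLEL_THRESHOLD_BYTES):
    """Same check, using the size of a file on disk."""
    return below_threshold(os.path.getsize(path), threshold)


print(below_threshold(10 * 1024 * 1024))   # a 10 MiB file
print(below_threshold(100 * 1024 * 1024))  # a 100 MiB file
```

Files for which the check returns True should be covered by the current fix; larger ones would still hit the parallel transfer case that is being looked into.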


Francisco Morales

Jan 18, 2024, 12:31:15 PM
to irod...@googlegroups.com
Hi James,

As we want to test this with large files, I'll wait. Thanks for the update.

Francisco 
