Reg: iquest query

kovid bhardwaj

Apr 15, 2025, 9:54:15 AM
to iRODS-Chat
Hi team,

Could you please help me with the following:
1. An iquest query that lists the files that exist in Resource A but not in Resource B.
2. An iquest query that identifies duplicate files between two resources in our iRODS environment. Specifically, I would like to find files that exist in Resource A and also have duplicates in Resource B.
3. We have two copies of most of the data, one in Resource A and one in Resource B. Do we have two copies of the metadata as well? Or do we have just one, and is there a way to tell whether there are any differences between the two?

Waiting for response 
Thanks
Kovid

Jean-Yves Nief

Apr 15, 2025, 10:24:16 AM
to irod...@googlegroups.com, kovid bhardwaj
hello,

kovid bhardwaj wrote:
> Hi team,
>
> Could you please help me with :
> 1. an iQuest query that will allow me to search for files that exist
> in /Resource A/ but /not/ in /Resource B/?
For this one, I did not find a way to implement it with the existing
pseudo-SQL syntax.
However, it is possible to create an SQL alias that fulfils your request above,
and this alias can then be executed through iquest (see "iquest -h").
If you want, I can provide it to you (or to anyone interested).
cheers,
JY
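
For readers who prefer to stay with plain iquest queries, one possible workaround for (1) is to list the logical paths held on each resource separately and compare the two listings on the client. The following is only a minimal sketch: list_paths and only_in_first are hypothetical helper names, Resource_A and Resource_B are placeholders, and it assumes both are the storage resources that actually hold the replicas and that no path contains a newline. It may be slow on a large catalog.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; they are not part of iRODS.

# Print the sorted, de-duplicated logical paths of the replicas on resource $1.
# Assumption: iquest prints a CAT_NO_ROWS_FOUND message when nothing matches,
# so any such line is filtered out.
list_paths() {
    iquest --no-page "%s/%s" \
        "SELECT COLL_NAME, DATA_NAME WHERE DATA_RESC_NAME = '$1'" \
        | grep -v '^CAT_NO_ROWS_FOUND' | LC_ALL=C sort -u
}

# Print the lines that occur in file $1 but not in file $2.
# Both files must be sorted with the same collation (LC_ALL=C here).
only_in_first() {
    LC_ALL=C comm -23 "$1" "$2"
}

# Usage (needs a configured iRODS client environment):
#   list_paths Resource_A > a.txt
#   list_paths Resource_B > b.txt
#   only_in_first a.txt b.txt
```

Swapping the two arguments of only_in_first gives the opposite direction (on B but not on A).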


jc...@sanger.ac.uk

Apr 15, 2025, 11:41:31 AM
to iRODS-Chat
Hi Kovid!

As Jean-Yves has already responded to (1), I'll limit myself to (2) and (3) until the grown ups from RENCI arrive!

(2)
I am assuming you don't use a composable resource tree (https://docs.irods.org/4.3.4/plugins/composable_resources/#tree-metaphor), as that would usually handle this more elegantly.

One way to do this, matching on file contents rather than file names, would be to generate a list of checksums for all the data objects in the first resource and then look for matches.

For example:
$ iquest --no-page "%s %s/%s" "SELECT DATA_CHECKSUM, COLL_NAME, DATA_NAME WHERE DATA_RESC_NAME = 'demoResc'" | grep example
bf870b64a3009de48b144189f4dda31e /training/home/jc18/examplefile.txt

To build on that, you could use the first command to generate a file of commands that check the other resource, for example:

$ iquest --no-page "iquest --no-page \"SELECT COLL_NAME, DATA_NAME WHERE DATA_CHECKSUM = '%s' AND DATA_RESC_NAME = 'other_demoResc'\"" "SELECT DATA_CHECKSUM WHERE DATA_RESC_NAME = 'demoResc'" > other_demoResc_checksum.sh
$ head other_demoResc_checksum.sh
iquest --no-page "SELECT COLL_NAME, DATA_NAME WHERE DATA_CHECKSUM = '660d082d3a6975693f08e508ad53f4a9' AND DATA_RESC_NAME = 'other_demoResc'"
iquest --no-page "SELECT COLL_NAME, DATA_NAME WHERE DATA_CHECKSUM = '835c46ecf2ad9365c79194d0d724c920' AND DATA_RESC_NAME = 'other_demoResc'"

So you pipe the output into a file and then run that, and it should produce a list of all the files in 'other_demoResc' that match a checksum in demoResc...
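
An alternative that avoids generating a script is to dump one "checksum path" listing per resource and match the two listings locally. This is a sketch under stated assumptions: dump_checksums and match_checksums are hypothetical helper names, checksums are assumed to contain no spaces, and replicas without a registered checksum are skipped.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; they are not part of iRODS.

# Print "checksum path" for every replica on resource $1.
dump_checksums() {
    iquest --no-page "%s %s/%s" \
        "SELECT DATA_CHECKSUM, COLL_NAME, DATA_NAME WHERE DATA_RESC_NAME = '$1'" \
        | grep -v '^CAT_NO_ROWS_FOUND'
}

# $1 and $2 are listings produced by dump_checksums.
# Print every line of $2 whose checksum also appears in $1.
match_checksums() {
    awk '
        {
            i = index($0, " ")          # first space separates checksum and path
            if (i <= 1) next            # no checksum on this line: skip it
            sum = substr($0, 1, i - 1)
            path = substr($0, i + 1)
        }
        FILENAME == ARGV[1] { in_first[sum] = 1; next }
        sum in in_first { print sum " " path }
    ' "$1" "$2"
}

# Usage (needs a configured iRODS client environment):
#   dump_checksums demoResc > demoResc.txt
#   dump_checksums other_demoResc > other_demoResc.txt
#   match_checksums demoResc.txt other_demoResc.txt
```

Note that all empty files share one checksum (for MD5, the d41d8cd9... value), so they will all match each other.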

However, at this point managing the bash escaping starts to get tricky, so I would reach for the Python iRODS client. ChatGPT gave me something that looked workable, so I'd suggest starting there if you go that way!


(3)

As long as both resources are in the same zone and the replicas are part of the same data object, there will be one set of metadata in the catalog for that data object.

For example, replicas 0 and 1 of jc18_irods_test_5 are on different resources, but are the same 'file':

$ ils -L jc18_irods_test_5
  jc18              0 root;replicate;sanger;sanger-random;sanger-blueroom;sanger-blueroom-random;irods-seq-sb24-sdd            0 2022-11-09.15:06 & jc18_irods_test_5
    d41d8cd98f00b204e9800998ecf8427e    generic    /irods-seq-sb24-sdd/home/jc18#Sanger1/jc18_irods_test_5
  jc18              1 root;replicate;remote;remote-random;arke;arke-random;irods-seq-i31-sdf            0 2022-11-09.15:06 & jc18_irods_test_5
    d41d8cd98f00b204e9800998ecf8427e    generic    /irods-seq-i31-sdf/home/jc18#Sanger1/jc18_irods_test_5


The Docs have an overview here; https://docs.irods.org/4.3.4/system_overview/data_objects/#data-objects-and-replicas
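
On the question of telling whether the two copies differ, one option is to compare the per-replica checksums recorded in the catalog. The sketch below uses hypothetical helper names (replica_checksums, checksums_agree) and assumes resource names without spaces. It compares catalog values only and does not re-read the data on disk; ichksum can be used to recompute checksums. The single set of metadata attached to the data object can be listed with imeta ls -d.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; they are not part of iRODS.

# Print "replica_number resource checksum" for each replica of the data
# object named $2 in collection $1.
replica_checksums() {
    iquest --no-page "%s %s %s" \
        "SELECT DATA_REPL_NUM, DATA_RESC_NAME, DATA_CHECKSUM WHERE COLL_NAME = '$1' AND DATA_NAME = '$2'"
}

# Read such lines on stdin. Succeed only if there is at least one replica
# and every replica carries the same, non-empty checksum.
checksums_agree() {
    awk '
        NF < 3       { bad = 1 }        # replica without a checksum
        NR == 1      { first = $3 }
        $3 != first  { bad = 1 }        # checksum differs from the first replica
        END          { if (NR == 0 || bad) exit 1 }
    '
}

# Usage (needs a configured iRODS client environment):
#   replica_checksums /training/home/jc18 examplefile.txt | checksums_agree \
#       && echo "replicas agree" || echo "replicas differ or lack checksums"
```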

Hope that helps?


John


kovid bhardwaj

Apr 15, 2025, 11:49:16 AM
to iRODS-Chat
Hi @ Jean-Yves Nief,

Thanks for your response.
Could you please provide the sql alias that can be used with iquest ?
Thanks

Jean-Yves Nief

Apr 16, 2025, 10:53:51 AM
to irod...@googlegroups.com, kovid bhardwaj
kovid bhardwaj wrote:
> Hi @ Jean-Yves Nief,
>
> Thanks for your response.
> Could you please provide the sql alias that can be used with iquest ?
here we go:
> iadmin asq "select r_coll_main.coll_name, r_data_main.data_name from
r_data_main join r_coll_main on r_data_main.coll_id =
r_coll_main.coll_id where r_data_main.coll_id in (select coll_id from
r_coll_main where coll_name like '*WHAT_EVER_U_LIKE*') and
r_data_main.resc_id = (select resc_id from r_resc_main where resc_name =
? ) and r_data_main.data_id not in (select data_id from r_data_main
where resc_id = (select resc_id from r_resc_main where resc_name = ? )
)" yourSQL
and then:
> iquest --sql yourSQL Resource_A Resource_B
It will give you the list of collection name + data name available in
Resource_A but not in Resource_B within the WHAT_EVER_U_LIKE collection.
cheers,
JY
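
To make the alias easier to reuse for different collections, the SQL text can be produced by a small helper. This is only a sketch: build_missing_sql and the alias name missing_in_b are hypothetical, /tempZone/home/% is a placeholder pattern, and the statement is the one given above with the collection pattern substituted (SQL LIKE uses % as its wildcard). The pattern is inserted verbatim, so it must not contain a single quote.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration; it is not part of iRODS.

# Print the "on resource 1 but not on resource 2" SQL for the collections
# whose name matches the SQL LIKE pattern $1. The two ? placeholders are
# filled in by iquest --sql at run time.
build_missing_sql() {
    printf '%s' \
        "select r_coll_main.coll_name, r_data_main.data_name " \
        "from r_data_main join r_coll_main " \
        "on r_data_main.coll_id = r_coll_main.coll_id " \
        "where r_data_main.coll_id in " \
        "(select coll_id from r_coll_main where coll_name like '$1') " \
        "and r_data_main.resc_id = " \
        "(select resc_id from r_resc_main where resc_name = ?) " \
        "and r_data_main.data_id not in " \
        "(select data_id from r_data_main where resc_id = " \
        "(select resc_id from r_resc_main where resc_name = ?))"
}

# Usage (iadmin requires rodsadmin privileges):
#   iadmin asq "$(build_missing_sql '/tempZone/home/%')" missing_in_b
#   iquest --sql missing_in_b Resource_A Resource_B
#   iquest --sql ls              # list the registered aliases
#   iadmin rsq missing_in_b      # remove the alias when no longer needed
```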