Bug or inefficent query on update-handle-prefix script Dspace 7.6

70 views
Skip to first unread message

Evelio Metric

unread,
Sep 12, 2023, 8:22:26 AM9/12/23
to DSpace Technical Support

Hello 

We are migrating from DSpace 5 to 7.6.
Everything goes smooth but we have copied production database to test environment  and when we change handle prefix it takes toooo loooong  and finally does not update.
After 17 hours we see the below dialog...

# time ./dspace update-handle-prefix 10230 123456789

Getting information about handles from database...

In your repository will be updated 47316 handles to new prefix 123456789 from original 10230!


Servlet container (e.g. Apache Tomcat, Jetty, Caucho Resin) must be running.

If it is necessary, please make a backup of the database.

Are you ready to continue? [y/n]: y

No changes have been made to your data.



real    1032m21,431s


The offending query is


 select count(distinct handle0_.handle_id) as col_0_0_ from public.handle handle0_ cross join public.handle handle1_ where handle0_.handle like $1

 DETALLE:  parámetros: $1 = '10230%'


Any ideas or workaround?


Thanks in advance

Evelio


DSpace Technical Support

unread,
Sep 13, 2023, 11:27:08 AM9/13/23
to DSpace Technical Support
Hi Evelio,

I responded to your message on Slack as well.  But, I've verified that the query itself is definitely inefficient and created a bug ticket for this at https://github.com/DSpace/DSpace/issues/9066

While I was investigating this, I also realized there may be a "quick fix" of removing the "cross join" portion of that query.  So, I created a small PR for that: https://github.com/DSpace/DSpace/pull/9067

This small PR improves performance slightly on a small database (I only have 1,000 handles to test with, not 47,000 like you).  If you (or someone else) has time to test this small fix on a larger database, I'd be curious to find out if it improves the performance there as well.  If it does, we can get this fix merged immediately (and included in 7.6.1).  But, if it's not a complete fix, then I might need to find others to help me investigate this ticket further to determine what else might be going on.

Thanks,

Tim

Evelio Metric

unread,
Sep 15, 2023, 8:55:37 AM9/15/23
to DSpace Technical Support
Hi Tim
Your patch works. At least the scripts ends in 56 minutes instead of not ending properly after 17 hours.

Thanks a lot

This is the dialog

# time ./bin/dspace update-handle-prefix 10230 123456789


Getting information about handles from database...
In your repository will be updated 47316 handles to new prefix 123456789 from original 10230!

Servlet container (e.g. Apache Tomcat, Jetty, Caucho Resin) must be running.
If it is necessary, please make a backup of the database.
Are you ready to continue? [y/n]: y

Updating handle table... 47316 items updated
Updating metadatavalues table... 831 metadata values updated
Handles successfully updated in database.

Re-creating browse and search indexes...
The script has started
(Re)building index from scratch.
Done with indexing
The script has completed

real    56m18,467s
user    16m35,114s
sys     5m25,811s
Reply all
Reply to author
Forward
0 new messages