Physical storage list curation


jesus san miguel

Feb 9, 2021, 11:37:49 AM
to AtoM Users
Hi,

After importing a CSV, I typically end up with around 1000 identical physical storage locations.
Is there any way to trim this list in AtoM's front end?
Or is there any MySQL recipe to this end?

Best,
Jesus

Dan Gillean

Feb 9, 2021, 12:04:28 PM
to ICA-AtoM Users
Hi Jesus, 

If you are running AtoM 2.6, then there are two command-line tasks that should help you clean up your physical storage containers. See:
One task will allow you to merge duplicate physical storage locations if they share an identical container name, location, and type. There is also an option to consolidate based on container name alone, if you want to ignore the other properties. Much like the taxonomy normalization task, the oldest container will be preserved, all relations to descriptions will be moved to that container, and the duplicates will then be deleted.

The second task will allow you to delete any physical storage containers that are not linked to any descriptions. 

Both tasks have a --dry-run option, meaning you can run the task with this option to see a summary of what would be changed (i.e. what the outcome will be) without actually making the changes. This is a good way to double-check that a task will give you the result you want, and I recommend using this option with each task before running it for real.
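
For example, a typical invocation from the root of the AtoM installation might look like the following. The physicalobject:normalize task name appears later in this thread; the exact name of the unlinked-container deletion task may differ by version, so run php symfony without arguments to list the tasks available in your installation:

php symfony physicalobject:normalize --dry-run
php symfony physicalobject:normalize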

Hope this helps!

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
he / him



jesus san miguel

Feb 10, 2021, 11:00:12 AM
to AtoM Users
Thank you Dan, this worked like a charm:

[atomadmin@svatomp ~]$ docker exec -it docker_atom_worker_1 bash
bash-5.0# php symfony physicalobject:normalize
Are you sure you'd like to normalize physical object data?
y
Data before clean-up:
- 2205 physical objects
- 2548 physical object relations
Detecting duplicates and updating relations to duplicates...
(Using physical object name, location, and type as the basis for duplicate detection.)
- 2062 relations updated
Deleting duplicates...
- 2064 duplicates deleted
Data after clean-up:
- 141 physical objects
- 2548 physical object relations
Normalization completed successfully.
bash-5.0#

jesus san miguel

Jun 9, 2021, 8:10:13 AM
to ica-ato...@googlegroups.com
Hi again,

I am having issues when normalising a large set of data (well, not so large: around 360K records)...

The task fails with the following error, after 2 hours of heavy mysqld CPU use:

root@atom:/usr/share/nginx/atom# php -d memory_limit=-1 symfony physicalobject:normalize

 Are you sure you'd like to normalize physical object data?
y
Data before clean-up:
 - 360781 physical objects
 - 360780 physical object relations

Detecting duplicates and updating relations to duplicates...
(Using physical object name, location, and type as the basis for duplicate detection.)
Killed

I also tried the --force switch, to no avail. Any ideas?

Best,
Jesus


José Raddaoui

Jun 9, 2021, 3:15:46 PM
to AtoM Users

Hi Jesús,

Maybe it's running out of memory and being killed by the kernel's OOM killer? I would check whether that's the case and give it a bit more memory if so.
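
For example, on most Linux systems you can check the kernel log for OOM kills with something like:

dmesg | grep -i -E 'killed process|out of memory'
journalctl -k | grep -i oom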

Best,
Radda.

jesus san miguel

Jun 10, 2021, 3:56:27 AM
to ica-ato...@googlegroups.com
Hi José,

That was my first thought too, hence the memory_limit=-1 switch in my command. If I'm not mistaken, this should override any other PHP memory limit.
In any case, there is no memory ballooning in the system during the 2 hours of processing, so I don't think that is what is happening.
Given what the task is supposed to do, the way it appears to work seems a bit strange, because no physical object data is pruned whatsoever. Correct me if I'm mistaken, but the way it should work is:
Take a physical location record, then check in the i18n table whether there is another (older) record with identical values for container name, location, and type. If there is, locate the relations that the duplicate physical location record has, repoint them at the older physical location, and finally delete the duplicate record.
So this could be implemented as a loop working on one physical location at a time, but the task seems to be building some kind of structure to perform all the changes at once, hence the extended run time and the lack of changes that I'm seeing.
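
As an illustration of the duplicate detection I have in mind, a query along these lines would list the duplicate groups. This is only a sketch: the database name and user (atom) are assumptions, and the physical_object / physical_object_i18n table and column names follow AtoM's usual schema conventions, so verify them against your own database first:

mysql -u atom -p atom -e "
  SELECT i18n.name, i18n.location, po.type_id, COUNT(*) AS copies
  FROM physical_object po
  JOIN physical_object_i18n i18n ON i18n.id = po.id
  GROUP BY i18n.name, i18n.location, po.type_id
  HAVING copies > 1
  ORDER BY copies DESC;"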

Best,
Jesus

José Raddaoui

Jun 10, 2021, 11:36:46 AM
to AtoM Users

Hi Jesús,

Even with no PHP memory limit it could still be exhausting the entire system's memory and being killed by the OOM killer, but "I'm glad" that's not the case. I'd need to look deeper at this task to see what may be going on; if you could send a dump of that database to my personal email, it may help me reproduce the issue.
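
If it helps, a compressed dump can be created with something like the following (the atom user and database name are assumptions; adjust them to your setup):

mysqldump -u atom -p atom | gzip > atom-db.sql.gz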

Best regards,
Radda.

José Raddaoui

Jun 16, 2021, 11:59:21 PM
to AtoM Users
For anyone who comes across this thread later:

We tested this issue and, while we didn't get to the "Killed" point, the task takes a long time and requires some optimizations to process that amount of data efficiently. We have created the following ticket to keep track of the issue:


Best.