Mass delete of archival descriptions

328 views
Skip to first unread message

Jonathan Chu

unread,
Apr 11, 2018, 9:45:11 AM4/11/18
to AtoM Users
Hi! I have just started using AtoM, and was testing csv importing archival description. Unfortunately, I have included a lot of blank rules in it which resulted in 900 odd archival descriptions being created. Is there a fast way to delete all these? 

Thanks :)
Screen Shot 2018-04-11 at 9.43.27 PM.png

Dan Gillean

unread,
Apr 11, 2018, 11:06:20 AM4/11/18
to ICA-AtoM Users
Hi Jonathan, 

Oh no! 

This has possibly occurred because your CSV file was not using the correct line endings and/or character encodings. This can often happen if you use Microsoft Excel, for example - Excel makes it difficult to save a CSV with UTF-8 encoding and unix style line ending charaters. Instead, it tends to default to using its own Win-Latin character encodings, and Windows-style line endings, which can cause issues on import. For more info, see: 

We generally find that LibreOffice Calc is a better spreadsheet application for preparing AtoM CSV data, since it allows you to set the encoding every time you open the file, and will maintain the correct line ending characters. 

That said, you are not the first person to run into this issue - and ideally, AtoM should be smart enough not to import completely blank rows! 

Another community member ran into the same issue in this thread. We ended up filling a bug ticket for the blank rows, and we have a fix that will be included in our 2.5 release, to help prevent this from happening in the future. You can see the ticket here: 
We didn't end up having to help Darryl delete the blank records, because he was creating nightly backups, so he simply rolled back and fixed the CSV prior to the next import. We do strongly recommend that you get in the habit of creating backups on a regular basis, especially before making bulk changes (like imports) into a production environment! We haven't tested this method ourselves, but Darryl did share the following: 

  I’m doing nightly backups of the Percona databases (using something I found on GitHub https://github.com/sixninetynine/surrogate) 

You could probably also use a cron job to schedule nightly backups. 

In the meantime, that doesn't help you! And I do think it would be useful for us to have a SQL query on hand to delete blank descriptions, so I will speak to our developers and see if we can craft something. Note that it may be complicated to do so, however, because data is added to several MySQL tables for each row imported, e.g. object, information_object, information_object_i18n, object_term_relation, slug, status.


If we're able to provide you with a SQL query you can use for this, then first you will need to access the MySQL database from your command-line terminal, to be able to run the query. To do so, you will need to know the username, password, and database name you used during the installation process. Here is the basic structure of the command to open the MySQL terminal: 
  • mysql -u username -pPASSWORD dbName
For example, assuming your user and password are root and your database name is atom, you can access the MySQL terminal like so:
  • mysql -u root -proot atom
This should open the MySQL prompt. If you don't recall the database name and credentials you used during installation, you can double-check them in the config/config.php configuration file. You could use nano to open this file: 
Now we should be ready to run any relevant SQL query. 



If we're not able to provide such a query, then you do have one more workaround option: go nuclear! :)

The following command will purge ALL data from your database. If you only have a few descriptions (minus the ones in your CSV, which you should fix and reimport after making a backup), then exporting them first via the Clipboard might be a viable option. Remember that if you have detailed authority records and repository records you might want to export them separately, since only minimal details for these related entities are included in the archival description CSV export. When you re-import, I'd start with the repository record(s), then the authority records, then the descriptions last. 

If you want to purge all data in your site, use the following command: 
Anyway, I will see if we are able to craft a query that will help you target only these blank rows. Stay tuned. 

Regards, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory

On Wed, Apr 11, 2018 at 9:45 AM, Jonathan Chu <jonatha...@gmail.com> wrote:
Hi! I have just started using AtoM, and was testing csv importing archival description. Unfortunately, I have included a lot of blank rules in it which resulted in 900 odd archival descriptions being created. Is there a fast way to delete all these? 

Thanks :)

--
You received this message because you are subscribed to the Google Groups "AtoM Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ica-atom-users+unsubscribe@googlegroups.com.
To post to this group, send email to ica-atom-users@googlegroups.com.
Visit this group at https://groups.google.com/group/ica-atom-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/ica-atom-users/caf2dd1d-26c6-4bcb-b176-2d38bf1ee73c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Message has been deleted
Message has been deleted

Dan Gillean

unread,
Apr 11, 2018, 1:30:45 PM4/11/18
to ICA-AtoM Users
Hi Jonathan, 

I'm an archivist, not a developer, so the assistance I can personally provide is limited, but I did try to include a list of what I think are all the affected tables in my email: 
  •  object, information_object, information_object_i18n, object_term_relation, slug, status. 
If it helps, we have some Entity Relationship Diagrams on our wiki, here: 
My hope is that we may be able to delete from the object table (which is a root table for all main entities in AtoM), and simply have the deletes cascade across other related tables... but again, I'm not a developer, so I'm not sure! I hope to have a response from our developers for you soon. 

Cheers, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory

On Wed, Apr 11, 2018 at 11:59 AM, Jonathan Chu <jonatha...@gmail.com> wrote:
Hi Dan,

Thanks for your detailed reply! :)

I'm familiar with MySQL query command on shell. But could I ask whether they would give out all table involved for removing data??

Best,
Jonathan
To unsubscribe from this group and stop receiving emails from it, send an email to ica-atom-user...@googlegroups.com.
To post to this group, send email to ica-ato...@googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "AtoM Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ica-atom-users+unsubscribe@googlegroups.com.
To post to this group, send email to ica-atom-users@googlegroups.com.
Visit this group at https://groups.google.com/group/ica-atom-users.

Dan Gillean

unread,
Apr 13, 2018, 5:25:07 PM4/13/18
to ICA-AtoM Users
Hello Jonathan! 

I'm attaching a script that you should be able to use to delete blank descriptions in your site. You can also find it here:
Our developer and I have both given this a test, but I still *strongly* encourage you to make a backup of your database first if there is any data in there you do not wish to risk losing if something goes wrong! 

Basic instructions on use: 
  • Download the script - make sure you save it with a .php extension (not a .txt one!). You can easily do this from the gist link if you first click the "Raw" button, and then right-click and select Save as. Remember to manually give it the correct extension (.php) when saving; your browser will probably want to save it as a text file. 
  • Place the script in your root AtoM directory (e.g. /usr/share/nginx/atom)
  • Use the tools:run command line task to execute the script - AKA: 

  •  php symfony tools:run /usr/share/nginx/atom/delete_empty_descrs.php
If you run the script without making any changes, is has a dry run variable set currently - meaning it will print out matching descriptions (aka those with no title, repository, identifer, and a slug that matches the pattern of AtoM's automatic slug rules when no data like a title is provided to base the slug on), but will not delete them. If you want make the script go live and delete the descriptions (AKA once you've tried it without modification and looked over the output to see if it appears correct), then open the script in a text editor and change line 33 from: 
  • $dryRun = True;
To:
  • $dryRun = False;
And now execute the script again. This time it should delete the empty descriptions. 

Let us know how it goes!

Cheers, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory

delete_empty_descrs.php

Awais Mehmood Khalid

unread,
Jun 3, 2019, 10:20:20 AM6/3/19
to AtoM Users
Hi Dan, 

Thanks for the script, We have around half a million records to delete in the database and this script is too slow for that. It is taking around 20 seconds to delete just one record. Is there any way to delete that many records quicker ?

Thank you for your help. 

Mehmood. 
To unsubscribe from this group and stop receiving emails from it, send an email to ica-ato...@googlegroups.com.

To post to this group, send email to ica-ato...@googlegroups.com.
Visit this group at https://groups.google.com/group/ica-atom-users.

Steve Breker

unread,
Jun 3, 2019, 6:38:05 PM6/3/19
to AtoM Users
Hi Mehmood

I have made some changes to the script Dan sent to you. This new version might be faster for you.

Again, as Dan pointed out above:  I *strongly* encourage you to make a backup of your database first if there is any data in there you do not wish to risk losing if something goes wrong!  This script is untested.



Steve

Awais Mehmood Khalid

unread,
Jun 5, 2019, 2:32:33 PM6/5/19
to AtoM Users
Hi Steve, 

Thanks so much for your quick response. Much appreciated. I have tried this script and it doesnt delete the records and gets killed after a while. But when I comment following lines 

if (null !== $children = getChildren($row['id']))
    {
      print(" SKIPPING - record has child records.\n");
      continue;
    }

It starts deleting the empty records but still taking the same amount of time to delete the records. 

Any idea why ?

Thanks in advance for your help.

Steve Breker

unread,
Jun 6, 2019, 6:46:02 PM6/6/19
to AtoM Users
Hello! I would not recommend commenting out those lines. They protect against deletion of records that have existing child records - these will not be records created by the CSV import. Basically these lines are a safety precaution.

One reason the records may not be deleting is that there is a $dryRun variable that you have to set to 'False' to disable.  When $dryRun is True, the script will loop over the data and output what it would have done, but it will not actually change anything - this is meant for testing beforehand.

Hope this helps.

Steve

Awais Mehmood Khalid

unread,
Jun 12, 2019, 4:26:32 PM6/12/19
to AtoM Users
Hi Steve, 

Thanks for answering, 

I have tried it many ways on our dev server instance and without commenting out those lines, it doesnt delete the records (I do change $dryRun variable). after running for like 20 minutes, it just shows killed on the command line. I know that those lines are safety precautions, but if I remove them it gets to the deletion part but still taking about 20 seconds time to delete a record. I am not going to comment the lines any more but then it gets killed without deleting the empty records. Do you have any idea or solution to this ? It would be greatly appreciated. 

thanks a bunch. 

Awais Mehmood Khalid

unread,
Jun 19, 2019, 10:17:04 AM6/19/19
to AtoM Users
Hi, 

Just trying to ping here to see if there is anything that can help us here. 

Thanks so much. 

Dan Gillean

unread,
Jun 21, 2019, 3:03:49 PM6/21/19
to ICA-AtoM Users
Hi Mehmood, 

Steve has made one final update to the script that may help. Before starting it will now disable the Elasticsearch index first. After you've run it, you'll need to manually repopulate the search index using: 
The newest version (now v.03) of the script can be found at the same URL:
If it still doesn't work at scale, you may need to explore other options on your own. Unfortunately, I don't think our developers can spend any more time reviewing, refining, and troubleshooting a script developed in response to a public user forum issue. The amount of free support we are able to provide via this forum is limited by the the ongoing client project we need to focus on, so we can continue maintaining and developing AtoM, and keep the lights on here at Artefactual. 

If you have half a million blank records to delete and this is more than the content you DO want to keep, you might consider something like exporting the records you want to keep, purging your database, and then reimporting?

In terms of the script - I'm NOT a developer, but I'll include a couple thoughts on things you could try below. As ever, these are untested, Artefactual takes no responsibility for any errors or issues encountered, and we strongly encourage you to make backups of your data before you do anything

First, if you know that all the descriptions you want to keep have identifiers, levels of description assigned, and are linked to repository records, then you could try simplifying the SQL query by removing some of the criteria in lines 55-57. Without those criteria, the query would be looking for records that: 
  • Are top level descriptions
  • Have no title
Make sure that these two things are enough criteria to delete the blank records but NOT to delete records you want to keep! 

Second, it may be possible to add a LIMIT clause to the SQL query, and run the script in batches of say 1,000 records at a time. This would mean you'd need to run the script a number of times, but it may prevent the script from becoming unresponsive. Just an idea! 

Regards, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory

lunaria...@gmail.com

unread,
Jun 11, 2020, 8:28:54 AM6/11/20
to AtoM Users
I used the script, as I had about 90 records exactly as described - archival descriptions with "Untitled" (draft), almost certainly the result of uploading CSV file(s) with blank rows.
It identified them all, but wouldn't delete as it identified "record has child records".
It wasn't obvious what these child records were. In the database, i ran a query:
SELECT * FROM `information_object` WHERE id BETWEEN xxxxx AND yyyyy

In the result set. all fields were NULL except oai_local_identifier (numeric sequence), parent_id = 1, lft and rgt which were both numeric.
I ran a similar query against the table `information_object_i18n` with same number of results but all fields NULL except id.
It occurred to me that I could simply delete these records and the search would no longer find them, but this isn't the case (I tried).
The relationship diagram is a bit difficult to navigate, so I'm stuck trying to find the child records that I need to delete before the script will remove the offending entries.
Any help much appreciated.

Steve Breker

unread,
Jun 19, 2020, 3:02:31 PM6/19/20
to AtoM Users
Hello - It seems your records have child records. In order to delete these parents with children, the children must be removed first.


In addition to a 'dry-run' setting, there is also a new 'deleteChildDescriptions' settings. If this is set to TRUE and DRY-RUN is False, then it should delete the child records too.

Warning - this script and its changes have not been tested with your dataset and should be considered experimental - use at your own risk. PLEASE MAKE BACKUPS OF YOUR DATA BEFORE RUNNING THIS SCRIPT.


Steve

lunaria...@gmail.com

unread,
Jun 23, 2020, 9:04:38 AM6/23/20
to AtoM Users
Hi Steve, thanks for the updated script. I pulled the revision and ran it again. Unfortunately it still isn't deleting the records. Here's a sample of my output - this is with:
$dryRun = False;
$deleteChildDescriptions = True;

root@arc:/usr/share/nginx/atom-2.5.4# php symfony tools:run delete_empty_descrs.php
DELETING info obj records...
Deleting information object. ID: 48479 6bnf-79hq-4s5r
Child recs found:
Deleting information object. ID: 48480 tk2g-7ce3-qmnz
Child recs found:
Deleting information object. ID: 48481 x5h3-b55a-bs2p
Child recs found:
Deleting information object. ID: 48482 x9r7-p9dc-besp
Child recs found:
Deleting information object. ID: 48484 cwe2-ma5h-z9ak
Child recs found:
Deleting information object. ID: 48485 8dcm-fg97-h724
Child recs found:
Deleting information object. ID: 48488 xxe2-4rfk-359h

Steve Breker

unread,
Jun 25, 2020, 8:37:15 PM6/25/20
to AtoM Users
Hello!

I think I've found the reason. I have updated the gist here:


PLEASE MAKE BACKUPS OF YOUR DATA BEFORE RUNNING THIS SCRIPT.
Warning - this script will permanently remove data from your AtoM database. It has not been tested with your dataset and should be considered experimental - use at your own risk. 

Steve

lunaria...@gmail.com

unread,
Jun 27, 2020, 10:37:09 AM6/27/20
to AtoM Users
Ding Dong


Thanks Steve, your help is much appreciated.
I still have a few hundred drafts but these can safely be handled using:

php symfony tools:delete-drafts


Reply all
Reply to author
Forward
0 new messages