Import and sort order

thearch...@gmail.com

Apr 26, 2018, 4:17:59 PM
to AtoM Users
Greetings all,

I've come across this issue a handful of times before, but it hadn't occurred to me to ask for an easier solution until now (my apologies if this has already been covered on the forum or in the documentation - I just haven't had any luck locating it). When adding new file-level descriptions to a series or subseries through the command-line CSV import, the default behaviour always places new descriptions ahead of older descriptions in our treeview (we prefer the full-width treeview, which I suspect contributes to this issue, since it has no option to sort by reference code). We use sequential reference codes, where the earliest material may run from f1 to f49 and the newest material added to the same series would run from f50 to f99, so it would be great if the import preserved that sequence. Unfortunately, I am currently getting f50 to f99, followed by f1 to f49, under the series.

In the past, I've only had a small number out of sequence, and I've just used the drag and drop feature (in the grand scheme of things, the sort order is really not essential, but I do get requests from staff to reorder these). With larger imports, drag and drop is a little time-consuming. Is there any way to ensure imported records show up at the end of a series/subseries (i.e. is there a parameter I can use on import, or are there recommendations for other creative solutions)?

Thanks in advance!

Jeremy

Dan Gillean

Apr 27, 2018, 4:32:59 PM
to ICA-AtoM Users
Hi Jeremy, 

I'm going to have to wait to get some input from a developer, but I actually think that part of the issue here might be best defined as a bug. 

I tried running a quick test locally in my 2.5 development environment - I created a collection with file-level descendants f1-f15, exported it, modified the CSV so it included new children for import (f16-f30), and then imported it again. 

As I was creating the original records in the user interface, the full-width treeview was immediately putting them in the wrong place - and when I tried to drag and drop to rearrange them, I got the following error message in the related job: 

Job 2003488 "arObjectMoveJob": Mismatch in current position

I tried rebuilding the nested set, clearing the cache, repopulating the index, and then dragging again, only to get the same error. So I tried changing the treeview to the sidebar - and voila, the items were in the correct order. 

Upon import of the additions, the sidebar treeview displayed them correctly immediately - but again, everything was out of order in the full-width treeview:


I have asked a developer to take a look at this thread and report back on the treeview code before I file a bug ticket. However, here is what I think is happening: 

It seems that the full-width treeview may be using its own sort order, which displays the records in a different order than how they are preserved in AtoM's nested set model (the model used to manage hierarchical information in a relational database). This is why I think I'm seeing the sort order change when the treeview type is changed, and why trying to drag and drop to correct the full-width sort display is producing an error - AtoM's database is saying the record is already there, or at least in a different place than I'm being shown. 

From the looks of the image above, it seems the full-width treeview is using an ASCIIbetical sort against the identifier - while the sidebar treeview's manual sort option preserves the order that records are added (which is what you want, in fact). This is just my supposition based on what I'm seeing - I will have our developer review and supplement this thread. 
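To make the difference concrete, here is a small Python sketch (not AtoM code) contrasting an ASCIIbetical sort with a "natural" sort on identifiers like the f1 to f99 codes mentioned earlier in the thread. The helper name natural_key and the sample identifiers are made up for illustration.

```python
import re


def natural_key(identifier: str):
    """Split an identifier into text and number chunks so that f2 sorts before f10."""
    # re.split with a capturing group alternates text, digits, text, ...
    # so odd positions always hold the digit runs.
    parts = re.split(r"(\d+)", identifier)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


identifiers = ["f50", "f2", "f10", "f1", "f49", "f99"]

# Plain string comparison goes character by character, so "f10" lands before "f2".
ascii_sorted = sorted(identifiers)

# Comparing the numeric chunks as integers gives the order an archivist expects.
natural_sorted = sorted(identifiers, key=natural_key)

print(ascii_sorted)
print(natural_sorted)
```

With these sample values, the plain sort puts f10 ahead of f2, while the natural sort keeps the numeric sequence. Neither is the same as a manual sort, which keeps whatever order the records were added or dragged into.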

In any case, the short-term solution may be to use the sidebar treeview - not ideal! Otherwise, I'm going to file a bug ticket when I have more information, and hopefully we can get the sort in the full-width treeview to behave the same as the Manual sort in the sidebar treeview, as a starting point (which should also resolve drag/drop errors). 

Ideally, the sort options would apply to both trees, so users have options as to how the full-width treeview's hierarchy is displayed - but I suspect this might be a rabbit hole, since there has been a long-standing issue with the sort options even for the sidebar treeview, described in greater detail in this thread: 
I suspect we'd have to fix the sort before we'd want to apply it to both trees, and I'm not sure that we'd want to fix it with Option 1 in the thread above, since this would have clearly negative impacts on current functionality. Since the other option is on the expensive side and so far no one has contacted us to sponsor it, I think that for now, we'll have to aim for the easier-to-reach bug fix, and hope we can get that in 2.4.1. 

It's possible our developer might have other workarounds for you in the meantime, or local code changes you could make to help address it until we get a fix into an upcoming release. We'll see! 

Regards, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory

--
You received this message because you are subscribed to the Google Groups "AtoM Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ica-atom-users+unsubscribe@googlegroups.com.
To post to this group, send email to ica-atom-users@googlegroups.com.
Visit this group at https://groups.google.com/group/ica-atom-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/ica-atom-users/580f75cb-0821-4130-8b11-dd4886a2eeec%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

José Raddaoui

Apr 27, 2018, 11:44:18 PM
to ica-ato...@googlegroups.com
Hi Jeremy,

Are you using the development branch (qa/2.5.x) or the 2.4 release (or stable/2.4.x branch) in that instance?

We're currently making changes to the full-width treeview for 2.5, and it looks like Dan found a couple of regressions caused by those changes. Specifically, we're now sorting by the entire node text by default, which causes all the draft descriptions to be on top in Dan's example and is also the reason for the mismatch in the move job. However, in 2.4, the order is determined by the `lft` column, and new records should have a higher value than their existing siblings.
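As a minimal illustration of ordering by `lft` (a Python sketch with made-up values, not the actual treeview query), siblings are read back in ascending `lft` order, so a newly added record with a higher value lands after the existing ones. The Node class and the numbers below are assumptions for the example only.

```python
from dataclasses import dataclass


@dataclass
class Node:
    identifier: str
    lft: int  # left boundary in the nested set; siblings display in ascending lft order


def siblings_in_tree_order(nodes):
    """Return sibling identifiers the way an lft-ordered treeview would list them."""
    return [n.identifier for n in sorted(nodes, key=lambda n: n.lft)]


existing = [Node("f1", 2), Node("f2", 4), Node("f3", 6)]

# Hypothetical newly imported siblings: each is expected to get a higher lft
# than the records already under the same parent.
imported = [Node("f4", 8), Node("f5", 10)]

order = siblings_in_tree_order(imported + existing)
print(order)
```

If imported records show up first, as in Jeremy's report, it suggests they did not end up with the expected `lft` values at display time.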

I may be missing something, so I'd appreciate it if you could confirm which version you are using before I take a deeper look.

Regards.

thearch...@gmail.com

Apr 30, 2018, 9:23:51 AM
to AtoM Users
Hi José,

We are on 2.4.0 v156 at the moment.

And thanks, Dan, for checking this out further - I appreciate the effort! As far as moving to the side view, it's a small number of descriptions for now, so I think we will just wait for any further fixes and developments here.

Cheers,

Jeremy

José Raddaoui

May 1, 2018, 1:22:26 PM
to ica-ato...@googlegroups.com
Thanks Jeremy,

I'll try to take a look at this tomorrow.

Regards.

José Raddaoui

May 3, 2018, 10:37:19 AM
to AtoM Users
Hi Jeremy,

I have been able to reproduce the issue in stable/2.4.x, but it seems to happen only when the import is done through the CLI. When the import is done through the GUI (and the job scheduler), the order is maintained. Could that be an option for you in the meantime?

I have created https://projects.artefactual.com/issues/12183 to keep track of the issue.

Thank you for bringing this to our attention.

Best regards.

jesus san miguel

Sep 3, 2020, 1:42:17 PM
to AtoM Users
Hi José,

I just imported a collection of descriptions from the GUI, and the order in the full-width treeview is not maintained: the new records are on top of the old ones (00093 was the top identifier before the import).

Screenshot 2020-09-03 at 19.37.37.png

I understand that the sort options apply only to the sidebar treeview, but that is not usable for us.
After reading the thread, I couldn't tell whether this is an open bug or a missing feature.
We are on 2.6.0 (Docker).

Best,
jesus

José Raddaoui

Sep 4, 2020, 9:51:30 AM
to AtoM Users
Hola Jesús,

Unfortunately, recent changes made to avoid transaction deadlocks in the CSV import (see ticket) have reintroduced the issue mentioned here. I have created a new ticket to follow up:


Thanks for mentioning it.

Best,
Radda.

jesus san miguel

Sep 6, 2020, 12:15:11 PM
to AtoM Users
Hi José,

Another thing I have stumbled upon: if I export an archival description (CSV), fill in some data (i.e. digitalObjectPath), and then import it back with the option "Update matches ignoring blank fields in CSV", AtoM creates a new record instead of updating the current one. Is this part of the reintroduced bug, or am I doing something wrong?

Best,

Dan Gillean

Sep 8, 2020, 10:41:41 AM
to ICA-AtoM Users
Hi Jesus,

I've tried to update the documentation in 2.6 to better reflect the functionality, please take a look:
I've also written a few detailed summaries in previous threads of why it works the way it does currently, and some strategies for using it for roundtripping. See:
Quick summary of some key points: 
  • Because of the way the feature was designed and the original use case for it, sometimes matching when roundtripping in a single system is hard
  • AtoM uses 2 cascading matching patterns. The first depends on matching the legacyID value and the sourcename (i.e. file name when not specified by the user on the command-line - there's no GUI option to specify a sourcename). If that fails, then AtoM next looks for an exact match on title, repository, and identifier. This has some further implications, such as:
    • The legacyID value on export is NOT the same by default as the one you imported with. On export, AtoM will use the internal database's objectID as the legacyID value. If you don't reintroduce the original legacyID values that you used for the first import, then the first level of matching will fail
    • If you have edited title, identifier, and/or repository in your updated descriptions, then the second level of matching will also fail!
    • If you have changed the filename of the CSV, this could also cause the first level of matching to fail
  • There are other fields that will support additional values, but not replacements of existing values - generally these are linked entities, such as access points, notes, etc
  • If you don't want to accidentally create duplicate records when there are no matches, try using the "Skip unmatched" option during import
  • As noted in the first linked summary forum thread, there is a command-line option that will match exclusively on objectID, which may be better suited for roundtripping. See the --roundtrip option described here:
Let me know if you have further questions after reviewing the resources linked above! 
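For anyone who finds it easier to follow as code, here is a rough Python sketch of the two-stage matching summarized in the list above. It is an illustration only, not AtoM's implementation: the field names (legacy_id, source_name), the in-memory list, and the sample values are all assumptions, and the real lookup happens in AtoM's database.

```python
from typing import Optional


def find_match(row: dict, source_name: str, existing: list) -> Optional[dict]:
    """Return the existing record a CSV row would update, or None if nothing matches."""
    # Stage 1: the legacy ID and the source name (by default the CSV file name)
    # must both match what was recorded at the original import.
    for record in existing:
        if record["legacy_id"] == row["legacy_id"] and record["source_name"] == source_name:
            return record

    # Stage 2: fall back to an exact match on title, repository, and identifier.
    for record in existing:
        if all(record[f] == row[f] for f in ("title", "repository", "identifier")):
            return record

    return None


existing = [{
    "legacy_id": "A-001", "source_name": "first_import.csv",
    "title": "Minutes", "repository": "Example Archives", "identifier": "f50",
}]

# An exported row carries the internal object ID as its legacy ID, so stage 1 fails...
exported = {"legacy_id": "4512", "title": "Minutes",
            "repository": "Example Archives", "identifier": "f50"}
# ...and editing the title makes stage 2 fail as well.
edited = dict(exported, title="Minutes (revised)")

print(find_match(exported, "export.csv", existing) is existing[0])
print(find_match(edited, "export.csv", existing))
```

In the sketch, the unedited export still matches at the second stage, while the edited one matches nothing. That is the case where "Skip unmatched" would prevent a duplicate from being created.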

Cheers, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
he / him


jesus san miguel

Sep 8, 2020, 1:14:29 PM
to AtoM Users
Hi Dan,

I carefully read the updated 2.6 documentation, and now I can see the rationale behind certain behaviors.
Nonetheless, in my one-item import/export stunt, I can see why the first level of matching fails: the original legacyID is long gone, and it never crossed my mind to reuse the original CSV filename.
But my point is that it should have matched on the second level: I just replaced the digitalObjectURI field with digitalObjectPath and filled in the path for the image. I made no changes whatsoever to title, identifier, or repository.
The --roundtrip option is fantastic, but I need something for my users in the GUI... They insist on using Excel despite my sage advice.

Best,
Jesus

Dan Gillean

Sep 8, 2020, 5:28:19 PM
to ICA-AtoM Users
Hi Jesus, 

Hmm, that's unfortunate. Without knowing more about exactly what your data looked like before and in the spreadsheet, it's difficult for me to guess why this was the case. 

Did these descriptions already have a digital object attached, that you were trying to change? If yes, then this could be it. I noticed that I did not include the digital object columns in my tests and subsequent documentation. I know it's possible to use the update functionality to append a digital object where there was not one before, but I haven't tested replacing one with a different one, and I suspect this would not work. 

One thing you might try doing is removing unnecessary columns from the spreadsheet - i.e. just keeping legacyID, parentID, title, identifier, repository, digitalObjectPath, and perhaps culture. 
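If trimming the spreadsheet by hand is tedious, something like the following Python sketch could do it. This is only a suggestion under some assumptions: the function name trim_columns and the sample row are made up, and the header comparison is case-insensitive because the exact capitalization of the column names in a given export may differ from how they are written in this thread.

```python
import csv
import io

# Columns suggested above; matched case-insensitively against the CSV header.
KEEP = ["legacyID", "parentID", "title", "identifier",
        "repository", "digitalObjectPath", "culture"]


def trim_columns(csv_text: str, keep=KEEP) -> str:
    """Return the CSV with only the wanted columns, in their original order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    wanted = {name.lower() for name in keep}
    kept = [h for h in (reader.fieldnames or []) if h.lower() in wanted]

    out = io.StringIO()
    # extrasaction="ignore" silently drops every column not listed in kept.
    writer = csv.DictWriter(out, fieldnames=kept, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


sample = (
    "legacyId,parentId,title,scopeAndContent,identifier,repository,digitalObjectPath,culture\n"
    "4512,4500,Minutes,Long text,f50,Example Archives,/uploads/f50.jpg,en\n"
)
trimmed = trim_columns(sample)
print(trimmed)
```

To use it on a real file, read the export into a string, pass it through trim_columns, and write the result to a new file that you then import on a test instance first.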

I would love to see support for the roundtrip option added to the user interface in the future, but this will require analysis and development beyond what Artefactual can offer without community support. If your institution might be interested in sponsoring such development, feel free to contact me off-list and we can prepare some estimates. 

Otherwise, I suggest experimenting some more with a very small subset of 1-2 records, to see if you can narrow down the cause of whatever's causing the matching to fail. 

Regards, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
he / him

jesus san miguel

Sep 15, 2020, 5:11:16 AM
to AtoM Users
I can confirm that first stage matching works as expected, and I would suggest adding the original intake legacyID and filename to the CSV export job, to more easily keep track of records...
Second stage doesn't work for me, that's for sure.

Best,
Jesus

Emily Sommers

Feb 2, 2021, 2:10:49 PM
to AtoM Users
Hi,

We are looking to upgrade to 2.6, so we've installed 2.6 on our test server and I've been testing common tasks, before we upgrade our production site.

I just want to make sure what I am experiencing is the bug in question mentioned earlier in the thread? When I try to import new child records to existing descriptions, the import doesn't keep anything in order. Attaching a screenshot and my CSV.
Screenshot_2021-02-02 Keith McLeod fonds - Discover Archives.png

Do you have a timeline as to when this bug may be fixed? Or will it require some sponsorship?

Thanks in advance for your help!
Best,
Emily
atom_test_9.csv

Emily Sommers

Feb 2, 2021, 2:33:25 PM
to AtoM Users
And here's another example - where I am starting from scratch with a csv and the order is not maintained either. Thanks for your help!!
Screenshot_2021-02-02 Football Clubs fonds - Discover Archives.png
atom_test_10.csv

Kelli Babcock

Feb 3, 2021, 5:54:28 PM
to AtoM Users
Hi everyone, 

As a temporary workaround for this issue, our developer, Sunny, reverted the disableNestedSetUpdating change in csvImportTask.class.php, setting it back to `$import->disableNestedSetUpdating = false;` (https://github.com/artefactual/atom/pull/1125/files).

Ensuring that the hierarchy order is maintained during CSV imports is a priority for us, so we are fine with reverting this change until another solution for nested sets can be found. Our situation may be different from other institutions because Sunny also handles big batch deletions and large CSV imports for archivists, so they can avoid timeouts from nested set updates when deleting or updating descriptions with lots of children in the UI. 

It's not perfect but csv files are now ingesting with accurate hierarchies - which means we are able to upgrade to 2.6.1 (woo). 

Many thanks to Emily for her quick testing and flagging this issue for us.

Take care,
Kelli

Kelli Babcock | she / her
Digital Initiatives Librarian 
Information Technology Services
University of Toronto Libraries
130 St. George Street | Toronto, Ontario 

Dan Gillean

Feb 4, 2021, 9:13:22 AM
to ICA-AtoM Users
Hi Emily and Kelli, 

Thanks for sharing your current workaround, Kelli. 

Emily - yes, issue #13414 describes the bug you encountered in your import tests. I'll include a bit of context and next steps below. 

Background

In 2.5 we found that running the nested set update on a per-row basis was causing major performance degradation, leading to timeouts and/or database deadlocks. With our move to MySQL 8, there is a new type of query (WITH queries, or Common Table Expressions) that can potentially replace the use of a nested set model to manage hierarchical relationships in a relational database - and because the nested set has often been the source of performance issues, we began the process of reducing our dependence on the nested set and replacing it where possible, with the goal of eventually removing its use from AtoM entirely. 
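For readers unfamiliar with WITH queries, here is a small self-contained sketch of a recursive Common Table Expression walking a hierarchy using only a parent pointer. It runs against an in-memory SQLite database through Python's sqlite3 module so it can be tried anywhere; the table and column names (description, parent_id) are invented for the example and are not AtoM's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE description (id INTEGER PRIMARY KEY, parent_id INTEGER, title TEXT);
INSERT INTO description VALUES
  (1, NULL, 'Fonds'),
  (2, 1, 'Series A'),
  (3, 2, 'File f1'),
  (4, 2, 'File f2'),
  (5, 1, 'Series B');
""")

# Start from one record, then repeatedly join children onto the rows found so far.
rows = conn.execute("""
WITH RECURSIVE tree(id, title, depth) AS (
    SELECT id, title, 0 FROM description WHERE id = ?
    UNION ALL
    SELECT d.id, d.title, tree.depth + 1
    FROM description AS d JOIN tree ON d.parent_id = tree.id
)
SELECT id, title, depth FROM tree ORDER BY id
""", (1,)).fetchall()

for row in rows:
    print(row)
```

Note that the query finds every descendant and its depth without any lft or rgt values, but it says nothing about the order of siblings. The ORDER BY id here is only to make the output predictable; a real replacement for the nested set would still need some explicit ordering column.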

Issue #13354 describes the errors we were trying to address with the CSV import, and the solution we implemented. Unfortunately, this has had the unintended side effect of breaking the CSV import ordering: normally, as the nested set updates, each description is given a lft value that is used to maintain order. By running the nested set update at the end of the task, the performance of the import is greatly improved... but the side effect is that there's no ordering criteria available at the end of the process. 
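To illustrate why a rebuild at the end of the task needs an ordering criterion, here is a simplified Python sketch of rebuilding lft/rgt values from parent links alone. It is not the AtoM implementation; the function name, the sibling_key parameter, and the sample data are assumptions made for the example.

```python
from collections import defaultdict


def rebuild_nested_set(parents: dict, sibling_key):
    """Assign (lft, rgt) to every node from parent links.

    parents maps node id -> parent id (None for the root).
    sibling_key decides the order of children under the same parent.
    """
    children = defaultdict(list)
    root = None
    for node, parent in parents.items():
        if parent is None:
            root = node
        else:
            children[parent].append(node)

    bounds = {}
    counter = 1

    def visit(node):
        nonlocal counter
        lft = counter
        counter += 1
        # The parent links alone do not say which sibling comes first;
        # that information has to come from sibling_key.
        for child in sorted(children[node], key=sibling_key):
            visit(child)
        bounds[node] = (lft, counter)
        counter += 1

    visit(root)
    return bounds


parents = {"series": None, "f1": "series", "f2": "series", "f10": "series"}
import_row = {"series": 0, "f1": 1, "f2": 2, "f10": 3}  # hypothetical CSV row numbers

by_row = rebuild_nested_set(parents, sibling_key=import_row.get)
by_name = rebuild_nested_set(parents, sibling_key=str)
print(by_row)
print(by_name)
```

When the CSV row number is available as the key, f10 ends up last, as in the file. When only the identifier is available, f10 is placed between f1 and f2. If nothing at all is recorded about the original row order, the rebuild has no way to recover it.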

Our long-term goal is still to remove AtoM's use of the nested set entirely (as per issue #13240). However, we didn't anticipate the knock-on effects of doing this work in parts, and will need to reconsider how we can address this bug without undoing the performance gains from #13354.

Next steps

Given that yours is the third institution now to bring this issue to our attention in the forum, we are bumping up priority on this issue and investigating how we might provide a fix in the 2.6.3 release. 

Ideally this solution will not involve reinstating the nested set build with every row of the import, as this will also bring back the import performance issues we were trying to resolve. We can likely craft a solution by adding new database columns to be used for maintaining sort order during an import - but typically we aim to avoid database schema changes in minor bugfix releases, which would position any such fix for 2.7. 

Our team will review this issue further with 2.6.3 in mind, aiming to find something that balances performance, release requirements, and restoring import ordering. I will try to post an update on this thread if and when we find an acceptable interim solution for 2.6.3. 

In the meantime, if you are re-enabling the nested set build during the import process, we recommend avoiding large imports as much as possible, and trying to break up large CSV files with thousands of rows into multiple imports. 
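One possible way to break up a large file is sketched below in Python. It is a generic CSV splitter, not an AtoM tool: the function name split_csv and the sample data are made up, and it keeps rows in their original order so that parents still come before their children.

```python
import csv
import io


def split_csv(csv_text: str, rows_per_file: int) -> list:
    """Split CSV text into several CSV texts, repeating the header in each one."""
    if rows_per_file < 1:
        raise ValueError("rows_per_file must be at least 1")

    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)

    # Group the data rows into consecutive chunks, preserving their order.
    chunks, current = [], []
    for row in reader:
        current.append(row)
        if len(current) == rows_per_file:
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)

    outputs = []
    for chunk in chunks:
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(chunk)
        outputs.append(out.getvalue())
    return outputs


sample = "legacyId,parentId,title\n1,,Series\n2,1,f1\n3,1,f2\n4,1,f3\n5,1,f4\n"
parts = split_csv(sample, rows_per_file=2)
for part in parts:
    print(part)
```

The parts would need to be imported in sequence. Because the source name defaults to the file name (as noted earlier in this thread), it is worth checking on a test instance that child rows in later files still attach to parents imported from earlier ones.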

More soon, hopefully! Thanks again for raising this issue and letting us know how you've been working around it locally. 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
he / him

Dan Gillean

Feb 12, 2021, 5:26:29 PM
to ICA-AtoM Users
Hi again Emily and Kelli, 

Just wanted to give a short update on this thread: our developers have found a solution for the import sort order bug that doesn't require database schema changes - meaning we can put the fix in the upcoming 2.6.3 release. 

See: 
It's currently going through internal code review. Once that's complete, we'll do some feature testing before adding the commit to our stable branch for eventual regression testing, packaging, and release of 2.6.3.

Thanks for bringing this up again! 

Cheers, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
he / him
