Some corner of a foreign field


John Hewson

Sep 20, 2017, 10:27:59 AM
to AtoM Users
Excuse me if this is documented somewhere I haven't noticed:

While testing an upgrade to AtoM 2.4, I noticed that searching in _all fields no longer includes creator history or notes. For us, at least, this is undesirable. It seems that "which foreign type i18n fields we'll include when searching _all" is now set for an information_object in mapping.yml, so I added creators.history and generalNotes.content to i18nIncludeInAll and that appears to have fixed it. Alternatively, if _all means _all, I guess one could comment out the return in the handleI18nStringFields function in arElasticSearchPluginUtil.class.php, but there's probably a more elegant way of doing that.

Dan Gillean

Sep 20, 2017, 12:44:00 PM
to ICA-AtoM Users
Hi John, 

The reason for this change is best outlined in the following issue ticket: 
There was a related issue with repository records as well: 
Essentially, if you have the word "house" in your authority record, then when searching "house" in archival descriptions, you would end up returning any record that links this as either a creator or a name access point - which could be hundreds of descriptions, depending on the use of the authority record - not one of which will include the word "house" directly. This was leading to confusing and less accurate search results. This issue was even more egregious with repository records - if "house" is in the repository record, then essentially every description linked to that repository was being returned. 

Part of the rationale, beyond what is described in the tickets, was that when you hit return on a search in the global search box, you get archival descriptions, so users expect to find records that contain their search terms. The institution that sponsored the search overhaul was seeing a lot of noise in search results and wanted them to be more accurate. Don't forget that each entity type also has its own dedicated search box on the corresponding browse page. 

The details of all of the search changes are best captured in this development ticket: 
We've also updated the Advanced search documentation with a lot more details - see especially here: 
I'm NOT a developer, but it does look to me like making these changes in the Elasticsearch mapping.yml file is probably the way to go:
In case it helps, here is the original pull request where these changes were made: 
Another option, if you end up deciding not to include these in _all based on the above, might be to tailor how the global autocomplete drop-down results appear (i.e. when all matching entities are shown before you hit enter in the global search box). Right now, I believe this is only going on authorizedFormOfName for authority records, but you could potentially make this include the actor history, so matches will appear in the autocomplete without affecting the archival description results directly. Again, not sure if these are all the right places, but see: 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory

--
You received this message because you are subscribed to the Google Groups "AtoM Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ica-atom-users+unsubscribe@googlegroups.com.
To post to this group, send email to ica-atom-users@googlegroups.com.
Visit this group at https://groups.google.com/group/ica-atom-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/ica-atom-users/41b582ab-8c08-4580-bd8b-bff5ec80fdd7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

John Hewson

Sep 21, 2017, 4:14:09 AM
to AtoM Users
Hi Dan,
Thanks for your rapid reply. I agree with your reasoning that "When an actor is added as a creator, we should be indexing the biographical/administrative history, because users find it displayed in the body of the archival description view page, and expect to be able to search on that information." The same is, I think, true of general notes. For us, both of these fields return a significant number of relevant results in common searches. Unless we hack mapping.yml to add these fields back to _all (as above), the 'bugs' fixed in 2.4 have been replaced by what is for us a more significant 'bug'. I understand that there are issues with setting include_in_all false on the problem fields, but ideally one would specify the exclusions rather than the inclusions.

Dan Gillean

Sep 21, 2017, 11:48:00 AM
to ICA-AtoM Users
Hi John, 

Thanks for bringing this to our attention, and sharing your well reasoned thoughts. 

I was not aware that notes were no longer returning global search results, but I've tested and confirmed this in 2.4. I've filed a bug ticket for this here, and asked our team to see if I can get a full list of what is included in global search results - I'm not a developer so staring at the mapping.yml file only gets me so far. 
As to creators and histories, you bring up valid points, and I think it's possible the original issue we were trying to solve got a bit lost in all the changes made. I would love to hear from more of the community on this, but in the meantime, I have filed a separate ticket for considering changing this here: 
Ideally, we can index actor histories in _all only when they are directly linked as a creator - not inherited at lower levels, and not when only added as a name access point. In such a case, I think it would be better to return matching results, since the history is pulled into the description, and users will be able to see why the result was returned by looking at the description. I have no idea what's possible vs what limitations we were facing when this was originally implemented this way for 2.4, but if the above conditions can be met, I think it would be a good idea to correct this. 

Anyone else following this thread have an opinion on this they would like to share?


Note that in the meantime, you *can* still target searches on notes fields by querying the field directly, like so:

generalNotes.i18n.en.content:term 

or 

generalNotes.i18n.en.content:"search term"

...where term and "search term" represent whatever you want to search for. If the culture you want to search in is not English, you can swap out the "en" in the field name for the relevant 2-letter ISO language code. More instructions on searching Elasticsearch fields directly in AtoM here: 
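
For anyone scripting these searches, here is a minimal Python sketch of assembling a field query in the pattern shown above. The helper name field_query and its parameters are made up for illustration and are not part of AtoM; it only builds the query string, wraps multi-word terms in quotes, and does not escape any special query characters.

```python
def field_query(term: str, culture: str = "en",
                field: str = "generalNotes", subfield: str = "content") -> str:
    """Build a field-targeted query such as generalNotes.i18n.en.content:term.

    Hypothetical helper for illustration only; not an AtoM function.
    """
    # Terms containing whitespace are wrapped in double quotes so they
    # are searched as a phrase, matching the second example above.
    needs_quotes = any(ch.isspace() for ch in term)
    value = f'"{term}"' if needs_quotes else term
    # The culture is the 2-letter ISO language code, e.g. "en" or "fr".
    return f"{field}.i18n.{culture}.{subfield}:{value}"


print(field_query("term"))
print(field_query("search term"))
print(field_query("maison", culture="fr"))
```

The resulting string would then be pasted into the search box like the examples above.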

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory

John Hewson

Sep 22, 2017, 4:08:20 AM
to AtoM Users
Thanks Dan,
From the admittedly non-exhaustive testing I did before posting the suggested fix, and from the way the code appears to work, creators.history is indexed only when the actor is "directly linked as a creator - not inherited at lower levels, and not when only added as a name access point". So, adding creators.history and generalNotes.content to i18nIncludeInAll appears to work as desired. However, it might also be worth considering reversing the logic of handleI18nStringFields so that it skips explicitly excluded fields rather than keeping only explicitly included ones.
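
To make the difference between the two approaches concrete, here is a rough Python sketch. It is not the actual handleI18nStringFields code (which is PHP and which this sketch does not try to reproduce); the function names and every field name other than creators.history and generalNotes.content are assumptions made up for illustration.

```python
# Foreign-type i18n fields that could be searched via _all.
# Only the first two names come from this thread; the rest are hypothetical.
FOREIGN_I18N_FIELDS = [
    "creators.history",
    "generalNotes.content",
    "repository.history",      # hypothetical
    "someNewField.content",    # hypothetical field added in a later release
]


def fields_by_inclusion(fields, include_in_all):
    """Inclusion list: a field is searched only if it is explicitly listed."""
    return [f for f in fields if f in include_in_all]


def fields_by_exclusion(fields, exclude_from_all):
    """Exclusion list: a field is searched unless it is explicitly listed."""
    return [f for f in fields if f not in exclude_from_all]


# With an inclusion list, anything not named is silently left out.
included = fields_by_inclusion(
    FOREIGN_I18N_FIELDS, {"creators.history", "generalNotes.content"}
)

# With an exclusion list, only the known noisy field is dropped and
# newly added fields are searched by default.
excluded = fields_by_exclusion(FOREIGN_I18N_FIELDS, {"repository.history"})

print(included)
print(excluded)
```

The trade-off is which mistake is the default: an inclusion list can quietly drop a useful field from results, while an exclusion list can quietly let a noisy one in.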