Unmatched double quotes result in error


UNBC Library

Mar 15, 2016, 7:11:21 PM
to ICA-AtoM Users
Hi there;

I have noticed that our AtoM instance gives users a 500 server error when the search string contains an unmatched double quote character ( " ), i.e. a single double quote `"`, if that makes any sense. :)

This happens whether the single 'double' quote is contained within another string or is alone in the search field. When the double quote character is used as part of a matched pair, it functions as expected. Likewise, if the search contains only a single apostrophe character ( ' ), no 500 server error is observed.

I am using elasticsearch version 1.7.5, running with AtoM version 2.2, on Ubuntu Jessie.

In the logs I see mention of Apache Lucene throwing a parse exception. An example of this is in the text attachment.
 
Any ideas how I might be able to avoid this issue? Has anyone else found that AtoM exhibits this behavior? Or is there some configuration item I can set to escape this character in the NGINX settings or AtoM settings?

I understand that having unmatched quotes will be a parse error - but I would like to prevent AtoM from displaying a 500 error when this happens.

Thank you for your consideration folks!

Brad
AtoM-query-lucene-error.txt

Dan Gillean

Mar 15, 2016, 7:21:36 PM
to ICA-AtoM Users
Hi Brad,

My guess is that this is due to reserved characters in Elasticsearch, as you have surmised - so the parse error is not unexpected. ES expects 2 double-quotes to be able to parse the query as an operator, and freaks out when it doesn't find the closing quote. Out of curiosity, do you have records that include a single double quote in the title?

I will ask a developer to take a look at this thread, but I am guessing that avoiding a 500 error would require development at this time - either you would need to sanitize the search (thereby possibly altering the results?), or add some kind of new Error page in AtoM that would be returned with more information, and whatever would be needed for the search query to be intercepted and redirected to the warning page before a parse error is actually caused. It's possible there are further changes you could make in ES itself (further analysers etc) that might better handle this case; I would have to do more digging around in the ES documentation to determine this.

Either way, I think code changes would be required to achieve a better outcome at this point. If you'd like me to get the developers to offer some suggestions, let me know and I'll get them on the thread to point you in the right direction.

Cheers,

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory


UNBC Library

Mar 16, 2016, 7:32:32 PM
to ICA-AtoM Users
Thanks for your quick reply Dan!

How do I go about getting the list of records so I can review for that character in the title of the record? I've been trying to do this by reading the schema directly, but a more efficient approach may be to ask you folks for a link to the documentation if such exists.

It would be undesirable to filter the search string to remove the (") character I think - as you say that would most likely alter the results. I've been reading in the ES documentation trying to discover a way to do this "right" - which would likely involve "escaping" those characters so they can be parsed effectively, but I haven't found the correct approach just yet.
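For illustration, here is a rough sketch of what that escaping could look like. It is written in Python rather than AtoM's own code, and the function name `escape_unmatched_quote` is hypothetical; it is not anything AtoM or Elasticsearch provides. It relies only on the Lucene query syntax rule that a backslash escapes a reserved character. If the query contains an odd number of unescaped double quotes, the last one is escaped so it is treated as a literal character instead of the start of a phrase.

```python
def escape_unmatched_quote(query: str) -> str:
    """Escape the last double quote when the query has an odd number of them.

    Hypothetical helper for illustration only; not part of AtoM or Elasticsearch.
    """
    positions = []  # indexes of double quotes that are not already escaped
    i = 0
    while i < len(query):
        if query[i] == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if query[i] == '"':
            positions.append(i)
        i += 1

    if len(positions) % 2 == 0:
        return query  # quotes are balanced, leave the query untouched

    last = positions[-1]
    # Replace the dangling quote with a backslash-escaped quote.
    return query[:last] + '\\"' + query[last + 1:]


if __name__ == "__main__":
    for sample in ['"', 'fonds "north', '"exact phrase"']:
        print(repr(sample), "->", repr(escape_unmatched_quote(sample)))
```

Matched pairs pass through unchanged, so phrase searches would keep working; only the dangling quote is altered. Whether escaping the last quote (rather than the first) gives the most sensible results is a judgment call.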

I appreciate your help Dan!

Brad Dondale

Mike G

Mar 16, 2016, 7:48:47 PM
to ICA-AtoM Users
To find records with quotes in the title, it'll be easier to just query the MySQL database I think.

SELECT io.title, s.slug FROM information_object_i18n io JOIN slug s ON io.id=s.object_id WHERE io.title like '%"%';

That will spit out any records with " in the title, as well as their corresponding slugs.
Hope this helps!
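If it helps to see the shape of the result before running this against a live database, below is a small self-contained sketch in Python. It uses an in-memory SQLite database as a stand-in for MySQL, and the two tables are cut down to just the columns the query touches (the real AtoM tables have more columns). The sample rows and the `titles_with_quote` function name are made up for the example.

```python
import sqlite3

# In-memory stand-in for the AtoM database (assumption: simplified tables).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE information_object_i18n (id INTEGER, title TEXT);
    CREATE TABLE slug (object_id INTEGER, slug TEXT);
    """
)
conn.executemany(
    "INSERT INTO information_object_i18n VALUES (?, ?)",
    [(1, 'The "Red" fonds'), (2, "Plain title"), (3, '6" ruler drawings')],
)
conn.executemany(
    "INSERT INTO slug VALUES (?, ?)",
    [(1, "the-red-fonds"), (2, "plain-title"), (3, "6-ruler-drawings")],
)


def titles_with_quote(connection):
    """Return (title, slug) pairs whose title contains a double quote."""
    sql = (
        "SELECT io.title, s.slug "
        "FROM information_object_i18n io "
        "JOIN slug s ON io.id = s.object_id "
        "WHERE io.title LIKE ? "
        "ORDER BY s.slug"
    )
    # The LIKE pattern is passed as a parameter to avoid quoting trouble.
    return connection.execute(sql, ('%"%',)).fetchall()


if __name__ == "__main__":
    for title, slug in titles_with_quote(conn):
        print(slug, "|", title)
```

The `ORDER BY` is only there to make the output order predictable; the join and the `LIKE '%"%'` filter are the same as in the query above.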