EAD export issues

Bojan Marinkovic

Feb 15, 2016, 11:50:04 AM
to ICA-AtoM Users
Dear all,

I have seen that you are already aware of the issues with special characters during the export of EAD files from AtoM, and that you are planning to solve this in upcoming releases.

I just wanted to mention a couple of interesting situations that occur in the current version:

- If you replace & with &amp; you can solve this issue for a short period. But if you then edit some other field of the same description, the &amp; becomes a plain & again. (The final solution should be to check for special characters just before exporting and to replace each one with its "code", for example using:

function replace_special_characters($str)
{
    // Escape & first so the entities added below are not escaped again.
    $str = str_replace('&', '&amp;', $str);
    $str = str_replace('"', '&quot;', $str);
    $str = str_replace("'", '&apos;', $str);
    $str = str_replace('<', '&lt;', $str);
    $str = str_replace('>', '&gt;', $str);

    return $str;
}

)

- There is no check for whether the content of a field is empty, certainly not for the bioghist field (an empty field should be skipped on export).
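To illustrate the empty-field point, here is a minimal sketch of an export helper that emits nothing when a field has no content. The function name ead_element is hypothetical and this is not AtoM's actual export code; it only assumes PHP's built-in htmlspecialchars().

```php
<?php
// Hypothetical helper for illustration only, not AtoM's export code.
// Returns an empty string for empty fields so no empty element is written.
function ead_element(string $tag, ?string $value): string
{
    if ($value === null || trim($value) === '') {
        return '';
    }

    // Escape &, <, >, " and ' so the element content is valid XML.
    $escaped = htmlspecialchars($value, ENT_XML1 | ENT_QUOTES, 'UTF-8');

    return "<{$tag}>{$escaped}</{$tag}>";
}

echo ead_element('bioghist', '   ');                         // prints nothing
echo ead_element('bioghist', 'Founded by Smith & Sons'), PHP_EOL;
// prints <bioghist>Founded by Smith &amp; Sons</bioghist>
```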

Kind regards
Bojan

Dan Gillean

Feb 16, 2016, 2:32:16 PM
to ICA-AtoM Users
Hi Bojan,

Interesting - I see what you mean! I tested in our 2.3 version - on the first save &amp; remained escaped and visible literally as &amp; in the user interface. After editing another field, the character escaping actually disappeared from the edit field, and was replaced by a literal "&" character again! Hmmm.

Even stranger - I created the attached sample file some time ago, so there are some import warnings because our EAD mappings have changed, but it imports. In the EAD file, I populated every field with a string of special characters for testing, including both a literal & and the escaped &amp;, as well as &lt; and &gt;. During import, both the literal and escaped ampersands were removed, as were the escaped greater-than and less-than symbols!

I have filed the following issue ticket for review and discussion:

I would be very interested to hear more from you (and others) about the ideal behavior here. The general application escaping for security reasons introduced in #7647 (in AtoM 2.2) will not be removed, but all the other tickets I listed as related in #9448 relate to character escaping for EAD roundtripping, and could likely be modified.

Should character escapes (such as &amp;) be interpreted and displayed as the character they represent in the user interface? Should non-escaped characters be automatically converted upon saving? During import, should we swap unescaped characters for escaped ones? etc.

Keep in mind as well that we always welcome pull requests, if you would like to try to resolve this issue! :D Though I've filed a ticket, I can offer no guarantees that we will be able to address it for the next public release without community involvement or sponsorship.

Cheers,

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory


escaping-test-fonds;ead.xml

Natasa Bulatovic

Feb 18, 2016, 11:50:44 AM
to ica-ato...@googlegroups.com
Hi Dan, Bojan,

Ultimately, the EAD export has to be valid XML. We encountered invalid XML when characters were not escaped properly. This is a problem for other services that expect valid XML :) and therefore reject the invalid file for further processing.

* So if one enters "&amp;" via the user interface, one should get back "&amp;" in the user interface and "&amp;amp;" in the XML export.
* If one enters "&" via the user interface, one should get back "&" in the user interface and "&amp;" in the XML export.
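The two cases above can be sketched with PHP's built-in htmlspecialchars(), which escapes every ampersand by default, including one that already starts an entity. The wrapper name escape_for_xml is hypothetical, and this is only an illustration of the expected behaviour, not how AtoM currently exports.

```php
<?php
// Hypothetical wrapper for illustration; only htmlspecialchars() is a real PHP function.
// The stored value is treated as literal text and escaped once, at export time.
function escape_for_xml(string $stored): string
{
    return htmlspecialchars($stored, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

echo escape_for_xml('&amp;'), PHP_EOL; // prints &amp;amp;
echo escape_for_xml('&'), PHP_EOL;     // prints &amp;
```

Because the value is escaped exactly once, at export, what the user typed is what an XML parser will read back out of the file.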

There are fairly standard methods for escaping dangerous HTML elements submitted in form fields, and for ensuring proper values in the database (which solves the security issue).
For Java apps, I was quite happy with the JSoup library; for PHP one can surely find similar libraries.
You might also consider a more restrictive workflow: reject any ingest (save) if the input contains insecure tags, ask users to correct it, and do nothing automatically.
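As a rough sketch of that restrictive workflow, the check below refuses a value that appears to contain a tag and returns a message for the user instead of changing anything. The function names and the regular expression are assumptions made for illustration; the pattern is a crude heuristic that can produce false positives, not a real HTML parser.

```php
<?php
// Hypothetical validator for illustration only.
// Heuristic: "<", an optional "/", a letter, then anything up to ">".
function contains_markup(string $input): bool
{
    return preg_match('/<\/?[A-Za-z][^>]*>/', $input) === 1;
}

// Rejects the input rather than rewriting it; the caller shows the error to the user.
function validate_field(string $input): array
{
    if (contains_markup($input)) {
        return ['ok' => false, 'error' => 'Please remove HTML tags from this field.'];
    }

    return ['ok' => true, 'error' => null];
}

var_dump(validate_field('<script>alert(1)</script>')['ok']); // bool(false)
var_dump(validate_field('Smith & Sons')['ok']);              // bool(true)
```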

Cheers
Natasa

--
Natasa Bulatovic
Max Planck Digital Library (MPDL)
Amalienstrasse 33
80799 Munich, Germany
http://www.mpdl.mpg.de

e-Mail: bula...@mpdl.mpg.de
phone: +49-89-38602-223
fax: +49-89-38602-280 

Dan Gillean

Feb 18, 2016, 3:26:21 PM
to ICA-AtoM Users
Hi Natasa,

Thanks for this! 

I agree with you about your use cases. The security escaping we are doing as of 2.2 should remain in place - I would not want to add exceptions so that character escapes display as literals in the user interface. If a user adds &amp; in the edit forms, they should see &amp; when they save, and get &amp;amp; in the EAD XML. Literals should be escaped at export time, however. We do need to consider converting proper character escapes in EAD into literals during import, though, so that they display correctly in the UI after the import is complete - if the user exports again, they will be escaped properly by AtoM when the EAD is generated.
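A minimal sketch of that round trip, assuming PHP's DOM extension is available: the XML parser turns escapes into literals when the file is read, and escaping once at export restores them. The element and sample text are made up for illustration, and this is not AtoM's actual import code.

```php
<?php
// Illustrative round trip only; the sample element and text are invented.
$doc = new DOMDocument();
$doc->loadXML('<unittitle>Smith &amp; Sons &lt;1900&gt;</unittitle>');

// Import: the parser has already decoded the escapes into literal characters.
$literal = $doc->documentElement->textContent;
echo $literal, PHP_EOL;  // prints Smith & Sons <1900>

// Export: escape the literal text once when the EAD is generated.
$exported = htmlspecialchars($literal, ENT_XML1 | ENT_QUOTES, 'UTF-8');
echo $exported, PHP_EOL; // prints Smith &amp; Sons &lt;1900&gt;
```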

I've added your comments to the public issue ticket I made - thanks again for your input. We have included a number of standard security enhancements for what you've referred to as "dangerous" HTML elements in the 2.2 release - I think it was the process of adding per-case exceptions to this for character escaping and XML generation, rather than a consistent global solution, that led to the current bug.

Cheers, 

Dan Gillean, MAS, MLIS
