[DuraSpace JIRA] (DS-4135) Citation_* tags should contain plain UTF-8, without escaping

Bram Luyten (Atmire) (DuraSpace JIRA)

Jan 5, 2019, 8:13:01 AM
to dspace-...@googlegroups.com
Bram Luyten (Atmire) created an issue
 
DSpace / Bug DS-4135
Citation_* tags should contain plain UTF-8, without escaping
Issue Type: Bug
Affects Versions: 5.10, 6.3, 4.9
Assignee: Unassigned
Created: 05/Jan/19 7:12 AM
Fix Versions: 6.4, 5.11, 4.10
Priority: Minor
Reporter: Bram Luyten (Atmire)

The following was reported by folks at Google Scholar.

Observed behaviour

Look at an item where one of the authors has special characters in the name, for example:

http://demo.dspace.org/xmlui/handle/10673/314

Open "view source" and go to the HEAD section, where the META tags are exposed. Compare the author value in DC.creator with the citation_* tags, such as citation_author.

In DC.creator, it's correct:

meta name="DC.creator" content="ètésteinî" 

In citation_author, unwanted escaping is added.

meta content="&egrave;t&eacute;stein&icirc;" name="citation_author"

Desired behaviour

meta name="DC.creator" content="ètésteinî" 

meta content="ètésteinî" name="citation_author"

This message was sent by Atlassian JIRA (v7.10.0#710001-sha1:0399717)

Bram Luyten (Atmire) (DuraSpace JIRA)

Jan 5, 2019, 8:14:01 AM
to dspace-...@googlegroups.com
Bram Luyten (Atmire) updated an issue
The following was reported by folks at Google Scholar.

*Observed behaviour*


Look at an item where one of the authors has special characters in the name, for example:

[http://demo.dspace.org/xmlui/handle/10673/314]

Open "view source" and go to the HEAD section, where the META tags are exposed. Compare the author value in DC.creator with the citation_* tags, such as citation_author.

In DC.creator, it's correct:

meta name="DC.creator" content="ètésteinî" 

In citation_author, unwanted escaping is added.
{code:java}
<meta content="&egrave;t&eacute;stein&icirc;" name="citation_author">
{code}
*Desired behaviour*


meta name="DC.creator" content="ètésteinî" 

meta content="ètésteinî" name="citation_author"
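
To make the difference concrete, here is a minimal sketch (not part of the ticket) using Python 3's standard `html` module. It only shows that the escaped value and the plain UTF-8 value are the same string once the entities are decoded; the variable names are illustrative.

```python
import html

# Value as it appears in citation_author (named character references).
escaped = "&egrave;t&eacute;stein&icirc;"

# Value as it appears in DC.creator, written with escape sequences
# so the comparison does not depend on the editor's encoding: ètésteinî
plain = "\u00e8t\u00e9stein\u00ee"

# Decoding the entities gives back the plain UTF-8 text.
decoded = html.unescape(escaped)
print(decoded == plain)  # True
```

A consumer that decodes entities sees the same author name either way, but the ticket asks for the plain form so that no decoding step is needed.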

Bram Luyten (Atmire) (DuraSpace JIRA)

Jan 5, 2019, 9:48:01 AM
to dspace-...@googlegroups.com
Bram Luyten (Atmire) commented on Bug DS-4135
 
Re: Citation_* tags should contain plain UTF-8, without escaping

Pull request for DSpace-6_x:

https://github.com/DSpace/DSpace/pull/2317

Writing the tags as text was the only approach I could find, because XSLT does not allow disabling output escaping on attributes by default; see:

https://stackoverflow.com/questions/2921123/xsl-how-to-disable-output-escaping-for-an-attribute

Bram Luyten (Atmire) (DuraSpace JIRA)

Mar 19, 2019, 5:20:01 AM
to dspace-...@googlegroups.com

Recap of the GitHub PR discussion:

  1. JSPUI is not affected
  2. Mirage 1 is affected & I added the fix there as well

After the merge for 6.x, this still remains to be fixed in 5.x and 4.x.

Alan Orth (DuraSpace JIRA)

Feb 26, 2020, 8:06:01 AM
to dspace-...@googlegroups.com
Alan Orth commented on Bug DS-4135

Good catch, Bram Luyten (Atmire)! I've just tested the 6.4 patch with one item in our repository with non-ASCII (Vietnamese) text and confirm that it works.

Before:

<meta content="Thu hoạch v&agrave; bảo quản c&agrave; ph&ecirc; ch&egrave; đ&uacute;ng kỹ thuật (Harvesting and storing Arabica coffee)" name="citation_title">

After:

<meta name="citation_title" content="Thu hoạch và bảo quản cà phê chè đúng kỹ thuật (Harvesting and storing Arabica coffee)" />
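
A check like the one above can be scripted against the raw page source. The following is a rough sketch, assuming the tags are written with double-quoted attributes as in the two samples; the function name `find_escaped_citation_tags` and the regular expressions are illustrative, not part of DSpace.

```python
import re

META_TAG = re.compile(r"<meta\b[^>]*>", re.IGNORECASE)
NAME_ATTR = re.compile(r'name="(citation_[^"]*)"', re.IGNORECASE)
CONTENT_ATTR = re.compile(r'content="([^"]*)"', re.IGNORECASE)
NAMED_ENTITY = re.compile(r"&[A-Za-z][A-Za-z0-9]*;")

# Entities that are legitimately needed inside an attribute value.
STRUCTURAL = {"&amp;", "&lt;", "&gt;", "&quot;", "&apos;"}


def find_escaped_citation_tags(page_source):
    """Return (tag name, entities) for citation_* tags with escaped text."""
    findings = []
    for tag in META_TAG.findall(page_source):
        name = NAME_ATTR.search(tag)
        content = CONTENT_ATTR.search(tag)
        if not name or not content:
            continue
        entities = [
            entity
            for entity in NAMED_ENTITY.findall(content.group(1))
            if entity not in STRUCTURAL
        ]
        if entities:
            findings.append((name.group(1), entities))
    return findings


BEFORE = (
    '<meta content="Thu hoạch v&agrave; bảo quản c&agrave; ph&ecirc; '
    'ch&egrave; đ&uacute;ng kỹ thuật" name="citation_title">'
)
AFTER = (
    '<meta name="citation_title" '
    'content="Thu hoạch và bảo quản cà phê chè đúng kỹ thuật" />'
)

print(find_escaped_citation_tags(BEFORE))
print(find_escaped_citation_tags(AFTER))
```

The scan works on the raw source on purpose: an HTML parser would decode the entities before the attribute value could be inspected.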

Alan Orth (LYRASIS JIRA)

Jul 17, 2021, 5:49:01 AM
to dspace-...@googlegroups.com
Alan Orth commented on Bug DS-4135

There is a pull request that partially fixes this for some use cases:

https://github.com/DSpace/DSpace/pull/2317

I tested the pull request with 6.4-SNAPSHOT and it fixes non-ASCII (Vietnamese) text in the citation tags, though it was pointed out that text with quotes needs to be escaped. I proposed merging this for now and creating a new issue for the quotation bug.
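
To illustrate the quotation issue: when a tag is written out as text, non-ASCII characters can stay as plain UTF-8, but characters that are structural in an attribute (`&`, `<`, `"`) still need escaping. The sketch below is a Python illustration of that rule only, not the XSLT from the pull request; `citation_meta` is a hypothetical helper, and `html.escape` is from the standard library.

```python
import html


def citation_meta(name, value):
    """Build a meta tag as text, escaping only markup-significant characters.

    html.escape replaces &, <, > and (with quote=True) both quote
    characters; it leaves non-ASCII text such as accented letters untouched.
    """
    return '<meta name="{}" content="{}" />'.format(
        name, html.escape(value, quote=True)
    )


# Non-ASCII text passes through unchanged.
print(citation_meta("citation_author", "\u00e8t\u00e9stein\u00ee"))

# Quotes and ampersands are escaped so the attribute stays well-formed.
print(citation_meta("citation_title", 'Say "hi" & go'))
```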
