Issue with harvested items with Thai characters in filenames


euler

Oct 14, 2022, 12:14:12 AM
to DSpace Technical Support
Dear All,

I have an issue with items harvested from one of the repositories that I am working on. The items have bitstreams with Thai characters in their filenames. The bitstreams download without any problem from the source repository, but in the harvesting repository the Thai characters in the bitstream link are converted to question marks when the link is clicked.

To make it clear, here is the item from the source or original repository: https://repository.seafdec.or.th/handle/20.500.12067/1677

And here is the link to the harvested item: https://repository.seafdec.org/handle/20.500.12066/6814

I tried cleaning the OAI cache of the source repository and running dspace oai import LANG=en_US.UTF-8. I did this after editing the item to trigger an item update in OAI, but the issue still persists.

UTF-8 encoding is enabled in Tomcat in both the source and the harvesting repositories (DSpace 6.3 XMLUI).


I believe this is where the harvesting repository gets its link for the bitstreams.

Notice that the Thai text ปะการังเทียมลอยน้ำ_ผลสำเร็จ.pdf was URL encoded into %e0%b8%9b%e0%b8%b0%e0%b8%81%e0%b8%b2%e0%b8%a3%e0%b8%b1%e0%b8%87%e0%b9%80%e0%b8%97%e0%b8%b5%e0%b8%a2%e0%b8%a1%e0%b8%a5%e0%b8%ad%e0%b8%a2%e0%b8%99%e0%b9%89%e0%b8%b3_%e0%b8%9c%e0%b8%a5%e0%b8%aa%e0%b8%b3%e0%b9%80%e0%b8%a3%e0%b9%87%e0%b8%88.pdf
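
For anyone who wants to check the encoding, here is a minimal Python sketch (not DSpace code). It decodes the percent-encoded name above as UTF-8 and re-encodes it to confirm that the link itself is valid UTF-8. The helper names decode_utf8, encode_utf8 and through_ascii are made up for illustration. The last helper only simulates one possible way the question marks could appear, namely a step that cannot represent non-ASCII characters. I have not confirmed that this is what actually happens during harvesting.

```python
from urllib.parse import quote, unquote

# Percent-encoded filename exactly as quoted in this post.
ENCODED_IN_POST = (
    "%e0%b8%9b%e0%b8%b0%e0%b8%81%e0%b8%b2%e0%b8%a3%e0%b8%b1%e0%b8%87"
    "%e0%b9%80%e0%b8%97%e0%b8%b5%e0%b8%a2%e0%b8%a1%e0%b8%a5%e0%b8%ad"
    "%e0%b8%a2%e0%b8%99%e0%b9%89%e0%b8%b3_%e0%b8%9c%e0%b8%a5%e0%b8%aa"
    "%e0%b8%b3%e0%b9%80%e0%b8%a3%e0%b9%87%e0%b8%88.pdf"
)


def decode_utf8(encoded: str) -> str:
    """Percent-decode as UTF-8; raises UnicodeDecodeError if the bytes are not valid UTF-8."""
    return unquote(encoded, encoding="utf-8", errors="strict")


def encode_utf8(name: str) -> str:
    """Percent-encode a filename as UTF-8 (Python emits upper-case hex digits)."""
    return quote(name, safe="")


def through_ascii(name: str) -> str:
    """Hypothetical lossy step: every non-ASCII character becomes '?'."""
    return name.encode("ascii", errors="replace").decode("ascii")


decoded = decode_utf8(ENCODED_IN_POST)

# Hex digits in percent-escapes are case-insensitive, so compare in lower case.
print(encode_utf8(decoded).lower() == ENCODED_IN_POST)

# Only the underscore and the ".pdf" extension survive the lossy step.
print(through_ascii(decoded))
```

If the first check prints True, the link in the ORE file is correctly UTF-8 encoded, and the characters are more likely lost later, when the link is resolved or stored on the harvesting side.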

Navigating to that link, which is found in the generated ORE file, results in a "Resource not found" error because the text is transformed into question marks (see the attached screenshot, thai-text.PNG).
Is there a way to resolve this other than changing the ORIGINAL file names? Renaming is not an option because I have no idea how many of these files contain Thai text.

Thanks in advance and best regards,
euler