Dear All,
I have an issue with items harvested from one of the repositories that I am working on. The harvested items contain Thai characters in their filenames. Downloading the actual bitstream works fine in the source repository, but in the harvesting repository, the Thai characters in the bitstream link are converted to question marks when the link is clicked.
I tried cleaning the OAI cache of the source repository and running dspace oai import LANG=en_US.UTF-8. I did this after editing the item to trigger an item update in OAI, but the issue persists.
UTF-8 encoding is enabled in Tomcat on both the source and the harvesting repository (DSpace 6.3 XMLUI).
I believe this is where the harvesting repository gets its link for the bitstreams.
Notice that the Thai text ปะการังเทียมลอยน้ำ_ผลสำเร็จ.pdf was URL encoded into %e0%b8%9b%e0%b8%b0%e0%b8%81%e0%b8%b2%e0%b8%a3%e0%b8%b1%e0%b8%87%e0%b9%80%e0%b8%97%e0%b8%b5%e0%b8%a2%e0%b8%a1%e0%b8%a5%e0%b8%ad%e0%b8%a2%e0%b8%99%e0%b9%89%e0%b8%b3_%e0%b8%9c%e0%b8%a5%e0%b8%aa%e0%b8%b3%e0%b9%80%e0%b8%a3%e0%b9%87%e0%b8%88.pdf
Navigating to the link found in the generated ORE file results in a "Resource not found" error, because the text is transformed into question marks.
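To illustrate what I would expect to happen, here is a minimal Python sketch of the percent-encoding round trip, plus one way the question marks could arise. It is only an illustration under assumptions: it uses no DSpace code, and the ISO-8859-1 fallback shown is my guess at the mechanism, not a confirmed diagnosis.

```python
from urllib.parse import quote, unquote

# File name of the affected bitstream, as it appears in the source repository.
name = "ปะการังเทียมลอยน้ำ_ผลสำเร็จ.pdf"

# Percent-encode the UTF-8 bytes of the name.
# quote() emits uppercase hex digits; lower() matches the form seen in the ORE file.
encoded = quote(name, encoding="utf-8").lower()

# Decoding the link as UTF-8 restores the original Thai name.
decoded = unquote(encoded, encoding="utf-8")

# ASSUMPTION: somewhere in the chain the name is forced into a charset
# that has no Thai characters (ISO-8859-1 here), so each one becomes "?".
mangled = name.encode("iso-8859-1", errors="replace").decode("iso-8859-1")

print(encoded)
print(decoded == name)
print(mangled)
```

If the harvesting side decoded the link as UTF-8, the name should come back intact as in `decoded`; a result that looks like `mangled` would suggest a non-UTF-8 conversion somewhere between the ORE file and the bitstream lookup.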
Is there a way to resolve this other than changing the ORIGINAL file names (which is not an option, because I have no idea how many of these files contain Thai text)?
Thanks in advance and best regards,
euler