Korean Language for Digital Object Links in AtoM

Anthony Lee

May 10, 2024, 8:15:15 AM
to AtoM Users
Greetings, 

I am studying AtoM in Korea.

While looking into AtoM, I realized that Korean characters in filenames were not handled properly when uploading a digital object.

To resolve this, we drew on earlier discussions in this group and modified the filename sanitization ('sanitize') code. As a result, AtoM now keeps Hangul in uploaded filenames.

20240510_161141.png

However, I've run into a new problem: the links to the uploaded digital objects contain Korean characters, and those links do not resolve to the uploaded files.

20240510_161256.png

If you have any advice on how to resolve this issue, I would greatly appreciate your assistance.

Additionally, one approach I considered was to sanitize the filename saved on the server differently from the name of the digital object shown in AtoM. However, it appears that sanitization currently runs only once, and the resulting filename is then reused for both.
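The two-name idea above can be sketched in PHP, the language AtoM is written in. This is a minimal illustration under stated assumptions, not AtoM's code: the function `splitDigitalObjectName` is hypothetical, and deriving the stored name from a SHA-1 hash of the original name is just one possible choice. The display name keeps the original Korean text, while the stored name uses only ASCII, so the file URL never has to carry Hangul.

```php
<?php
// Hypothetical helper (not part of AtoM): derive two names from one upload.
// - 'display' keeps the original filename, Hangul included, for the UI.
// - 'storage' is an ASCII-only name that is safe on disk and in a URL.
function splitDigitalObjectName(string $originalName): array
{
    // Take the extension from the text after the last dot, if any.
    // (Avoids pathinfo(), whose handling of multibyte names is locale-dependent.)
    $dot = strrpos($originalName, '.');
    $ext = $dot === false ? '' : substr($originalName, $dot + 1);
    // Keep the extension only if it is short and plain ASCII alphanumeric.
    $ext = preg_match('/^[a-z0-9]{1,10}$/i', $ext) ? strtolower($ext) : '';

    // Assumption: 16 hex characters of SHA-1 are an acceptable stored name.
    // Identical original names map to the same stem, so a real implementation
    // would need separate directories or a collision check.
    $stem = substr(sha1($originalName), 0, 16);

    return [
        'display' => $originalName,
        'storage' => $ext === '' ? $stem : $stem . '.' . $ext,
    ];
}

$names = splitDigitalObjectName('회의록 2024.pdf');
echo $names['display'], PHP_EOL; // 회의록 2024.pdf
echo $names['storage'], PHP_EOL; // 16 hex characters followed by ".pdf"
```

In this design the link to the file would be built from `storage`, and only the page text would use `display`; AtoM would also need somewhere to keep both values, which this sketch does not address.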

Since my coding knowledge is basic, I may be misreading the structure or the logic involved, and that could explain this impression. If I have missed anything, I would be grateful for any insight or guidance.

The specifications of the server I configured are as follows:
---
AWS t3.medium
> 2 vCPU / 4 GiB memory
> Linux - Ubuntu 20.04 LTS
> nginx
> AtoM version 2.8.1 - 193
---

Thank you for your assistance.

Kind regards,
Anthony Lee

Dan Gillean

May 10, 2024, 12:04:15 PM
to ica-ato...@googlegroups.com
Hi Anthony, 

Welcome to the AtoM community! 

As a first check, make sure that any Korean characters you are using are UTF-8 encoded. If you typed them manually in AtoM's user interface, this should be fine, and files named on a Linux filesystem will most likely use UTF-8 in the filename as well. However, if any of your metadata was copied from elsewhere (a Word document, a spreadsheet application, or a similar source), the characters may not be properly encoded, in which case AtoM may fail to render them.
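To check a specific string, such as a filename or text pasted from another program, the following sketch can be run with the PHP CLI. The helper `isValidUtf8` is hypothetical; it relies on PHP's PCRE functions returning `false` for invalid UTF-8 input when the `u` modifier is set. The test bytes `C7 D1` are how 한 is encoded in EUC-KR (CP949), a legacy Korean encoding.

```php
<?php
// Hypothetical helper: returns true only if $text is well-formed UTF-8.
// With the /u modifier, preg_match() returns false (instead of 0 or 1)
// when the subject is not valid UTF-8; the empty pattern always matches.
function isValidUtf8(string $text): bool
{
    return preg_match('//u', $text) === 1;
}

$utf8  = '한국어';     // typed in a UTF-8 editor or in AtoM's web UI
$eucKr = "\xC7\xD1";  // 한 encoded as EUC-KR / CP949, not UTF-8

var_dump(isValidUtf8($utf8));  // bool(true)
var_dump(isValidUtf8($eucKr)); // bool(false)
```

If the mbstring extension is installed, `mb_check_encoding($text, 'UTF-8')` performs the same check.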

As short-term workarounds, you could try the following:

First, try turning on the permissive slug setting. By default, AtoM also sanitizes the slugs (permalinks) used in description URLs, i.e. it removes special characters and spaces and lowercases the text. However, this default changes many characters that are in fact allowed in IRIs (Internationalized Resource Identifiers) per RFC 3987. We have added a setting that, when enabled, instead allows any Unicode character that RFC 3987 permits. This means that capitalization is preserved, many more symbols and special characters can be used, spaces are percent-encoded instead of being replaced with dashes, and more. See:

If you regenerate your slugs (be sure to use the --delete option, so that existing slugs are replaced), AtoM may generate a description slug that matches the actual filename. I would also suggest uploading a new file after changing this setting, as a test of whether it helps going forward.

If that change and the slug regeneration don't fix the file you currently can't access, use the URL to the original digital object (from your screenshot) as a guide, find the file in the uploads directory, and check what it is actually named on disk.
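That lookup can be scripted. The sketch below is a hypothetical helper (`findUploadsByUrl`, with made-up paths and URL), assuming shell access to the server and that you know where AtoM's uploads directory is. Browsers send non-ASCII path characters as percent-encoded UTF-8, so the script decodes the last segment of the URL and compares it byte-for-byte with every filename under the directory. If nothing matches, comparing `bin2hex()` of the decoded name and of a candidate filename can reveal differences that are invisible on screen.

```php
<?php
// Hypothetical helper: find files under $uploadsDir whose name is exactly
// the percent-decoded last path segment of $url.
function findUploadsByUrl(string $uploadsDir, string $url): array
{
    // parse_url() may return null or false; the cast turns both into ''.
    $path = (string) parse_url($url, PHP_URL_PATH);
    $segments = explode('/', $path);
    // Last segment as the browser requested it, e.g.
    // "%EB%B3%B4%EA%B3%A0%EC%84%9C.pdf" for "보고서.pdf".
    $wanted = rawurldecode(end($segments));

    $matches = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($uploadsDir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        // Strict byte-for-byte comparison of the raw filename.
        if ($file->getFilename() === $wanted) {
            $matches[] = $file->getPathname();
        }
    }
    return $matches;
}

// Example with hypothetical values:
// print_r(findUploadsByUrl('/path/to/atom/uploads',
//     'https://atom.example.org/uploads/r/example/%EB%B3%B4%EA%B3%A0%EC%84%9C.pdf'));
```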

Note that AtoM's rename module lets you rename a linked digital object's file directly in the user interface. If you can still reach the related description, you could try editing the filename (i.e. sanitizing it manually) to see whether that makes the link work. See:

Finally, I want to make sure I understand what you are asking, so that I can follow up with our developers.

From what I understand: 
  • You modified this function so that Hangul characters, which were causing upload problems, are kept during filename sanitization
  • This allowed the file to upload
  • However, the updated $filename returned by the function does not seem to be used, as you expected, when the application builds the file path to the original digital object
  • You are wondering how to fix this, and/or how to ensure that the link to the original object matches the filename after this function sanitizes it
Is that correct?
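One way to make a link match the stored filename, whatever characters it contains, is to percent-encode each path segment when the URL is built, rather than sanitizing the name a second time. This is a sketch under the assumption that the stored relative path is available as a plain string; the function `fileUrlFromPath` is hypothetical, and we have not checked how AtoM itself builds this link. `rawurlencode()` turns spaces into `%20` and Hangul into percent-encoded UTF-8, and splitting on `/` keeps the directory separators intact.

```php
<?php
// Hypothetical helper: turn a stored path (as it exists on disk) into a
// URL path whose segments are percent-encoded UTF-8.
function fileUrlFromPath(string $storedPath): string
{
    $segments = explode('/', $storedPath);
    // Encode each segment separately so "/" stays a path separator while
    // spaces, "?", "#" and Hangul inside a name are escaped.
    return implode('/', array_map('rawurlencode', $segments));
}

echo fileUrlFromPath('/uploads/r/example/한국 문서.pdf'), PHP_EOL;
// /uploads/r/example/%ED%95%9C%EA%B5%AD%20%EB%AC%B8%EC%84%9C.pdf
```

As far as we know, nginx decodes the percent-encoded path before looking up a static file, so the decoded bytes must equal the filename on disk exactly.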

In the meantime, please let me know if any of the above suggestions or workarounds help!

Cheers, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
he / him


To view this discussion on the web visit https://groups.google.com/d/msgid/ica-atom-users/7d030e9a-2a32-4989-a741-a17fb4a6d237n%40googlegroups.com.

Anthony Lee

May 17, 2024, 8:07:32 AM
to AtoM Users
Hi Dan,

Thank you for your kind guidance.

I followed the guidance you provided, but unfortunately, I was not able to resolve the issue. However, I did discover something new in the process.

It seems that the issue with the Digital Object URL may not be due to the inclusion of Korean characters in the slug.

After changing the slug setting as you suggested and testing it, I found that pages load correctly even when the path includes a Korean slug within a collection.
20240517_140658.png
So the slug setting alone does not seem to resolve this issue. Are there any additional code changes or settings, beyond enabling permissive slugs on the admin settings page, that affect how the digital object URL is generated?

To answer your request for confirmation, here are the changes I made in more detail:

1. I modified line 3253 of /lib/model/QubitDigitalObject.php so that filename sanitization keeps Korean characters, and added the u (Unicode) modifier to the pattern.
2024-05-13_11.49.56.png  2024-05-13_1.18.54.png
Original: return preg_replace('/[^a-z0-9_\.-]/i', '_', $filename);
Modified: return preg_replace('/[^a-z0-9_\.-ㄱ-ㅎ가-힣]/iu', '_', $filename);

2. After this change, the digital object uploads correctly, but in AtoM the links to the thumbnails and to the files themselves do not work.
2024-05-13_1.21.29.png  2024-05-13_1.22.13.png  20240517_143655.png

3. I also confirmed that the problem persists even with the admin setting enabled that allows Korean characters in AtoM paths, although those Korean paths themselves work without URL issues.
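One detail in the modified pattern from step 1 may be worth checking. Inside a PCRE character class, a hyphen forms a range unless it is first, last, escaped, or immediately after another range. In the original pattern the hyphen was last, so it was a literal `-`. In the modified pattern, `\.-ㄱ` becomes a range from `.` (U+002E) to `ㄱ` (U+3131), which lets through `/`, `?`, `:`, `<`, `>` and thousands of other characters, while compatibility jamo between ㄱ and ㅎ (for example ㄴ) are still replaced, because the hyphen after that range is read as a literal. A `?` left in a stored filename would break a file URL unless it is percent-encoded. The sketch below compares the patterns; `sanitizeCorrected` is a hypothetical fix that only moves the hyphen to the end of the class. For names made only of Hangul syllables, ASCII letters, digits and a dot, the modified and corrected patterns give the same result, so this alone may not explain the broken links.

```php
<?php
// Original AtoM pattern: only ASCII letters, digits, "_", "." and "-" survive.
// Without /u it works on bytes, so each byte of a Hangul character becomes "_".
function sanitizeOriginal(string $filename): string
{
    return preg_replace('/[^a-z0-9_\.-]/i', '_', $filename);
}

// Pattern from step 1. "\.-ㄱ" is parsed as a RANGE from "." (U+002E) to
// "ㄱ" (U+3131); the "-" right after that range is a literal hyphen.
// With /u, preg_replace() returns null if $filename is not valid UTF-8.
function sanitizeModified(string $filename): ?string
{
    return preg_replace('/[^a-z0-9_\.-ㄱ-ㅎ가-힣]/iu', '_', $filename);
}

// Hypothetical fix: hyphen moved to the end of the class, so the only
// non-ASCII ranges kept are ㄱ-ㅎ and 가-힣, as intended.
function sanitizeCorrected(string $filename): ?string
{
    return preg_replace('/[^a-z0-9_\.ㄱ-ㅎ가-힣-]/iu', '_', $filename);
}

$name = '보고서 ㄴ?.pdf';
echo sanitizeOriginal($name), PHP_EOL;  // 14 underscores (one per byte), then ".pdf"
echo sanitizeModified($name), PHP_EOL;  // 보고서__?.pdf  ("?" kept, ㄴ replaced)
echo sanitizeCorrected($name), PHP_EOL; // 보고서_ㄴ_.pdf
```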

If there are any other methods I can try, please let me know. I will attempt them and share the results.

Best regards,
Anthony

On Saturday, May 11, 2024 at 1:04:15 AM UTC+9, Dan Gillean wrote: