Hi Community,
Before I dive into my questions, let me introduce myself. My name is Marek Tilgner, and I am a self-employed IT technician as well as a member of various public-channel media groups in Germany.
Since the end of 2023, I have been working on implementing AtoM at a local institution that owns thousands of AV media assets in various formats, including:
Analog audio and video tapes
Printed media
Audio CDs
Magnetic tapes
DVDs, MiniDiscs, and more
About 10 years ago, they began digitizing all of their media onto disk—but without a structured system, resulting in a chaotic collection. Today, they have around 50TB of AV media, and I am using AtoM to organize and catalog it properly.
Currently, I am working with AtoM 2.8.2, but I have encountered several issues, which led me to implement some quick and dirty fixes to bypass them.
Progress & Challenges
I have successfully imported around 17,000 items, including digital media, into AtoM, and I am now refining the catalog. One major challenge I am facing is assigning a broadcasting event date to each existing item that represents a broadcast tape, which is essentially a compilation of different films aired on a specific date.
As a temporary solution, I added the broadcast date (YYYY-MM-DD format) to the finding_aids field in information_object_i18n. This allows me to update other fields with contextual data from another database. However, this approach is not effective for searching by date, so I need a solution to automatically create an event (type 221) and assign the correct date from the finding_aids field. Unfortunately, I haven’t fully understood how AtoM generates these events yet.
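One possible route, sketched below under several assumptions, is to export the identifier and the finding_aids value for each description, validate the date, and write a CSV that can be fed back through AtoM's CSV import. The column names (legacyId, eventTypes, eventDates, eventStartDates, eventEndDates) follow my understanding of the description CSV template and should be checked against the template for the AtoM version in use. The event type label "Broadcasting" and the function names are placeholders, not something taken from AtoM itself.

```python
import csv
import io
from datetime import date

# Assumed column names; verify against the CSV template of your AtoM version.
FIELDS = ["legacyId", "eventTypes", "eventDates", "eventStartDates", "eventEndDates"]


def build_event_rows(records, event_type="Broadcasting"):
    """Turn (legacy_id, raw_date) pairs into CSV rows.

    Returns (rows, skipped), where skipped lists the ids whose date
    could not be parsed as an ISO date and needs a manual check.
    """
    rows, skipped = [], []
    for legacy_id, raw in records:
        try:
            day = date.fromisoformat(raw.strip())
        except (ValueError, AttributeError):
            # Empty, missing or malformed value in finding_aids.
            skipped.append(legacy_id)
            continue
        iso = day.isoformat()
        rows.append({
            "legacyId": legacy_id,
            "eventTypes": event_type,  # hypothetical label for event type 221
            "eventDates": iso,         # display date
            "eventStartDates": iso,    # a broadcast starts and ends on one day
            "eventEndDates": iso,
        })
    return rows, skipped


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Collecting the skipped identifiers separately makes it easy to review the entries where finding_aids does not hold a clean date before anything is imported.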
Optimizing Derivative Creation
I also have some suggestions and code extensions to improve the derivative creation process in AtoM.
Since much of the media I work with consists of uncompressed SD video files, some lasting over 3 hours with file sizes exceeding 100GB, using software-based x264 encoding within AtoM is simply too slow. To address this, I modified QubitDigitalObject.php so that FFmpeg now leverages hardware acceleration wherever possible.
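To illustrate the idea (not the actual PHP change in QubitDigitalObject.php), here is a minimal Python sketch that picks a hardware H.264 encoder if the local FFmpeg build lists one and otherwise falls back to libx264. The preference order, the VAAPI device path /dev/dri/renderD128, the audio codec and the function names are assumptions for this example. An encoder being listed only means FFmpeg was built with it, not that the matching hardware is present.

```python
import subprocess

# Assumed preference order: hardware encoders first, software fallback last.
PREFERRED_ENCODERS = ["h264_nvenc", "h264_qsv", "h264_vaapi", "libx264"]


def list_encoders_output():
    """Return the raw text of `ffmpeg -encoders` (requires ffmpeg on PATH)."""
    result = subprocess.run(
        ["ffmpeg", "-hide_banner", "-encoders"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def pick_encoder(encoders_output, preferred=PREFERRED_ENCODERS):
    """Pick the first preferred encoder named in the `ffmpeg -encoders` text."""
    tokens = set(encoders_output.split())
    for name in preferred:
        if name in tokens:
            return name
    return "libx264"


def build_ffmpeg_cmd(src, dst, encoder="libx264"):
    """Build the argument list for an H.264 access-copy transcode."""
    cmd = ["ffmpeg", "-y"]
    if encoder == "h264_vaapi":
        # VAAPI needs a render device; this path is an assumption.
        cmd += ["-vaapi_device", "/dev/dri/renderD128"]
    cmd += ["-i", src]
    if encoder == "h264_vaapi":
        # Upload frames to the GPU before encoding.
        cmd += ["-vf", "format=nv12,hwupload"]
    cmd += ["-c:v", encoder, "-c:a", "aac", dst]
    return cmd
```

In practice it seems sensible to retry with libx264 when the hardware encode exits with an error, so that derivative creation never fails only because a GPU is missing or busy.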
Additionally, I discovered that downloading large files leads to memory overflows, and temporary files aren’t deleted after completion. To mitigate this, I patched the relevant dependencies. While some of these issues have been discussed on GitHub, I now want to contribute patches and open up a discussion about them.
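The general pattern behind such a fix can be sketched as follows. This is Python for illustration only, not the PHP patch itself: the file is sent in fixed-size chunks so that memory use stays constant regardless of file size, and the temporary copy is removed in a finally block even if the transfer fails. The chunk size and function names are assumptions.

```python
import os
import shutil
import tempfile

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice for this sketch


def stream_file(src_path, out_stream, chunk_size=CHUNK_SIZE):
    """Copy a file to an output stream chunk by chunk; return bytes sent."""
    sent = 0
    with open(src_path, "rb") as src:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            out_stream.write(chunk)
            sent += len(chunk)
    return sent


def stream_via_temp_copy(src_path, out_stream, chunk_size=CHUNK_SIZE):
    """Stream from a temporary copy and always delete that copy afterwards.

    Returns (bytes_sent, tmp_path) so the caller can verify the cleanup.
    """
    fd, tmp_path = tempfile.mkstemp()
    os.close(fd)
    try:
        shutil.copyfile(src_path, tmp_path)
        return stream_file(tmp_path, out_stream, chunk_size), tmp_path
    finally:
        # Runs on success and on error, so no temp file is left behind.
        os.remove(tmp_path)
```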
Since I am relatively new to collaborative coding, I would love feedback on my approach.
Future Enhancements
I am working on expanding the derivative creation process by integrating:
OpenAI’s Whisper to generate transcripts for audio and video material.
EasyOCR to extract text from video frames, improving searchability.
My goal is to enhance contextual search within AV media by generating rich-text documents and SRT subtitle files that function as chapter data and closed captions attached to the video files.
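As a small sketch of the subtitle step: as far as I know, Whisper's transcription result contains a list of segments, each with "start" and "end" times in seconds and a "text" string. Assuming that shape, the segments can be turned into SRT text as shown below. The function names are my own, and the transcription call itself is left out so the example runs without Whisper installed.

```python
def format_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Convert segments (dicts with start, end, text) into SRT text.

    Each cue consists of a running number, a time range and the text;
    cues are separated by a blank line.
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)
```

The same cue structure could also carry OCR text taken from video frames, provided each text snippet is given a start and end time.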
Next Steps
The next step in my project is upgrading my AtoM instance to version 2.9.1 and integrating my custom modifications.
Your Thoughts?
I would love to hear your feedback and suggestions regarding these ideas and improvements. Do you see any potential challenges or alternatives I should consider?
Sincerely, Marek
Hello,
About half a year has passed, and I am sorry that I did not read or answer your responses sooner.
Thank you for your previous responses.
I wanted to give you an update on my progress. First of all, we are already using the RAD template, as it was the closest to our needs. Because our data collection is not well structured, I had to import archival descriptions into AtoM via CSV; creating those descriptions was one of the first steps.
I also found a way to import my broadcasting dates through the accession‑record import. In the first step I regenerated all slugs to match my existing identifiers. Then I generated a CSV that created an accession record for each description. Now each description has a matching broadcasting event.
Currently, I am migrating the whole system to another machine running AtoM 2.10. The patch mentioned in a different thread on this board (the one that prevents the site from failing to load after a database import) is already applied, so I shouldn’t encounter the problem others experienced.
For code changes, I have already forked the current branch and will open a discussion if my modifications work out.
I would be pleased if any of this proves interesting to the community.
Best regards,