Normalization for multiple formats at once

Joseph Anderson

Dec 10, 2020, 3:01:00 PM
to archivematica
Hello,

I have a question about creating a new normalization rule. I want to create thumbnails for PDFs, which I understand how to do using a convert command. However, there are several dozen different types of PDF listed in the formats. Is there any way around creating a new rule for each of them?

I see that 'Generic PDF' is listed in the formats. Would making a rule for just this one cover all the other types when normalizing? If not, is there some sort of batch process or API command I could use to create multiple rules? Creating 24 rules in the web interface seems like a cumbersome proposition.

Thanks for any help!

Joseph Anderson

Ross Spencer

Dec 11, 2020, 2:57:22 PM
to archivematica
It's a good question and an understandable use case, Joseph. There is only one idiomatic (though not necessarily convenient) way to achieve this that I know of: identifying formats by extension will return that generic PDF, which you can then associate your command with. This might be more useful with separate pipelines/workflows that promote FPR-level distinctions, i.e. to make up for the inability to modify the FPR on the fly.

This impacts format identification globally, however, so you wouldn't have a PUID against those files in future. Similarly, there is a way to achieve something like this by customizing Siegfried/Roy, but not without this impact too.

From a dev perspective, you could write a Django database migration to write the data needed for this many-to-many situation. I'd be interested to know if anyone on the list has explored this, but I don't know of anyone who has, and I haven't done it myself. In a different context, we've used a database injection to disable rules such as FITS. I could imagine a generic pattern that could be used for other FPR-like input/modifications.
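
To make the many-to-many idea a little more concrete, here is a rough, untested-against-Archivematica sketch of the core step: take one command and fan it out into one rule per PDF format entry. Everything in it is an assumption for illustration: the FormatVersion class, the build_rules function, the field names, and the sample identifiers are hypothetical stand-ins, not the actual FPR schema. In a real Django data migration, the records would be created through the FPR's own models, which should be checked before adapting any of this.

```python
from dataclasses import dataclass
from uuid import uuid4


@dataclass(frozen=True)
class FormatVersion:
    """Hypothetical stand-in for one format entry in the FPR."""
    uuid: str
    description: str


def build_rules(format_versions, command_uuid, purpose="thumbnail", match="pdf"):
    """Return one rule record per format whose description mentions `match`.

    The dictionary keys are illustrative, not the real FPR column names.
    """
    rules = []
    for fv in format_versions:
        if match.lower() in fv.description.lower():
            rules.append({
                "uuid": str(uuid4()),          # fresh identifier for the new rule
                "purpose": purpose,
                "format_uuid": fv.uuid,        # the format this rule applies to
                "command_uuid": command_uuid,  # the single shared command
            })
    return rules


# Made-up sample data: two PDF flavours and one unrelated format.
formats = [
    FormatVersion("fv-1", "PDF 1.4"),
    FormatVersion("fv-2", "PDF/A-1b"),
    FormatVersion("fv-3", "JPEG"),
]
rules = build_rules(formats, command_uuid="cmd-thumbnail")
print(len(rules))  # 2
```

Matching on the description text is only a convenience for the sketch; selecting by whatever grouping the FPR actually exposes for PDF formats would likely be more robust.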

Maybe there's some food for thought there? Either way, hope you find a good balance for your use-case and I'll be interested to hear how it goes and what others say. 

Best,
Ross