Code4Lib article: Fractal in detail: What information is in a file format identification report?


ross-spencer

May 11, 2022, 3:43:42 AM
to droid-list
Folks on the list might be interested in this article I wrote for the recent Code4Lib Journal. It's something of an ode to the DROID report and what can be done with one in an archival workflow. As well as providing a detailed breakdown of the report, I touch upon work that uses the report, such as that done by Paul at TNA, among others.
Best,
Ross

Matt Palmer

May 13, 2022, 3:55:12 AM
to droid-list
Hi Ross,

Nice article!

I like the idea of returning how many bytes were scanned to make an identification. That would certainly help with tuning DROID for particular use cases to get the best performance/identification trade-off. It would also help to probe the behaviour of new signatures being developed.

The only way of doing this now would be to run multiple profiles on the same files with different max byte settings and then carefully compare their output to see what fails to match as you decrease the max bytes. We did actually do this once, when the feature of limiting the bytes to scan was originally introduced, and that is how the default of 64k was agreed, but it was a lot of work and not very efficient.
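As a rough illustration of that comparison step, here is a minimal Python sketch. It assumes each profile has been exported to CSV and that the exports contain FILE_PATH, TYPE and PUID columns; check the column names against your own exports. The helper names (load_puids, lost_identifications) and the sample rows are hypothetical and only there to show the idea.

```python
import csv
import io


def load_puids(csv_text):
    """Map each file path to the set of PUIDs reported for it.

    Assumes FILE_PATH, TYPE and PUID columns (an assumption about the export).
    Several rows for the same path are merged into one set.
    """
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row.get("TYPE") == "Folder":
            continue  # folders carry no identification
        puids = result.setdefault(row["FILE_PATH"], set())
        puid = (row.get("PUID") or "").strip()
        if puid:
            puids.add(puid)
    return result


def lost_identifications(baseline, limited):
    """Return files whose PUIDs differ between two runs.

    The value is a (baseline PUIDs, limited PUIDs) pair of sorted lists.
    """
    changed = {}
    for path, puids in baseline.items():
        other = limited.get(path, set())
        if puids != other:
            changed[path] = (sorted(puids), sorted(other))
    return changed


# Illustrative data only: the same files profiled with a generous
# max byte setting (FULL) and a much smaller one (LIMITED).
FULL = """FILE_PATH,TYPE,PUID
/data/a.pdf,File,fmt/18
/data/b.doc,File,fmt/40
/data/c.zip,File,x-fmt/263
"""

LIMITED = """FILE_PATH,TYPE,PUID
/data/a.pdf,File,fmt/18
/data/b.doc,File,
/data/c.zip,File,x-fmt/263
"""

diff = lost_identifications(load_puids(FULL), load_puids(LIMITED))
for path, (before, after) in sorted(diff.items()):
    print(path, before, "->", after)
```

With real exports you would read each file's text and pass it to load_puids, then repeat the comparison for each max byte setting to see where identifications start to drop out.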

cheers,

Matt.