Strategy for greedy wildcard searches in PRONOM/DROID


ross-spencer

Mar 3, 2023, 9:10:43 AM
to PRONOM
Hi all - looking for advice on how to counter greedy wildcard searches in DROID/PRONOM.

I have a file that looks as follows:

[a="variable_length"]
[b="variable_length"]
[c="variable_length"]
[d="variable_length"]


On the plus side, all these elements must appear in order. On the negative side, the value between the quotation marks can be any length. Worse still, while the format doesn't seem to support repeating elements, some versions in the wild do contain them.

So, if you have examples of:

[a="variable_length"]
[x="variable_length"]
[b="variable_length"]
[c="variable_length"]
[d="variable_length"]


or:

[a="variable_length"]
[x="variable_length"]
[y="variable_length"]
[z="variable_length"]

// some data...

[a="variable_length"]
[b="variable_length"]
[c="variable_length"]
[d="variable_length"]


And a signature that is overly greedy, e.g.

5B613D22{0-*}225D{0-1}(0D0A|0A)5B623D22{0-*}225D{0-1}(0D0A|0A)5B633D22{0-*}225D{0-1}(0D0A|0A)5B643D22{0-*}225D{0-1}(0D0A|0A)

Annotated:

5B 61 3D 22 <- [a="
{0-*} <- variable length data
22 5D <- closing quote and bracket
{0-1} <- allow for one optional trailing byte (e.g. whitespace) at end of line
(0D0A|0A) <- allow for differences in line-ending
5B 62 3D 22 <- [b="
{0-*}
22 5D
{0-1}
(0D0A|0A)
5B 63 3D 22 <- [c="
{0-*}
22 5D
{0-1}
(0D0A|0A)
5B 64 3D 22 <- [d="
{0-*}
22 5D
{0-1}
(0D0A|0A)
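As a quick sanity check on the annotations above, the fixed byte sequences can be decoded from hex. This is only a small Python sketch; the variable names are my own and nothing here is part of DROID.

```python
# Fixed byte sequences copied from the signature, written as hex strings.
fixed_sequences = [
    "5B613D22", "5B623D22", "5B633D22", "5B643D22",  # element openers
    "225D",                                          # closing quote and bracket
    "0D0A", "0A",                                    # the two line endings
]

# Decode each hex string to the raw bytes it stands for.
decoded = {hex_string: bytes.fromhex(hex_string) for hex_string in fixed_sequences}

for hex_string, raw in decoded.items():
    print(f"{hex_string:<8} -> {raw!r}")
```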


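Then the well-formed example matches at first blush. But so do the poorly formed examples, because each {0-*} wildcard can span the extra elements (and anything else) up to the next fixed sequence.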
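The behaviour can be reproduced outside DROID with a rough regular-expression approximation of the signature. This is a sketch under assumptions: the regex is my own translation ({0-*} as any run of bytes, {0-1} as at most one byte), the sample byte strings are made-up stand-ins rather than the attached files, and DROID's own matching engine may differ in detail.

```python
import re

def element(name: bytes) -> bytes:
    """Regex for one element: opener, any bytes, closer, optional byte, line ending."""
    return rb'\[' + name + rb'=".*?"\].{0,1}(?:\r\n|\n)'

# DOTALL lets the wildcard cross line endings, as {0-*} does for arbitrary bytes.
SIGNATURE = re.compile(b"".join(element(n) for n in (b"a", b"b", b"c", b"d")), re.DOTALL)

# Hypothetical sample contents mirroring the three layouts described above.
well_formed = b'[a="1"]\n[b="22"]\n[c="333"]\n[d="4444"]\n'
extra_element = b'[a="1"]\n[x="9"]\n[b="22"]\n[c="333"]\n[d="4444"]\n'
repeated_block = (
    b'[a="1"]\n[x="9"]\n[y="9"]\n[z="9"]\n\n// some data...\n\n'
    b'[a="1"]\n[b="22"]\n[c="333"]\n[d="4444"]\n'
)

samples = {"well_formed": well_formed,
           "extra_element": extra_element,
           "repeated_block": repeated_block}
results = {name: SIGNATURE.search(data) is not None for name, data in samples.items()}

for name, matched in results.items():
    print(name, "matches" if matched else "does not match")
```

In this approximation all three samples match, because the wildcard after [a=" is free to swallow the [x=...] line or the whole first block before reaching "] followed by [b=".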

Is there a simple solution I am missing for countering this greedy wildcard matching (or rather not greedy, as it's just doing what it says on the tin!)? Ideally the files that are not well-formed wouldn't be matched, as in this instance they can actually be examples of another version of the same format.
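To make the desired outcome concrete, here is what a stricter match looks like as a regular expression: the value may not contain a quote or a line ending, and the match is anchored at the start of the data. This only illustrates the behaviour I am after, with hypothetical sample data; it does not settle whether PRONOM signature syntax can express an equivalent restriction.

```python
import re

def strict_element(name: bytes) -> bytes:
    """One element whose value cannot run past its own closing quote or line."""
    return rb'\[' + name + rb'="[^"\r\n]*"\][ \t]?(?:\r\n|\n)'

STRICT = re.compile(b"".join(strict_element(n) for n in (b"a", b"b", b"c", b"d")))

# Hypothetical sample contents mirroring the three layouts in this post.
well_formed = b'[a="1"]\n[b="22"]\n[c="333"]\n[d="4444"]\n'
extra_element = b'[a="1"]\n[x="9"]\n[b="22"]\n[c="333"]\n[d="4444"]\n'
repeated_block = (
    b'[a="1"]\n[x="9"]\n[y="9"]\n[z="9"]\n\n// some data...\n\n'
    b'[a="1"]\n[b="22"]\n[c="333"]\n[d="4444"]\n'
)

def is_strict_match(data: bytes) -> bool:
    # re.match anchors at offset 0, so a later well-formed block does not count.
    return STRICT.match(data) is not None

for label, data in (("well_formed", well_formed),
                    ("extra_element", extra_element),
                    ("repeated_block", repeated_block)):
    print(label, "matches" if is_strict_match(data) else "does not match")
```

Here only the well-formed layout matches; the other two fail as soon as [x=" appears where [b=" is expected.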

Attached samples and signature file for those that are interested. 

Ross
file2.theory
Theoretical_Signature_v1.xml
file1.theory
file3.theory