Two PUID numbers for one file

Tatjana Hajtnik

Mar 14, 2023, 11:39:40 AM
to droid-list
Hello. Please help or explain. I used DROID to check the format of files with the .tif extension. For half of the documents I get two PUIDs, even though all the documents were created in the same digitization process. What does it mean if a file has two PUIDs?

A screenshot of the result I get from DROID is attached.  

Thanks in advance.

Best regards.

Tatjana Hajtnik
   
dr. Tatjana Hajtnik, Assist. Prof.
  Head
   
REPUBLIKA SLOVENIJA
ARHIV REPUBLIKE SLOVENIJE
  Sektor za elektronske arhive in računalniško podporo
   
  Zvezdarska 1, 1102 Ljubljana, Slovenija
T: 01 24 14 224; GSM: 041 495 020

 
   
Printscreen.docx

Dclipsham

Mar 14, 2023, 11:53:43 AM
to droid-list
Hi Tatjana,

The files that are identified as more than one file format are identified by 'Extension', according to the Method column. This means that DROID was unable to identify these files from their internal byte code, so it relied instead on the .tif part of the file name, which is associated with a couple of further TIFF variant subtypes.

However, TIFF identification should be relatively straightforward, so this is an unusual occurrence. Are you in a position to share any of the files that are identified as more than one file format? I appreciate this may not be possible.
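To illustrate why TIFF is normally easy to recognise from its contents: a classic TIFF file begins with a two-byte byte-order mark ("II" or "MM") followed by the number 42. The sketch below checks only those four bytes. It is not DROID's or PRONOM's actual signature logic, the function name is made up for this example, and BigTIFF (which uses 43 instead of 42) is deliberately not covered.

```python
# Classic TIFF header: "II" (little-endian) or "MM" (big-endian),
# followed by the 16-bit number 42 written in that byte order.
TIFF_MAGICS = (b"II*\x00", b"MM\x00*")


def looks_like_tiff(path):
    """Return True if the first four bytes match a classic TIFF header.

    Hypothetical helper for illustration only; real format identification
    tools use more detailed signatures than this.
    """
    with open(path, "rb") as f:
        return f.read(4) in TIFF_MAGICS
```

A file that fails a check like this can only be matched on its file name, which is roughly the situation the 'Extension' method reports.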

David

Tatjana Hajtnik

Mar 15, 2023, 5:55:55 PM
to droid-list
Hi David,

Thank you very much for your quick response.

It's really strange; I've never come across such a case. I'm happy to send example files. They are not confidential. However, these are .tiff files and all of them exceed the size allowed for sending through this system. If you send me your e-mail address, I will send the files to you directly or via another channel, e.g. WeTransfer. Would that be OK?

Tatjana


kathryn phelps

Mar 22, 2023, 10:06:42 AM
to droid-list
Hi Tatjana, please do send sample files. Our address is PRONOM @ nationalarchives.gov.uk , and I can share them with David as well. Thank you, Kathryn.

Dclipsham

Mar 27, 2023, 4:29:18 AM
to droid-list
Hi Tatjana, thank you for sharing your examples.

Of the examples provided, it appears that for each valid TIFF file in the set, the next file is 'empty'. This means that every byte in the file is set to a value of zero, so the file contains no actual image data of any kind.

When writing a file, the computer will sometimes reserve space on the file system by creating such an empty file before populating it with the actual data.

It appears that in this case, these files haven't been populated with data at all. Of the files you've shared, this affects the files ending 00011, 00036, 00044, and 00047.
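If it helps to check a larger batch for the same problem, here is a minimal Python sketch that flags files consisting entirely of zero bytes. The function names, the default `*.tif` pattern and the chunk size are assumptions made for this example; it is not part of DROID.

```python
from pathlib import Path


def is_all_zero(path, chunk_size=1 << 20):
    """Return True if the file has at least one byte and every byte is 0x00."""
    seen_data = False
    with open(path, "rb") as f:
        # Read in chunks so large scans do not need to fit in memory.
        while chunk := f.read(chunk_size):
            seen_data = True
            # If any byte is non-zero, the zero count is less than the length.
            if chunk.count(0) != len(chunk):
                return False
    # A zero-length file is reported as False: it has no bytes at all.
    return seen_data


def find_zero_filled(folder, pattern="*.tif"):
    """Return a sorted list of files under `folder` that contain only zero bytes."""
    return sorted(p for p in Path(folder).rglob(pattern) if is_all_zero(p))
```

Calling `find_zero_filled("scans")` on a folder of deliveries (the folder name is only a placeholder) would return the paths that need to go back to the supplier.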

As these files were created as part of a digitization process, I hope you can return to your digitization supplier to ask them to investigate further. If these files are intended to be images then they will need to be re-scanned as these files themselves have no data that can be repaired or recovered.

I hope this helps, and I wish you the best in resolving this problem.

David


Dclipsham

Mar 27, 2023, 5:42:29 AM
to droid-list
Just to illustrate further, the following image is a side-by-side comparison of two files, as viewed through a hex editor, a tool for looking at the internal byte code that makes up a file.

On the left-hand side is a section of the byte code for the image with the filename ending 00010. Here you can see lots of data present: each of the values represents something meaningful that an image viewer will interpret as your image.

On the right-hand side is a section of the byte code for the image with the filename ending 00011. In this case every value is '00', which here indicates that no data exists.

null.png
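For anyone without a hex editor to hand, a similar view can be produced with a few lines of Python. This is only a rough sketch with a made-up function name and layout, not what any particular hex editor does.

```python
def hex_dump(path, offset=0, length=64, width=16):
    """Return a hex-editor-style dump of `length` bytes starting at `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(length)
    pad = width * 3 - 1  # width of a full row of "XX XX XX ..." text
    lines = []
    for i in range(0, len(data), width):
        row = data[i:i + width]
        hex_part = " ".join(f"{b:02X}" for b in row)
        # Show printable ASCII characters, and a dot for everything else.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
        lines.append(f"{offset + i:08X}  {hex_part:<{pad}}  {text_part}")
    return "\n".join(lines)
```

Printing `hex_dump(path)` for a file like 00011 would show rows consisting only of `00`, whereas a valid image would show varied values.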

David

Tatjana Hajtnik

Apr 4, 2023, 1:29:10 AM
to droid-list
Hi

Thank you all very much. I checked, and the files are indeed empty, even though they appear to have content. This is the first time we have encountered such a case. We will repeat the digitization.

Best regards from Ljubljana

Tatjana
