manually normalized unzipped bag failed at ingest step


romain guedj

Mar 28, 2022, 4:19:08 AM
to archivematica
Hi All,

I have not managed to ingest a manually normalized, unzipped bag.

Archivematica (AM) could not find the normalized files. If you have successfully imported such a bag, could you please share its directory tree?

I did not find any examples here: archivematica-sampledata/SampleTransfers/BagExamples at master · artefactual/archivematica-sampledata (github.com)

Thanks for your help.

Cheers,

Romain 

mathieu....@ohs.org

Mar 28, 2022, 1:13:48 PM
to archivematica
We use the following structure:

Unzipped Bag
     /data
          /access
          /metadata
          /objects
     bag-info.txt
     bagit.txt
     manifest-sha256.txt
     tagmanifest-sha256.txt

Preservation copies go in the /objects folder and manually normalized files go in the /access folder.
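
A layout like this can be checked before starting a transfer. The following is only a minimal sketch and not part of Archivematica: the function name `missing_entries` is hypothetical, and the list of expected entries is an assumption copied from the tree above rather than an official requirement.

```python
from pathlib import Path

# Entries copied from the structure described above. This list is an
# assumption for illustration, not an official Archivematica requirement.
EXPECTED_DIRS = ["data/access", "data/metadata", "data/objects"]
EXPECTED_FILES = [
    "bag-info.txt",
    "bagit.txt",
    "manifest-sha256.txt",
    "tagmanifest-sha256.txt",
]


def missing_entries(bag_root):
    """Return the expected directories and files that are absent from bag_root."""
    root = Path(bag_root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing
```

Calling `missing_entries` on the root of the unzipped bag returns an empty list when every entry is present. It only checks that the entries exist; it does not validate checksums or the BagIt manifests.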

Guedj Romain

Mar 29, 2022, 4:15:44 AM
to archiv...@googlegroups.com

Hi Mathieu,

 

Thank you for your quick response.

OK, so you do not create a "manualNormalization" directory below the package directory, nor a normalization.csv? Something like:

 

     /data
          /manualNormalization
               /access
               /normalization.csv
          /metadata
          /objects
     bag-info.txt
     bagit.txt
     manifest-sha256.txt
     tagmanifest-sha256.txt


Could you please clarify part of the hierarchy of your unzipped bag?

We have subfolders and objects under the objects directory. In that case, do you reproduce the same structure under the access directory, or do you place only the files directly below the access directory, with the path provided by the normalization.csv file? For example, first option:

 

     /data
          /access
               /directory01
                    /image.png
                    /image02.png
               /directory02
                    /directory02-01
                         Images03.png
          /metadata
          /objects
               /directory01
                    /image.png
                    /image02.png
               /directory02
                    /directory02-01
                         Images03.png
     bag-info.txt
     bagit.txt
     manifest-sha256.txt
     tagmanifest-sha256.txt

 

Second option:

     /data
          /access
               /image.png
               /image02.png
               /images03.png
          /metadata
          /objects
               /directory01
                    /image.png
                    /image02.png
               /directory02
                    /directory02-01
                         Images03.png
     bag-info.txt
     bagit.txt
     manifest-sha256.txt
     tagmanifest-sha256.txt

If you follow the second option, where is the normalization.csv file placed?

 

Thank you again for your help.

 

Cheers,

 

Romain

 


mathieu....@ohs.org

Mar 29, 2022, 2:03:41 PM
to archivematica
Hi Romain,

We use the DIP upload to AtoM feature, so our bags are more basic than they otherwise could be. That may be why there is a difference between what I showed and the documentation. We don't make a separate manualNormalization folder or a normalization.csv for that data.

I can't say I've done any substantive tests in quite a while, but looking at the documentation, it sounds like you can use the normalization.csv to explain the relationships between files within the folder structure. Either of your two scenarios should therefore work, though you might need to put the /access folder in a /manualNormalization folder. I would assume the normalization.csv would go in the /data directory with the rest of the bag payload.

For example:

     /data
          /manualNormalization
               /access
                    /directory01
                         image.png
                         image02.png
                    /directory02
                         /directory02-01
                              Images03.png
          /metadata
          /objects
               /directory01
                    image.png
                    image02.png
               /directory02
                    /directory02-01
                         Images03.png
          normalization.csv
     bag-info.txt
     bagit.txt
     manifest-sha256.txt
     tagmanifest-sha256.txt

Hopefully that isn't too unclear. 
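
To make the idea of a mapping file concrete, here is a rough sketch of how the rows of a normalization.csv could be generated for a layout like the one above. Everything about the file format here is an assumption to verify against the Archivematica documentation: the three-column order (original, access, preservation), the path prefixes `objects/` and `manualNormalization/access/`, and the choice to leave the third column empty. The helper names `build_rows` and `to_csv` are hypothetical.

```python
import csv
import io
from pathlib import PurePosixPath


def build_rows(originals, access_files):
    """Pair each original with the access file sharing its relative path minus extension.

    ASSUMPTION: rows are (original, access, preservation); check the column
    order and path base against the Archivematica documentation before use.
    """
    by_stem = {}
    for acc in access_files:
        rel = PurePosixPath(acc).relative_to("manualNormalization/access")
        by_stem[rel.with_suffix("")] = acc
    rows = []
    for orig in originals:
        rel = PurePosixPath(orig).relative_to("objects")
        acc = by_stem.get(rel.with_suffix(""))
        if acc is not None:
            # Third column left empty: no preservation derivative in this sketch.
            rows.append([orig, acc, ""])
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text, quoting fields only where needed."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()
```

Matching on the relative path without its extension only works when the access tree mirrors the objects tree (the first scenario). For a flat access folder (the second scenario), the pairs would have to be listed by hand.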

romain guedj

Mar 30, 2022, 8:20:46 AM
to archivematica
Hi All,

Here are more details about the context of the failure: AM cannot find the normalized files.

Context:
AM 1.12.1

Unzipped bag with the following structure:
J:.
│   bag-info.txt
│   manifest-md5.txt
│   processingMCP.xml
│   bagit.txt
│   tagmanifest-md5.txt

├───data
│   └───skip-transfer-directory
│       └───1er Communion CHATEL 2008
│           │   1ere Communion annonce.doc
│           │
│           ├───Reportage
│           │       DSC_0001.jpg
│           │       DSC_0002.jpg
│           │
│           └───Groupe
│                   30x45.jpg
│                   G6SS8038.DCR
│                   20x30.jpg
│                   G6SS8038.psd
│                   G6SS8038.JPG

├───metadata
│       metadata.csv

└───manualNormalization
    ├───access
    │   └───skip-transfer-directory
    │       └───1er_Communion_CHATEL_2008
    │               1ere Communion annonce.pdf
    │
    └───preservation
        └───skip-transfer-directory
            └───1er_Communion_CHATEL_2008
                    1ere Communion annonce.odt

Bag compliance: OK
SIP creation from transfer: OK

The file from manualNormalization is not found:

Stdout:

Module normalize_v1.0

preservation "d8e6f251-e0f1-48e2-befb-dfb5d218e0e2" "/var/archivematica/sharedDirectory/currentlyProcessing/FD-KEHREN-OBERSON-ARCHNUMFR_6932-0128-96292af2-0a92-4753-ae99-7da2330f451c/objects/skip-transfer-directory/1er_Communion_CHATEL_2008/1ere_Communion_annonce.doc" "/var/archivematica/sharedDirectory/currentlyProcessing/FD-KEHREN-OBERSON-ARCHNUMFR_6932-0128-96292af2-0a92-4753-ae99-7da2330f451c/" "96292af2-0a92-4753-ae99-7da2330f451c" "%taskUUID%" "original"
Standard streams
Standard output (stdout)
File found: d8e6f251-e0f1-48e2-befb-dfb5d218e0e2 %SIPDirectory%objects/skip-transfer-directory/1er_Communion_CHATEL_2008/1ere_Communion_annonce.doc
Checking for a manually normalized file by trying to get the unique file that matches SIP UUID 96292af2-0a92-4753-ae99-7da2330f451c and whose currentlocation value starts with this path: %SIPDirectory%objects/manualNormalization/preservation/skip-transfer-directory/1er_Communion_CHATEL_2008/1ere_Communion_annonce.
No such file found.
File format: Word Processing: Microsoft Word : Generic Word Document ()
Not normalizing 1ere_Communion_annonce.doc - No rule or default rule found to normalize for preservation
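
The lookup path in that message can be rebuilt from the original file's location, which makes it easier to see what the job is searching for. This is only a sketch that reproduces the string printed in the log; it is not Archivematica's code, and the function name `manual_normalization_prefix` is hypothetical.

```python
import posixpath


def manual_normalization_prefix(original_location, purpose):
    """Rebuild the lookup prefix printed in the log for one original file.

    ASSUMPTION: the prefix is the original's path below objects/, moved under
    objects/manualNormalization/<purpose>/, with the extension dropped.
    This is inferred from the log text only.
    """
    marker = "%SIPDirectory%objects/"
    if not original_location.startswith(marker):
        raise ValueError("expected a path under %SIPDirectory%objects/")
    relative = original_location[len(marker):]
    stem, _extension = posixpath.splitext(relative)
    # The trailing dot mirrors the log: any extension may follow it.
    return f"{marker}manualNormalization/{purpose}/{stem}."
```

Given the .doc location above and the purpose `preservation`, the function returns the same prefix as the log. It may be worth comparing that prefix with where the manualNormalization directory of the bag actually ends up inside the SIP.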



For comparison: the same files (except that the names contain diacritic characters) without a bag structure. Here the manually normalized files are found and manual normalization works.
 J:.
│   processingMCP.xml

├───metadata
│       metadata.csv

├───manualNormalization
│   ├───access
│   │   └───skip-transfer-directory
│   │       └───1er_Communion_CHATEL_2008
│   │               1ère Communion annonce.pdf
│   │
│   └───preservation
│       └───skip-transfer-directory
│           └───1er_Communion_CHATEL_2008
│                   1ère Communion annonce.odt

└───objects
    └───skip-transfer-directory
        └───1er Communion CHATEL 2008
            │   1ère Communion annonce.doc
            │
            ├───Reportage
            │       DSC_0001.jpg
            │       DSC_0002.jpg
            │
            └───Groupe
                    30x45.jpg
                    G6SS8038.DCR
                    20x30.jpg
                    G6SS8038.psd
                    G6SS8038.JPG

Stdout
Module normalize_v1.0

preservation "f64c1b9b-2660-40e7-94de-7545c4b3b1f1" "/var/archivematica/sharedDirectory/currentlyProcessing/FD-KEHREN-OBERSON-ARCHNUMFR_6932-0128-746d19d4-839c-4b20-9ee4-2a8a1850110b/objects/skip-transfer-directory/1er_Communion_CHATEL_2008/1ere_Communion_annonce.doc" "/var/archivematica/sharedDirectory/currentlyProcessing/FD-KEHREN-OBERSON-ARCHNUMFR_6932-0128-746d19d4-839c-4b20-9ee4-2a8a1850110b/" "746d19d4-839c-4b20-9ee4-2a8a1850110b" "%taskUUID%" "original"
Standard streams
Standard output (stdout)
File found: f64c1b9b-2660-40e7-94de-7545c4b3b1f1 %SIPDirectory%objects/skip-transfer-directory/1er_Communion_CHATEL_2008/1ere_Communion_annonce.doc
Checking for a manually normalized file by trying to get the unique file that matches SIP UUID 746d19d4-839c-4b20-9ee4-2a8a1850110b and whose currentlocation value starts with this path: %SIPDirectory%objects/manualNormalization/preservation/skip-transfer-directory/1er_Communion_CHATEL_2008/1ere_Communion_annonce.
1ere_Communion_annonce.doc was already manually normalized into %SIPDirectory%objects/manualNormalization/preservation/skip-transfer-directory/1er_Communion_CHATEL_2008/1ere_Communion_annonce.odt

I cannot figure out why the normalized files are found in a standard transfer but not in a bag. I have also tried with a normalization.csv file, without any success.
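
One check that can be done outside AM is whether each manually normalized file lines up with an original once the names are cleaned. The sketch below is an approximation only: `sanitize_name` imitates the renaming visible in the logs (spaces become underscores, diacritics are dropped) and is not Archivematica's own routine, and all function names are hypothetical. Paths are given relative to objects/ for originals and relative to manualNormalization/preservation/ (or access/) for derivatives.

```python
import re
import unicodedata
from pathlib import PurePosixPath


def sanitize_name(name):
    """Approximate the renaming seen in the logs (ASSUMPTION, not AM's code)."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^A-Za-z0-9._-]", "_", ascii_only)


def sanitized_stem(relative_path):
    """Sanitize every path component, then drop the file extension."""
    parts = [sanitize_name(p) for p in PurePosixPath(relative_path).parts]
    return str(PurePosixPath(*parts).with_suffix(""))


def unmatched_derivatives(originals, derivatives):
    """Return the derivatives whose sanitized stem matches no original."""
    stems = {sanitized_stem(o) for o in originals}
    return [d for d in derivatives if sanitized_stem(d) not in stems]
```

An empty result means every derivative has a candidate original under this approximation. It says nothing about where AM looks for the manualNormalization directory inside a bag.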

Thanks again.

Cheers,

Romain