Ingest rules: content model and PHP function


Serhiy Polyakov

May 1, 2012, 2:38:12 AM
to island...@googlegroups.com
I am working with ingest rules defined in a content model. For
example, PDF CM has this rule:

<ingest_method class="PDFManipulation" dsid="TN" file="pdf_sp.inc"
method="createThumbnailFromPDF" modified_files_ext="jpg"
module="islandora_pdf_sp">

Then header of the function in pdf_sp.inc is this:
function createThumbnailFromPDF($parameterArray, $dsid, $file, $file_ext)

I defined my own function, myManipulation, to process my files, and I
call it from my content model. I am guessing that I cannot change the
number of parameters, even though the function is my own.
So I cannot have:
function myManipulation($dsid, $file, $file_ext)

Question 1. Is that correct?

Question 2. To which directory do files get uploaded (on Linux), and
from where are they taken for processing? For example, in
createThumbnailFromPDF the processing line looks like:

$cmdline = "convert \"$file\"\[0\] -colorspace RGB -thumbnail " .
$width . "x" . $height . " \"$file$file_suffix\"";

So does the value of $file include the path as well as the filename?
Is it some temp directory? Do files get deleted after uploading and
processing?

I see that some files I uploaded to the repository are left in
this directory:
/var/www/myDrupal/sites/default/files

Question 3.
At the end of the createThumbnailFromPDF function there is this line:
$_SESSION['fedora_ingest_files']["$dsid"] = $file . $file_suffix;

How can I alter this part to create a minimal new Fedora object instead
of just a datastream for the existing object? I have calculated the PID
for the new object.

Thank you,
Serhiy

Alan Stanley

May 1, 2012, 7:35:16 AM
to island...@googlegroups.com
You are correct about the method signature. The method specified in your content model is executed by execIngestRules in the ContentModel class, so the method signature is locked down, but you can pass in null for any parameters that are not used.
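
As a rough sketch of what a locked-down signature means in practice, the method below keeps the four parameters shown for createThumbnailFromPDF even though it ignores the first one. The class name MyFileManipulation, the example path and the return value are hypothetical; only the parameter list comes from this thread.

```php
<?php
// Hypothetical class; only the four-parameter list mirrors createThumbnailFromPDF().
class MyFileManipulation {
  // Keep all four parameters, in this order, even when some are unused.
  function myManipulation($parameterArray, $dsid, $file, $file_ext) {
    // $parameterArray is ignored here, so a caller may pass NULL for it.
    // Illustrative return value: the name a derivative file might be given.
    return $file . '_' . $dsid . '.' . $file_ext;
  }
}

$manipulation = new MyFileManipulation();
$derived = $manipulation->myManipulation(NULL, 'TN', '/tmp/example.pdf', 'jpg');
```

Dropping a parameter, as in myManipulation($dsid, $file, $file_ext), would shift every argument one position to the left when the method is called with four arguments.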

On a standard system, files will come from sites/default/files/. This isn't always the case, but the files will always be in the directory returned by file_directory_path(). You can see exactly what is going on by looking at fedora_repository_ingest_form_validate() in the fedora_repository.module file.

The quick answer to question three is: you can't get what you want by altering that line, but you can do what you want to do. When files are added to the fedora_ingest_files array in the $_SESSION scope, they go through a particular workflow: the file is added to the Fedora object created at ingest time, and then the file is deleted. At this point in the creation process you can create any derivative you like to be added to the object created at ingest time, or none at all. If I'm understanding you correctly, you wish to create a second object while the first is being created. You can create a Fedora object here in the usual way with

 $my_item = Fedora_Item::ingest_new_item($my_calculated_pid, 'A', $my_object_label);

You'd then add whatever datastreams you wanted to this object before continuing with your myManipulation function.
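
A minimal sketch of that flow, using only the two Fedora_Item calls that appear in this thread. The function name, labels and MIME type are placeholders, and the module_load_include() path assumes that Fedora_Item is defined in api/fedora_item.inc of the fedora_repository module; check your installation before relying on it.

```php
<?php
// Sketch only: the body needs Drupal with the fedora_repository module enabled.
// mymodule_create_second_object() is a hypothetical helper, not an Islandora API.
function mymodule_create_second_object($pid, $label, $tn_path) {
  // Assumed location of the Fedora_Item class.
  module_load_include('inc', 'fedora_repository', 'api/fedora_item');
  // Create the extra object with state 'A' (active), as in the line above.
  $item = Fedora_Item::ingest_new_item($pid, 'A', $label);
  // Attach the thumbnail file as a managed ('M') datastream with DSID 'TN'.
  $item->add_datastream_from_file($tn_path, 'TN', 'Thumbnail', 'image/jpeg', 'M');
  return $item;
}
```

Because the file is attached directly here, it would not also be placed in $_SESSION['fedora_ingest_files'], so deleting it afterwards is left to the caller.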

I hope this is what you need - let me know if there's anything I can clarify for you.

- Alan
   Alan Stanley
   Lead Developer, Islandora Project
   Robertson Library, UPEI
   550 University Avenue, Charlottetown
   PE,  C1A 4P3



Serhiy Polyakov

May 1, 2012, 11:53:37 AM
to island...@googlegroups.com
Alan,

Your answer is very helpful.

I decided to model my desired behavior as closely as possible on the
PDF content model first and then move on to implementing my specifics.
I am basically creating a derivative object while creating the main
object. I think I have everything I need to implement this. However,
if you have any comments, please let me know.

So, instead of generating a TN for the PDF and adding the TN as a
datastream to the same Fedora object, I will do the following:
-create main Fedora Object (FO1) with PID1
-create new derivative FO2 with calculated PID2
-add PDF datastream to FO1
-generate TN from PDF
-add TN to FO2
-add to RELS-EXT of the FO2:
<fedora:isMemberOf rdf:resource="info:fedora/PID1"></fedora:isMemberOf>
-remove TN file from the file system

Order of steps may be slightly different.

Thanks,
Serhiy

Alan Stanley

May 2, 2012, 8:00:24 AM
to island...@googlegroups.com
You have a potential problem here. You need the PID from FO1 to build the RELS-EXT stream for FO2, but FO1 does not yet exist, so no PID is available.

There are a few ways of dealing with this (there always are). I'd use a hook_form_alter within my custom module to add an additional submit handler to the fedora_repository_ingest_form, and I'd build my additional object there. By the time the first submit handler has done its work, the PID for the new object will be in $form_state['values']['pid'].
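
A hedged sketch of that approach for Drupal 6. The module name mymodule and the handler name mymodule_ingest_submit are hypothetical, and the sketch assumes the ingest form's handlers are listed in $form['#submit']; since the ingest form can span several steps, verify that the handler fires on the final one.

```php
<?php
// Implements hook_form_alter() (Drupal 6 signature) in a hypothetical module "mymodule".
function mymodule_form_alter(&$form, &$form_state, $form_id) {
  if ($form_id == 'fedora_repository_ingest_form') {
    // Append, so this handler runs after the handlers already registered.
    $form['#submit'][] = 'mymodule_ingest_submit';
  }
}

// Additional submit handler: by now the main object's PID should be set.
function mymodule_ingest_submit($form, &$form_state) {
  $main_pid = $form_state['values']['pid'];
  // Build the derivative object here (for example with
  // Fedora_Item::ingest_new_item()) and point its RELS-EXT at $main_pid.
}
```

Drupal calls hook_form_alter implementations itself for every form it builds, so nothing in fedora_repository.module needs to be edited.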

Serhiy Polyakov

May 5, 2012, 1:53:30 AM
to island...@googlegroups.com
Alan,

I see that I have to create the main object first and then create the
derivative objects.

I am confused about the order of all the functions, so let me
summarize. In my simple example I want to generate a TN from a PDF as
in the PDF solution pack, but instead of adding the TN as a datastream
I want to create a derivative object and add the TN there.

(1)
The TN file is created as in the PDF solution pack, but I remove the last line:
//$_SESSION['fedora_ingest_files']["$dsid"] = $file . $file_suffix;

(2)
In my module I add my hook function and build my derived object there, right?

function my_hook_form_alter(&$form, &$form_state, $form_id) {
  $my_pid = $form_state['values']['pid'];
  $deriv_item = Fedora_Item::ingest_new_item($my_pid, 'A',
    'Label_for_derived_object');
  $deriv_item->add_datastream_from_file($file . $file_suffix, 'TN',
    'Label_for_TN_datastream', '', 'M');
}

(3)
Where do I call the my_hook_form_alter function?

(4)
In fedora_repository.module, do I add the additional submit handler in
the function fedora_repository_ingest_form? Is it like this?

$ingestForm = new formClass();
$form_state['storage']['content_model'] = $content_model;
$form_state['storage']['collection_pid'] = $collection_pid;
//My handler:
$form_state['storage']['???'] = ???;
return $ingestForm->createIngestForm($collection_pid,
$collection_label, $form_state);

(5)
I delete my derived TN file at the end of my module:
file_delete($file . $file_suffix);


Thank you for any additional clarification,
Serhiy




