Book batch create_pdfs doesn't do what I expect


Brandon Weigel

May 11, 2017, 5:53:21 PM
to islandora
I'm testing out a book batch ingest, and what we want is to create a 72 DPI PDF of the entire book, automatically as part of the batch. 

I'd have figured that the --create_pdfs parameter in the drush command would do that for me, but instead it just creates PDFs of the individual pages, leaving me to go to each book, manage, and create the full-book PDF myself.

I don't see any settings in drush help that would do this for me. Is such a thing possible in book batch?

Peter MacDonald

May 12, 2017, 9:54:10 AM
to isla...@googlegroups.com
Brandon:

I've noticed that when I use the Newspaper Batch Ingest module to ingest zip files of newspaper issues from the command line, the --create_pdfs option creates PDF datastreams only on the page-level objects. It does NOT create an aggregated PDF for the issue-level objects unless I have remembered to check "PDF datastream" in the Newspaper SP module config menu. If it is not checked there, you will not get an aggregated PDF, even if you use the command-line option --create_pdfs.

This may be the same with the Book Batch Ingest command-line use of --create_pdfs too. The Book SP config menu also has a "PDF datastream" box to check (admin/islandora/solution_pack_config/book).

Peter MacDonald


--
For more information about using this group, please read our Listserv Guidelines: http://islandora.ca/content/welcome-islandora-listserv
---
You received this message because you are subscribed to the Google Groups "islandora" group.
To unsubscribe from this group and stop receiving emails from it, send an email to islandora+unsubscribe@googlegroups.com.
Visit this group at https://groups.google.com/group/islandora.
To view this discussion on the web visit https://groups.google.com/d/msgid/islandora/833c2f04-a59e-4fbe-92e1-2fa97f21a44b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Peter MacDonald,
Library Information Systems Specialist
Hamilton College Library
Clinton, New York
315 859-4493
pmacdona-hamilton (Skype)

Brandon Weigel

May 12, 2017, 1:23:59 PM
to islandora
Thanks, Peter. It looks like there's a bit of a documentation problem for both of these.

In my experience, the "PDF Datastream" checkbox for Newspapers controls page-level PDFs: if it's not checked, page-level PDFs are not created. Issue-level PDFs are not created in either case. (I don't think I've tested the --create_pdfs parameter for Newspaper batch.)

I REALLY don't want page-level PDFs to be created, so I don't want to check off the "PDF Datastream" option.

Does anyone have a definitive answer for how exactly the Newspaper and Book batch modules are supposed to work with respect to creating issue/book-level PDFs?

Brandon Weigel

May 12, 2017, 5:02:35 PM
to islandora
Following up after more testing...

Does --create_pdfs actually do anything? Here are my tests:

1. "PDF Datastream" turned on, --create_pdfs used
Result: PDF datastream on pages, no PDF on Book

2. "PDF Datastream" turned on, did not use --create_pdfs
Result: PDF datastream on pages, no PDF on Book

3. Turned off "PDF Datastream", --create_pdfs used
Result: No PDF datastream on pages, no PDF on Book

4. Turned off "PDF Datastream", did not use --create_pdfs
Result: No PDF datastream on pages, no PDF on Book

Peter MacDonald

May 13, 2017, 8:19:43 AM
to isla...@googlegroups.com
Brandon

Would you mind posting your entire drush command here? I've found, for example, that if you have both --create_pdfs and --aggregate_ocr as options in the drush command (at least for a newspaper batch preprocess job), the issue-level PDFs don't get created properly. That is, 5 issue-level objects get PDFs and then 2 are skipped, then 5 more get PDFs and then 2 are skipped, and so on until the end of the zip ingest job.

Also, as you mentioned, Islandora adds a PDF datastream to every page-level object, which we might not need, so I used to run a batch job after the ingest was done to delete all page-level PDFs. This saved tons of disk space over a long-running newspaper.

Which brings up the last issue I'd like to mention. The PDFs Islandora creates are huge.

For all these reasons, whenever I want an issue-level PDF I create my own optimized PDFs offline and package them with the zip file. This avoids all the problems mentioned above.

Peter MacDonald


Brad Spry

May 14, 2017, 5:20:34 AM
to islandora
Brandon,

I can confirm your experience. I submitted an improvement-tagged ticket late last year:
https://jira.duraspace.org/browse/ISLANDORA-1633

Brad

Brandon Weigel

May 17, 2017, 1:26:26 PM
to islandora
Thanks, Brad. I added some comments to the ticket. Hope it gets more attention.

Peter: here's the command I'm running:

$ drush -v -u 1 --uri=http://localhost islandora_book_batch_preprocess --type=zip --target=/home/brandonw/stagedata/unbc_k-p.zip --namespace=unbc --aggregate_ocr --create_pdfs --parent=unbc:nbcdcbooks


But I've tried with and without --aggregate_ocr, with and without --create_pdfs; result is always the same. --create_pdfs appears to do nothing at all; the creation of page-level PDFs is determined solely by whether "PDF Datastream" is checked in the Book Solution Pack config menu.

And of course, there is no command-line method to generate Book-level PDFs at all.
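
For anyone who wants to repeat the comparison, here is a minimal sketch that prints the four command-line variants (with and without --aggregate_ocr, with and without --create_pdfs). It only echoes the commands and does not run drush. Every flag and path is copied from the command quoted above; the function name build_cmd and its yes/no arguments are made up for illustration. The "PDF Datastream" checkbox still has to be toggled by hand in the Book Solution Pack config menu between runs.

```shell
#!/usr/bin/env bash
# Print (do not execute) the book batch preprocess command for one test case.
# build_cmd and its arguments are hypothetical helper names, not part of drush.
#   $1: "yes" to include --aggregate_ocr
#   $2: "yes" to include --create_pdfs
build_cmd() {
  local use_ocr="$1"
  local use_pdfs="$2"
  # Base command, copied from the command quoted in this thread.
  local cmd="drush -v -u 1 --uri=http://localhost islandora_book_batch_preprocess"
  cmd="$cmd --type=zip --target=/home/brandonw/stagedata/unbc_k-p.zip"
  cmd="$cmd --namespace=unbc --parent=unbc:nbcdcbooks"
  if [ "$use_ocr" = "yes" ]; then
    cmd="$cmd --aggregate_ocr"
  fi
  if [ "$use_pdfs" = "yes" ]; then
    cmd="$cmd --create_pdfs"
  fi
  printf '%s\n' "$cmd"
}

# Emit all four combinations so each can be pasted and run separately.
for ocr in yes no; do
  for pdfs in yes no; do
    build_cmd "$ocr" "$pdfs"
  done
done
```

Running each printed command once with the checkbox on and once with it off covers the full set of cases tested in this thread.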

Mark Jordan

May 17, 2017, 1:37:14 PM
to isla...@googlegroups.com
No problem,

Mark

Mark Jordan

May 17, 2017, 1:39:03 PM
to isla...@googlegroups.com
Duh, sorry everyone, I apparently haven't learned to use my phone to reply to the correct message.

Mark
