
Assembling PDFs into a PDF document?

AES

Sep 2, 2011, 3:32:37 AM
[I can set about learning how to do the following by myself, but if
anyone has an off-the-cuff answer or pointers to the starting point,
I'll be appreciative.]

Given a large number of single-page PDF image files, all in a single
folder (on a Mac), write a notebook that will build a single multipage
document containing some or all of these files, in accordance with
(and with page order determined by) a list of some or all of these
image file names.

For extra credit: Have each page in the final document bookmarked by
the name of the corresponding file.

Thanks for any assistance.

Bill Rowe

Sep 3, 2011, 8:05:12 AM
On 9/2/11 at 3:29 AM, sie...@stanford.edu (AES) wrote:

>Given a large number of single-page PDF image files, all in a single
>folder (on a Mac), write a notebook that will build a single
>multipage document containing some or all of these files, in
>accordance with (and with page order determined by) a list of some
>or all of these image file names.

On a Mac, probably the simplest way to do this would be to save
each portion of the notebook representing a single PDF page to a
separate PDF file, open the first page in Preview, display the
thumbnails (cmd 2), then drag the remaining files to the
thumbnail bar.

Drag to re-order however you like and save the result from Preview.

>For extra credit: Have each page in the final document bookmarked
>by the name of the corresponding file.

You can create bookmarks within Preview.

As for the first step, saving the files from Mathematica to PDF,
you could manually select what is to go on a single page
and use the Save Selection As item in the File menu to save the
selection to PDF format. Alternatively, this step could easily
be automated using Export and the various Notebook*
functions to select what you want and save it as PDF.

Or you could use the various Notebook* functions (do ?Notebook*
to get a list) to create a notebook with just what you wanted to
save, then save the whole thing from Mathematica as a PDF file.

But unless you are going to do this often or have a large number
of files you want collected into the final PDF file, using a
manual method with Preview is likely to be considerably less effort.


David Bailey

Sep 3, 2011, 8:07:15 AM
I doubt that this is possible, because the Import operation can only
extract raw text or whole pages in the form of images. It is not possible
to extract formatted text - say in the form of a notebook. Maybe that will
change in future versions of Mathematica, but I understand that the PDF
format is rather opaque.

David Bailey
http://www.dbaileyconsultancy.co.uk


Alan

Sep 3, 2011, 8:09:18 AM

dr DanW

Sep 3, 2011, 8:11:20 AM
Step 1: be grateful you own a Mac. PDF's are so easy to manipulate on a Mac.

This is one of those rare circumstances where I don't think Mathematica is your best solution. Preview exports actions to Automator for compiling pages into a PDF and for watermarking.

Since this is a Mathematica forum, I have to ask the group: Can Mathematica be used to script Automator or Applescript actions?

Daniel

AES

Sep 3, 2011, 5:57:25 PM
Original query:

> Given a large number of single-page PDF image files, all in a single
> folder (on a Mac), write a notebook that will build a single multipage
> document containing some or all of these files, in accordance with
> (and with page order determined by) a list of some or all of these
> image file names.
>
> For extra credit: Have each page in the final document bookmarked by
> the name of the corresponding file.

In article <j3t5h8$58o$1...@smc.vnet.net>,
dr DanW <dmaxw...@gmail.com> wrote:

> This is one of those rare circumstances where I don't think Mathematica is
> your best solution. Preview exports actions to Automator for compiling pages
> into a PDF and for watermarking.

I guess I'm surprised at the answers I've gotten to this query -- thus
far, anyway -- which basically say, "You gotta do this by hand". All
I want to do is, in essence, import a bunch of files; concatenate 'em
(without in any way opening, "reading" or in any way processing them);
and re-export the concatenated file.

What I failed to understand, I suppose, is that one doesn't just
"concatenate" PDF files in this fashion. Text files, yes; image
files, probably; but PDF files, no. (Or can one, in fact, do this with
PDF files, without converting each PDF file to a .jpg or .png file? --
maybe at the cost of a bulkier final document?)

The Automator suggestion is interesting. I've played with it a bit;
found it powerful but quirky; and not a particularly fun language to
program in -- partly because it's hard to follow just what it's doing,
step by step, partly because it's not well documented.

But suppose your workflow involves generating and saving a large
number of one-page PDF files -- each file a spec sheet or catalog
page for one of the products or commercial items that you sell, for
example.

Every so often you edit a master list of the PDF files for those
products that you currently sell (a list of the file names, that is),
removing obsolete items, adding new ones. Then you hand this list to
a Mathematica notebook, which builds an updated multipage catalog of
all your current products. This can't be done . . . ???

As I'm typing, I'm thinking: Hey, I'm quite sure I can do this in TeX,
and quite easily in fact. I'll give that a try.

Themis Matsoukas

Sep 3, 2011, 5:56:24 PM
Hi AES,

This solution cheats because it uses latex to do the real job. However, it does use mathematica to assemble the latex code. To use it:

1. place your pdf's in a subdirectory of the directory that contains the notebook (avoid file names that confuse latex)
2. supply the name of the subdirectory in the notebook in the variable pdfsubdirectory (in this example the directory is called "pdf_files")
3. Execute the notebook.

This creates a latex file in the current directory that uses the package pdfpages. Typeset it and you will get the assembled pdf. I suppose one should be able to send the typesetting command directly from Mathematica to the shell to fully automate the process, but this job is for another volunteer :-)

Themis

(*specify name of subdirectory that contains the pdf files*)

pdfsubdirectory = "pdf_files";
thisdirectory = SetDirectory@NotebookDirectory[];

(*read in list of pdfs*)

mypdffiles = FileNames["*.pdf", pdfsubdirectory];
numberoffiles = Length[mypdffiles];

(*build latex code*)

latexcode =
  StringJoin @@
   Join[{"\\documentclass[11pt]{article}\n\\usepackage{pdfpages}\n\\begin{document}\n"},
    Table[StringJoin["\\includepdf[pages=-]{", mypdffiles[[i]], "}\n"],
     {i, 1, numberoffiles}],
    {"\\end{document}"}];
Export["assemblepdf.tex", latexcode, "Text"]
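
For what it's worth, the last step (handing the typesetting command to the shell) could probably be done with Run. A minimal sketch, assuming pdflatex is installed and on the PATH that Mathematica sees; on a Mac you may need to spell out the full path to the binary instead:

```mathematica
(* Sketch only: typeset the driver file written by Export above.
   Assumption: "pdflatex" resolves on the PATH seen by Run; if it does not,
   replace it with the full path to the binary on your system. *)
SetDirectory[NotebookDirectory[]];
status = Run["pdflatex -interaction=nonstopmode assemblepdf.tex"];
If[status =!= 0,
 Print["pdflatex reported a problem; see assemblepdf.log"]]
```

Run hands back the status reported by the shell, so anything nonzero is worth a look in the log file.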

JUN

Sep 3, 2011, 5:58:27 PM

I think the only reason this may be worth doing in Mathematica is that
the sorting and selection of the input files perhaps requires some
nontrivial processing. Assuming you have already solved that problem,
here is a simple way to proceed.

Calling this a Mathematica solution is perhaps cheating, but here it
is anyway:

coalescePDF[inputFiles_?ListQ, outputFile_?StringQ] :=
If[FileExistsQ[outputFile],
Print[outputFile <>
" already exists and has not been overwritten"],
Run["/System/Library/Automator/Combine\\ PDF\\ \
Pages.action/Contents/Resources/join.py --output " <> outputFile <>
" " <> StringJoin[Riffle[inputFiles, " "]]]
]

The input filenames are passed to this function as strings in the list
inputFiles, and the desired output file name is the second argument.
So you would call it like this:

coalescePDF[{"picture1.pdf","picture2.pdf"}, "output.pdf"]

Just in case the line breaks are ambiguous when this gets posted, make
sure the string in the path name reads as one line containing
"Combine\\ PDF\\ Pages.action"
That's an Automator action that comes with OS X, so it's perhaps
better than using LaTeX or ghostscript etc., although I'm sure you
have those installed too.
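
One caveat with building the command line by hand: file names containing spaces (or other shell metacharacters) will be split up by the shell. A hedged variant that single-quotes every name is sketched below. The names shellQuote, joinScript and coalescePDFQuoted are made up here for illustration, and the join.py path and its --output flag are simply taken from the function above, so they may not exist on other OS X versions.

```mathematica
(* Hypothetical helper: wrap a string in single quotes for the shell,
   closing and reopening the quotes around any embedded single quote *)
shellQuote[s_String] := "'" <> StringReplace[s, "'" -> "'\\''"] <> "'";

(* Path copied from the post above; assumed to exist on this system *)
joinScript =
  "/System/Library/Automator/Combine PDF Pages.action/Contents/Resources/join.py";

(* Same logic as coalescePDF, but every path is quoted before it reaches Run *)
coalescePDFQuoted[inputFiles : {__String}, outputFile_String] :=
 If[FileExistsQ[outputFile],
  Print[outputFile <> " already exists and has not been overwritten"],
  Run[StringJoin[
    shellQuote[joinScript], " --output ", shellQuote[outputFile], " ",
    Riffle[shellQuote /@ inputFiles, " "]]]]
```

It is called the same way, e.g. coalescePDFQuoted[{"picture 1.pdf", "picture 2.pdf"}, "output.pdf"].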

Regards,
Jens


Bill Rowe

Sep 3, 2011, 5:59:58 PM
On 9/3/11 at 8:05 AM, dmaxw...@gmail.com (dr DanW) wrote:

>Since this is a Mathematica forum, I have to ask the group: Can
>Mathematica be used to script Automator or Applescript actions?

Yes. You can run an AppleScript from Mathematica. There are
unix command-line utilities that execute AppleScript, and with
Mathematica's Run function you can call any unix command or
script from Mathematica.
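
As a small illustration of the Run route: osascript is the standard OS X command-line tool for executing AppleScript, so a one-liner along these lines should work. The particular AppleScript statement is just an example chosen here.

```mathematica
(* Sketch: run a one-line AppleScript through the osascript command.
   The inner double quotes are escaped for Mathematica; the single quotes
   protect the script text from the shell. *)
Run["osascript -e 'tell application \"Preview\" to activate'"]
```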

Additionally, with the current Mac OS, you can run Python, Perl,
Ruby etc., pretty much any scripting language you like. With the
command line interface to Mathematica, you can do just about
anything you like with respect to either scripting Mathematica
or running scripts from Mathematica.

But it is good to keep in mind even though Mathematica is a very
powerful tool, Mathematica is not the most efficient tool for
everything you might want to do with a computer.


AES

Sep 4, 2011, 4:14:14 AM
In article <j3u7q8$a7p$1...@smc.vnet.net>,
Themis Matsoukas <tmats...@me.com> wrote:

> Hi AES,
>
> This solution cheats because it uses latex to do the real job. However, it
> does use mathematica to assemble the latex code. To use it:

Thanks very much -- but here's an even simpler way, using just Plain
TeX and TeXShop, without needing to bring Mathematica into the picture
at all.

% To insert a centered PDF image in TeXShop

\pageinsert
\null \vfill
\centerline{
\pdfximage
width xx in {my_pdf_file_name.pdf}
\pdfrefximage
\pdflastximage }
\vfill
\endinsert

where xx is the width in inches you want the PDF image to occupy on
the page.

Just write a TeX preamble that sets the various PDF page size, shape,
and margin parameters, then insert a bunch of these \pageinserts, one
per file.

Better yet, macro-ize the above coding, then call the macro repeatedly
on the list of file names.

I've just checked this on a simple three-PDF example. Each PDF file
in the three-page output document seems to have been captured with
full vector coding of the image preserved; each page can be
individually opened and edited in Illustrator if one wants to.

TeXShop and a complete installation of TeX Live are of course available
as MacTeX from TUG; be sure to join TUG to support this.

Bill Rowe

Sep 4, 2011, 4:15:15 AM
On 9/3/11 at 5:55 PM, sie...@stanford.edu (AES) wrote:

>I guess I'm surprised at the answers I've gotten to this query --
>thus far, anyway -- which basically say, "You gotta do this by
>hand".

You have misinterpreted the responses you have received. It
isn't "you gotta do this by hand". Instead, people are telling
you that unless you are doing this often or have a very large number
of files, it is easier and more efficient to do this by hand.

>All I want to do is, in essence, import a bunch of files; concatenate
>'em (without in any way opening, "reading" or in any way processing
>them); and re-export the concatenated file.

There are a variety of third party apps available for the Mac
that will do just this. Many are free or minimal cost.


AES

Sep 4, 2011, 6:08:13 PM
In article <j3vc2j$ek0$1...@smc.vnet.net>,
Bill Rowe <read...@sbcglobal.net> wrote:

All right -- I guess what's surprised me is that no one so far has
come up with a few lines of simple and straightforward Mathematica
coding that can do this job simply and quickly.

After all, one is supposed to be able to carry out one's _entire_ work
flow of analysis, calculation, _and publication_ (including the
inclusion of externally generated or provided content), entirely in
Mathematica -- is that not the mantra?

I just want to make a Mathematica-generated publication, to be
exported in PDF format, that will actually have almost no Mathematica
generated content -- maybe a title page or ToC -- but include a lot of
externally generated content, in the form of PDF files.

As I eventually realized, Plain TeX can do this easily.

DrMajorBob

Sep 5, 2011, 7:07:54 AM
> After all, one is supposed to be able to carry out one's _entire_ work
> flow of analysis, calculation, _and publication_ (including the
> inclusion of externally generated or provided content), entirely in
> Mathematica -- is that not the mantra?

No mantras here, my friend. None.

Bobby

On Sun, 04 Sep 2011 17:05:48 -0500, AES <sie...@stanford.edu> wrote:

> In article <j3vc2j$ek0$1...@smc.vnet.net>,
> Bill Rowe <read...@sbcglobal.net> wrote:
>

> All right -- I guess what's surprised me is that no one so far has
> come up with a few lines of simple and straightforward Mathematica
> coding that can do this job simply and quickly.
>
> After all, one is supposed to be able to carry out one's _entire_ work
> flow of analysis, calculation, _and publication_ (including the
> inclusion of externally generated or provided content), entirely in
> Mathematica -- is that not the mantra?
>
> I just want to make a Mathematica-generated publication, to be
> exported in PDF format, that will actually have almost no Mathematica
> generated content -- maybe a title page or ToC -- but include a lot of
> externally generated content, in the form of PDF files.
>
> As I eventually realized, Plain TeX can do this easily.
>


--
DrMaj...@yahoo.com

Armand Tamzarian

Sep 5, 2011, 7:17:28 AM
On Sep 3, 10:11 pm, dr DanW <dmaxwar...@gmail.com> wrote:
> Step 1: be grateful you own a Mac. PDF's are so easy to manipulate on a Mac.
>
> This is one of those rare circumstances where I don't think Mathematica is your best solution. Preview exports actions to Automator for compiling pages into a PDF and for watermarking.
>
> Since this is a Mathematica forum, I have to ask the group: Can Mathematica be used to script Automator or Applescript actions?
>
> Daniel

There is this:

http://library.wolfram.com/infocenter/MathSource/5688/

which may help. There is also a services package as well. I haven't
used either of them for many years and recall that an update was
needed for the graphics rendering in services (for better resolution).

Mike

Bill Rowe

Sep 5, 2011, 7:23:45 AM
On 9/4/11 at 6:05 PM, sie...@stanford.edu (AES) wrote:

>In article <j3vc2j$ek0$1...@smc.vnet.net>,
>Bill Rowe <read...@sbcglobal.net> wrote:

>>>All I want to do is, in essence, import a bunch of files;
>>>concatenate 'em (without in any way opening, "reading" or in any
>>>way processing them); and re-export the concatenated file.

>>There are a variety of third party apps available for the Mac that
>>will do just this. Many are free or minimal cost.

>All right -- I guess what's surprised me is that no one so far has
>come up with a few lines of simple and straightforward Mathematica
>coding that can do this job simply and quickly.

Why should this be surprising? While Mathematica is a very
powerful tool, it is not the best tool for all purposes.

>After all, one is supposed to be able to carry out one's _entire_
>work flow of analysis, calculation, _and publication_ (including the
>inclusion of externally generated or provided content), entirely in
>Mathematica -- is that not the mantra?

That may be the mantra of some. But it certainly isn't mine.

>I just want to make a Mathematica-generated publication, to be
>exported in PDF format, that will actually have almost no
>Mathematica generated content -- maybe a title page or ToC -- but
>include a lot of externally generated content, in the form of PDF
>files.

>As I eventually realized, Plain TeX can do this easily.

Recall Mathematica's Run function. That is, you can invoke
command line TeX tools from Mathematica. So, anything you can do
with TeX could be done from within Mathematica if you like. In
fact, since Mathematica can be seen as a general purpose
programming language, you could actually create command line TeX
tools from Mathematica if you wanted. In principle, anything you
want to do on a computer could be done within Mathematica. But
this is hardly an efficient or easy way to accomplish many tasks
you are likely to want to do with a computer.


Themis Matsoukas

Sep 6, 2011, 4:09:24 AM
> Thanks very much -- but here's an even simpler way,
> using just Plain
> TeX and TeXShop, without needing to bring Mathematica
> into the picture
> at all.

I thought that the whole point of the challenge was to use Mathematica! :-)

>
> % To insert a centered PDF image in TeXShop
>
> \pageinsert
> \null \vfill
> \centerline{
> \pdfximage
> width xx in {my_pdf_file_name.pdf}
> \pdfrefximage
> \pdflastximage }
> \vfill
> \endinsert
>
> where xx is the width in inches you want the PDF
> image to occupy on
> the page.
>
> Just write a TeX preamble that sets the various pdf
> page size,shape
> and margin parameters, then insert a bunch of these
> \pageinserts, one
> per file.
>

But don't you have to code the list of files manually? The notebook I posted produces a latex file in which the list of files, however long, is processed by mathematica (see example below). You can probably adapt it to the plain TeX version.

\documentclass[11pt]{article}
\usepackage{pdfpages}
\begin{document}
\includepdf[pages=-]{pdf_files/fig2_ex_BWR.pdf}
\includepdf[pages=-]{pdf_files/fig2_interpolation.pdf}
\includepdf[pages=-]{pdf_files/mathematica.pdf}
\includepdf[pages=-]{pdf_files/solution.pdf}
\end{document}

As a side comment, pdfpages allows you to include pdf files with multiple pages and even to choose which pages to include. My example includes all pages.
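
Putting the pieces together, here is a hedged sketch of how the notebook could take an explicit, ordered list of file names and also request a bookmark per file. It leans on the addtotoc option of pdfpages together with the hyperref package, as I understand their documentation. The list catalog and the helper includeLine are names invented for this sketch, and the two file names are borrowed from the example above.

```mathematica
(* Sketch: build the pdfpages driver from an explicit, ordered list of names.
   Page order in the output follows the order of this (hypothetical) list. *)
pdfsubdirectory = "pdf_files";
catalog = {"solution.pdf", "mathematica.pdf"};

(* One includepdf line per file. addtotoc takes
   {page, sectioning level, depth, heading, label}; with hyperref loaded,
   the heading should also show up as a PDF bookmark. *)
includeLine[name_String] := Module[{base = FileBaseName[name]},
   StringJoin["\\includepdf[pages=-,addtotoc={1,section,1,", base,
    ",sec:", base, "}]{", pdfsubdirectory, "/", name, "}\n"]];

latexcode = StringJoin[
   "\\documentclass[11pt]{article}\n",
   "\\usepackage{pdfpages}\n",
   "\\usepackage{hyperref}\n",
   "\\begin{document}\n",
   includeLine /@ catalog,
   "\\end{document}\n"];
Export["assemblepdf.tex", latexcode, "Text"]
```

Two caveats: the bookmarks normally need two typesetting runs to appear, and file names containing characters that are special to TeX (underscores, for instance) would have to be escaped before being used as headings.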

> TeXShop and complete installation of TeX Live of
> course available as
> MacTeX from TUG; be sure to join TUG to support this.
>

There is also a support group dedicated to TeX on Mac: http://email.esm.psu.edu/mailman/listinfo/macosx-tex

Themis

Ulrich Arndt

Sep 7, 2011, 5:43:37 AM
Hi,

this might work if the PDFs are always smaller than a page.

1. generate some sample PDF files - sin plots

dir = NotebookDirectory[];
t = 20;
d = Table[Plot[Sin[a x], {x, 1, 20}], {a, 1, t}];
Export[FileNameJoin[{dir,
"Sin[" <> ToString[NumberForm[#, 2, NumberPadding -> "0"]] <>
"t].pdf"}], d[[#]]] & /@ Range[t]

2. Get file name list

files = FileNames[dir <> "Sin*.pdf"];

3. Generate Notebook - (maybe brute force but seems to work)
Flatten is used to get a final list of Cell objects.
It creates a title page and one section per file. The file name is used for the section name, and a page break is added at the end of each section.


l = Flatten[{{TextCell["My Title for this Doc", "Title"],
Cell["", "PageBreak",
PageBreakBelow ->
True]}, {TextCell[FileNameTake[files[[#]]], "Section"],
ExpressionCell[Import[files[[#]]][[1]], "Input"],
Cell["", "PageBreak", PageBreakBelow -> True]} & /@
Range@Length[files]}];
nb = CreateDocument[l, Visible -> False, NotebookFileName -> "test"]

4. Save Notebook as PDF

Export[dir <> "test.pdf", nb]

5. Clean up test files
DeleteFile[dir <> "test.nb"]
DeleteFile[files]

Maybe not 100% what you are looking for, but it would avoid other tools.

Ulrich

--
www.data2knowledge.de
