embed Sage worksheets into PDFs?


Dan Drake

Dec 18, 2008, 1:05:41 AM
to sage-...@googlegroups.com
This is a response to an idea that William mentioned recently [1]. He
asked if it's possible to embed a Sage worksheet into a PDF so that one
could upload the PDF to a Sage notebook server, which would then extract
the worksheet and let you edit it.

Yes, it's possible! Take a look at:

http://sage.math.washington.edu/home/drake/foo.pdf

(and related files in my sage.math home directory). You can use pdftk
(or Acrobat) to extract a worksheet from that pdf.

This complements Rob Beezer's recent ideas about making LaTeX and Sage
work together.

I think it would be great if we could get the notebook server to do this
to uploaded PDFs, but I thought I would ask what people think first.
Also, I don't know enough about the notebook server to add this
functionality and don't know how hard it might be.

Comments? Ideas?

Dan

1. http://groups.google.com/group/sage-support/msg/3ea7ed2eeab0824a
--
--- Dan Drake <dr...@kaist.edu>
----- KAIST Department of Mathematical Sciences
------- http://mathsci.kaist.ac.kr/~drake


Rob Beezer

Dec 18, 2008, 2:12:49 AM
to sage-devel
Dan,

Very nice!

The pdftk route worked fine for me. I'll add that KPDF (KDE's pdf
viewer) falls into the "scant support" category. Not much of a
surprise there.

Rob


Jason Grout

Dec 18, 2008, 3:15:39 AM
to sage-...@googlegroups.com
Dan Drake wrote:
> This is a response to an idea that William mentioned recently [1]. He
> asked if it's possible to embed a Sage worksheet into a PDF so that one
> could upload the PDF to a Sage notebook server, which would then extract
> the worksheet and let you edit it.
>
> Yes, it's possible! Take a look at:
>
> http://sage.math.washington.edu/home/drake/foo.pdf
>
> (and related files in my sage.math home directory). You can use pdftk
> (or Acrobat) to extract a worksheet from that pdf.
>

I can also use PDFMiner to extract the worksheet. The nice thing is
that PDFMiner is pure Python. See
http://www.unixuser.org/~euske/python/pdfminer/

This extracts the Sage worksheet from the above PDF:


python -m tools.dumppdf -i20 -b foo.pdf > embedded-worksheet.sws

The -b flag means binary mode, and -i20 specifies that we should extract
the content of object 20 in the PDF file. We know it is object 20 by
looking at object 23, the file attachment annotation. Here is the output
for object 23, interspersed with my comments (lines starting with "#"):


$ python -m tools.dumppdf -i23 foo.pdf
# This is an XML representation of a dictionary, where each key is
# followed by its value.

<dict size="9">
<key>C</key>
<value><list size="3">
<number>1</number>
<number>0.9255</number>
<number>0.7765</number>
</list></value>

# Here is where we find the file information. FS = "File Specification"
<key>FS</key>
<value><dict size="3">

# EF = Embedded File
<key>EF</key>
<value><dict size="1">
# F = File (in this case, it's an internal reference)
<key>F</key>
<value><ref id="20"/></value>
</dict></value>
<key>Type</key>
<value><literal>Filespec</literal></value>
# F = File; it's the filename
<key>F</key>
<value><string size="22">embedded-worksheet.sws</string></value>
</dict></value>
<key>Name</key>
<value><literal>PushPin</literal></value>
<key>AP</key>
<value><dict size="3">
<key>R</key>
<value><ref id="21"/></value>
<key>D</key>
<value><ref id="21"/></value>
<key>N</key>
<value><ref id="21"/></value>
</dict></value>
<key>F</key>
<value><number>4</number></value>
<key>M</key>
<value><string size="16">D:20081218143900</string></value>

# We should scan for this key/value pair. This tells us that this
# object contains info for a file attachment.
<key>Subtype</key>
<value><literal>FileAttachment</literal></value>
<key>Type</key>
<value><literal>Annot</literal></value>
<key>Rect</key>
<value><list size="4">
<number>253.98</number>
<number>216.522</number>
<number>277.98</number>
<number>230.522</number>
</list></value>
</dict>


See p. 683 of the PDF 1.7 spec
(http://www.adobe.com/devnet/acrobat/pdfs/pdf_reference_1-7.pdf from
http://www.adobe.com/devnet/pdf/pdf_reference_archive.html)
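The step "we know it is object 20 by looking at object 23" could itself be automated by parsing the XML that dumppdf prints. A minimal sketch, assuming the -i23 output has exactly the alternating <key>/<value> layout shown above (the "#" comment lines are my annotations, not part of the real output); `dict_items` and `find_attachment` are hypothetical helper names, not part of PDFMiner:

```python
import xml.etree.ElementTree as ET


def dict_items(dict_elem):
    """Pair up the alternating <key>/<value> children of a dumppdf <dict>."""
    children = list(dict_elem)
    items = {}
    for key_elem, value_elem in zip(children[0::2], children[1::2]):
        # Each <value> wraps exactly one typed element (<ref>, <dict>, ...).
        items[key_elem.text] = value_elem[0]
    return items


def find_attachment(xml_text):
    """Return (filename, object id) for a FileAttachment annotation, else None."""
    annot = dict_items(ET.fromstring(xml_text))
    subtype = annot.get("Subtype")
    if subtype is None or subtype.text != "FileAttachment":
        return None
    filespec = dict_items(annot["FS"])    # FS = file specification
    embedded = dict_items(filespec["EF"])  # EF = embedded file
    filename = filespec["F"].text
    objid = int(embedded["F"].get("id"))  # the <ref id="..."/> target
    return filename, objid
```

On the object 23 dump this should give the filename plus the object id to hand to dumppdf's -i together with -b.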

So basically, it looks like we need to scan the pdf file for objects of
subtype FileAttachment, look at the FS key to find the filename, make
sure the filename ends in .sws, and then extract the internal object we
get from the EF key.
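Those steps can be sketched against an in-memory model instead of a real PDF parser. This is only a sketch under stated assumptions: each PDF object is a plain Python dict keyed by object id, names such as /Annot are plain strings, embedded streams are already-decoded bytes, and `Ref` is a hypothetical stand-in for an indirect reference like <ref id="20"/>. It checks only the F key of the file specification, so files named under other keys would be missed.

```python
class Ref:
    """Hypothetical stand-in for an indirect reference such as <ref id="20"/>."""

    def __init__(self, objid):
        self.objid = objid


def resolve(objects, value):
    """Follow indirect references until we reach an actual object."""
    while isinstance(value, Ref):
        value = objects[value.objid]
    return value


def find_worksheets(objects):
    """Yield (filename, data) for each .sws file attached via a FileAttachment annotation."""
    for objid in sorted(objects):
        obj = objects[objid]
        if not isinstance(obj, dict):
            continue
        # Step 1: only file attachment annotations are interesting.
        if obj.get("Type") != "Annot" or obj.get("Subtype") != "FileAttachment":
            continue
        # Step 2: the FS key holds the file specification with the filename.
        filespec = resolve(objects, obj.get("FS"))
        if not isinstance(filespec, dict):
            continue
        filename = filespec.get("F")
        # Step 3: make sure the filename ends in .sws.
        if not isinstance(filename, str) or not filename.endswith(".sws"):
            continue
        # Step 4: the EF key points at the embedded file stream.
        ef = resolve(objects, filespec.get("EF"))
        if isinstance(ef, dict) and "F" in ef:
            yield filename, resolve(objects, ef["F"])
```

Using a generator means a PDF carrying several worksheets yields all of them, which is what an upload feature would probably want.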

Sounds pretty easy, if we have something like pdfminer in Sage.

Thanks,

Jason

Jason Grout

Dec 18, 2008, 4:03:15 AM
to sage-...@googlegroups.com
Jason Grout wrote:
> Dan Drake wrote:
>> This is a response to an idea that William mentioned recently [1]. He
>> asked if it's possible to embed a Sage worksheet into a PDF so that one
>> could upload the PDF to a Sage notebook server, which would then extract
>> the worksheet and let you edit it.
>>
>> Yes, it's possible! Take a look at:
>>
>> http://sage.math.washington.edu/home/drake/foo.pdf
>>
>> (and related files in my sage.math home directory). You can use pdftk
>> (or Acrobat) to extract a worksheet from that pdf.
>>
>
> I can also use PDFMiner to extract the worksheet. The nice thing is
> that PDFMiner is a pure python script. See
> http://www.unixuser.org/~euske/python/pdfminer/
>
> This extracts the sage worksheet from the above pdf:
>
>
> python -m tools.dumppdf -i20 -b foo.pdf > embedded-worksheet.sws
>


Here is a short Python script which extracts the first file embedded in
a PDF (such as the one above) and writes it to stdout. To run this, put
it in the tools directory of the pdfminer distribution above, cd to the
pdfminer directory, and do:


python -m tools.sage foo.pdf > embedded.sws


Here's the file:

import sys
from pdflib.pdfparser import PDFDocument, PDFParser

# Usage: python -m tools.sage foo.pdf > embedded.sws
doc = PDFDocument()
fp = open(sys.argv[1], 'rb')
parser = PDFParser(doc, fp)
doc.initialize()

for xref in doc.xrefs:
    for objid in xref.objids():
        try:
            obj = doc.getobj(objid)
        except Exception:
            # Skip objects that cannot be parsed.
            continue
        if (isinstance(obj, dict) and 'Type' in obj
                and obj['Type'].name == "Annot"
                and 'Subtype' in obj
                and obj['Subtype'].name == "FileAttachment"):
            # We have an attached file!
            filespec = obj['FS']
            # Look for the embedded file. We could try to extract the
            # filename too, but that is platform dependent. See page
            # 182 (Section 3.10.2) of
            # http://www.adobe.com/devnet/acrobat/pdfs/pdf_reference_1-7.pdf
            if 'EF' in filespec:
                fileobj = filespec['EF']['F']
                sys.stdout.write(fileobj.resolve().get_data())
                # Just output the first file found.
                sys.exit()


Thanks,

Jason

Jason Grout

Dec 18, 2008, 4:07:58 AM
to sage-...@googlegroups.com
Dan Drake wrote:
>
> I think it would be great if we could get the notebook server to do this
> to uploaded PDFs, but I thought I would ask what people think first.
> Also, I don't know enough about the notebook server to add this
> functionality and don't know how hard it might be.

I don't think it will be that hard. Those are famous last words, though.


>
> Comments? Ideas?
>

This is now #4825

Jason

Robert Bradshaw

Dec 18, 2008, 5:52:26 AM
to sage-...@googlegroups.com
On Dec 17, 2008, at 10:05 PM, Dan Drake wrote:

> This is a response to an idea that William mentioned recently [1]. He
> asked if it's possible to embed a Sage worksheet into a PDF so that
> one
> could upload the PDF to a Sage notebook server, which would then
> extract
> the worksheet and let you edit it.
>
> Yes, it's possible! Take a look at:
>
> http://sage.math.washington.edu/home/drake/foo.pdf
>
> (and related files in my sage.math home directory). You can use pdftk
> (or Acrobat) to extract a worksheet from that pdf.
>
> This complements Rob Beezer's recent ideas about making LaTeX and Sage
> work together.

This is very cool!

>
> I think it would be great if we could get the notebook server to do
> this
> to uploaded PDFs, but I thought I would ask what people think first.
> Also, I don't know enough about the notebook server to add this
> functionality and don't know how hard it might be.
>
> Comments? Ideas?

It would be neat if the worksheet could be generated from the .tex
source, with perhaps extra examples, so the author doesn't have to do
something totally separate/manually keep them in sync. But perhaps
you're already thinking along these lines.
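One way this could look, sketched under assumptions: the Sage code in the .tex source lives in SageTeX-style \begin{sageblock} ... \end{sageblock} environments, and each block becomes one cell wrapped in the {{{ ... }}} delimiters of the notebook's plain-text worksheet view (an assumption about the import format). `tex_to_worksheet_text` is a hypothetical helper and emits no title or other worksheet metadata.

```python
import re
import textwrap

# Assumption: code blocks use SageTeX's sageblock environment.
SAGEBLOCK = re.compile(r"\\begin\{sageblock\}(.*?)\\end\{sageblock\}", re.DOTALL)


def tex_to_worksheet_text(tex_source):
    """Turn each sageblock in the .tex source into one {{{ }}} notebook cell."""
    cells = []
    for body in SAGEBLOCK.findall(tex_source):
        # Drop the indentation LaTeX authors often use inside environments,
        # since leading whitespace is significant in Python/Sage code.
        code = textwrap.dedent(body).strip()
        cells.append("{{{\n" + code + "\n}}}")
    return "\n\n".join(cells) + "\n"
```

Because the cells come straight from the .tex file, regenerating the worksheet after each edit keeps the two in sync without manual copying.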

- Robert

Martin Albrecht

Dec 18, 2008, 6:19:58 AM
to sage-...@googlegroups.com
> Yes, it's possible! Take a look at:
>
> http://sage.math.washington.edu/home/drake/foo.pdf

This is very cool. My only concern is that I don't see what I'll get if the
server unpacks the PDF automatically. But I suppose this is the same problem
with sws files anyway.

Cheers,
Martin

--
name: Martin Albrecht
_pgp: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x8EF0DC99
_www: http://www.informatik.uni-bremen.de/~malb
_jab: martinr...@jabber.ccc.de

Rob Beezer

Dec 18, 2008, 2:03:14 PM
to sage-devel
Two comments:

1.
> It would be neat if the worksheet could be generated from the .tex
> source, with perhaps extra examples, so the author doesn't have to do
> something totally separate/manually keep them in sync.

In a very recent post there is an example of this. Compare

http://buzzard.ups.edu/sage/sage-group-theory.pdf
http://buzzard.ups.edu/sage/sage-group-theory-20081217.sws

With a global boolean in LaTeX you could control whether extra
examples are included in the sws version.

2.
> My only concern is that I don't see what I'll get if the server unpacks the PDF automatically.

If the notebook had an "Upload PDF with embedded worksheets" feature
then this would presumably add the worksheets to the user's store and
display them by their titles in the usual list.



Ronan Paixão

Jan 2, 2009, 12:30:33 PM
to sage-...@googlegroups.com

>
> It would be neat if the worksheet could be generated from the .tex
> source, with perhaps extra examples, so the author doesn't have to do
> something totally separate/manually keep them in sync. But perhaps
> you're already thinking along these lines.
>
> - Robert

I see many ideas for converting LaTeX to worksheet, PDF to worksheet,
and worksheet to PDF. It would be cool if all those converged to a
single solution. Maybe the Sphinx-based solutions could help.

Ronan

Franco Saliola

Jan 2, 2009, 6:00:35 PM
to sage-...@googlegroups.com
On Thu, Dec 18, 2008 at 2:12 AM, Rob Beezer <goo...@beezer.cotse.net> wrote:

> The pdftk route worked fine for me. I'll add that KPDF (KDE's pdf
> viewer) falls into the "scant support" category. Not much of a
> surprise there.

Okular, the KDE-4 pdf viewer, has (some) support for attached files,
but it doesn't seem to see the attached worksheet in this case.

Franco
