Yes, it's possible! Take a look at:
(and related files in my sage.math home directory). You can use pdftk
(or Acrobat) to extract a worksheet from that pdf.
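
A minimal sketch of the pdftk route, driven from Python. It assumes pdftk is installed and on the PATH; the helper names `pdftk_unpack_command` and `unpack_attachments` are made up for illustration, while `unpack_files` and `output` are pdftk's own keywords for unpacking attachments into a directory.

```python
import shutil
import subprocess


def pdftk_unpack_command(pdf_path, out_dir):
    """Build the pdftk command line that unpacks the attachments of pdf_path."""
    return ["pdftk", pdf_path, "unpack_files", "output", out_dir]


def unpack_attachments(pdf_path, out_dir):
    """Run pdftk if it is installed.

    Returns True when pdftk ran successfully and False when pdftk is not
    available; a failing pdftk run raises subprocess.CalledProcessError.
    """
    if shutil.which("pdftk") is None:
        return False
    subprocess.run(pdftk_unpack_command(pdf_path, out_dir), check=True)
    return True
```

Calling `unpack_attachments("foo.pdf", "attachments")` should leave any attached worksheet in the `attachments` directory, which has to exist beforehand.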
This complements Rob Beezer's recent ideas about making LaTeX and Sage
work together.
I think it would be great if we could get the notebook server to do this
to uploaded PDFs, but I thought I would ask what people think first.
Also, I don't know enough about the notebook server to add this
functionality and don't know how hard it might be.
--- Dan Drake <dr...@kaist.edu>
----- KAIST Department of Mathematical Sciences
I can also use PDFMiner to extract the worksheet. The nice thing is
that PDFMiner is written in pure Python. See
This extracts the Sage worksheet from the above PDF:
python -m tools.dumppdf -i20 -b foo.pdf > embedded-worksheet.sws
The -b means binary mode, and -i20 says to extract the content of
object 20 in the PDF file. We know it is object 20 by looking at
object 23. Here is the output from object 23, interspersed with my
comments (lines starting with "#"):
$ python -m tools.dumppdf -i23 foo.pdf
# This is an XML representation of a dictionary, where each key is
# followed by its value.
# Here is where we find the file information. FS = "File Specification"
# EF = Embedded File
# F = File (in this case, it's an internal reference)
# F = File; it's the filename
# We should scan for this key/value pair. This tells us that this
# object contains info for a file attachment.
See p. 683 of the PDF 1.7 spec
So basically, it looks like we need to scan the pdf file for objects of
subtype FileAttachment, look at the FS key to find the filename, make
sure the filename ends in .sws, and then extract the internal object we
get from the EF key.
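
The scan described above can be sketched without any PDF library by modelling the parsed objects as plain Python dicts. This is only an illustration of the logic, not PDFMiner's API: the function name `find_worksheet_stream`, the use of integers for indirect references, and the filename `foo.sws` are all assumptions made for the example.

```python
def find_worksheet_stream(objects):
    """Return (filename, stream_object_id) for the first attached .sws file.

    `objects` maps object ids to already-parsed PDF objects. Dictionaries are
    plain dicts, and indirect references are modelled as ints that are keys
    of `objects` (an assumption of this sketch).
    """
    def resolve(value):
        # Follow an indirect reference if the value is an object id.
        return objects[value] if isinstance(value, int) else value

    for objid in sorted(objects):
        obj = objects[objid]
        if not isinstance(obj, dict):
            continue
        if obj.get("Subtype") != "FileAttachment":
            continue
        filespec = resolve(obj.get("FS"))
        if not isinstance(filespec, dict):
            continue
        filename = filespec.get("F", "")
        embedded = resolve(filespec.get("EF", {}))
        if (filename.endswith(".sws") and isinstance(embedded, dict)
                and "F" in embedded):
            return filename, embedded["F"]
    return None


# Toy object table mirroring the example: object 23 is the annotation and
# object 20 holds the embedded file stream.
toy_objects = {
    20: b"worksheet bytes",
    23: {
        "Type": "Annot",
        "Subtype": "FileAttachment",
        "FS": {"Type": "Filespec", "F": "foo.sws", "EF": {"F": 20}},
    },
}
print(find_worksheet_stream(toy_objects))  # prints ('foo.sws', 20)
```

With a real parser the dict lookups would be replaced by whatever the library offers for resolving references, but the order of checks (subtype, then FS, then filename, then EF) stays the same.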
Sounds pretty easy, if we have something like pdfminer in Sage.
Here is a short python script which extracts the embedded worksheet in
the above pdf file and outputs it to stdout. To run this, put it in the
tools directory of the pdfminer distribution above, cd to the pdfminer
directory, and do:
python -m tools.sage foo.pdf > embedded.sws
Here's the file:
import sys

from pdflib.pdfparser import PDFDocument, PDFParser

stdout = sys.stdout
doc = PDFDocument()
fp = open('foo.pdf', 'rb')
parser = PDFParser(doc, fp)
for xref in doc.xrefs:
    for objid in xref.objids():
        obj = doc.getobj(objid)
        # File attachments are annotation dictionaries of subtype FileAttachment.
        if isinstance(obj, dict) and 'Type' in obj and obj['Type'].name == 'Annot':
            if 'Subtype' in obj and obj['Subtype'].name == 'FileAttachment':
                # We have an attached file!
                filespec = obj['FS']
                # Look for embedded file; we could try to extract the
                # filename too, but that is platform dependent. See page
                # 182 (Section 3.10.2) of
                if 'EF' in filespec:
                    fileobj = filespec['EF']['F']
                    embeddedspec = filespec['EF']
                    # Just output the first file found.
I don't think it will be that hard. Those are famous last words, though.
> Comments? Ideas?
This is now #4825
> This is a response to an idea that William mentioned recently. He
> asked if it's possible to embed a Sage worksheet into a PDF so that you
> could upload the PDF to a Sage notebook server, which would then extract
> the worksheet and let you edit it.
> Yes, it's possible! Take a look at:
> (and related files in my sage.math home directory). You can use pdftk
> (or Acrobat) to extract a worksheet from that pdf.
> This complements Rob Beezer's recent ideas about making LaTeX and Sage
> work together.
This is very cool!
> I think it would be great if we could get the notebook server to do this
> to uploaded PDFs, but I thought I would ask what people think first.
> Also, I don't know enough about the notebook server to add this
> functionality and don't know how hard it might be.
> Comments? Ideas?
It would be neat if the worksheet could be generated from the .tex
source, with perhaps extra examples, so the author doesn't have to do
something totally separate/manually keep them in sync. But perhaps
you're already thinking along these lines.
This is very cool. My only concern is that I don't see what I'll get if the
server unpacks the PDF automatically. But I suppose this is the same problem
with sws files anyway.
name: Martin Albrecht
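
On the concern about not seeing what the server would unpack: one option is to list an archive's members before extracting anything. The sketch below assumes an .sws file is a tar archive (bzip2-compressed in the toy example); the helper names and the member name `sage_worksheet/worksheet.txt` are illustrative assumptions, not something stated in this thread.

```python
import io
import tarfile


def list_archive_members(data):
    """Return the member names of a tar archive held in memory, without extracting."""
    # mode "r:*" lets tarfile detect the compression (none, gzip, bzip2, ...).
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as archive:
        return archive.getnames()


def make_toy_archive():
    """Build a small bzip2-compressed tar in memory to stand in for an .sws file."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:bz2") as archive:
        payload = b"{{{\n2 + 2\n}}}\n"
        info = tarfile.TarInfo(name="sage_worksheet/worksheet.txt")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()
```

A server could show the result of `list_archive_members(...)` to the user, or check it against an allowed set of names, before unpacking an uploaded worksheet.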
I see many ideas for converting latex>worksheet, pdf>worksheet and
worksheet>pdf. It would be cool if all those converged to a single
solution. Maybe the Sphinx solutions could help.
> The pdftk route worked fine for me. I'll add that KPDF (KDE's pdf
> viewer) falls into the "scant support" category. Not much of a
> surprise there.
Okular, the KDE-4 pdf viewer, has (some) support for attached files,
but it doesn't seem to see the attached worksheet in this case.