Short: Impossible.
Long: I know of no tool that can do this.
Uwe
BTW: If you do not need to modify the math, crop the pages with Acrobat
Pro (or whatever the name of the commercial license is now) or PDFTK
(maybe, never used it) and embed them as graphics.
Uwe
Cropping is also possible with pdfLaTeX (add clip, otherwise the rest
of the page is still printed outside the viewport):
\includegraphics[viewport=u v w x,clip]{file}
...Rolf
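For anyone trying the viewport route, here is a minimal sketch of a
complete file. The file name paper.pdf, the page number and the
coordinates are made up for illustration. The four viewport numbers are
in big points (1/72 inch), measured from the lower-left corner of the
page's bounding box, and clip is what hides everything outside that
rectangle.

```latex
\documentclass{article}
\usepackage{graphicx}
\begin{document}
% Hypothetical input: page 3 of paper.pdf contains the formula we want.
% viewport = llx lly urx ury, in bp, relative to the page's lower-left corner.
% clip suppresses everything outside the viewport rectangle.
\includegraphics[page=3, viewport=72 400 540 470, clip]{paper.pdf}
\end{document}
```

The result is still a picture of the formula, not editable math, so it
only helps when the math does not need to change.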
This is like: "Can I create the movie script from the finished film?"
Or: "Can I create the recipe from that meal they served me?"
Or: "Can I create apples from apple puree?"
...Rolf
There are a couple of tools that attempt OCR of documents containing
mathematics, for example:
http://research.cs.queensu.ca/drl//ffes/
Convert the .pdf to a bitmap, then feed it to ffes.
William
I'm not sure it's that useful to consider this branch of the thread,
but...
Considering that the PDF may not have been created with TeX to begin
with, perhaps...
"Can I create apples from concentrated orange juice?"
or...
"Can I create a recipe from a shooting star?"
or...
"Can I create the movie script from the banana-flavored toothpaste?"
This is like asking to recreate the whole cow from a hamburger.
> Specifically, I want to be able to convert the math that
> appears in a PDF document to LaTeX code, so that I don't have to write
> it all out manually.
Find the original source and use that. Reverse-engineering may be
possible, but it will take longer than retyping it.
///Peter
Enough of this.
The fact is that Adobe Acrobat can often create a usable .doc from a
PDF, though this likely works well only with ordinary text documents.
It's unfortunate a comparable free application doesn't exist.
Bob T.
Ah, but this depends on what one calls "usable". Usable means the
consistent use of style sheets, cross references and stuff like that.
That 95% of WYSIWYG system users will go "Huh? What's that?" does not
change that you don't want a 1000-page document without such basic
elements in them.
That holds regardless of whether it has been produced by Acrobat, a
clueless retyper, a clueless original typer or a free tool.
--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
UKTUG FAQ: <URL:http://www.tex.ac.uk/cgi-bin/texfaq2html>
In fact you can convert from PDF to .doc using free tools. If you are on
Linux, KWord can import PDF and export to formats that MS Word can read.
Or you can use pdftohtml and then convert the HTML to .doc. Or you can
sign up for Gmail from Google, email the PDF to yourself, and have
Google convert it to HTML (and then convert the HTML to .doc format).
None of these methods will do what the OP wanted of course (convert math
in a pdf to latex), but then again neither will Adobe Acrobat...
What I meant by comparable was to convert .pdf to .tex. I'm aware it is
possible to go from .pdf to .doc and then .doc to .tex using Abiword,
but surely we could and should do better.
My main point was that it is inappropriate to use irrelevant analogies
to mock the OP's request.
Bob T.
there is a faq answer that says (in effect) that there's no point in
even trying anything beyond extracting the text. this thread is the
first time anyone's mentioned anything else ... rescanning printed
output sounds (ahem) "fun".
anyway, i shall revise the answer some time.
--
Robin Fairbairns, Cambridge
> Bob Tennent <Bo...@cs.queensu.ca> writes:
>>
>>What I meant by comparable was to convert .pdf to .tex. I'm aware it is
>>possible to go from .pdf to .doc and then .doc to .tex using Abiword,
>>but surely we could and should do better.
>
>there is a faq answer that says (in effect) that there's no point in
>even trying anything beyond extracting the text. this thread is the
>first time anyone's mentioned anything else ... rescanning printed
>output sounds (ahem) "fun".
There is no need to "rescan printed output".
Modern OCR software (commercial: Caere OmniPage, Abbyy FineReader) can
directly read pdf, convert it to a bitmap and OCR this bitmap. Of
course the quality is better than with printing and rescanning.
And if you want to do it manually, you can open the pdf with
Ghostscript and convert it to a bitmap, then apply the OCR of your
choice.
This OCR software can also guess the formatting (not perfectly, but
usably).
Drawback: It saves in MS Word format, not (La)TeX.
Wilfried Hennings
please reply in the newsgroup
It's actually unbelievable how well you can reconstruct the cow from
the hamburger:
http://www.inftyproject.org/en/software.html#InftyReader
Didn't test it, though.
Kurt
Thought I should point out that FFES is a prototype for pen-based math
entry, and does not convert images directly to .tex at this time.
There is a preliminary, experimental part of the program for importing
images, but it's fairly weak at the present time. Also, for those
interested, there is a newer version of FFES available here:
http://www.cs.rit.edu/~rlaz/ffes/
I believe that the Infty system of Suzuki et al. does support
conversion from images to .tex, but have not had time to try the
system myself.
-Richard Zanibbi (member of the FFES development team, FFES maintainer)
Slightly off topic -- if you try to install the distribution that's on-
line, it's going to fail when it tests the TXL compiler... From the
test_txl called from the Makefile for the DRACULAE_0.4 directory:
COMPILE_TEST=`cd test; txlc test/Test.Txl`
I think that "test/" should be removed, since the command has already
changed into that directory. Additionally, in that DRACULAE Makefile,
I had to change the *.x rule to wrap the $< in a basename.
That is, you're doing a "cd src" and then still using "src".
I'm running OS/X 10.4. After making those changes, I was able to build
ffes fine.
--Ted
Thank you for catching this. I will update these files when I get the
chance.
-Richard Zanibbi
> It's actually unbelievable how well you can reconstruct the cow from
> the hamburger:
Do you think we can put a copy of the cow into the hamburger?
What I mean is: can pdf(la)tex somehow put the original TeX code into
the PDF? I don't know what the PDF specs say about this, but I seem to
remember that PDFs can have embedded files (attachments). It would
increase the chances of the document being convertible to a new
standard in 30 or 100 years.
cherio, Luite.
Meanwhile, you can put anything you like into the pdf as
a comment (I DO mean comment, not comment-annotation).
PDF comments start with % and last until the end of the line.
That's not what he means, but yes, one can store a copy of the .tex
source (or any other file) within a .pdf when typesetting / creating it.
The Mac OS X Service app LaTeXiT.app (among others) does this, which
allows an embedded equation to be reverted to its source for
editing and then re-typeset.
William
> > > There're a couple of tools which attempt OCR which includes
> > > mathematics, for example:
> >
> > >http://research.cs.queensu.ca/drl//ffes/
>
> > It's actually unbelievable how well you can reconstruct the cow from
> > the hamburger:
>
> Do you think we can put a copy of the cow into the hamburger?
> What I mean is: can pdf(la)tex somehow put the original tex code into
> the pdf?
Easy, look at package embedfile or attachfile2 (or attachfile).
Yours sincerely
Heiko <ober...@uni-freiburg.de>
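A minimal sketch of the embedfile approach, for those who want to try
it. It assumes the document is compiled with pdfLaTeX and that the main
file is called \jobname.tex (the default); the description text is just
an illustration.

```latex
\documentclass{article}
% embedfile adds files to the PDF's list of embedded files (attachments).
\usepackage{embedfile}
% Embed this document's own source; desc is an optional description.
\embedfile[desc={LaTeX source of this document}]{\jobname.tex}
\begin{document}
Some mathematics: $e^{i\pi} + 1 = 0$.
\end{document}
```

The source should then show up in the attachments panel of PDF viewers
that support one, and a tool such as pdfdetach from Poppler should be
able to extract it again.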
I assume that these packages require the use of pdftex. That is, they
require generating a PDF directly from TeX, which may not be appealing
for many users (including this one).
Is there a way to embed the TeX into a DVI and then still manage to
maintain it through the dvips and ps2pdf pipeline? (I assume not)
--Ted