Michael Vilain wrote:
> Philipp Kraus <philip...@flashpixx.de> wrote:
>> how can I extract text, images and other structures can be ignored,
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> with PHP from a PDF file?
>> We have a lot of LaTeX PDFs and Powerpoint PDFs and would like to
                    ^^^^^          ^^^^^^^^^^
>> extract only the text content
>> to create a text analysis of the content; e.g., for LaTeX scripts we would
>> like the chapter structure as well.
>>
>> Is there any solution to do this with built-in PHP functions?
>
> I tried a bunch of stuff to read some bank statements that were in PDF
> format so I could import them via CSV. Didn't work out so well. Adobe's
> OCR feature only works if the PDFs are unlocked to allow it. I found an
> application that would do that but the OCRed text was unusable.
>
> So, my question is "what's generating the PDF files?"

The ability to read can be an advantage sometimes …

> Can you get whomever to do it in text or some other format?

OMG. Leave it to you to give the worst possible technical advice.

> If they're encrypted images, then you've got a lot of work to do in order
> to get some output. Maybe.

Nobody but you is talking about images and OCR. You really don't have a
clue what PDF is, do you?
--
PointedEars
Twitter: @PointedEars2
Please do not Cc: me. / Bitte keine Kopien per E-Mail.