
Import from PDF


Bill

Jun 25, 2008, 11:22:11 PM
Any chance that Access can import from a PDF? I have a website that outputs
its data to a PDF or a PRN file. I don't know anything about how to import
either one into Access, or whether it can be done at all. Sounds like a good
challenge though! Any help would be greatly appreciated. Thanks.

a a r o n . k e m p f @ g m a i l . c o m

Jun 26, 2008, 10:23:38 AM
nope.

but if you were using SQL Server, you should be able to.

SQL Server supports XML, PDF is XML.

-Aaron

James A. Fortune

Jun 26, 2008, 1:47:45 PM

Assuming you want to do this entirely in Access rather than using a
program made to do it, text extraction can be very hard or very easy
depending on whether or not encryption is used, how much compression is
being done within the PDF file and whether or not the linearized PDF
format is being used. Maybe it's just the challenge you're looking for
:-). A text stream may or may not be compressed (usually with Flate
compression, the deflate method also used by Zip, but not always). If so,
the stream's dictionary, which comes just before the compressed data,
will contain a /Filter entry (such as /FlateDecode) indicating that the
stream is compressed. Unless, of course, the author of the PDF file has
taken the extra step of compressing the text used for the command stream.
It's rare that storage space is at such a premium that command stream
compression is necessary. Almost any text you would be interested in
would be contained in the content streams of Page objects. For details see:

http://www.adobe.com/devnet/acrobat/pdfs/pdf_reference.pdf
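
As a rough illustration of the stream handling described above, here is a minimal Python sketch (not Access/VBA code). It assumes an unencrypted PDF whose streams use either no filter or only /FlateDecode, and it only picks up literal strings shown with the Tj operator. The names `iter_streams` and `extract_text` are made up for this example; a real extractor would also have to deal with TJ arrays, hex strings, escape sequences, font encodings and object streams.

```python
import re
import zlib

# Each indirect object sits between "obj" and "endobj".
OBJ_RE = re.compile(rb"\bobj\b(.*?)\bendobj\b", re.DOTALL)
# Inside an object: dictionary text, then the raw stream data.
STREAM_RE = re.compile(rb"(.*?)stream\r?\n(.*)endstream", re.DOTALL)
# A literal string followed by the Tj (show text) operator.
TJ_RE = re.compile(rb"\(((?:\\.|[^\\()])*)\)\s*Tj")


def iter_streams(pdf_bytes):
    """Yield the data of every stream, inflated when /FlateDecode is set."""
    for obj in OBJ_RE.finditer(pdf_bytes):
        found = STREAM_RE.search(obj.group(1))
        if found is None:
            continue  # object without a stream
        dictionary, data = found.group(1), found.group(2)
        if b"/FlateDecode" in dictionary:
            try:
                # decompressobj tolerates the end-of-line bytes that
                # precede "endstream".
                data = zlib.decompressobj().decompress(data)
            except zlib.error:
                continue  # other filter chain or damaged data: skip it
        yield data


def extract_text(pdf_bytes):
    """Return the literal strings shown with Tj, in file order."""
    pieces = []
    for data in iter_streams(pdf_bytes):
        for raw in TJ_RE.findall(data):
            # Assumption: single-byte text; escape sequences are left as-is.
            pieces.append(raw.decode("latin-1"))
    return pieces
```

Calling `extract_text(open("report.pdf", "rb").read())` on a simple, unencrypted file (the file name is only a placeholder) would return the strings in the order they appear in the file, which is not necessarily reading order.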

James A. Fortune
MPAP...@FortuneJames.com

a a r o n . k e m p f @ g m a i l . c o m

Jun 26, 2008, 11:11:33 PM

good stuff.

Of course, SQL Server can search through PDFs using Full Text Search.

RIGHT?



James A. Fortune

Jun 26, 2008, 11:58:50 PM
a a r o n . k e m p f @ g m a i l . c o m wrote:
> good stuff.
>
> Of course, SQL Server can search through PDFs using Full Text Search.
>
> RIGHT?

Well, you could Select All, Copy and Paste from the PDF to get the text
you want, then do a screen capture to get the images and add those to
something like Word as well, but we want to do it in Access --
automatically. To do the things done by VBA in SQL Server you're
probably talking about using .NET. If you'd like to tackle the PDF text
extraction problem using .NET, go ahead. Please post the code here so
that we can compare it with equivalent VBA code.

James A. Fortune
MPAP...@FortuneJames.com

Sascha Trowitzsch

Jun 27, 2008, 6:44:00 AM
There is a tool called pdftotext that you can find at
http://www.foolabs.com/xpdf/download.html
Download the precompiled binaries for Windows there.
You just need pdftotext.exe from the package. This is a command line utility.
You tell it the location of the PDF file and it will extract the contents to a
text file. With some special switches (such as -layout) the text will be
formatted similarly to the layout of the PDF.
I use this tool for full text search in PDFs from within Access. I call it via
ShellExecute and then wait for the process to complete. That can be done with
the API WaitForSingleObject (which needs a process handle, such as the one
ShellExecuteEx can return). The easier way would be to wait in a loop until
the exported text file exists. After that the text file is read (Open file...)
as a record into a table.
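
The approach described above can be sketched in Python roughly as follows (an illustration only, not the VBA code actually used). It assumes the Xpdf command line tool is on the PATH under the name `pdftotext`, passes its `-layout` switch to keep the page layout, and stores the result in an SQLite table as a stand-in for an Access table. The function names, the `pdf_text` table and the Latin-1 output encoding are assumptions made for this example.

```python
import sqlite3
import subprocess
from pathlib import Path


def build_command(pdf_path, txt_path, exe="pdftotext", keep_layout=True):
    """Build the command line: pdftotext [-layout] input.pdf output.txt"""
    cmd = [exe]
    if keep_layout:
        cmd.append("-layout")
    cmd += [str(pdf_path), str(txt_path)]
    return cmd


def pdf_to_text(pdf_path, exe="pdftotext", encoding="latin-1"):
    """Run the converter, wait for it to exit, and return the text."""
    txt_path = Path(pdf_path).with_suffix(".txt")
    # subprocess.run blocks until the process has finished, so no
    # polling loop for the output file is needed here.
    subprocess.run(build_command(pdf_path, txt_path, exe), check=True)
    # Assumption: the tool was configured to write Latin-1 output.
    return txt_path.read_text(encoding=encoding)


def store_text(conn, pdf_name, text):
    """Save the extracted text as one record (hypothetical table layout)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pdf_text (pdf_name TEXT, body TEXT)"
    )
    conn.execute("INSERT INTO pdf_text VALUES (?, ?)", (pdf_name, text))
    conn.commit()
```

With the text stored one record per PDF, a plain `LIKE` query on the `body` column is enough for a basic full text search.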

Sascha

"James A. Fortune" <MPAP...@FortuneJames.com> wrote in message
news:%23Ne$oT71IH...@TK2MSFTNGP04.phx.gbl...

Stephen Lebans

Jun 27, 2008, 2:08:49 PM
Hi Sascha,
I'll add a function to return the text of a PDF in the next release of
ReportToPDF.

--

HTH
Stephen Lebans
http://www.lebans.com
Access Code, Tips and Tricks
Please respond only to the newsgroups so everyone can benefit.


"Sascha Trowitzsch" <n...@moss-soft.de> wrote in message
news:e11Y5KE2...@TK2MSFTNGP06.phx.gbl...
