
Searching for PDF files on the Web


Helene...@adobeforums.com

Oct 14, 2003, 10:05:46 AM

Ian_B...@adobeforums.com

Oct 14, 2003, 10:34:29 AM
Clearly you've heard of indexes. Indexes can be created by a variety of utilities (depending on whether you are working with an intranet, the Internet, or mapped-drive file systems). However, these are indexes of server content, which may or may not include PDF files. The index supports a search of the text content (and also keywords and metadata, depending on the utility chosen) of a server full of files. You say you're searching FOR PDF files, rather than searching for content that might be in a PDF file. Could you perhaps clarify what you're trying to achieve, and why you think indexes might help? Also have a look at the Acrobat Help file in a full version of Acrobat and find the references to Catalog.
That's all for now. Looking forward to your next post.
Cheers
Ian

Helene...@adobeforums.com

Oct 14, 2003, 10:09:25 AM
Hi,
I wonder if somebody out there would be able to point me in the right direction.

I work on Windows 2000, using Acrobat v5.0, 512MB RAM.

If I try to search for PDF files on the Internet/intranet, nothing
comes up at all.

Is this a problem on v5.0, or what do I have to do to enable search for PDF files? Am I right in thinking that this has something to do with indexing files?

If anybody has come across this problem, I would be grateful for any advice.

Many thanks.

Kind Regards,
Helene

Mark_...@adobeforums.com

Oct 14, 2003, 10:26:30 AM
Searching for or searching in? Google will find PDF files on the web if you enter your search string plus the filetype identifier, e.g. "apples filetype:pdf". Leave off the quote marks.
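
For anyone who wants to script this kind of query, here is a minimal Python sketch that builds the search URL. The function name pdf_search_url is made up for illustration, and the https://www.google.com/search?q=... URL form is an assumption about how the search page accepts queries; the filetype:pdf operator is the one mentioned above.

```python
from urllib.parse import urlencode

def pdf_search_url(terms: str) -> str:
    """Build a Google search URL restricted to PDF results.

    Hypothetical helper: it appends the filetype:pdf operator
    to the user's search terms and URL-encodes the result.
    """
    query = f"{terms} filetype:pdf"
    # Assumed URL form for a Google web search.
    return "https://www.google.com/search?" + urlencode({"q": query})

if __name__ == "__main__":
    print(pdf_search_url("apples"))
```

Pasting the printed URL into a browser should be equivalent to typing the query into the search box by hand.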

Susi_...@adobeforums.com

Oct 14, 2003, 10:35:21 AM
In Google:
inurl:pdf

Helene...@adobeforums.com

Oct 14, 2003, 11:12:04 AM
All,
I know how to search for PDF files using Google etc.

What I am really trying to find out is WHY Adobe files need to be indexed & cataloged before they can be found using a search
on a Web site.

Word & Excel files are not a problem, but there seems to be a problem with PDF files.

Could anybody clarify WHY Adobe files have to be Indexed/Cataloged?
Are there any patches or upgrades that will make searching for
PDF files behave more like searching for MS Word or Excel files?

Hope this makes sense.

Cheers.

Helene

Aandi_...@adobeforums.com

Oct 14, 2003, 12:15:56 PM
>What I am really trying to find out is WHY Adobe files needs to be indexed & cataloged prior to being able to find them using a search
>on a Web site.

They don't. Google doesn't need you to index/catalog your files,
and if you did, it wouldn't help.

Aandi Inston

Ian_B...@adobeforums.com

Oct 14, 2003, 1:26:25 PM
Helene,
If you open a PDF file in a text editor you'll see lots of strange characters that are the binary representation of an image of a document. If, for example, you do a simple search of a drive for files containing the string of characters "string", such a search would not find a PDF file that was an image of a document in which the word "string" was clearly to be seen.

If instead you did this for the Word document that was used as the source to create the PDF with "string" in it, the search of a drive for documents containing "string" would find the Word file.
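
To make the point concrete, here is a rough Python sketch. It is not a PDF parser: it assumes only that page content in a PDF is commonly stored in Flate-compressed (zlib) streams, and it uses a made-up byte string as a stand-in for such a stream. The names page_text, stored_bytes, naive_search and decoding_search are all invented for the illustration.

```python
import zlib

# Hypothetical page content in which the word "string" is plainly present.
page_text = b"(a simple string of characters) Tj\n" * 20

# Stand-in for a Flate-compressed content stream inside a PDF file.
stored_bytes = zlib.compress(page_text)

def naive_search(data: bytes, needle: bytes) -> bool:
    """Look for the needle in the bytes exactly as they sit on disk."""
    return needle in data

def decoding_search(data: bytes, needle: bytes) -> bool:
    """Decompress the stream first, then look for the needle."""
    return needle in zlib.decompress(data)

if __name__ == "__main__":
    print("raw byte search finds it:", naive_search(stored_bytes, b"string"))
    print("decoding search finds it:", decoding_search(stored_bytes, b"string"))
```

The raw byte search should come up empty even though the word is there, while the search that decodes the data first finds it. That is the gap a PDF-aware search tool has to close.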

Enter the Catalog command in Acrobat, which is capable of looking inside the file structure of PDFs and identifying all the text words in all the PDFs. The Catalog command creates an index that can be searched for "string". Every occurrence of "string" in the index has a link to it, so clicking on a document that the index identifies as containing "string" will display that document in Acrobat with the occurrences of "string" highlighted.
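
The general idea of a full-text index can be sketched in a few lines of Python. This is only an illustration of an inverted index (word to list of documents), not a description of how Catalog stores its data; the file names and text below are invented, and the text is assumed to have been extracted from the PDFs already.

```python
import re
from collections import defaultdict

def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lower-cased word to the set of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index

def search(index: dict[str, set[str]], word: str) -> list[str]:
    """Return the names of documents containing the word, sorted."""
    return sorted(index.get(word.lower(), set()))

# Hypothetical documents with already-extracted text.
docs = {
    "manual.pdf": "Tie the string to the post.",
    "report.pdf": "Quarterly results were flat.",
    "notes.pdf": "A string of characters.",
}

index = build_index(docs)

if __name__ == "__main__":
    print(search(index, "string"))
```

The index is built once, and each later search is a simple lookup rather than a fresh scan of every file.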

NOTA BENE: Indexes produced by the Catalog command will not work on a Web site. These indexes contain relative links that will not work within http. Catalog provides a full text search function for collections of PDFs on a drive using the file pathname - for example, a mapped drive on a LAN, or a CD drive.

The particular file structure of PDFs is the reason why special search tools are available that can decode this structure to find text. It is also the reason why not all search tools can operate on PDF files. You can find more information on searching PDF files for defined content at <http://www.searchtools.com/info/pdf.html>.

I think the expression "searching for PDFs" that you used in your posts may have confused some of the contributors into thinking you were trying to find PDF files. Hence my suggestion that you explain in more detail just what you're trying to do.
Hope this helps.
Cheers
Ian

Susi_...@adobeforums.com

Oct 14, 2003, 5:15:13 PM
I repeat my post:

Go to Google and type

inurl:pdf

followed by your search term.
