Blocking a folder name at multiple locations from being crawled

mayu

Oct 31, 2008, 7:06:30 PM
to Google Search Appliance/Google Mini
Hi,
I have a folder name that exists at multiple locations, and that folder
needs to be excluded from crawling. For example, I have:
/exe/
/europe/downloads/exe/
/asia/downloads/exe/

This "exe" folder has to be blocked from crawling. I have listed
"/exe/" in the do not crawl list, but how do I tell the GSA not to crawl
any instance of a folder named "exe", at any depth below the root?

Thanks.

Prathap

Nov 1, 2008, 1:04:15 AM
You could add the exact pattern, for example

smb://file server/sharename/subfolder/

to the Do Not Crawl patterns.

If you do not want to crawl the /exe/ folder at all, wherever it appears, you can add

contains:/exe/ to the Do Not Crawl patterns.

Randy120

Nov 12, 2008, 3:08:43 PM
How about using contains:? This would work on the Mini. It acts
like a wildcard.

----------------------

contains:exe

B.A.

Nov 13, 2008, 4:59:03 AM

See the example for /images/ in this document:

Constructing URL Patterns
http://code.google.com/apis/searchappliance/documentation/50/admin/URL_patterns.html

and just replace /images/ with /exe/ and you'll be done.

The contains:exe pattern would also exclude any URL containing words like:

annexe
circumflexes
executive
exemplar
exempt
exercise
indexes
multiplexer

and many, many more, so I don't think that would be a good idea.
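
To illustrate the difference, here is a rough Python sketch. It assumes that contains: behaves as a plain substring test on the URL, which is a simplification and not the appliance's actual matcher. The is_blocked helper and the sample URLs are made up for the example.

```python
def is_blocked(url: str, needle: str) -> bool:
    """Simplified stand-in for a contains: pattern (plain substring test)."""
    return needle in url


# Hypothetical sample URLs, not taken from any real site.
urls = [
    "http://www.example.com/exe/setup.html",
    "http://www.example.com/europe/downloads/exe/tool.html",
    "http://www.example.com/asia/downloads/exe/tool.html",
    "http://www.example.com/executive/bios.html",
    "http://www.example.com/docs/indexes.html",
    "http://www.example.com/training/exercise.html",
]

# URLs excluded by the broad pattern versus the slash-delimited one.
broad = [u for u in urls if is_blocked(u, "exe")]
narrow = [u for u in urls if is_blocked(u, "/exe/")]

for u in urls:
    print(f"{u}  contains:exe={u in broad}  contains:/exe/={u in narrow}")
```

Under that assumption, the slash-delimited pattern only catches URLs where "exe" is a whole folder name, while the bare substring also drops pages such as the executive and indexes ones.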