The DO NOT Crawl rule doesn't work

Brian

Nov 2, 2009, 2:40:54 AM
to Google Search Appliance/Google Mini
Hello,

I set this "Do Not Crawl" rule on the GSA so that it would skip swf files: .swf$
Unfortunately, it didn't work. When I checked Crawl Diagnostics, I
found it had indexed a lot of links like this: http://mydomain.com/club.swf?logged=0&id.
Does anyone know why? Should I modify my rule? Many thanks!

Best regards!
Brian

Mike

Nov 2, 2009, 4:56:49 AM
Hello Brian,

.swf$

will match
http://mydomain.com/club.swf
but not
http://mydomain.com/club.swf?abc=xzy

The "$" anchors the pattern to the end of the URL, so ".swf$" only
matches URLs whose last characters are ".swf". In the second URL the
query string comes after the extension, so the pattern no longer
matches. To exclude those URLs as well, you will need to modify your
pattern so that it also covers parameters after the extension. You can
find more instructions and tips at:
http://code.google.com/apis/searchappliance/documentation/50/admin/URL_patterns.html

or by searching the internet for regexp tutorials.
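
To see the anchoring behaviour outside the appliance, here is a minimal sketch using Python's `re` module. It is only an analogue: the GSA has its own URL pattern syntax, and the variable names below are made up for illustration. The two URLs are the ones from the example above.

```python
import re

# Python analogue of the pattern ".swf$": a literal ".swf" anchored
# at the very end of the string. This is NOT the GSA matcher itself.
ends_with_swf = re.compile(r"\.swf$")

urls = [
    "http://mydomain.com/club.swf",
    "http://mydomain.com/club.swf?abc=xzy",
]

# Record whether the anchored pattern is found in each URL.
results = {url: bool(ends_with_swf.search(url)) for url in urls}

for url, matched in results.items():
    print(f"{url} -> {'match' if matched else 'no match'}")
```

The second URL fails because "?abc=xzy" sits between ".swf" and the end of the string.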

Cheers
Mike

JMarkham

Nov 2, 2009, 1:14:42 PM
Hi,

I'd recommend one rule to catch the URLs that end in .swf and another
to catch those that have parameters after the extension.

Two rules, then:

regexp:\\.swf$
regexp:\\.swf\\?

What also might work is the contains keyword:

contains:.swf

If your links might contain capital letters, however, you'll
have to use regexpIgnoreCase instead:

regexpIgnoreCase:\\.swf$
regexpIgnoreCase:\\.swf\\?
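
If you want to sanity-check the two-rule idea before changing the appliance, a rough Python equivalent might look like the sketch below. The assumptions: Python's `re` stands in for the GSA matcher, `re.IGNORECASE` stands in for regexpIgnoreCase, a plain substring test stands in for contains:, and the names `EXCLUDE_PATTERNS`, `is_excluded` and `contains_swf` are hypothetical. In a Python raw string a single backslash escapes the dot, whereas the GSA rules above are written with two.

```python
import re

# Stand-ins for the two rules: ".swf" at the end of the URL, or ".swf"
# immediately followed by "?" (the start of a query string).
EXCLUDE_PATTERNS = [
    re.compile(r"\.swf$", re.IGNORECASE),
    re.compile(r"\.swf\?", re.IGNORECASE),
]


def is_excluded(url: str) -> bool:
    """Return True if either stand-in pattern is found in the URL."""
    return any(p.search(url) for p in EXCLUDE_PATTERNS)


def contains_swf(url: str) -> bool:
    """Substring test, a rough analogue of the contains: rule."""
    return ".swf" in url


for url in [
    "http://mydomain.com/club.swf",
    "http://mydomain.com/club.swf?logged=0&id",
    "http://mydomain.com/CLUB.SWF?x=1",
    "http://mydomain.com/club.html",
]:
    print(url, is_excluded(url))
```

One thing to keep in mind: a substring test is broader than the two patterns. It would also flag a URL such as http://mydomain.com/notes.swfx/index.html, which neither pattern matches.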


Jeff

Brian

Nov 3, 2009, 2:23:36 AM
The rule "regexpIgnoreCase:\\.swf\\?" works. Thanks for all of your
help!

Brian
