Regexp to exclude URLs with commas


rickosuavesan

Apr 8, 2009, 7:26:38 PM
to Google Search Appliance/Google Mini
Can someone help me with a regexp? I need to exclude URLs with a
comma before the domain.

http://bob,marley.rasta.com
could also be
http://bob,ziggy,marley.rasta.com

I need something to go before the domain part: regexp: ??\\.rasta\\.com

I have been digging around for a while but have not come up with a solution
yet. We want to clean up some URLs in our GSA.

Thanks,

Rick

Joe D'Andrea

Apr 8, 2009, 7:42:38 PM
to Google-Search-...@googlegroups.com
Greetings!

On Wed, Apr 8, 2009 at 7:26 PM, rickosuavesan <rickos...@yahoo.com> wrote:

> can someone help me with a regexp?  i need to exclude URL's with a
> comma before the domain.

Perhaps this will do the trick:

regexp:http://[^,]+\\.com/
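To see what this pattern accepts, here is a minimal sketch in Python rather than on the appliance itself. It assumes `re.search` is a fair stand-in for how the GSA applies a `regexp:` line, which may not hold exactly. The function name and the trailing slashes on the sample URLs are my own additions, and Python needs only a single backslash before the dot. Note that the pattern matches the comma-free URLs, that is, the ones you want to keep.

```python
import re

# Python translation of the GSA pattern; one backslash escapes the dot here.
COMMA_FREE = re.compile(r"http://[^,]+\.com/")

def is_comma_free(url: str) -> bool:
    """True when '.com/' can be reached from 'http://' without crossing a comma."""
    return COMMA_FREE.search(url) is not None

# Sample URLs from the question, with a trailing slash added (assumption).
for url in (
    "http://marley.rasta.com/",
    "http://bob,marley.rasta.com/",
    "http://bob,ziggy,marley.rasta.com/",
):
    print(url, is_comma_free(url))
```

Only the first sample URL matches; the two with commas in the host are rejected.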

Another strategy is the "that which is not expressly permitted is
denied" rule, wherein you would specify what characters _are_ allowed,
and then everything else is an automatic red flag. For instance:

regexp:http://[-.0-9a-z]+\\.com/

This allows only host names made up of dashes, dots, digits, or
lower-case letters before the dot-com. Anything else (including
commas) is out of the running.
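The allow-list idea can be checked the same way. This is a rough Python sketch, again assuming `re.search` approximates the appliance's matching; the function name and the test URLs other than the comma example are hypothetical.

```python
import re

# Allow-list version: only dash, dot, digit, or lower-case letter before ".com/".
ALLOWED_HOST = re.compile(r"http://[-.0-9a-z]+\.com/")

def host_is_allowed(url: str) -> bool:
    """True when everything between 'http://' and '.com/' is an allowed character."""
    return ALLOWED_HOST.search(url) is not None

for url in (
    "http://ziggy-1.rasta.com/",     # dash and digit are allowed
    "http://bob,marley.rasta.com/",  # comma is not in the allowed set
    "http://Bob.rasta.com/",         # upper-case letter is not in the allowed set
    "http://bob_marley.rasta.com/",  # underscore is not in the allowed set
):
    print(url, host_is_allowed(url))
```

Unlike the negated class `[^,]`, this version also turns away characters nobody thought to list, such as underscores or upper-case letters, so it is the stricter of the two.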

--
Joe D'Andrea
Liquid Joe LLC | Google Enterprise Partner
www.liquidjoe.biz | skype:joedandrea | +1 (908) 781-0323

rickosuavesan

Apr 9, 2009, 10:34:57 AM
to Google Search Appliance/Google Mini
Ahhh... good ol' Joe. Thanks, works like a charm!!

Rick
