Google Mini Newbie - refreshing the index


Ade

Jan 22, 2009, 12:20:28 PM
to Google Search Appliance/Google Mini
Hello All,
I have looked through the archives and got back as far as November
last year; this is a very busy forum! So I am going to ask my question,
and I hope that's cool.

We have purchased a GSA (Mini) with a 300,000-document license and have
around 30 URLs that need to be indexed. According to google.com, a
'site:URL' search returns more than 4 million pages in total across the
30 URLs. However, I'm aware that a lot of these are DB-driven and
dynamic pages.

We have set the GSA to crawl the 30 URLs, and as it pulls back dynamic
pages and forum pages (anything DB-generated) we place these into the
'do not crawl' list.

My hope was that as we placed pages into the list, the GSA would then
go out and pull back more documents, and we could weed out the
irrelevant ones. This doesn't seem to be happening (certainly not
as I envisaged). As we place documents and pages into the do not crawl
list, it seems to stop indexing them but still counts them toward the
300,000-document limit?

My question is: do I need to get the GSA to re-index the URLs each
time we place pages/directories into the do not crawl list?

Many thanks

Joe D'Andrea

Jan 22, 2009, 12:59:51 PM
to Google-Search-...@googlegroups.com
Greetings!

On Thu, Jan 22, 2009 at 12:20 PM, Ade <ade.s...@gmail.com> wrote:

> ... do I need to get the GSA to re-index the URL's each
> time we place pages/directories in to the do not crawl list?

You shouldn't need to reindex ... though that's certainly one way to
get those removed patterns out of there! See this for more info:

http://snurl.com/ajlx3

"Document URLs disappear from search results between 15 minutes and
six hours after the [Do Not Crawl] pattern changes, depending on
system load."

If you're using software prior to 4.6, the Remove Doc Ripper kicks in
every six hours to do the deed.

Perhaps the patterns you're using aren't matching up? Double
check/test them against these examples:

http://snurl.com/ajm79
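
For a quick offline sanity check before touching the appliance, a rough
Python sketch along these lines might help. It is not the appliance's own
matcher: it only approximates a handful of pattern forms (plain substring,
`contains:`, `^` prefix, `$` suffix, `regexp:`) under assumed semantics,
so treat the documented examples as the authority. The function names,
sample patterns, and example.com URLs are all made up for illustration.

```python
import re


def matches(pattern: str, url: str) -> bool:
    """Approximate check of one URL pattern against one URL.

    The pattern semantics here are assumptions, not the appliance's code.
    """
    pattern = pattern.strip()
    if not pattern or pattern.startswith("#"):
        return False  # blank lines and comment lines match nothing
    if pattern.startswith("regexp:"):
        return re.search(pattern[len("regexp:"):], url) is not None
    if pattern.startswith("contains:"):
        return pattern[len("contains:"):] in url

    anchored_start = pattern.startswith("^")
    anchored_end = pattern.endswith("$")
    core = pattern
    if anchored_start:
        core = core[1:]
    if anchored_end:
        core = core[:-1]

    if anchored_start and anchored_end:
        return url == core
    if anchored_start:
        return url.startswith(core)
    if anchored_end:
        return url.endswith(core)
    return core in url  # a bare string is treated as "appears anywhere"


def blocked(url: str, patterns: list[str]) -> bool:
    """True if any pattern in the (hypothetical) do-not-crawl list matches."""
    return any(matches(p, url) for p in patterns)


# Hypothetical do-not-crawl patterns and crawled URLs.
do_not_crawl = [
    "# forum and DB-generated pages",
    "contains:/forum/",
    "?sessionid=",
    ".php$",
    "regexp:\\?page=[0-9]+",
]
urls = [
    "http://www.example.com/forum/topic1.html",
    "http://www.example.com/docs/guide.pdf",
    "http://www.example.com/list.php",
    "http://www.example.com/news?page=12",
]

for url in urls:
    print("BLOCKED" if blocked(url, do_not_crawl) else "kept   ", url)
```

Any URL you expected to be excluded but that shows up as "kept" points at
a pattern worth re-reading against the examples.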

In a pinch (for instance, if Legal is demanding those links be "taken
down right now!"), you can add those same patterns to the Remove URLs
list (in your Front End). Note however that these are consulted after
every query, so I would use that only as a temporary measure.

--
Joe D'Andrea
Liquid Joe LLC
Google Enterprise Partner
www.liquidjoe.biz
+1 (908) 781-0323

Ade

Jan 26, 2009, 6:20:54 AM
to Google Search Appliance/Google Mini
Hello Joe,
Many thanks for your prompt response. I am getting our guys to look
into your suggestions to see whether they help with our problem.
I will keep you informed.

Many thanks again, really appreciated.

Ade
