How to remove old URLs from a collection


Zeeshan Iqbal

Feb 11, 2015, 5:26:06 AM
to Google-Search-...@googlegroups.com
Hi All,

I had an HTML page for a specific collection that used to return more than 3000 links. The GSA crawled all of them, and I could see them in the collection.

My home page has since changed and now contains only 300 links, but I cannot get the GSA to drop the pages it already crawled whose parent link no longer exists (the URLs themselves are still valid).
I tried to "recrawl" the collection a number of times with no luck.
I also tried changing the URL pattern inside the collection for a while, then putting the correct URL back once the collection showed no results, but it still returns all the old results.
The number of URLs is very large, and I don't think it's a good idea to put them all in the Do Not Crawl box of the GSA. Any suggestions?


Thanks,
zeeshan

Dave Watts

Feb 11, 2015, 11:03:19 AM
to Google-Search-...@googlegroups.com
As you've discovered, once the GSA learns about a URL it doesn't need
to rediscover it by following a link to it. Recrawling specific
collections isn't going to force the GSA to forget those URLs if
they're still valid.

You can reset the index, which will force the GSA to forget everything
it knew about your content and start from scratch. This would affect
the entire index, and is obviously a drastic measure.

You can enter URLs in the Do Not Crawl Patterns box. If there are no
longer links to those URLs, you should be able to remove them once
they've been dropped from the index.
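
With thousands of URLs, the list for the Do Not Crawl box is easier to generate than to type. A minimal Python sketch, assuming you have one list of the URLs the GSA already knows (however you obtain it) and one list of the links the current home page still carries. The function name and the example.com URLs are hypothetical, and whether each URL should be entered as-is or turned into a broader pattern depends on your site.

```python
def stale_urls(known_urls, current_links):
    """Return, sorted, the known URLs that the page no longer links to."""
    # Normalise whitespace and drop empty lines before comparing.
    known = {u.strip() for u in known_urls if u.strip()}
    current = {u.strip() for u in current_links if u.strip()}
    # Set difference: everything the GSA knows minus what is still linked.
    return sorted(known - current)


if __name__ == "__main__":
    # Hypothetical example data; replace with your own lists.
    known = [
        "http://www.example.com/a.html",
        "http://www.example.com/b.html",
        "http://www.example.com/c.html",
    ]
    current = ["http://www.example.com/a.html"]
    for url in stale_urls(known, current):
        print(url)
```

If the stale URLs share a common path, a single prefix pattern will usually be shorter than listing each URL.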

You can use the delete action within a feed to tell the GSA to delete
documents from the index.
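
A rough Python sketch of what such a feed could look like, built from a list of URLs. The element names follow the GSA feed XML format as I recall it (gsafeed, header, group, record). The choice of "web" as the datasource and "metadata-and-url" as the feed type for crawled content is an assumption you should check against the feeds documentation for your software version. The function name and the example URL are hypothetical.

```python
import xml.etree.ElementTree as ET

# DOCTYPE line used by GSA feed files (assumed unchanged for your version).
DOCTYPE = '<!DOCTYPE gsafeed PUBLIC "-//Google//DTD GSA Feeds//EN" "">'


def build_delete_feed(urls, datasource="web", feedtype="metadata-and-url"):
    """Build feed XML asking the GSA to delete each URL in `urls`."""
    root = ET.Element("gsafeed")
    header = ET.SubElement(root, "header")
    ET.SubElement(header, "datasource").text = datasource
    ET.SubElement(header, "feedtype").text = feedtype
    # One group with action="delete" holds a record per URL to remove.
    group = ET.SubElement(root, "group", {"action": "delete"})
    for url in urls:
        ET.SubElement(group, "record", {"url": url, "mimetype": "text/html"})
    # ElementTree escapes characters such as "&" inside attribute values.
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + DOCTYPE + "\n" + body


if __name__ == "__main__":
    # Hypothetical URL; feed in your own list of stale URLs.
    print(build_delete_feed(["http://www.example.com/old/page.html"]))
```

The resulting XML still has to be pushed to the appliance's feed interface, and the appliance must be configured to accept feeds from the sending host. Those steps are not shown here.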

Dave Watts, CTO, Fig Leaf Software
1-202-527-9569
http://www.figleaf.com/
http://training.figleaf.com/

Fig Leaf Software is a Service-Disabled Veteran-Owned Small Business
(SDVOSB) on GSA Schedule, and provides the highest caliber vendor-
authorized instruction at our training centers, online, or onsite.