"With a database this large the crawlers will be disabled"
Patrick Kinney  
From: Patrick Kinney <patr...@kinneys.net>
Date: Fri, 19 Jun 2009 14:00:25 -0400
Local: Fri, Jun 19 2009 2:00 pm
Subject: "With a database this large the crawlers will be disabled"

When I opened GsiteCrawler today it told me:
"Warning!
The size of your database file is over 900mb - please compact it.
With a database this large the crawlers will be disabled."
I have no idea how to do that. Any suggestions? I don't really have a
very big site, but it is an osCommerce site, so maybe that is different?
Thanks for any help,
Patrick

Patrick Kinney
Kinney's Shooting Supply, LLC
kinneysshootingsupply.com

"......the right of the people to keep and bear arms shall not be infringed."  


webado2  
From: webado2 <web...@gmail.com>
Date: Fri, 19 Jun 2009 17:45:10 -0700 (PDT)
Local: Fri, Jun 19 2009 8:45 pm
Subject: Re: "With a database this large the crawlers will be disabled"
You can compact from the File menu.

Do you have any clue how big the site should be and how many urls a
robot will actually find?

I have a hunch you need to disallow a bunch of things in robots.txt or
risk spawning an unlimited number of URLs.

Also, is it possible that session IDs get added to the URLs?
GSC can remove those, but the better solution would be to prevent them
from being added when robots crawl.

Pages like http://kinneysshootingsupply.com/fajen-m-6.html?sort=2d&page=1&filter...
are only differently sorted and filtered versions of
http://kinneysshootingsupply.com/fajen-m-6.html, so they should not be
separately indexed.
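
To see how quickly such variants pile up, here is a minimal Python sketch that strips the query string from each crawled URL and groups the variants under their base page. The host example.com, the sample query strings, and the helper names base_url and group_variants are made up for illustration, and osCsid is assumed to be the session parameter that osCommerce appends.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def base_url(url: str) -> str:
    """Return the URL with its query string and fragment removed."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def group_variants(urls):
    """Map each base URL to the list of crawled URLs that reduce to it."""
    groups = defaultdict(list)
    for url in urls:
        groups[base_url(url)].append(url)
    return dict(groups)


# Hypothetical crawl results; the query strings are invented for illustration.
crawled = [
    "http://example.com/fajen-m-6.html",
    "http://example.com/fajen-m-6.html?sort=2d&page=1",
    "http://example.com/fajen-m-6.html?sort=5a&page=1",
    "http://example.com/fajen-m-6.html?osCsid=abc123",  # assumed session ID
]

groups = group_variants(crawled)
for base, variants in groups.items():
    print(f"{base}: {len(variants)} crawled variant(s)")
```

If most base pages show many variants, the URL list is probably dominated by sorted, filtered, or session-tagged duplicates rather than by real pages.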



Sebastian Lutz  
From: "Sebastian Lutz" <l...@oberberg.net>
Date: Sat, 20 Jun 2009 03:11:54 +0200
Local: Fri, Jun 19 2009 9:11 pm
Subject: AW: [GSiteCrawler] Re: "With a database this large the crawlers will be disabled"

Hi,

You can disallow URLs that contain a "?" in robots.txt.

Bye, Basti


Patrick Kinney  
From: Patrick Kinney <patr...@kinneys.net>
Date: Sat, 20 Jun 2009 07:00:16 -0400
Local: Sat, Jun 20 2009 7:00 am
Subject: Re: [GSiteCrawler] Re: "With a database this large the crawlers will be disabled"

It appears that the pages that include the "?", like the one in your
example, are the extra pages. Can I just filter out the "html?" pages
and still have all of my products found?
I got rid of about a third of the URLs by adding one other filter in Gsite.
Should I be disallowing all of this in the robots.txt?
Thanks,
Patrick



webado2  
From: webado2 <web...@gmail.com>
Date: Sat, 20 Jun 2009 04:16:49 -0700 (PDT)
Local: Sat, Jun 20 2009 7:16 am
Subject: Re: "With a database this large the crawlers will be disabled"
OK, then yes, you should disallow anything that has a query string
after the .html, not just in GSiteCrawler's filter but also in
robots.txt.

Add this to the robots.txt file, under User-agent: *

Disallow: /*html?sort

If there are other query strings whose first parameter is something
other than sort, add a line for each. I am not sure whether the simpler,
more general
Disallow: /*html?
would work.

Whatever else you added to the filter in GSC should also be added to
robots.txt.

Then in GSC import the robots.txt file again and refilter URL List and
Crawler queue.

Then start the crawl again.
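
One way to compare the two Disallow patterns above before relying on them is a small Python sketch of wildcard matching. It assumes the crawler treats "*" as "any run of characters" and a trailing "$" as "end of URL", which is how Google documents it for Googlebot; whether GSiteCrawler or other robots do the same is not confirmed here. The helper names rule_to_regex and is_disallowed are hypothetical, and Allow rules and rule precedence are left out.

```python
import re


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a robots.txt path rule into a compiled regex.

    Assumed semantics: '*' matches any run of characters, a trailing '$'
    anchors the end of the URL, everything else is literal, and the rule
    otherwise matches as a prefix.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    pattern = ".*".join(re.escape(piece) for piece in rule.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_disallowed(path: str, rules) -> bool:
    """True if any Disallow rule matches the path (query string included)."""
    return any(rule_to_regex(rule).match(path) for rule in rules)


# Sample paths; the "?page=2" variant is invented for illustration.
paths = [
    "/fajen-m-6.html",
    "/fajen-m-6.html?sort=2d&page=1",
    "/fajen-m-6.html?page=2",
]

for rules in (["/*html?sort"], ["/*html?"]):
    for path in paths:
        verdict = "blocked" if is_disallowed(path, rules) else "allowed"
        print(f"{rules[0]:<14} {path:<34} {verdict}")
```

Under these assumptions the narrower rule blocks only URLs whose query string starts with sort, while the general rule blocks every .html URL that carries a query string and leaves the plain .html page crawlable.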



Patrick Kinney  
From: Patrick Kinney <patr...@kinneys.net>
Date: Sat, 20 Jun 2009 12:30:46 -0400
Local: Sat, Jun 20 2009 12:30 pm
Subject: Re: [GSiteCrawler] Re: "With a database this large the crawlers will be disabled"

Thanks a million. It looks like I have plenty to do, now.
Patrick


