crawler idle
prodigy2006  
From: prodigy2006 <adever...@gmail.com>
Date: Thu, 21 May 2009 13:34:01 -0700 (PDT)
Local: Thurs, May 21 2009 4:34 pm
Subject: crawler idle
I'd like to make a sitemap, but I can't seem to get the site crawled
with the program. Other than robots.txt, all I get in the URL list is
one URL (the main URL): www.mediaportal.hr

I read a post on this board about a similar problem, but it
doesn't help solve mine.

I set up the program through its wizard, but it doesn't seem to crawl
anything (the crawlers remain idle). Any ideas? Help, please!


webado2
From: webado2 <web...@gmail.com>
Date: Mon, 25 May 2009 04:43:23 -0700 (PDT)
Local: Mon, May 25 2009 7:43 am
Subject: Re: crawler idle
First of all if your computer runs Vista then check the Vista
instructions:
http://groups.google.com/group/gsitecrawler/web/gsc-v1-23-and-vista?h...

Then, when you request a crawl, use the down arrow next to the Re-crawl
button and pick This site.

Give it plenty of time to crawl; your site is big and/or slow for
crawlers (I don't know which, I only tested a bit and it seems to take long).

If your site has any redirections on it as it's being crawled, that's
going to cause problems.

You can crawl your site using Xenu from
http://home.snafu.de/tilman/xenulink.html
and see whether there are any errors (404s, redirections, etc.) during
navigation. These will need to be fixed before you can hope to build a
sitemap using GSiteCrawler or any other tool.

Jazbo
From: Jazbo <wellspri...@gmail.com>
Date: Sun, 28 Jun 2009 17:34:24 -0700 (PDT)
Local: Sun, Jun 28 2009 8:34 pm
Subject: Re: crawler idle
I am having the same problem as prodigy2006. I am using Windows XP
pro.

I ran the url in Xenu and had no errors and only 28 links...


webado2
From: webado2 <web...@gmail.com>
Date: Sun, 28 Jun 2009 17:55:50 -0700 (PDT)
Local: Sun, Jun 28 2009 8:55 pm
Subject: Re: crawler idle
Did you click Re-crawl > This Site?

Once it finishes (maybe 1 or 2 minutes for only 28 URLs) it will say
the crawlers are empty, idle.
You can check what has been found by clicking URL List and refreshing. It
should show the list of URLs it has found.

Then click Generate > Google Sitemap

And so on.

Jazbo
From: Jazbo <wellspri...@gmail.com>
Date: Sun, 28 Jun 2009 18:14:11 -0700 (PDT)
Local: Sun, Jun 28 2009 9:14 pm
Subject: Re: crawler idle
Re-crawl this site from the top toolbar, or the url list? The only
option on the top is recrawl this project.

I understand the final part; I made and uploaded a Sitemap from the
one URL. But no matter what I do, I can't seem to get it to
crawl...

Thanks for getting back so soon!

Christina S
From: "Christina S" <web...@gmail.com>
Date: Sun, 28 Jun 2009 21:37:48 -0400
Local: Sun, Jun 28 2009 9:37 pm
Subject: Re: [GSiteCrawler] Re: crawler idle
Yes, Re-crawl from the top button and select This project.

It should start it again.

Christina
www.webado.net


Jazbo
From: Jazbo <wellspri...@gmail.com>
Date: Sun, 28 Jun 2009 19:01:20 -0700 (PDT)
Local: Sun, Jun 28 2009 10:01 pm
Subject: Re: crawler idle
I can't get it to work that way either... I haven't been able to get
it to crawl anything. All it has is the homepage.

Christina S
From: "Christina S" <web...@gmail.com>
Date: Sun, 28 Jun 2009 22:26:14 -0400
Local: Sun, Jun 28 2009 10:26 pm
Subject: Re: [GSiteCrawler] Re: crawler idle
Please post the url.

Christina
www.webado.net


Jazbo
From: Jazbo <wellspri...@gmail.com>
Date: Sun, 28 Jun 2009 19:58:22 -0700 (PDT)
Local: Sun, Jun 28 2009 10:58 pm
Subject: Re: crawler idle
It's not fully functional yet, but the pages work. Here it is:

http://www.scentimentsfromtheheart.com/


Jazbo
From: Jazbo <wellspri...@gmail.com>
Date: Mon, 29 Jun 2009 14:08:37 -0700 (PDT)
Local: Mon, Jun 29 2009 5:08 pm
Subject: Re: crawler idle
I don't understand it, I've tried a few other sites, and they worked
with no problem. It must be an issue with the site? Why would all of
the links test with no problems and still not have it crawl? Thanks
for the help!

webado2
From: webado2 <web...@gmail.com>
Date: Mon, 29 Jun 2009 14:44:08 -0700 (PDT)
Local: Mon, Jun 29 2009 5:44 pm
Subject: Re: crawler idle
Hah!

In your internal navigation your URLs are on http://scentimentsfromtheheart.com/
rather than on http://www.scentimentsfromtheheart.com/

So either fix all your internal links to use www URLs, or run
the crawler for http://scentimentsfromtheheart.com/ and submit the
site that way, without www.
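
A quick way to spot this kind of www / non-www mismatch is to list the links on a page whose host differs from the host of the starting URL. The following is only a rough sketch in Python: the function and class names are made up for illustration, the sample HTML and its page paths (/products, /contact) are hypothetical, and it says nothing about how GSiteCrawler itself decides which links to follow.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_off_host(start_url, html):
    """Return absolute http(s) links whose host differs from start_url's host."""
    start_host = urlsplit(start_url).hostname
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    off_host = []
    for href in collector.hrefs:
        absolute = urljoin(start_url, href)  # resolve relative links
        parts = urlsplit(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, etc.
        if parts.hostname != start_host:
            off_host.append(absolute)
    return off_host


# Hypothetical page content: one absolute non-www link, one relative link.
sample_html = """
<a href="http://scentimentsfromtheheart.com/products">Products</a>
<a href="/contact">Contact</a>
"""
mismatched = links_off_host("http://www.scentimentsfromtheheart.com/", sample_html)
print(mismatched)
```

With the www URL as the start, only the absolute non-www link is reported; the relative link resolves to the www host and is left alone.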

webado2
From: webado2 <web...@gmail.com>
Date: Mon, 29 Jun 2009 15:18:25 -0700 (PDT)
Local: Mon, Jun 29 2009 6:18 pm
Subject: Re: crawler idle
Actually it's worse. In addition to not staying on one particular
domain, you are also generating URLs with some kind of session id,
so the list of URLs is more or less never-ending.

You have some broken links, and some redirected ones.

Use Xenu Link Sleuth to crawl the site - add both the www and non www
starting url.
http://home.snafu.de/tilman/xenulink.html

You should have a robots.txt file where you disallow various uris or
uri prefixes.

For instance:

User-agent: *
Disallow: /address_book
Disallow: /login
Disallow: /password_forgotten
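
Before uploading rules like these, it can help to check which paths they actually block. Here is a small sketch using Python's standard urllib.robotparser with two of the rules above; the test paths (/login.php, /products) are invented for illustration, and a given crawler may interpret robots.txt slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Two of the rules from the example, fed to the parser as text
# instead of being fetched from a live site.
rules = """\
User-agent: *
Disallow: /address_book
Disallow: /login
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())


def is_allowed(path, agent="*"):
    """True if the given user agent may fetch the path under these rules."""
    return parser.can_fetch(agent, path)


# Disallow lines are prefix matches, so /login also blocks /login.php.
for path in ("/", "/login", "/login.php", "/products"):
    print(path, is_allowed(path))
```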

First fix your navigation to stay either all on www or all without www.

Fix the broken links.

Add the robots.txt file.

Start GSiteCrawler again and delete the project.
Add it again for the particular domain form (with or without www).

Ask it to read robots.txt.
Do not ask it to import known URLs from Google.

Uncheck the option to crawl file types for images, Word documents,
PDF, etc.

Go to Filter > Remove Parameters and add a new line for SFHid.
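
To see why the session id matters: every new SFHid value makes the same page look like a new URL. The sketch below shows the general idea of dropping one query parameter so the duplicates collapse. It is an illustration in Python, not GSiteCrawler's own filter; the function name, the page names (index.php, product.php) and the id values are all made up.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def remove_parameter(url, name="SFHid"):
    """Return the URL with every occurrence of one query parameter removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != name
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Hypothetical URLs: the same pages reached with different session ids.
urls = [
    "http://scentimentsfromtheheart.com/index.php?SFHid=abc123",
    "http://scentimentsfromtheheart.com/index.php?SFHid=def456",
    "http://scentimentsfromtheheart.com/product.php?id=7&SFHid=abc123",
]
unique = sorted({remove_parameter(u) for u in urls})
for u in unique:
    print(u)
```

Three crawled URLs become two distinct pages once the session id is stripped.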

Also, all your pages have the same title: SFH
Titles should be unique to each page and reflect what the page is
about.

I have tried to run GSC for your site, including both the www and
non-www URLs in it and banning some URLs in the absence of a robots.txt
file.

I am not able to make headway with GSC.

I wonder if markup errors are enough to break the crawler:
http://scentimentsfromtheheart.com/

Sometimes something stupid like an opening comment <!-- that
doesn't get closed will mean that everything from that place on is
ignored. Thus no further links are found.
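
One way to test that particular guess is to scan the page source for a comment opener with no closer after it. This is a deliberately naive sketch in Python (plain string search, not a real HTML parser), with a hypothetical function name and sample markup; whether an unclosed comment is what actually stops the crawler here is still an assumption.

```python
from typing import Optional


def find_unclosed_comment(html: str) -> Optional[int]:
    """Return the offset of the first <!-- with no --> after it, else None."""
    pos = 0
    while True:
        start = html.find("<!--", pos)
        if start == -1:
            return None  # no more comment openers
        end = html.find("-->", start + 4)
        if end == -1:
            return start  # opener found, but it is never closed
        pos = end + 3  # skip past this well-formed comment


# Hypothetical markup: the second link sits after an unclosed comment.
page = '<a href="/a">A</a><!-- menu <a href="/b">B</a>'
offset = find_unclosed_comment(page)
if offset is not None:
    print("Unclosed comment starts at offset", offset)
    print("Markup that may be ignored:", page[offset:])
```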

I don't know yet.

Fix what you can from the site and try again.

Jazbo
From: Jazbo <wellspri...@gmail.com>
Date: Mon, 29 Jun 2009 15:28:23 -0700 (PDT)
Local: Mon, Jun 29 2009 6:28 pm
Subject: Re: crawler idle
Thanks! I'll try that.
