Retrying URL: Connection reset by peer during fetch.

AsparagusX

May 7, 2009, 11:34:01 AM
to Google Search Appliance/Google Mini - Google Search Appliance/Google Mini
Hallo

I am getting this error continuously on a GSAve (5.2.xx version). I
have seen some other posts about this same problem, and have reduced
host loads, but to no avail. Any other information would be
appreciated.

Thanks

Anton

Thiru

May 7, 2009, 6:26:04 PM
Did you force a recrawl from Crawl Diagnostics? If yes, do you get the
same error?
Note: In Crawl Diagnostics, you need to drill down on the URL to the
final page, where you will see a section "More information about this
page". Right below that section, you will see an option called
"Recrawl this url". Selecting that link will force a recrawl of that
URL.

Also, I would recommend that you update your appliance to
5.2.0.G.32-p1 (if you have not done it already). This fixes a number of
crawling-related issues.

Cheers,
Thiru

AsparagusX

May 8, 2009, 2:55:37 AM
I have certainly tried recrawling a number of times. I am on GSA
Virtual Edition. Has this been patched to the latest version you
mention? We are in the process of evaluating the GSA, using VE,
before taking the 'plunge'.

Anton

brianb

May 8, 2009, 3:49:22 AM
Hmmm... the "connection reset by peer" error is usually some kind of
TCP/network issue where the GSA is getting cut off in the middle of
trying to crawl. So a couple of things:

1. What is your host load currently set to? You can try even setting
it to 0.1.
2. Does this only happen to some hosts or all hosts/URLs you are
crawling?
3. Is this with HTTP or SMB?
4. It might be worth it to do a quick tcpdump/wireshark to see what is
happening.
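
A lighter first check than a packet capture (not from the thread; a minimal Python sketch): open one TCP connection to the content server from a machine on the same network as the GSA and label the outcome. The names `classify` and `probe` are hypothetical helpers, and the host and port in the usage note are placeholders. This only exercises the TCP handshake, so a reset that happens mid-transfer would still need tcpdump or Wireshark.

```python
import errno
import socket


def classify(exc):
    """Map a socket-level exception to a short, readable label."""
    if isinstance(exc, ConnectionResetError):
        return "reset by peer"
    if isinstance(exc, ConnectionRefusedError):
        return "refused"
    if isinstance(exc, socket.timeout):
        return "timeout"
    if isinstance(exc, OSError) and exc.errno in (
        errno.ENETUNREACH,
        errno.EHOSTUNREACH,
    ):
        return "unreachable"
    return "other: %s" % exc


def probe(host, port, timeout=5.0):
    """Open a single TCP connection and report what happened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except OSError as exc:
        return classify(exc)
```

For example, `probe("nas.example.com", 445)` would test the SMB port and `probe("www.example.com", 80)` the HTTP port; both host names are placeholders.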

That is a start. Let's see what you find.

Brian

justin.brister

May 8, 2009, 5:59:51 AM
You don't say what you are trying to crawl.

I have seen instances where you get this error if you try to crawl a
Windows fileshare and you don't have the credentials set correctly.

Also, are you sure you have patched the VGSA? I didn't think that the
version manager was available on the VGSA?

Thanks,

J

bmacias

May 8, 2009, 9:27:10 AM
We, too, are fighting the "Retrying URL: Connection reset..."
monster. On GSA 5.2.xx version (not VE) on continuous crawl.

cphdforumsrosslynmain.aspx - Retrying URL: Connection reset by peer during fetch. - 15 Mar 4:21 AM
cphdforumsrosslynproposedguidelines.aspx - Retrying URL: Network unreachable during fetch. - 15 Mar 4:28 AM

1. What is your host load currently set to? 1.0 for the domain
2. Does this only happen to some hosts or all hosts/URLs you are
crawling? Seeing it on most of our 25 hosts, but most are <500 URLs.
Primarily concerned with main domain of approx 40k URLs
3. Is this with HTTP or SMB? HTTP

Over time (a few weeks) you can watch the Crawl Diagnostics report,
where the Crawled URLs count drains down while the corresponding
Retrieval Errors column increases proportionally.

Our current kludgy strategy is a weekly refresh where we drill down to
the directories with the bulk of the errors, export them to a file, then
paste that list into the "Recrawl these URL Patterns" field in Freshness
Tuning. Repetitive and tedious across several folder paths and several
domains, as you can imagine. And I'm aware that we have a small volume
of URLs, comparatively.
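
The weekly drill-down could be partly scripted. The following is a minimal Python sketch, not something from the appliance itself: it assumes the exported file contains one full URL per line (the real export format may differ), and `directory_patterns` is a hypothetical helper name. It groups failing URLs by directory so the worst directories come first. Whether a bare directory prefix is accepted in the recrawl field should be checked against the appliance documentation.

```python
import posixpath
from collections import Counter
from urllib.parse import urlsplit


def directory_patterns(urls, min_count=1):
    """Group URLs by their parent directory, most frequent first.

    Returns a list of (directory_url, error_count) tuples.
    """
    counts = Counter()
    for raw in urls:
        url = raw.strip()
        if not url:
            continue  # skip blank lines in the export
        parts = urlsplit(url)
        directory = posixpath.dirname(parts.path)
        if not directory.endswith("/"):
            directory += "/"
        counts["%s://%s%s" % (parts.scheme, parts.netloc, directory)] += 1
    return [(d, n) for d, n in counts.most_common() if n >= min_count]
```

Calling `directory_patterns(open("errors.txt"))`, where `errors.txt` is a placeholder for the exported list, gives the directories to paste in, with their error counts.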

We will try your suggestion "4. It might be worth it to do a quick
tcpdump/wireshark to see what is happening."

Wanted to be on record alongside AsparagusX and others. Thanks.


AsparagusX

May 8, 2009, 10:24:40 AM
Brian

I am using SMB. It is all to the same NAS. It has correctly picked up
the folders and sub-folders, but once a document is identified, this
error appears some time later. I have reset the index a number of
times, with no joy. Is it possible to patch a GSAve? I can certainly
crawl other URLs (e.g. HTTP).

Thanks

Anton

AsparagusX

May 8, 2009, 10:49:42 AM
Further to my previous post.

I moved the file share somewhere else (but still on the same NAS). The
GSAve duly started crawling and results were being returned, with no
errors. HOWEVER, as soon as I started 'interacting' with the GSA, e.g.
running searches, these errors started appearing. I am hosting the
GSAve on VMware Player 2.xxxx on a Vista host. I have also turned the
Host Load down to 0.1 as suggested.

justin.brister

May 8, 2009, 11:36:26 AM
I have had issues running the VGSA on the Player.

You should try running the VGSA on the full VMware product. That
seems to cure a whole bunch of network-related issues with the VGSA.

J

AsparagusX

Jun 1, 2009, 10:43:53 AM
Hallo

Been quiet on this post for a while. I have now rebuilt the GSAve on a
Windows 2008 Server, with VMware Server (all the bells and whistles),
and the problems still persist. When the GSA starts crawling, the URLs
are correctly shown in Crawl Diagnostics. I have tried this with more
than one SMB share on different servers for testing. The moment a
query result returns one of these URLs and I select it, an error
occurs (cannot open document). After that has happened, I immediately
start getting "Retrying URL: Connection reset by peer during fetch."
errors in Crawl Diagnostics.

Any ideas please?

Regards

Asp

brianb

Jun 2, 2009, 9:43:38 PM
It definitely sounds like you may be overloading the SMB server. When
you run a secure search, the GSA sends a request to your SMB server
for each of the results to make sure that you have access to them.
Then, when you click on a result, the GSA goes and tries to retrieve
that file. That said:

1. Try serving the results publicly. You can do this by going to Crawl
and Index -> Crawler Access and checking the Make Public checkbox.
Then just wait about 30 minutes for the changes to take effect.
2. If that does not help, try doing a tcpdump or some kind of packet
trace on your SMB server. This will give you an idea of what is
causing that error, as you can see the communications between the SMB
server and the GSA directly. It is likely that the SMB server is just
rejecting the TCP connections, which is what causes that error.
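
Once a capture exists, the rejected connections show up as TCP packets with the RST flag set. The following is a minimal Python sketch for tallying them. It assumes the capture has been printed as text in the format current tcpdump versions produce with `-n` (lines containing `IP src > dst: Flags [...]`); older tcpdump releases print flags differently. `count_resets` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the address and flags portion of a tcpdump text line, e.g.
# "... IP 10.0.0.5.445 > 10.0.0.9.51234: Flags [R.], seq 1, ..."
PACKET_LINE = re.compile(
    r"IP6? (?P<src>\S+) > (?P<dst>\S+): Flags \[(?P<flags>[^\]]*)\]"
)


def count_resets(lines):
    """Count packets with the RST flag set, keyed by sender (address.port)."""
    resets = Counter()
    for line in lines:
        match = PACKET_LINE.search(line)
        if match and "R" in match.group("flags"):
            resets[match.group("src")] += 1
    return resets
```

If most resets come from the SMB server's address, that supports the idea that the server is dropping the GSA's connections; if they come from the GSA side, the cause lies elsewhere.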

Hope this helps.

Brian

AsparagusX

Jun 3, 2009, 3:40:21 AM
I have made some progress on this. I installed an Apache server and
pointed a URL at the same folder on the NAS. Then, rather than using
SMB crawling, I used normal HTTP crawling, and the documents are found
correctly. If I select a document, it displays correctly.
Question: what is now different? Same network, same IP address, but
the crawling method has changed. Does this point to a problem with SMB
crawling in VGSA? Is there an update planned for VGSA in the near
future?

Regards

Asp

miguev

Jun 4, 2009, 6:30:12 AM

Hi AsparagusX,

HTTP and SMB are very different protocols, so it is no wonder things
work differently when crawling them.
HTTP is always preferred over SMB, but you can indeed crawl SMB, even
though it's a bit tougher.
In your case, it seems puzzling that the SMB server can't cope with
the appliance, even with a low host load. Have you looked at the
Windows logs to see why it is not responding in a timely manner?