Crawl Content Protected by IIS NTLM


timalex

Nov 20, 2008, 10:04:12 AM
to Google Search Appliance/Google Mini
Okay. Pulling my hair out here...

This is what I've done.

Created a user in Active Directory for Google.

Granted that user read privileges to the directory on the IIS web server
where I'd like the GSA to crawl content.

In IIS, went to this directory and right-clicked Properties --> Directory
Security --> Authentication and access control: I've unchecked 'Enable
anonymous access', and in the authenticated access section the only
option selected is 'Integrated Windows authentication'.

Went into Crawl and Index --> Crawler Access, entered the correct URL
pattern, user name, domain, and password, and clicked Save.

Added another Crawler Access entry for the robots.txt file itself, with
the correct username, password, and domain.

robots.txt content:

User-Agent: gsa-crawler
Disallow:

Created collection with url pattern.

Checked Status and Reports --> Crawl Diagnostics for this collection,
and the only status I get is:

Excluded: Authentication Failed.

I've verified I'm using the proper username and credentials by hitting
the URL pattern from a web browser: I get the Windows challenge box
(i.e. username and password), enter them there, and bingo, I'm in.
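A browser check like this can mask what the server is actually demanding (browsers silently fall back between schemes), so another way to verify is to look at the WWW-Authenticate headers on the 401 response. A minimal sketch of that check, assuming the header values have been copied from the response:

```python
# Given the WWW-Authenticate header values from a 401 response,
# check whether the server is actually offering NTLM. The GSA's
# Crawler Access NTLM credentials only help if the server
# advertises NTLM (or Negotiate); if only Basic shows up, the
# auth configuration on the IIS side isn't what we think it is.

def offered_schemes(www_authenticate_values):
    """Extract the scheme names from a list of WWW-Authenticate values."""
    # Each value starts with the scheme name, e.g. "NTLM",
    # "Negotiate", or 'Basic realm="..."'.
    return {v.split()[0] for v in www_authenticate_values if v.strip()}

def demands_ntlm(www_authenticate_values):
    """True if integrated Windows auth (NTLM/Negotiate) is offered."""
    return bool(offered_schemes(www_authenticate_values) & {"NTLM", "Negotiate"})

# Example: with anonymous access off and integrated auth on,
# IIS typically sends Negotiate and/or NTLM.
headers = ["Negotiate", "NTLM"]
print(demands_ntlm(headers))  # True
```

The header values themselves can be captured with any HTTP client that shows response headers; the example values above are just what IIS integrated auth typically emits.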

WTF??? What am I missing?

Tim Stevens

Nov 20, 2008, 10:22:45 AM
to Google-Search-...@googlegroups.com
Does the user I've created for the GSA to crawl this content have to be a member of the domain?


brianb

Nov 20, 2008, 9:44:14 PM
to Google Search Appliance/Google Mini
Hi Tim,

Yes, I believe it would need to be a member of the domain, since you
are using NTLM and would need to set the domain in Crawler Access.
When the domain is there, the GSA will log in as "domain\user".

Let us know how it goes.

Brian

Tim Stevens

Nov 21, 2008, 10:43:32 AM
to Google-Search-...@googlegroups.com
Brian,
Thanks for replying. The user we created is a member of the domain, and I do have the domain populated in the settings.

When I test the URL pattern in Administration --> Network Settings, I get 'returncode 401, should be 200'.

This machine is not in our DMZ, so I'm wondering if the attempt is failing because it can't talk directly to the domain controller.

Would setting up the LDAP Directory Server Address settings be something I should pursue?


brianb

Nov 21, 2008, 7:12:47 PM
to Google Search Appliance/Google Mini
Hi Tim,

Hmmm... yeah, I believe the GSA would need to be able to reach the DC
in order to complete the authentication. Regarding the 401 you are
seeing: the GSA does not use login credentials for the network
diagnostics; that test only checks network connectivity, so the 401 is
expected. A couple of things to try as a test:

1. Disable integrated auth. I don't think this will make a difference,
but it could be changing something.
2. Enable anonymous access temporarily and crawl again. That will at
least tell us that nothing else is inherently wrong and that the
problem is definitely in the auth settings.
3. I am not sure what you have in Crawler Access at this point, but
for sanity's sake, try broadening the URL pattern you have there. What
I mean is: put only the host with the trailing slash, or, if this is
your only protected site, just put a slash. That tells the GSA to use
those credentials every time it gets a 401, anywhere.
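To see why broadening the pattern helps, the credential lookup can be thought of as substring matching of URL patterns, with "/" matching every URL. This is a simplified model to illustrate the idea, not the GSA's actual implementation; the host, domain, and account names are placeholders:

```python
# Simplified model of Crawler Access: each entry maps a URL pattern
# to credentials, and an entry applies when its pattern occurs in
# the URL being fetched. The pattern "/" occurs in every URL, so a
# "/" entry offers its credentials for any 401 the crawler hits.
crawler_access = [
    ("intranet.example.com/protected/", ("MYDOMAIN", "gsa-crawl", "secret")),
    ("/", ("MYDOMAIN", "gsa-crawl", "secret")),  # catch-all entry
]

def credentials_for(url):
    """Return the first credentials whose pattern occurs in the URL."""
    for pattern, creds in crawler_access:
        if pattern in url:
            return creds
    return None  # no entry matched; the crawler has nothing to offer

# A URL outside the protected directory still picks up the
# catch-all "/" entry:
print(credentials_for("http://other.example.com/page.aspx"))
```

The point of the catch-all is diagnostic: if the crawl succeeds with "/" but failed with the narrow pattern, the original pattern simply wasn't matching the URLs being crawled.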

If none of that helps, I would contact support with the above
information and see if they can find something.

Brian

Regarding the LDAP settings on the GSA: those are actually only used
for secure serving, not for crawling, so they probably wouldn't be of
much use here.

timalex

Nov 25, 2008, 10:27:20 AM
to Google Search Appliance/Google Mini
I've got this solved, thanks to support as well as the posts here. It
never fails that the answer to something is usually simple.

In my case I had everything set up correctly, except that I needed to
use my protected URL pattern in both the 'start crawling from' box and
the 'follow and crawl only' box.

oy vey!!

Anyway, there it is... yeah, I know... DUH!!!
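For anyone hitting the same thing, the fix amounts to putting the protected URL pattern in both fields under Crawl and Index; roughly like this, with a placeholder host name:

```
Start Crawling from the Following URLs:
  http://intranet.example.com/protected/

Follow and Crawl Only URLs with the Following Patterns:
  http://intranet.example.com/protected/
```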