Configuring Sharepoint (robots.txt)


Whitney

Dec 3, 2008, 11:16:25 AM
to Google Search Appliance/Google Mini
Hi,

I'm trying to set up our GSA to crawl our Sharepoint (MOSS 2007)
sites, according to this document.

http://code.google.com/apis/searchappliance/documentation/connectors/110/connector_admin/sharepoint_connector.html

When I try to crawl, I get "Retrying URL: Host unreachable while
trying to fetch robots.txt." The procedure says nothing about this
error for 2007, and I tried recrawling (as specified for 2003 after
defining the managed path), although I defined the path after creating
my robots.txt file.

Does anything have to be in the robots.txt besides this?
User-agent: *
Disallow:
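(For reference, an allow-all file like that is all a crawler needs; here's a quick sanity check with Python's stdlib robots.txt parser. The crawler name and URL are just placeholders, not anything GSA-specific:)

```python
import urllib.robotparser

# Parse the two-line allow-all robots.txt from above.
rp = urllib.robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow:"])

# An empty Disallow means every agent may fetch every path.
print(rp.can_fetch("gsa-crawler", "http://sharepoint.example.com/MainSite/"))
```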

I'm pretty sure it's in my root directory (IIS, site properties, Home
Directory tab... followed that path and created my robots.txt), and I
defined it as Explicit Inclusion managed path in Sharepoint
Administration.

One thing I did differently was that I created a new default internal
path (for FQDN) and set the old one (short) to Intranet. I did this
because we have all of the users pointing to the short name via GPO,
and I wasn't sure if I just switched it over to FQDN, if it would
automatically resolve, or if it would be inaccessible at the short
name (and so appear to them that Sharepoint's down). I didn't want to
remove the short until I knew how it would work. Would this variation
from the procedure cause a problem?

Any other ideas are welcome (if you need more information, just ask).

~Whitney

Jyothi

Dec 3, 2008, 1:42:29 PM
to Google Search Appliance/Google Mini
Hi Whitney,

In order to crawl SharePoint you should have a connector
installed. Here is the link that gives information about the
connector and installing it:

http://code.google.com/apis/searchappliance/documentation/50/connector_admin/sharepoint_connector.html

Here is a discussion regarding the installation and applying
it to the GSA:

http://code.google.com/apis/searchappliance/documentation/50/connector_admin/sharepoint_connector.html

Go through it and let me know.

By the way, install the connector on the server where SharePoint is
present.

-Jyothi




Jyothi

Dec 3, 2008, 1:43:32 PM
to Google Search Appliance/Google Mini
Sorry, the discussion link is here:

http://groups.google.com/group/Google-Search-Appliance-Help/browse_thread/thread/5bb393af333bbfe4?hl=en&q=




Whitney

Dec 3, 2008, 3:35:46 PM
to Google Search Appliance/Google Mini
Thanks for the reply. I already have a connector manager/connector
installed (both have the green light), and the link I posted (the one
I've been working with) is almost identical to the one you posted, so
I didn't find much help there, since I've been through it several
times, and as far as I can tell, everything is set correctly.

From the discussion, I noticed that I had not made my crawler account
a local administrator on the Sharepoint host (which is necessary), but
I set that and recrawled... then I restarted the connector (just in
case) and recrawled again... everything looks right to me (and yet,
when I recrawl, it's still giving me the same "Retrying URL: Host
unreachable while trying to fetch robots.txt" error under Crawl
Diagnostics).

~Whitney


Jyothi

Dec 3, 2008, 4:13:46 PM
to Google Search Appliance/Google Mini
"Retrying URL: Host unreachable while trying to fetch robots.txt" - this
error is a pain in the ass. I am suffering from the same error when
trying to crawl a web site. I spoke to GSA customer
service and the reply was that the site is being blocked by the
firewall and that they cannot provide any further assistance.

Try checking the network settings of the site in Administration >
Network Settings > enter the URL in the Network Diagnostics and see
what it says.

Thanks,
Jyothi



Joe D'Andrea

Dec 4, 2008, 8:07:28 AM
to Google-Search-...@googlegroups.com
On Wed, Dec 3, 2008 at 4:13 PM, Jyothi <eswar...@gmail.com> wrote:

> "Retrying URL: Host unreachable while trying to fetch robots.txt" this
> error is a pain in the ass. I am suffering from the same error when
> trying to crawl through a web site.

One thing I've noticed with robots.txt: It's important that the web
server provide a definitive status in response - preferably a 200 or
404. (AuthN/AuthZ notwithstanding.)

I have encountered many a server where this is _not_ the case. After
adjustment, the "Host unreachable while trying to fetch robots.txt"
error goes away.
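(The quickest way to see what status your server actually gives for robots.txt is to request it directly and print the code. Minimal sketch below, assuming nothing about the real SharePoint host - a throwaway local server stands in for it purely to keep the example self-contained:)

```python
import http.server
import threading
import urllib.error
import urllib.request

# Stand-in for the real web server; serves an allow-all robots.txt.
class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/robots.txt":
            body = b"User-agent: *\nDisallow:\n"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, *args):  # silence per-request logging
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/robots.txt" % server.server_port

try:
    status = urllib.request.urlopen(url).status
except urllib.error.HTTPError as e:
    status = e.code  # 401, 404, etc. are still definitive answers

print("robots.txt ->", status)  # anything other than 200/404 needs fixing
server.shutdown()
```

Pointed at a real host instead of the stand-in, a 401 here would reproduce exactly the symptom in this thread.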

--
Joe D'Andrea
Liquid Joe LLC
www.liquidjoe.biz
+1 (908) 781-0323

Whitney

Dec 4, 2008, 8:13:54 AM
to Google Search Appliance/Google Mini
"Return Code 401, should be 200"

So it's something with the authentication. I added permission for my
crawling account to the robots.txt, which didn't make a difference.
Do you know where else I might need to add permission?

~Whitney

Whitney

Dec 4, 2008, 8:29:23 AM
to Google Search Appliance/Google Mini
Something else just occurred to me. I made my crawler account a local
administrator yesterday, and Administrators had permission to that
file (and probably all files), which is why it didn't make a
difference when I set permissions (again) on the file for my crawler
account. Something else is up.


Joe D'Andrea

Dec 4, 2008, 8:33:29 AM
to Google-Search-...@googlegroups.com
On Thu, Dec 4, 2008 at 8:13 AM, Whitney
<whitne...@creativelogicgroup.com> wrote:

> "Return Code 401, should be 200"

Ahh! Now, where are you spotting that status? (In your
browser/user-agent, in the GSA Diags, elsewhere?)

> So it's something with the authentication. I added permission for my
> crawling account to the robots.txt, which didn't make a difference.

To clarify - do you mean you added gsa-crawler to robots.txt? Or
something else ... ?

> Do you know where else I might need to add permission?

Normally I would direct you to "Crawl and Index > Crawler Access" but
... let's double-check the connector docs:

http://snurl.com/743rm [code_google_com]

If you want to go the quick-and-dirty route (or perhaps just as a
bonus reality check), can you force IIS to serve just robots.txt w/o
_any_ AuthN? This may not be a best practice, but in a pinch it might
do the trick.

Whitney

Dec 4, 2008, 8:35:12 AM
to Google Search Appliance/Google Mini
> One thing I've noticed with robots.txt: It's important that the web
> server provide a definitive status in response - preferably a 200 or
> 404. (AuthN/AuthZ notwithstanding.)

Can I make it provide a definitive response (if so, how)?

Joe D'Andrea

Dec 4, 2008, 8:34:57 AM
to Google-Search-...@googlegroups.com
On Thu, Dec 4, 2008 at 8:33 AM, Joe D'Andrea <jdan...@gmail.com> wrote:
> Normally I would direct you to "Crawl and Index > Crawler Access" but
> ... let's double-check the connector docs:
>
> http://snurl.com/743rm [code_google_com]

Ah-ha!

"The Google Search Appliance cannot crawl the content unless a
robots.txt file is present in the SharePoint site's root directory.
Ensure that you create a robots.txt file and ensure that the file is
public."

So you _do_ need to make robots.txt public, at least if I understand
the above correctly.

Whitney

Dec 4, 2008, 9:08:06 AM
to Google Search Appliance/Google Mini
I created it at http://sharepointexample.mydomain.com:10000/robots.txt
(I believe this would be the root directory); however, I'm trying to
start the crawl at http://sharepointexample.mydomain.com:10000/MainSite/.
I don't know if that would make any difference (I assume it would be
passing by the root directory - and robots.txt - to get to the main
site), but just in case... :)

How do I make robots.txt public? Is that referring to giving everyone
read permission (or something like that), or is it something I need to
do on the GSA?

~Whitney

Joe D'Andrea

Dec 4, 2008, 9:21:31 AM
to Google-Search-...@googlegroups.com
On Thu, Dec 4, 2008 at 8:29 AM, Whitney
<whitne...@creativelogicgroup.com> wrote:

> Something else just occurred to me. I made my crawler account a local
> administrator yesterday, and Administrators had permission to that
> file (and probably all files), which is why it didn't make a
> difference when I set permissions (again) on the file for my crawler
> account. Something else is up.

I think you're on to something. Also from the doc:

"The Microsoft SharePoint connector and the Google Search Appliance
require user credentials for traversal and indexing. Google recommends
that you use a single user account for both."

To your question about how to get a 200 or 404 response for robots.txt
... perhaps try this?

http://www.starznet.co.uk/sharepoint/blog/RobotstxtinSharePoint.htm

--
Joe D'Andrea

Joe D'Andrea

Dec 4, 2008, 9:22:52 AM
to Google-Search-...@googlegroups.com
On Thu, Dec 4, 2008 at 9:08 AM, Whitney
<whitne...@creativelogicgroup.com> wrote:
>
> I created it at http://sharepointexample.mydomain.com:10000/robots.txt

Good - that's where you want it - at the docroot (even if you start
your crawl at a lower level).
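(That's because crawlers derive the robots.txt location from just the scheme and host:port of whatever URL they start at; a small illustration, reusing the made-up URL from earlier in the thread:)

```python
from urllib.parse import urlsplit, urlunsplit

# Crawlers look for robots.txt at the docroot of scheme://host:port,
# no matter how deep the start URL is. The URL below is a placeholder.
start = "http://sharepointexample.mydomain.com:10000/MainSite/default.aspx"
parts = urlsplit(start)
robots_url = urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))
print(robots_url)  # http://sharepointexample.mydomain.com:10000/robots.txt
```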

> How do I make robots.txt public?

See my previous msg (we just missed each other) - that might do the trick!

--
Joe D'Andrea

Whitney

Dec 4, 2008, 10:29:27 AM
to Google Search Appliance/Google Mini
Ok, I've enabled Anonymous access on robots.txt through IIS
(robots.txt file properties > File Security). I don't think I can do
an IIS reset mid-day (would restarting the site implement the change,
or does it have to be a full-blown IIS reset?), but maybe I can do it
tonight, and then I'll post an update.

Thanks,
Whitney


Whitney

Dec 5, 2008, 10:53:43 AM
to Google Search Appliance/Google Mini
I did not reset IIS last night, so I don't know if that would resolve
this, but I just noticed under my Crawl Diagnostics for the Sharepoint
URL, at 10:36 a.m. yesterday (I guess this was the next recrawl after
enabling anonymous access), there's an "Info: Redirected URL" and then
"Excluded: Authentication Failed."... well, at least the "Retrying
URL:... robots.txt" message is gone, but it seems like a step in the
wrong direction. :P

Did I enable anonymous access incorrectly? Should I just assign
'Everyone' Read permission?

~ Whitney


Joe D'Andrea

Dec 5, 2008, 5:57:32 PM
to Google-Search-...@googlegroups.com
On Fri, Dec 5, 2008 at 10:53 AM, Whitney
<whitne...@creativelogicgroup.com> wrote:

> I did not reset IIS last night, so I don't know if that would resolve
> this, but I just noticed under my Crawl Diagnostics for the Sharepoint
> URL, at 10:36 a.m. yesterday (I guess this was the next recrawl after
> enabling anonymous access), there's an "Info: Redirected URL" and then
> "Excluded: Authentication Failed."...

Progress! Sort of. :)

> ... well, at least the "Retrying URL:... robots.txt" message is gone, but it seems
> like a step in the wrong direction. :P

At least now you're arriving at a more definitive end result.

> Did I enable anonymous access incorrectly? Should I just assign
> 'Everyone' Read permission?

Looks like robots.txt is still being blocked by AuthN. So long as you
set the robots.txt permissions to be anonymous read, the next step
is to try an IISReset. THEN, try to get robots.txt using your web
browser - don't even bother with the GSA for this test. If you get the
file without a login (note: make sure you aren't already logged in),
then I think you're set!

- Joe

Siva

Dec 13, 2008, 9:33:01 PM
to Google Search Appliance/Google Mini
Hi Jyothi,
Can you please help me with how to connect and configure the Google
Search Appliance? If any sample links and a trial download are
available, please do me the favour.
Thanks,
L. Siva Subramanian

Whitney

Dec 22, 2008, 4:11:43 PM
to Google Search Appliance/Google Mini
Well, two weeks later... I did the IISReset. The GSA is still giving
the Excluded: Authentication Failed crawl error, but I opened a new
browser, signed out of Sharepoint and then tried accessing robots.txt
and it displayed the text in the browser, so... I don't know what that
means, since the GSA still isn't crawling Sharepoint (apparently).


Joe D'Andrea

Dec 22, 2008, 4:28:42 PM
to Google-Search-...@googlegroups.com
On Mon, Dec 22, 2008 at 4:11 PM, Whitney
<whitne...@creativelogicgroup.com> wrote:
> Well, two weeks later... I did the IISReset. The GSA is still giving
> the Excluded: Authentication Failed crawl error, but I opened a new
> browser, signed out of Sharepoint and then tried accessing robots.txt
> and it displayed the text in the browser, so... I don't know what that
> means, since the GSA still isn't crawling Sharepoint (apparently).

Arrrgh. OK. Reality check Q: Without being logged in from the browser,
you can reach robots.txt from the docroot - that is:

http://my.site/robots.txt

and not:

http://my.site/some/directory/robots.txt

What else ... does the GSA show any change in crawl diagnostics with
respect to robots.txt?

- Joe

Whitney

Dec 26, 2008, 2:42:11 PM
to Google Search Appliance/Google Mini
> Without being logged in from the browser...

After the IISReset, I tried logging out of Sharepoint (since it
automatically logs you in according to your computer/network login)
and I was able to access it. Is that what you meant? Of course, when
I tried it again just now, it didn't let me sign out without closing
the browser, so I'm not sure how I did it or what I did the other day
(it seemed to work). (I think I figured it out: I tried it from the
server, with the enhanced browser security, which asked me to log in.
I hit cancel, (got the "You are not authorized to view this page" 401
error), and changed the end of the URL to robots.txt.) Is there a
different/better way?

> What else ... does the GSA show any change in crawl diagnostics with
> respect to robots.txt?

Nothing. I see one retrieval error on <servername> until I drill down
to servername\mainsite, at which point I see "Excluded: Authentication
Failed"... if I click the backslash (root directory of the main site)
it has Info: Redirected URL and Excluded: Authentication Failed at the
time of each crawl (same as before the IISReset).

I tried the URL test on Network Settings, and http://serverFQDN/robots.txt
came through OK. The short URL came through invalid (not surprising,
but I was just testing), and the serverFQDN (no robots.txt on the end)
came through with the "Returncode 401, should be 200"

Joe D'Andrea

Dec 27, 2008, 9:26:02 AM
to Google-Search-...@googlegroups.com
On Fri, Dec 26, 2008 at 2:42 PM, Whitney
<whitne...@creativelogicgroup.com> wrote:

> After the IISReset, I tried logging out of Sharepoint (since it
> automatically logs you in according to your computer/network login)
> and I was able to access it.

Ahh, we need to try it from a location that isn't logged in (or can't
get automatically logged in).

It sounds like you did that though:

> I tried it from the
> server, with the enhanced browser security, which asked me to log in.
> I hit cancel, (got the "You are not authorized to view this page" 401
> error), and changed the end of the URL to robots.txt.)

Ahh, good. At least you can get to it now, which means the GSA should
also be able to.

> Nothing. I see one retrieval error on <servername> until I drill down
> to servername\mainsite, at which point I see "Excluded: Authentication
> Failed"...

So we still have an authentication issue. :\

> I tried the URL test on Network Settings, and http://serverFQDN/robots.txt
> came through OK. The short URL came through invalid (not surprising,
> but I was just testing) ...

Auugh! I know - this is frustrating.

I'm crossing fingers that we find out it's something trivial we both
missed, and we're both going to virtually smack our foreheads. (From
the "it's easy when you know how" department!)

Hmm ... remind me, did we try this?

http://www.findabilityproject.org/?p=228

Do you need FQDN resolution for host names enabled? (It's disabled by default.)

http://snurl.com/95t1x [code.google.com]

Try this to force a full recrawl (slightly different from before):

http://snurl.com/95t30 [code.google.com]

- Joe

Whitney

Dec 29, 2008, 3:22:50 PM
to Google Search Appliance/Google Mini
> Hmm ... remind me, did we try this?
>  http://www.findabilityproject.org/?p=228

No, I don't think we had before. When I navigated to the connector
folder, the XML file was not there (only the PROPERTIES file)... I ran
through the steps, and when it was recreated, the PROPERTIES file was
still the only one there. This might be [part of] the problem.

> Do you need FQDN resolution for host names enabled? (It's disabled by default.)
>  http://snurl.com/95t1x [code.google.com]

Can't do this yet, since there's no such xml file. :(

Joe D'Andrea

Dec 29, 2008, 7:37:23 PM
to Google-Search-...@googlegroups.com
On Mon, Dec 29, 2008 at 3:22 PM, Whitney
<whitne...@creativelogicgroup.com> wrote:
>
>> Hmm ... remind me, did we try this?
>> http://www.findabilityproject.org/?p=228
>
> No, I don't think we had before. When I navigated to the connector
> folder, the XML file was not there (only the PROPERTIES file)... I ran
> through the steps, and when it was recreated, the PROPERTIES file was
> still the only one there. This might be [part of] the problem.

Hmm. I wonder if there's a permissions problem on that folder
(preventing the XML from being generated)?

>> Do you need FQDN resolution for host names enabled? (It's disabled by default.)
>> http://snurl.com/95t1x [code.google.com]
>
> Can't do this yet, since there's no such xml file. :(

Aye. :(

- Joe

Whitney

Dec 30, 2008, 2:41:40 PM
to Google Search Appliance/Google Mini
I checked permissions... as far as I see, it's the same there as any
of the other folders (moving up) and there are xml files in some of
the folders above. :(


Whitney

Jan 5, 2009, 4:57:55 PM
to Google Search Appliance/Google Mini
Well, I don't know what happened, but Sharepoint_state.xml magically
appeared in that folder... I was looking through it in Notepad, and I
don't see the FQDNconversion. Is this still the wrong file for the
Advanced Configuration http://snurl.com/95t1x [code.google.com]? I
noticed that it says connectorinstance.xml, but the other link (for
recreation of connector) procedure mentions this
Sharepoint_state.xml... I was assuming they were referring to the same
file.


B.A.

Jan 6, 2009, 5:04:58 AM
to Google Search Appliance/Google Mini

Whitney, you may need a neutron cannon here: Wireshark ;-)

As long as you can use HTTP (not HTTPS) you can capture the traffic
between the appliance and IIS --the easiest would be running Wireshark
on the IIS server-- and see what the HTTP conversation looks like.
There you can see exactly how the appliance is requesting robots.txt
and exactly what the server is responding. Then you can do the same
replacing the appliance with your own browser, compare, and possibly
tell us what difference you find.
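(If Wireshark isn't handy, a tiny echo endpoint can show the client-side half of the conversation: point each client at it and dump the request headers it sends, then compare. A sketch of the idea with Python's stdlib - the "gsa-crawler" agent string is just a placeholder, not necessarily what the appliance sends:)

```python
import http.client
import http.server
import threading

# Capture whatever request headers arrive, so two clients can be compared.
captured = []

class EchoHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        captured.append(dict(self.headers))
        self.send_response(200)
        self.end_headers()

    def log_message(self, *args):  # silence per-request logging
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), EchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Stand-in client request; the appliance or a browser would go here instead.
conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("GET", "/robots.txt", headers={"User-agent": "gsa-crawler"})
conn.getresponse().read()
server.shutdown()

print(captured[0]["User-agent"])  # gsa-crawler
```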



Whitney

Jan 7, 2009, 11:18:15 AM
to Google Search Appliance/Google Mini
> I'm crossing fingers that we find out it's something trivial we both
> missed, and we're both going to virtually smack our foreheads. (From
> the "it's easy when you know how" department!)
> - Joe

I think I may have found it... won't know for sure until Monday (the
JDK requires reboot, which I can't do until Sunday)... I'm not sure
how I missed this; I probably overlooked it, because I thought I had
the answers: I read somewhere else (I know I did, but I can't find it
now) that the Sharepoint connector uses either 1.4.2 or 1.5. This
says differently, and that would sure explain a lot. :P I found two
similar versions of this document (twice! I missed it twice!); the
second seems more to the point. I looked for the errors they
mentioned and didn't find them, but I did double-check and I'm running
1.5... and the Sharepoint_state.xml file wasn't created initially, as
we know (although it did magically appear, but I'm still not sure it's
populated correctly :)
---------------------------------
http://code.google.com/apis/searchappliance/documentation/connectors/110/connector_admin/sharepoint_connector.html
"State File Not Created or
javax.xml.transform.TransformerFactoryConfigurationError Error in Log
You might see the following error in the stderr_date log:

javax.xml.transform.TransformerFactoryConfigurationError: Provider
org.apache.xalan.processor.TransformerFactoryImpl not found

The error means that the Tomcat Java compatibility patch for JDK 1.4
is installed, but the connector manager is running with JDK 1.5. The
SharePoint connector requires JDK 1.4.2. Under JDK 1.5, the SharePoint
connector appears to function, but some functionality fails silently.
Another symptom of using the wrong JDK is that the state file is not
created in tomcat\webapps\connector-manager\web-inf\connectors
\connector_name\connector_name\."
----------------------------------
http://code.google.com/apis/searchappliance/documentation/50/connector_admin/sharepoint_connector.html#erranch
State File Not Created or
javax.xml.transform.TransformerFactoryConfigurationError Error in Log
You might see the following error in the stderr_date log:

javax.xml.transform.TransformerFactoryConfigurationError: Provider
org.apache.xalan.processor.TransformerFactoryImpl not found

The error means that the connector manager is running with JDK 1.5.
The SharePoint connector requires JDK 1.4.2. Under JDK 1.5, the
SharePoint connector appears to function, but some functionality fails
silently. Another symptom of using the wrong JDK is that the state
file is not created in tomcat\webapps\connector-manager\web-inf
\connectors\connector_name\connector_name\.

To correct the error:

Delete the connector and connector manager on the Google Search
Appliance Admin Console.
Uninstall the connector and connector manager on the connector manager
host.
Install JDK 1.4.2 on the connector manager host.
Use the installation instructions in this document to recreate the
connector manager and connector.


I sure hope this fixes it!
~Whitney

Whitney

Jan 12, 2009, 11:08:41 AM
to Google Search Appliance/Google Mini
> I read somewhere else (... I can't find it
> now) that the Sharepoint connector uses either 1.4.2 or 1.5.  This
> says differently, and that would sure explain a lot. :P

I found it: under Installing the Connector Using the Installer
(http://code.google.com/apis/searchappliance/documentation/connectors/110/connector_admin/sharepoint_connector.html)
it says "Before installing the connector using the installer, ensure
that Java Development Kit (JDK) 1.4.2 or 1.5 is installed on the host
where you are installing the connector."

Then under troubleshooting, it says 1.5 doesn't work. Oh, boy... what
fun. I got the versions switched around, so I'll post an update when
I see if I can get the connector working now.

~Whitney

Joe D'Andrea

Jan 12, 2009, 12:17:15 PM
to Google-Search-...@googlegroups.com
Happy New Year! Did I miss anything? :-o

Auugh! OK, that's frustrating. Thwarted by conflicting docs. :(

Bonus points for your stick-to-it-ness on this. Keep us posted! (Wow I
have a lot of catching up to do with the posts. hehe.)

--
Joe D'Andrea

Whitney

Jan 12, 2009, 4:08:47 PM
to Google Search Appliance/Google Mini
*sigh* No dice on the JDK. I installed 1.4.2, uninstalled 1.5,
installed a new Google Connector on the server, and set up the
Connector Manager and Connector Instance on the GSA (green lights on
both)... but I'm still getting the "Excluded" message. I've been
going through the setup procedures again, trying to figure out if I
forgot something/overlooked some part of the procedure.

~Whitney


Joe D'Andrea

Jan 15, 2009, 9:30:00 AM
to Google-Search-...@googlegroups.com
On Mon, Jan 12, 2009 at 4:08 PM, Whitney
<whitne...@creativelogicgroup.com> wrote:

> *sigh* No dice on the JDK.

Auuugh! Somehow I think there must be a hole in the wall nearby ...
about fist size. :(

OK. Broader RFC going out here (or RFH - request for help?). Anyone
else ever have to work through issues when getting Sharepoint and the
GSA to play nice?

- Joe

Whitney

Jan 21, 2009, 11:22:39 AM
to Google Search Appliance/Google Mini
*chirp, chirp*

Well... at least I know I'm in good company. Lots of crickets have
had this problem. :P


Joe D'Andrea

Jan 21, 2009, 12:32:03 PM
to Google-Search-...@googlegroups.com
On Wed, Jan 21, 2009 at 11:22 AM, Whitney
<whitne...@creativelogicgroup.com> wrote:
>
> *chirp, chirp*
>
> Well... at least I know I'm in good company. Lots of crickets have
> had this problem. :P

I feel like we should just start fresh. Maybe there's some silly thing
we missed. :-o

- Joe

Whitney

Jan 27, 2009, 3:52:51 PM
to Google Search Appliance/Google Mini
Ok, so I was reviewing the document from my original post, and I
noticed the supported operating systems. I'm not very familiar with
the differences between WS2K3 and R2; I think it's mainly added
features, but could the difference be enough to cause problems? The
document specifies R2 in the supported OSs, and we're running WS2K3
Enterprise (but not R2, to my knowledge).

~Whitney


Joe D'Andrea

Jan 28, 2009, 10:34:31 AM
to Google-Search-...@googlegroups.com
On Tue, Jan 27, 2009 at 3:52 PM, Whitney
<whitne...@creativelogicgroup.com> wrote:

> Ok, so I was reviewing the document from my original post, and I
> noticed the supported operating systems. I'm not very familiar with
> differences between WS2K3 and R2; I think it's mainly features, but
> might it include something, or is it enough that it could cause
> problems? The document specifies R2 in supported OSs, and we're
> running WS2K3 Enterprise (but not R2, to my knowledge).

I have access to WS2k3 but it's on an intranet - no GSA/Mini in the
vicinity. Otherwise I'd try that myself.

I'm still thinking we have a permissions problem somewhere along the
way, but I can't put my finger on it thus far.

- Joe
