Intranet Documents Accessible via the Internet


SkatterBox

Nov 20, 2009, 7:45:32 PM
to Google Search Appliance/Google Mini
On my web page, which uses a Google Search Appliance, it's possible to
see internal documents by tampering with the search URL parameters,
specifically site, client, and proxystylesheet. The indexed documents
aren't directly accessible, but many of them can be viewed from
outside through the Google cache (provided as a link in the results).
By default the "cache" link is not presented to the end user. However,
by tampering with the "output" query parameter it's possible to change
its value from "xml_no_dtd" to "rich". This causes the search engine
to present the results in the default Google format, which includes
the "cache" link.

The Google Search Appliance we have deployed is used in two different
contexts: one collection serves our publicly facing web content and a
second serves our internal web site content. I want to know how I can
prevent internal URLs from being returned in searches from our website
(even if the URL parameters are tampered with), and how I can prevent
internal documents residing in Google's cache from being accessible
externally. Any assistance would be greatly appreciated. I've done a
good bit of Google searching on this topic and couldn't find any
documented solution to this particular issue. Thanks!
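To make the tampering described above concrete, here is a minimal Python sketch of how such a search URL is assembled. The host (search.example.com) and the collection and frontend names (public_collection, public_frontend, internal_collection) are hypothetical placeholders; only the parameter names q, site, client, proxystylesheet, and output come from the post. It shows that every one of these values travels in the query string, so any restriction has to be enforced on the appliance side rather than in the page.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical search endpoint; substitute your appliance's host name.
GSA_SEARCH = "http://search.example.com/search"


def build_search_url(query, site="public_collection", client="public_frontend",
                     proxystylesheet="public_frontend", output="xml_no_dtd"):
    """Assemble a search URL from the parameters named in the post.

    The default values are hypothetical stand-ins for what a public
    search page might send.
    """
    params = {
        "q": query,
        "site": site,
        "client": client,
        "proxystylesheet": proxystylesheet,
        "output": output,
    }
    return GSA_SEARCH + "?" + urlencode(params)


# The URL the public web page is expected to generate.
normal = build_search_url("budget")

# The same request after a visitor edits the query string by hand.
tampered = build_search_url("budget", site="internal_collection",
                            client="default_frontend",
                            proxystylesheet="default_frontend",
                            output="rich")

print(normal)
print(tampered)
# Whatever the visitor typed is what the appliance receives.
print(parse_qs(urlparse(tampered).query)["site"])
```

Because the browser simply sends whatever query string it has, hiding or hard-coding these values in the page markup does not by itself keep internal content out of the results.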

Michael Cizmar

Nov 21, 2009, 11:15:10 AM
As far as I know, there is no parameter value that sets the output
format to "rich". What is probably happening is that you are being
redirected to the default stylesheet, which contains the cached link.

In your scenario, it appears that your internal documents do not
have document-level security, so they are being served as public.
If possible, you should secure these documents so that a user must be
authenticated and authorized to view the document. This
authorization check is also performed when a cached link is used.

If that is not possible then you should do the following:

1) Remove the cached link from the default_frontend.
   This way, if someone manipulates your URL, they will see the default
   collection and nothing sensitive.
2) Change the default collection to contain only external documents.
   This way, only documents that are meant to be visible externally
   will be displayed to the public.
3) Create a collection called all_collection or
   mycompany_all_collection and use the pattern "/".
   You'll need a collection that contains all of the documents for
   administrative purposes.
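The effect of steps 2 and 3 can be sketched as a toy Python model of collection membership. This is a simplification, not how the appliance actually evaluates URL patterns: it assumes a document belongs to a collection when its URL contains one of the collection's patterns as a plain substring, and the host names (www.example.com, intranet.example.com) are hypothetical. Under that assumption the "/" pattern matches every URL, which is why it suits an all-documents collection.

```python
# Toy model of collection membership for steps 2 and 3 above.
# ASSUMPTION: a URL belongs to a collection if it contains any of the
# collection's patterns as a plain substring. Real appliance URL patterns
# support more syntax than this.

# Hypothetical index mixing public and intranet documents.
INDEX = [
    "http://www.example.com/about.html",
    "http://www.example.com/news/2009.html",
    "http://intranet.example.com/hr/salaries.html",
]

COLLECTIONS = {
    # Step 2: the default collection covers only the public site.
    "default_collection": ["www.example.com/"],
    # Step 3: "/" occurs in every URL, so this collection holds everything.
    "mycompany_all_collection": ["/"],
}


def documents_in(collection):
    """Return the indexed URLs that match any pattern of the collection."""
    patterns = COLLECTIONS[collection]
    return [url for url in INDEX if any(p in url for p in patterns)]


for name in COLLECTIONS:
    print(name, documents_in(name))
```

In this model the intranet page never appears in default_collection, so a visitor who lands on the default frontend and collection sees only public documents, while administrators can still search everything through the all-documents collection.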

Cheers,

Michael Cizmar | MC+A
Google Enterprise Partner
http://www.mcplusa.com/blog/ | twitter: http://www.twitter.com/mcplusa