robots.txt in Liferay

abc

Jun 1, 2009, 9:45:52 AM
to Google Search Appliance/Google Mini
Hi,

Does anyone have experience with Liferay-based web sites? My
colleagues who use Liferay claim it's not possible to create a file
under the standard name robots.txt because of the way Liferay
serves files internally.
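For reference, this is the kind of crawl control a robots.txt file would provide if Liferay could serve one (or on content hosted outside Liferay). The sketch below is a minimal illustration under stated assumptions, not anything Liferay or the appliance ships: the rules and URLs are hypothetical, and it uses Python's standard `urllib.robotparser` to check which paths a compliant crawler would skip.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; these paths are illustrative, not Liferay defaults.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
Disallow: /dev/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content supplied as a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    # Hypothetical host and paths, checked against the rules above.
    for path in ("/web/guest/home", "/archive/2009/06", "/dev/test-page"):
        url = "http://www.example.edu" + path
        print(url, rp.can_fetch("*", url))
```

Because the rules are prefix matches, a single `Disallow` line covers everything beneath that path, which is what makes robots.txt attractive when it is available.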

Thanks for your time

JMarkham

Jun 1, 2009, 12:51:48 PM
to Google Search Appliance/Google Mini
They are correct. You can't have a robots.txt file in Liferay. If
you have sections of content outside Liferay, you can use robots.txt
there. Thus far, I have only used the googleon/googleoff tags
successfully within Liferay, though I'm fairly certain there is also
a way to use robots meta tags. If my suspicions are correct, you
could put the robots meta tags in a theme, which would then cover the
"sections" of content that use that theme. For individual portlet
content, the googleon/googleoff tags are the best option I've found.
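The theme-level idea could look something like the following sketch. It assumes the theme's page template emits a standard robots meta tag (for example `noindex, follow` on archive-browsing pages) and that the crawler honors standard robots meta directives; the page markup is hypothetical. The Python code only shows how such a tag is read, using the standard `html.parser` module.

```python
from __future__ import annotations

from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag != "meta":
            return
        # HTMLParser lowercases tag and attribute names; values may be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for token in attr_map.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html: str) -> set[str]:
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


def may_index(html: str) -> bool:
    directives = robots_directives(html)
    return "noindex" not in directives and "none" not in directives


def may_follow(html: str) -> bool:
    directives = robots_directives(html)
    return "nofollow" not in directives and "none" not in directives


# Hypothetical page produced by a theme applied to archive-browsing pages.
ARCHIVE_THEME_PAGE = """<html><head>
<meta name="robots" content="noindex, follow">
</head><body><p>Browse by date</p></body></html>"""

if __name__ == "__main__":
    print(may_index(ARCHIVE_THEME_PAGE), may_follow(ARCHIVE_THEME_PAGE))
```

Using `noindex, follow` rather than `none` keeps the browse pages out of the index while still letting the crawler reach the articles they link to.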

Still learning to work within Liferay, so those are the only ideas I
can give. If you do any experimentation, I'd love to hear results.

Jeff

abc

Jun 1, 2009, 1:54:25 PM
to Google Search Appliance/Google Mini
We have a meeting on the subject this Wednesday. I'll definitely let
you know what our Liferay guys recommend as 'the' solution. All I
know so far is that they are quite reluctant to control crawling at
the page level - they call it 'micromanaging'.

I tried using 'Do Not Crawl URLs' in the GSA admin console and
concluded that it's an even clumsier way to address the problem, since
we have about 100 Liferay-based web sites, each with its own
directory structure. I also suspect the 'Do Not Crawl' list itself is
limited in size.

Thanks, Jeff, for your help. I'll let you know what happens next.

abc

JMarkham

Jun 1, 2009, 5:41:14 PM
to Google Search Appliance/Google Mini
The thing about micromanaging is that it improves the user
experience. For instance, I used the googleon/googleoff tags to turn
off text indexing of menu elements. Before I did this, searching for a
keyword that appeared in a menu returned every page containing that
menu, rather than the relevant pages we wanted the learner to see.
Without the micromanaging, the user experience was very bad.
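A rough simulation of the menu example: the appliance's tags are HTML comments such as `<!--googleoff: index-->` and `<!--googleon: index-->` (other documented options include `anchor`, `snippet` and `all`). The sketch below approximates their effect and is not the appliance's actual parser; the page is hypothetical. It drops the marked-off region and strips tags, so the menu words no longer count toward the page's indexed text.

```python
import re

# A googleoff/googleon pair using the "index" or "all" option. This is a
# simplified simulation: it requires the closing comment to repeat the same
# option and does not handle nested or unbalanced pairs.
GOOGLEOFF_BLOCK = re.compile(
    r"<!--\s*googleoff:\s*(index|all)\s*-->.*?<!--\s*googleon:\s*\1\s*-->",
    re.IGNORECASE | re.DOTALL,
)
TAG = re.compile(r"<[^>]+>")


def indexable_text(html: str) -> str:
    """Approximate the indexed text: drop googleoff regions, then all tags."""
    without_off_regions = GOOGLEOFF_BLOCK.sub(" ", html)
    text = TAG.sub(" ", without_off_regions)
    return " ".join(text.split())


# Hypothetical page: a site-wide menu followed by the page's own content.
PAGE = """<div class="menu"><!--googleoff: index-->
<a href="/beef">Beef</a> <a href="/garden">Garden</a>
<!--googleon: index--></div>
<p>Calving season checklist.</p>"""

if __name__ == "__main__":
    print(indexable_text(PAGE))  # Calving season checklist.
```

Without the markers, "Beef" and "Garden" would be part of the indexed text of every page that carries the menu, which is the problem described in the post.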

Jeff

abc

Jun 2, 2009, 10:30:54 AM
to Google Search Appliance/Google Mini
Agreed - and applied in *my* web sites - but the problem is persuading
my colleagues to do what needs to be done in their web sites. They
expect me to have a silver bullet, a quick and dirty solution so they
don't have to do the work. Their idea of a workaround would be
applying regexps in the 'Do Not Crawl' list, which in my
experience is inadequate at best. I anticipate a war in our meeting
tomorrow, our boss losing his patience regarding the search
functionality: he believes we've already spent too much time working
on it. People don't understand that arranging their web space for
quality crawling/indexing is a long-lasting, sometimes painstaking,
process.

Thanks, Jeff, for lending me a shoulder,
abc

Joe D'Andrea

Jun 2, 2009, 10:37:46 AM
to Google-Search-...@googlegroups.com
My (unsolicited) $0.0002:

On Tue, Jun 2, 2009 at 10:30 AM, abc <vpo...@unl.edu> wrote:

> They expect me to have a silver bullet, a quick and dirty solution so they
> don't have to do the work.

Then they have no right to complain.

> I anticipate a war in our meeting tomorrow, our boss losing his patience
> regarding the search functionality: he believes we've already spent too
> much time working on it.

... and yet, to me, it isn't about the search functionality. It's
about prepping the content. The rest follows naturally out of that.

Let's take the Search Appliance (or Mini) out of the equation and drop
in any other search engine.

Would things be any different?

> People don't understand that arranging their web space for
> quality crawling/indexing is a long-lasting, sometimes painstaking,
> process.

Hear hear. :(

- Joe

abc

Jun 2, 2009, 11:22:21 AM
to Google Search Appliance/Google Mini
Joe,

You have a really subtle ear, a quality not found in many people
these days. You've aptly noticed:

> ... and yet, to me, it isn't about the search functionality. It's
> about prepping the content. The rest follows naturally out of that.

I didn't mention everything there is to it: I've built a wrapper
integrating GSA and AJAX/REST search, as can be seen on some of my
web sites (beef.unl.edu, cropwatch.unl.edu, marketjournal.unl.edu,
etc.). The integration took me quite a while, so my boss's
impatience is understandable.

Another layer of integration is the excellent job a colleague of
mine did incorporating the search interface into the Liferay-based
web sites: users can add or remove search simply by checking a radio
button in a radio button group. Beautiful, it can't be simpler than
that! You can check out a Liferay web site at byf.unl.edu, where you
can also see what a mess poor crawling control makes (search for
'garden', for instance, and you'll see a bunch of documents that
should by no means be returned).

Maybe my understanding of the 'functionality' notion is not quite
accurate. And no, I'm not splitting hairs; on the contrary, I'm
learning.

Thanks for your $0.0002!

abc

Jun 4, 2009, 4:47:35 PM
to Google Search Appliance/Google Mini
Jeff,

Here's my report from our meeting, as promised:

Actually, there's nothing new: our Liferay guys settled on 'Do Not
Crawl URLs' patterns that are common to all development pages across
all web sites. This means that many archive-browsing pages (browse by
category, browse by date, etc.) will still be indexed. No meta tags
are going to be used, let alone Google tags for fine-tuning.
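To illustrate the gap described in this report, here is a small sketch that uses Python regular expressions as stand-ins for 'Do Not Crawl URLs' entries. The patterns and URLs are hypothetical, and the appliance's own pattern syntax is not reproduced. With only a development-page pattern, the browse-by-category and browse-by-date pages stay crawlable; an additional archive pattern is needed to exclude them.

```python
from __future__ import annotations

import re

# Hypothetical exclusion rules written as Python regexes. They only stand in
# for 'Do Not Crawl URLs' entries; the appliance's pattern syntax differs.
DEV_PAGE_PATTERNS = [re.compile(r"/dev[-/]")]
ARCHIVE_PATTERNS = [re.compile(r"/browse\?(?:.*&)?(?:category|date)=")]

# Hypothetical URLs on a single Liferay-based site.
SAMPLE_URLS = [
    "http://site.example.edu/web/site/dev-home",
    "http://site.example.edu/web/site/browse?category=garden",
    "http://site.example.edu/web/site/browse?date=2009-06",
    "http://site.example.edu/web/site/garden-tips",
]


def is_excluded(url: str, patterns: list[re.Pattern[str]]) -> bool:
    """True if any exclusion pattern matches anywhere in the URL."""
    return any(p.search(url) for p in patterns)


def crawlable(urls: list[str], patterns: list[re.Pattern[str]]) -> list[str]:
    """URLs that no pattern excludes, in their original order."""
    return [u for u in urls if not is_excluded(u, patterns)]


if __name__ == "__main__":
    print(crawlable(SAMPLE_URLS, DEV_PAGE_PATTERNS))
    print(crawlable(SAMPLE_URLS, DEV_PAGE_PATTERNS + ARCHIVE_PATTERNS))
```

Each extra category of unwanted page needs its own pattern, and with about 100 sites that use different directory structures the list grows quickly, which is the maintenance concern raised earlier in the thread.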

In their opinion, I'm a perfectionist, wasting their precious time
and mine. But then again, if they can live with a cluttered index, so
can I (actually, I have no choice).

Thanks for your help, Jeff.

Cheers!

JMarkham

Jun 4, 2009, 5:32:37 PM
to Google Search Appliance/Google Mini
Ouch, sorry to hear that. Thankfully, our number one requirement is
user experience, so they're loving all the fine-tuning around here,
and are even upgrading our web development standards.

Wish I could rub some of that off on your peeps. :)

Jeff
