Discussions > Sitemap Protocol > How to use OAI-PMH to disclose other websites?
Conal  
From: Conal
Date: Fri, 13 Jul 2007 03:05:41 -0000
Local: Thurs, Jul 12 2007 11:05 pm
Subject: How to use OAI-PMH to disclose other websites?
I am working on a couple of library projects in which metadata,
including hyperlinks, is aggregated from a variety of sources and
republished via a portal. The websites from which the hyperlinks are
drawn are not all crawler-friendly, so most of these resources are not
known to Google at all. Unfortunately, the portal software is not
crawler-friendly either, so I've been tasked with investigating how to
expose the aggregated metadata to Google via OAI-PMH.
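
For readers unfamiliar with what such a feed carries, here is a minimal
sketch of pulling the hyperlinks out of an OAI-PMH ListRecords response.
It assumes the records use the standard oai_dc metadata format and that
each link sits in a dc:identifier element; the post does not say which
format or element the portal uses. The sample record and the example.org
and example.net URLs are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Dublin Core element namespace, as used inside oai_dc records.
DC_IDENTIFIER = "{http://purl.org/dc/elements/1.1/}identifier"

# Hypothetical ListRecords response; the record and URLs are made up.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2007-07-13T03:05:41Z</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc">http://portal.example.org/oai</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:portal.example.org:1</identifier>
        <datestamp>2007-07-12</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample record</dc:title>
          <dc:identifier>http://archive.example.net/item/1</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def record_urls(xml_text):
    """Return every dc:identifier value that looks like an HTTP(S) URL."""
    root = ET.fromstring(xml_text)
    urls = []
    for element in root.iter(DC_IDENTIFIER):
        text = (element.text or "").strip()
        if text.startswith(("http://", "https://")):
            urls.append(text)
    return urls
```

Note that the header identifier (oai:portal.example.org:1) is in the
OAI-PMH namespace, not the Dublin Core one, so it is not picked up.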

I have an OAI-PMH server which I have registered with Google Sitemaps.
In particular, I need to disclose URLs whose domain names differ from
that of the OAI-PMH server itself, because these links have been
aggregated from a variety of sources. The issue is that Google
Sitemaps rejects any URL whose domain differs from the OAI-PMH
server's domain.
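
To see which links are affected by this restriction, a small sketch like
the following could split a list of URLs into those on the server's own
host and the "foreign" ones. It assumes the check is an exact,
case-insensitive hostname comparison; the precise rule Google Sitemaps
applies is not stated here. The function name and sample URLs are
hypothetical.

```python
from urllib.parse import urlsplit


def split_by_host(server_url, urls):
    """Partition urls into (same_host, foreign) relative to server_url.

    Assumption: hosts are compared exactly, ignoring case only.
    """
    server_host = (urlsplit(server_url).hostname or "").lower()
    same_host, foreign = [], []
    for url in urls:
        host = (urlsplit(url).hostname or "").lower()
        if host == server_host:
            same_host.append(url)
        else:
            foreign.append(url)
    return same_host, foreign


same, foreign = split_by_host(
    "http://portal.example.org/oai",
    [
        "http://portal.example.org/r/1",
        "http://archive.example.net/item/1",
    ],
)
```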

Why is this? This restriction does not apply to URLs harvested by
spidering the web itself, so why should sitemaps be any different?

Anyway ... since I'd been expecting this behaviour, I also tested a
work-around: I used my OAI-PMH server to disclose a URL with the same
domain as the OAI-PMH server, but which used an HTTP redirection to
point to a location on a different ("foreign") domain. This URL was
accepted by Google Sitemaps, and I'm now waiting to see whether the
"foreign" content itself is eventually accepted by Googlebot and
indexed.
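
A minimal sketch of this kind of redirecting URL, using Python's
standard http.server module, might look like the following. It is an
illustration under stated assumptions, not the setup actually used: the
REDIRECTS table, the /r/1 path, the target URL, and the choice of a 302
status (rather than 301) are all hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mapping from local paths (same domain as the OAI-PMH
# server) to the "foreign" URLs they stand in for.
REDIRECTS = {
    "/r/1": "http://archive.example.net/item/1",
}


def resolve(path):
    """Return (status, location) for a request path; location is None on a miss."""
    target = REDIRECTS.get(path)
    if target is None:
        return 404, None
    # Assumption: a temporary redirect; the status code is a design choice.
    return 302, target


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, location = resolve(self.path)
        if location is None:
            self.send_error(status)
            return
        self.send_response(status)
        self.send_header("Location", location)
        self.end_headers()

    # Answer HEAD requests the same way; no body is sent in either case.
    do_HEAD = do_GET


def serve(port=8000):
    """Run the redirect server until interrupted."""
    HTTPServer(("127.0.0.1", port), RedirectHandler).serve_forever()
```

Calling serve() and requesting /r/1 would return a Location header
pointing at the foreign URL, while the URL disclosed through OAI-PMH
stays on the local domain.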

If this simple work-around succeeds, I wonder what the point is of
rejecting "foreign" links from OAI-PMH in the first place?

On the other hand, if the work-around fails, and the redirected URL is
rejected by Googlebot (because of using a redirect?), then how is it
possible to use Google Sitemaps to disclose "foreign" links to Google?
Would I have to re-publish all my hyperlinks in the form of HTML? That
would seem silly to me, since all the technical efficiencies of
OAI-PMH would then be lost.

Regards

Con

