OAI-PMH.xsd isn't reachable for schema validation

Christian Bethge

Apr 2, 2025, 9:51:52 AM
to OAI-PMH
Validation of an OAI response isn't possible with XML tools like Oxygen.

I checked the URL with curl

curl --header "User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:136.0) Gecko/20100101 Firefox/136.0" "https://www.openarchives.org/OAI/2.0/OAI-PMH.xsd" -> HTTP: 200

The XSD is only reachable with a "normal" browser User-Agent, but XML validation tools normally send other User-Agents. Is the XSD not meant for programmatic and automated use? Why is the file only reachable for real humans with a browser?

Best regards,
Christian Bethge

Simeon Warner

Apr 3, 2025, 8:55:27 AM
to OAI-PMH
Hi Christian,

Requests are not denied based on curl or other user agents:

> curl -s -o /dev/null -w "%{http_code}\\n" https://www.openarchives.org/OAI/2.0/OAI-PMH.xsd
200
> curl -s -o /dev/null -w "%{http_code}\\n" --header "UserAgent: BadBot" https://www.openarchives.org/OAI/2.0/OAI-PMH.xsd
200
> curl -s -o /dev/null -w "%{http_code}\\n" --header "UserAgent:" https://www.openarchives.org/OAI/2.0/OAI-PMH.xsd
200

I also can't find any 403 for the XML schema on April 1 and 2 in the logs. If you'd like me to look to see whether I can find a request from you, please add a query string that would help me pull it out of the log, e.g.


and let me know what time you tried.

Cheers,
Simeon

Simeon Warner

Apr 3, 2025, 9:01:48 AM
to OAI-PMH
Darn... I missed out the hyphen in the attempts to set the user agent string in the previous message. Same results with a correctly set user agent.

Cheers,
Simeon

John Salter

Apr 3, 2025, 9:40:13 AM
to oai...@googlegroups.com

Hi Simeon,
I can replicate the 403 responses from where I am.

There should be one with ‘JLRS’ as the User-Agent (and one with the UserAgent) too.

If I make a normal ‘Chrome’ user-agent, and append ‘JLRS/wibble’, I get in.

If I prepend that value to the user-agent instead, I get a 403.
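
The comparison described above can be sketched as a small shell helper. This is a minimal sketch, not the commands actually run for this thread: the function names `probe_ua` and `compare_uas` are hypothetical, and the browser-like string is an illustrative assumption. Only the `JLRS` and `JLRS/wibble` tokens and the schema URL come from the messages here. It uses curl's `-A` option to set the User-Agent, which sidesteps any typo in the header name.

```shell
#!/bin/sh
# Sketch: print the HTTP status code returned for a given User-Agent string.
# probe_ua and compare_uas are hypothetical helper names.
URL="https://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"

probe_ua() {
    # -s silences progress output, -o /dev/null discards the body,
    # -w prints only the status code, -A sets the User-Agent header.
    curl -s -o /dev/null -w '%{http_code}' -A "$1" "$URL"
}

# Illustrative browser-like string (an assumption, not a value from the thread).
BROWSER_UA="Mozilla/5.0 (X11; Linux x86_64) Chrome/134.0.0.0 Safari/537.36"

compare_uas() {
    # Three variants: token alone, token appended, token prepended.
    for ua in "JLRS" "$BROWSER_UA JLRS/wibble" "JLRS/wibble $BROWSER_UA"; do
        printf '%s\t%s\n' "$(probe_ua "$ua")" "$ua"
    done
}
```

Calling `compare_uas` prints one status code and User-Agent pair per line. The codes returned will depend on the network you test from and on whatever filtering is in force at the time.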

 

On a successful request, these response headers are present:

    x-amzn-waf-action: challenge

    Access-Control-Allow-Headers: x-amzn-waf-action
which I guess might be the clue needed.
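
One way to look for that header from the command line is to dump only the response headers. This is a minimal sketch under assumptions: `waf_headers` is a hypothetical helper name, and whether the `x-amzn-waf-action` header appears at all depends on how the site is configured when you try it.

```shell
#!/bin/sh
# Sketch: show the status line and any x-amzn-waf-action header for a request.
# waf_headers is a hypothetical helper name.
URL="https://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"

waf_headers() {
    # -D - writes the response headers to stdout; the body goes to /dev/null.
    # tr strips the carriage return that ends each HTTP header line.
    # grep keeps the status line and the WAF action header (case-insensitive).
    curl -s -D - -o /dev/null -A "$1" "$URL" \
        | tr -d '\r' \
        | grep -i -E '^(HTTP/|x-amzn-waf-action:)'
}
```

For example, `waf_headers "JLRS"` prints the status line, followed by the WAF action header if the server sent one.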

 

Cheers,

John

 


Simeon Warner

Apr 3, 2025, 10:41:25 AM
to OAI-PMH
Thanks John. Maybe I was testing from too close (my office); I'll have to explore from home. Perhaps WAF settings are affecting this site as we try to protect library services from the rampant AI harvesting bots currently attacking us.

Cheers,
Simeon

Simeon Warner

Apr 3, 2025, 9:25:00 PM
to oai...@googlegroups.com
Apologies and thanks, Christian, John and Angus.

I can indeed replicate from home and will follow up with our systems folks.

Cheers,
Simeon




Simeon Warner

Apr 9, 2025, 7:02:25 PM
to OAI-PMH
Hi all,

This should be fixed and works for me from outside the Cornell network:

simeon@Cyser ~> curl -s -o /dev/null -w "%{http_code}\\n" https://www.openarchives.org/OAI/2.0/OAI-PMH.xsd
200

Cheers,
Simeon