OAI-PMH and Javascript challenge


paclo...@gmail.com

Jul 2, 2026, 3:22:03 PM
to AtoM Users
Hi all -

I'm currently running AtoM v2.9.2. I'm working on another project that will periodically pull information from AtoM via OAI-PMH. I can navigate to and access the records I need in the browser; the issue appears when I try to automate the process. The script I wrote to process the OAI-PMH responses constantly hits the JavaScript challenge page. I tried a number of workarounds without success, including adding `/oai` and `/;oai` to the `endpoint_exceptions` in the `appChallenge.yml` file. I don't know if it matters, but I specifically need the EAD versions of the records; I was able to get the DC versions. Has anyone implemented this? Am I missing something obvious? Thanks for any help!

Johan Pieterse

4:20 AM
to AtoM Users
Short version: adding `/oai` and `/;oai` to `endpoint_exceptions` had no effect because of a matching quirk in the challenge filter. It matches your exception paths against the full `REQUEST_URI`, including the query string, and the generated regex only allows the base path to be followed by a slash-subpath or by nothing, not by a `?query`. OAI is requested as `/;oai?verb=...`, i.e. the base path immediately followed by `?`, so the exception never matches and the challenge always fires.

Why (from `lib/challenge/filter.php`)

Each exception is turned into:

    #^<escaped-path>(/.*)?$#

and tested against `$_SERVER['REQUEST_URI']`. So:
- `/api` works because API calls look like `/api/informationobjects?...` - there is a `/subpath` after `/api`, and the `(/.*)` greedily swallows the rest including the query string.
- `/oai` (or `/;oai`) does **not** work because a real OAI request is `/;oai?verb=ListRecords&metadataPrefix=oai_ead` - straight to `?query` with no subpath. `(/.*)?$` cannot match a `?...` that directly follows the base, so it falls through to the challenge.

That is why your exceptions were ignored.
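The mismatch can be reproduced outside AtoM. The following is a minimal sketch in POSIX shell, not the filter's actual code: `matches_exception` is a hypothetical helper that applies the same `^<path>(/.*)?$` shape with `grep -E`, and the `?limit=10` query string is only an illustrative value.

```shell
#!/bin/sh
# Mimic the exception pattern described above: ^<path>(/.*)?$
# Assumption: the path contains no regex metacharacters, so it is not
# escaped here (the real filter escapes it).
matches_exception() {
  # $1 = exception path, $2 = request URI as the filter would see it
  if printf '%s\n' "$2" | grep -Eq "^$1(/.*)?\$"; then
    echo match
  else
    echo no-match
  fi
}

# A subpath follows /api, so the optional group absorbs the rest.
matches_exception '/api'  '/api/informationobjects?limit=10'
# The query string follows /;oai directly, so nothing can absorb it.
matches_exception '/;oai' '/;oai?verb=ListRecords&metadataPrefix=oai_ead'
# The bare path on its own does match.
matches_exception '/;oai' '/;oai'
```

Run as written, this should print `match`, `no-match`, `match`, which lines up with `/api` working while `/;oai` is ignored.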

Fixes (pick one)
1. Bypass by client IP - no code change (recommended).
The filter also bypasses the challenge for exempted client addresses via `QubitUserChallenge::shouldBypassChallenge()`, a check that is independent of the URL. Add your harvester's address/subnet to `appChallenge.yml`:

    cidr_exceptions:
      - '203.0.113.42/32'      # your harvesting server

Or, to scope by both network and client, use `network_user_agent_exceptions` (a `src_net` + `user_agent` regex pair). Then run `php symfony cc` and restart php-fpm. This is the cleanest fix for automated, server-to-server harvesting.

2. Make `endpoint_exceptions` actually work for OAI - one-line code change.
Match on the path only, not the full URI. In `lib/challenge/filter.php`, before the exception loop, replace:

    $requestUri = $_SERVER['REQUEST_URI'] ?? '/';

with:

    $requestUri = parse_url($_SERVER['REQUEST_URI'] ?? '/', PHP_URL_PATH);

Then your `/;oai` exception will match. Confirm the exact base first (see below) and add that prefix.

3. Confirm the exact OAI base path.
Check what your script actually requests - is it `/;oai`, `/oai`, or `/index.php/;oai`? The leading segment must match the start of `REQUEST_URI` (the `;` matters, and a non-clean-URL setup will include `/index.php`). Add the exact prefix you see.
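To see why path-only matching and the exact prefix both matter, here is a rough shell analogue of fix 2. It is a sketch under assumptions, not AtoM code: `matches_path_only` is a hypothetical helper that drops the query string with `${2%%\?*}` (standing in for `parse_url(..., PHP_URL_PATH)`) before applying the same `^<path>(/.*)?$` pattern.

```shell
#!/bin/sh
# Path-only matching: strip everything from the first "?" and then test
# the exception pattern against what is left.
matches_path_only() {
  # $1 = exception path, $2 = request URI
  # Assumption: $1 is not regex-escaped here; the "." in /index.php acts
  # as a wildcard, which still matches the literal dot.
  path=${2%%\?*}
  if printf '%s\n' "$path" | grep -Eq "^$1(/.*)?\$"; then
    echo match
  else
    echo no-match
  fi
}

# With the query string removed, the /;oai exception matches.
matches_path_only '/;oai' '/;oai?verb=ListRecords&metadataPrefix=oai_ead'
# A non-clean-URL request starts with /index.php, so /;oai alone misses.
matches_path_only '/;oai' '/index.php/;oai?verb=Identify'
# Using the exact prefix seen in the request fixes that.
matches_path_only '/index.php/;oai' '/index.php/;oai?verb=Identify'
```

This should print `match`, `no-match`, `match`: the path-only change makes the exception effective, but only for the prefix your requests really start with.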

On "DC works but EAD doesn't"

If both formats go through the same `/;oai` endpoint, the challenge alone can't let `oai_dc` through while blocking `oai_ead` - so something else differs for EAD. Before assuming it's the challenge, grab the raw response for the failing EAD request:

    curl -i 'https://<host>/;oai?verb=ListRecords&metadataPrefix=oai_ead'

- If you get the challenge HTML: it is the exception issue above (apply fix 1 or 2), and the DC "success" was probably a cached challenge/visited cookie from earlier testing.
- If you get an OAI error such as `cannotDisseminateFormat`, or a 200 with empty EAD: it is a metadata-format issue, not the challenge - check that `oai_ead` is offered by `verb=ListMetadataFormats` and that the records you want actually disseminate EAD.
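If you want the harvesting script to tell these cases apart itself, something like the following may help. It is a sketch that assumes the challenge page is plain HTML and so lacks the `<OAI-PMH` root element that every OAI-PMH response carries; `classify_oai_response` is a hypothetical helper, the sample HTML body is made up, and it does not detect the "200 with empty EAD" case.

```shell
#!/bin/sh
# Classify a response body read from stdin.
#   not-oai   : no <OAI-PMH root element (likely the challenge page)
#   oai-error : OAI-PMH response containing an <error code="..."> element
#   oai-ok    : OAI-PMH response with no error element
classify_oai_response() {
  body=$(cat)
  case $body in
    *'<OAI-PMH'*)
      case $body in
        *'<error code="'*) echo oai-error ;;
        *) echo oai-ok ;;
      esac ;;
    *) echo not-oai ;;
  esac
}

# Against a live server (replace <host> with your own):
#   curl -s 'https://<host>/;oai?verb=ListRecords&metadataPrefix=oai_ead' | classify_oai_response

# Offline examples with made-up bodies:
printf '%s' '<html><body>Checking your browser</body></html>' | classify_oai_response
printf '%s' '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><error code="cannotDisseminateFormat"/></OAI-PMH>' | classify_oai_response
```

The two offline examples should print `not-oai` and `oai-error`. Running the same check on the DC and the EAD URL shows whether the two requests really behave differently.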

Paste the two exact URLs (DC vs EAD) and the raw `curl -i` responses and it will be obvious which of the two it is.

Johan Pieterse