Dealing with crawlers in EPrints: Cloudflare etc.


Dominic Allington-Smith

Feb 26, 2026, 10:38:24 AM
to EPrints UK User Group
Hi everyone,

Thank you for accepting my request to join the group!

I'm the repository manager (in terms of its content) at University College London. For the past few months, we have been experiencing intermittent but sometimes severe performance issues with our EPrints repository because it is being scraped by crawlers.

Our IT team are planning to implement Cloudflare soon to filter out the crawler activity, but the service has proved tricky to set up as it appears to conflict with EPrints' own authentication system: administrators have to log in frequently and occasionally get stuck in infinite login loops. Indeed, an initial attempt to implement Cloudflare late last year had to be rolled back as the authentication problems were too severe.

I am therefore wondering if:
  • Any other institutions have experienced this crawler issue?
  • Any other institutions have implemented Cloudflare?
    • If so, did you encounter authentication problems, and how were these minimised/resolved?
  • There are any other technical solutions that have been adopted, aside from Cloudflare?
Best wishes,

Dominic

John Salter

Feb 27, 2026, 8:59:50 AM
to Dominic Allington-Smith, EPrints UK User Group
Hi Dominic,
The crawler issue is impacting a lot of library-related systems globally - you are definitely not alone!

This is a really good summary of what the sector is facing, in case you haven't seen it: https://dealing-with-bots.coar-repositories.org/

The most difficult issue with the traffic is that it's hard to tell it apart from human visitors. There's also a desire to keep our Open Access content open to all (human and machine). I don't know whether the free Cloudflare Turnstile offering allows any tailoring of its rules.
I'd be interested in the authentication problems you faced when you tried deploying this the first time.
We (White Rose) are currently pondering our next steps in this saga.

I think one EPrints site is looking at Anubis (https://anubis.techaro.lol/), and my colleagues are looking at Microsoft's Azure Front Door to protect other Library services, although this work is in its early days.

We haven't implemented Cloudflare, but have made some changes to try to limit the impact of this traffic. One of the early traits we saw was the re-running of old search requests from a spread of IP addresses. The search interface is 'heavy' in terms of resource usage, and these replayed searches would lock our server up.
An option was added to EPrints to prevent automatic replayed searches being run (https://github.com/eprints/eprints3.4/issues/479).
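
For anyone wanting to check their own logs for the same pattern, here is a minimal Python sketch, not something taken from EPrints itself. It assumes an Apache/Nginx "combined" access log and that search requests start with /cgi/search; the function name, the path prefix and the min_ips threshold are all illustrative assumptions to adjust for your own setup.

```python
import re
from collections import defaultdict

# Assumption: Apache/Nginx "combined" log format. Adjust the pattern to your own logs.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST) (?P<url>\S+) [^"]*"'
)


def replayed_searches(lines, search_prefix="/cgi/search", min_ips=3):
    """Return {url: distinct client IP count} for search URLs that were
    requested from at least `min_ips` different addresses.

    `search_prefix` and `min_ips` are hypothetical defaults, not EPrints settings.
    """
    ips_by_url = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("url").startswith(search_prefix):
            ips_by_url[match.group("url")].add(match.group("ip"))
    return {
        url: len(ips) for url, ips in ips_by_url.items() if len(ips) >= min_ips
    }
```

Passing it an open log file, e.g. replayed_searches(open("access.log")) with a hypothetical file name, returns the identical search URLs that arrived from many different addresses, which is the trait described above. A real human search is rarely repeated verbatim from several unrelated IPs, so a high count is a hint rather than proof.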

We are serving approx. 1 TB of content per day from our two repositories. This is just silly. We want our content to be used - but in a friendly way!

There is also a useful community on the '#bots' Code4Lib Slack channel, and there has been a series of online events looking at solutions: https://wiki.lyrasis.org/display/cmtygp/Solutions+Showcase+Series

Hope that helps a bit.
We could convene an EPrints-centric 'bots' meeting if others would like to share what they have tried?

Cheers,
John

John Salter

https://orcid.org/0000-0002-8611-8266

 

White Rose Libraries Technical Officer
Library and Research Management team, IT
University of Leeds




Alan Exelby (LLE - Staff)

Feb 27, 2026, 9:19:37 AM
to John Salter, Dominic Allington-Smith, EPrints UK User Group, Alan Exelby (LLE - Staff), Grant Young (LLE - Staff), Edmund Chamberlain (LLE - Staff)

Purely FYI: there has also been an e-mail going round from Kent, Essex and Sussex about this; I have only just been copied in. I am not sure if I should give names, so am keeping it vague. I have replied to it, mentioning this user group and saying that we are hosted and not aware of any problem affecting us. I suspect that, if this does touch us, Southampton as host are dealing with it on our behalf. Though, knowing my luck, now that I’ve said that, in five minutes our performance will crash…!

 

Best wishes,

 

Alan

 

==============================
Mr A.V. Exelby,
Digital Library Manager (Systems)
The Library,
University of East Anglia,
Norwich Research Park,
Norwich, NR4 7TJ

Tel.: 01603 592432  (mobile 07736 093516, but only in office hours and landline always preferable)
E-mail: a.ex...@uea.ac.uk

================================
"Man, who'd have thought being a librarian could be so tough"

Seamus Harper, in 'Harper 2.0', "Andromeda".

 
