High traffic / DDoS / fail2ban

Florian Wille

Jan 19, 2023, 5:50:08 AM
to DSpace Technical Support
Hey There,

My DSpace (6.3) site usually gets around 10k requests/h, which it handles
quite well. But sometimes several
bots/crawlers/spiders/indexers/harvesters/whatevers each throw up to
15k requests/h at me at the same time, on top of my standard 10k/h
traffic. My DSpace cannot handle this and becomes
unresponsive, making the site seem offline to my users.
I performance-tuned my Apache and Postgres to handle more
requests/connections and gave the system plenty of RAM/CPU, but DSpace still
gives up. I think it's the Hibernate layer breaking down.

I was thinking of using fail2ban to put a lid on excessive requests.
Does anyone have experience with that, or are there best-practice guides for
fail2ban with DSpace? I don't want to block/drop legit harvesters/indexers...

Also I came across mod_apache_rate_limit. Would that do any good for my
case?

Are there other guides/ideas on how to handle this amount of traffic?

THX and Regards
Florian

Mark H. Wood

Jan 19, 2023, 8:27:24 AM
to dspac...@googlegroups.com
On Thu, Jan 19, 2023 at 11:50:03AM +0100, Florian Wille wrote:
> My DSpace (6.3) site usually gets around 10k requests/h, which it handles
> quite well. But sometimes several
> bots/crawlers/spiders/indexers/harvesters/whatevers each throw up to
> 15k requests/h at me at the same time, on top of my standard 10k/h
> traffic. My DSpace cannot handle this and becomes
> unresponsive, making the site seem offline to my users.
> I performance-tuned my Apache and Postgres to handle more
> requests/connections and gave the system plenty of RAM/CPU, but DSpace still
> gives up. I think it's the Hibernate layer breaking down.
>
> I was thinking of using fail2ban to put a lid on excessive requests.
> Does anyone have experience with that, or are there best-practice guides for
> fail2ban with DSpace? I don't want to block/drop legit harvesters/indexers...
>
> Also I came across mod_apache_rate_limit. Would that do any good for my
> case?

Well, do you want to ban the spiders, or just slow them to a
reasonable rate? If it were my site, unless I could identify some
genuinely abusive clients, I'd go with rate limiting. There might be
a case for banning some clients and slowing others.

I'd probably choose something made for rate limiting, if I went that
route, rather than pressing fail2ban into this sort of service. I do
see that a number of others have used fail2ban in this way.
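
To make "slowing them to a reasonable rate" a bit more concrete, here is a
minimal Python sketch of a per-client token bucket, which is the kind of logic
a rate limiter typically applies. It only illustrates the idea and is not how
any particular Apache module works; the class names and the rate/burst numbers
are made up for the example.

```python
import time


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, now=None):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.tokens = float(capacity)  # start with a full bucket
        self.updated = time.monotonic() if now is None else now

    def allow(self, now=None):
        """Return True if one more request may pass at time `now`."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Refill for the time that has passed, but never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerClientLimiter:
    """Keep one bucket per client key (for example, the client IP address)."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.buckets = {}

    def allow(self, client, now=None):
        bucket = self.buckets.get(client)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, now)
            self.buckets[client] = bucket
        return bucket.allow(now)
```

With rate=1.0 and capacity=3, a client can burst three requests and is then
held to roughly one per second, while other clients are unaffected. A request
that is refused is delayed or rejected, not banned, which is the difference
from the fail2ban approach.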

But I haven't yet made the time to explore these options in depth.
What we do here is to keep an eye on response time with 'monit'. If
monit thinks DSpace is sick or has died, it kills and restarts
Tomcat. That is kind of drastic but it does shed an excessive load.
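
For anyone without monit, the decision it makes for us can be sketched in a few
lines of Python. This is not monit configuration and not our actual setup; the
function names, the 10 second timeout, and the "N bad samples in a row" rule
are assumptions chosen for illustration.

```python
import time
import urllib.request


def probe(url, timeout=10.0):
    """Return the response time in seconds, or None if the request failed."""
    started = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read(1)  # make sure the server actually sends something
    except Exception:
        return None
    return time.monotonic() - started


def should_restart(samples, limit_seconds, failures_needed):
    """True if the last `failures_needed` samples all failed or were too slow.

    `samples` is a list of probe() results, oldest first.
    `failures_needed` is assumed to be at least 1.
    """
    recent = samples[-failures_needed:]
    if len(recent) < failures_needed:
        return False  # not enough history yet
    return all(s is None or s > limit_seconds for s in recent)
```

Requiring several bad samples in a row avoids restarting Tomcat over a single
slow request. The restart itself is left out on purpose, since how Tomcat is
stopped and started depends on the installation.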

--
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu

Edmund Balnaves

Jan 19, 2023, 2:11:00 PM
to DSpace Technical Support

A robots.txt file can help with many spiders, along with a link to the DSpace sitemap:

Sitemap: /jspui/sitemap

The robots.txt file can include
Crawl-delay: 10

and it is useful to disallow the search and browse links, e.g.

Disallow: /jspui/simple-search

Many robots get lost circling around the DSpace search results.
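
One way to check that such a robots.txt says what you intend is to feed it to
Python's standard urllib.robotparser. The sketch below puts the directives
above into one file. The host name repository.example.org and the
/jspui/browse rule are assumptions for the example, and the sitemap is written
as a full URL because that is what the sitemap protocol expects.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt built from the directives mentioned above.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /jspui/simple-search
Disallow: /jspui/browse

Sitemap: https://repository.example.org/jspui/sitemap
"""


def load_rules(text):
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rules = load_rules(ROBOTS_TXT)
    base = "https://repository.example.org"
    # Search result pages should be off limits, item pages should not.
    print(rules.can_fetch("ExampleBot", base + "/jspui/simple-search?query=x"))
    print(rules.can_fetch("ExampleBot", base + "/jspui/handle/123456789/1"))
    print(rules.crawl_delay("ExampleBot"))
    print(rules.site_maps())  # site_maps() needs Python 3.8 or newer
```

This only shows how a well-behaved parser reads the file. Crawlers that ignore
robots.txt are not affected by it.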

We use fail2ban mainly to detect malicious activity such as high-rate hits on login endpoints. However, it can also be used to detect inordinate activity. The Crawl-delay is honoured by many of the robots.
We tend to be aggressive in using fail2ban to block access to invalid and maliciously crafted URLs.
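
Before banning anything, it helps to see which clients are actually producing
the load. The Python sketch below counts requests per client per hour from an
access log. It is not a fail2ban filter, just a way to find candidates. It
assumes Apache's common/combined log format with the client address in the
first field (behind a proxy that would be the proxy's address), and the
function names and threshold are made up for the example.

```python
import re
from collections import Counter

# Matches the start of a common/combined log line and captures the
# client address, the date, and the hour of the request.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[(\d{2}/\w{3}/\d{4}):(\d{2}):')


def requests_per_client_hour(lines):
    """Count requests for each (client, date, hour) combination."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:  # silently skip lines in another format
            counts[match.groups()] += 1
    return counts


def heavy_clients(lines, threshold):
    """Return the sorted clients that exceeded `threshold` requests in any hour."""
    counts = requests_per_client_hour(lines)
    return sorted({ip for (ip, _day, _hour), n in counts.items() if n > threshold})
```

Called as heavy_clients(open("access.log"), threshold=15000), with a log path
of your own, it lists the addresses that went over 15k requests in some hour.
Checking those against known harvesters first avoids blocking the legitimate
ones.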



Edmund Balnaves
Prosentient Systems