High requests count per item


Andrew K

Feb 4, 2026, 12:11:29 PM
to DSpace Developers
Hi,

I analyze my DSpace server logs regularly, both the DSpace and Apache logs.
For instance, yesterday there were 5.7M events in the Apache log (5M of them from bots, though).
I could not help noticing that all the top requests are service requests:

145837 /server/api
144840 /server/api/authn/status
144102 /server/api/authz/authorizations/search/object?uri=...&feature=isCommunityAdmin&embed=feature
144099 /server/api/authz/authorizations/search/object?uri= ...  &feature=administratorOf&embed=feature
144088 /server/api/authz/authorizations/search/object?uri= ...  &feature=isCollectionAdmin&embed=feature
144031 /server/api/authz/authorizations/search/object?uri= ...  &feature=canSubmit&embed=feature
143820 /server/api/authz/authorizations/search/object?uri= ...  &feature=coarNotifyEnabled&embed=feature
143704 /server/api/authz/authorizations/search/object?uri= ...  &feature=canEditItem&embed=feature
143670 /server/api/system/systemwidealerts/search/active
143640 /server/api/authz/authorizations/search/object?uri= ...  &feature=canSendFeedback&embed=feature
143638 /server/api/authz/authorizations/search/object?uri= ...  &feature=canSeeQA&embed=feature
143512 /server/api/core/sites
143495 /server/api/discover/browses?size=9999
143486 /server/api/config/properties/contentreport.enable
143486 /server/api/authz/authorizations/search/object?uri= ...  &feature=epersonForgotPassword&embed=feature
143482 /server/api/authz/authorizations/search/object?uri= ...  &feature=epersonRegistration&embed=feature
143311 /server/api/authz/authorizations/search/object?uri= ...  &feature=canManageGroups&embed=feature
137268 /server/api/config/properties/submit.type-bind.field
117537 /server/api/discover
117057 /server/api/discover/search
116703 /server/api/config/properties/websvc.opensearch.enable
116564 /server/api/config/properties/websvc.opensearch.svccontext
...and so on.
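(For what it's worth, a top-paths report like the one above can be produced with a short pipeline. This is just a sketch assuming the standard Apache combined log format, where the request path is the 7th whitespace-separated field; adjust the log path to your setup.)

```shell
# top_paths: print the N most-requested paths from an Apache combined-format log.
# In the combined format the request line is quoted ("GET /path HTTP/1.1"),
# so the path itself lands in whitespace field $7.
top_paths() {
  awk '{print $7}' "$1" | sort | uniq -c | sort -rn | head -n "${2:-25}"
}

# Example (log location is an assumption; adjust to your server):
# top_paths /var/log/apache2/access.log 25
```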

I just opened an item page and watched 80+ requests from my IP appear in the log!
So I figured out that the real view count was about 100K a day.

At the same time, the server's Core Web Vitals are barely OK. In fact, all pages are fast on desktop and slow on mobile.

So the question is:
Is it absolutely necessary to have 80+ server requests per page?
Does it make sense to ask for authorizations more than once for an anonymous user? Pardon my ignorance.

Thanks,
Andrew

DSpace Developers

Feb 6, 2026, 4:10:40 PM
to DSpace Developers
Hi Andrew,

DSpace does send several requests per page to the REST API to gather all the data it needs to fill out the page.  The number of requests, however, differs based on the page, how many objects you're displaying, and which features you have enabled.

However, each request is cached.  So, on your first visit to the site, you may have a larger number of requests to gather information...and then that information is cached (in your browser) and other pages no longer need to make the same requests.

I'm not familiar with any page which involves 80 requests at once.  Most pages make <10 requests to draw the page (and often <5).  Some administrative or submitter tools may use more, but public pages tend to have fewer requests. You can see that behavior on our Sandbox site at https://sandbox.dspace.org/ , where you can "watch" the requests per page by opening up your Developer Tools in your browser and looking at the "Network" tab.  If you've found a single page that is generating 80+ requests, then that sounds like a bug that should be reported to https://github.com/DSpace/dspace-angular/issues  (We'd need details though on how to reproduce the behavior, etc)

That all said, SSR (server side rendering) can be a performance bottleneck at times when a large number of bots are hitting the site.  We do have a number of "Performance Tuning" tips in the documentation that can help to cache pages for bots or *limit which pages do SSR* (which can block bots from accessing those pages):  https://wiki.lyrasis.org/display/DSDOC9x/Performance+Tuning+DSpace

If you use some of those tactics, you should be able to lessen the server load from some (better-behaved) bots.

However, I'll admit, the AI bots these days are a lot "smarter" and more aggressive.  These can be difficult to deal with (and it's not just DSpace that has issues with these aggressive bots -- all repository systems do).  There is no "silver bullet" for the aggressive, AI-related bots.  But, we have a "discussion ticket" open where we link to resources, and other developers have shared tips on what works for their DSpace: https://github.com/DSpace/dspace-angular/issues/4565   There's not a single solution in this case though, and it often requires installing tools *around* DSpace to help manage the bad-behaving bots.

Hopefully this gives you some resources to get started.

Tim

Andrew K

Feb 9, 2026, 1:14:25 PM
to DSpace Developers
Hi Tim,

Thanks for your extensive reply and the very useful link to the discussion ticket.

I started second-guessing my findings, so I just ran this little test before and after opening a simple item page (one PDF, one thumbnail and a license).

grep -c 'MY IP' /var/log/apache2/access.log   # before opening the page
23
grep -c 'MY IP' /var/log/apache2/access.log   # after
118

That's 95 requests in total.
Sure, this was my first visit to the page. The next visit to the same page took just a few requests; a visit to another item, about 20 requests. So when a user is browsing DSpace it is pretty much OK.
But when I opened the same link in a different session (with the middle mouse button) it took 95 requests again.
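The before/after measurement above can be wrapped in a small helper (a sketch; the IP and log path are placeholders you pass in yourself):

```shell
# count_requests: count how many new log lines from a given IP appear
# while a page is being loaded in the browser.
# Usage: count_requests <ip> <logfile>
count_requests() {
  ip="$1"; log="$2"
  before=$(grep -c "$ip" "$log")
  printf 'Open the page now, then press Enter... '
  read -r _
  after=$(grep -c "$ip" "$log")
  echo "Requests for this page load: $((after - before))"
}
```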

And it looks like crawlers open (almost) every link in a new session, which generates a lot of requests. This is indirectly confirmed by the statistics I presented above.
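One way to see which crawlers dominate the log is to group requests by user agent (again just a sketch assuming the combined log format, where the user agent is the sixth quote-delimited field):

```shell
# top_agents: tally requests per user-agent string to spot dominant crawlers.
# With -F'"' the user agent is the 6th quote-delimited field of a
# combined-format log line.
top_agents() {
  awk -F'"' '{print $6}' "$1" | sort | uniq -c | sort -rn | head -n "${2:-10}"
}

# top_agents /var/log/apache2/access.log
```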

Best regards,
Andrew



DSpace Developers

Feb 10, 2026, 12:57:16 PM
to DSpace Developers
Hi Andrew,

I see what you are asking about now.  What you are seeing is the "performance bottleneck" that I mentioned around SSR (server side rendering).  Any bot which cannot process Javascript will *only trigger SSR* (essentially it requires each page to be translated into HTML on each request).  That process *does require a number of requests* from the server-side Javascript to the REST API, which is what you are seeing in your logs.

(As a sidenote, since you are grepping your Apache logs you may also be seeing a mix of Angular and REST API requests... when I was tracking requests, I was doing it in my browser and only viewing the direct REST API requests, which excludes things like the images, CSS and Javascript files necessary to run the frontend.)

When you access the page initially, you are also triggering SSR (server side rendering) for the first page you access. But, then your browser switches you quickly to CSR (client side rendering). Bots may *never* switch to CSR because they cannot process the Javascript required to switch to CSR... in which case every request may be SSR and result in a larger number of REST API requests.

As I linked to previously, we have ways to minimize the SSR processing (but it never goes away entirely) documented on our "Performance Tuning DSpace" page.

One option listed there is to "Limit which pages are processed by SSR":  This will ensure only some pages in your site undergo SSR, while others *ignore* SSR requests.  This essentially limits which pages bots can access (but only if the bot doesn't understand how to process Javascript).

Another option listed there is to "Turn on caching of SSR pages":  This essentially caches the generated HTML for specific pages and serves later bot requests with a cached copy of that page.  This can also be achieved via external tools (I believe Apache has caching modules that do a similar thing).  This essentially means the first bot will make a larger number of those REST API requests, but later bots may make *zero requests* (until the cached page times out and needs to be refreshed).
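As one concrete example of caching *around* DSpace, Apache's own mod_cache can store generated pages on disk. This is only a sketch with placeholder paths and lifetimes; you would want to verify that authenticated or session-specific responses are never cached before using it:

```apache
# Requires mod_cache and mod_cache_disk to be enabled.
# Run the full request through Apache's handlers so rewrites still apply.
CacheQuickHandler off
CacheRoot "/var/cache/apache2/mod_cache_disk"
CacheEnable disk "/"
CacheDirLevels 2
CacheDirLength 1
# Serve a cached copy for up to an hour when no explicit expiry is set.
CacheDefaultExpire 3600
CacheIgnoreNoLastMod On
```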

In summary, the behavior you are seeing is a side effect of SSR & the fact that bots trigger SSR *heavily*.   The tips above can help lessen the number of times bots will trigger SSR, but it won't go away entirely.  (And, as I noted previously, some AI bots are getting much smarter and may find ways to bypass those settings on that Performance Tuning DSpace page, especially by performing client side rendering themselves.)

I hope that helps better explain what you are seeing.

Tim
