DSpace 7 on production - poor performance


Karol

Jun 26, 2023, 3:57:38 PM6/26/23
to DSpace Technical Support
Hi,

I have deployed DSpace 7 in production on a server with 4 vCPUs and 16 GB RAM. I start Angular with pm2 -i max so it uses all CPUs, but the performance of the whole site is very bad. My Apache logs are growing fast: access.log and dspace.log. Probably bots are indexing the new content, and this is killing my site; real users can't submit work or use the repository. Unfortunately, I can't tell for sure what or who is overloading the system, because the Apache logs show my own server's address (probably because of the proxy in front of Angular).

* "top" shows 130% CPU and 20% RAM for node /dspace-angular-7.5/dist/server/main.js - this is where I'm looking for the performance problem.

* The Apache access logs take up 400 MB per day - I see continuous logging, but I can't tell from which IP addresses. The DSpace log (dspace.log) is 300 MB per day.

1) How can I increase the performance of Angular (node /dspace-angular-7.5/dist/server/main.js)? I already use pm2 -i max.

2) How can I check from which addresses so many requests are hitting DSpace 7?


Thanks and best regards,

Karol

Edmund Balnaves

Jun 26, 2023, 5:14:22 PM6/26/23
to DSpace Technical Support
There is an architectural issue in the Angular -> API design which means that a very large number of calls are made to the API for each page load.

This also makes the logs very noisy.

I have found 16 GB lean for DSpace 7.5 where the database is on the same server. Tomcat needs about 3 GB, each pm2 instance takes about 1 GB, and Solr and Postgres chew up a lot. The ClamAV daemon can chew up another 2 GB.

You might want to *reduce* the number of pm2 instances, as you may be running low on memory. If your system is starting to swap, this can slow things down terribly.

Adjust robots.txt to block entity paths and browse paths, as robots can get lost in the DSpace search (a historical problem).
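For illustration, a sketch of such robots.txt rules. The exact route names (/browse, /search, /statistics) are assumptions based on typical DSpace 7 UI paths; check the robots.txt shipped with your theme for the real ones, and note that only well-behaved crawlers honour these rules:

```txt
# Keep crawlers out of the expensive, near-infinite browse/search spaces
User-agent: *
Disallow: /browse/
Disallow: /search
Disallow: /statistics
Crawl-delay: 5
```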

The following can help show where traffic is coming from:

cat access.log | grep -v " 403 " | grep -v " 301 " | grep -v " 408 " | cut -d " " -f 1 | sort | uniq -c | sort -n

Unfortunately you will find that a lot of the traffic is to the API server, but you can identify bots this way.

fail2ban is a useful tool to block misbehaving bots.
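As a starting point, a minimal sketch of a fail2ban jail using its bundled apache-badbots filter. The log path is an assumption (Debian/Ubuntu convention; RHEL-family systems use /var/log/httpd/access_log), and the stock filter only matches known bad user agents, so bots like the Amazon crawlers above would likely need a custom filter:

```ini
# /etc/fail2ban/jail.local -- enable the stock apache-badbots filter
[apache-badbots]
enabled  = true
port     = http,https
logpath  = /var/log/apache2/access.log
maxretry = 1
bantime  = 86400
```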

IMHO DSpace 7 needs a bit more work architecturally to improve performance. That is understandable; it is a huge (and impressive) migration that has been completed from DSpace 6. The new version is a very fresh and clean design, and the new API is nice.

Edmund Balnaves
Prosentient Systems

Technologiczny Informator

Jun 27, 2023, 2:01:50 AM6/27/23
to DSpace Technical Support
Hi,

Are you sure your frontend is running in cluster mode?

Regards,
Mariusz

Karol

Jun 28, 2023, 3:55:07 AM6/28/23
to DSpace Technical Support
Hi,

Edmund,

thank you very much for the hints. I have a few questions:

1) Yes, the system has started swapping, but I can't identify what is causing it (Tomcat, Angular or PostgreSQL). How can I identify which service is swapping?
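One way to answer this (a sketch; it relies on the VmSwap field the Linux kernel reports in /proc/&lt;pid&gt;/status) is to list per-process swap usage and see whether Tomcat, node or postgres tops the list:

```shell
# List processes by swap usage, largest first.
# Reads VmSwap (in kB) from /proc/<pid>/status; processes using no swap are skipped.
for f in /proc/[0-9]*/status; do
  awk '/^Name:/ {name=$2} /^VmSwap:/ {if ($2+0 > 0) printf "%8d kB  %s\n", $2, name}' "$f" 2>/dev/null
done | sort -rn | head -20
```

If the output is empty, nothing is currently swapped out; tools like smem offer a similar per-process view where installed.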

2) Is this where I can reduce the number of pm2 instances: config.prod.yml?

 # The rateLimiter settings limit each IP to a 'max' of 500 requests per 'windowMs' (1 minute).
  rateLimiter:
    windowMs: 60000 # 1 minute
    max: 500 # limit each IP to 500 requests per windowMs
  # Trust X-FORWARDED-* headers from proxies (default = true)
  useProxies: true


3) This command is great, thanks: cat access.log | grep -v " 403 " | grep -v " 301 " | grep -v " 408 " | cut -d " " -f 1 | sort | uniq -c | sort -n
It returns:

  13395 195.164.49.68 - amazon bot
  16903 3.224.220.101 - amazon bot
  177081 52.70.240.171 - amazon bot
  17146 23.22.35.162 - amazon bot
1644494 ip address of my server


Do I understand correctly that each Amazon bot request generates several proxied requests, and that is why so many requests appear under my own server's address?
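If the Angular SSR server forwards the original client address in an X-Forwarded-For header when it proxies API calls (the useProxies setting above suggests the stack works with such headers, though this is an assumption to verify against your setup), Apache can log that address next to the direct peer. A sketch for a Debian-style vhost; the format name "proxied" is arbitrary:

```apache
# Log the forwarded client IP (first field) alongside the direct peer IP (%h).
# Requests without the header will show "-" in the first column.
LogFormat "%{X-Forwarded-For}i %h %l %u %t \"%r\" %>s %b" proxied
CustomLog ${APACHE_LOG_DIR}/access.log proxied
```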

4) I totally agree, this is a huge and elegant project, but we need to report the various problems; it will allow better development in the future :)

Mariusz,

Thanks, I added to dspace-ui.json:

  "instances": "max",
  "exec_mode": "cluster",

and I start it with pm2 -i max start dspace-ui.json, so I was convinced that this was enough.
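For reference, a minimal dspace-ui.json sketch with those two entries in context. The name, cwd and script values are assumptions based on the /dspace-angular-7.5 install mentioned above; it would then be started with pm2 start dspace-ui.json:

```json
{
  "apps": [
    {
      "name": "dspace-ui",
      "cwd": "/dspace-angular-7.5",
      "script": "dist/server/main.js",
      "instances": "max",
      "exec_mode": "cluster",
      "env": { "NODE_ENV": "production" }
    }
  ]
}
```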

Do you know how I can confirm this?

Greetings,

Karol



Technologiczny Informator

Jun 28, 2023, 4:17:27 AM6/28/23
to DSpace Technical Support
Hi,

Run the pm2 list command. In the "mode" column you will see which mode your frontend is running in.

Regards,
Mariusz

Karol

Jun 28, 2023, 7:24:11 AM6/28/23
to DSpace Technical Support
Hi,

Mariusz, this is a screenshot of the pm2 list command:
Screenshot from 2023-06-28 13-22-05.png

If I understand correctly, "mode" should say "cluster"?
Thanks,

Karol

Technologiczny Informator

Jun 28, 2023, 7:44:51 AM6/28/23
to DSpace Technical Support
Hi, 

Exactly. I don't know which Linux distribution you are using, but you should probably look at the systemd unit that is responsible for the pm2 service; by default it may be set up with Type=forking, and the cluster settings have to come from the pm2 configuration itself.

Regards,
Mariusz

Karol

Jun 28, 2023, 1:46:58 PM6/28/23
to DSpace Technical Support
Mariusz,

Thank you. I don't know why, but the command "pm2 -i max start dspace-ui.json" did not work as needed. Only when I added the entries to the dspace-ui.json file and rebooted the whole server (pm2 restarts had no effect) did it manage to run on 4 processors in cluster mode. Now the repository runs much faster; I'm curious how it will behave during the day. Thanks and best regards,

Karol

Technologiczny Informator

Jun 28, 2023, 4:08:31 PM6/28/23
to DSpace Technical Support
Karol,

I am very glad that I could help. :)

Greetings from the other side of Poland,
Mariusz :)