Hi Angel,
As Dieter suggested, being short on memory is often the main reason for poor performance.
It is always a good idea to keep an eye on memory usage and to increase it early, before running into a situation where performance degrades.
But since you're experiencing long startup times, my suspicion points in a different direction.
I think that your "live" tables (SUBMITTED_ENTITY, HIERARCHY_INSTANCE, DEPENDENCY_INSTANCE, and a few more) have grown huge in the meantime.
At server startup the relevant information is read from those tables, and the time this takes also depends on their size.
Since you indicate that you don't care much about anything older than a week, those tables can be cleaned up, which will save time at startup.
If you look at the server.conf file, you'll find a few parameters:
#
# DbHistory: Number of minutes to keep in BICsuite repository database
# If the Archive option is disabled, data of older job executions will get lost !!!
# Set to 0 if you do not want to remove data of older job executions from the BICsuite repository
# Set to 1 to remove all data which isn't any longer in server memory
# Set to 45000 to keep the data for approximately 1 month
# Set to 540000 to keep data for approximately 1 year
#
DbHistory=0
# MinHistoryCount: Minimum number of masters loaded (if present), disregarding the
# History. 0 means no minimum
#
MinHistoryCount=0
#
# MaxHistoryCount: Maximum number of masters loaded, even if History is larger
# 0 means ignore
#
MaxHistoryCount=0
The values above are the defaults, which disable both the selective loading and the cleanup of the live tables.
The selective loading (MinHistoryCount and MaxHistoryCount) helps reduce the memory the server requires.
More important (I think) is DbHistory, which causes the server to delete older data from those tables.
This keeps the live tables small, so reading them is fast even if the DBMS has to scan them.
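Just to make that concrete, a setting along the following lines would keep roughly ten days of history; the numbers (14400, 100, 1000) are purely illustrative assumptions on my part, so adjust them to your own retention needs:
#
# Example (illustrative values only)
# keep about 10 days (14400 minutes) of job execution data in the repository
DbHistory=14400
# always load at least the last 100 masters (if present), even if they are older
MinHistoryCount=100
# never load more than 1000 masters, even if more of them fall within DbHistory
MaxHistoryCount=1000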
If you don't want to lose the data, you can archive the rows that would otherwise be deleted by using the following parameters:
#
# Archive: enable/disable archiving of data when using DbHistory != 0
# removed data will be copied to ARC_... tables before removal
# specify column lists (comma separated column names) if you do not want to archive all columns
# specify NONE if you don't want a specific table to be archived
# specify ALL if all columns should be archived, which is also the default if archiving is switched on
#
Archive=false
ArchiveSMEColumns=ALL
ArchiveAuditColumns=ALL
ArchiveDependencyColumns=ALL
ArchiveVariableColumns=ALL
ArchiveExtentsColumns=ALL
ArchiveHierarchyColumns=ALL
ArchiveKillJobColumns=ALL
ArchiveStatsColumns=ALL
If you set Archive=true, the rows will be moved to the ARC_* tables.
That way you keep the information for statistical analysis, but it doesn't get in the way during daily operation.
It is also safe to manually remove rows from the ARC_* tables, since the system only appends to them and never reads them.
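As a small, purely illustrative sketch (the DbHistory value and the choice to skip the statistics rows are just assumptions for the example), enabling archiving could look like this:
#
# Example (illustrative only): archive removed rows instead of just deleting them
DbHistory=14400
Archive=true
# ALL is the default once archiving is switched on; override a table only if needed, e.g.
ArchiveStatsColumns=NONE
Whether it makes sense to skip a table like the statistics depends entirely on what you want to analyze later.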
Note that you'll have to restart the server after changing the parameters.
And that first startup will again take 25 minutes, since the cleanup hasn't been performed yet.
After some time, 15 minutes or so, the DbCleanupThread will start doing the cleanup, which might reduce system performance a bit while it's busy.
(There will be a lot to do initially).
One last thing.
In many environments I've seen huge lists of "red" batches and jobs.
People just don't care about them. OK, the job went wrong, so what?
But from the perspective of the scheduling system they do matter.
Those jobs aren't final (or cancelled), and some operator still has to decide what to do with them.
And even if you run the system with, say, DbHistory=14400 (10 days), if the oldest "red" job is 200 days old, a lot of extra information must be loaded to ensure a consistent state of the system.
In other words, such jobs extend the load time at startup and consume memory at runtime for no other reason than that nobody feels responsible for finalizing them (CANCEL or SET STATE).
I have no clue why your wsgi process runs out of memory.
It definitely isn't related to the above, and I'd suggest discussing that in another thread.
We'll need a lot more information, though, in order to get a clear picture of the issue.
Best regards,
Ronald