Schedulix Log Retention Policy


JRGabelo

May 2, 2019, 2:35:20 AM
to schedulix
Does schedulix still not have log retention, as was discussed two years ago (see link below)?

https://groups.google.com/d/msg/schedulix/sSvR9ZjSMCM/1xRbQqpQAQAJ

Dieter Stubler

May 2, 2019, 3:37:45 AM
to schedulix
Hi,

Regarding logs, there are three different topics to talk about.

1. The schedulix server, web server and job server agent logfiles

Those processes are started via the scrolllog utility, which performs log retention on their logfiles.
You can use this utility to do log retention for your job process output as well, by wrapping your program call in a scrolllog call in your jobs' command lines.

2. Logging job and batch data in the schedulix repository

These are the submitted jobs and batches with their state, exit code, runtime, result variables, and so on.
They are stored in the schedulix repository database and are never deleted by default.
By configuring DbHistory in server.conf, this information can be removed and optionally archived (see the Archive options in server.conf) automatically.

3. Logfiles written by jobs

Those are the stdout and stderr logs as configured in the job definition.
These files are regarded as user files and will not be removed by the system.
Just schedule a job for each agent that removes old log files and/or performs any other log management actions (see the sketch below).
You can also put your log files in a filesystem managed by a document management system of your choice.
Jobs may also write to other log files not known to the system at all; those have to be managed in some way too.
For this reason, implementing this functionality in schedulix is off topic from our point of view.
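
As a minimal sketch of such a cleanup job (the log directory, the file name pattern and the 30-day retention below are assumptions; adjust them to your installation), the job's command line could simply run:

find /var/log/schedulix/joblogs -type f -name '*.log' -mtime +30 -delete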

So, "still doesn't have" doesn't apply.

Hope that helps 
 
Regards
Dieter

JRGabelo

May 14, 2019, 6:25:32 AM
to schedulix
Hi -

Regarding number 2: we have changed DbHistory to "45000" minutes (approximately 1 month), set Archive to "true", and restarted the schedulix server.
Should we expect the data older than 1 month to be transferred from the live tables into the arc_ tables and removed, now that the configuration file has been changed and the server restarted?

Our goal is to remove job scheduling history that is older than 1 month (both in the front-end and in the database). Should we also change the "History" setting in the configuration file to 1 month?

Can you please provide sample values for the following fields to achieve the above?
DbHistory=
History=


Dieter Stubler

May 14, 2019, 7:27:31 AM
to schedulix

Hi,


There is no data kept in the front-end.
There is only the data in the repository database which is loaded into memory at startup according to the history settings.

All history settings always refer to master submitted (top level) batches or jobs and all of their children.

If DbHistory is set to a value other than 0, the data of final jobs which are no longer held in memory (as controlled by the server.conf history settings History, MinHistoryCount, MaxHistoryCount and HistoryLimit) and which are older than the defined DbHistory will be removed from the repository live tables.

If you set Archive to true, the data is copied into the arc_... tables before deletion. schedulix does not touch that archived data after creating it.
The arc_ tables will therefore grow without bound if the user does not purge data from them.
It is the user's responsibility to manage the archive tables.
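
Purely as a hypothetical sketch — the arc_submitted_entity table name and the final_ts column are illustrative assumptions, so check the actual layout of your arc_ tables first — a periodic purge on PostgreSQL could look like:

DELETE FROM arc_submitted_entity WHERE final_ts < now() - interval '12 months';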

If you set History to 1 month, the system will hold 1 month of history in its memory when starting up.
Make sure you have a Java heap size big enough to handle this.
Depending on the number of jobs you execute per day, this can become huge.
Most of our customers use 5 or 10 days of History with a MinHistoryCount between 1 and 5, to keep older executions of master submitted batches or jobs that run very infrequently (like once a month) in memory.
HistoryLimit defines the maximum age allowed for masters that are older than History but covered by MinHistoryCount to stay in memory.
Typically HistoryLimit is set to something between 2 and 12 months.

If you set DbHistory to 1, the system will delete all data of masters that are no longer in memory from the live tables.
MaxHistoryCount saves memory by only keeping the last MaxHistoryCount masters in memory, which is useful for jobs running every 5 minutes.

A good starting point for History configuration might be:

History 10 days
MaxHistoryCount 50
MinHistoryCount 3
HistoryLimit 6 months
DbHistory 90 days

Set Archive depending on your needs.
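
Note that server.conf takes these durations in minutes (compare the values quoted later in this thread), so that starting point translates roughly to:

History=14400 (10 days)
MaxHistoryCount=50
MinHistoryCount=3
HistoryLimit=259200 (6 months of 30 days)
DbHistory=129600 (90 days)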

JRGabelo

May 14, 2019, 7:39:04 AM
to schedulix
Hi

That said, if we just want to purge all data of jobs that are 1 month old directly from the repository live tables, we just need to set DbHistory to 30 days and leave Archive at "false"?

Dieter Stubler

May 14, 2019, 7:52:16 AM
to schedulix
exactly
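
In server.conf, with the durations given in minutes, that would be:

DbHistory=43200 (30 days * 1440 minutes/day)
Archive=false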

JRGabelo

May 14, 2019, 8:53:22 PM
to schedulix
Thank you so much Dieter for all of your help!

Have a great day ahead.

JRGabelo

May 15, 2019, 10:45:04 PM
to schedulix
Hi Dieter,

We have changed the DbHistory setting to 45000 minutes (approximately 1 month) and Archive to false, but it seems that 1-month-old data in the live tables is not being purged.

The submitted_entity table, for example, is still showing data from 4/10, which is more than 1 month old. How can we address this?


Below are our current settings:

Archive=false
DbHistory=45000 (1 month)
History=14400 (10 days)
HistoryLimit=28800 (20 days)
MinHistoryCount=0
MaxHistoryCount=0


Our goal is to remove data from the live tables that is 1 month old or older.


Dieter Stubler

May 16, 2019, 3:23:30 AM
to schedulix
Hi Gabelo,

Did you restart your schedulix server?
You always have to after changing the configuration in server.conf.
I also do not know how long you waited for the data to be removed after restarting the server.
The DbCleanupThread, which purges old data from the database, runs every 15 minutes, and its first run starts 15 minutes after server startup.
This gives the server time after startup to do more important things like time scheduling; the load is typically higher after a longer downtime because of the backlog.

Regards
Dieter

JRGabelo

May 16, 2019, 3:29:28 AM
to schedulix
Hi -

Yes, we restarted our schedulix server yesterday, and up until now the old data is still in the table.


Regards,
Jerry

Dieter Stubler

May 16, 2019, 3:55:08 AM
to schedulix
Hi,

Have you checked whether those old masters are FINAL?
Are those masters still visible in the Running Masters Jobs view?

Regards
Dieter

JRGabelo

May 16, 2019, 4:21:06 AM
to schedulix
Hi,

No, they're not visible in the Running Masters Jobs view.
So only data in a FINAL state will be purged? Is there a way we can include CANCELLED and other states as well?

Also, am I looking at the right database table? I am monitoring the submitted_entity table.

Regards,
Jerry

Dieter Stubler

May 16, 2019, 5:26:33 AM
to schedulix
Hi,
Sorry for missing the CANCELLED ones in my last post; they are purged as well.
All other states indicate that the batch or job is still active, so they will not be purged.
So always bring your jobs or batches to a FINAL state, or cancel them.
Which database system do you use, MySQL or PostgreSQL?
Please restart the server and run the command 'alter server with trace level = 3;' using sdmsh.
Wait for half an hour, then please attach the server logfiles created in this time to your next post.
Run 'alter server with trace level = 1;' afterwards to reduce further logging.
Yes, the submitted_entity table is a good place to look.
Please make sure you are not accidentally looking at a wrong repository database (like a backup copy or so), just in case ;-)
That has happened to me in the past already, hunting ghosts :-)
Please also attach your server.conf (PLEASE '*****' OUT THE SYSTEM PASSWORD).
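
If you are not familiar with sdmsh: a minimal sketch of a one-liner, assuming sdmsh reads commands from standard input and your connection settings live in ~/.sdmshrc, would be:

echo 'alter server with trace level = 3;' | sdmsh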

Regards
Dieter

JRGabelo

May 17, 2019, 4:01:45 AM
to schedulix
Hi,

Kindly see the attached file for the generated logs and our server.conf file.

Also, the database system we use is PostgreSQL.


We'll wait for your feedback. 


Thanks,
Jerry
schedulix.zip

Dieter Stubler

May 17, 2019, 4:35:04 AM
to schedulix
Hi Gabelo,

did you install schedulix using the RPMs, or did you compile it yourself from the git repository?

Dieter Stubler

May 17, 2019, 6:05:52 AM
to schedulix
Hi,

Is it possible that you have any open connections to the repository database from psql or another tool, holding locks on the repository database?
If that were the case, the DbCleanupThread might block and be unable to proceed, resulting in the observed behaviour.

Please send the result of the query

SELECT * FROM SCI_SUBMITTED_ENTITY WHERE ID = 5732;

so we can do further checking of what's going on here.

Regards
Dieter

JRGabelo

May 20, 2019, 3:41:07 AM
to schedulix
Hi,

Please see the following answers:

1. We have installed schedulix using the RPMs.
2. I don't think we have any open connections to the repository database or any other tool holding locks on it. Is there a way we can check or confirm this? (See the query sketch below.)
3. See attached for the query result.
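
For reference, PostgreSQL's built-in catalog views can answer this; the database name 'schedulixdb' below is an assumption, so substitute the name of your repository database:

SELECT pid, state, xact_start, query FROM pg_stat_activity WHERE datname = 'schedulixdb';

SELECT l.pid, l.mode, l.granted, c.relname FROM pg_locks l JOIN pg_class c ON c.oid = l.relation;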


Regards,
Jerry

3.PNG

Dieter Stubler

May 20, 2019, 4:02:16 AM
to schedulix
Hi,

If it is not there, it looks like it was deleted by the DbCleanupThread.
So the database cleanup works.
Sometimes it just needs some patience.

Regards
Dieter

JRGabelo

May 21, 2019, 11:35:58 PM
to schedulix
Hi Dieter,

Yes, the row with id 5732 has been deleted. However, a lot of rows that are 1 month old or older are still sitting in the table.

Kindly see the attachment comparing two SCI_SUBMITTED_ENTITY exports, taken on May 15 and May 22.
(Some column values were removed for security purposes.)


Kindly note that server.conf was modified a week ago.


Regards,
Jerry

sci_submitted_entity.zip

Ronald Jeninga

May 22, 2019, 7:58:58 AM
to schedulix
Hi Jerry,

I had a look at your CSV exports and it is indeed weird.
Most of the rows are OK, but there are about a hundred rows that seem to be valid candidates for cleanup.

Dieter and I had a thorough look at the source code responsible for the cleanup in combination with the trace, but didn't find any obvious mistake.

Would it be OK for you if I send you a BICsuite.jar file (2.8) that provides some more tracing output? (I'll also explain how to install it.)
That way we'll be able to find out why the system decides not to clean up rows which we'd expect it to clean up.

It will probably take until tomorrow before I can send it.
Currently I'm recovering from recent cataract surgery and I can't spend too much of the day behind my monitor.

Best regards,

Ronald

Ronald Jeninga

May 23, 2019, 7:59:48 AM
to schedulix
Hi Jerry,

just a moment ago I built a new BICsuite.jar, which I tried to attach to this message.
That failed: Google doesn't allow jar files in its groups, and I can imagine why.

So I uploaded it to our server and you can download it from


On my system it says:

[ronald@ocelot lib]$ md5sum BICsuite.jar
6eaee4067aa28699d60f463459d72b68  BICsuite.jar
[ronald@ocelot lib]$ sum BICsuite.jar
16427  2325

If the downloaded file gives different results, please tell me.

In order to apply this patched version, proceed as follows:

1. shut down the scheduling server and locally running jobservers (service schedulix-server stop && service schedulix-client stop)
2. cd $BICSUITEHOME/lib
3. mv BICsuite.jar BICsuite.jar.orig
4. Now copy the downloaded BICsuite.jar into this directory
5. Start up the server and jobservers (service schedulix-server start; service schedulix-client start)

If you have jobs running, they won't be affected.

I added a few trace messages and changed the severity of several existing ones.
The effect is that you'll see loads of "WARNING"s from the DbCleanupThread which aren't actually warnings but in fact debug messages.
The trace level of the server can be kept at 1 or 2, whichever you prefer.
This way we avoid trace level 3, which can produce very voluminous output.
The downside is, as I already pointed out, that debug messages are labelled warnings. I can live with that though.

If you have questions or feel unsure, please don't hesitate to ask.

Best regards,

Ronald

JRGabelo

May 30, 2019, 5:01:34 AM
to schedulix
Hi Ronald,

What would be our next step after applying the patched version?


Thanks,
Jerry

Ronald Jeninga

May 30, 2019, 5:29:59 AM
to schedulix
Hi Jerry,

well, the patched version does exactly the same as the original version, except that it writes more log messages.
From these log messages I hope to be able to figure out what it's actually doing and why it takes the decisions it takes.
This then will either explain the behaviour or point to the mistake we've made.

Both situations are acceptable for me.
If it works as designed and skips some masters for a valid reason, that's perfectly OK.
If it doesn't work as intended, we can fix it.

It is possible that we'll need some more iterations to pinpoint the exact cause, but I can only judge that after analysing the logs.

Fortunately you can shut down and start up the server at any point in time without losing anything.
I wouldn't exactly choose a time with high traffic, but even that would work without problems.

You can send me the log file per private mail. No need to publish it here.
ronald (dot) jeninga (at) independit (dot) de  in case my e-mail address isn't visible here.
The rest of the discussion will remain here, but I don't want to force you to publish potentially sensitive data.

If you set the trace level to 1, you'll only see warnings, errors and fatals (well, of those last I hope there are none).
That'll keep the log file pretty small; hence we can let the server run for, e.g., 24 or 48 hours before analysing the logs, and they will still be moderate in size.
If you set the trace level to 2, you'll also see informational messages; typically every executed statement is printed.
Depending on the load, that can add up. It would also reveal more of what you're doing, making the contents more sensitive.
My recommendation is to set the trace level to 1 for now, but I leave the choice up to you.

Best regards,

Ronald

JRGabelo

May 30, 2019, 11:10:22 PM
to schedulix
Hi Ronald,

Before applying the patched version, we extracted the sci_submitted_entity table into a CSV file to compare it to the one extracted last Friday (5/24/19), and DbHistory finally seems to have worked, 15 days after server.conf was modified.

Do you have any idea what might actually have happened?

Note: no changes were applied to the server between last Friday and today.


Thanks,
Jerry

Ronald Jeninga

May 31, 2019, 2:33:16 AM
to schedulix
Hi Jerry,

well, the DbCleanupThread might be flawed, but if so, it isn't far off.
At least it isn't severely broken. In practice it seems to work pretty well, and apart from you no-one has bothered to check the details so far.

This isn't meant as blame; I actually like it when people check the details.
In programming, details often matter.

The basic philosophy behind its functioning is to prevent the milk from boiling over, with the least possible amount of effort.
We don't try to keep the temperature at exactly 85.0 °C.
This can lead to a situation where some masters aren't deleted yet, although an exact analysis would classify them as valid candidates for deletion.

The thread runs at a pretty low pace.
Cleaning up the database isn't a high-priority task; running jobs is, and the integrity of the in-memory data structures is crucial.
This is why I suggested letting the system run for 24 or, better, 48 hours before we start analysing the log file.
That gives the system enough time to reach an equilibrium between the DbCleanupThread and the GarbageThread (which cleans masters and related information out of memory).

As long as the milk doesn't boil over, everything's fine.

Best regards,

Ronald