Big logs/jobs with multiple occurrences cause unnecessary delay in the quarantine lookup


Kick Megatron

Dec 9, 2013, 4:27:40 AM
to megatron...@googlegroups.com
Hi, I'm having an issue: processing with mega-sender.py can now take up to 2 hours.

The issue is the following (I think):
- Job to process is big: ~250K lines to report

- Logfile parsed has a lot of duplicate offenders
Log lines contain source ports, which means the same IP addresses are mentioned quite a few times in one logfile.
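Because the source port makes otherwise identical lines distinct, the number of per-IP lookups grows with the number of lines rather than with the number of offenders. A minimal sketch of collapsing lines to unique IP addresses before doing any per-IP work, assuming the parser yields hypothetical (ip, source_port) pairs. This only illustrates the idea and is not how Megatron's OccurrenceFilter is implemented.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_offenders(entries: Iterable[Tuple[str, int]]) -> Counter:
    """Collapse (ip, source_port) pairs into a per-IP occurrence count.

    Hypothetical helper: the (ip, port) tuple shape is an assumption
    about what the log parser produces.
    """
    # Drop the source port so every line from the same IP maps to one key.
    return Counter(ip for ip, _port in entries)


if __name__ == "__main__":
    lines = [("192.0.2.10", 51000), ("192.0.2.10", 51001), ("198.51.100.7", 443)]
    offenders = count_offenders(lines)
    print(len(lines), "lines ->", len(offenders), "unique IPs")
    print(offenders.most_common())
```

Any expensive check then runs once per key in the counter instead of once per log line.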

- mega-server.py verifies the quarantine setting for each IP address.
What takes a very long time is the following query:
SELECT COUNT(*) FROM log_entry le, mail_job_log_entry_mapping map, mail_job mj where mj.started >= 1386578702 AND mj.finished <= 1386578702 AND mj.id = map.mail_job_id AND map.log_entry_id = le.id AND le.org_id = 13 AND le.ip_address = 2130706433
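The `ip_address` value in the query appears to be an IPv4 address stored as a 32-bit integer (an inference from the value, not something stated in the thread). Python's standard `ipaddress` module converts in both directions, which helps when running the query by hand for a specific offender; the two helper names are illustrative.

```python
import ipaddress


def ip_to_int(ip: str) -> int:
    """Convert a dotted-quad IPv4 string to its integer form."""
    return int(ipaddress.IPv4Address(ip))


def int_to_ip(value: int) -> str:
    """Convert an integer back to a dotted-quad IPv4 string."""
    return str(ipaddress.IPv4Address(value))


if __name__ == "__main__":
    print(int_to_ip(2130706433))   # 127.0.0.1
    print(ip_to_int("127.0.0.1"))  # 2130706433
```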

- We have the quarantine timing set to ‘0’, which makes the above query useless :)
I do not think that setting the quarantine period to, say, 2 hours would improve my situation, since the cost lies in the quarantine lookup per IP address (not verified, as we do not use quarantine settings).
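A guard of the kind being asked about could look roughly like this. It is a minimal Python sketch, not Megatron's actual code: the class, the `count_recent_reports` callable standing in for the `SELECT COUNT(*)` query, and the per-job cache are all hypothetical. It skips the lookup entirely when the period is 0 and otherwise runs it at most once per (org, IP) pair. The window `since = now - period` to `until = now` is an assumption that happens to match the query above, where both timestamps are equal when the period is 0.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class QuarantineChecker:
    """Hypothetical wrapper around the per-IP quarantine lookup.

    count_recent_reports(org_id, ip, since, until) stands in for the
    SELECT COUNT(*) query; its signature is an assumption for this sketch.
    """

    def __init__(self, period_secs: int,
                 count_recent_reports: Callable[[int, int, int, int], int],
                 now: Optional[int] = None) -> None:
        self.period_secs = period_secs
        self.count_recent_reports = count_recent_reports
        self.now = int(time.time()) if now is None else now
        # Results cached per (org_id, ip): duplicate offenders cost one query.
        self._cache: Dict[Tuple[int, int], bool] = {}

    def is_quarantined(self, org_id: int, ip: int) -> bool:
        # A period of 0 means quarantine is disabled: skip the SQL entirely.
        if self.period_secs <= 0:
            return False
        key = (org_id, ip)
        if key not in self._cache:
            since = self.now - self.period_secs
            hits = self.count_recent_reports(org_id, ip, since, self.now)
            self._cache[key] = hits > 0
        return self._cache[key]


if __name__ == "__main__":
    calls = []

    def fake_count(org_id, ip, since, until):
        # Records each simulated query and pretends nothing was reported.
        calls.append((org_id, ip))
        return 0

    QuarantineChecker(0, fake_count).is_quarantined(13, 2130706433)
    enabled = QuarantineChecker(7200, fake_count)
    for _ in range(3):
        enabled.is_quarantined(13, 2130706433)
    print(len(calls))  # 1
```

One checker would be created per mail job, so cached answers never outlive the job that produced them.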


Questions:
- When will version 1.0.11 be released? The OccurrenceFilter feature will be one of my lifesavers. :-)
- Could a check be added on the following line to skip the verification when periodInSecs = 0?

- Any other suggestion?

- Is there a simple way to clean old entries from the DB (older than X months)?
We have been operational since mid-September and already have 6M rows in log_entry, and we now process more than 1.5M entries per week.


Thanks,
Kick

Tor Johnson

Dec 9, 2013, 5:07:41 AM
to megatron...@googlegroups.com, kick_m...@live.com
> - We have the quarantine timing set to ‘0’, which makes the above query useless :)

The quarantine SQL queries are a real performance killer. Setting "mail.ipQuarantinePeriod=0"
should turn off execution of these queries, but evidently it does not. I have
added this bug to my TODO list.

> - When will version 1.0.11 be released? The OccurrenceFilter feature will be one of my lifesavers. :-)

No release date is set. I will send you a build with the latest source off-list.
It should be fairly stable. The Ant script will build the project.

> - Is there a simple way to clean old entries from the DB (older than X months)?

At CERT-SE, we create a history-db once a year. Data is moved to the history database
and deleted from the current database. The process is described here:

https://github.com/cert-se/megatron-java/blob/master/doc/howto-create-history-db.txt

If you don't want to keep the data, you can just run the delete-queries.
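For readers who only want to purge old rows, the general shape of such a delete could be as follows. This is a minimal sketch under stated assumptions: the table names come from the query earlier in the thread, but the `log_entry.created` column (Unix seconds) is hypothetical, and SQLite is used only to keep the example self-contained. The authoritative queries are the ones in the howto linked above. Mapping rows are deleted before the log entries they reference, inside one transaction.

```python
import sqlite3
import time

SECONDS_PER_MONTH = 30 * 24 * 3600  # rough month, adequate for a retention cutoff


def purge_old_entries(conn: sqlite3.Connection, months: int, now: int) -> int:
    """Delete log entries older than `months` months, mapping rows first.

    Assumes a hypothetical log_entry.created column holding Unix seconds.
    Returns the number of log_entry rows removed.
    """
    cutoff = now - months * SECONDS_PER_MONTH
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute(
            "DELETE FROM mail_job_log_entry_mapping WHERE log_entry_id IN "
            "(SELECT id FROM log_entry WHERE created < ?)", (cutoff,))
        cur = conn.execute("DELETE FROM log_entry WHERE created < ?", (cutoff,))
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE log_entry (id INTEGER PRIMARY KEY, org_id INTEGER,
                                ip_address INTEGER, created INTEGER);
        CREATE TABLE mail_job_log_entry_mapping (mail_job_id INTEGER,
                                                 log_entry_id INTEGER);
    """)
    now = int(time.time())
    conn.executemany("INSERT INTO log_entry VALUES (?, 13, 2130706433, ?)",
                     [(1, now - 400 * 24 * 3600), (2, now - 3600)])
    conn.executemany("INSERT INTO mail_job_log_entry_mapping VALUES (1, ?)",
                     [(1,), (2,)])
    print(purge_old_entries(conn, months=6, now=now))  # 1
```

Rows in `mail_job` itself are left untouched here; whether they should also be removed depends on the real schema.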

I hope that shrinking the database and using the new build with OccurrenceFilter will
fix your problem.

Best regards,

/Tor
--
CERT-SE (MSB, Swedish Civil Contingencies Agency)
Fleminggatan 14, SE-112 26 Stockholm, Sweden
Mobile: +46 730 516 733, Fax: +46 706 104 711
CERT-SE: +46 86 785 799, MSB: +46 771 240 240

PGP-Key: https://www.cert.se/tor.johnson_at_cert.se.asc
PGP-Fingerprint: 8FF2 F8CA D0D3 D063 FD79 263C 671A 9699 7167 AAD6