Hi Jerry,
Well, the patched version does exactly the same as the original version, except that it writes more log messages.
From these log messages I hope to figure out what it's actually doing and why it makes the decisions it makes.
This then will either explain the behaviour or point to the mistake we've made.
Both outcomes are acceptable to me.
If it works as designed and skips some masters for a valid reason, that's perfectly OK.
If it doesn't work as intended, we can fix it.
It is possible that we'll need some more iterations to pinpoint the exact cause, but I can only judge that after analysing the logs.
Fortunately you can shut down and start up the server at any point in time without losing anything.
I wouldn't exactly choose a time with high traffic, but even that would work without problems.
You can send me the log file by private e-mail. No need to publish it here.
ronald (dot) jeninga (at) independit (dot) de in case my e-mail address isn't visible here.
The rest of the discussion will remain here, but I don't want to force you to publish potentially sensitive data.
If you set the trace level to 1, you'll only see warnings, errors and fatals (well, I hope not).
That'll keep the log file pretty small, so we can let the server run for, e.g., 24 or 48 hours before analysing the logs and they will still be moderate in size.
If you set the trace level to 2, you'll also see messages; typically it prints all executed statements.
Depending on the load, that might add up. It would also reveal more information about what you're doing, making the contents more sensitive.
My recommendation is to set the trace level to 1 for now, but I'll leave the choice up to you.
Best regards,
Ronald