Lock Acquisition for 1758789506055 failed

johnts...@gmail.com

unread,

Sep 25, 2025, 5:34:47 AM (6 days ago) Sep 25

to schedulix

Dear Ronald / Dieter,

After running schedulix 2.10 with mysql 8.4.4 for serval months.

I got below message from BICserver.out and unable to start schedulix-server

================================================================

INFO [main] 25 Sep 2025 08:38:26 GMT Exporting WORKDIR
INFO [main] 25 Sep 2025 08:38:43 GMT JDBC Driver used : MySQL Connector/J
INFO [main] 25 Sep 2025 08:38:43 GMT Acquire repository lock for 1758789506055
INFO [main] 25 Sep 2025 08:38:44 GMT Lock Acquisition for 1758789506055 failed
INFO [main] 25 Sep 2025 08:38:44 GMT Ticket values Read: 1753617819364, 1758786314221
INFO [main] 25 Sep 2025 08:38:44 GMT ticket deleted .....
INFO [main] 25 Sep 2025 08:38:44 GMT Acquire repository lock for 1758789506055
INFO [main] 25 Sep 2025 08:38:44 GMT Starting Renew Ticket Thread

===================================================================

Can you kindly help to share some hints to trace for it?

Thank you very much.

Regards,

John Tsui

25.Sep.2025

Ronald Jeninga

unread,

Sep 25, 2025, 6:32:08 AM (5 days ago) Sep 25

to schedulix

Hi John,

in most cases this behaviour is pretty harmless.

There's one exception though, which I'll cover later in this answer.

Let me explain the concept and the mechanism first.

It is crucial that the scheduling server is the only one that writes on the schedulix schema.

On the other hand, the scheduling server can be started on any host that can connect to the database engine.

Hence in order to guarantee that the currently starting/running server is indeed the only one, we use a table called REPOSITORY_LOCK.

If no server is running, this table is usually empty, unless the last shutdown of the scheduling server was ungraceful.

At startup time the server will attempt to insert a row with two timestamps, one is the startup time, the other the current time.

The third column is called LOCKID and is used as primary key (which implies a uniqueness constraint).

This LOCKID is always 1 (hardcoded). We call this row a ticket.

If the insert fails, there are two possible causes.

Either the DBMS engine has a severe problem, which is then not ours, or we receive a duplicate key error, which means that the repostory is already locked by some other server.

The other server can be alive or not.

In order to detect if the other server is still running, the starting server will first read the current values in the table, wait some time and read the values again.

If the two values differ, there must be someone that changed them, IOW another server must be active.

The server will terminate with exit code 1 (I think, but I'm too lazy to check), scrolllog will restart it and the procedure starts anew.

That'll make that server a warm standby server, that'll take over as soon as the running server dies unexpectedly.

If the two values are equal, they must be a leftover of some ungraceful shutdown (could be anything).

In that case the row will be deleted and a new ticket will be inserted.

Then the startup procedure can continue.

At runtime, the validity of the ticket is checked with every writing transaction and the current timestamp within the ticket is updated.

Additionally the Renew Ticket Thread will update the current timestamp every 30 seconds or so.

In your case, there's no delay between detecting the presence of a ticket and deleting it.

This means that you have the configuration parameter SingleServer set to true (you can find that parameter in the server.conf file).

That is a good idea if there is indeed only one server. It saves you the waiting time until the ticket is deleted.

But if there are potentially two servers, one main server and a second one that'll take over if the main server dies for some reason, and both run with the same configuration (i.e. have SingleServer set to true), you're in trouble.

What then happens is that one server is running and a second server starts.

The second server will then delete the ticket and start loading the data.

The first server detects this with the next writing transaction and aborts.

The scrolllog process finds that its child has terminated with an exit code unequal zero and restarts the first server.

Since the first server also thinks it is the only one, it will delete the ticket of the second server and lock the repository for itself.

I won't tell this story to an end because this ping pong game will continue untill someone kills at least one of the servers and its parent (the scrolllog process).

My guess is that you've tried to start the server twice and both servers are now playing ping pong.

Hence, kill all schedulix processes (well, the Java and scrolllog processes I mean) and then try to start the server again.

You'll still find that message in your log file (kill -9 is not a graceful shutdown), but the server will then manage to startup and start working again.

Hope that helps.

Best regards,

Ronald

John Tsui

unread,

Sep 25, 2025, 8:05:53 AM (5 days ago) Sep 25

to Ronald Jeninga, schedulix

Dear Ronald,

Thank you very much,

It's pretty clear for the concept.

schedulix-server can be start at last, but it stand for 50 minutes.

Anyway, it can resume it's work now.

Hope you have a nice day.

Best Regards.

John Tsui

25.Sep.2025

Ronald Jeninga <ronald....@independit.de> 於 2025年9月25日週四下午6:32寫道：

--
You received this message because you are subscribed to the Google Groups "schedulix" group.
To unsubscribe from this group and stop receiving emails from it, send an email to schedulix+...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/schedulix/303f9c38-6f38-4482-930c-af6818abca7bn%40googlegroups.com.

Ronald Jeninga

unread,

Sep 26, 2025, 4:58:33 AM (5 days ago) Sep 26

to schedulix

Hi John,

well, as it seems your system "repaired" itself.

But a startup time of 50 minutes isn't actually optimal and maybe it is a good idea to investigate that a little more in detail.

What I've seen so far are two reasons for extended startup times:

1. Someone else is holding a lock in the database

2. The life tables are huge

Now there's not much we can do about the first cause, at least not in advance other than to keep it in mind and check it if the system seems to hang again.

But the second possible cause can be at least relaxed in most cases.

It isn't a good idea to start taking measures in panic though.

We should first check if that known cause is in fact the reason for the extended startup time.

It is easy to get a first impression if you look at the timestamps of the load messages:

INFO [Thread-0] 24 Sep 2025 14:45:23 GMT TableLoader 0 started
INFO [Thread-1] 24 Sep 2025 14:45:23 GMT TableLoader 1 started
INFO [Thread-1] 24 Sep 2025 14:45:23 GMT Read 2, Loaded 2 rows for GROUPS
INFO [Thread-1] 24 Sep 2025 14:45:23 GMT Read 4, Loaded 4 rows for RESOURCE_REQUIREMENT

You look for the messages like above and try to find exceptional time gaps (thread wise, of course).

In my example above there's nothing to find (and all data is loaded within seconds if I look at the rest of the log file).

Those first tables shown above are definition tables that aren't much of a problem most of the time.

It is far more likely to find exceptional loading times for the "life" tables like

SUBMITTED_ENTITY, HIERARCHY_INSTANCE, DEPENDENCY_INSTANCE, ENTITY_VARIABLE to name a few.

Now if your server loads 10 million SUBMITTED_ENTITY rows within 1 minute, you have nothing to worry about; that's pretty fast.

But if it uses 1 minute to load 60 thousand SMEs, that's something to investigate deeper.

Thus that's what you should look for.

If you're interested investigating this behaviour, please check your log file and tell me what you find.

Best regards,

Ronald.

Message has been deleted

johnts...@gmail.com

unread,

4:28 PM (2 hours ago) 4:28 PM

to schedulix

Dear Ronald,

Thank you for your help.

BICserver start up quite normal after the next restart.

===============================================================

INFO [main] 25 Sep 2025 14:28:03 GMT JDBC Driver used : MySQL Connector/J
INFO [main] 25 Sep 2025 14:28:03 GMT Acquire repository lock for 1758810483315
INFO [main] 25 Sep 2025 14:28:03 GMT Starting Renew Ticket Thread
INFO [Thread-0] 25 Sep 2025 14:31:01 GMT TableLoader 0 started
INFO [Thread-1] 25 Sep 2025 14:31:01 GMT TableLoader 1 started
INFO [Thread-1] 25 Sep 2025 14:31:01 GMT Read 3, Loaded 3 rows for GROUPS
INFO [Thread-1] 25 Sep 2025 14:31:01 GMT Read 100, Loaded 100 rows for RESOURCE_REQUIREMENT
INFO [Thread-1] 25 Sep 2025 14:31:01 GMT Read 31, Loaded 31 rows for INTERVAL_DISPATCHER

=================================================================

Delay of the start up on the first time, may be due to the recovery on database side ( I think )

However, I'm not the expert in this area.

schedulix 2.10 is working quite good now.

Good luck to all!

Best regards.

John Tsui

29.Sep.2025

Reply all

Reply to author

Forward