Note, though, that this type of master-slave system doesn't handle faults well. However, if timer IDs are stored in a highly available persistent store, Vert.x's failover mechanism can make sure the master verticle gets restarted on another node if necessary. Otherwise, fault tolerance and consistency can be achieved by other distributed algorithms that get quite a bit more complex and which I assume are outside of the scope of your question.
Alternatively, if you're willing to use something like Hazelcast you can use things like distributed locks or counters to achieve these types of semantics.
Why can't you just make the timer IDs unique by prepending some sort of node ID to them? That is, each node is assigned a UUID at startup, and that UUID is prepended to the IDs of timers started on that node.
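For illustration, here is a minimal sketch of that idea, assuming the Vert.x 3 style Java API; the class and method names (NodeScopedTimers, setClusterTimer, cancelClusterTimer) are invented for the example, not anything Vert.x provides:

import io.vertx.core.AbstractVerticle;
import io.vertx.core.Handler;
import java.util.UUID;

// Sketch: timer IDs made cluster-unique by prefixing a per-node UUID.
public class NodeScopedTimers extends AbstractVerticle {

  // One UUID per node/JVM, assigned at startup.
  private final String nodeId = UUID.randomUUID().toString();

  // Start a timer and return a cluster-unique ID such as "3f2a...:17".
  public String setClusterTimer(long delayMs, Handler<Long> handler) {
    long localId = vertx.setTimer(delayMs, handler);
    return nodeId + ":" + localId;
  }

  // Cancelling only works if the ID belongs to this node; otherwise the caller
  // has to route the request to the owning node (see later in the thread).
  public boolean cancelClusterTimer(String clusterId) {
    String[] parts = clusterId.split(":");
    if (!parts[0].equals(nodeId)) {
      return false; // owned by another node
    }
    return vertx.cancelTimer(Long.parseLong(parts[1]));
  }
}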
Also, the solution we talked about is fine for letting us know about the timer, but cancelling a timer on a specific node without being on that node is still not solved. Even though we have the nodeID, I haven't seen a specific way to send a message to that particular node so that we can cancel the timer.
My co-worker used a nodeID as an address.

Regarding the other solutions, another problem with non-clusterable timerIDs is that you could have two timers with the exact same timerID running at the exact same time on two different nodes. So when you tried to cancel one, it cancelled the other, or both. Really, if timerIDs were unique across all nodes in the cluster we wouldn't have all these problems we have right now. We had to write up a whole verticle, manager, API and addressing scheme to handle not having clusterable timerIDs.

About card games: that is actually irrelevant in this case. It is a game and players are on different nodes. Timers are set, usually started on one node because of one player's actions, and another player's action on another node needs to cancel that timer. Simple. These aren't delay/wait timers; they give the player x amount of time to do whatever their action is on the client side.
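As a rough sketch of the "nodeID as an address" approach, assuming a clustered Vert.x 3 event bus; TimerCancelService, the "timers.cancel." address prefix and cancelAnywhere are hypothetical names, and the composite "<nodeId>:<localTimerId>" format is the one from the earlier sketch:

import io.vertx.core.AbstractVerticle;
import java.util.UUID;

// Sketch: each node listens on its own event bus address, so any node
// in the cluster can ask the owning node to cancel one of its timers.
public class TimerCancelService extends AbstractVerticle {

  private final String nodeId = UUID.randomUUID().toString();

  @Override
  public void start() {
    // The owning node handles cancel requests addressed to it.
    vertx.eventBus().<Long>consumer("timers.cancel." + nodeId, msg -> {
      boolean cancelled = vertx.cancelTimer(msg.body());
      msg.reply(cancelled);
    });
  }

  // clusterTimerId is the composite ID created when the timer was set,
  // e.g. "<nodeId>:<localTimerId>".
  public void cancelAnywhere(String clusterTimerId) {
    String[] parts = clusterTimerId.split(":");
    String owningNode = parts[0];
    long localTimerId = Long.parseLong(parts[1]);
    // Works whether the owner is this node or another node in the cluster,
    // as long as the event bus is clustered.
    vertx.eventBus().send("timers.cancel." + owningNode, localTimerId);
  }
}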
On Monday, September 29, 2014 3:57:36 PM UTC-4, bytor99999 wrote:
Really if timerIDs was unique across all nodes in the cluster you wouldn't have all these problems we have right now.

I think you are taking a generic distributed systems problem and applying it to timers just because you happen to be using the timerID as a distributed value. As Tim has pointed out, in 3.0 there is more support for distributed data structures (map, counter, lock) to help solve these problems. But I currently don't see any reason why we would make timerIDs unique across a cluster.
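For reference, a minimal sketch of the clustered counter idea using the Vert.x 3 SharedData API; ClusterUniqueIds and the counter name "game.timer.ids" are invented for the example, and error handling is omitted. When Vert.x runs clustered, the counter is shared across nodes, so the IDs it hands out are unique cluster-wide:

import io.vertx.core.AbstractVerticle;
import io.vertx.core.Handler;

// Sketch: hand out IDs that are unique across the whole cluster.
public class ClusterUniqueIds extends AbstractVerticle {

  // Asynchronously obtain the next cluster-wide unique ID.
  public void nextId(Handler<Long> handler) {
    vertx.sharedData().getCounter("game.timer.ids", res -> {
      if (res.succeeded()) {
        res.result().incrementAndGet(inc -> handler.handle(inc.result()));
      }
    });
  }
}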
With unique timerIDs we don't have to worry about this. Instead of long/ints they could be UUIDs.
A timer in Vert.x is simply an event that fires every x milliseconds; it doesn't know anything about clustering.
OK. One more simple quick question. Is it ever possible for a timer that expires to not be fired?
On 26/02/15 19:55, bytor99999 wrote:
OK. One more simple quick question. Is it ever possible for a timer that expires to not be fired?
Yes, if the context is blocked preventing it from being executed (e.g. something is blocking the event loop or a worker)
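A minimal sketch illustrating that point; the sleep stands in for whatever is blocking the event loop (never block a real event loop like this), and BlockedTimerDemo is just a name made up for the example:

import io.vertx.core.AbstractVerticle;

// Sketch: a timer that is "due" cannot fire while its event loop is blocked.
public class BlockedTimerDemo extends AbstractVerticle {

  @Override
  public void start() {
    long scheduledAt = System.currentTimeMillis();
    vertx.setTimer(100, id ->
        System.out.println("fired after " + (System.currentTimeMillis() - scheduledAt) + " ms"));

    // Blocking the event loop keeps the timer from running until the block
    // ends; this prints roughly 5000 ms, not 100.
    try {
      Thread.sleep(5000);
    } catch (InterruptedException ignored) {
      Thread.currentThread().interrupt();
    }
  }
}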
On Friday, 27 February 2015 06:21:22 UTC, Tim Fox wrote:
On 26/02/15 19:55, bytor99999 wrote:
OK. One more simple quick question. Is it ever possible for a timer that expires to not be fired?
Yes, if the context is blocked preventing it from being executed (e.g. something is blocking the event loop or a worker)
Assuming it unblocks eventually, and you don't just mean it'll never run if it never ever unblocks:
I thought the runnable was enqueued regardless - it could just be significantly late - so long as the timer is not actively cancelled in the interim? I may well be wrong.
I think, e.g., the back-of-the-envelope "too busy" check that yoke does relies on that, though: the later the timer actually fires after its initially projected target time, the more server load it assumes there is.
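A rough sketch of that kind of lateness check; EventLoopLagProbe, the interval and the threshold are assumptions for illustration, not yoke's actual implementation:

import io.vertx.core.AbstractVerticle;

// Sketch: a periodic timer records how far past its expected fire time it
// actually runs; a large lag suggests the event loop is overloaded.
public class EventLoopLagProbe extends AbstractVerticle {

  private static final long INTERVAL_MS = 500;
  private static final long TOO_BUSY_LAG_MS = 200; // assumed threshold

  private long expectedNextFire;
  private volatile boolean tooBusy;

  @Override
  public void start() {
    expectedNextFire = System.currentTimeMillis() + INTERVAL_MS;
    vertx.setPeriodic(INTERVAL_MS, id -> {
      long lag = System.currentTimeMillis() - expectedNextFire;
      tooBusy = lag > TOO_BUSY_LAG_MS;
      expectedNextFire = System.currentTimeMillis() + INTERVAL_MS;
    });
  }

  public boolean isTooBusy() {
    return tooBusy;
  }
}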
I don't think you can make any assumptions about when, if ever, it unblocks. This completely depends on how the context is blocked.
But we have a Watchdog, a Timer Verticle and Hazelcast maps to try to track all the timers and run "cron" jobs to find timers that expired but never fired and run the code.
These aren't cron jobs. These are single-run timers in a game. For instance: it is your move and you have 12 seconds to make it; if you don't, we move to the next player. We move to the next player and start a new 12-second timer for them. If they make a move, we cancel the timer - except that timer was created on the server/node where the first timer was started, while the second player has a socket connection to a different node and now has to cancel the second timer, which lives on a totally different node.
Anyway, explanations aside: our application needs guaranteed, distributable timers, and in the end we have a complex, convoluted way of hopefully doing that successfully, because there is no other solution. :D
Thanks.
Mark
On Monday, March 2, 2015 at 8:33:22 AM UTC-8, dgo...@squaredfinancial.com wrote:
But we have a Watchdog, a Timer Verticle and Hazelcast maps to try to track all the timers and run "cron" jobs to find timers that expired but never fired and run the code.
Anyway, whatever this code is, it clearly doesn't actually have to run in the Vert.x context of the verticle that originally set the timer?
While a full distributed scheduled-executor service might be nice (tm) (and it looks like Hazelcast is planning one at the Hazelcast level), it sounds to me like you could try a different architecture for now: instead of treating your watchdog as just the fallback, actually have a dedicated "cron service" module that looks after all the timers, and have your other verticles register scheduled tasks with it /instead of/ using the local and lightweight timer facility within them.
The cron service module could use a Hazelcast distributed backing store (IIRC you were already hitting Hazelcast directly...) so it can itself fail over without loss. You might be concerned about the scaling and timer accuracy of that, but I'd suggest trying it first - dismissing it may be premature optimisation, particularly if you only need accuracy to seconds. I.e. consider something akin to the vertx work queue module but exposing a scheduled-task service API.
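A rough sketch of what such a scheduled-task service verticle might look like, assuming a single deployed instance and Vert.x 3 APIs; TimerServiceVerticle, the "timers.schedule"/"timers.cancel" addresses and the message format are all invented for the example, and a real version would keep the entries in a Hazelcast map rather than a local map so another node could take over on failover:

import io.vertx.core.AbstractVerticle;
import io.vertx.core.json.JsonObject;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: other verticles schedule and cancel timers over the event bus
// instead of calling vertx.setTimer() locally.
public class TimerServiceVerticle extends AbstractVerticle {

  private final Map<String, Long> localTimerIds = new ConcurrentHashMap<>();

  @Override
  public void start() {
    vertx.eventBus().<JsonObject>consumer("timers.schedule", msg -> {
      String taskId = UUID.randomUUID().toString();
      long delayMs = msg.body().getLong("delayMs");
      String notifyAddress = msg.body().getString("notifyAddress");
      long id = vertx.setTimer(delayMs, t -> {
        localTimerIds.remove(taskId);
        vertx.eventBus().send(notifyAddress, taskId); // tell the owner it expired
      });
      localTimerIds.put(taskId, id);
      msg.reply(taskId);
    });

    vertx.eventBus().<String>consumer("timers.cancel", msg -> {
      Long id = localTimerIds.remove(msg.body());
      msg.reply(id != null && vertx.cancelTimer(id));
    });
  }
}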
I think the problem here is that you don't yet know what is actually causing the issue, so you're forced to apply mitigations to work around it, which is not ideal.
If it were me doing this, I think I would spend some time forensically pinpointing exactly what is going on - do some debugging, add some logging, take some stack traces, and trace what happened to those timer events. Once you know exactly what is going on you might have an "aha" moment and be able to apply a fix that solves the real issue once and for all.