I have a problem I'm hoping Hazelcast can help solve, but I wanted to
check whether my proposed solution seems reasonable.
Basically I have a set of 'harvesters' (workers) whose job is to harvest
a set of URLs. I want to make sure two workers are never working on the
same URL at once.
Each worker spins up as many as 50 threads to do the harvesting, and
there might be multiple workers across multiple machines.
Each worker has one 'scheduling' thread whose job is to coordinate the
worker threads in its process. It spins in a loop and grabs a set of
URLs that need harvesting from a DB. My idea is then, for each of those
URLs, to try to acquire a lock for it and, if that succeeds, remove the
entry from the DB and hand the lock and URL to one of the worker
threads, which would unlock it once the harvest is complete. Is this the
right way to go about it?
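
To make the flow concrete, here is a minimal single-process sketch of what I have in mind. It deliberately does not use Hazelcast: an in-memory set stands in for the distributed lock, so it only shows the shape of the scheduling loop, not cross-machine coordination. The names HarvestScheduler, inProgress, schedule and harvest are all made up for illustration, and the DB removal is only a comment.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class HarvestScheduler {
    // Stand-in for the distributed lock: a URL counts as locked while it is in this set.
    private final Set<String> inProgress = ConcurrentHashMap.newKeySet();
    // Records which URLs have been harvested, so the sketch has something observable.
    private final Set<String> harvested = ConcurrentHashMap.newKeySet();
    private final ExecutorService workers;

    public HarvestScheduler(int threads) {
        this.workers = Executors.newFixedThreadPool(threads);
    }

    /** One pass of the scheduling loop; returns how many URLs were handed to workers. */
    public int schedule(List<String> pendingUrls) {
        int handedOff = 0;
        for (String url : pendingUrls) {
            // Equivalent of tryLock(): add() returns false if the URL is already held.
            if (!inProgress.add(url)) {
                continue;
            }
            // The real version would remove the entry from the DB at this point.
            workers.submit(() -> {
                try {
                    harvest(url);
                } finally {
                    // Equivalent of unlock(), performed by the worker thread.
                    inProgress.remove(url);
                }
            });
            handedOff++;
        }
        return handedOff;
    }

    // Placeholder for the actual harvesting work.
    private void harvest(String url) {
        harvested.add(url);
    }

    public Set<String> harvested() {
        return harvested;
    }

    /** Stops accepting work and waits up to five seconds for running harvests. */
    public boolean shutdownAndWait() {
        workers.shutdown();
        try {
            return workers.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        HarvestScheduler scheduler = new HarvestScheduler(4);
        scheduler.schedule(Arrays.asList("http://www.foo.com", "http://www.bar.com"));
        scheduler.shutdownAndWait();
        System.out.println(scheduler.harvested().size());
    }
}
```

The point where the stand-in differs from a real lock is the release: the set does not care which thread removes the URL, which is exactly the property I am unsure about with Hazelcast.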
When using Hazelcast's locking, can I lock simple objects such as
strings, even if the instances are different across machines? Also,
can a different thread do the unlocking as long as it has a reference
to the lock that was acquired?
I.e., will the following work?

// in the scheduling thread (one per machine, many machines)
Lock lock = Hazelcast.getLock("http://www.foo.com");
if (lock.tryLock()) {
    // spin off a thread to do the harvest, handing it the lock
}

// in the worker thread (many per process)
// harvest the URL
// unlock this URL
lock.unlock();
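
The reason I ask about unlocking from a different thread is that the JDK's own java.util.concurrent.locks.ReentrantLock ties ownership to the acquiring thread. The small check below shows that behaviour for ReentrantLock only. I am not assuming Hazelcast's lock behaves the same way, and the class and method names are just for illustration.

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockOwnershipDemo {
    /** Returns true if an unlock() attempted from a non-owning thread was rejected. */
    public static boolean unlockFromOtherThreadFails() {
        ReentrantLock lock = new ReentrantLock();
        lock.lock(); // acquired by the calling ("scheduling") thread

        boolean[] rejected = {false};
        Thread worker = new Thread(() -> {
            try {
                lock.unlock(); // attempted by the "worker" thread, which is not the owner
            } catch (IllegalMonitorStateException e) {
                rejected[0] = true;
            }
        });
        worker.start();
        try {
            worker.join(); // join() makes the worker's write to rejected[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }

        lock.unlock(); // the owning thread releases the lock normally
        return rejected[0];
    }

    public static void main(String[] args) {
        System.out.println(unlockFromOtherThreadFails());
    }
}
```

If Hazelcast's lock follows the same ownership rule, handing the lock to a worker thread would not work as written, and the worker thread would have to acquire the lock itself.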