Running background jobs via unmanaged extensions


Nikhil

Feb 27, 2018, 4:56:55 PM
to ne...@googlegroups.com
Hey everyone!

In the past, I have performed long-running tasks, such as importing graph data from OSM and other sources, by running Java code while the Neo4j server was offline. However, repeating that procedure is very inefficient when the use case requires running these tasks frequently (for example, updating spatial data from the freshest OSM source every week).

I intend to achieve the following flow:

Send parameters to a REST API Endpoint -> Queue a job for background processing -> Track and report progress

I'd like to have a generic worker for long-running tasks that takes parameters and performs a specific job. So far, I have not found any ready-made resources for this. If any exist, I'd be happy to hear about them; that could save me a lot of work.

Broadly, I have an unmanaged extension that:
  1. Exposes a REST API endpoint to accept parameters such as a URL to an OSM source file (.pbf, .zip, or .xml)
  2. Pushes the URL, plus some metadata for logging and progress monitoring, to a message queue such as Amazon SQS or RabbitMQ (or any other AMQP broker or queue that can be integrated)
I am looking for ways to implement a background worker that:
  1. Wakes up periodically to check the queue for messages, perhaps with a geometric or Fibonacci backoff that resets after every processed message and/or upon reaching a threshold.
  2. On finding a new message, invokes the relevant factory class to perform a specific action and logs progress checkpoints to a temporary sub-graph in Neo4j, e.g. importing OSM data and logging the status of running tasks. I could even implement pub-sub to report real-time progress, but that is not a priority at the moment.
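
To make the polling idea concrete, here is a minimal sketch of such a worker in plain Java. It assumes an in-memory `Queue<String>` as a stand-in for SQS or RabbitMQ, and the names `FibonacciBackoff` and `PollingWorker` are hypothetical, not part of Neo4j or any queue client. In this sketch the delay holds at a cap rather than resetting when the threshold is reached; resetting there instead would be a small change.

```java
import java.util.Queue;
import java.util.function.Consumer;

/** Hypothetical helper: delays grow as 1, 1, 2, 3, 5, ... units, capped at maxMillis. */
final class FibonacciBackoff {
    private final long unitMillis;
    private final long maxMillis;
    private long previous = 0;
    private long current = 1;

    FibonacciBackoff(long unitMillis, long maxMillis) {
        this.unitMillis = unitMillis;
        this.maxMillis = maxMillis;
    }

    /** Returns the next delay and advances the sequence until the cap is hit. */
    long nextDelayMillis() {
        long delay = Math.min(current * unitMillis, maxMillis);
        if (delay < maxMillis) { // stop advancing once capped
            long next = previous + current;
            previous = current;
            current = next;
        }
        return delay;
    }

    /** Goes back to the shortest delay, e.g. after a message was processed. */
    void reset() {
        previous = 0;
        current = 1;
    }
}

/** Hypothetical worker: polls the queue, sleeping longer while it stays empty. */
final class PollingWorker implements Runnable {
    private final Queue<String> queue;      // assumption: stand-in for a real message queue
    private final Consumer<String> handler; // assumption: dispatches to the job-specific code
    private final FibonacciBackoff backoff;

    PollingWorker(Queue<String> queue, Consumer<String> handler, FibonacciBackoff backoff) {
        this.queue = queue;
        this.handler = handler;
        this.backoff = backoff;
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                String message = queue.poll(); // non-blocking check
                if (message == null) {
                    Thread.sleep(backoff.nextDelayMillis());
                    continue;
                }
                handler.accept(message);
                backoff.reset(); // work arrived, so poll eagerly again
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the flag and exit
        }
    }
}
```

The worker could be started on its own thread, for example `new Thread(new PollingWorker(queue, handler, new FibonacciBackoff(1000, 60000))).start()`, and stopped by interrupting that thread. The queue passed in should be thread-safe (such as `ConcurrentLinkedQueue`) if the REST endpoint adds messages from another thread.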
Finally, I'd expose another REST API endpoint to retrieve the statuses of currently running jobs. That would let me build a separate, lean front-end client to display and manage jobs, with actions such as cancellation or setting priority handled by dedicated REST API endpoints that modify or delete queued jobs. The front-end client could also accept parameters (such as an OSM file URL) to queue a job with Neo4j.
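
As a rough illustration of the bookkeeping behind such status and cancellation endpoints, here is a small in-memory sketch. It is only an assumption about how this could look: the post proposes logging progress to a temporary sub-graph in Neo4j, whereas this sketch swaps in a `ConcurrentHashMap`, and the names `JobRegistry`, `JobStatus`, and `JobState` are hypothetical. Priority handling is left out.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Collectors;

enum JobState { QUEUED, RUNNING, COMPLETED, CANCELLED }

/** Immutable snapshot of one job; all names here are hypothetical. */
final class JobStatus {
    final String id;
    final String sourceUrl;
    final JobState state;
    final int percentComplete;

    JobStatus(String id, String sourceUrl, JobState state, int percentComplete) {
        this.id = id;
        this.sourceUrl = sourceUrl;
        this.state = state;
        this.percentComplete = percentComplete;
    }

    boolean isActive() {
        return state == JobState.QUEUED || state == JobState.RUNNING;
    }

    JobStatus with(JobState newState, int newPercent) {
        return new JobStatus(id, sourceUrl, newState, newPercent);
    }
}

/** In-memory registry that REST endpoints and the worker could share. */
final class JobRegistry {
    private final Map<String, JobStatus> jobs = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong(1);

    /** Registers a new job and returns its id. */
    String enqueue(String sourceUrl) {
        String id = "job-" + nextId.getAndIncrement();
        jobs.put(id, new JobStatus(id, sourceUrl, JobState.QUEUED, 0));
        return id;
    }

    /** Called by the worker; returns false if the job should stop (cancelled, finished, unknown). */
    boolean reportProgress(String id, int percent) {
        return transition(id, JobState.RUNNING, percent);
    }

    /** Marks an active job as completed; returns false if it was not active. */
    boolean complete(String id) {
        return transition(id, JobState.COMPLETED, 100);
    }

    /** Cancels a queued or running job; returns false if it was not active. */
    boolean cancel(String id) {
        boolean[] changed = {false};
        jobs.computeIfPresent(id, (key, status) -> {
            if (!status.isActive()) return status;
            changed[0] = true;
            return status.with(JobState.CANCELLED, status.percentComplete);
        });
        return changed[0];
    }

    Optional<JobStatus> find(String id) {
        return Optional.ofNullable(jobs.get(id));
    }

    /** Jobs that are still queued or running. */
    List<JobStatus> active() {
        return jobs.values().stream().filter(JobStatus::isActive).collect(Collectors.toList());
    }

    // Atomically moves an active job to a new state; inactive jobs are left untouched.
    private boolean transition(String id, JobState newState, int percent) {
        boolean[] changed = {false};
        jobs.computeIfPresent(id, (key, status) -> {
            if (!status.isActive()) return status;
            changed[0] = true;
            return status.with(newState, percent);
        });
        return changed[0];
    }
}
```

In this sketch the worker treats a `false` return from `reportProgress` as the signal to stop, so a cancellation request from the front end takes effect at the next progress checkpoint.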

As mentioned above, implementing a background worker to process queued jobs is currently a blind spot for me.

I'd be very happy and grateful to learn of a better way to run long-running tasks via REST API extensions without taking the server offline.

--
Cheers,
Nikhil