doit automation for the following scenario?

Sasha Kacanski

Jan 17, 2022, 1:00:58 PM

to python-doit

Sorry to intrude,

I don't know much about this automation/workflow tool, but I have a couple of questions; hopefully someone can help...

I have a pretty complex application stack that consists of templates, an execution harness, decoupled drivers, and tons of inputs.

The architecture is:

driver -> state machine (automata) -> collect info -> set up jobs (tasks)

A task consists of Python objects and the arguments for executing those objects.

Tasks should be queued to some "bus" or storage option...

I can have from tens to thousands of these tasks.

The worker infrastructure is passive until started. When started, workers take one task from the queue, execute it, return success or failure, and dump payloads to NFS.

Parallelization is achieved on the worker side, essentially by horizontal scaling of worker nodes (servers).

If a worker node is done, or has metered resources, it will pick up more than one task and process them in parallel.

On worker nodes, each task runs in its own Python process. Tasks are completely independent of each other.

When the whole queue is drained, the driver will collect everything and process it in a single shot.
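
Roughly, the flow above could be sketched with the standard library alone. This is only an illustration under stated assumptions, not how doit, Celery, or FireWorks do it: the names `run_task`, `drive`, `square`, and `divide` are made up, a local temporary directory stands in for NFS, and a thread pool stands in for the remote worker nodes (swapping in `ProcessPoolExecutor` would give one Python process per worker, provided the callables are importable top-level functions).

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def run_task(task_id, func, args, out_dir):
    """Worker side: execute one task, dump its payload, report success or failure."""
    try:
        payload = func(*args)
    except Exception as exc:  # a failed task must not take the worker down
        return task_id, False, repr(exc)
    path = Path(out_dir) / f"task_{task_id}.json"
    path.write_text(json.dumps(payload))
    return task_id, True, str(path)


def drive(tasks, out_dir, max_workers=4):
    """Driver side: queue every task, wait for the queue to drain, then collect."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(run_task, task_id, func, args, out_dir)
            for task_id, (func, args) in enumerate(tasks)
        ]
        for future in as_completed(futures):
            task_id, ok, info = future.result()
            results[task_id] = (ok, info)
    # Single-shot collection once every task has finished.
    payloads = {
        task_id: json.loads(Path(info).read_text())
        for task_id, (ok, info) in results.items()
        if ok
    }
    return results, payloads


# Hypothetical task callables, used only to exercise the sketch.
def square(x):
    return x * x


def divide(a, b):
    return a / b


if __name__ == "__main__":
    tasks = [(square, (3,)), (divide, (1, 0)), (square, (5,))]
    with tempfile.TemporaryDirectory() as out_dir:
        results, payloads = drive(tasks, out_dir)
    print(results)
    print(payloads)
```

Each task is just a `(callable, args)` pair, matching the "Python objects and arguments" description; a real deployment would replace the in-process pool with a shared queue that remote workers pull from.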

I read the documentation briefly. The framework looks great. I did this in the past with the FireWorks framework and MongoDB.

If someone can point me to documentation on parallelization and on whether doit can do the above, I would much appreciate it...

Regards,

Terry Brown

Jan 17, 2022, 3:06:19 PM

to pytho...@googlegroups.com

Just at face value this sounds more like a distributed computing / task queuing problem than a dependency management problem. You state that the tasks are run independently of each other. Have you looked at things like https://docs.celeryproject.org/ and https://www.rabbitmq.com/ ? I might be missing the particular challenges you're hoping doit could help with, but those are the tools I think of reading your post.

Note that celery uses RabbitMQ (or Redis) in the background, so if you look at celery and it seems like a good fit, probably no need to dig deeper into those other things.

Cheers -Terry

Sasha Kacanski

Jan 17, 2022, 3:29:47 PM

to python-doit

Thanks Terry, you got it right!

Yeah, I will look at Celery. I do not want too complicated a queue bus. I used to do this with Pyro4 because of its ability to shovel a Python object (callable) with some arguments to a remote worker.

FireWorks is perfect for that, but the implementation of PyTask worries me a bit. In general, I want to serialize configurations to Python objects, stuff them into a passive queue, and act on that queue from the worker side.

A MongoDB queue is relatively simple; Celery claims to support it, but we will see...

I will check doit for dependency management...
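
For the dependency-management side, a minimal `dodo.py` might look like the sketch below. It assumes doit's usual conventions (functions named `task_*` that return or yield dicts with `actions`, `file_dep`, and `targets`); the job list, file names, and the helpers `run_job` and `collect` are hypothetical and only serve the illustration.

```python
# dodo.py: hypothetical sketch; the job list and file names are made up.

JOBS = ["a", "b", "c"]  # stand-in for the configurations the driver produces


def run_job(name, target):
    """Placeholder for the real work: writes one payload file."""
    with open(target, "w") as fh:
        fh.write(f"result for {name}\n")


def task_job():
    """One independent sub-task per job."""
    for name in JOBS:
        target = f"out_{name}.txt"
        yield {
            "name": name,
            "actions": [(run_job, [name, target])],
            "targets": [target],
        }


def collect(inputs, target):
    """Concatenate every job's payload into a single summary file."""
    with open(target, "w") as out:
        for path in inputs:
            with open(path) as fh:
                out.write(fh.read())


def task_collect():
    """Depends on every job's output file, so it runs after the jobs."""
    inputs = [f"out_{name}.txt" for name in JOBS]
    return {
        "actions": [(collect, [inputs, "summary.txt"])],
        "file_dep": inputs,
        "targets": ["summary.txt"],
    }
```

Running `doit -n 4` would execute the independent sub-tasks in parallel on the local machine. That covers ordering and local parallelism only; distributing tasks across worker nodes would still need a separate queue such as the one Terry suggests.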
