Chaining dispatcher nodes


Carlo Giesa

Apr 3, 2026, 4:41:43 AM
to gaffer-dev
Hi there!

I'm facing a problem with our inhouse dispatcher and I'm unable to figure out what I'm doing wrong.

I have a very basic task node setup for testing:

PythonCommand --> Dispatcher --> PythonCommand --> Dispatcher

The dispatchers are of the same kind, and both PythonCommands just execute a print statement.

When I use the shipped local dispatcher nodes, I see the print statements of both python command nodes in the output. When I use our inhouse dispatcher, I don't reach the first python command task node.

I compared our code with that of the TractorDispatcher and LocalDispatcher, and I can't see any difference in the job-walking code with regard to the upstream dependencies. When I added some print statements to the LocalDispatcher code, I realized that we don't reach the first PythonCommand either. But how does it then get executed in my output?

I have the feeling that I'm missing a piece in how the dispatcher setup works. If anyone could shed some light on this, I would be a happy man.

I need this kind of setup since we have Box nodes that can work independently with a dispatcher inside of them, but I would also like to build a more complex workflow that contains those nodes inside with another dispatcher at the bottom of it.

Thanks,
Carlo

Daniel Dresser

Apr 4, 2026, 1:12:04 AM
to gaffer-dev
Dispatchers aren't really my area of the codebase, and I think there's been some rewriting of this code since I last looked at it in detail - someone more familiar can hopefully offer some better advice after the long weekend.

But I do recall having some kind of similar issues myself understanding the code path for simple execution vs how things get added to a TaskBatch and executed through that. You don't by any chance have "immediate" checked on any of the nodes in question, do you? That would cause them to be executed directly in Dispatcher::TaskBatch::preprocess, and would be a possible explanation for why you're not seeing the prints you expect.

-Daniel

Carlo Giesa

Apr 4, 2026, 11:28:31 AM
to gaffe...@googlegroups.com
Hey Daniel!

Thanks for your reply. As far as I can see, I have exactly the same setup, one with a local dispatcher and one with our custom dispatcher. No "immediate" checked on any task node. After looking into it further, my guess is that it's the 'execute' method of the dispatcher that generates the upstream tasks of that dispatcher. But I'm not sure how to handle this to submit jobs on the farm. I also checked the tests about nested dispatchers, but I didn't find further details that would help me out. In the meantime, I've worked around it by avoiding nested dispatchers, but I guess there should be a solution, which would be handy in situations where I can't work around it.

Greets,
Carlo

--
You received this message because you are subscribed to the Google Groups "gaffer-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gaffer-dev+...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/gaffer-dev/b51aba5b-e073-45b9-92be-f69f971c7173n%40googlegroups.com.

John Haddon

Apr 7, 2026, 8:59:24 AM
to gaffe...@googlegroups.com
Hi Carlo,
Are you able to share any details of your dispatcher? There's not much to go on here at the moment...
Cheers...
John

Carlo Giesa

Apr 7, 2026, 10:21:09 AM
to gaffer-dev
Hi John!

I attached our main python module that handles the dispatching on our side.

Greets,
Carlo
gafferdispatcher.py

John Haddon

Apr 8, 2026, 4:09:08 AM
to gaffe...@googlegroups.com
Thanks Carlo. I didn't see anything obviously wrong with that, but it's hard to be sure since a lot of it is abstracted away behind the `chore` stuff. You're right that it's `Dispatcher::execute()` that builds the tree of upstream tasks and passes them to `_doDispatch()`. More generally, all TaskNodes do their work via `execute()`, which is how one dispatcher is able to execute a nested dispatcher - because dispatchers are TaskNodes like any other.
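To make the walk concrete, here is a stdlib-only sketch with stand-in classes (not Gaffer's real API) of the pattern described above: pre-task batches execute before the batch itself, and a dispatcher batch is just another task whose execution kicks off its own, nested dispatch. The class and attribute names are invented for illustration.

```python
# Stand-in classes sketching the walk Dispatcher::execute() performs over
# the batch tree: pre-task batches run before the batch itself, and a
# dispatcher batch is an ordinary task whose execution triggers its own,
# nested dispatch.

class Batch:
    def __init__(self, name, pre_tasks=(), nested_root=None):
        self.name = name
        self.pre_tasks = list(pre_tasks)
        # Root batch of a nested dispatch, for dispatcher nodes.
        self.nested_root = nested_root

def execute_batch(batch, log):
    for pre in batch.pre_tasks:
        execute_batch(pre, log)
    log.append(batch.name)
    if batch.nested_root is not None:
        # Executing a dispatcher node runs its own dispatch. A custom
        # dispatcher that skips dispatcher batches never reaches this
        # subtree - the upstream PythonCommand in Carlo's graph.
        execute_batch(batch.nested_root, log)

# Carlo's graph: PythonCommand -> Dispatcher -> PythonCommand
first_command = Batch("PythonCommand1")
nested_dispatcher = Batch("NestedDispatcher", nested_root=first_command)
last_command = Batch("PythonCommand2", pre_tasks=[nested_dispatcher])

log = []
execute_batch(last_command, log)
print(log)  # ['NestedDispatcher', 'PythonCommand1', 'PythonCommand2']
```

Note that the first PythonCommand only appears in the log because the nested dispatcher batch was executed rather than skipped.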

Since your dispatcher appears to send jobs to the render farm, what I'd expect to see is this :
  • The downstream dispatcher creates a farm job with two tasks, one for the nested dispatcher and one for the last PythonCommand.
  • When the task for the nested dispatcher runs, it would create a new farm job, with just the task for the upstream PythonCommand.
Are you seeing anything like that?
Cheers...
John

Carlo Giesa

Apr 8, 2026, 4:19:06 AM
to gaffe...@googlegroups.com
Hi John!

Thanks for checking! I'm actually wondering if I should call 'Dispatcher::execute()' myself when hitting a dispatcher node. Right now, when creating 'chores', which are basically jobs that get executed on the farm, I skip dispatcher nodes in line 317. What do you think?

Greets,
Carlo

John Haddon

Apr 8, 2026, 4:42:16 AM
to gaffe...@googlegroups.com
On Wed, Apr 8, 2026 at 9:19 AM 'Carlo Giesa' via gaffer-dev <gaffe...@googlegroups.com> wrote:
Thanks for checking! I'm actually wondering if I should call 'Dispatcher::execute()' myself when hitting a dispatcher node. Right now, when creating 'chores', which are basically jobs that get executed on the farm, I skip dispatcher nodes in line 317. What do you think?

Oh, yes, I'd completely missed that! You're right. You shouldn't be skipping dispatchers - they should just be treated as TaskNodes like any other. Hopefully that fixes things for you?
Cheers...
John
 

John Haddon

Apr 8, 2026, 4:46:55 AM
to gaffe...@googlegroups.com
This is tangential to the main topic, but I couldn't help but notice this comment in the code.

```
# NOTE: Not sure exactly why, but the dispatcher's context doesn't
#       have any frame information. That's why we copy it from the
#       script context. The script context, on the other hand, is
#       missing any dispatcher context variables.
```

The reason the `TaskBatch.context()` doesn't have a frame number in it is because this information is provided by `TaskBatch::frames()`. We found several bugs where folks weren't handling the frame correctly (by setting it for the specific batch), so we removed the frame number to force people to think about it. So ideally you'd be calling `setFrame()` with the values from `batch.frames()` rather than falling back to the current frame from the script - that's the bug we were trying to avoid in the first place. This doesn't matter in many cases, but it would if a user were to keyframe your `taskMaxRunTime` plug for example.
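A minimal, stdlib-only sketch of that pattern (stand-in objects, not Gaffer's actual `Context`/`TaskBatch` API): the batch context carries no frame, so the dispatcher sets one explicitly for each entry in `batch.frames()` before evaluating anything frame-dependent. The `task_max_run_time` function below is a made-up stand-in for a keyframed plug.

```python
# The batch context has no frame; set it per frame from the batch's own
# frame list before evaluating frame-dependent values.

def evaluate_per_frame(batch_context, frames, evaluate):
    """Evaluate a frame-dependent value once per frame in the batch."""
    results = {}
    for frame in frames:
        context = dict(batch_context)  # copy the batch context
        context["frame"] = frame       # the moral equivalent of setFrame()
        results[frame] = evaluate(context)
    return results

# A hypothetical keyframed value, like the `taskMaxRunTime` example: it
# changes at frame 6, which a "current frame of the script" fallback
# would silently miss.
def task_max_run_time(context):
    return 60 if context["frame"] <= 5 else 120

limits = evaluate_per_frame({"job:name": "test"}, [4, 5, 6], task_max_run_time)
print(limits)  # {4: 60, 5: 60, 6: 120}
```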

Carlo Giesa

Apr 8, 2026, 4:54:05 AM
to gaffe...@googlegroups.com
Alright, I will test this. Thanks for the hint about the 'frames'. Will take a look at this as well. I'll keep you posted.


Carlo Giesa

Apr 8, 2026, 6:28:04 AM
to gaffe...@googlegroups.com
Mmh, doesn't seem to work. I double checked the shipped TractorDispatcher and I can't see any '.execute()' call either.

Carlo Giesa

Apr 8, 2026, 6:45:03 AM
to gaffe...@googlegroups.com
Ok, I got it working. I called '.execute()' on the dispatcher node, but I had to do it on its "task" component.

I'm still trying to wrap my head around your comment about the '.frames()' topic. If I remember correctly, I added the current frame of the script to the context because I had python expressions that got evaluated (probably from job name evaluation) that depend on it. Should this be handled differently, or is my current approach valid?

John Haddon

Apr 8, 2026, 7:42:37 AM
to gaffe...@googlegroups.com

I double checked the shipped TractorDispatcher and I can't see any '.execute()' call either.

TractorDispatcher builds `gaffer execute` command lines, which are what make the eventual call to `execute()`. We don't call `execute()` directly from TractorDispatcher, since the whole point is to move that work to the farm.
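To make that contrast concrete, here's a rough sketch of building such a command line so that `execute()` ultimately runs on the worker rather than in the submitting process. The flags mirror Gaffer's `execute` app (`-script` / `-nodes` / `-frames`), but the script path, node name, and frame list are made up for illustration; treat the exact shape as indicative, not definitive.

```python
# Build the command line a farm dispatcher would hand to the queue, so
# the call to execute() happens on the worker machine.

def build_execute_command(script, node_name, frames):
    return [
        "gaffer", "execute",
        "-script", script,
        "-nodes", node_name,
        "-frames", ",".join(str(f) for f in frames),
    ]

cmd = build_execute_command("/jobs/show/shot.gfr", "PythonCommand", [1, 2, 3])
print(" ".join(cmd))
# gaffer execute -script /jobs/show/shot.gfr -nodes PythonCommand -frames 1,2,3
```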

Ok, I got it working. I called '.execute()' on the dispatcher node, but I had to do it on its "task" component.

Right, calling `node["task"].execute()` is the correct way to execute a task immediately.
 
I'm still trying to wrap my head around your comment about the '.frames()' topic.

A TaskBatch is a request for a node to be executed across a specific range of frames in a particular context. To correctly fulfill this request you have to actively manage the frame number within the context, looping over each frame. At one point the `TaskBatch.context()` had a default frame number in it, which meant it was easy to forget to do this, because the consequences were less visible.
 
If I remember correctly, I added the current frame of the script to the context because I had python expressions that got evaluated (probably from job name evaluation) that depend on it. Should this be handled differently, or is my current approach valid?

Whenever you evaluate a plug value from the Dispatcher, you're on the hook for doing it with an appropriate context. For a TaskBatch, the most correct thing to do would be to evaluate it for all frames in the batch. But it wouldn't be surprising for a dispatcher to be unable to handle a value that changes within a batch, in which case the most proper thing to do might be to emit a warning. We take a less rigorous approach in TractorDispatcher, but it hopefully gives you the idea :

```
if tractorPlug is not None :
	with Gaffer.Context( batch.context() ) as batchContextWithFrame :
		# Tags and services cannot be varied per-frame within a batch, but we
		# provide the context variable as a concession to existing production
		# setups that would error without it.
		batchContextWithFrame["frame"] = min( batch.frames() )
		command.service = tractorPlug["service"].getValue()
```
 
So, in theory a user could animate `service` if for some crazy reason they needed a different machine to render specific frames. But they couldn't animate it to change within a batch.

Hope that helps...
Cheers...
John

Carlo Giesa

Apr 8, 2026, 8:10:02 AM
to gaffe...@googlegroups.com
Big thanks, John! That makes a lot of sense.

Just a little additional question related to the nested dispatchers in your Tractor setup: if I get it correctly, your upstream dispatcher would generate its jobs inside the farm job, right? In that case, those jobs wouldn't be in any kind of upstream job dependency, right?

In my approach, when creating my farm jobs (via the 'chores' system), I actually generate the upstream farm jobs when I encounter an upstream dispatcher, and append them to the job dependency chain. In the end, I get all jobs with correct dependencies, submitted all at once. Code-wise it looks a bit less elegant, but the tests that I ran on my side looked correct to me.

The following code is basically my new setup:

```
def _create_all_chores(self, script_node: Gaffer.ScriptNode) -> None:
    """Create all chores of the stored gaffer jobs.

    :param script_node: The Gaffer script node.
    """
    context = _get_context_from_environment(script_node)

    jobs_mapping: dict[UFXGafferJob, list[Chore]] = {}

    for job in self._jobs:
        if isinstance(job.gaffer_batch.node(), UFXDispatcher):
            node = job.gaffer_batch.node()
            node.do_submit = False

            node["task"].execute()

            chores = node.chores
            if chores:
                jobs_mapping[job] = chores
                self._chores.extend(chores)
        else:
            chore = self._create_chore(job, context)
            if chore is not None:
                jobs_mapping[job] = [chore]
                self._chores.append(chore)

    for gafferjob, chores in jobs_mapping.items():
        for chore in chores:
            self._setup_chore_dependencies(chore, gafferjob, jobs_mapping)
```


Greets,
Carlo


John Haddon

Apr 8, 2026, 9:27:48 AM
to gaffe...@googlegroups.com
On Wed, Apr 8, 2026 at 1:10 PM 'Carlo Giesa' via gaffer-dev <gaffe...@googlegroups.com> wrote:
Just a little additional question related to the nested dispatchers in your tractor setup: if I get it correctly, your upstream dispatcher would generate its jobs inside the farm job, right? In that case, those job wouldn't be in any kind of upstream job dependency, right?

That's right. Any nested TractorDispatcher job would be "fire and forget" - the downstream job wouldn't wait on it to finish. So it's useful for things like using a Wedge node to spit out a job per shot or a job per render pass to simplify wrangling. But you wouldn't use multiple TractorDispatchers for anything you wanted to be a single monolithic job.

I suppose potentially we could add a mode where the TractorDispatcher execution waits for the job to finish, but I'm not sure why that would be useful, and it would tie up a worker machine just waiting idly.
 
In my approach, when creating my farm jobs (via the 'chores' system), I actually generate the upstream farm jobs if I'm looping over an upstream dispatcher which I append in the job dependency chain. In the end, I get all jobs with correct dependencies that get submitted all at once. Code-wise, it looks a bit less elegant, but the tests that I ran on my side looked correct to me.

 It sounds like that might be necessary to deal with your custom nodes that have internal dispatchers, but it isn't really the way we designed things. Having nodes with internal dispatchers doesn't sound ideal to me, unless it's just a LocalDispatcher to group subtasks on one worker. Everything is more composable if any TaskNode can be used with any Dispatcher, allowing you to do things like change render farm without having to rewrite your TaskNodes or rewire your graphs. Your call of course - you know more about what's most important to your particular pipeline.

Cheers...
John
