Running a pipeline in a single process

Luke Zulauf

Sep 26, 2017, 12:55:40 PM
to Google App Engine Pipeline API
Hi all,

We typically have two scenarios where we would like a graph of jobs to run; the first is on our servers in response to an event; the second is a developer running on their local machine via the remote api.

For the first case, setting up a pipeline is ideal, but for the second case we'd like to be able to avoid the limitations of running a task on the server (i.e. the time and memory limits).

Currently we need to write the work two separate ways - once as a pipeline and once as a serial (or parallel using threads/subprocesses/etc.) script.
Is there a built-in way to run a pipeline without using taskqueue as the backend? This would allow us to write the logic only once (as a pipeline) while keeping flexibility in how it is executed.

Thanks in advance!
Luke

wTyeRogers

Sep 26, 2017, 6:02:01 PM
to Google App Engine Pipeline API
For my response, I'm assuming you're using the App Engine Standard environment. If you're using the Flexible environment, mention that in response to this and I'll return with more relevant options & resources.
 
> for the second case we'd like to be able to avoid the limitations of running a task on the server (i.e. the time and memory limits).
> [...]
> Is there a built in way to run a pipeline without using taskqueue as the backend?


It sounds like you're running up against the 10-minute task deadline of a service under automatic scaling. You could configure the service with manual or basic scaling instead, which raises that deadline to 24 hours. (If your task takes longer than 24 hours to complete, there are additional options as well, but I'd consider that a potential code smell and would want to check whether the work should be broken down into smaller parts.) Here's an overview of the various scaling types that services support.

Regarding the memory limits, it sounds like you're running up against another default configuration constraint. There are other instance classes available that have 8x the memory (and much, much more on the Flexible environment, which makes use of Google Compute Engine's machine types - will 240 GB RAM suffice for you? ;).


--
Tye

Luke Zulauf

Sep 27, 2017, 12:42:22 PM
to Google App Engine Pipeline API
Basic scaling is an interesting option that I hadn't considered before.

Going through the code, I found that start_test() bypasses the taskqueue and runs each task serially. That, combined with overriding run_test() in some of our parallel generator pipelines, will allow us to make use of other parallelism constructs when running locally.
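
For anyone reading later, here is a rough sketch of that pattern. It deliberately does not import the Pipeline library: MiniStage, Square, SquareAllSerial and SquareAllThreaded are hypothetical stand-ins, and the dispatch rule (use run_test() when a stage defines it, otherwise run()) is a simplified assumption about the in-process path, not the library's actual implementation. The thread pool is just one possible local parallelism construct.

```python
import inspect
from concurrent.futures import ThreadPoolExecutor


class MiniStage(object):
    """Hypothetical stand-in for a pipeline stage (not the real Pipeline class)."""

    def __init__(self, *args):
        self.args = args

    def run(self, *args):
        raise NotImplementedError

    def start_test(self):
        # Assumed dispatch rule: prefer run_test() if the stage defines one.
        body = getattr(self, "run_test", None) or self.run
        outcome = body(*self.args)
        if inspect.isgenerator(outcome):
            # Generator stage: evaluate each yielded child serially, in-process.
            return [child.start_test() for child in outcome]
        return outcome


class Square(MiniStage):
    def run(self, x):
        return x * x


class SquareAllSerial(MiniStage):
    """Generator stage: fans out one child per input."""

    def run(self, numbers):
        for n in numbers:
            yield Square(n)


class SquareAllThreaded(SquareAllSerial):
    """Same graph, but the local-only hook fans out on a thread pool."""

    def run_test(self, numbers):
        children = [Square(n) for n in numbers]
        with ThreadPoolExecutor(max_workers=4) as pool:
            # pool.map preserves input order, so results line up with numbers.
            return list(pool.map(lambda child: child.start_test(), children))


serial = SquareAllSerial([1, 2, 3]).start_test()
threaded = SquareAllThreaded([1, 2, 3]).start_test()
print(serial, threaded)
```

The run() definition is written once and stays what the server-side path would use; only the run_test() override changes how the fan-out executes locally.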
