Guidance on how to clean up old pipeline entities safely?

Frank Wilson

Feb 25, 2016, 6:54:23 AM
to Google App Engine Pipeline API
Hi, 

Is there any guidance on how to safely clean up entities left behind by previous pipeline runs? I'm referring to entities whose kinds start with _AE_. I'm worried about accumulating an ever-growing set of data I no longer need.

Thanks,

Frank

Arie Ozarov

Feb 25, 2016, 1:01:45 PM
to app-engine-...@googlegroups.com
1. You can use the Pipeline UI to delete old jobs.

2. You can use the Pipeline API (if you need listing functionality like the UI's, have a look at this).

Arie | Ozarov | oza...@google.com | 415-624-6429


Frank Wilson

Feb 25, 2016, 2:52:42 PM
to app-engine-...@googlegroups.com
Hi,

Thanks for your reply, Arie. I forgot to mention that I am using the Python Pipeline/MapReduce APIs; perhaps the Java implementation is significantly different.

When I list jobs at ${baseurl}/_ah/pipeline/list, the list just keeps growing, and there doesn't seem to be any facility in the UI to delete old job data. I'm also worried that if I delete entities without carefully examining and understanding the source code, I might disrupt jobs that may be in progress.
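One way to address the worry about in-progress jobs is to delete only root pipelines that have clearly finished, and finished some time ago. A minimal sketch of such a filter, assuming each root pipeline is described by a status dict like the ones the Python library's listing produces, with a `status` string (`'done'`, `'aborted'`, ...) and an `endTimeMs` timestamp in milliseconds. Those key names and status strings are assumptions to verify against your copy of the library, and `is_safe_to_delete` is a hypothetical helper, not part of the library.

```python
import time

# Statuses treated as terminal. ASSUMPTION: these match the status strings
# reported by the Python pipeline library for finished/aborted pipelines.
TERMINAL_STATUSES = frozenset(['done', 'aborted'])

MS_PER_DAY = 24 * 60 * 60 * 1000


def is_safe_to_delete(summary, max_age_days, now_ms=None):
    """Return True if a root-pipeline status dict looks finished and old.

    summary: dict with (assumed) keys 'status' and 'endTimeMs'.
    max_age_days: only pipelines that ended at least this long ago qualify.
    now_ms: current time in milliseconds; defaults to the wall clock.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    if summary.get('status') not in TERMINAL_STATUSES:
        # Waiting, running or retrying pipelines are never selected.
        return False
    end_ms = summary.get('endTimeMs')
    if end_ms is None:
        # No recorded end time: be conservative and keep the pipeline.
        return False
    cutoff_ms = now_ms - max_age_days * MS_PER_DAY
    return end_ms <= cutoff_ms
```

Requiring both a terminal status and a minimum age gives a margin of safety: a pipeline that has only just completed (or whose status looks odd) is left alone until a later cleanup pass.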

The links you sent seem to require a google.com (i.e. Google intranet) login; are these the links you meant to send me? (It seems I can't use my google.com customer credentials, at any rate.)

Thanks,

Frank
--
FRANK WILSON
VP ENGINEERING
Memberoo

The Complete Membership Management Toolkit

t: 01225 581599
a: 30-32 Westgate Buildings, Bath, BA1 1EF

Arie Ozarov

Feb 25, 2016, 4:04:08 PM
to app-engine-...@googlegroups.com
Apologies for the broken links. Here are the public links:


I am not that familiar with the Python code, but it looks like this is how you list the root pipelines: https://github.com/GoogleCloudPlatform/appengine-pipelines/blob/master/python/src/pipeline/pipeline.py#L3215 and this is how you would clean up/delete a pipeline: https://github.com/GoogleCloudPlatform/appengine-pipelines/blob/master/python/src/pipeline/pipeline.py#L2741 (unfortunately, it does not look like the pipeline API has a nice way to do that).
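Following those two pointers, a cleanup job could page through the root pipelines and ask the library to clean up each one that passes a safety check. A minimal sketch, with the two library calls injected as plain functions so the paging logic can be exercised without App Engine. The wiring in the trailing comment (`get_root_list`, `Pipeline.from_id(...).cleanup()`, and the `pipelines`/`cursor`/`pipelineId` keys) is our reading of the Python library, not a confirmed recipe; check it against the linked source. `cleanup_old_root_pipelines` and `my_filter` are hypothetical names.

```python
def cleanup_old_root_pipelines(list_roots, cleanup_by_id, should_delete,
                               page_size=50):
    """Page through root pipelines and schedule cleanup for selected ones.

    list_roots(cursor, count): returns a dict with a 'pipelines' list of
        status dicts and, when more results may exist, a 'cursor' value.
    cleanup_by_id(pipeline_id): schedules deletion of one root pipeline.
    should_delete(summary): returns True for pipelines that may be removed.

    Returns the ids passed to cleanup_by_id, in listing order.
    """
    scheduled = []
    cursor = None
    while True:
        page = list_roots(cursor, page_size)
        for summary in page.get('pipelines', []):
            if should_delete(summary):
                pipeline_id = summary['pipelineId']  # ASSUMED key name
                cleanup_by_id(pipeline_id)
                scheduled.append(pipeline_id)
        cursor = page.get('cursor')
        if not cursor:
            # No cursor returned: treat the listing as exhausted.
            break
    return scheduled


# Hypothetical wiring inside an App Engine handler (ASSUMPTION: names taken
# from the Python library linked above; the import path may differ):
#
#   from pipeline import pipeline
#
#   def _list(cursor, count):
#       return pipeline.get_root_list(cursor=cursor, count=count)
#
#   def _cleanup(pipeline_id):
#       stage = pipeline.Pipeline.from_id(pipeline_id)
#       if stage is not None:
#           stage.cleanup()  # schedules deletion of this pipeline's records
#
#   cleanup_old_root_pipelines(_list, _cleanup, my_filter)
```

Injecting the library calls keeps the loop independent of the Datastore, so the selection rule can be tested with fake pages before it is pointed at production data.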

Nickolas Daskalou

Feb 25, 2016, 7:00:36 PM
to app-engine-...@googlegroups.com
Slightly off topic: if one were implementing a complex, async workflow on App Engine (Python), is this Pipeline API still the recommended way of doing things, or has it been superseded by something else?

Nick

Arie Ozarov

Feb 25, 2016, 7:21:57 PM
to app-engine-...@googlegroups.com
Using AE Pipelines is completely fine. Although control and maintenance were transferred to the open-source community, it is very useful and widely used.

AE Pipelines is also used by the AE MapReduce library, but if the workflow is around data processing, I would suggest looking at Google Cloud Dataflow instead.

Nickolas Daskalou

Feb 25, 2016, 7:43:08 PM
to app-engine-...@googlegroups.com
Thanks Arie.

Nick

Nickolas Daskalou

Feb 26, 2016, 10:10:05 PM
to app-engine-...@googlegroups.com
Arie,

Looks like the Dataflow Python SDK was just released - that's great!

How long do you think it will take for it to become "official" (like the Java SDK is)?

Nick

Arie Ozarov

Feb 27, 2016, 12:58:39 PM
to app-engine-...@googlegroups.com
I think the goal is to get more feedback and experience to polish the API, but you will get a better answer if you ask someone from that team.

Frank Wilson

Feb 29, 2016, 5:51:38 AM
to app-engine-...@googlegroups.com
Thanks for posting that, Nickolas! I definitely want to check out Dataflow for Python!

Frank