Feedback on a new pipe "protocol", a method for extensible method chaining

Tom Augspurger

May 17, 2015, 9:05:01 PM
to pyd...@googlegroups.com
Hi all,

We're looking to define and implement a new "protocol" for more easily piping data through method chains. You can see the discussion at

https://github.com/pydata/pandas/issues/10129

The short version is that instead of writing

# f, g, and h are functions that take and return a DataFrame
result = f(g(h(df), arg1=1), arg2=2, arg3=3)

You instead write

(df.pipe(h)
  .pipe(g, arg1=1)
  .pipe(f, arg2=2, arg3=3)
)

We're hoping that if people / library authors find this useful, they'll implement it themselves.
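To make the idea concrete for library authors, here is a minimal sketch of what adopting the protocol could look like, assuming `pipe` does nothing more than call `func(self, *args, **kwargs)`. The `Pipeable` and `Table` classes and the bodies of `h`, `g`, and `f` are hypothetical stand-ins, not pandas code.

```python
class Pipeable:
    """Hypothetical mixin: any class inheriting from it gains .pipe()."""

    def pipe(self, func, *args, **kwargs):
        # Pass the object itself as the first positional argument.
        return func(self, *args, **kwargs)


class Table(Pipeable):
    """Toy container standing in for a DataFrame."""

    def __init__(self, rows):
        self.rows = list(rows)


# Stand-ins for f, g, and h: each takes a Table and returns a new Table.
def h(t):
    return Table(r for r in t.rows if r >= 0)


def g(t, arg1):
    return Table(r + arg1 for r in t.rows)


def f(t, arg2, arg3):
    return Table(r * arg2 + arg3 for r in t.rows)


t = Table([-1, 0, 2])

# Nested form: reads inside-out.
nested = f(g(h(t), arg1=1), arg2=2, arg3=3)

# Piped form: reads top to bottom, in execution order.
chained = (t.pipe(h)
            .pipe(g, arg1=1)
            .pipe(f, arg2=2, arg3=3)
)

assert nested.rows == chained.rows
```

The chain only keeps going while each function returns an object that itself has a `pipe` method.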


Feedback welcome.

Paul Hobson

May 18, 2015, 1:19:38 AM
to pyd...@googlegroups.com
This looks really cool. I'm hopeful it'll make its way into pandas and lead to adoption across the scipy/pydata stack.
-p

John E

May 18, 2015, 9:15:34 AM
to pyd...@googlegroups.com
Yeah, seems really cool.  An example that jumped to mind for me was a sequence of numpy.where's.  Will that fit into the concept given the output is an array?

Could I do this?

df.pipe( np.where( df.x >  0, 1, 0 ))
  .pipe( np.where( df.x > 10, 2, df.x ))

That would be a big improvement over nesting or having to create temp variables.

John E

May 18, 2015, 9:21:53 AM
to pyd...@googlegroups.com
Argh, I really screwed up the syntax in my previous post.

More like this?
 
df.pipe( np.where, df.x >  0, 1, 0 )
  .pipe( np.where, df.x > 10, 2, df.x )

I can't quite figure out how to do it, but maybe it doesn't work for numpy or I'd have to wrap each np.where in a Series or DataFrame?

Tom Augspurger

May 18, 2015, 9:31:17 AM
to pyd...@googlegroups.com


On Monday, May 18, 2015 at 8:21:53 AM UTC-5, John E wrote:

> More like this?
>
> df.pipe( np.where, df.x >  0, 1, 0 )
>   .pipe( np.where, df.x > 10, 2, df.x )
>
> I can't quite figure out how to do it, but maybe it doesn't work for numpy or I'd have to wrap each np.where in a Series or DataFrame?

That would work if numpy adopted the protocol.


FYI DataFrames also have a `where` method, but it's slightly different than numpy's. I'm not sure if it would work for your example.
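
A hedged note on the quoted chain: assuming `pipe` simply calls `func(df, *args, **kwargs)`, `df.pipe(np.where, cond, 1, 0)` would hand the DataFrame to `np.where` as its first argument, and `np.where` returns a plain array with no `pipe` method, so the chain stops there. One possible workaround is a small wrapper that returns a DataFrame. In this sketch `set_where` is a hypothetical helper (not part of pandas or numpy), the data is made up, and it assumes a pandas version that has `DataFrame.pipe`.

```python
import numpy as np
import pandas as pd


def set_where(df, column, cond, if_true, if_false):
    """Hypothetical helper: return a copy of df with `column` filled by np.where.

    cond, if_true and if_false may be callables taking the DataFrame, so each
    step sees the output of the previous step rather than the original df.
    """
    def resolve(value):
        return value(df) if callable(value) else value

    out = df.copy()
    out[column] = np.where(resolve(cond), resolve(if_true), resolve(if_false))
    return out


df = pd.DataFrame({"x": [-5, 3, 20]})  # made-up data

result = (
    df.pipe(set_where, "x", lambda d: d.x > 10, 2, lambda d: d.x)
      .pipe(set_where, "x", lambda d: d.x > 0, 1, 0)
)
```

Because the helper returns a DataFrame each time, further `.pipe` calls can follow, and the original `df` is left untouched.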

Michael Hooreman

May 18, 2015, 11:01:17 AM
to pyd...@googlegroups.com
Hello,

Sounds like a cool feature, indeed.

I'd love to have this available with an indexing feature as well. For more clarity (yes, habits...), I'd prefer, instead of np.where, passing a "where" argument that is a boolean array of the same length as the original data frame:

filt = df['x'] == y
filt &= df['y'] < 10
df.pipe(function, *args, where=filt)

Also, to deal with side effects, it would be great to add an inplace argument that defaults to True. With inplace=False, the pipe would work on a copy, so functions with side effects would be safe to use, etc.

But ... that's only my opinion.

Best regards.

John E

May 18, 2015, 11:49:29 AM
to pyd...@googlegroups.com

> FYI DataFrames also have a `where` method, but it's slightly different than numpy's. I'm not sure if it would work for your example.

Yeah, thanks, I am aware of pandas 'where' (but I hardly ever have a use for it, to be honest). I think 'np.where' is a lot more flexible and powerful b/c you can adjust one column depending on another column (or columns), and do this repeatedly.  It's not really possible to do that with 'pd.where' with the same readability and flexibility.  Actually, in my simple example here it would be possible, but in other cases it wouldn't (e.g. if the new column is a function of multiple other columns).

Anyway, I don't mean to complain about that.  This pipe method sounds like a good move overall and in the meantime I am fine implementing multiple np.where statements via temp variables.  But if I were wishing out loud, I wish pandas had a method like np.where.  But maybe I can do this with 'assign' although I just spent 15 minutes trying to combine assign and np.where and completely failed.  ;-)
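
For what it's worth, one way that 'assign' and np.where might be combined is to pass 'assign' a callable, which receives the DataFrame as it stands at that point in the chain. This is only a sketch with made-up data and column names ("sign" and "capped" are illustrative), not a recommendation from the pandas developers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [-5, 3, 20], "y": [1, 1, 0]})  # made-up data

result = (
    # Each lambda receives the DataFrame produced by the previous step.
    df.assign(sign=lambda d: np.where(d.x > 0, 1, 0))
      # The new column may depend on several existing columns.
      .assign(capped=lambda d: np.where((d.x > 10) & (d.y == 0), 2, d.x))
)
```

Each 'assign' returns a new DataFrame, so no temp variables are needed and 'df' itself is not modified.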