How to design a JupyterLab extension that reacts to pandas function calls


Adam Rule

Sep 17, 2018, 6:40:33 PM
to Project Jupyter
I have noticed that a number of Jupyter users call df.head(), df.shape, df.describe(), or something similar almost every time they load or manipulate a dataframe to inspect what their manipulation did. I would like to develop an extension or kernel magic that prints useful information to a cell's output based on the pandas function called in that cell. For example, running pd.read_csv() might automatically print the shape and column names of the loaded dataframe and df.drop_duplicates() might automatically print how many duplicates were dropped and how many unique rows remain.

How might I architect such an extension (e.g., a JupyterLab extension, an IPython kernel magic, or something else)? I think I would need to detect when certain pandas functions are about to be run by the kernel and gather information about the dataframe immediately before and after execution. Would that even be feasible?

Brian Granger

Sep 17, 2018, 7:45:19 PM
to Project Jupyter
I think it could be regular Python code that uses the Jupyter display system to display the information. The challenge is figuring out how to detect the pandas function calls and add the needed logic before and after them. A good starting point might be to monkey-patch the relevant pandas calls, wrapping them in the logic you need. That would let you get started quickly and explore the problem space.
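A minimal sketch of the monkey-patching idea, for illustration only. The helper names `report_shape_change` and `patch_pandas` are hypothetical, and plain `print` stands in for the Jupyter display system; only `DataFrame.drop_duplicates` is wrapped here.

```python
import functools


def report_shape_change(method, report=print):
    """Wrap a DataFrame method so it reports row counts before and after."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        before = self.shape
        result = method(self, *args, **kwargs)
        # With inplace=True pandas returns None and mutates self instead.
        after = self.shape if result is None else result.shape
        report(f"{method.__name__}: {before[0]} -> {after[0]} rows "
               f"({before[0] - after[0]} dropped)")
        return result

    wrapper._reports_shape = True  # marker so the method is not wrapped twice
    return wrapper


def patch_pandas():
    """Hypothetical entry point: patch the pandas methods of interest once."""
    import pandas as pd  # imported lazily; the wrapper itself does not need pandas

    original = pd.DataFrame.drop_duplicates
    if not getattr(original, "_reports_shape", False):
        pd.DataFrame.drop_duplicates = report_shape_change(original)
```

After calling `patch_pandas()` once in a notebook, every `df.drop_duplicates()` call would print a one-line summary alongside the cell's normal output. The same wrapper could be applied to other methods that return a dataframe, and `report` could be swapped for a richer display function.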

--
You received this message because you are subscribed to the Google Groups "Project Jupyter" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jupyter+u...@googlegroups.com.
To post to this group, send email to jup...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/jupyter/9e1dfb92-87e0-4a8b-b170-370ab7cfabc7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
Brian E. Granger
Associate Professor of Physics and Data Science
Cal Poly State University, San Luis Obispo
@ellisonbg on Twitter and GitHub
bgra...@calpoly.edu and elli...@gmail.com

Adam Rule

Sep 17, 2018, 8:42:26 PM
to Project Jupyter
Great point, Brian. For prototyping's sake, monkey patching should be enough to test whether the interaction is valuable.

Michael Milligan

Sep 18, 2018, 11:56:34 AM
to Project
FYI, you might take a look at the JupyterLab variable inspector work being tracked here: https://github.com/jupyterlab/jupyterlab/issues/443

Seems like there could be some overlap with what you are trying to do.

Cheers,
Michael


--
Michael Milligan, Ph.D.         | Supercomputing Institute
Assistant Director for          | University of Minnesota
   Application Development      | mill...@umn.edu
www.msi.umn.edu/staff/milligan  | Phone: 612-624-8857

Adam Rule

Sep 18, 2018, 4:44:34 PM
to Project Jupyter
Thanks! I've been keeping an eye on that thread and agree, there is a lot of overlap.

Tony Fast

Sep 18, 2018, 7:47:21 PM
to Project Jupyter
You could likely achieve this with a custom profiler. The profiler would provide you with the calling function and the return value; you'd have to implement the other business logic. MonkeyType is a good example of a custom profiler. Have a look at its call tracer to understand the arguments a custom profiler receives. This approach would allow the hinting tool to work as a cell magic or context manager.
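A rough sketch of the profiler approach as a context manager, built on the standard library's `sys.setprofile`. The name `watch_calls` is hypothetical, and matching on the bare function name is a simplifying assumption: a real tool would also need to check the module or class so that unrelated functions with the same name are ignored.

```python
import sys
from contextlib import contextmanager


@contextmanager
def watch_calls(names, report=print):
    """Report (name, return value) for Python functions whose name is in `names`."""
    def profiler(frame, event, arg):
        # For "return" events, arg is the value the function is returning.
        # Calls into C functions arrive as "c_call"/"c_return" and are ignored.
        if event == "return" and frame.f_code.co_name in names:
            report((frame.f_code.co_name, arg))

    previous = sys.getprofile()
    sys.setprofile(profiler)
    try:
        yield
    finally:
        # Restore whatever profiler (if any) was active before.
        sys.setprofile(previous)
```

Inside a `with watch_calls({"drop_duplicates"}):` block, each watched function that returns would be passed to `report`, which is where the before/after summary logic would go. A cell magic could wrap the cell body in the same context manager.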

