Exploring Code Health in a Corpus of Jupyter Notebooks


Dave

Jan 23, 2018, 3:06:44 PM1/23/18
to Project Jupyter
Hi All,

I wanted to share the following white paper (abstract below), which details our team's work to systematically measure the code health of Jupyter notebooks within an nbgallery instance.  As background, nbgallery is an enterprise Jupyter notebook sharing and collaboration platform developed within the Department of Defense.  Our team operates in a unique environment that requires some interesting and unconventional approaches to evaluating code health.  For instance, traditional unit testing is a challenge in our environment because we use dynamic data sets with row-level security (so the data landscape shifts by the day and by the user), and many of our notebook contributors are not typical software developers with experience in, or interest in, writing unit tests.  We think some of the practical approaches that arose from evaluating code health in a complex notebook environment might extend to projects like JupyterHub and Binder.

Please check out the paper and let us know if you have any questions/comments.

Thanks!

Dave

Systems that support user-developed code are faced with a key challenge: understanding the health of that code, which we define as the expectation that existing code will function properly in the current environment. The growing popularity of Jupyter notebooks has led to the development of publishing and execution platforms such as the open-source nbgallery project. Users of nbgallery would like to understand when they can expect a notebook to work, and notebook authors may wish to monitor the execution of their code and be informed of errors. This paper describes our initial efforts to measure code health in a corpus of notebooks within an instance of nbgallery. Our vision is that this work will help address problems that arise from user-developed code and motivate further study in systems beyond Jupyter and nbgallery.
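The abstract defines code health as the expectation that existing code will function properly in the current environment. As a rough illustration of one signal such a system could use (a hypothetical sketch, not the metric from the paper), a short Python script can scan the outputs saved in an .ipynb file for recorded tracebacks, since the notebook format stores failed executions as outputs with output_type "error":

```python
import json

def notebook_health(nb):
    """Return (total_code_cells, errored_code_cells) for a parsed .ipynb dict.

    A cell counts as errored if any saved output has output_type == "error",
    i.e. the last execution recorded a traceback.
    """
    code_cells = [c for c in nb.get("cells", []) if c.get("cell_type") == "code"]
    errored = sum(
        1
        for c in code_cells
        if any(o.get("output_type") == "error" for o in c.get("outputs", []))
    )
    return len(code_cells), errored

# A minimal notebook: one clean code cell, one that recorded a NameError,
# and a markdown cell (ignored). Normally you'd json.load() a .ipynb file.
nb = json.loads("""
{
  "cells": [
    {"cell_type": "code", "outputs": []},
    {"cell_type": "code",
     "outputs": [{"output_type": "error", "ename": "NameError",
                  "evalue": "name 'x' is not defined", "traceback": []}]},
    {"cell_type": "markdown", "source": []}
  ]
}
""")
total, errored = notebook_health(nb)
print(f"{errored}/{total} code cells errored")  # 1/2 code cells errored
```

This only sees errors from the last time the notebook was saved, of course; the paper's point is precisely that live execution telemetry is needed when the environment shifts under the code.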

Matthias Bussonnier

Jan 23, 2018, 3:44:43 PM1/23/18
to jup...@googlegroups.com
Hi Dave and the NbGallery team.

I'll add this to our to-read list! I think this might be of interest for JupyterCon; the Call for Proposals opened last week:

Happy to also see some Ruby notebooks!

Nice work! Thanks,
-- 
Matthias

--
You received this message because you are subscribed to the Google Groups "Project Jupyter" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jupyter+unsubscribe@googlegroups.com.
To post to this group, send email to jup...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/jupyter/c755afc5-30ad-4add-8df1-d1a790f0ee5b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Dave

Jan 23, 2018, 10:40:18 PM1/23/18
to Project Jupyter
Hi Matthias - sounds good.  We'll plan to submit a proposal and are looking forward to attending the conference again this year.

