Hi All,
I wanted to share the following white paper (abstract below), which details our team's work to systematically measure the code health of Jupyter notebooks from within an nbgallery instance. As background, nbgallery is an enterprise Jupyter notebook sharing and collaboration platform developed within the Department of Defense. Our team operates in a unique environment that requires some interesting and unconventional approaches to evaluating code health. For instance, traditional unit testing is a challenge for us: we use dynamic data sets with row-level security (so the data landscape shifts by the day and by the user), and many of our notebook contributors are not typical software developers with experience or interest in writing unit tests. We think some of our practical approaches to evaluating code health in a complex notebook environment might extend to projects like JupyterHub and Binder.
Please check out the paper and let us know if you have any questions or comments.
Thanks!
Dave
Abstract:
Systems that support user-developed code are faced with a key challenge: understanding the health of that code, which we define as the expectation that existing code will function properly in the current environment. The growing popularity of Jupyter notebooks has led to the development of publishing and execution platforms such as the open-source nbgallery project. Users of nbgallery would like to understand when they can expect a notebook to work, and notebook authors may wish to monitor the execution of their code and be informed of errors. This paper describes our initial efforts to measure code health in a corpus of notebooks within an instance of nbgallery. Our vision is that this work will help address problems that arise from user-developed code and motivate further study in systems beyond Jupyter and nbgallery.