[ANN] nbconvert 5.3.0 — now with tag-based element filtering!

M Pacer

Sep 2, 2017, 12:01:59 AM

to jup...@googlegroups.com

We are pleased to announce the release of nbconvert 5.3.0!

It is available via PyPI (pip install nbconvert -U) and conda-forge (conda install nbconvert -c conda-forge).

This release has a number of bug fixes as well as docs, testing, and other miscellaneous improvements.

In addition, we're excited to share with everyone the news that nbconvert now supports tag-based element filtering!

Cell metadata tags let you filter cell-level elements. Doing so removes:

* cells with tags that match the tags in TagRemovePreprocessor.remove_cell_tags,
* inputs of cells with tags that match the tags in TagRemovePreprocessor.remove_input_tags,
* all outputs of cells with tags that match the tags in TagRemovePreprocessor.remove_all_outputs_tags.

To remove individual outputs while leaving the others in place, you cannot use cell metadata tags; instead, you need to use output metadata tags (explained in more detail below). Doing so removes:

* outputs with tags that match the tags in TagRemovePreprocessor.remove_single_output_tags.

These different traitlets can be mixed and matched as desired.

A much more comprehensive explanation of how to use tag-based element filtering is included at the end of this announcement.

For more details about the full release, see the changelog.

We thank the following 9 authors who contributed 176 commits.

* Benjamin Ragan-Kelley

* Damián Avila

* M Pacer

* Matthias Bussonnier

* Michael Scott Cuthbert

* mpacer

* Patricia Hanus

* tdalseide

* Thomas Kluyver

Cheers,

the nbconvert team

Filtering Notebook Content

Users have long asked for a way to remove content from notebooks when converting them with nbconvert. There are lots of use cases for this kind of feature.

Sometimes content makes sense in interactive contexts but not when presenting the work to other people. For example, in a data analysis report for non-experts who don't know Python, it may not make sense to show any code. It could also make sense to show some code cells but not others: you might want to show data analyst colleagues how you processed the data, but not your import statements or plotting commands; or, when showing designer colleagues how you plotted your results, you might want to hide your analyses but show your plotting code.

It would be wasteful to recreate the same content just to generate a separate report for each of those cases; it should be possible to create each of those reports easily from the same source notebook. This release of nbconvert addresses these (and many other) use cases.

Global Content Filtering

In nbconvert 5.2 we introduced global content filtering, which allows you to remove every instance of a given kind of element: every input, every output, every code or markdown cell, or all of the input or output prompts. That release solved the filtering problem for many use cases.

**new** Tag-based Element Filtering

But global content filtering doesn't allow you to remove only some of your code: it removes every input, no matter what. To remove only some of the content, we need a way to specify which content should be removed.

This release allows you to specify which elements are to be removed through the use of tags.

What are tags?

Tags are strings that cannot contain spaces or commas. You can assign tags to cells and they will be accessible in the cell's metadata. See the nbformat cell metadata docs for more information.
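Concretely, a notebook file is JSON, and each cell's tags live in a list at cell["metadata"]["tags"]. The stdlib-only sketch below shows that location and checks the space/comma rule stated above. The helpers `is_valid_tag` and `add_tag` are hypothetical, not part of nbformat or the notebook; rejecting empty strings and skipping duplicate tags are additional assumptions of this sketch.

```python
import json

def is_valid_tag(tag: str) -> bool:
    """True if `tag` is non-empty and contains no spaces or commas."""
    return bool(tag) and " " not in tag and "," not in tag

def add_tag(cell: dict, tag: str) -> None:
    """Add `tag` to cell["metadata"]["tags"], creating the list if needed."""
    if not is_valid_tag(tag):
        raise ValueError(f"invalid tag: {tag!r}")
    tags = cell.setdefault("metadata", {}).setdefault("tags", [])
    if tag not in tags:  # assumption: keep each tag at most once
        tags.append(tag)

# A hand-written code cell in nbformat 4 layout (not produced by Jupyter).
cell = {"cell_type": "code", "metadata": {}, "source": "import pandas as pd",
        "outputs": [], "execution_count": None}
add_tag(cell, "to_remove")
print(json.dumps(cell["metadata"]))  # {"tags": ["to_remove"]}
```

Calling add_tag(cell, "to remove") would raise ValueError because of the space.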

Tag Toolbar

In Jupyter Notebook 5.0 we introduced the tag toolbar, which lets you add tags to cells and remove them from within the notebook interface.

The basic workflow for adding tags to a single cell and removing them is demonstrated in the GIF below:

[GIF: adding and removing tags on a single cell with the tag toolbar]

Using tags for filtering cells

The simplest elements to remove are whole cells, so we'll describe how to do that first.

If you wanted to remove some cells from your notebook RemoveElements.ipynb, you would apply the same tag to each of those cells. Let's say that this tag is called to_remove.

If you wanted to remove just those cells and convert the notebook to a static HTML page, you would use the following command:

jupyter nbconvert RemoveElements --TagRemovePreprocessor.remove_cell_tags={\"to_remove\"}

NB: you need the curly brackets and the escaped quotes so that the value is interpreted as a Python set. You can avoid this complication by using an external config file containing the line c.TagRemovePreprocessor.remove_cell_tags.add("to_remove"). The same holds for all of the other traitlet values. For more on passing configuration values, see the traitlets documentation on configurable objects.
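To see why the escaping matters, the sketch below mimics what a POSIX shell hands to nbconvert (via shlex.split) and then reads the value as a Python literal (via ast.literal_eval). It only illustrates the set-literal syntax; traitlets has its own parsing code, which may differ in details. `parse_set_option` is a hypothetical helper.

```python
import ast
import shlex

def parse_set_option(command_line: str) -> set:
    """Split `command_line` as a POSIX shell would, take its single
    --Class.trait=value argument, and evaluate the value as a Python literal."""
    (arg,) = [a for a in shlex.split(command_line) if a.startswith("--")]
    _, _, value = arg.partition("=")
    return ast.literal_eval(value)

escaped = r'jupyter nbconvert RemoveElements --TagRemovePreprocessor.remove_cell_tags={\"to_remove\"}'
print(parse_set_option(escaped))  # {'to_remove'}

# Without the backslashes the shell strips the quotes, leaving {to_remove},
# which is not a valid literal.
unescaped = 'jupyter nbconvert RemoveElements --TagRemovePreprocessor.remove_cell_tags={"to_remove"}'
try:
    parse_set_option(unescaped)
except ValueError as err:
    print("rejected:", type(err).__name__)  # rejected: ValueError
```

Wrapping the value in single quotes instead, as in '{"to_remove"}', produces the same argument as the backslash form.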

Using tags for filtering inputs

In the examples we gave, we wanted to remove some inputs while leaving their outputs (for example, to show plots without the plotting code). To remove only the inputs, we change which traitlet holds the tags: instead of adding the tag to remove_cell_tags, we add it to remove_input_tags. Then all of the cells with tags matching remove_input_tags will have their inputs removed.

For example, to remove the inputs of the cells tagged to_remove, we would run:

jupyter nbconvert RemoveElements --TagRemovePreprocessor.remove_input_tags={\"to_remove\"}

Using tags for filtering all of a cell's outputs

Cells can have multiple outputs. Cell metadata operates at the cell level, so when we say we want a cell's outputs removed, we mean all of its outputs. Again we use a different traitlet, this time adding the tag to remove_all_outputs_tags instead of remove_cell_tags.

For example, if we wanted to remove all the outputs of those cells tagged with to_remove we would use:

jupyter nbconvert RemoveElements --TagRemovePreprocessor.remove_all_outputs_tags={\"to_remove\"}
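All three cell-level traitlets come down to asking whether a cell's metadata tags intersect a set of tags. Below is a minimal, stdlib-only model of that behavior on a notebook held as a plain dict. It is a sketch of the semantics described above, not nbconvert's TagRemovePreprocessor: in particular, it simply empties a hidden input's source, whereas nbconvert may hide inputs by other means. `filter_cells` is a hypothetical name.

```python
def cell_tags(cell: dict) -> set:
    """Tags stored at cell["metadata"]["tags"], as a set (empty if absent)."""
    return set(cell.get("metadata", {}).get("tags", []))

def filter_cells(nb: dict, remove_cell_tags=frozenset(),
                 remove_input_tags=frozenset(),
                 remove_all_outputs_tags=frozenset()) -> dict:
    """Return a copy of `nb` with the three cell-level tag filters applied."""
    kept = []
    for cell in nb["cells"]:
        tags = cell_tags(cell)
        if tags & remove_cell_tags:        # drop the whole cell
            continue
        cell = dict(cell)                  # shallow copy; leave `nb` untouched
        if tags & remove_input_tags:       # hide the input, keep the outputs
            cell["source"] = ""
        if cell["cell_type"] == "code" and tags & remove_all_outputs_tags:
            cell["outputs"] = []           # drop every output of the cell
        kept.append(cell)
    return {**nb, "cells": kept}

nb = {"cells": [
    {"cell_type": "code", "metadata": {"tags": ["to_remove"]},
     "source": "import numpy as np", "outputs": []},
    {"cell_type": "code", "metadata": {},
     "source": "print(1)", "outputs": [{"output_type": "stream", "text": "1\n"}]},
]}
print(len(filter_cells(nb, remove_cell_tags={"to_remove"})["cells"]))  # 1
print(repr(filter_cells(nb, remove_input_tags={"to_remove"})["cells"][0]["source"]))  # ''
```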

Using tags for filtering single outputs

It is possible to remove individual outputs, but that needs to be specified in the individual outputs' metadata, not the cells' metadata.

So, to remove only a single output using the to_remove tag, we need to give that output's metadata a tags field whose array includes to_remove. The easiest way to do this is with IPython's display function, which is a builtin as of IPython 5.4 and 6.1. We use display because it takes an optional metadata argument for setting output metadata.

Thus if in RemoveElements.ipynb we had a code cell with the following source:

display("hello", metadata={"tags":["to_remove"]})

display("goodbye", metadata={})

and we executed it, then converted it to markdown and displayed the result using

jupyter nbconvert RemoveElements --to markdown && cat RemoveElements.md

we'd see

```python
display("hello", metadata={"tags":["to_remove"]})
display("goodbye", metadata={})
```

'hello'

'goodbye'

versus

jupyter nbconvert RemoveElements --to markdown --TagRemovePreprocessor.remove_single_output_tags={\"to_remove\"} && cat RemoveElements.md

```python
display("hello", metadata={"tags":["to_remove"]})
display("goodbye", metadata={})
```

'goodbye'
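Output-level filtering works the same way one level down: each output's own metadata is checked, and only the matching outputs are dropped. Below is a stdlib sketch of that semantics (again, not nbconvert's implementation), applied to a hand-written approximation of the executed cell above; the JSON that IPython actually writes may contain additional fields. `filter_outputs` is a hypothetical name.

```python
def output_tags(output: dict) -> set:
    """Tags at output["metadata"]["tags"]; outputs without metadata have none."""
    return set(output.get("metadata", {}).get("tags", []))

def filter_outputs(nb: dict, remove_single_output_tags: set) -> dict:
    """Return a copy of `nb` keeping only outputs whose tags miss the filter set."""
    cells = []
    for cell in nb["cells"]:
        if "outputs" in cell:
            cell = {**cell, "outputs": [o for o in cell["outputs"]
                                        if not output_tags(o) & remove_single_output_tags]}
        cells.append(cell)
    return {**nb, "cells": cells}

# Hand-written approximation of the example cell: two display_data outputs,
# the first carrying {"tags": ["to_remove"]} in its output metadata.
nb = {"cells": [{
    "cell_type": "code", "metadata": {},
    "source": 'display("hello", metadata={"tags":["to_remove"]})\n'
              'display("goodbye", metadata={})',
    "outputs": [
        {"output_type": "display_data", "data": {"text/plain": "'hello'"},
         "metadata": {"tags": ["to_remove"]}},
        {"output_type": "display_data", "data": {"text/plain": "'goodbye'"},
         "metadata": {}},
    ],
}]}
remaining = filter_outputs(nb, {"to_remove"})["cells"][0]["outputs"]
print([o["data"]["text/plain"] for o in remaining])  # ["'goodbye'"]
```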

Mixing and matching: more complicated filtering workflows

You can use all of these traitlets at the same time to interesting effect. In some cases you will want to use different tags to filter different kinds of information, especially if those kinds of information could conflict. On the other hand, you might want to use the same tags for different kinds of information if they are complementary. Consider the following example.

Suppose you had a notebook ComplicatedFilteringExample.ipynb that can produce two completely different kinds of reports, depending on which of two possible code paths it takes. One code path consists of cells that are all tagged with A, and the other consists of cells that are all tagged with B. In both code paths, some cells create beautiful plots with less-than-beautiful code; those cells are also tagged with hide_plot_code (in addition to A or B). Additionally, some cells must be run to set up the data, but they print a lot of logs that you want to remove; those cells are tagged with hide_noisy_logs. Finally, some cells at the end provide results for both code paths (so they are tagged with neither A nor B), but their individual outputs are tagged with either A or B. You could then generate the two versions of the report by running the following commands:

jupyter nbconvert ComplicatedFilteringExample --output report_A --to markdown \
--TagRemovePreprocessor.remove_cell_tags={\"B\"} \
--TagRemovePreprocessor.remove_input_tags={\"hide_plot_code\"} \
--TagRemovePreprocessor.remove_all_outputs_tags={\"hide_noisy_logs\"} \
--TagRemovePreprocessor.remove_single_output_tags={\"B\"}

jupyter nbconvert ComplicatedFilteringExample --output report_B --to markdown \
--TagRemovePreprocessor.remove_cell_tags={\"A\"} \
--TagRemovePreprocessor.remove_input_tags={\"hide_plot_code\"} \
--TagRemovePreprocessor.remove_all_outputs_tags={\"hide_noisy_logs\"} \
--TagRemovePreprocessor.remove_single_output_tags={\"A\"}

Or if you wanted to use config files, you could have configA.py:

cat configA.py

c.NbConvertApp.output_base = "report_A"

c.NbConvertApp.export_format = "markdown"

c.TagRemovePreprocessor.remove_cell_tags.add("B")

c.TagRemovePreprocessor.remove_input_tags.add("hide_plot_code")

c.TagRemovePreprocessor.remove_all_outputs_tags.add("hide_noisy_logs")

c.TagRemovePreprocessor.remove_single_output_tags.add("B")

and configB.py:

cat configB.py

c.NbConvertApp.output_base = "report_B"

c.NbConvertApp.export_format = "markdown"

c.TagRemovePreprocessor.remove_cell_tags.add("A")

c.TagRemovePreprocessor.remove_input_tags.add("hide_plot_code")

c.TagRemovePreprocessor.remove_all_outputs_tags.add("hide_noisy_logs")

c.TagRemovePreprocessor.remove_single_output_tags.add("A")

Then you could run much shorter versions of the above commands and produce the same output:

jupyter nbconvert ComplicatedFilteringExample --config configA.py

jupyter nbconvert ComplicatedFilteringExample --config configB.py
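configA.py and configB.py are mirror images of each other. As a sketch, both can be generated from one template so the shared filters cannot drift apart. `write_config` is a hypothetical helper, and the generated lines are exactly the ones shown above.

```python
from pathlib import Path

# Lines shared by configA.py and configB.py; {keep} and {drop} are filled in.
TEMPLATE = """\
c.NbConvertApp.output_base = "report_{keep}"
c.NbConvertApp.export_format = "markdown"
c.TagRemovePreprocessor.remove_cell_tags.add("{drop}")
c.TagRemovePreprocessor.remove_input_tags.add("hide_plot_code")
c.TagRemovePreprocessor.remove_all_outputs_tags.add("hide_noisy_logs")
c.TagRemovePreprocessor.remove_single_output_tags.add("{drop}")
"""

def write_config(keep: str, drop: str, directory: Path = Path(".")) -> Path:
    """Write config<keep>.py for the report that keeps `keep` and drops `drop`."""
    path = directory / f"config{keep}.py"
    path.write_text(TEMPLATE.format(keep=keep, drop=drop), encoding="utf-8")
    return path

if __name__ == "__main__":
    for keep, drop in [("A", "B"), ("B", "A")]:
        print("wrote", write_config(keep, drop))
```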

Wrapping up:

You can easily set cell metadata using the notebook's cell tag toolbar.

The cell-level traitlets match strings exactly against the ith cell's nb.cells[i].metadata.tags value.

The traitlets that filter at the cell level are:

TagRemovePreprocessor.remove_cell_tags
TagRemovePreprocessor.remove_input_tags
TagRemovePreprocessor.remove_all_outputs_tags

You can easily set output metadata using IPython's display function.

The output-level traitlet matches strings exactly against the ith cell's jth output's nb.cells[i].outputs[j].metadata.tags value.

And the traitlet that filters at the output level is:

TagRemovePreprocessor.remove_single_output_tags

The remove_*_tags traitlets are sets.

On the command line you need to use curly brackets {} and escape your quotes.

Via a config file, use the .add method.

For complicated collections of filters, it is usually easier to follow and reproduce if you create a config file.
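Because both tag locations are plain JSON, you can list which tags a notebook actually uses at each level before choosing filters. A minimal stdlib sketch, assuming an nbformat 4 notebook; `tag_inventory` and `load_notebook` are hypothetical helpers.

```python
import json
from collections import Counter

def tag_inventory(nb: dict) -> tuple:
    """Count tags at nb["cells"][i]["metadata"]["tags"] (cell level) and at
    nb["cells"][i]["outputs"][j]["metadata"]["tags"] (output level)."""
    cell_level, output_level = Counter(), Counter()
    for cell in nb["cells"]:
        cell_level.update(cell.get("metadata", {}).get("tags", []))
        for output in cell.get("outputs", []):
            output_level.update(output.get("metadata", {}).get("tags", []))
    return cell_level, output_level

def load_notebook(path: str) -> dict:
    """Read a .ipynb file as a plain dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

# Hand-written two-cell notebook for illustration.
example = {"cells": [
    {"cell_type": "code", "metadata": {"tags": ["A", "hide_plot_code"]}, "outputs": []},
    {"cell_type": "code", "metadata": {},
     "outputs": [{"output_type": "display_data", "metadata": {"tags": ["B"]}, "data": {}}]},
]}
cells, outputs = tag_inventory(example)
print(dict(cells), dict(outputs))  # {'A': 1, 'hide_plot_code': 1} {'B': 1}
```

For a real file, tag_inventory(load_notebook("ComplicatedFilteringExample.ipynb")) would report the cell-level and output-level tags separately, which maps directly onto the cell-level and output-level traitlets.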

Carol Willing

Sep 5, 2017, 5:05:30 AM

to jup...@googlegroups.com

Congrats M. Thank you to all the contributors for this release.

Warmly,

Carol

--
You received this message because you are subscribed to the Google Groups "Project Jupyter" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jupyter+u...@googlegroups.com.
To post to this group, send email to jup...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/jupyter/CAM3SX469%3DY_ZbevXm0VkWpE0JYuYs9HrwgJ9kFc6Q-ea6CbDKg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Damián Avila

Sep 5, 2017, 6:56:48 AM

to jup...@googlegroups.com

This is really exciting!

Thanks M for all your hard work. And thanks to all who contributed.

Cheers.

Damián Avila

Matt Craig

Sep 5, 2017, 9:50:09 PM

to Project Jupyter, mpace...@gmail.com

This looks really neat -- I'm starting to try to use notebooks to do enrollment projections for a committee I'm on but we'll need to produce what would essentially be filtered output of the results for wider dissemination.

Matt Craig

John Griffiths

Mar 29, 2018, 1:15:20 AM

to Project Jupyter

I've tried putting the examples above into notebooks and executing them, and I currently get either errors or no filtering.

Are there some worked notebook example files somewhere you could point to for this?