nbformat: Removing code to read XML notebook files

Thomas Kluyver

Feb 11, 2019, 10:01:50 AM

to Project Jupyter

Hi all,

Way back in 2011, when the first version of the IPython Notebook was being written, there was an option to store notebooks as XML. JSON was chosen as the default, and has always been the format all of our applications use. However, the code to read XML files stayed around, and recently Danor Cohen pointed out that it is vulnerable to denial of service attacks.

I am proposing to *remove this code* rather than maintain it. None of our applications call it, and I'm not aware of anyone else using it. The XML format is, as far as I know, an abandoned concept.

If you are using that code (nbformat.v2.*_xml), and you can't practically move away from it, now would be a good time to take over maintenance of it. ;-)

The removal PR is here: https://github.com/jupyter/nbformat/pull/133

Thanks,

Thomas

Matthew Seal

Feb 11, 2019, 4:41:26 PM

to jup...@googlegroups.com

I'm 100% for removing -- I can't think of any recent tools that even support xml. Thanks for making the PR!

--
You received this message because you are subscribed to the Google Groups "Project Jupyter" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jupyter+u...@googlegroups.com.
To post to this group, send email to jup...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/jupyter/CAOvn4qh_0JMrZTLweN21eTxKX54q6uy7O%2BErWc6i2PQ8f3r3BA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Brian Granger

Feb 11, 2019, 5:27:30 PM

to Project Jupyter

+1

--

Brian E. Granger
Associate Professor of Physics and Data Science
Cal Poly State University, San Luis Obispo
@ellisonbg on Twitter and GitHub
bgra...@calpoly.edu and elli...@gmail.com

Chris Holdgraf

Feb 11, 2019, 5:43:06 PM

to jup...@googlegroups.com

The only major potential stakeholders that (I think) still care about XML are large-scale publishers, but I don't think there's been any official adoption of Jupyter Notebooks there anyway, so it's probably fine. Thanks for bringing this up, Thomas! I'm +1 as well!

Fernando Perez

Feb 12, 2019, 3:16:29 AM

to Project Jupyter

+1 for removal - I think that if/when we tackle the problem of publishers, a cleaner approach would probably be to output XML versions of notebook content strictly tailored to their data/metadata schemas (and thus not meant for *ingestion* by Jupyter, only as output).

--

Fernando Perez (@fperez_org; http://fperez.org)
fperez.net-at-gmail: mailing lists only (I ignore this when swamped!)
fernando.perez-at-berkeley: contact me here for any direct mail

Thomas Kluyver

Feb 12, 2019, 9:11:40 AM

to Project Jupyter

Thanks all, I've merged the PR.

Agreed, this doesn't preclude converting notebooks to XML, but we probably wouldn't use the removed code for that anyway. And we can always resurrect this code if we realise that it is useful for something.

Samuel Lelièvre

Feb 14, 2019, 4:28:15 PM

to Project Jupyter

Related: the latest version of nbconvert added defusedxml as a dependency.

Matthew Seal

Feb 15, 2019, 8:56:30 PM

to jup...@googlegroups.com

Do you think we should remove the functionality from nbconvert to simplify things there too?

Chris Holdgraf

Feb 15, 2019, 9:55:51 PM

to jup...@googlegroups.com

Maybe deprecate it for a release cycle and see if anybody complains once they see the warning?

(also I am, in general, always in favor of simplifying things in nbconvert :-) )
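
For illustration, a minimal sketch of the deprecate-first approach suggested here, using Python's standard `warnings` module. The function name `legacy_xml_filter` and its body are hypothetical stand-ins, not an actual nbconvert API:

```python
import warnings


def legacy_xml_filter(text):
    """Hypothetical stand-in for a code path that is a candidate for removal."""
    warnings.warn(
        "legacy_xml_filter is deprecated and may be removed in a future "
        "release; please speak up if you depend on it.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return text.strip()  # placeholder for the real behaviour
```

Note that Python hides `DeprecationWarning` by default unless it is attributed to code running in `__main__`, so users calling through another library may only see it with warnings enabled (for example `python -W default`).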

Thomas Kluyver

Feb 16, 2019, 10:23:43 AM

to Project Jupyter

Nbconvert doesn't have the same functionality for reading XML notebook files. It was using the standard library ElementTree module to parse HTML for a couple of filters, and this was switched to defusedxml to prevent attacks where a (JSON) notebook containing maliciously crafted HTML was sent to something like Nbviewer. I didn't check how easily exploitable this was; it seemed reasonable to assume it could be exploited, and the fix was easy.

You can see the changes to nbconvert here: https://github.com/jupyter/nbconvert/pull/708
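
To make the kind of guard described above concrete, here is a standard-library-only sketch that refuses any input declaring XML entities before handing it to ElementTree. This is not the code nbconvert uses (per the message above, it relies on defusedxml, which covers more cases); the names `EntityDeclarationForbidden` and `fromstring_no_entities` are hypothetical and chosen only for illustration:

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class EntityDeclarationForbidden(ValueError):
    """Raised when the input declares an XML entity (hypothetical name)."""


def _reject_entity(name, is_parameter_entity, value, base,
                   system_id, public_id, notation_name):
    # expat calls this once per <!ENTITY ...> declaration in the DTD.
    raise EntityDeclarationForbidden(f"entity declaration not allowed: {name}")


def fromstring_no_entities(text):
    """Parse markup with ElementTree, refusing documents that declare entities."""
    checker = expat.ParserCreate()
    checker.EntityDeclHandler = _reject_entity
    checker.Parse(text, True)  # raises at the first entity declaration
    return ET.fromstring(text)
```

Nested entity definitions are the basis of "billion laughs" style denial-of-service inputs, so rejecting the declarations outright avoids expanding them at all. In practice, using `defusedxml.ElementTree` in place of `xml.etree.ElementTree` is likely the simpler and more thorough choice.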
