[Python-ideas] Adding jsonschema to the standard library

Demian Brecht

unread,

May 21, 2015, 1:29:56 AM5/21/15

to python...@python.org

Disclaimer: I’m not the author of jsonschema (https://github.com/Julian/jsonschema), but as a user think that users of the standard library (and potentially areas of the standard library itself) could benefit from its addition into the standard library.

I’ve been using jsonschema for the better part of a couple years now and have found it not only invaluable, but flexible around the variety of applications it has. Personally, I generally use it for HTTP response validation when dealing with RESTful APIs and system configuration input validation. For those not familiar with the package:

RFC draft: https://tools.ietf.org/html/draft-zyp-json-schema-04
Home: http://json-schema.org/
Proposed addition implementation: https://github.com/Julian/jsonschema

Coles notes stats:

Has been publicly available for over a year: v0.1 released Jan 1, 2012, currently at 2.4.0 (released Sept 22, 2014)
Heavily used by the community: Currently sees ~585k downloads per month according to PyPI

I’ve reached out to the author to express my interest in authoring a PEP to have the module included to gauge his interest in assisting with maintenance as needed during the integration period (or following). I’d also be personally interested in supporting it as part of the stdlib as well.

My question is: Is there any reason up front anyone can see that this addition wouldn’t fly, or are others interested in the addition as well?

Thanks,
Demian

signature.asc

Ludovic Gasc

unread,

May 21, 2015, 1:46:50 AM5/21/15

to Demian Brecht, Python-Ideas

As a end-dev that uses your library for a small time, it's an useful tool.

We're migrating quicker an Erlang application to Python with your library because the legacy application uses JSON schema.

From my point of view, validating I/O data is a common problem of most developers, however, it means that you have a lot of developers that have a strong opinion how to validate data ;-)

At least to me, it's a good idea to include this library in Python, even if you have plenty of libraries to do that with several approachs, for now, I didn't find a simpler approach that via JSON schemas.

The bonus with that is that you can reuse your JSON schemas for migrations and also in your javascript source code.

It isn't a silver bullet to resolve all validation corner cases, however enough powerful to resolve the most boring use cases.

Ludovic Gasc (GMLudo)
http://www.gmludo.eu/

_______________________________________________
Python-ideas mailing list
Python...@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Demian Brecht

unread,

May 21, 2015, 1:54:23 AM5/21/15

to Ludovic Gasc, Python-Ideas

> On May 20, 2015, at 10:46 PM, Ludovic Gasc <gml...@gmail.com> wrote:
> As a end-dev that uses your library for a small time, it's an useful tool.

> Disclaimer: I’m not the author of jsonschema

Emphasis on /not/. I’m just another user of the library like you :) But cheers for the feedback!

signature.asc

Yury Selivanov

unread,

May 21, 2015, 2:00:14 AM5/21/15

to python...@python.org

On 2015-05-21 1:29 AM, Demian Brecht wrote:
[..]

> My question is: Is there any reason up front anyone can see that this addition wouldn’t fly, or are others interested in the addition as well?
>

I think we should wait at least until json-schema.org releases a final
version of the spec.

Thanks,
Yury

Demian Brecht

unread,

May 21, 2015, 2:19:11 AM5/21/15

to Yury Selivanov, python...@python.org

> On May 20, 2015, at 10:59 PM, Yury Selivanov <yseliv...@gmail.com> wrote:
> I think we should wait at least until json-schema.org releases a final version of the spec.

I’d thought about that as well, but here were the arguments that I could think of that led me to proposing this in the first place:

The latest draft of the RFC expired Jan 31, 2013. I’d have to try to reach out to the author(s) to confirm, but I’d venture to say there likely isn’t much more effort being put into it.

The library is in heavy use and is useful in practice in its current state. I think that in situations like this practicality of a module should come first and finalized spec second.

There are numerous places in the library that deviate from specs in the name of practical use. I’m not advocating that shouldn’t be an exception as opposed to the rule, I’m just saying that there are multiple things to consider prior to simply squashing an inclusion because of RFC draft state.

signature.asc

Stephen J. Turnbull

unread,

May 21, 2015, 3:40:22 AM5/21/15

to Demian Brecht, python...@python.org

Demian Brecht writes:

> RFC draft: https://tools.ietf.org/html/draft-zyp-json-schema-04

I note that this draft, apparently written in Nov. 2011, expired
almost two years ago with no update. OTOH, 4 other RFCs related to
JSON (6901, 6902, 7386, 7396) have been published recently. (This
kind of thing is common with RFCs; people get fed up with the process
and just go off and do something that's "good enough" for them. But
it does show they've given up on the process of getting a global
standard at least for now.) Then in Oct 2012, Andy Newton wrote[1]:

Schemas. There is no one standardized schema language for JSON,
although several are presently in the works (including one by this
author). The need for a JSON schema language is controversial—JSON
is regarded by most as simple enough on its own. Indeed, there is
no shortage of JSON-based interchange specification making due
without schema formalism.

and his independent proposal[2] (confusingly called "content rules")
is current, expiring on June 5. (Note that there is no proposal
currently being discussed by the IETF APPSAWG. Newton's proposal is
independent, pending formation of a new charter for a JSON schema WG.)

> My question is: Is there any reason up front anyone can see that

> this addition wouldn’t fly?

I would say that the evident controversy over which schema language
will be standardized is a barrier, unless you can say that Newton's
proposals have no support from the community or something like that.
It's not a terribly high barrier in one sense (Python doesn't demand
that modules be perfect in all ways), but you do have to address the
perception of controversy, I think (at least to deny there really is
any).

A more substantive issue is that Appendix A of Newton's I-D certainly
makes json-schema look "over the top" in verbosity of notation -- XML
would be proud.<wink /> If that assessment is correct, the module
could be considered un-Pythonic (see Zen #4, and although JSON content
rules are not themselves JSON while JSON schema is valid JSON, see Zen
#9).

N.B. I'm not against this proposal, just answering your question.

I did see that somebody named James Newton-King (aka newtonsoft.com)
has an implementation of json-schema for .NET, and json-schema.org
seems to be in active development, which are arguments in favor of
your proposal.

Footnotes:
[1] http://www.internetsociety.org/articles/using-json-ietf-protocols

[2] https://tools.ietf.org/html/draft-newton-json-content-rules-04

Stephen J. Turnbull

unread,

May 21, 2015, 3:52:29 AM5/21/15

to Yury Selivanov, python...@python.org

Yury Selivanov writes:

> I think we should wait at least until json-schema.org releases a
> final version of the spec.

If you mean an RFC, there are all kinds of reasons, some important,
some just tedious, why a perfectly good spec never gets released as an
RFC. I agree that the fact that none of the IETF, W3C, or ECMA has
released a formal spec yet needs discussion.

Paul Moore

unread,

May 21, 2015, 3:57:57 AM5/21/15

to Demian Brecht, Python-Ideas

On 21 May 2015 at 06:29, Demian Brecht <demian...@gmail.com> wrote:
> Has been publicly available for over a year: v0.1 released Jan 1, 2012, currently at 2.4.0 (released Sept 22, 2014)
> Heavily used by the community: Currently sees ~585k downloads per month according to PyPI

One key question that should be addressed as part of any proposal for
inclusion into the stdlib. Would switching to having feature releases
only when a new major Python version is released (with bugfixes at
minor releases) be acceptable to the project? From the figures you
quote, it sounds like there has been some rapid development, although
things seem to have slowed down now, so maybe things are stable
enough.

Paul

Stephen J. Turnbull

unread,

May 21, 2015, 4:05:21 AM5/21/15

to Demian Brecht, python...@python.org

Demian Brecht writes:

> The latest draft of the RFC expired Jan 31, 2013.

Actually, expiration is more than half a year fresher: August 4,
2013. But AFAICT none of the schema proposals were RFC track at all,
let alone normative. They're just in support of various other
JSON-related IETF work.

Steve

Nick Coghlan

unread,

May 21, 2015, 5:15:49 AM5/21/15

to Paul Moore, Python-Ideas

On 21 May 2015 at 17:57, Paul Moore <p.f....@gmail.com> wrote:
> On 21 May 2015 at 06:29, Demian Brecht <demian...@gmail.com> wrote:
>> Has been publicly available for over a year: v0.1 released Jan 1, 2012, currently at 2.4.0 (released Sept 22, 2014)
>> Heavily used by the community: Currently sees ~585k downloads per month according to PyPI
>
> One key question that should be addressed as part of any proposal for
> inclusion into the stdlib. Would switching to having feature releases
> only when a new major Python version is released (with bugfixes at
> minor releases) be acceptable to the project? From the figures you
> quote, it sounds like there has been some rapid development, although
> things seem to have slowed down now, so maybe things are stable
> enough.

The other question to be answered these days is the value bundling
offers over "pip install jsonschema" (or a platform specific
equivalent). While it's still possible to meet that condition, it's
harder now that we offer pip as a standard feature, especially since
getting added to the standard library almost universally makes life
more difficult for module maintainers if they're not already core
developers.

I'm not necessarily opposed to including JSON schema validation in
general or jsonschema in particular (I've used it myself in the past
and think it's a decent option if you want a bit more rigor in your
data validation), but I'm also not sure how large an overlap there
will be between "could benefit from using jsonschema", "has a
spectacularly onerous package review process", and "can't already get
jsonschema from an approved source".

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia

Julian Berman

unread,

May 21, 2015, 5:11:13 PM5/21/15

to python...@python.org

Hey, author here, thanks a lot Demian for even suggesting such a thing :).

I'm really glad that people have found jsonschema useful.

I actually tend these days to think similarly to what Nick mentioned, that the standard library really has decreased in importance as pip has shaped up and now been bundled -- so overall my personal opinion is that I wouldn't personally be pushing to get jsonschema in -- but! If you felt strongly, just some brief answers -- I think jsonschema would be able to cope with more restricted release cycles.

And there are a few areas that I don't like about jsonschema (some APIs) which eventually I'd like to fix (RefResolver in particular), but for the most part I think it has stabilized more or less.

I can provide some more details if there's any interest.

Thanks again for even proposing such a thing :)

-Julian

On Thu, May 21, 2015 at 2:15 AM, <python-ide...@python.org> wrote:

------------------------------

Message: 7
Date: Thu, 21 May 2015 19:15:20 +1000
From: Nick Coghlan <ncog...@gmail.com>
To: Paul Moore <p.f....@gmail.com>
Cc: Demian Brecht <demian...@gmail.com>, Python-Ideas
<python...@python.org>
Subject: Re: [Python-ideas] Adding jsonschema to the standard library
Message-ID:
<CADiSq7cmRPQdpC8wv3xyt20dV=Pf9uPfB1k-Q3a6kQH=khv...@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

Terry Reedy

unread,

May 21, 2015, 6:38:15 PM5/21/15

to python...@python.org

On 5/21/2015 5:10 PM, Julian Berman wrote:
> Hey, author here, thanks a lot Demian for even suggesting such a thing :).

Welcome to python-ideas.

> I'm really glad that people have found jsonschema useful.

In response to Demian, the module initially strikes me, a non-json user,
as too specialized for the stdlib, even if extremely useful to people
within the specialty. The high pypi download rate could be interpreted
as meaning that the module does not need to be in the stdlib to be
discovered and used.

> I actually tend these days to think similarly to what Nick mentioned,
> that the standard library really has decreased in importance as pip has
> shaped up and now been bundled -- so overall my personal opinion is that
> I wouldn't personally be pushing to get jsonschema in -- but! If you
> felt strongly, just some brief answers -- I think jsonschema would be
> able to cope with more restricted release cycles.

As a core developer, I can see a downside for you, so I would advise you
to decline the invitation unless you see a stronger upside than is
immediately obvious.

> And there are a few areas that I don't like about jsonschema (some APIs)
> which eventually I'd like to fix (RefResolver in particular), but for
> the most part I think it has stabilized more or less.

--
Terry Jan Reedy

Demian Brecht

unread,

May 22, 2015, 12:40:00 PM5/22/15

to Nick Coghlan, Python-Ideas

First off, thanks all for the well thought out responses! Will try to touch on each point when I get a few spare cycles throughout the day.

> On May 21, 2015, at 2:15 AM, Nick Coghlan <ncog...@gmail.com> wrote:
>
> The other question to be answered these days is the value bundling
> offers over "pip install jsonschema" (or a platform specific
> equivalent). While it's still possible to meet that condition, it's
> harder now that we offer pip as a standard feature, especially since
> getting added to the standard library almost universally makes life
> more difficult for module maintainers if they're not already core
> developers.

This is an interesting problem and a question that I’ve had at the back of my mind as well. With the addition of pip, there is really no additional value /to those who already know about the package and what problem it solves/. In my mind, the value of bundling anything nowadays really boils down to “this is the suggested de facto standard of solving problem [X] using Python”. I see two problems with relying on pip and PyPI as an alternative to bundling:

1. PyPI is filled with multiple solutions to the same problem. This can be difficult to wade through for the experienced developer, never mind the novice.
2. You generally won't know about packages that don’t solve problems you’ve solved or are solving. Early on in my adoption of Python, there were a number of times where I just spent time digging through the standard library and was surprised by the offerings that I didn’t even know were a thing. Likewise with jsonschema, I wouldn’t have known it was a thing had a co-worker not introduced me to it a couple years ago.

signature.asc

Ian Cordasco

unread,

May 22, 2015, 3:16:55 PM5/22/15

to Demian Brecht, Python-Ideas

On Fri, May 22, 2015 at 11:39 AM, Demian Brecht <demian...@gmail.com> wrote:
> First off, thanks all for the well thought out responses! Will try to touch on each point when I get a few spare cycles throughout the day.
>
>> On May 21, 2015, at 2:15 AM, Nick Coghlan <ncog...@gmail.com> wrote:
>>
>> The other question to be answered these days is the value bundling
>> offers over "pip install jsonschema" (or a platform specific
>> equivalent). While it's still possible to meet that condition, it's
>> harder now that we offer pip as a standard feature, especially since
>> getting added to the standard library almost universally makes life
>> more difficult for module maintainers if they're not already core
>> developers.
>
> This is an interesting problem and a question that I’ve had at the back of my mind as well. With the addition of pip, there is really no additional value /to those who already know about the package and what problem it solves/. In my mind, the value of bundling anything nowadays really boils down to “this is the suggested de facto standard of solving problem [X] using Python”. I see two problems with relying on pip and PyPI as an alternative to bundling:

Counter-point: What library is the de facto standard of doing HTTP in
Python? Requests is, of course. Discussion of its inclusion has
happened several times and each time the decision is to not include
it. The most recent such discussion was at the Language Summit at
PyCon 2015 in Montreal. If you want to go by download count, then
Requests should still be in the standard library but it just will not
happen.

> 1. PyPI is filled with multiple solutions to the same problem. This can be difficult to wade through for the experienced developer, never mind the novice.

That's not exactly true in every case. The only library that parses
and emits YAML is PyYAML. It's both unmaintained, incomplete, and full
of bugs. That said, it's the de facto standard and it's the only onw
of its kind that I know of on PyPI. I would vehemently argue against
its inclusion were it ever purposed.

> 2. You generally won't know about packages that don’t solve problems you’ve solved or are solving. Early on in my adoption of Python, there were a number of times where I just spent time digging through the standard library and was surprised by the offerings that I didn’t even know were a thing. Likewise with jsonschema, I wouldn’t have known it was a thing had a co-worker not introduced me to it a couple years ago.

Counter-point, once you know you want to use JSON Schema looking for
implementations in python yields Julian's implementation first.

You said (paraphrasing) in your first email that jsonschema should
only be excluded from the stdlib if people could bring up reasons
against it. The standard library has grown in the past few releases
but that doesn't mean it needs to grow every time. It also means it
doesn't need to grow to include an implementation of every possible
/thing/ that exists. Further, leaving it up to others to prove why it
shouldn't be included isn't sufficient. You have to prove to the
community why it MUST be included. Saying "Ah let's throw this thing
in there anyway because why not" isn't valid. By that logic, I could
nominate several libraries that I find useful in day-to-day work and
the barrier to entry would be exactly as much energy as people who
care about the standard library are willing to expend to keep the less
than sultry candidates out.

In this case, that /thing/ is JSON Schema. Last I checked, JSON Schema
was a IETF Draft that was never accepted and a specification which
expired. That means in a couple years, ostensibly after this was added
to the stdlib, it could be made completely irrelevant and the time to
fix it would be incredible. That would be far less of an issue if
jsonschema were not included at all.

Overall, I'm strongly against its inclusion. Not because the library
isn't excellent. It is. I use it. I'm strongly against it for the
reasons listed above.

Donald Stufft

unread,

May 22, 2015, 3:23:44 PM5/22/15

to Ian Cordasco, Python-Ideas

> On May 22, 2015, at 3:08 PM, Ian Cordasco <graffatc...@gmail.com> wrote:
>
>>
>> 1. PyPI is filled with multiple solutions to the same problem. This can be difficult to wade through for the experienced developer, never mind the novice.
>
> That's not exactly true in every case. The only library that parses
> and emits YAML is PyYAML. It's both unmaintained, incomplete, and full
> of bugs. That said, it's the de facto standard and it's the only onw
> of its kind that I know of on PyPI. I would vehemently argue against
> its inclusion were it ever purposed.
>
>> 2. You generally won't know about packages that don’t solve problems you’ve solved or are solving. Early on in my adoption of Python, there were a number of times where I just spent time digging through the standard library and was surprised by the offerings that I didn’t even know were a thing. Likewise with jsonschema, I wouldn’t have known it was a thing had a co-worker not introduced me to it a couple years ago.
>
> Counter-point, once you know you want to use JSON Schema looking for
> implementations in python yields Julian's implementation first.

I think a future area of work is going to be on improving the ability for
people who don't know what they want to find out that they want something and
which thing they want on PyPI. I'm not entirely sure what this is going to look
like but I think it's an important problem. It's being solved for very specific
cases by starting to have the standard documentation explicitly call out these
defacto standards of the Python ecosystem where it makes sense. This of course
does not scale to every single problem domain or module on PyPI so we still
need a more general solution.

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

signature.asc

Andrew Barnert via Python-ideas

unread,

May 22, 2015, 3:25:21 PM5/22/15

to Demian Brecht, Python-Ideas

On May 22, 2015, at 09:39, Demian Brecht <demian...@gmail.com> wrote:

> In my mind, the value of bundling anything nowadays really boils down to “this is the suggested de facto standard of solving problem [X] using Python”.

The other way of saying that is to say it explicitly in the stdlib docs, usage docs, and/or tutorial and link to the package. While that used to be pretty rare, that's changed recently. Off the top of my head, there are links to setuptools, requests, nose, py.test, Pillow, PyObjC, py2app, PyWin32, WConio, Console, UniCurses, Urwid, the major alternative GUI frameworks, Twisted, and pexpect.

So, if you wrote something to put in the json module docs, the input/output section of the tutorial, or a howto explaining that if you want structured and validated JSON the usual standard is JSON Schema and the jsonschema library can do it for you in Python, that would get most of the same benefits as adding jsonschema to the stdlib without most of the costs.

> I see two problems with relying on pip and PyPI as an alternative to bundling:

In general, there's a potentially much bigger reason: some projects can't use arbitrary third-party projects without a costly vetting process, or need to work on machines that don't have Internet access or don't have a way to install user site-packages or virtualenvs, etc. Fortunately, those kinds of problems aren't likely to come up for the kinds of projects that need JSON Schema (e.g., Internet servers, client frameworks that are themselves installed via pip, client apps that are distributed by bundling with cx_Freeze/py2app/etc.).

> 1. PyPI is filled with multiple solutions to the same problem. This can be difficult to wade through for the experienced developer, never mind the novice.

Usually this is a strength, not a weakness. Until one project really is good enough to become the de facto standard, you wouldn't want to limit the competition, right? The problem traditionally has been that once something _does_ reach that point, there's no way to make that clear--but now that the stdlib docs link to outside projects, there's a solution.

> 2. You generally won't know about packages that don’t solve problems you’ve solved or are solving. Early on in my adoption of Python, there were a number of times where I just spent time digging through the standard library and was surprised by the offerings that I didn’t even know were a thing. Likewise with jsonschema, I wouldn’t have known it was a thing had a co-worker not introduced me to it a couple years ago.

_______________________________________________

Stephen J. Turnbull

unread,

May 22, 2015, 11:09:38 PM5/22/15

to Donald Stufft, Python-Ideas

Donald Stufft writes:

> I think a future area of work is going to be on improving the
> ability for people who don't know what they want to find out that
> they want something and which thing they want on PyPI. I'm not
> entirely sure what this is going to look like

+1

> but I think it's an important problem.

+1

> It's being solved for very specific cases by starting to have the
> standard documentation explicitly call out these defacto standards
> of the Python ecosystem where it makes sense.

Because that's necessarily centralized, it's a solution to a different
problem. We need a decentralized approach to deal with the "people
who use package X often would benefit from Y too, but don't know where
to find Y or which implementation to use." IOW, there needs to be a
way for X to recommend implementation Z (or implementations Z1 or Z2)
of Y.

> This of course does not scale to every single problem domain or
> module on PyPI so we still need a more general solution.

The only way we know to scale a web is to embed the solution in the
nodes. Currently many packages know what they use internally (the
install_requires field), but as far as I can see there's no way for a
package X to recommend "related" packages Z to implement function Y in
applications using X. Eg, the plethora of ORMs available, some of
which work better with particular packages than others do.

We could also recommend that package maintainers document such
recommendations, preferably in a fairly standard place, in their
package documentation. Even something like "I've successfully used Z
to do Y in combination with this package" would often help a lot.

If a maintainer (obvious extension: 3rd party recommendations and
voting) wants to recommend other packages that work and play well with
her package but aren't essential to its function, how about a
dictionary mapping Trove classifiers to lists of recommended packages
for that implmenentation?

Andrew Barnert via Python-ideas

unread,

May 23, 2015, 1:11:28 AM5/23/15

to Stephen J. Turnbull, Python-Ideas

This is a really cool idea, but it would help to have some specific examples.

For example, BeautifulSoup can only use html5lib or lxml as optional HTML parsers, and lxml as an optional XML parser; nothing else will do any good. But it works well with any HTTP request engine, so any "global" recommendation is a good idea, so it should get the same list (say, requests, urllib3, grequests, pycurl) as any other project that wants to suggest an HTTP request engine. And as for scraper frameworks, that should look at the global recommendations, but restricted to the ones that use, or can use, BeautifulSoup. I'm not sure how to reasonably represent all three of those things in a node.

Of course it's quite possible that I jumped right to a particularly hard example with unique problems that don't need to be solved in general, and really only the first one is necessary, in which case this is a much simpler problem...

Stephen J. Turnbull

unread,

May 23, 2015, 2:55:40 AM5/23/15

to Andrew Barnert, Python-Ideas

Andrew Barnert writes:

> > If a maintainer (obvious extension: 3rd party recommendations and
> > voting) wants to recommend other packages that work and play well with
> > her package but aren't essential to its function, how about a
> > dictionary mapping Trove classifiers to lists of recommended packages
> > for that implmenentation?
>
> This is a really cool idea, but it would help to have some specific examples.
>
> For example, BeautifulSoup can only use html5lib or lxml as
> optional HTML parsers, and lxml as an optional XML parser; nothing
> else will do any good. But it works well with any HTTP request
> engine, so any "global" recommendation is a good idea, so it should
> get the same list (say, requests, urllib3, grequests, pycurl) as
> any other project that wants to suggest an HTTP request engine. And
> as for scraper frameworks, that should look at the global
> recommendations, but restricted to the ones that use, or can use,
> BeautifulSoup. I'm not sure how to reasonably represent all three
> of those things in a node.

Well, #2 is easy. You just have a special "global" node that has the
same kind of classifier->package map, and link to that. I don't think
#3 can be handled so easily, and probably it's not really worth it
complexifying things that far at first -- I think you probably need
most of SQL to express such constraints. I suspect that I would
handle #3 with a special sort of "group" package, that just requires
certain classifiers and then recommends implementations of them that
work well together. It would be easy for the database to
automatically update a group's recommended implementations to point to
the group (which would be yet another new attribute for the package).

I'll take a look at the whole shebang and see if I can come up with
something a bit more elegant than the crockery of adhoc-ery above, but
it will be at least next week before I have anything to say.

Steve

Nick Coghlan

unread,

May 23, 2015, 10:22:16 AM5/23/15

to Stephen J. Turnbull, Python-Ideas

On 23 May 2015 at 12:59, Stephen J. Turnbull <ste...@xemacs.org> wrote:
> Donald Stufft writes:

> > It's being solved for very specific cases by starting to have the
> > standard documentation explicitly call out these defacto standards
> > of the Python ecosystem where it makes sense.
>
> Because that's necessarily centralized, it's a solution to a different
> problem. We need a decentralized approach to deal with the "people
> who use package X often would benefit from Y too, but don't know where
> to find Y or which implementation to use." IOW, there needs to be a
> way for X to recommend implementation Z (or implementations Z1 or Z2)
> of Y.

https://www.djangopackages.com/ covers this well for the Django
ecosystem (I actually consider it to be one of Django's killer
features, and I'm pretty sure I'm not alone in that - like
ReadTheDocs, it was a product of DjangoDash 2010).

There was an effort a few years back to set up an instance of that for
PyPI in general, as well as similar comparison sites for Pyramid and
Plone, but none of them ever hit the same kind of critical mass of
useful input as the Django one.

The situation has changed substantially since then, though, as we've
been more actively promoting pip, PyPI and third party libraries as
part of the recommended Python developer experience, and the main
standard library documentation now delegates to packaging.python.org
for the details after very brief introductions to installing and
publishing packages.

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia

Ludovic Gasc

unread,

May 24, 2015, 7:57:32 AM5/24/15

to Nick Coghlan, Python-Ideas

Hi all,

After to read all responses, I've changed my mind:

At the first look, the advantage to push jsonschema into Python lib is to standardize and promote an actual good practice.

But yes, you're right, it's too early to include that because the standard should be changed and/or abandonned by a new good practice, like SOAP and REST.

It's more future proof to promote PyPI and pip to Python developers.

Regards.

--

Ludovic Gasc (GMLudo)

http://www.gmludo.eu/

Demian Brecht

unread,

May 27, 2015, 2:29:25 PM5/27/15

to Nick Coghlan, Python-Ideas

> On May 23, 2015, at 7:21 AM, Nick Coghlan <ncog...@gmail.com> wrote:
>
> https://www.djangopackages.com/ covers this well for the Django
> ecosystem (I actually consider it to be one of Django's killer
> features, and I'm pretty sure I'm not alone in that - like
> ReadTheDocs, it was a product of DjangoDash 2010).

Thanks again all for the great discussion here. It seems to have taken quite a turn to a couple other points that I’ve had in the back of my mind for a while:

With with integration of pip and the focus on non-standard library packages, how do we increase discoverability? If the standard library isn’t going to be a mechanism for that (and I’m not putting forward the argument that it should), adopting something like Django Packages might be tremendously beneficial. Perhaps on top of what Django Packages already has, there could be “recommended packages”. Recommended packages could go through nearly just as much of a rigorous review process as standard library adoption before being flagged, although there would be a number of barriers reduced.

"Essentially, the standard library is where a library goes to die. It is appropriate for a module to be included when active development is no longer necessary.” (https://github.com/kennethreitz/requests/blob/master/docs/dev/philosophy.rst#standard-library)

This is probably a silly idea, but given the above quote and the new(er) focus on pip and distributed packages, has there been any discussion around perhaps deprecating (and entirely removing from a Python 4 release) non-builtin packages and modules? I would think that if there was a system similar to Django Packages that made discoverability/importing of packages as easy as using those in the standard library, having a distributed package model where bug fixes and releases could be done out of band with CPython releases would likely more beneficial to the end users. If there was a “recommended packages” framework, perhaps there could also be buildbots put to testing interoperability of the recommended package set.

Also, to put the original question in this thread to rest, while I personally think that the addition of jsonschema to the standard library, whether as a top level package or perhaps splitting the json module into a package and introducing it there would be beneficial, I think that solving the distributed package discoverability is a much more interesting problem and would serve many more packages and users. Aside from that, solving that problem would have the same intended effect as integrating jsonschema into the standard library.

signature.asc

Paul Moore

unread,

May 27, 2015, 2:46:47 PM5/27/15

to Demian Brecht, Python-Ideas

On 27 May 2015 at 19:28, Demian Brecht <demian...@gmail.com> wrote:
> This is probably a silly idea, but given the above quote and the new(er) focus on pip and distributed packages, has there been any discussion around perhaps deprecating (and entirely removing from a Python 4 release) non-builtin packages and modules?

It has been discussed on a number of occasions. The major issue with
the idea is that a lot of people use Python in closed corporate
environments, where access to the internet from tools such as pip can
be restricted. Also, many companies have legal approval processes for
software - getting approval for "Python" includes the standard
library, but each external package required would need a separate,
probably lengthy and possibly prohibitive, approval process before it
could be used.

So it's unlikely to ever happen, because it would cripple Python for a
non-trivial group of its users.

Paul

Ian Cordasco

unread,

May 27, 2015, 2:56:09 PM5/27/15

to Demian Brecht, Python-Ideas

On Wed, May 27, 2015 at 1:28 PM, Demian Brecht <demian...@gmail.com> wrote:
>
>> On May 23, 2015, at 7:21 AM, Nick Coghlan <ncog...@gmail.com> wrote:
>>
>> https://www.djangopackages.com/ covers this well for the Django
>> ecosystem (I actually consider it to be one of Django's killer
>> features, and I'm pretty sure I'm not alone in that - like
>> ReadTheDocs, it was a product of DjangoDash 2010).
>
> Thanks again all for the great discussion here. It seems to have taken quite a turn to a couple other points that I’ve had in the back of my mind for a while:
>
> With with integration of pip and the focus on non-standard library packages, how do we increase discoverability? If the standard library isn’t going to be a mechanism for that (and I’m not putting forward the argument that it should), adopting something like Django Packages might be tremendously beneficial. Perhaps on top of what Django Packages already has, there could be “recommended packages”. Recommended packages could go through nearly just as much of a rigorous review process as standard library adoption before being flagged, although there would be a number of barriers reduced.
>
> "Essentially, the standard library is where a library goes to die. It is appropriate for a module to be included when active development is no longer necessary.” (https://github.com/kennethreitz/requests/blob/master/docs/dev/philosophy.rst#standard-library)
>
> This is probably a silly idea, but given the above quote and the new(er) focus on pip and distributed packages, has there been any discussion around perhaps deprecating (and entirely removing from a Python 4 release) non-builtin packages and modules? I would think that if there was a system similar to Django Packages that made discoverability/importing of packages as easy as using those in the standard library, having a distributed package model where bug fixes and releases could be done out of band with CPython releases would likely more beneficial to the end users. If there was a “recommended packages” framework, perhaps there could also be buildbots put to testing interoperability of the recommended package set.

The mirror of this would be asking if Django should rip out it's base
classes for models, views, etc. I think Python 4 could move towards
perhaps deprecating any duplicated modules, but I see no point to rip
the entire standard library out... except maybe for
httplib/urllib/etc. (for various reasons beyond my obvious conflict of
interest).

> Also, to put the original question in this thread to rest, while I personally think that the addition of jsonschema to the standard library, whether as a top level package or perhaps splitting the json module into a package and introducing it there would be beneficial, I think that solving the distributed package discoverability is a much more interesting problem and would serve many more packages and users. Aside from that, solving that problem would have the same intended effect as integrating jsonschema into the standard library.
>
>

Demian Brecht

unread,

May 27, 2015, 2:57:52 PM5/27/15

to Paul Moore, Python-Ideas

> On May 27, 2015, at 11:46 AM, Paul Moore <p.f....@gmail.com> wrote:
>
> So it's unlikely to ever happen, because it would cripple Python for a
> non-trivial group of its users.

I’m just throwing ideas at the wall here, but would it not be possible to release two versions, one for those who choose to use decentralized packages with out-of-band releases and one with all “recommended” packages bundled (obvious potential for version conflicts and such aside)? If one of the prerequisites of a “recommended” package was that it’s released under PSFL, I’m assuming there wouldn’t be any legal issues with going down such a path? That way, you still get the ability to decentralize the library, but don’t alienate the user base that can’t rely on pip?

signature.asc

Donald Stufft

unread,

May 27, 2015, 3:04:21 PM5/27/15

to Paul Moore, Demian Brecht, Python-Ideas

On May 27, 2015 at 2:57:54 PM, Demian Brecht (demian...@gmail.com) wrote:

I’m of the opinion that, given a brand new language, it makes more sense to have really good packaging tools built in, but not to have a standard library. This you call “FooLang Core” or something of the sort. Then you take the most popular or the best examples or whatever criteria you want from the ecosystem around that and you bundle them all together so that the third party packages essentially get preinstalled and you call that “FooLang Platform” or something.

This means that people who want/need a comprehensive standard library can get the Platform edition of the runtime which will function similar to the standard library of a language. However, if they run into some critical feature they need or a bug fix, they can selectively choose to step outside of that preset package versions and install a newer version of one of the bundled software. Of course they can install non-bundled software as well.

As far as Python is concerned, while I think the above model is better in the general sense, I think that it’s probably too late to switch to that, the history of having a big standard library goes back pretty far and a lot of people and processes depend on it. We’re also still trying to heal the rift that 3.x created, and creating a new rift is probably not the most effective use of time. It’s also the case (though we’re working to make it less true) that our packaging tools still can routinely run into problems that would make me uncomfortable using them for this approach.

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

Demian Brecht

unread,

May 27, 2015, 3:13:40 PM5/27/15

to Ian Cordasco, Python-Ideas

> On May 27, 2015, at 11:55 AM, Ian Cordasco <graffatc...@gmail.com> wrote:
>
> The mirror of this would be asking if Django should rip out it's base
> classes for models, views, etc. I think Python 4 could move towards
> perhaps deprecating any duplicated modules, but I see no point to rip
> the entire standard library out... except maybe for
> httplib/urllib/etc. (for various reasons beyond my obvious conflict of
> interest).

I can somewhat see the comparison, but not entirely because Django itself is a package and not the core interpreter and set of builtins. There are also other frameworks that split out modules from the core (I’m not overly familiar with either, but I believe both zope and wheezy follow such models).

The major advantage of going with a fully distributed model would be the out-of-band releases. While nice to have for feature development, it can be crucial for bug fixes, but even more so for security patches. Other than that, I could see it opening the door to adoption of packages as “recommended” without worrying too much about state of development. requests is a perfect example of that. Note that my personal focus on standard library development is the http package so I’m somewhat cutting my legs out from under me, but I’m starting to think that adopting such a distribution mechanism might solve a number of problems (but is probably just as likely to introduce new ones ;)).

I’m also aware of the politics of such a change. What does it mean then for core devs who concentrate on the current standard library and don’t contribute to the interpreter core or builtins?

signature.asc

Demian Brecht

unread,

May 27, 2015, 3:17:04 PM5/27/15

to Ian Cordasco, Python-Ideas

> On May 27, 2015, at 12:13 PM, Demian Brecht <demian...@gmail.com> wrote:
>
> without worrying too much about state of development

I should have elaborated on this more: What I mean is more around feature development, such as introducing HTTP/2.0 to requests. The core feature set would still have to be well proven and have minimal to no changes.

signature.asc

Wes Turner

unread,

May 27, 2015, 3:23:54 PM5/27/15

to Demian Brecht, Python-Ideas

On Wed, May 27, 2015 at 1:28 PM, Demian Brecht <demian...@gmail.com> wrote:

> On May 23, 2015, at 7:21 AM, Nick Coghlan <ncog...@gmail.com> wrote:
>
> https://www.djangopackages.com/ covers this well for the Django
> ecosystem (I actually consider it to be one of Django's killer
> features, and I'm pretty sure I'm not alone in that - like
> ReadTheDocs, it was a product of DjangoDash 2010).

Thanks again all for the great discussion here. It seems to have taken quite a turn to a couple other points that I’ve had in the back of my mind for a while:

With with integration of pip and the focus on non-standard library packages, how do we increase discoverability? If the standard library isn’t going to be a mechanism for that (and I’m not putting forward the argument that it should), adopting something like Django Packages might be tremendously beneficial. Perhaps on top of what Django Packages already has, there could be “recommended packages”. Recommended packages could go through nearly just as much of a rigorous review process as standard library adoption before being flagged, although there would be a number of barriers reduced.

So there is a schema.org/SoftwareApplication (or doap:Project, or seon:) Resource,

which has

* a unique URI (e.g. http://python.org/pypi/readme)

* JSON metadata extracted from setup.py into pydist.json (setuptools, wheel)

- [ ] create JSON-LD @context

- [ ] create mappings to standard schema

* [ ] http://schema.org/SoftwareApplication

* [ ] http://schema.org/SoftwareSourceCode

In terms of schema.org, a Django Packages resource has:

* [ ] a unique URI

* [ ] typed features (predicates with ranges)

* [ ] http://schema.org/review

* [ ] http://schema.org/VoteAction

* [ ] http://schema.org/LikeAction

"Essentially, the standard library is where a library goes to die. It is appropriate for a module to be included when active development is no longer necessary.” (https://github.com/kennethreitz/requests/blob/master/docs/dev/philosophy.rst#standard-library)

This is probably a silly idea, but given the above quote and the new(er) focus on pip and distributed packages, has there been any discussion around perhaps deprecating (and entirely removing from a Python 4 release) non-builtin packages and modules? I would think that if there was a system similar to Django Packages that made discoverability/importing of packages as easy as using those in the standard library, having a distributed package model where bug fixes and releases could be done out of band with CPython releases would likely more beneficial to the end users. If there was a “recommended packages” framework, perhaps there could also be buildbots put to testing interoperability of the recommended package set.

Tox is great for this (in conjunction with whichever build system: BuildBot, TravisCI)

Also, to put the original question in this thread to rest, while I personally think that the addition of jsonschema to the standard library, whether as a top level package or perhaps splitting the json module into a package and introducing it there would be beneficial, I think that solving the distributed package discoverability is a much more interesting problem and would serve many more packages and users. Aside from that, solving that problem would have the same intended effect as integrating jsonschema into the standard library.

jsonschema // JSON-LD (RDF)

Mark Lawrence

unread,

May 27, 2015, 4:28:45 PM5/27/15

to python...@python.org

Could Python 4 tear out the stdlib completely and go to pypi, to what I
believe Nick Coghlan called stdlib+, or would this be A PEP Too Far,
given the one or two minor issues over the move from Python 2 to Python 3?

Yes this is my very dry sense of humour working, but at the same time if
it gets somebody thinking, which in turn gets somebody else thinking,
then hopefully ideas come up which are practical and everybody benefits.

Just my £0.02p worth.

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

Andrew Barnert via Python-ideas

unread,

May 27, 2015, 5:53:40 PM5/27/15

to Donald Stufft, Python-Ideas

On May 27, 2015, at 12:03, Donald Stufft <don...@stufft.io> wrote:
>
>> On May 27, 2015 at 2:57:54 PM, Demian Brecht (demian...@gmail.com) wrote:
>>
>>> On May 27, 2015, at 11:46 AM, Paul Moore wrote:
>>>
>>> So it's unlikely to ever happen, because it would cripple Python for a
>>> non-trivial group of its users.
>>
>> I’m just throwing ideas at the wall here, but would it not be possible to release two versions,
>> one for those who choose to use decentralized packages with out-of-band releases and
>> one with all “recommended” packages bundled (obvious potential for version conflicts
>> and such aside)? If one of the prerequisites of a “recommended” package was that it’s
>> released under PSFL, I’m assuming there wouldn’t be any legal issues with going down
>> such a path? That way, you still get the ability to decentralize the library, but don’t
>> alienate the user base that can’t rely on pip?
>
>
> I’m of the opinion that, given a brand new language, it makes more sense to have really good packaging tools built in, but not to have a standard library. This you call “FooLang Core” or something of the sort. Then you take the most popular or the best examples or whatever criteria you want from the ecosystem around that and you bundle them all together so that the third party packages essentially get preinstalled and you call that “FooLang Platform” or something.

Dependencies are always going to be a problem. The best way to parse XML is lxml (and the best way to parse HTML is BeautifulSoup plus lxml); does that mean that the Python Platform requires libxml2? The best way to do numerical computing is with NumPy, and the best way to build NumPy is with MKL on platforms where it exists, ATLAS on others; does that mean the Python Platform requires MKL and/or ATLAS? The best way to build cross-platform GUIs with desktop integration is PySide; does that mean the Python Platform requires Qt? (One of the biggest portability problems for Python in practice has always been Tcl/Tk; Qt would be much worse.)

You could look at it as something like the core plus distributions model used in OS's. FreeBSD has a core and ports; there's a simple rule for what's in core (a complete POSIX system plus enough to build ports, nothing else), and the practicality-vs.-purity decisions for how to apply that to real-life problems isn't that hard. But Linux took a different approach: it's just a kernel, and everything else--libc, the ports system, etc.--can be swapped out. There is no official distribution; at any given time in history, there are 3-6 competing "major distributions", dozens of others based on them, and some "special-case" distros like ucLinux or Android. And that means different distros can make different decisions on what dependencies are acceptable--include packages that only run on x86, or accept some corporate quasi-open-source license or closed-source blob.

Python seems to have fallen into a place halfway between the two. The stdlib is closer to FreeBSD core than to Linux. On the other hand, while many people start with the official stdlib and use pip to expand on it, there are third-party distributions competing to provide more useful or better-organized batteries than the official version, plus custom distributions that come with some OS distros (e.g., Apple includes PyObjC with theirs), and special things like Kivy.

That doesn't seem to have caused any harm, and may have caused a lot of benefit. While Python may not have found the perfect sweet spot, what it found isn't that bad. And the way it continues to evolve isn't that bad. If you could go back in time to 2010 and come up with a grand five-year plan for how the stdlib, core distribution, and third-party ecosystem should be better, how much different would Python be today?

Donald Stufft

unread,

May 27, 2015, 5:54:44 PM5/27/15

to Andrew Barnert, Python-Ideas

On May 27, 2015 at 5:50:55 PM, Andrew Barnert (abar...@yahoo.com) wrote:

It certainly doesn’t require you to add something to the “Platform” for every topic either. You can still be conservative in what you include in the “Platform” based on how many people are likely to need/want it and what sort of dependency or building impact it has on actually building out the full Platform.

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

Andrew Barnert via Python-ideas

unread,

May 27, 2015, 6:09:46 PM5/27/15

to Demian Brecht, Python-Ideas

On May 27, 2015, at 12:13, Demian Brecht <demian...@gmail.com> wrote:
>
> The major advantage of going with a fully distributed model would be the out-of-band releases. While nice to have for feature development, it can be crucial for bug fixes, but even more so for security patches. Other than that, I could see it opening the door to adoption of packages as “recommended” without worrying too much about state of development. requests is a perfect example of that. Note that my personal focus on standard library development is the http package so I’m somewhat cutting my legs out from under me, but I’m starting to think that adopting such a distribution mechanism might solve a number of problems (but is probably just as likely to introduce new ones ;)).

One way to do that might be to focus the stdlib on picking the abstract interfaces (whether in the actual code, like dbm allows bsddb to plug in, or just in documentation, like DB-API 2) and providing a bare-bones implementation or none at all. It would be nice if things like lxml.etree didn't take so much work and it weren't so hard to quantify how perfect of a replacement it is. Or if we had a SortedMapping ABC so the half-dozen popular implementations could share a consistent API, so they could compete more cleanly on things that matter like performance or the need for a C extension.

But the example of requests shows how hard, and possibly undesirable, that is. Most people use requests not because of the advanced features it has that urllib doesn't, but because the intermediate-level features that both include have a nicer interface in requests. And, while people have talked about how nice it would be to restructure urllib so that it matches requests' interface wherever possible (while still retaining the existing interface for backward compat), it doesn't seem that likely anyone will actually ever do it. And, even if someone did, and requests became a drop-in replacement for urllib' new-style API and urllib was eventually deprecated, what are the odds competitors like PyCurl would be reworked into a "URL-API 2.0" module?

Nick Coghlan

unread,

May 27, 2015, 7:20:21 PM5/27/15

to Paul Moore, python...@python.org

On 28 May 2015 04:46, "Paul Moore" <p.f....@gmail.com> wrote:
>
> On 27 May 2015 at 19:28, Demian Brecht <demian...@gmail.com> wrote:
> > This is probably a silly idea, but given the above quote and the new(er) focus on pip and distributed packages, has there been any discussion around perhaps deprecating (and entirely removing from a Python 4 release) non-builtin packages and modules?
>
> It has been discussed on a number of occasions. The major issue with
> the idea is that a lot of people use Python in closed corporate
> environments, where access to the internet from tools such as pip can
> be restricted. Also, many companies have legal approval processes for
> software - getting approval for "Python" includes the standard
> library, but each external package required would need a separate,
> probably lengthy and possibly prohibitive, approval process before it
> could be used.
>
> So it's unlikely to ever happen, because it would cripple Python for a
> non-trivial group of its users.

I expect splitting the standard library into a minimal core and a suite of default independently updatable add-ons will happen eventually, we just need to help fix the broken way a lot of organisations currently work as we go: http://community.redhat.com/blog/2015/02/the-quid-pro-quo-of-open-infrastructure/

Organisations that don't suitably adapt to the rise of open collaborative models for infrastructure development are going to have a very rough time of it in the coming years.

Cheers,
Nick.

P.S. For a less verbally dense presentation of some of the concepts in that article: http://www.redhat.com/en/explore/infrastructure/na

P.P.S. And for a book length exposition of these kinds of concepts: http://www.redhat.com/en/explore/the-open-organization-book

>
> Paul

Stephen J. Turnbull

unread,

May 27, 2015, 9:31:40 PM5/27/15

to Demian Brecht, Python-Ideas

Demian Brecht writes:

> This is probably a silly idea, but given the above quote and the
> new(er) focus on pip and distributed packages, has there been any
> discussion around perhaps deprecating (and entirely removing from a
> Python 4 release) non-builtin packages and modules?

Of course there has, including in parallel to your post. It's a dead
obvious idea. I'd point to threads, but none of the ones I remember
would be of great use; the same ideas and suggestions that were
advanced before have been reproduced here.

The problems are that the devil is in the details which are rarely
specified, and it would have a huge impact on relationships in the
community. For example, in the context of a relatively short timed
release cycle, I do recall the debates mentioned by Nick over
corporate environments where "Python" (the CPython distribution) is
approved as a single package, so stdlib facilities are automatically
available to "Python" users, but other packages would need to be
approved on a package-by-package basis. There's significant overhead
to each such application, so it is efficiency-increasing to have a
big stdlib in those environments.

OK, you say, so we automatically bundle the separate stdlib current at
a given point in time with the less frequently released Python core
distribution. Now, in the Department of Devilsh Details, do those
"same core + new stdlib" bundles get the core version number, the
stdlib version number (which now must be different!) or a separate
bundle version number? In the Bureau of Relationship Impacts, if I
were a fascist QA/security person, I would surely view that bundle as
a new release requiring a new iteration of the security vetting
process (relationship impact). Maybe the departments doing such
vetting are not as fascist as I would be, but we'd have to find out,
wouldn't we? If we just went ahead with this process and discovered
later that 80% of the people who were depending on the "Python"
package now cannot benefit from the bundling because the tarball
labelled "Python-X.Y" no longer is eternal, that would be sad.

And although that is the drag on a core/stdlib release cycle split
most often cited, I'm sure there are plenty of others. Is it worth
the effort to try to discover and address all/most/some of those?
Which ones to address (and we don't know what problems might exist
yet!)?

> I would think that if there was a system similar to Django Packages
> that made discoverability/importing of packages as easy as using
> those in the standard library, having a distributed package model
> where bug fixes and releases could be done out of band with CPython
> releases would likely more beneficial to the end users. If there
> was a “recommended packages” framework, perhaps there could also be
> buildbots put to testing interoperability of the recommended
> package set.

I don't think either "recommended packages" or buildbots scales much
beyond Django (and I wonder whether buildbots would even scale to the
Django packages ecosystem). But the Python ecosystem includes all of
Django already, plus NumPy, SciPy, Pandas, Twisted, Egenix's mx*
stuff, a dozen more or less popular ORMs, a similar number of web
frameworks more or less directly competing with Django itself, and all
the rest of the cast of thousands on PyPI.

At the present time, I think we need to accept that integration of a
system, even one that implements a single application, has a shallow
learning curve. It takes quite a bit of time to become aware of needs
(my initial reaction was "json-schema in the stdlib? YAGNI!!"), and
some time and a bit of Google-foo to translate needs to search
keywords. After that, the Googling goes rapidly -- that's a solved
problem, thank you very much DEC AltaVista. Then you hit the multiple
implementations wall, and after recovering consciousness, you start
moving forward again slowly, evaluating alternatives and choosing one.

And that doesn't mean you're done, because those integration decisions
will not be set in stone. Eg, for Mailman's 3.0 release, Barry
decided to swap out two mission-critical modules, the ORM and the REST
generator -- after the first beta was released! Granted, Mailman 3.0
has had an extremely long release process, but the example remains
relevant -- such reevaluations occur in .2 or .9 releases all the
time.) Except for Googling, none of these tasks are solved problems:
the system integrator has to go through the process over again each
time with a new system, or in an existing system when the relative
strengths of the chosen modules vs. alternatives change dramatically.
In this last case, it's true that choosing keywords is probably
trivial, and the alternative pruning goes faster, but retrofitting the
whole system to the new! improved! alternative!! module may be pretty
painful -- and there's not necessarily a guarantee it will succeed.

IMO, fiddling with the Python release and distribution is unlikely to
solve any of the above problems, and is likely to be a step backward
for some users. Of course at some point we decide the benefits to
other users, the developers, and the release engineers outweigh the
costs to the users who don't like the change, but it's never a
no-brainer.

Skip Montanaro

unread,

May 28, 2015, 10:36:30 AM5/28/15

to Donald Stufft, Python-Ideas

On Wed, May 27, 2015 at 2:03 PM, Donald Stufft <don...@stufft.io> wrote:
> I’m of the opinion that, given a brand new language, it makes more sense to have really good packaging tools built in, but not to have a standard library.

While perhaps nice in theory, the process of getting a package into
the standard library provides a number of filters (hurdles, if you
will) through which a package much pass (or surmount) before it is
deemed suitable for broad availability by default to users, and for
support by the core development team. Today, that includes
documentation, unit tests, broad acceptance by the user community (in
many cases), and a commitment by the core development team to maintain
the package for the foreseeable future. To the best of my knowledge,
none of those filters apply to PyPI-cataloged packages. That is not to
say that the current process doesn't have its problems. Some really
useful stuff is surely not available in the core. If the core
development team was stacked with people who program numeric
applications for a living, perhaps numpy or something similar would be
in the core today.

The other end of the spectrum is Perl. It has been more than a decade
since I did any Perl programming, and even then, not much, but I still
remember how confused I was trying to choose a package to manipulate
dates and times from CPAN with no guidance. I know PyPI has a weight
field. I just went back and reread the footnote describing it, but I
really have no idea how it operates. I'm sure someone nefarious could
game that system so their security compromising package drifts toward
the top of the list. Try searching for "xml." 2208 packages are
return, with weights ranging from 1 to 9. 107 packages have weights of
8 or 9. If the standard library is to dwindle down to next-to-nothing,
a better scheme for package selection/recommendation will have to be
developed.

Skip