As an end developer who has been using your library for a short time, it's a useful tool.
We're migrating an Erlang application to Python more quickly with your library because the legacy application already uses JSON Schema.
From my point of view, validating I/O data is a common problem for most developers; however, that also means a lot of developers have strong opinions about how to validate data ;-)
At least to me, it's a good idea to include this library in Python. Even though there are plenty of libraries that tackle validation with several different approaches, so far I haven't found a simpler approach than JSON schemas.
The bonus is that you can reuse your JSON schemas for migrations and also in your JavaScript source code.
It isn't a silver bullet that resolves every validation corner case, but it is powerful enough to handle the most tedious use cases.
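For illustration, here is a rough sketch of the kind of I/O validation this gives us (the schema and payload are invented for this example; jsonschema.validate() raises a ValidationError when the data doesn't match):

    import jsonschema

    # Invented schema: accepted payloads must be objects carrying an
    # integer "user_id" and a string "email".
    schema = {
        "type": "object",
        "properties": {
            "user_id": {"type": "integer"},
            "email": {"type": "string"},
        },
        "required": ["user_id", "email"],
    }

    payload = {"user_id": 42, "email": "alice@example.org"}

    # Raises jsonschema.exceptions.ValidationError on mismatch.
    jsonschema.validate(payload, schema)

The same schema file can then be reused verbatim on the JavaScript side, which is the reuse I mentioned above.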
Ludovic Gasc (GMLudo)
http://www.gmludo.eu/
> RFC draft: https://tools.ietf.org/html/draft-zyp-json-schema-04
I note that this draft, apparently written in Nov. 2011, expired
almost two years ago with no update. OTOH, 4 other RFCs related to
JSON (6901, 6902, 7386, 7396) have been published recently. (This
kind of thing is common with RFCs; people get fed up with the process
and just go off and do something that's "good enough" for them. But
it does show they've given up on the process of getting a global
standard at least for now.) Then in Oct 2012, Andy Newton wrote[1]:
Schemas. There is no one standardized schema language for JSON,
although several are presently in the works (including one by this
author). The need for a JSON schema language is controversial—JSON
is regarded by most as simple enough on its own. Indeed, there is
no shortage of JSON-based interchange specifications making do
without schema formalism.
and his independent proposal[2] (confusingly called "content rules")
is current, expiring on June 5. (Note that there is no proposal
currently being discussed by the IETF APPSAWG. Newton's proposal is
independent, pending formation of a new charter for a JSON schema WG.)
> My question is: Is there any reason up front anyone can see that
> this addition wouldn’t fly?
I would say that the evident controversy over which schema language
will be standardized is a barrier, unless you can say that Newton's
proposals have no support from the community or something like that.
It's not a terribly high barrier in one sense (Python doesn't demand
that modules be perfect in all ways), but you do have to address the
perception of controversy, I think (at least to deny there really is
any).
A more substantive issue is that Appendix A of Newton's I-D certainly
makes json-schema look "over the top" in verbosity of notation -- XML
would be proud.<wink /> If that assessment is correct, the module
could be considered un-Pythonic (see Zen #4, and although JSON content
rules are not themselves JSON while JSON schema is valid JSON, see Zen
#9).
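To give a feel for what that verbosity looks like, here is a small,
invented json-schema fragment (written as a Python dict) that says
nothing more than "an array of objects, each with a required string
'name'"; the equivalent content rule in Newton's notation is
considerably terser, which is the comparison Appendix A draws:

    # Hypothetical example; not taken from either draft.
    schema = {
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "name": {"type": "string"}
            },
            "required": ["name"]
        }
    }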
N.B. I'm not against this proposal, just answering your question.
I did see that somebody named James Newton-King (aka newtonsoft.com)
has an implementation of json-schema for .NET, and json-schema.org
seems to be in active development, which are arguments in favor of
your proposal.
Footnotes:
[1] http://www.internetsociety.org/articles/using-json-ietf-protocols
[2] https://tools.ietf.org/html/draft-newton-json-content-rules-04
------------------------------
Message: 7
Date: Thu, 21 May 2015 19:15:20 +1000
From: Nick Coghlan <ncog...@gmail.com>
To: Paul Moore <p.f....@gmail.com>
Cc: Demian Brecht <demian...@gmail.com>, Python-Ideas
<python...@python.org>
Subject: Re: [Python-ideas] Adding jsonschema to the standard library
Counter-point: What library is the de facto standard for doing HTTP in
Python? Requests, of course. Discussion of its inclusion has happened
several times, and each time the decision has been not to include it.
The most recent such discussion was at the Language Summit at PyCon 2015
in Montreal. If you went by download count alone, Requests would be in
the standard library, but it just will not happen.
> 1. PyPI is filled with multiple solutions to the same problem. This can be difficult to wade through for the experienced developer, never mind the novice.
That's not exactly true in every case. The only library that parses
and emits YAML is PyYAML. It's unmaintained, incomplete, and full of
bugs. That said, it's the de facto standard and the only one of its
kind that I know of on PyPI. I would vehemently argue against its
inclusion were it ever proposed.
> 2. You generally won't know about packages that don’t solve problems you’ve solved or are solving. Early on in my adoption of Python, there were a number of times where I just spent time digging through the standard library and was surprised by the offerings that I didn’t even know were a thing. Likewise with jsonschema, I wouldn’t have known it was a thing had a co-worker not introduced me to it a couple years ago.
Counter-point: once you know you want to use JSON Schema, searching
for implementations in Python yields Julian's implementation first.
You said (paraphrasing) in your first email that jsonschema should
only be excluded from the stdlib if people could bring up reasons
against it. The standard library has grown in the past few releases
but that doesn't mean it needs to grow every time. It also means it
doesn't need to grow to include an implementation of every possible
/thing/ that exists. Further, leaving it up to others to prove why it
shouldn't be included isn't sufficient. You have to prove to the
community why it MUST be included. Saying "Ah let's throw this thing
in there anyway because why not" isn't valid. By that logic, I could
nominate several libraries that I find useful in day-to-day work and
the only barrier to entry would be however much energy the people who
care about the standard library are willing to expend to keep the
less-than-stellar candidates out.
In this case, that /thing/ is JSON Schema. Last I checked, JSON Schema
was an IETF draft that was never accepted and a specification that has
expired. That means that a couple of years from now, ostensibly after it
was added to the stdlib, it could be rendered completely irrelevant, and
the effort to fix that would be enormous. That would be far less of an issue if
jsonschema were not included at all.
Overall, I'm strongly against its inclusion. Not because the library
isn't excellent. It is. I use it. I'm strongly against it for the
reasons listed above.
> In my mind, the value of bundling anything nowadays really boils down to “this is the suggested de facto standard of solving problem [X] using Python”.
The other way of saying that is to say it explicitly in the stdlib docs, usage docs, and/or tutorial and link to the package. While that used to be pretty rare, that's changed recently. Off the top of my head, there are links to setuptools, requests, nose, py.test, Pillow, PyObjC, py2app, PyWin32, WConio, Console, UniCurses, Urwid, the major alternative GUI frameworks, Twisted, and pexpect.
So, if you wrote something to put in the json module docs, the input/output section of the tutorial, or a howto explaining that if you want structured and validated JSON the usual standard is JSON Schema and the jsonschema library can do it for you in Python, that would get most of the same benefits as adding jsonschema to the stdlib without most of the costs.
> I see two problems with relying on pip and PyPI as an alternative to bundling:
In general, there's a potentially much bigger reason: some projects can't use arbitrary third-party projects without a costly vetting process, or need to work on machines that don't have Internet access or don't have a way to install user site-packages or virtualenvs, etc. Fortunately, those kinds of problems aren't likely to come up for the kinds of projects that need JSON Schema (e.g., Internet servers, client frameworks that are themselves installed via pip, client apps that are distributed by bundling with cx_Freeze/py2app/etc.).
> 1. PyPI is filled with multiple solutions to the same problem. This can be difficult to wade through for the experienced developer, never mind the novice.
Usually this is a strength, not a weakness. Until one project really is good enough to become the de facto standard, you wouldn't want to limit the competition, right? The problem traditionally has been that once something _does_ reach that point, there's no way to make that clear--but now that the stdlib docs link to outside projects, there's a solution.
> 2. You generally won't know about packages that don’t solve problems you’ve solved or are solving. Early on in my adoption of Python, there were a number of times where I just spent time digging through the standard library and was surprised by the offerings that I didn’t even know were a thing. Likewise with jsonschema, I wouldn’t have known it was a thing had a co-worker not introduced me to it a couple years ago.
The mirror of this would be asking if Django should rip out its base
classes for models, views, etc. I think Python 4 could move towards
perhaps deprecating any duplicated modules, but I see no point in
ripping the entire standard library out... except maybe for
httplib/urllib/etc. (for various reasons beyond my obvious conflict of
interest).
> Also, to put the original question in this thread to rest, while I personally think that the addition of jsonschema to the standard library, whether as a top level package or perhaps splitting the json module into a package and introducing it there would be beneficial, I think that solving the distributed package discoverability is a much more interesting problem and would serve many more packages and users. Aside from that, solving that problem would have the same intended effect as integrating jsonschema into the standard library.
>
>
On May 27, 2015 at 2:57:54 PM, Demian Brecht (demian...@gmail.com) wrote:
I’m of the opinion that, given a brand new language, it makes more sense to have really good packaging tools built in, but not to have a standard library. This you call “FooLang Core” or something of the sort. Then you take the most popular or the best examples or whatever criteria you want from the ecosystem around that and you bundle them all together so that the third party packages essentially get preinstalled and you call that “FooLang Platform” or something.
This means that people who want or need a comprehensive standard library can get the Platform edition of the runtime, which will function much like the standard library of a language. However, if they run into some critical feature they need or a bug fix, they can selectively choose to step outside of those preset package versions and install a newer version of one of the bundled packages. Of course they can install non-bundled software as well.
As far as Python is concerned, while I think the above model is better in the general sense, I think it’s probably too late to switch to it: the history of having a big standard library goes back pretty far, and a lot of people and processes depend on it. We’re also still trying to heal the rift that 3.x created, and creating a new rift is probably not the most effective use of time. It’s also the case (though we’re working to make it less true) that our packaging tools can still routinely run into problems that would make me uncomfortable using them for this approach.
---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
> On May 23, 2015, at 7:21 AM, Nick Coghlan <ncog...@gmail.com> wrote:
>
> https://www.djangopackages.com/ covers this well for the Django
> ecosystem (I actually consider it to be one of Django's killer
> features, and I'm pretty sure I'm not alone in that - like
> ReadTheDocs, it was a product of DjangoDash 2010).
Thanks again all for the great discussion here. It seems to have taken quite a turn toward a couple of other points that I’ve had in the back of my mind for a while:
With the integration of pip and the focus on non-standard-library packages, how do we increase discoverability? If the standard library isn’t going to be a mechanism for that (and I’m not putting forward the argument that it should be), adopting something like Django Packages might be tremendously beneficial. Perhaps on top of what Django Packages already has, there could be “recommended packages”. Recommended packages could go through nearly as rigorous a review process as standard library adoption before being flagged, although a number of barriers would be reduced.
"Essentially, the standard library is where a library goes to die. It is appropriate for a module to be included when active development is no longer necessary.” (https://github.com/kennethreitz/requests/blob/master/docs/dev/philosophy.rst#standard-library)
This is probably a silly idea, but given the above quote and the new(er) focus on pip and distributed packages, has there been any discussion around perhaps deprecating (and entirely removing from a Python 4 release) non-builtin packages and modules? I would think that if there was a system similar to Django Packages that made discoverability/importing of packages as easy as using those in the standard library, having a distributed package model where bug fixes and releases could be done out of band with CPython releases would likely be more beneficial to the end users. If there was a “recommended packages” framework, perhaps there could also be buildbots put to testing interoperability of the recommended package set.
Also, to put the original question in this thread to rest, while I personally think that the addition of jsonschema to the standard library, whether as a top level package or perhaps splitting the json module into a package and introducing it there would be beneficial, I think that solving the distributed package discoverability is a much more interesting problem and would serve many more packages and users. Aside from that, solving that problem would have the same intended effect as integrating jsonschema into the standard library.
Could Python 4 tear out the stdlib completely and go to pypi, to what I
believe Nick Coghlan called stdlib+, or would this be A PEP Too Far,
given the one or two minor issues over the move from Python 2 to Python 3?
Yes this is my very dry sense of humour working, but at the same time if
it gets somebody thinking, which in turn gets somebody else thinking,
then hopefully ideas come up which are practical and everybody benefits.
Just my £0.02p worth.
--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.
Mark Lawrence
Dependencies are always going to be a problem. The best way to parse XML is lxml (and the best way to parse HTML is BeautifulSoup plus lxml); does that mean that the Python Platform requires libxml2? The best way to do numerical computing is with NumPy, and the best way to build NumPy is with MKL on platforms where it exists, ATLAS on others; does that mean the Python Platform requires MKL and/or ATLAS? The best way to build cross-platform GUIs with desktop integration is PySide; does that mean the Python Platform requires Qt? (One of the biggest portability problems for Python in practice has always been Tcl/Tk; Qt would be much worse.)
You could look at it as something like the core plus distributions model used in OS's. FreeBSD has a core and ports; there's a simple rule for what's in core (a complete POSIX system plus enough to build ports, nothing else), and the practicality-vs.-purity decisions for how to apply that to real-life problems aren't that hard. But Linux took a different approach: it's just a kernel, and everything else--libc, the ports system, etc.--can be swapped out. There is no official distribution; at any given time in history, there are 3-6 competing "major distributions", dozens of others based on them, and some "special-case" distros like ucLinux or Android. And that means different distros can make different decisions on what dependencies are acceptable--include packages that only run on x86, or accept some corporate quasi-open-source license or closed-source blob.
Python seems to have fallen into a place halfway between the two. The stdlib is closer to FreeBSD core than to Linux. On the other hand, while many people start with the official stdlib and use pip to expand on it, there are third-party distributions competing to provide more useful or better-organized batteries than the official version, plus custom distributions that come with some OS distros (e.g., Apple includes PyObjC with theirs), and special things like Kivy.
That doesn't seem to have caused any harm, and may have caused a lot of benefit. While Python may not have found the perfect sweet spot, what it found isn't that bad. And the way it continues to evolve isn't that bad. If you could go back in time to 2010 and come up with a grand five-year plan for how the stdlib, core distribution, and third-party ecosystem should be better, how much different would Python be today?
On May 27, 2015 at 5:50:55 PM, Andrew Barnert (abar...@yahoo.com) wrote:
It certainly doesn’t require you to add something to the “Platform” for every topic either. You can still be conservative in what you include in the “Platform” based on how many people are likely to need/want it and what sort of dependency or building impact it has on actually building out the full Platform.
---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
One way to do that might be to focus the stdlib on picking the abstract interfaces (whether in the actual code, the way dbm allows bsddb to plug in, or just in documentation, like DB-API 2) and providing a bare-bones implementation or none at all. It would be nice if things like lxml.etree didn't take so much work and it weren't so hard to quantify how perfect a replacement it is. Or if we had a SortedMapping ABC so the half-dozen popular implementations could share a consistent API, so they could compete more cleanly on things that matter like performance or the need for a C extension.
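To make that last idea concrete, here is a minimal sketch of what such an ABC might look like (the name SortedMapping is just the one from the paragraph above, and the peekitem/irange methods are borrowed from existing sorted-dict packages purely for illustration; none of this is an actual proposal):

    from abc import abstractmethod
    from collections.abc import MutableMapping

    class SortedMapping(MutableMapping):
        # MutableMapping already requires __getitem__, __setitem__,
        # __delitem__, __iter__ and __len__; the extra contract here
        # would be that __iter__ yields keys in sorted order.

        @abstractmethod
        def peekitem(self, index=-1):
            """Return the (key, value) pair at the given sorted position."""

        @abstractmethod
        def irange(self, minimum=None, maximum=None):
            """Iterate over keys between minimum and maximum, in key order."""

The existing sorted-dict implementations could then register against (or inherit from) such an ABC and compete on the things that actually differ, like performance or the need for a C extension.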
But the example of requests shows how hard, and possibly undesirable, that is. Most people use requests not because of the advanced features it has that urllib doesn't, but because the intermediate-level features that both include have a nicer interface in requests. And, while people have talked about how nice it would be to restructure urllib so that it matches requests' interface wherever possible (while still retaining the existing interface for backward compat), it doesn't seem likely that anyone will actually ever do it. And, even if someone did, and requests became a drop-in replacement for urllib's new-style API and urllib was eventually deprecated, what are the odds that competitors like PyCurl would be reworked into a "URL-API 2.0" module?
On 28 May 2015 04:46, "Paul Moore" <p.f....@gmail.com> wrote:
>
> On 27 May 2015 at 19:28, Demian Brecht <demian...@gmail.com> wrote:
> > This is probably a silly idea, but given the above quote and the new(er) focus on pip and distributed packages, has there been any discussion around perhaps deprecating (and entirely removing from a Python 4 release) non-builtin packages and modules?
>
> It has been discussed on a number of occasions. The major issue with
> the idea is that a lot of people use Python in closed corporate
> environments, where access to the internet from tools such as pip can
> be restricted. Also, many companies have legal approval processes for
> software - getting approval for "Python" includes the standard
> library, but each external package required would need a separate,
> probably lengthy and possibly prohibitive, approval process before it
> could be used.
>
> So it's unlikely to ever happen, because it would cripple Python for a
> non-trivial group of its users.
I expect splitting the standard library into a minimal core and a suite of default, independently updatable add-ons will happen eventually; we just need to help fix the broken way a lot of organisations currently work as we go: http://community.redhat.com/blog/2015/02/the-quid-pro-quo-of-open-infrastructure/
Organisations that don't suitably adapt to the rise of open collaborative models for infrastructure development are going to have a very rough time of it in the coming years.
Cheers,
Nick.
P.S. For a less verbally dense presentation of some of the concepts in that article: http://www.redhat.com/en/explore/infrastructure/na
P.P.S. And for a book length exposition of these kinds of concepts: http://www.redhat.com/en/explore/the-open-organization-book
>
> Paul
> This is probably a silly idea, but given the above quote and the
> new(er) focus on pip and distributed packages, has there been any
> discussion around perhaps deprecating (and entirely removing from a
> Python 4 release) non-builtin packages and modules?
Of course there has, including in parallel to your post. It's a dead
obvious idea. I'd point to threads, but none of the ones I remember
would be of great use; the same ideas and suggestions that were
advanced before have been reproduced here.
The problems are that the devil is in the details, which are rarely
specified, and that it would have a huge impact on relationships in the
community. For example, in the context of a relatively short
release cycle, I do recall the debates mentioned by Nick over
corporate environments where "Python" (the CPython distribution) is
approved as a single package, so stdlib facilities are automatically
available to "Python" users, but other packages would need to be
approved on a package-by-package basis. There's significant overhead
to each such application, so a big stdlib increases efficiency in those
environments.
OK, you say, so we automatically bundle the separate stdlib current at
a given point in time with the less frequently released Python core
distribution. Now, in the Department of Devilish Details, do those
"same core + new stdlib" bundles get the core version number, the
stdlib version number (which now must be different!) or a separate
bundle version number? In the Bureau of Relationship Impacts, if I
were a fascist QA/security person, I would surely view that bundle as
a new release requiring a new iteration of the security vetting
process (relationship impact). Maybe the departments doing such
vetting are not as fascist as I would be, but we'd have to find out,
wouldn't we? If we just went ahead with this process and discovered
later that 80% of the people who were depending on the "Python"
package now cannot benefit from the bundling because the tarball
labelled "Python-X.Y" no longer is eternal, that would be sad.
And although that is the drag on a core/stdlib release cycle split
most often cited, I'm sure there are plenty of others. Is it worth
the effort to try to discover and address all/most/some of those?
Which ones to address (and we don't know what problems might exist
yet!)?
> I would think that if there was a system similar to Django Packages
> that made discoverability/importing of packages as easy as using
> those in the standard library, having a distributed package model
> where bug fixes and releases could be done out of band with CPython
> releases would likely be more beneficial to the end users. If there
> was a “recommended packages” framework, perhaps there could also be
> buildbots put to testing interoperability of the recommended
> package set.
I don't think either "recommended packages" or buildbots scales much
beyond Django (and I wonder whether buildbots would even scale to the
Django packages ecosystem). But the Python ecosystem includes all of
Django already, plus NumPy, SciPy, Pandas, Twisted, Egenix's mx*
stuff, a dozen more or less popular ORMs, a similar number of web
frameworks more or less directly competing with Django itself, and all
the rest of the cast of thousands on PyPI.
At the present time, I think we need to accept that integration of a
system, even one that implements a single application, has a shallow
learning curve. It takes quite a bit of time to become aware of needs
(my initial reaction was "json-schema in the stdlib? YAGNI!!"), and
some time and a bit of Google-fu to translate needs into search
keywords. After that, the Googling goes rapidly -- that's a solved
problem, thank you very much DEC AltaVista. Then you hit the multiple
implementations wall, and after recovering consciousness, you start
moving forward again slowly, evaluating alternatives and choosing one.
And that doesn't mean you're done, because those integration decisions
will not be set in stone. Eg, for Mailman's 3.0 release, Barry
decided to swap out two mission-critical modules, the ORM and the REST
generator -- after the first beta was released! (Granted, Mailman 3.0
has had an extremely long release process, but the example remains
relevant -- such reevaluations occur in .2 or .9 releases all the
time.) Except for Googling, none of these tasks are solved problems:
the system integrator has to go through the process over again each
time with a new system, or in an existing system when the relative
strengths of the chosen modules vs. alternatives change dramatically.
In this last case, it's true that choosing keywords is probably
trivial, and the alternative pruning goes faster, but retrofitting the
whole system to the new! improved! alternative!! module may be pretty
painful -- and there's not necessarily a guarantee it will succeed.
IMO, fiddling with the Python release and distribution is unlikely to
solve any of the above problems, and is likely to be a step backward
for some users. Of course at some point we decide the benefits to
other users, the developers, and the release engineers outweigh the
costs to the users who don't like the change, but it's never a
no-brainer.
While perhaps nice in theory, the process of getting a package into
the standard library provides a number of filters (hurdles, if you
will) through which a package must pass (or surmount) before it is
deemed suitable for broad availability by default to users, and for
support by the core development team. Today, that includes
documentation, unit tests, broad acceptance by the user community (in
many cases), and a commitment by the core development team to maintain
the package for the foreseeable future. To the best of my knowledge,
none of those filters apply to PyPI-cataloged packages. That is not to
say that the current process doesn't have its problems. Some really
useful stuff is surely not available in the core. If the core
development team was stacked with people who program numeric
applications for a living, perhaps numpy or something similar would be
in the core today.
The other end of the spectrum is Perl. It has been more than a decade
since I did any Perl programming, and even then, not much, but I still
remember how confused I was trying to choose a package to manipulate
dates and times from CPAN with no guidance. I know PyPI has a weight
field. I just went back and reread the footnote describing it, but I
really have no idea how it operates. I'm sure someone nefarious could
game that system so their security compromising package drifts toward
the top of the list. Try searching for "xml." 2208 packages are
returned, with weights ranging from 1 to 9. 107 packages have weights of
8 or 9. If the standard library is to dwindle down to next-to-nothing,
a better scheme for package selection/recommendation will have to be
developed.
Skip