On Sun, 11 Apr 2021 at 10:25, Stéfane Fermigier <s...@fermigier.com> wrote:
> On Sat, Apr 10, 2021 at 9:08 PM Guido van Rossum <gu...@python.org> wrote:
>> I owe you an answer here. For large projects with long lifetimes that expect their API to evolve, experience has taught that any part of the API that *can* be used *will* be used, even if it was clearly not intended to be used. And once enough users have code that depends on reaching into some private part of a package and using a method or attribute that was available even though it was undocumented or even clearly marked as "internal" or "private", if an evolution of the package wants to remove that method or attribute (or rename it or change its type, signature or semantics/meaning), those users will complain that the package "broke their code" by making a "backwards incompatible change." For a casually maintained package that may not be a big deal, but for a package with serious maintainers this can prevent ordered evolution of the API, especially since a package's developers may not always be aware of how their internal attributes/methods are being used or perceived by users. So experienced package developers who are planning for the long term are always looking for ways to prevent such situations (because it is frustrating for both users and maintainers). Being able to lock down the exported symbols is just one avenue to prevent disappointment in the future.
> 2) What's a "public API" ? A particular library can have an internal API (with regular public functions and methods, etc,) but only willingly expose a subset of this API to its clients. __all__ doesn't help much here, as the tools I know of don't have this distinction in mind.
I don't think that this distinction is so important in practice. What
matters is the boundary between different codebases. Within a codebase
a team can agree whatever conventions they want because any time
something is changed it can be changed everywhere at once in a single
commit. What matters is communicating from project A to all downstream
developers and users what can be expected to remain compatible across
different versions of project A.
> 3) A library author can decide that the public API of its library is the part exposed by its top-level namespace, e.g:
> from flask import Flask, request, ...
> where Flask, request, etc. are defined in subpackages of flask.
This approach really doesn't scale. For one thing imports in Python
have a run-time performance cost and if everything is imported at
top-level like this then it implies that the top level __init__.py has
imported every module in the codebase. Also organising around
sub-packages and modules is much nicer for organising documentation,
module level docstrings etc.
> I guess this issue also comes from the lack of a proper way to define (either as a language-level construct or as a convention among developers and tools authors) a proper notion of a "public API", and, once again, I don't believe __all__ helps much with this issue.
> => For these use cases, a simple solution could be devised, that doesn't involve language-level changes but needs a wide consensus among both library authors and tools authors:
> 1) Mark public API namespaces package with a special marker (for instance: __public_api__ = True).
> 2) Static and runtime tools could be easily devised that raise a warning when:
> a) Such a marker is present in one or more modules of the package.
> b) and: one imports another module of the same package from another package.
> This is just a rough idea. Additional use cases could easily be added by adding other marker types.
> An alternative could be to use decorators (eg. @public like mentioned in another message), as long as we don't confuse "public = part of the public API of the library" with "public = not private to this particular module".
I don't think __all__ helps either but for different reasons. Python
makes everything implicitly public. What is actually needed is a way
to clearly mark the internal code (which is most of the code in a
large project) as being obviously private.
That's why I prefer the Jax approach of putting the implementation in
a jax._src package. That's a clear sign to all about what is private,
meaning that all contributors and users can clearly see what is
internal to the codebase. If someone submits a patch to project B that
does "from projA._src.x.y import z" then it's clear to anyone
reviewing that patch what it means. They can do that if they want on a
consenting adults basis but there will be no ambiguity about the
privateness of that API if a refactor in project A causes B to break.
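As a rough sketch of how mechanical that review check could be, here is
a small script that scans a patch to project B for absolute imports
passing through a _src package. The function name find_private_imports
and the sample patch are made up for illustration; projA is the same
placeholder as above. It is meant to be run on downstream code, since
project A's own public modules import from _src on purpose.

```python
import ast


def find_private_imports(source, marker="_src"):
    """Return sorted (line, dotted_name) pairs for absolute imports that pass through `marker`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            # Absolute "from a.b import c": check the full dotted path a.b.c.
            names = [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            # Relative imports stay inside one package, so they are not flagged.
            continue
        hits.extend((node.lineno, name) for name in names if marker in name.split("."))
    return sorted(hits)


# Hypothetical patch to project B; the source is only parsed, never imported.
patch = """\
import numpy
from projA.x.y import z
from projA._src.x.y import z as private_z
import projA._src.x
"""

for lineno, name in find_private_imports(patch):
    print(f"line {lineno}: imports private name {name}")
```

Because the source is only parsed with ast, none of the named packages
need to be installed for the check to run.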
The _src approach means that project A can freely refactor its
internals including deleting, renaming, merging and splitting modules,
making a module into a package etc, without worrying about leaving
behind dummy modules or deprecation warnings. Anyone reviewing a
patch for project A can clearly see when it is and when it is not
allowed to make these kinds of changes. Within projA._src it could be
agreed that a leading underscore indicates something like "internal to
the module" but *everything* in _src is clearly internal to the codebase.
Project A can organise everything outside of _src into modules
according to what makes sense for users and for the organisation of
the documentation rather than what makes sense for the implementation.
The _src package can be cleanly separated from everything else in
automatically generated documentation. It's much better if the
top-level of the docs has clearly separate links for the public API
and the internal development docs. This also makes it clear what sort
of information should go in the different parts of the docs which
would be very different for jax._src.x.y compared to jax.x.y.
In Jax every public module jax.x.y just seems to do "from jax._src.x.y
import z, t" but it's also possible to actually put the top-level
function there in the module like:
    """Big module docstring for jax.x.y. This is for someone doing help in
    the repl. The proper web docs are in an rst file somewhere else.
    """
    from jax._src.x.y import _do_stuff

    __all__ = ['do_stuff']

    def do_stuff(*args, **kwargs):
        """Big do_stuff docstring"""
        return _do_stuff(*args, **kwargs)
That way you've made a module that is entirely about defining and
documenting a public API. The do_stuff function shows as being from
this module and no automated analysis/introspection tools would get
confused about that. There could be a minimum of high-level code in
do_stuff e.g. for dispatching to different low-level routines or
checking arguments but anything more should go in _src.
Message archived at https://mail.python.org/archives/list/python...@python.org/message/YCID7L6PKRBTSB4QMNX5PGGBLF2IDWZJ/