State of PyTables


Miroslav Šedivý

Dec 29, 2020, 2:29:45 PM
to pytables-dev
Hi PyTables developers,

happy to be here again and looking forward to contributing more to
PyTables! While I have touched some non-destructive topics until now,
there are some upcoming issues that I'd like to understand before I can
contribute any further:


1. What are the supported Python versions? The current living Python
versions are 3.6–3.9 and they are configured in the CI. In #838 I tried
to drop support for Python 3.5 (after its EOL) and Antonio correctly asked
to deprecate it properly first. Does that mean that right now we're at
Python 3.5–3.9, and that the code snippet [1] written with 3.4 in mind is
simply outdated and can be fixed?

[1] https://github.com/PyTables/PyTables/blob/master/setup.py#L99-L102

Can we plan for dropping 3.5 support in the near future with proper
announcement of the last PyTables version supporting it?


2. In the same PR I touched files in the `c-blosc` directory and the
`cpuinfo.py` file, but Antonio told me they were imported (and probably
`hdf-blosc` as well). While Francesc imports the `c-blosc` version
frequently, the `cpuinfo.py` file gets only sporadic updates. What exactly
is the reason that these two projects are not pulled in the same way
`numpy` or `numexpr` are, as regular dependencies with proper version
pinning, etc.? Wouldn't it be easier to maintain if the repository
contained only its own files and depended on the rest?


More points will follow later, maybe next year :-)

Miro

Antonio Valentino

Dec 29, 2020, 3:52:33 PM
to pytabl...@googlegroups.com
Hi Miroslav,

I have just sent you the invitation to the PyTables maintainers team on
GitHub.

On 29/12/20 20:29, Miroslav Šedivý wrote:
> Hi PyTables developers,
>
> happy to be here again and looking forward to contributing more to
> PyTables! While I have touched some non-destructive topics until now,
> there are some upcoming issues that I'd like to understand before I can
> contribute any further:
>
>
> 1. What are the supported Python versions? The current living Python
> versions are 3.6–3.9 and they are configured in the CI. In #838 I tried
> to drop the Python 3.5 (after EOL) support and Antonio correctly asked
> to deprecate it properly first. That means, that right now we're at
> Python 3.5–3.9 and the code snippet [1] with 3.4 in mind is just
> outdated and can be fixed?
>
> [1] https://github.com/PyTables/PyTables/blob/master/setup.py#L99-L102

In my opinion PyTables should support only Python versions (and the same
holds for numpy, hdf5, etc.) that have not reached end-of-life.
Specific exceptions could be considered if we intend to
support a specific Linux distribution (Debian stable, Ubuntu x.y LTS
or CentOS).

Also, I don't think we need a deprecation period; we just have to ensure
that the PyTables code and documentation in the master branch are updated
consistently:

* checks in the code
* PyPI classifiers
* README.rst, installation.rst and release notes
* python_requires in the setup.py
* etc.
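
A minimal sketch of how the items in this list could be derived from a single
place so that they cannot drift apart. The names `MIN_PYTHON`, `SUPPORTED`,
`python_requires`, `classifiers` and `check_python` are hypothetical and do not
come from the PyTables setup.py; the 3.6 minimum is only an assumption for
illustration.

```python
import sys

# Hypothetical values for illustration; the real minimum is what this
# thread is trying to decide.
MIN_PYTHON = (3, 6)
SUPPORTED = [(3, 6), (3, 7), (3, 8), (3, 9)]


def python_requires(minimum=MIN_PYTHON):
    """Build the value for the `python_requires` argument of setup()."""
    return ">={}.{}".format(*minimum)


def classifiers(versions=SUPPORTED):
    """Build the per-version PyPI trove classifiers."""
    return [
        "Programming Language :: Python :: {}.{}".format(*v) for v in versions
    ]


def check_python(version_info=None, minimum=MIN_PYTHON):
    """Runtime check in the code, using the same minimum as the metadata."""
    if version_info is None:
        version_info = sys.version_info
    if tuple(version_info[:2]) < minimum:
        raise RuntimeError(
            "Python {}.{} or newer is required".format(*minimum)
        )
```

The sketch deliberately uses `str.format` rather than f-strings, so that the
check itself can still be parsed by an older interpreter and report a readable
error.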


> Can we plan for dropping 3.5 support in the near future with proper
> announcement of the last PyTables version supporting it?


To me we can drop it immediately, in principle, as long as we do it
properly (see above).

A question: which features of Python 3.6 and above do you think
could be interesting to use in PyTables?


> 2. In the same PR I touched files in the `c-blosc` directory and the
> `cpuinfo.py` file, but Antonio told me they were imported (and probably
> `hdf-blosc` as well). While Francesc imports the `c-blosc` version
> frequently, the `cpuinfo.py` file get sporadic updates. What is exactly
> the reason that these two projects are not imported the same way
> `numpy` or `numexpr` are? With proper version pinning, etc. Wouldn't it
> be easier to maintain if the repository contained only its own files
> and depended on the rest?

Regarding c-blosc and hdf-blosc, they contain libraries that supposedly
cannot be installed via pip.

If I understand correctly this is no longer true for c-blosc.

They are bundled with PyTables to try to make it easier for the final
user to build and install PyTables from sources.
I think this has always been a very critical point, and the complexity of
the setup script is indicative of that.

For c-blosc we have a subtree-merge-blosc.sh script.
For hdf-blosc and cpuinfo we could also consider using a git submodule.

In any case I agree with you that we have to take care of updating all
external code regularly and that new PyTables releases should bundle the
latest available version of the external code.



kind regards

--
Antonio Valentino

Tom Kooij

Dec 29, 2020, 3:56:37 PM
to pytabl...@googlegroups.com
My 2 cents about Python versions: it does not make sense to keep supporting Python versions that NumPy no longer supports, so following NumPy's NEP 29 (https://numpy.org/neps/nep-0029-deprecation_policy.html) seems reasonable to me.

Sincerely,
Tom Kooij


On Tue, Dec 29, 2020 at 20:29, Miroslav Šedivý <mi...@sedivy.de> wrote:

Miroslav Šedivý

Dec 29, 2020, 4:44:53 PM
to pytabl...@googlegroups.com
> I have just sent you the invitation to the PyTables mintainers team
> on github.

Thank you, Antonio. I will do my best! This means setting labels and
milestones in PR/issues as well. :-)



> https://numpy.org/neps/nep-0029-deprecation_policy.html

Thank you, Tom, for the link. I didn't know about it before. According
to that list they should be at 3.7–3.9, but on PyPI the current numpy
1.19.4 supports 3.6–3.9. I am fine with the living Python versions
(3.6+) at the moment. PyTables still has some older hacks hidden
somewhere that have to be found and fixed.
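
For reference, the NEP 29 rule for Python versions (support every minor
version released in the prior 42 months, and at least the two latest minor
versions) can be sketched as below. The function names are hypothetical, the
release dates are those of the corresponding x.y.0 releases, and clamping the
day of month to 28 is a simplification of the calendar arithmetic.

```python
from datetime import date

# Release dates of the CPython x.y.0 releases discussed in this thread.
RELEASES = {
    (3, 6): date(2016, 12, 23),
    (3, 7): date(2018, 6, 27),
    (3, 8): date(2019, 10, 14),
    (3, 9): date(2020, 10, 5),
}


def months_before(day, months):
    """Date `months` calendar months before `day` (day clamped to 28)."""
    total = day.year * 12 + (day.month - 1) - months
    return date(total // 12, total % 12 + 1, min(day.day, 28))


def nep29_python_versions(today, window_months=42, minimum=2):
    """Versions released inside the window, plus the `minimum` newest ones."""
    cutoff = months_before(today, window_months)
    recent = {v for v, released in RELEASES.items() if released >= cutoff}
    newest = set(sorted(RELEASES)[-minimum:])
    return sorted(recent | newest)


print(nep29_python_versions(date(2020, 12, 29)))
# [(3, 7), (3, 8), (3, 9)]
```

On the date of this message the cutoff falls in June 2017, so 3.6 (released
December 2016) is already outside the window, which matches the 3.7–3.9
reading of the list.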




> * checks in the code
> * PyPI classifiers
> * README.rst, installation.rst and release notes
> * python_requires in the setup.py
> * etc.

So that's where we are with Python 3.5 after today's #848. As
the next step (after the PyTables 3.6.2 release, which can be tagged as
the last version supporting Python 3.5) I would advance the minimum to
Python 3.6, which has its EOL in December 2021.



> A question: which are features of Python 3.6 and above that you think
> could be interesting to use in PyTables?

Starting with Python 3.6, typing will be interesting, at least for
public API. And f-strings, to make the code more compact. Oh, and
underscores in numbers 10_000_000 instead of 10000000, which is just
nice to have.
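
A small illustration of the three features just mentioned. The function
`describe_node` and the constant `CHUNK_ROWS` are made up for this example
and are not part of the PyTables API.

```python
from typing import Optional

# Underscores in numeric literals: the same value as 10000000.
CHUNK_ROWS = 10_000_000


def describe_node(name: str, nrows: int, title: Optional[str] = None) -> str:
    """Type-annotated signature plus f-strings for the formatting."""
    label = f"{name} ({nrows:,} rows)"
    return f"{label}: {title}" if title else label


print(describe_node("readout", CHUNK_ROWS))
# readout (10,000,000 rows)
```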

Further in the future I'd also like to investigate whether
pathlib.PurePosixPath could be used as an alternative for the
hierarchical structure API.
Anyway, pathlib.Path should already be used for standard directory/file
operations.
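
A rough sketch of what the PurePosixPath idea could look like for node paths
inside a file. The helper `split_node_path` is hypothetical; it only shows how
a path could be split into the parent group and the node name, without
touching the filesystem.

```python
from pathlib import PurePosixPath


def split_node_path(path):
    """Split an in-file node path into (parent group path, node name)."""
    node = PurePosixPath(path)
    if not node.is_absolute():
        raise ValueError(f"node path must be absolute: {path!r}")
    return str(node.parent), node.name


# Paths compose with "/" just like filesystem paths do.
table = PurePosixPath("/") / "detector" / "readout"

print(split_node_path(table))
# ('/detector', 'readout')
print(table.parts)
# ('/', 'detector', 'readout')
```

PurePosixPath is used rather than PurePath so that the separator stays "/" on
every platform, including Windows.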



Regarding the imported repos: I'll have to learn how to deal with
c-blosc and hdf-blosc, but I believe cpuinfo is pure Python and can be
just installed via pip. I'll look into it later.



Thank you, it's exciting to be part of such a team!
Miro


Antonio Valentino

Dec 30, 2020, 2:15:43 AM
to pytabl...@googlegroups.com
Hi Miroslav,

On 29/12/20 22:41, Miroslav Šedivý wrote:

[CUT]

>> * checks in the code
>> * PyPI classifiers
>> * README.rst, installation.rst and release notes
>> * python_requires in the setup.py
>> * etc.
>
> So that's where we're with Python 3.5 after my today's #848. As
> the next step (after the 3.6.2 release that can be tagged as the last
> Python 3.5 version) I would advance it to Python 3.6 which has EOL in
> December 2021.

OK, I'm fine with it, but for me it is also OK to drop 3.5 without
waiting for the next release.

>> A question: which are features of Python 3.6 and above that you think
>> could be interesting to use in PyTables?
>
> Starting with Python 3.6, typing will be interesting, at least for
> public API. And f-strings, to make the code more compact. Oh, and
> underscores in numbers 10_000_000 instead of 10000000, which is just
> nice to have.

Having type annotations would be very nice.
Do you know of any tool to add type annotations in a (semi-)automatic way?
At some point I heard about this kind of tool but I have never used one.


[CUT]

> Regarding the imported repos: I'll have to learn how to deal with
> c-blosc and hdf-blosc, but I believe cpuinfo is pure Python and can be
> just installed via pip. I'll look into it later.

The point is that cpuinfo is not a dependency of PyTables itself, it is
only used in the setup.py script.

The problem is that, with the current state of the setup machinery, pip
needs to execute the setup.py script to learn what the PyTables
dependencies are, but cpuinfo must already be there when you run the
setup.py script.

If I remember correctly, if cpuinfo is not found setup.py can go on
without it, but at that point setup.py is no longer able to determine
optimal compilation flags.
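
The fallback described above follows a common pattern, sketched here under
stated assumptions: `optional_import` and `compile_flags` are hypothetical
helpers and not the code in the PyTables setup.py, and the feature names and
GCC-style flags are only examples.

```python
import importlib


def optional_import(name):
    """Return the module if it can be imported, otherwise None."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def compile_flags(cpu_features):
    """Choose compiler flags from detected CPU features.

    `cpu_features` is None when no detection module is available; the
    build then continues with conservative, unoptimized defaults.
    """
    if cpu_features is None:
        return []
    flags = []
    if "sse2" in cpu_features:
        flags.append("-msse2")
    if "avx2" in cpu_features:
        flags.append("-mavx2")
    return flags


# If this is None, the build would call compile_flags(None).
cpuinfo = optional_import("cpuinfo")
```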

An option would be to use a pyproject.toml file (and PEP 518) [1], but
we have to be very careful because it is easy to break things.



[1] https://snarky.ca/what-the-heck-is-pyproject-toml


> Thank you, it's exciting to be part of such a team!
> Miro

you are very welcome


ciao

--
Antonio Valentino

Francesc Alted

Dec 30, 2020, 6:14:23 AM
to pytabl...@googlegroups.com
On Wed, Dec 30, 2020 at 8:15 AM Antonio Valentino <antonio....@tiscali.it> wrote:

> The point is that cpuinfo is not a dependency of PyTables itself, it is
> only used in the setup.py script.
>
> The problem is that, with the current status of the setup machinery, pip
> needs to execute the setup,py script to know what are PyTables
> dependencies but cpuinfo must be there when you run the setup.py script.
>
> If I remember well, if cpuinfo is not found the setup.py can go on
> without it but, at this point setup.py is no longer able to determine
> optimal compilation flags.
>
> An option would be to use a pyproject.toml file (and PEP 518) [1], but
> we have to be very careful because it is easy to break things.
>
> [1] https://snarky.ca/what-the-heck-is-pyproject-toml

Hey, for what it's worth, we are using pyproject.toml for the scikit-build dependency (needed by setup.py) in python-blosc (https://github.com/Blosc/python-blosc/blob/master/pyproject.toml) and it is working great. Maybe cpuinfo could be handled similarly.

--
Francesc Alted

Antonio Valentino

Dec 30, 2020, 6:35:28 AM
to pytabl...@googlegroups.com
Hi Francesc,

On 30/12/20 12:14, Francesc Alted wrote:

Yes, indeed.

The only drawback is that the user needs to have a relatively recent
version of setuptools and pip, but nowadays, with venv, pip and conda,
IMHO it is not a big issue.

There are some points that I have still not investigated:

* what is the impact of building in an isolated environment?
  are we forced to do it?
* how hard is it to pass additional arguments to the setup.py script?
  (we have several custom command line options and some of them could
  require quotes)

IMHO none of them is critical but I would like to check.

cheers

--
Antonio Valentino