bcolz, a column store for Python, 1.1.2 released


Francesc Alted

Feb 10, 2017, 3:51:07 AM
to Bcolz, pyd...@googlegroups.com, python-ann...@python.org
======================
Announcing bcolz 1.1.2
======================

What's new
==========

This is a maintenance release that brings quite a lot of improvements.
Here are the highlights:

- Zstd is a supported codec now.  Fixes #331.

- C-Blosc updated to 1.11.2.

- Added a new `defaults_ctx` context so that users can select defaults
  easily without changing global behaviour. For example::

   with bcolz.defaults_ctx(vm="python", cparams=bcolz.cparams(clevel=0)):
      cout = bcolz.eval("(x + 1) < 0")

- Fixed a crash occurring in `ctable.todataframe()` when both `columns`
  and `orient='columns'` were specified.  PR #311.  Thanks to Peter
  Quackenbush.

- Use `pkg_resources.parse_version()` to test for version of packages.
  Fixes #322 (PY27 bcolz with dask unicode error).

- New package recipe for conda-forge.  Now you can install bcolz with: `conda install -c conda-forge bcolz`.  Thanks to Alistair Miles.

For a more detailed change log, see:


For some comparison between bcolz and other compressed data containers,
see:


especially chapters 3 (in-memory containers) and 4 (on-disk containers).


What it is
==========

*bcolz* provides **columnar and compressed** data containers that can
live either on-disk or in-memory.  The compression is carried out
transparently by Blosc, an ultra fast meta-compressor that is optimized
for binary data.  Compression is active by default.

Column storage allows for efficiently querying tables with a large
number of columns.  It also allows for cheap addition and removal of
columns.  Lastly, high-performance iterators (like ``iter()``,
``where()``) for querying the objects are provided.

bcolz can use different backends internally (currently numexpr,
Python/NumPy or dask) to accelerate many vector and query
operations (although it can use pure NumPy for doing so too).  Moreover,
since the carray/ctable containers can be disk-based, it is possible to
use them for seamlessly performing out-of-memory computations.

NumPy is the standard way to feed and retrieve data from bcolz
internal containers, but bcolz also comes with high-performance
import/export facilities to/from `HDF5/PyTables tables
<http://www.pytables.org>`_ and pandas dataframes.
Have a look at how bcolz and the Blosc compressor make better use of
memory without significant overhead, at least for some real-world
scenarios:


bcolz has minimal dependencies (NumPy is the only strict requisite),
comes with an exhaustive test suite, and is meant to be used in
production.  Example users of bcolz include Visualfabriq, Quantopian
and scikit-allel:

* Visualfabriq:

  * *bquery*, a query and aggregation framework for bcolz:

* Quantopian:

  * Using compressed data containers for faster backtesting at scale:

* scikit-allel:

  * Exploratory analysis of large scale genetic variation data.


Resources
=========

Visit the main bcolz site and repository at:

Manual:

Home of Blosc compressor:

User's mail list:

License is the new BSD:

Release notes can be found in the Git repository:

----

  **Enjoy data!**

--
Francesc Alted