ANN: pandaSDMX 0.4 released

Dr. Leo

Apr 12, 2016, 9:55:41 AM
to pyd...@googlegroups.com, sdmx-...@googlegroups.com
Hi,

I am proud to announce the release of pandaSDMX 0.4, a tool to retrieve
and process statistical data and metadata from institutions supporting
the SDMX standard (www.sdmx.org). It is a major feature release. All
users are encouraged to upgrade to this version.

The big idea of this release is to use pandas DataFrames to display and
search not just datasets, but also metadata such as code-lists, dataflow
definitions and category-schemes. This makes it super-easy to retrieve
the datasets you need and to understand the meaning of codes used in
attributes and dimensions. Up to now users had to write loops iterating
over code-lists or use a rudimentary text search function. As code-lists
may be very large - the longest I have seen has more than 50,000 codes -
this could be an uphill battle. The ability to export metadata to pandas
greatly flattens the learning curve as in most scenarios users need not
care at all about the information model API.
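
To illustrate why a DataFrame helps here, below is a minimal sketch using
plain pandas only. The code-list is a hand-made toy (codes and names are
invented for illustration, not downloaded from any provider), and
``search_codes`` is a hypothetical helper, not part of pandaSDMX:

```python
import pandas as pd

# Toy stand-in for a code-list exported to pandas. The codes and names
# are invented for illustration; they do not come from a real provider.
codes = pd.DataFrame(
    {"name": ["Germany", "France", "Euro area", "European Union"]},
    index=pd.Index(["DE", "FR", "EA", "EU"], name="code"),
)


def search_codes(frame, pattern):
    """Return rows whose 'name' column contains pattern, ignoring case."""
    mask = frame["name"].str.contains(pattern, case=False, regex=False)
    return frame[mask]


# One vectorized filter replaces a hand-written loop over the code-list.
matches = search_codes(codes, "euro")
print(matches)
```

The same filter scales to code-lists with tens of thousands of entries
without any change to the calling code.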

Other new features include odo support for fancy data conversions, and
out-of-the-box downloads from INSEE, the French national statistics
institute.

Install it from PyPI via
$ pip install pandasdmx

or see the extensive documentation at
http://pandasdmx.readthedocs.org/en/v0.4/

A more comprehensive list of changes is set out below:

v0.4 (2016-04-11)
-----------------------

New features
::::::::::::::

* add new provider INSEE, the French statistics office (thanks to
  Stéphan Rault)
* register '.sdmx' files with `Odo <odo.readthedocs.org/>`_ if available
* logging of http requests and file operations.
* new structure2pd writer to export codelists, dataflow-definitions and
  other structural metadata from structure messages as multi-indexed
  pandas DataFrames. Desired attributes can be specified and are
  represented by columns.
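
For readers who want to see the new log messages from their own code, here
is a minimal sketch of how a level set on a parent logger reaches its
children. It assumes pandaSDMX uses Python's standard ``logging`` module
with loggers named ``pandasdmx`` and ``pandasdmx.api``; ``set_log_level``
is a hypothetical helper, not a pandaSDMX function:

```python
import logging

# Logger names are an assumption based on the package and module names.
parent = logging.getLogger("pandasdmx")
child = logging.getLogger("pandasdmx.api")


def set_log_level(level):
    """Set the level on the parent logger and report what the child sees.

    The child has no level of its own (NOTSET), so it inherits the
    effective level from its nearest configured ancestor.
    """
    parent.setLevel(level)
    return child.getEffectiveLevel()


effective = set_log_level(logging.INFO)
print(logging.getLevelName(effective))
```

Because the child inherits the level, configuring the top-level logger once
is enough even if more child loggers are added later.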

API changes
:::::::::::::

* :class:`pandasdmx.api.Request` constructor accepts a ``log_level``
  keyword argument which can be set to a log-level for the pandasdmx
  logger and its children (currently only pandasdmx.api)
* :class:`pandasdmx.api.Request` now has a ``timeout`` property to set
  the timeout for http requests
* extend api.Request._agencies configuration to specify agency- and
  resource-specific settings such as headers. Future versions may
  exploit this to provide reader selection information.
* api.Request.get: specify http_headers per request. Defaults are set
  according to agency configuration
* Response instances expose Message attributes to make application code
  more succinct
* rename :class:`pandasdmx.api.Message` attributes to singular form.
  Old names are deprecated and will be removed in the future.
* :class:`pandasdmx.api.Request` exposes resource names such as data,
  datastructure, dataflow etc. as descriptors calling 'get' without
  specifying the resource type as string. In interactive environments,
  this saves typing and enables code completion.
* data2pd writer: return attributes as namedtuples rather than dict
* use patched version of namedtuple that accepts non-identifier strings
  as field names and makes all fields accessible through dict syntax.
* remove GenericDataSet and GenericDataMessage. Use DataSet and
  DataMessage instead
* sdmxml reader: return strings or unicode strings instead of LXML smart
  strings
* sdmxml reader: remove most of the specialized read methods.
  Adapt model to use generalized methods. This makes code more
  maintainable.
* :class:`pandasdmx.model.Representation` for DSD attributes and
  dimensions now supports text, not just codelists.
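
The namedtuple change above can be pictured with a small standard-library
sketch. This is not pandaSDMX's patched namedtuple; ``make_record`` and the
field names are hypothetical, chosen only to show how a record can accept a
non-identifier field name and still offer dict-style access:

```python
from collections import namedtuple


def make_record(field_names, values):
    """Build a namedtuple-like record that tolerates non-identifier names.

    Hypothetical illustration only, not the patched namedtuple that
    ships with pandaSDMX.
    """
    # rename=True replaces invalid field names with positional ones
    # such as _1, so the class can still be created.
    base = namedtuple("Record", field_names, rename=True)
    index = {name: pos for pos, name in enumerate(field_names)}

    class Record(base):
        __slots__ = ()

        def __getitem__(self, key):
            # String keys are looked up by the original field name;
            # integers and slices keep normal tuple behaviour.
            if isinstance(key, str):
                return tuple.__getitem__(self, index[key])
            return tuple.__getitem__(self, key)

    return Record(*values)


# "unit-mult" is not a valid Python identifier, so it is only reachable
# through dict syntax; "OBS_STATUS" works both ways.
attrs = make_record(["OBS_STATUS", "unit-mult"], ["A", "6"])
print(attrs.OBS_STATUS, attrs["unit-mult"])
```

Valid identifiers stay available as attributes, which keeps code completion
useful, while every field remains reachable by its original name.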

Other changes and enhancements
::::::::::::::::::::::::::::::::::

* documentation has been overhauled. Code examples are now much simpler
  thanks to the new structure2pd writer
* testing: switch from nose to py.test
* improve packaging. Include tests in sdist only
* numerous bug fixes

Cheers,

Leo