I'm glad to announce new release of Brewery – stream based data auditing and analysis framework for Python.
There are quite a few updates, to mention the notable ones:
* new nodes: *pretty printer* node (for your terminal pleasure), *generator function* node
* many CSV updates and fixes
Added several simple how-to examples [1], such as:
aggregation of remote CSV, basic audit of a CSV, how to use a generator
function.
Note that there are couple changes that break compatibility, however they can
be updated very easily. I apologize for the inconvenience, but until 1.0 the
changes might happen more frequently. On the other hand, I will try to make
them as painless as possible. Feedback and questions are welcome. I'll help you.
Full listing of news, changes and fixes is below.
NEWS
* Changed license to MIT
* Created new brewery runner commands: 'run' and 'graph':
* 'brewery run stream.json' will execute the stream
* 'brewery graph stream.json' will generate graphviz data
* Nodes: Added pretty printer node - textual output as a formatted table
* Nodes: Added source node for a generator function
* Nodes: added analytical type to derive field node
* Preliminary implementation of data probes (just concept, API not decided yet
for 100%)
* CSV: added empty_as_null option to read empty strings as Null values
* Nodes can be configured with node.configure(dictionary, protected). If
'protected' is True, then protected attributes (specified in node info) can
not be set with this method.
* added node identifier to the node reference doc
* added create_logger
* added experimental retype feature (works for CSV only at the moment)
* Mongo Backend - better handling of record iteration
CHANGES
* CSV: resource is now explicitly named argument in CSV*Node
* CSV: convert fields according to field storage type (instead of all-strings)
* Removed fields getter/setter (now implementation is totally up to stream
subclass)
* AggregateNode: rename ``aggregates`` to ``measures``, added ``measures`` as
public node attribute
* moved errors to brewery.common
* removed ``field_name()``, now str(field) should be used
* use named blogger 'brewery' instead of the global one
* better debug-log labels for nodes (node type identifier + python object ID)
**WARNING:** Compatibility break:
* depreciate ``__node_info__`` and use plain ``node_info`` instead
* ``Stream.update()`` now takes nodes and connections as two separate arguments
FIXES
* added SQLSourceNode, added option to keep ifelds instead of dropping them in
FieldMap and FieldMapNode (patch by laurentvasseur @ bitbucket)
* better traceback handling on node failure (now actually the traceback is
displayed)
* return list of field names as string representation of FieldList
* CSV: fixed output of zero numeric value in CSV (was empty string)
LINKS
If you have any questions, comments, requests, do not hesitate to ask.