Brewery runner tool


Stefan Urbanek


Mar 14, 2012, 4:40:31 PM

to datab...@googlegroups.com

Hi,

I've started (re)writing the Brewery runner tool 'brewery'. Currently only one command, 'run', is implemented. It takes a single argument: a .json file with a stream network description. Example:

{
    "label": "Basic Data Audit",
    "description": "Provides basic data statistics, such as completeness",
    "nodes": {
        "src": {
            "type": "csv_source",
            "resource": "https://raw.github.com/Stiivi/cubes/master/examples/hello_world/data.csv"
        },
        "audit": {
            "type": "audit"
        },
        "target": {
            "type": "csv_target",
            "resource": "output.csv"
        }
    },
    "connections": [
        ["src", "audit"],
        ["audit", "target"]
    ]
}

(see https://gist.github.com/2039231)

The "resource" value can be a filename or a URL (URLs work for the source only at the moment).

Currently only two top-level JSON keys are used: "nodes" and "connections". "nodes" is a dictionary of name -> node info, and "connections" is a list of [source, target] node names.
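To make the structure concrete, here is a minimal Python sketch that checks a description for the two keys described above. It does not use Brewery itself; the function name check_stream is hypothetical, and treating "type" as required for every node is an assumption based on the example, not a documented rule.

```python
import json


def check_stream(description):
    """Return a list of problems found in a stream description dict."""
    problems = []
    nodes = description.get("nodes", {})

    # Assumption: every node info dictionary carries a "type" key.
    for name, info in nodes.items():
        if "type" not in info:
            problems.append(f"node {name!r} has no 'type'")

    # Each connection should be a [source, target] pair of known node names.
    for connection in description.get("connections", []):
        if len(connection) != 2:
            problems.append(f"connection {connection!r} is not a [source, target] pair")
            continue
        for end in connection:
            if end not in nodes:
                problems.append(f"connection {connection!r} refers to unknown node {end!r}")

    return problems


example = json.loads("""
{
    "nodes": {
        "src": {
            "type": "csv_source",
            "resource": "https://raw.github.com/Stiivi/cubes/master/examples/hello_world/data.csv"
        },
        "audit": {"type": "audit"},
        "target": {"type": "csv_target", "resource": "output.csv"}
    },
    "connections": [["src", "audit"], ["audit", "target"]]
}
""")

print(check_stream(example))  # prints [] when nothing is wrong
```

A check like this catches a misspelled node name in "connections" before the stream is run.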

You can run it as:

brewery run example.json

or directly with a URL:

brewery run https://raw.github.com/gist/2039231/0b4db19335eb5c22a989519250ec2e6f5eae8a71/example.json
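Accepting either a file or a URL can be done with the Python standard library alone. The sketch below is only an illustration of that idea, not the actual code in the tool; load_description is a hypothetical name, and treating only http and https as URL schemes is an assumption.

```python
import json
from urllib.parse import urlparse
from urllib.request import urlopen


def load_description(location):
    """Load a stream description from a local path or an http(s) URL."""
    # Assumption: anything that is not http/https is treated as a local file.
    if urlparse(location).scheme in ("http", "https"):
        with urlopen(location) as response:
            return json.loads(response.read().decode("utf-8"))
    with open(location, encoding="utf-8") as handle:
        return json.load(handle)


# Usage (not executed here):
# description = load_description("example.json")
```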

More to come:

* allow command-line configuration of some node parameters

* allow configuration parameters to be taken from another JSON file, for example:

brewery run -p step1.json stream.json

brewery run -p step2.json stream.json
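Since the format of the parameter file is not decided yet, the following Python sketch is purely a guess at how the -p idea could behave. It assumes the parameter file maps node names to attribute overrides; that layout and the name apply_parameters are both hypothetical.

```python
import copy


def apply_parameters(description, parameters):
    """Return a copy of the description with per-node overrides applied.

    Assumed (hypothetical) parameter layout: {node_name: {attribute: value}}.
    """
    result = copy.deepcopy(description)
    for name, overrides in parameters.items():
        if name not in result["nodes"]:
            raise KeyError(f"unknown node: {name}")
        result["nodes"][name].update(overrides)
    return result


stream = {
    "nodes": {
        "src": {"type": "csv_source", "resource": "data.csv"},
        "target": {"type": "csv_target", "resource": "output.csv"},
    },
    "connections": [["src", "target"]],
}

# Hypothetical content of step1.json: write this step's output elsewhere.
step1 = {"target": {"resource": "step1.csv"}}

merged = apply_parameters(stream, step1)
print(merged["nodes"]["target"]["resource"])
```

Working on a copy keeps the original stream description unchanged, so the same stream.json can be reused with step1.json and step2.json.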

What do you think?

Regards,

Stefan Urbanek