Hi,
I've started (re)writing the Brewery runner tool, 'brewery'. Currently only one command, 'run', is implemented; it takes a single argument: a .json file with a stream network description. Example:
{
    "label": "Basic Data Audit",
    "description": "Provides basic data statistics, such as completeness",
    "nodes": {
        "src": {
            "type": "csv_source"
        },
        "audit": {
            "type": "audit"
        },
        "target": {
            "type": "csv_target",
            "resource": "output.csv"
        }
    },
    "connections": [
        ["src", "audit"],
        ["audit", "target"]
    ]
}
The "resource" value can be either a filename or a URL (URLs currently work for sources only).
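A minimal sketch of how a node might open its "resource", treating a value with an http, https, ftp or file scheme as a URL and anything else as a local filename. The helpers `is_url` and `open_resource` are hypothetical, not part of Brewery, and URL content is assumed to be UTF-8:

```python
import io
from urllib.parse import urlparse
from urllib.request import urlopen

URL_SCHEMES = ("http", "https", "ftp", "file")

def is_url(resource: str) -> bool:
    """Hypothetical helper: a resource with a known URL scheme is a URL."""
    return urlparse(resource).scheme in URL_SCHEMES

def open_resource(resource: str, mode: str = "r"):
    """Hypothetical helper: open a filename or URL as a text stream.

    URLs are read-only, mirroring the current source-only restriction.
    """
    if is_url(resource):
        if "r" not in mode:
            raise ValueError("URLs can only be used as sources")
        # urlopen returns a binary stream; decode it (assumed UTF-8)
        return io.TextIOWrapper(urlopen(resource), encoding="utf-8")
    # newline="" is what the csv module expects for file objects
    return open(resource, mode, newline="")
```

Checking only a fixed list of schemes keeps Windows paths such as `C:\data\input.csv` from being mistaken for URLs.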
Currently only two top-level JSON keys are used: "nodes", a dictionary mapping each node name to its node info, and "connections", a list of [source, target] pairs of node names.
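These two keys are enough to describe a directed graph. A minimal sketch, assuming the layout shown above, of how a runner could load the description, check that every connection refers to a declared node, and compute an execution order with Kahn's topological sort. `load_stream` is a hypothetical name, not Brewery's actual API:

```python
import json
from collections import deque

def load_stream(text: str):
    """Parse a stream description; return (nodes, node names in run order)."""
    desc = json.loads(text)
    nodes = desc["nodes"]                      # name -> node info
    connections = desc.get("connections", [])  # list of [source, target]

    # Every connection must reference a declared node.
    for source, target in connections:
        for name in (source, target):
            if name not in nodes:
                raise ValueError(f"connection refers to unknown node {name!r}")

    # Count inputs per node and collect each node's outputs.
    indegree = {name: 0 for name in nodes}
    outputs = {name: [] for name in nodes}
    for source, target in connections:
        outputs[source].append(target)
        indegree[target] += 1

    # Kahn's algorithm: repeatedly emit nodes whose inputs are all done.
    ready = deque(name for name, deg in indegree.items() if deg == 0)
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for target in outputs[name]:
            indegree[target] -= 1
            if indegree[target] == 0:
                ready.append(target)

    if len(order) != len(nodes):
        raise ValueError("stream network contains a cycle")
    return nodes, order
```

For the example above the order comes out as src, audit, target.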
You can run it as:
brewery run example.json
or pass a URL directly:
More to come:
* allow command-line configuration of some node parameters
* allow configuration parameters to be taken from another JSON file, for example:
brewery run -p step1.json stream.json
brewery run -p step2.json stream.json
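Since `-p` is only planned, the following is just a sketch of one way it could work, using the standard `argparse` module. It assumes, hypothetically, that a parameter file maps node names to dictionaries of values that override those in the stream description; `merge_params`, `build_parser` and that file layout are illustrative assumptions, not Brewery's design:

```python
import argparse
import json

def merge_params(nodes: dict, params: dict) -> dict:
    """Return a copy of `nodes` with per-node parameters overridden.

    Assumed (hypothetical) parameter-file layout:
    {"node_name": {"key": value, ...}, ...}
    """
    merged = {name: dict(info) for name, info in nodes.items()}
    for name, overrides in params.items():
        if name not in merged:
            raise ValueError(f"parameters given for unknown node {name!r}")
        merged[name].update(overrides)
    return merged

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="brewery")
    commands = parser.add_subparsers(dest="command", required=True)
    run = commands.add_parser("run", help="run a stream network")
    run.add_argument("stream", help="stream description (.json file)")
    run.add_argument("-p", dest="params", default=None,
                     help="JSON file with node parameters (planned option)")
    return parser

def main(argv=None) -> dict:
    args = build_parser().parse_args(argv)
    with open(args.stream) as f:
        desc = json.load(f)
    nodes = desc["nodes"]
    if args.params:
        with open(args.params) as f:
            nodes = merge_params(nodes, json.load(f))
    # Building and running the stream from `nodes` and desc["connections"]
    # is out of scope here; the merged nodes are returned for inspection.
    return nodes
```

Keeping the parameters in a separate file lets the same stream.json be reused unchanged, with step1.json and step2.json supplying different values per run.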
What do you think?
Regards,
Stefan Urbanek
data analyst and data brewmaster