import bubbles
FILE = "cida.csv"
p = bubbles.Pipeline()
p.source_object("csv_source", resource=FILE, infer_fields=True)
p.aggregate("Title", "Country")
p.pretty_print()
p.run()
That produces the following error:
....
File "/Users/peder/source/bubbles/bubbles/ops/rows.py", line 337, in agg_sum
return a+value
TypeError: unsupported operand type(s) for +: 'int' and 'str'
Is it just me, or is the bubbles documentation still a bit vague? I have tried to step through the source code, but I find it a bit intimidating. Perhaps I should start with the brewery tool to get some context on where bubbles came from?
You do not show the entire traceback, but this looks like a more generic problem. In other words, your data contains both string and integer values, and the aggregate function is trying to add them together, which cannot be done. Some type casting is in order.
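To make the failure concrete, here is a minimal plain-Python sketch of what the last traceback line is doing. The `agg_sum` below is a stand-in written for illustration (it only mirrors the `return a+value` line from the traceback, not the rest of the bubbles function), and the value "1500" is a made-up example of a field read from CSV as text.

```python
def agg_sum(a, value):
    # Stand-in that mirrors the failing line in the traceback: return a+value
    return a + value

message = None
try:
    # The running total starts as an int, but the CSV field arrives as a str
    agg_sum(0, "1500")
except TypeError as exc:
    message = str(exc)

print(message)

# Casting the field to a number first makes the same addition work
total = agg_sum(0, int("1500"))
print(total)
```

So the fix is to make sure the field being summed is numeric before it reaches the aggregation step.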
I see two ways of handling this.
1) At the data source, have it output numbers instead of strings, if possible.
2) Regex. Strip the currency formatting from the string before converting it to a number. Google "python regex currency string".
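For the second option, something along these lines might work. This is only a sketch: `parse_amount` is a hypothetical helper name, and it assumes US-style formatting (comma as thousands separator, dot as decimal point), which may not match your file.

```python
import re
from decimal import Decimal

def parse_amount(text):
    """Convert a currency-like string such as "$1,234.50" to a Decimal.

    Assumes a comma is a thousands separator and a dot is the decimal point.
    """
    # Drop everything except digits, the decimal point and a minus sign
    cleaned = re.sub(r"[^0-9.\-]", "", text)
    if not re.search(r"[0-9]", cleaned):
        raise ValueError("no digits found in %r" % text)
    return Decimal(cleaned)

print(parse_amount("$1,234.50"))
print(parse_amount("1500"))
```

Decimal is used rather than float so that money values are not subject to binary rounding; use `int` or `float` instead if that suits your data better.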
First, there is no quick and easy way to manipulate CSV files :) CSV is more a concept than a firm promise, hence the rise of XML, JSON and YAML, to name a few. DataBrewery/Bubbles can also pull from SQL sources as well as push to them. The use case is exactly what you are running into: transforming data from one source and putting it into another. So the process would be: pull from source_1 --> transform (possibly multiple transforms) --> push to source_2 or to the screen.
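As a rough illustration of that pull --> transform --> output flow, here is a sketch using only the Python standard library rather than the Bubbles API. The sample rows and the "Amount" column are invented for the example; the real cida.csv columns may well differ.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data standing in for the real CSV file
SAMPLE = """Title,Country,Amount
Project A,Kenya,1000
Project B,Kenya,250
Project C,Peru,400
"""

def extract(source):
    # Pull: every field comes out of the CSV reader as a string
    return csv.DictReader(source)

def transform(rows):
    # Transform: cast the measure to int so it can be summed
    for row in rows:
        row = dict(row)
        row["Amount"] = int(row["Amount"])
        yield row

def aggregate(rows, key, measure):
    # Sum the measure for each distinct value of the key field
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[measure]
    return dict(totals)

totals = aggregate(transform(extract(io.StringIO(SAMPLE))), "Country", "Amount")

# Output: print to screen
for country, total in sorted(totals.items()):
    print(country, total)
```

The cast happens in its own transform step, before aggregation, which is the same ordering you would want in a pipeline tool.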
http://blog.databrewery.org/
Introducing Bubbles – virtual data objects framework