TypeError: unsupported operand type(s) for +: 'int' and 'str'

Peder Jakobsen

Oct 31, 2013, 11:50:16 AM

to datab...@googlegroups.com

Hi, I'm in the process of learning bubbles. I suppose the logical place to start is the examples/hello.py script, which works nicely. Then I substituted my own CSV file and defined a few fields:

import bubbles

FILE = "cida.csv"

p = bubbles.Pipeline()
p.source_object("csv_source", resource=FILE, infer_fields=True)
p.aggregate("Title", "Country")
p.pretty_print()
p.run()

That produces the following error:

....

  File "/Users/peder/source/bubbles/bubbles/ops/rows.py", line 337, in agg_sum
    return a+value
TypeError: unsupported operand type(s) for +: 'int' and 'str'

Is it just me, or is the bubbles documentation still a bit vague? I have tried to step through the source code, but I find it a bit intimidating. Perhaps I should start with the brewery tool to get some context of where bubbles came from?

Adrian Klaver

Oct 31, 2013, 12:10:16 PM

to datab...@googlegroups.com

On 10/31/2013 08:50 AM, Peder Jakobsen wrote:
> Hi, I'm in the process of learning bubbles. I suppose the logical place
> to start is the examples/hello.py script, which works nicely. Then I
> substituted my own CSV file and defined a few fields:
>
> import bubbles
>
> FILE = "cida.csv"
>
> p = bubbles.Pipeline()
>
> p.source_object("csv_source", resource=FILE, infer_fields=True)
>
> p.aggregate("Title", "Country")
>
> p.pretty_print()
>
> p.run()
>
> That produces the following error:
>
> ....
>
> File "/Users/peder/source/bubbles/bubbles/ops/rows.py", line 337, in
> agg_sum
>
> return a+value
>
> TypeError: unsupported operand type(s) for +: 'int' and 'str'

You do not show the entire traceback, but this seems to be a more
generic problem. In other words, your data contains both string and
integer values, and the aggregate function is trying to add them,
which cannot be done. Some type casting is in order.
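
To make the failure concrete, here is a minimal sketch in plain Python (not bubbles code). The function `agg_sum_sketch` is a hypothetical stand-in for a summing aggregate that starts its running total at the integer 0, and the sample values are made up. It only illustrates why string values trigger the error and why casting first avoids it.

```python
# Plain-Python illustration; agg_sum_sketch is a hypothetical stand-in,
# not the actual bubbles implementation.
def agg_sum_sketch(values):
    """Accumulate a running total starting from the integer 0."""
    total = 0
    for value in values:
        total = total + value  # int + str raises TypeError
    return total


raw_values = ["10", "20", "30"]  # CSV readers hand back strings by default

try:
    agg_sum_sketch(raw_values)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Casting each value before aggregating avoids the error.
casted_total = agg_sum_sketch(int(v) for v in raw_values)
print(casted_total)  # 60
```

The same reasoning applies to any field that looks numeric but arrives as text: the cast has to happen before the values reach the aggregation step.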

>
> Is it just me, or is bubbles documentation still a bit vague? I have
> tried to step through the source code, but I find it a bit intimidating.
> Perhaps I should start with the brewery tool to get some context of
> where bubbles came from?
>

--
Adrian Klaver
adrian...@gmail.com

Peder Jakobsen

Oct 31, 2013, 12:25:46 PM

to datab...@googlegroups.com

On Oct 31, 2013, at 12:10 PM, Adrian Klaver <adrian...@gmail.com> wrote:

You do not show the entire traceback, but this seems to be a more generic problem. In other words, your data contains both string and integer values, and the aggregate function is trying to add them, which cannot be done. Some type casting is in order.

Right, I see what you mean. My "numerical" fields are currency strings like "$ 20,448,002". How can I supply some type of filter to change these while they are processed?

Thanks,

Peder Jakobsen

Adrian Klaver

Oct 31, 2013, 12:33:17 PM

to datab...@googlegroups.com

On 10/31/2013 09:25 AM, Peder Jakobsen wrote:
>
> On Oct 31, 2013, at 12:10 PM, Adrian Klaver <adrian...@gmail.com> wrote:
>
>> You do not show the entire traceback, but this seems to be a more
>> generic problem. In other words, your data contains both string and
>> integer values, and the aggregate function is trying to add them,
>> which cannot be done. Some type casting is in order.
>

> Right, I see what you mean. My "numerical" fields are currency
> strings like "$ 20,448,002". How can I supply some type of filter to
> change these while they are processed?

I see two ways of handling this.

1) At the data source, have it output number instead of string if possible.

2) Regex. Google python regex currency string
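
A minimal sketch of option 2, using only Python's standard `re` module. The helper name `currency_to_int` is hypothetical and not part of bubbles. It assumes whole-number amounts written like "$ 20,448,002" (a currency symbol, optional spaces, commas as thousands separators); amounts with decimal points would need different handling.

```python
import re

# Hypothetical helper, not part of bubbles. Assumes whole-number amounts
# such as "$ 20,448,002"; decimal amounts are NOT handled here.
_NON_DIGITS = re.compile(r"[^\d-]")


def currency_to_int(text):
    """Strip everything except digits and a minus sign, then cast to int."""
    cleaned = _NON_DIGITS.sub("", text)
    if cleaned in ("", "-"):
        raise ValueError("no digits found in %r" % text)
    return int(cleaned)


print(currency_to_int("$ 20,448,002"))  # 20448002
```

A function like this can be applied to each row before aggregation, so the summing step only ever sees integers.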

>
> Thanks,
>
> Peder Jakobsen
>

--
Adrian Klaver
adrian...@gmail.com

Peder Jakobsen

Oct 31, 2013, 12:43:05 PM

to datab...@googlegroups.com

On Oct 31, 2013, at 12:33 PM, Adrian Klaver <adrian...@gmail.com> wrote:

I see two ways of handling this.

1) At the data source, have it output number instead of string if possible.

I don’t control this, unfortunately; I just download the CSV file as supplied on a website.

2) Regex. Google python regex currency string

Yes, there are many ways to convert this. I’m just trying to get a handle on what bubbles is for. I seem to lack some context; it all seems rather mysterious. Since it is quick and easy to manipulate CSV files in Python and put things into a database, what is the use case for using bubbles? And if it’s not set up to handle corner cases like mine, how do you know when bubbles is the right tool to use?

Peder

Adrian Klaver

Oct 31, 2013, 12:50:45 PM

to datab...@googlegroups.com

On 10/31/2013 09:43 AM, Peder Jakobsen wrote:
>
> On Oct 31, 2013, at 12:33 PM, Adrian Klaver <adrian...@gmail.com> wrote:
>
>> I see two ways of handling this.
>>
>> 1) At the data source, have it output number instead of string if
>> possible.
>

> I don’t control this, unfortunately; I just download the CSV file as
> supplied on a website.
>
>>
>> 2) Regex. Google python regex currency string
>

> Yes, there are many ways to convert this. I’m just trying to get a
> handle on what bubbles is for. I seem to lack some context; it all seems
> rather mysterious. Since it is quick and easy to manipulate CSV files
> in Python and put things into a database, what is the use case for
> using bubbles? And if it’s not set up to handle corner cases like
> mine, how do you know when bubbles is the right tool to use?

First, there is no quick and easy way to manipulate CSV files :) CSV is
more a concept than a firm promise, hence the rise of XML, JSON and YAML,
to name a few. DataBrewery/Bubbles can also pull from SQL sources as
well as push to them. The use case is what you are running into:
transforming data from one source and putting it into another. So the
process would be: pull from source_1 --> transform (may be multiple
transforms) --> source_2 or to screen.
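
The pull --> transform --> push flow can be sketched with nothing but the standard library. This is an illustration of the general ETL shape, not of how bubbles implements it. The sample data, the column names, the `projects` table and the three function names are all made up for the example.

```python
import csv
import io
import sqlite3

# Made-up sample data standing in for a downloaded CSV file.
SAMPLE_CSV = (
    "Title,Country,Budget\n"
    'Wells,Ghana,"$ 1,200"\n'
    'Roads,Ghana,"$ 300"\n'
    'Schools,Peru,"$ 50"\n'
)


def extract(text):
    """Pull: yield each CSV row as a dict of strings."""
    return csv.DictReader(io.StringIO(text))


def transform(rows):
    """Transform: cast the currency string in 'Budget' to an int."""
    for row in rows:
        digits = "".join(ch for ch in row["Budget"] if ch.isdigit())
        yield (row["Title"], row["Country"], int(digits))


def load(records, connection):
    """Push: write the cleaned records into a SQL table."""
    connection.execute(
        "CREATE TABLE projects (title TEXT, country TEXT, budget INTEGER)"
    )
    connection.executemany("INSERT INTO projects VALUES (?, ?, ?)", records)


connection = sqlite3.connect(":memory:")
load(transform(extract(SAMPLE_CSV)), connection)

# Aggregation now works because the budgets are integers.
totals = connection.execute(
    "SELECT country, SUM(budget) FROM projects GROUP BY country ORDER BY country"
).fetchall()
print(totals)
```

Each stage only consumes the output of the previous one, so additional transforms can be chained between `extract` and `load` without touching either end.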

> Peder
>

--
Adrian Klaver
adrian...@gmail.com

Peder Jakobsen

Oct 31, 2013, 12:53:28 PM

to datab...@googlegroups.com

On Oct 31, 2013, at 12:50 PM, Adrian Klaver <adrian...@gmail.com> wrote:

First, there is no quick and easy way to manipulate CSV files :) CSV is more a concept than a firm promise, hence the rise of XML, JSON and YAML, to name a few. DataBrewery/Bubbles can also pull from SQL sources as well as push to them. The use case is what you are running into: transforming data from one source and putting it into another. So the process would be: pull from source_1 --> transform (may be multiple transforms) --> source_2 or to screen.

Yes, the standard ETL use case. But it requires that one is able to intercept and manipulate data anywhere in the process, otherwise the system would be too generic, no?

Adrian Klaver

Oct 31, 2013, 1:01:00 PM

to datab...@googlegroups.com

On 10/31/2013 09:53 AM, Peder Jakobsen wrote:
>
> On Oct 31, 2013, at 12:50 PM, Adrian Klaver <adrian...@gmail.com

You can. I would suggest taking a look at the slide show here:

http://blog.databrewery.org/

Introducing Bubbles – virtual data objects framework

--
Adrian Klaver
adrian...@gmail.com

Peder Jakobsen

Oct 31, 2013, 1:06:52 PM

to datab...@googlegroups.com

On Oct 31, 2013, at 1:01 PM, Adrian Klaver <adrian...@gmail.com> wrote:

http://blog.databrewery.org/

Introducing Bubbles – virtual data objects framework

I’ve looked at that slide show several times, but I didn’t really understand it. Again, it’s really broad strokes, but I lack the context to know what the strokes are for.

I’ll look more closely at the example code and see if I can make sense of it. Once I do, I will perhaps add some documentation and issue a pull request.

I’m trying to avoid using Java/Kettle for ETL because it’s overkill.

Peder

Adrian Klaver

Oct 31, 2013, 1:27:48 PM

to datab...@googlegroups.com

On 10/31/2013 10:06 AM, Peder Jakobsen wrote:
>
> On Oct 31, 2013, at 1:01 PM, Adrian Klaver <adrian...@gmail.com> wrote:
>
>>
>> http://blog.databrewery.org/
>>

>> Introducing Bubbles – virtual data objects framework
>

> I’ve looked at that slide show several times, but I didn’t really
> understand it. Again, it’s really broad strokes, but I lack the context
> to know what the strokes are for.
>
> I’ll look more closely at the example code and see if I can make sense
> of it. Once I do, I will perhaps add some documentation and issue a
> pull request.
>
> I’m trying to avoid using Java/Kettle for ETL because it’s overkill.

To follow up on my previous post, you pass encoding as a parameter:

p.source_object("csv_source", resource=FILE, encoding="UTF8",
infer_fields=True)
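
Why an explicit encoding matters can be seen without bubbles at all. The sketch below uses only the standard library and a made-up sample row; it shows that the same bytes decode to different text depending on the encoding assumed, which is the kind of mismatch an `encoding` parameter is meant to prevent.

```python
import csv
import io

# Bytes as they might sit in a UTF-8 encoded CSV file (made-up sample row).
raw_bytes = "Title,Note\nWells,don\u2019t\n".encode("utf-8")

# Decoding with the wrong encoding silently produces garbled characters.
wrong = raw_bytes.decode("latin-1")

# Decoding with the encoding the file was written in restores the text.
right = raw_bytes.decode("utf-8")

rows = list(csv.reader(io.StringIO(right)))
print(rows[1][1])
print(wrong != right)  # True
```

Garbled apostrophes and dashes in otherwise readable text are a common sign that a file was read with a different encoding than it was written in.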
