Blog: Data Streaming Basics in Brewery

Stefan Urbanek

Apr 13, 2012, 9:03:41 AM
to datab...@googlegroups.com
Hi,

I've posted a new Data Brewery blog post about stream basics - how to create streams in different ways:


Contents:

* ways of running streams
* command line 'pipe' and 'run'
* higher order messaging construction and basic (raw) construction

Some future streaming thoughts are included at the end.

Enjoy,

Stefan Urbanek
data analyst and data brewmaster

Twitter: @Stiivi



Peder Jakobsen

Oct 31, 2013, 12:32:15 PM
to datab...@googlegroups.com
I'm working through the blog post using a CSV file supplied by my client.  Sorry about the long stack trace, but here's what happens:


brewery pipe csv_source resource=projects.csv audit pretty_printer
+-----------------------------+------------+----------+-----------------+------------------+--------------+
|field_name                   |record_count|null_count|null_record_ratio|empty_string_count|distinct_count|
+-----------------------------+------------+----------+-----------------+------------------+--------------+
|Project Number               |           0|         0|                0|                 0|             0|
|Date Modified                |           0|         0|                0|                 0|             0|
|Title                        |           0|         0|                0|                 0|             0|
|Description                  |           0|         0|                0|                 0|             0|
|Status                       |           0|         0|                0|                 0|             0|
|Start                        |           0|         0|                0|                 0|             0|
|End                          |           0|         0|                0|                 0|             0|
|Country                      |           0|         0|                0|                 0|             0|
|Executing Agency - Partner   |           0|         0|                0|                 0|             0|
|CIDA Sector of Focus         |           0|         0|                0|                 0|             0|
|DAC Sector                   |           0|         0|                0|                 0|             0|
|Maximum CIDA Contribution    |           0|         0|                0|                 0|             0|
|Expected Results             |           0|         0|                0|                 0|             0|
|Progress and Results Achieved|           0|         0|                0|                 0|             0|
+-----------------------------+------------+----------+-----------------+------------------+--------------+
Traceback (most recent call last):
  File "/Users/peder/Envs/brewery/bin/brewery", line 5, in <module>
    pkg_resources.run_script('brewery==0.8.0', 'brewery')
  File "/Users/peder/Envs/brewery/lib/python2.7/site-packages/pkg_resources.py", line 540, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/Users/peder/Envs/brewery/lib/python2.7/site-packages/pkg_resources.py", line 1455, in run_script
    execfile(script_filename, namespace, namespace)
  File "/Users/peder/Envs/brewery/lib/python2.7/site-packages/brewery-0.8.0-py2.7.egg/EGG-INFO/scripts/brewery", line 264, in <module>
    args.func(args)
  File "/Users/peder/Envs/brewery/lib/python2.7/site-packages/brewery-0.8.0-py2.7.egg/EGG-INFO/scripts/brewery", line 208, in run_pipe
    stream.run()
  File "/Users/peder/Envs/brewery/lib/python2.7/site-packages/brewery-0.8.0-py2.7.egg/brewery/streams.py", line 418, in run
    self._run()
  File "/Users/peder/Envs/brewery/lib/python2.7/site-packages/brewery-0.8.0-py2.7.egg/brewery/streams.py", line 456, in _run
    raise self.exceptions[0]
brewery.common.StreamRuntimeError: stream failed. reason:
exception: UnicodeDecodeError:
node: <brewery.nodes.source_nodes.CSVSourceNode object at 0x103328a10>
'ascii' codec can't decode byte 0xe2 in position 1084: ordinal not in range(128)
traceback
  File "/Users/peder/Envs/brewery/lib/python2.7/site-packages/brewery-0.8.0-py2.7.egg/brewery/streams.py", line 533, in run
    self.node.run()
  File "/Users/peder/Envs/brewery/lib/python2.7/site-packages/brewery-0.8.0-py2.7.egg/brewery/nodes/source_nodes.py", line 214, in run
    for row in self.stream.rows():
  File "/Users/peder/Envs/brewery/lib/python2.7/site-packages/brewery-0.8.0-py2.7.egg/brewery/ds/csv_streams.py", line 59, in next
    row = self.reader.next()
  File "/Users/peder/Envs/brewery/lib/python2.7/site-packages/brewery-0.8.0-py2.7.egg/brewery/ds/csv_streams.py", line 26, in next
    return self.reader.next().encode('utf-8')

input: none
ouput: none
attributes:
    resource: projects.csv
    skip_rows: <attribute skip_rows does not exist>
    encoding: <attribute encoding does not exist>
    fields: None
    read_header: <attribute read_header does not exist>
    delimiter: <attribute delimiter does not exist>
    quotechar: <attribute quotechar does not exist>

Adrian Klaver

Oct 31, 2013, 12:41:32 PM
to datab...@googlegroups.com
On 10/31/2013 09:32 AM, Peder Jakobsen wrote:
> I'm working through the blog post using a csv file supplied by my
> client. Sorry about the long stack trace, but here's what happens:
>


> brewery.common.StreamRuntimeError: stream failed. reason:
> exception: UnicodeDecodeError:
> node: <brewery.nodes.source_nodes.CSVSourceNode object at 0x103328a10>
> 'ascii' codec can't decode byte 0xe2 in position 1084: ordinal not in
> range(128)

It thinks it is working with ASCII data and it is not. Not sure how to
deal with this, please stand by:)
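Adrian's diagnosis can be checked in isolation. The sketch below is plain Python 3, not brewery code (the brewery 0.8.0 in the traceback runs on Python 2.7), and it assumes the offending byte is the start of a UTF-8 curly apostrophe. 0xE2 is the lead byte of several common characters, such as curly quotes and dashes, so the actual character in projects.csv is not known. Decoding the same bytes as ASCII fails with the error from the traceback, while decoding them as UTF-8 succeeds.

```python
# Hypothetical sample text: "Canada" followed by a curly apostrophe (U+2019).
# In UTF-8, U+2019 is encoded as the bytes E2 80 99, so 0xE2 lands at index 6.
raw = "Canada\u2019s project".encode("utf-8")


def try_decode(data, codec):
    """Return the decoded text, or the UnicodeDecodeError if decoding fails."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError as exc:
        return exc


ascii_result = try_decode(raw, "ascii")   # fails: 0xE2 is outside the 0-127 ASCII range
utf8_result = try_decode(raw, "utf-8")    # succeeds: the bytes are valid UTF-8

print(type(ascii_result).__name__, ascii_result)
print(utf8_result)
```

The first printed line reports the same "'ascii' codec can't decode byte 0xe2" error, at position 6 here rather than 1084 because the sample text is shorter.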

--
Adrian Klaver
adrian...@gmail.com

Peder Jakobsen

Nov 5, 2013, 12:32:03 PM
to datab...@googlegroups.com
I would like to work with brewery, but I'm still stuck on this problem of it choking on my CSV file because of non-ASCII characters. Funny thing is, it doesn't happen with Bubbles on the same file.

Peder Jakobsen

Nov 5, 2013, 3:37:39 PM
to datab...@googlegroups.com
The solution was simple:

When creating the CSV source, pass the encoding="UTF8" parameter:

    import brewery

    URL = "projects.csv"

    # Declare the file's encoding so its bytes are not decoded as ASCII
    b = brewery.create_builder()
    b.csv_source(URL, encoding="UTF8")
    ....
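For comparison, the Python 3 standard library's csv module needs the same fix: the encoding has to be stated when the file is opened. This is a minimal sketch that assumes the file is UTF-8; `read_csv_rows` is a hypothetical helper, not part of brewery's API.

```python
import csv


def read_csv_rows(path, encoding="utf-8"):
    """Yield each row of a CSV file as a list of str, decoding with `encoding`."""
    # The csv module docs recommend newline="" for files passed to csv.reader.
    with open(path, newline="", encoding=encoding) as handle:
        for row in csv.reader(handle):
            yield row
```

Calling `list(read_csv_rows("projects.csv"))` would return every row, header included, as lists of strings. If the file was exported with a byte order mark, passing `encoding="utf-8-sig"` strips it from the first header name.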