target with only subsection of source fields


Peder Jakobsen

Nov 1, 2013, 6:53:15 PM
to datab...@googlegroups.com
Hi, 

I have a CSV file with about 20 columns, but I'm only interested in 3 of them.

I tried:

fields = bubbles.FieldList(
    ["Project Number", "string"],
    ["Description", "string"],
    ["Country", "string"]
)

p = bubbles.Pipeline(stores=stores)
p.source_object("csv_source", resource=URL, fields=fields, infer_fields=False)

p.pretty_print()
p.run()

+--------------+-----------+----------------------------------------------------------------------------------------------------+
|Project Number|Description|Country                                                                                             |
+--------------+-----------+----------------------------------------------------------------------------------------------------+
|A018823001    |2011-11-10 |National Water Quality and Availability Management Program                                          |
|A019362001    |2012-05-01 |Microfinance Services                                                                               |
|A020246001    |2013-07-23 |Popular Economy Building                                                                            |

So..this is close, but it doesn't quite work. And even if it did, it would make more sense to filter out the fields in the target object. But how do I do this?


Adrian Klaver

Nov 2, 2013, 4:08:10 PM
to datab...@googlegroups.com
In what way is it not working? I am probably not sufficiently
caffeinated but I am not seeing the problem:)

> And even if it did, it
> would make more sense to filter out the fields in the target object.
> But how to do this?
>

In any case for a CSV file with three fields:

In [92]: p = Pipeline()

In [93]: p.source_object('csv_source', resource='test.csv', infer_fields=True)

In [94]: p.field_filter(keep=['Fld_1', 'Fld_2'])

In [95]: p.pretty_print()


In [96]: p.run()
+-----+-----+
|Fld_1|Fld_2|
+-----+-----+
| 1| 2|
| 4| 5|
+-----+-----+



--
Adrian Klaver
adrian...@gmail.com

Peder Jakobsen

Nov 2, 2013, 5:19:18 PM
to datab...@googlegroups.com

On Nov 2, 2013, at 4:08 PM, Adrian Klaver <adrian...@gmail.com> wrote:

> > +--------------+-----------+----------------------------------------------------------------------------------------------------+
> > |Project Number|Description|Country                                                                                             |
> > +--------------+-----------+----------------------------------------------------------------------------------------------------+
> > |A018823001    |2011-11-10 |National Water Quality and Availability Management Program                                          |
> > |A019362001    |2012-05-01 |Microfinance Services                                                                               |
> > |A020246001    |2013-07-23 |Popular Economy Building                                                                            |
> >
> > So..this is close, but it doesn't quite work.
>
> In what way is it not working? I am probably not sufficiently caffeinated but I am not seeing the problem:)

LOL, well, the Description column has the date, and the Country column has the description, so something went horribly wrong somewhere inside the bowels of bubbles ;)

Peder 

Adrian Klaver

Nov 2, 2013, 5:42:18 PM
to datab...@googlegroups.com
On 11/02/2013 02:19 PM, Peder Jakobsen wrote:
>
> On Nov 2, 2013, at 4:08 PM, Adrian Klaver <adrian...@gmail.com> wrote:
I really need to look into stronger coffee. Are the data values under
the field names the first three columns? If I remember correctly from
Brewery, if you supply fields= you are agreeing to supply all the field
names, anything less and you get the above.
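
To make the symptom concrete, here is a minimal sketch using only Python's standard csv module, not bubbles itself. It assumes, as recalled above, that supplied names are paired with columns purely by position. The sample data, the "Date" column name, the "XX" country value, and both helper functions are hypothetical stand-ins for the real 20-column file.

```python
import csv
import io

# Hypothetical sample: the real file has about 20 columns; four are enough here.
SAMPLE = (
    "Project Number,Date,Description,Country\n"
    "A018823001,2011-11-10,"
    "National Water Quality and Availability Management Program,XX\n"
)


def name_by_position(text, names):
    """Pair each supplied name with the column at the same position."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row; the names come from the caller
    # zip() stops at the shorter sequence, so extra columns are dropped
    return [dict(zip(names, row)) for row in reader]


def select_by_header(text, keep):
    """Read the header row, then keep only the requested columns by name."""
    reader = csv.DictReader(io.StringIO(text))
    return [{name: row[name] for name in keep} for row in reader]


wanted = ["Project Number", "Description", "Country"]
shifted = name_by_position(SAMPLE, wanted)   # names land on the first 3 columns
correct = select_by_header(SAMPLE, wanted)   # names matched against the header

print(shifted[0])
print(correct[0])
```

In the positional version "Description" ends up holding the date and "Country" the description, which matches the table posted earlier. Selecting by header name after reading all fields, as with field_filter(keep=...), avoids the shift.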

>
> Peder
>


--
Adrian Klaver
adrian...@gmail.com

Peder Jakobsen

Nov 4, 2013, 9:42:21 AM
to datab...@googlegroups.com


> If I remember correctly from Brewery, if you supply fields= you are agreeing to supply all the field names, anything less and you get the above.


Since you get all the fields for free without passing fields=, what would be the purpose of passing them at all, I wonder? It must be to rename the existing fields…. or..?

Peder 



Adrian Klaver

Nov 4, 2013, 9:45:35 AM
to datab...@googlegroups.com
> rename the existing fields…. or..?

For CSV files that do not have a header row.
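
As a rough illustration of the headerless case, here is the same idea with the standard library's csv.DictReader rather than bubbles. The data and the Fld_3 name are hypothetical. Since the file carries no names of its own, the caller has to supply one name per column.

```python
import csv
import io

# Hypothetical headerless file: every line is data, there is no header row.
HEADERLESS = "1,2,3\n4,5,6\n"


def read_headerless(text, names):
    """Read a CSV with no header row, labelling columns with the given names."""
    # Passing fieldnames= tells DictReader to treat the first line as data
    reader = csv.DictReader(io.StringIO(text), fieldnames=names)
    return [dict(row) for row in reader]


rows = read_headerless(HEADERLESS, ["Fld_1", "Fld_2", "Fld_3"])
print(rows)
```

Here the names must cover every column in order, which is the same contract described above for fields=.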