Proposal: Using pandas as backend


Stefan Urbanek

Jul 27, 2012, 4:58:45 AM
to datab...@googlegroups.com
Hi,

Here is a proposal for using the Pandas data analysis library as one of the potential backends for Brewery:

http://blog.databrewery.org/post/28088920149

Note that a dependency on Pandas also brings in a dependency on numpy and a couple of other packages. I do not think that more dependencies are a good idea, so it should only be an alternative. It will require a different approach to stream construction and a different Node interface.

What do you think?

Stefan

Twitter: @Stiivi


Adrian Klaver

Jul 28, 2012, 10:23:58 AM
to datab...@googlegroups.com, Stefan Urbanek
On 07/27/2012 01:58 AM, Stefan Urbanek wrote:
> Hi,
>
> Here is a proposal for using the Pandas data analysis library as one of
> the potential backends for Brewery:
>
> http://blog.databrewery.org/post/28088920149

The blog page seems to be down. I cannot access the above post.

>
> Note that a dependency on Pandas also brings in a dependency on numpy
> and a couple of other packages. I do not think that more dependencies
> are a good idea, so it should only be an alternative. It will require a
> different approach to stream construction and a different Node interface.
>
> What do you think?

So far, all I have done with Pandas is bookmark the site for further
review. It looks interesting, so I could see integrating it. I will try
to delve in deeper this weekend.

>
> Stefan

Thanks,

--
Adrian Klaver
adrian...@gmail.com

Stefan Urbanek

Jul 29, 2012, 1:38:28 PM
to datab...@googlegroups.com, Adrian Klaver

On 28.7.2012, at 16:23, Adrian Klaver <adrian...@gmail.com> wrote:

> On 07/27/2012 01:58 AM, Stefan Urbanek wrote:
>> Hi,
>>
>> Here is a proposal for using the Pandas data analysis library as one of
>> the potential backends for Brewery:
>>
>> http://blog.databrewery.org/post/28088920149
>
> The blog page seems to be down. I cannot access the above post.
>

It is hosted on Tumblr; sometimes refreshing several times helps (or opening just blog.databrewery.org first and then the article).

>>
>> Note that a dependency on Pandas also brings in a dependency on numpy
>> and a couple of other packages. I do not think that more dependencies
>> are a good idea, so it should only be an alternative. It will require a
>> different approach to stream construction and a different Node interface.
>>
>> What do you think?
>
> So far, all I have done with Pandas is bookmark the site for further review. It looks interesting, so I could see integrating it. I will try to delve in deeper this weekend.
>

That would be nice. Here is the backend idea explained:

http://yfrog.com/z/kl45czp

Basically, Brewery streams are a metadata-based description of workflow/thought-flow alternatives. The backend serves as the computational engine.
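
To make the split between description and engine concrete, here is a minimal sketch in plain Python. It is not Brewery's actual API: the `STREAM` layout, the node names (`source`, `filter`, `aggregate`) and the `ListBackend` class are all hypothetical, chosen only to illustrate the idea of a metadata description that a backend interprets.

```python
from collections import defaultdict

# Hypothetical metadata-only description of a stream: it names the steps
# and their parameters, but contains no computation itself.
STREAM = [
    {"node": "source", "rows": [
        {"region": "east", "amount": 5},
        {"region": "east", "amount": 20},
        {"region": "west", "amount": 15},
        {"region": "west", "amount": 30},
    ]},
    {"node": "filter", "field": "amount", "min": 10},
    {"node": "aggregate", "key": "region", "field": "amount"},
]


class ListBackend:
    """Hypothetical pure-Python engine that interprets the description."""

    def run(self, stream):
        rows = []
        for step in stream:
            # Dispatch each described step to the matching method.
            handler = getattr(self, "do_" + step["node"])
            rows = handler(rows, step)
        return rows

    def do_source(self, rows, step):
        return list(step["rows"])

    def do_filter(self, rows, step):
        # Keep rows whose field is at least the configured minimum.
        return [r for r in rows if r[step["field"]] >= step["min"]]

    def do_aggregate(self, rows, step):
        # Sum the field per key and return one row per key, sorted by key.
        totals = defaultdict(int)
        for r in rows:
            totals[r[step["key"]]] += r[step["field"]]
        return [{step["key"]: k, step["field"]: v}
                for k, v in sorted(totals.items())]


result = ListBackend().run(STREAM)
print(result)
```

Under this assumption, a Pandas-based backend would expose the same `run` method and interpret the same description, only with different data structures underneath.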

I am going to be offline for most of next week and will be back on the 5th. We can discuss it afterwards.

Naveen Michaud-Agrawal

Sep 10, 2012, 9:26:24 AM
to datab...@googlegroups.com, Adrian Klaver
Hi Stefan,

I think this is a great idea! Has any progress been made on using pandas as a backend?

Naveen

Stefan Urbanek

Sep 13, 2012, 2:20:37 AM
to datab...@googlegroups.com
Hi Naveen,

I haven't had much time to implement that yet, unfortunately. However, I found another alternative: carray


It is a nice small framework that provides a fast column-based storage layer with persistence. The implementation would be simpler than with Pandas, while still providing structures for algorithms that work with numpy arrays:


What do you think?

s.

Stefan Urbanek
data analyst and data brewmaster

Naveen Michaud-Agrawal

Oct 4, 2012, 11:26:30 AM
to datab...@googlegroups.com

Carray looks like a good alternative for persistence. I was thinking that Pandas would help in implementing some of the higher-level backend interface, since it has very fast grouping and filtering.
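
As a rough illustration of the grouping and filtering meant here, the following sketch uses only standard Pandas calls (boolean indexing and `groupby`). The column names and sample values are made up for the example and do not come from Brewery.

```python
import pandas as pd

# Made-up sample data; the column names are illustrative only.
df = pd.DataFrame({
    "region": ["east", "east", "west", "west"],
    "amount": [5, 20, 15, 30],
})

# Filtering: keep rows where amount is at least 10.
filtered = df[df["amount"] >= 10]

# Grouping: sum the remaining amounts per region.
totals = filtered.groupby("region")["amount"].sum()

print(totals)
```

A backend node for filtering or aggregation could plausibly delegate to calls like these, though how that would map onto the Node interface is left open in this thread.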

Naveen