Web2py and Big Data


Paul Gerrard

Jun 20, 2013, 6:51:52 AM
to web2py-d...@googlegroups.com
The whole Big Data thing seems to be really taking off now and part of that (what I think will be 'the next big thing') will be the dependency on NoSQL-type databases. 

Now, creating bindings and updating the DAL to access these DBs (google/nosql and MongoDB exist already; UQLite is being discussed in another thread) is fairly straightforward.

Is it time to step back and think about what role or position web2py could have in the Big Data domain?

Is anyone else on the list thinking about this (or working on Big Data apps) already? I think it's going to dominate the industry in a year or two.

Paul.

Massimo Di Pierro

Jun 20, 2013, 1:01:46 PM
to web2py-d...@googlegroups.com
I am very interested.

Paul Gerrard

Jun 21, 2013, 4:11:05 AM
to web2py-d...@googlegroups.com
I don't have a view on how web2py might provide some Big Data features, so I just wanted to start a little debate on the sort of thing that *might* be done. I'm researching Big Data now, with a focus on how testing is affected by the new risks in this area. Right now, it seems that Hadoop is the product to watch. I came across this "Guide to Python Frameworks for Hadoop" (http://blog.cloudera.com/blog/2013/01/a-guide-to-python-frameworks-for-hadoop/ ), which looks good (but might need an update).

Paul.

Franziskos Kyriakopoulos

Jul 5, 2013, 11:29:32 AM
to web2py-d...@googlegroups.com

Great topic.
@Paul:
My company (www.simplectix.com , a young start-up from Austria) is developing Big Data & Predictive Analytics solutions.
We rely on Python (and its powerful libraries: numpy, scikit-learn, pytables, pandas etc.) and we use web2py as a web front end.
@Massimo: First of all, thanks for web2py :). And it's very nice that you are interested in this topic.

Here is my personal view on this subject:
1. Big Data is just two buzzwords. Behind them there are many, many technologies for storage, processing, transformation, analysis, machine learning, and visualization of small, large, huge, simple, complex, structured & unstructured datasets.
2. Big Data is not just Hadoop and NoSQL. These two terms have come into fashion, but there are still many, many companies and organizations which use neither.
    I just mean that you can also do Big Data analytics with SQL and regular scripts. You can even do it with text files on a laptop.
3. A very important step is to make the results of Big Data & Predictive Analytics accessible and understandable to people who don't have the technical background or the tools to do it themselves.
    This is exactly the point where web2py could play the role of the deliverer. To make clear what I mean, here is a toy model of a business intelligence app.

    First you have some data sources: operational SQL (or NoSQL) DBs, data coming from APIs (Google Analytics, Twitter, Facebook, data you scrape etc.), flat files (Excel sheets etc.),
    data from ERP & CRM systems, or even M2M measurements or data from scientific simulations.
    Second you build a data warehouse: this is a lot of work and there are many ways to do it, depending on your needs.
    My favorite way is to use MySQL with a star schema and write Python scripts for the ETL process.
    Third you do (predictive) analytics: if you have some time-dependent data, then you do time series forecasting, outlier detection etc.
    If you have a user-item-preference dataset, then you can do collaborative filtering.
    When you have documents, you do text mining etc. (there are so many use cases).
    Now you can save the results and metrics of your analytics in the data warehouse and use web2py to fetch the data and send them to the client.
    In the client you can use jQuery + your favorite visualization library (I prefer d3.js) to present the analytics to the end user.
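
    To make the toy model a bit more concrete, here is a minimal sketch of the warehouse, ETL and analytics steps. Everything in it is an assumption made for illustration: sqlite3 stands in for MySQL so the script runs without a server, the table names and the RAW_SALES rows are invented, and a plain z-score test stands in for whatever outlier detection you would really use. The JSON string at the end is the kind of payload a web2py controller could hand to the client.

```python
import json
import sqlite3
import statistics

# Hypothetical raw records, standing in for rows pulled from an operational DB or a flat file.
RAW_SALES = [
    ("2013-06-01", "AT", 100.0),
    ("2013-06-02", "AT", 104.0),
    ("2013-06-03", "AT", 98.0),
    ("2013-06-04", "AT", 101.0),
    ("2013-06-05", "AT", 400.0),  # deliberately unusual value
    ("2013-06-06", "AT", 97.0),
]


def build_warehouse(conn):
    """Create a tiny star schema: one fact table, two dimensions, one results table."""
    conn.executescript("""
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, day TEXT UNIQUE);
        CREATE TABLE dim_country (country_id INTEGER PRIMARY KEY, code TEXT UNIQUE);
        CREATE TABLE fact_sales (
            date_id INTEGER REFERENCES dim_date(date_id),
            country_id INTEGER REFERENCES dim_country(country_id),
            amount REAL
        );
        CREATE TABLE analytics_outliers (day TEXT, code TEXT, amount REAL, z_score REAL);
    """)


def lookup_or_insert(conn, table, id_col, value_col, value):
    """Return the surrogate key for a dimension value, inserting the row if needed."""
    conn.execute(f"INSERT OR IGNORE INTO {table} ({value_col}) VALUES (?)", (value,))
    row = conn.execute(
        f"SELECT {id_col} FROM {table} WHERE {value_col} = ?", (value,)
    ).fetchone()
    return row[0]


def load(conn, rows):
    """ETL step: resolve dimension keys, then write one fact row per record."""
    for day, code, amount in rows:
        date_id = lookup_or_insert(conn, "dim_date", "date_id", "day", day)
        country_id = lookup_or_insert(conn, "dim_country", "country_id", "code", code)
        conn.execute("INSERT INTO fact_sales VALUES (?, ?, ?)", (date_id, country_id, amount))
    conn.commit()


def detect_outliers(conn, threshold=2.0):
    """Analytics step: flag amounts whose z-score exceeds the threshold and store them."""
    rows = conn.execute("""
        SELECT d.day, c.code, f.amount
        FROM fact_sales f
        JOIN dim_date d ON d.date_id = f.date_id
        JOIN dim_country c ON c.country_id = f.country_id
        ORDER BY d.day
    """).fetchall()
    amounts = [r[2] for r in rows]
    mean = statistics.mean(amounts)
    spread = statistics.pstdev(amounts)
    flagged = []
    for day, code, amount in rows:
        z = (amount - mean) / spread if spread else 0.0
        if abs(z) > threshold:
            flagged.append((day, code, amount, round(z, 2)))
    conn.executemany("INSERT INTO analytics_outliers VALUES (?, ?, ?, ?)", flagged)
    conn.commit()
    return flagged


def outliers_as_json(conn):
    """Delivery step: read stored results and serialize them for the client."""
    rows = conn.execute("SELECT day, code, amount, z_score FROM analytics_outliers").fetchall()
    keys = ("day", "code", "amount", "z_score")
    return json.dumps([dict(zip(keys, r)) for r in rows])


conn = sqlite3.connect(":memory:")
build_warehouse(conn)
load(conn, RAW_SALES)
detect_outliers(conn)
print(outliers_as_json(conn))
```

    The point of the split is that only outliers_as_json would live on the web2py side; the rest belongs to the scripts that run outside of it.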

    You need a server that does all the ETL and analytical processing outside of web2py. There are many Python and non-Python frameworks for this.
    I plan to try out pulsar.
   
   
    There are two special points which I find very interesting:

    1. It would be great if one could trigger an ajax request from a client-side (d3.js) visualization.
    Example: You have a map, and by clicking on a country you trigger an ajax request to the DB, which sends back detailed country info.
    This info is displayed in a separate chart or table. This way you can do visual OLAP.

    2. Start a Python script from a view by clicking on a button.
       Example: Your analytical server ran a machine learning algorithm with some parameters. You can see the parameters and performance metrics in a view.
       You then change the parameters (in a prepopulated form), and on submission the form processing triggers the analytical server to restart the machine learning task.
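
    Both points can be sketched in plain Python, as a rough illustration rather than a finished design. The names here (COUNTRY_DETAILS, country_detail, submit_training_run, the learning_rate and n_trees parameters) are all hypothetical. In a real app country_detail would be the body of a web2py controller action called by the d3.js click handler, and the in-process queue would be replaced by whatever connects web2py to the analytical server.

```python
import json
import queue
import threading

# Hypothetical per-country details; in the toy model these would come from the data warehouse.
COUNTRY_DETAILS = {
    "AT": {"name": "Austria", "revenue": 1200.0, "orders": 37},
    "IT": {"name": "Italy", "revenue": 2300.0, "orders": 58},
}


def country_detail(code):
    """Point 1: JSON body that an ajax drill-down request for one country might receive."""
    detail = COUNTRY_DETAILS.get(code.upper())
    if detail is None:
        return json.dumps({"error": "unknown country", "code": code})
    return json.dumps(dict(detail, code=code.upper()))


# Point 2: the web layer writes parameter sets to a queue, the analytical worker reads them.
task_queue = queue.Queue()
finished_runs = []


def submit_training_run(params):
    """Called on form submission: validate the parameters and hand them to the worker."""
    if not 0 < params.get("learning_rate", 0) <= 1:
        raise ValueError("learning_rate must be in (0, 1]")
    task_queue.put(dict(params))


def worker():
    """Stand-in for the analytical server: consume parameter sets until told to stop."""
    while True:
        params = task_queue.get()
        if params is None:  # sentinel value: shut down
            task_queue.task_done()
            break
        # A real worker would retrain the model here; this sketch only records the request.
        finished_runs.append({"params": params, "status": "done"})
        task_queue.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()
submit_training_run({"learning_rate": 0.1, "n_trees": 200})
task_queue.put(None)
task_queue.join()
thread.join()

print(country_detail("at"))
print(finished_runs)
```

    The form handler returns as soon as the parameters are queued, so the web request never waits for the machine learning task itself; the view can later read the new metrics from wherever the worker stores them.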


     
    I hope we can continue this very interesting discussion here.

      Kind Regards
      Franziskos
            

Paul Gerrard

Jul 5, 2013, 1:26:37 PM
to web2py-d...@googlegroups.com
Hi Franziskos,

Thanks for the background - it sounds like exciting times :O)

I have something like 500 pages of text downloaded on the Big Data 'thing'. I've bought a few books too, on NoSQL, Predictive Analytics and 'Big Data'. I'm definitely forming the view that of the 3 or 4 Vs (Velocity, Volume, Variety and Veracity), it's the 3rd and 4th that are key.

20 years ago I worked on what would now be called Data Warehouses etc. Some things haven't changed much but the demand for clever statistics and analysis seems to be racing ahead of the skills and technology (mostly skills). My particular interest is in testing Big Data systems.

At the Test Management Forum on 31 July (http://uktmf.com ), together with a friend who works for another Big Data product company, I'll be doing the intro/survey of what's out there and he'll be talking about testing a Big Data product. I'll plough through all the content that I have and share what comes of that when I can :O)

Paul.

Michele Comitini

Jul 5, 2013, 4:33:33 PM
to web2py-developers


2013/7/5 Paul Gerrard <pa...@gerrardconsulting.com>
