Great topic.
@Paul:
My company (www.simplectix.com, a young start-up from Austria) develops Big Data & Predictive Analytics solutions.
We rely on Python (and its powerful libraries: numpy, scikit-learn, pytables, pandas, etc.) and we use web2py as a web frontend.
@Massimo : First of all thanks for web2py :). And it's very nice that you are interested in this topic.
Here is my personal view on this subject:
1. Big data is just two buzzwords. Behind them there are many, many technologies for the storage, processing, transformation, analysis, machine learning, and visualization of small, large, huge, simple, complex, structured & unstructured datasets.
2. Big data is not just Hadoop and NoSQL. These two terms have come into fashion, but there are still many, many companies and organizations that use neither.
I just mean that you can also do big data analytics with SQL and regular scripts. You can even do it with text files on a laptop.
3. A very important step is to make the results of Big Data & Predictive Analytics accessible and understandable to people who don't have the technical background or the tools to do it themselves.
And exactly at this point web2py could play the role of the deliverer. To make clear what I mean, here is a toy model of a business intelligence app.
First you have some data sources: operational SQL DBs (or NoSQL), data coming from APIs (Google Analytics, Twitter, Facebook, data you scrape, etc.), flat files (Excel sheets etc.),
data from ERP & CRM systems, or even M2M measurements or data from scientific simulations.
Second you build a data warehouse: this is a lot of work, and there are many ways to do it depending on your needs.
My favorite way is to use MySQL with a star schema and write Python scripts for the ETL process.
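To illustrate the star-schema idea, here is a minimal ETL sketch. The table and column names (dim_product, fact_sales) and the sample rows are made up, and sqlite3 stands in for MySQL so the snippet is self-contained; the shape of the code is the same for any SQL backend:

```python
import sqlite3

# Raw operational rows: (date, product name, quantity, unit price).
# In practice these would come from a source system, not a literal list.
raw_rows = [
    ("2013-01-01", "Widget", 3, 9.99),
    ("2013-01-01", "Gadget", 1, 24.50),
    ("2013-01-02", "Widget", 2, 9.99),
]

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# One dimension table and one fact table: the smallest possible star schema.
cur.execute("CREATE TABLE dim_product (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
cur.execute("""CREATE TABLE fact_sales (
                   sale_date TEXT, product_id INTEGER,
                   quantity INTEGER, unit_price REAL)""")

# The "T" and "L" of ETL: resolve each product to its dimension key,
# then load the measures into the fact table.
for sale_date, product, qty, price in raw_rows:
    cur.execute("INSERT OR IGNORE INTO dim_product (name) VALUES (?)", (product,))
    cur.execute("SELECT id FROM dim_product WHERE name = ?", (product,))
    product_id = cur.fetchone()[0]
    cur.execute("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                (sale_date, product_id, qty, price))
conn.commit()

# A typical warehouse query: revenue per product.
cur.execute("""SELECT p.name, SUM(f.quantity * f.unit_price)
               FROM fact_sales f JOIN dim_product p ON p.id = f.product_id
               GROUP BY p.name ORDER BY p.name""")
revenue = cur.fetchall()
```

A real warehouse would add date and customer dimensions, surrogate keys, and incremental loads, but the pattern stays the same.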
Third you do (predictive) analytics: if you have some time-dependent data, then do time series forecasting, outlier detection, etc.
If you have a user-item-preference dataset, then you can do collaborative filtering.
When you have documents, you do text mining, etc. etc. (There are so many use cases.)
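As a taste of the collaborative-filtering case, here is a tiny numpy sketch: item-item cosine similarity on a user-item rating matrix. The ratings are invented for illustration only:

```python
import numpy as np

# Rows are users, columns are items; 0.0 means "not rated".
ratings = np.array([
    [5.0, 4.0, 0.0, 1.0],   # user 0
    [4.0, 5.0, 0.0, 0.0],   # user 1
    [0.0, 0.0, 5.0, 4.0],   # user 2
])

# Cosine similarity between item columns:
# sim[i, j] = (col_i . col_j) / (|col_i| * |col_j|)
norms = np.linalg.norm(ratings, axis=0)
sim = np.dot(ratings.T, ratings) / np.outer(norms, norms)

# Items 0 and 1 are rated alike by the same users, so sim[0, 1] is high;
# item 2 shares no raters with item 0, so sim[0, 2] is zero.
```

To recommend, you would score each unrated item for a user by similarity-weighting the items they have already rated; scikit-learn and dedicated libraries offer more robust versions of this.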
Now you can save the results and metrics of your analytics in the data warehouse and use web2py to fetch the data and send them to the client.
On the client you can use jQuery plus your favorite visualization library (I prefer d3.js) to present the analytics to the end user.
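The web2py side of this hand-off can be very thin: a controller action that shapes warehouse rows into the JSON a d3.js chart consumes. In the sketch below the rows are hard-coded so it runs stand-alone; in web2py they would come from a DAL select, and the function would be a controller action:

```python
import json

def metrics_payload(rows):
    # rows: (date string, revenue) pairs, e.g. straight from a warehouse
    # query. d3.js is happiest with a flat list of objects.
    return json.dumps([{"date": d, "revenue": rev} for d, rev in rows])

# Stand-in for a DAL result set; field names are hypothetical.
payload = metrics_payload([("2013-01-01", 34.49), ("2013-01-02", 19.98)])
```

On the client, `d3.json("/app/default/metrics")` (or a jQuery `$.getJSON`) would fetch this payload and bind it to the chart.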
You need a server that does all the ETL and analytical processing outside of web2py. There are many Python and non-Python frameworks for this.
I plan to try out pulsar.
There are two special points which I find very interesting:
1. It would be great if one could trigger an ajax request from a client-side (d3.js) visualization.
Example: You have a map, and by clicking on some country you trigger an ajax request to the db, which sends back detailed country info.
This info is displayed in a separate chart or a table. This way you can do visual OLAP.
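The server half of that drill-down could look like the sketch below. The country data is invented, and in web2py the code argument would arrive as `request.args(0)` from the d3.js click handler's ajax call; here it is a plain function argument so the snippet runs on its own:

```python
import json

# Hypothetical detail records keyed by country code; in the real app this
# would be a warehouse query, not a dict.
COUNTRY_DETAILS = {
    "AT": {"name": "Austria", "revenue": 1200.0},
    "DE": {"name": "Germany", "revenue": 5400.0},
}

def country_detail(code):
    # In a web2py controller: code = request.args(0)
    detail = COUNTRY_DETAILS.get(code)
    return json.dumps(detail if detail is not None
                      else {"error": "unknown country"})
```

The d3.js click handler would fetch this JSON and re-render the detail chart or table, which is exactly the visual-OLAP loop described above.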
2. Start a Python script from a view by clicking on a button.
Example: Your analytical server ran a machine learning algorithm with some parameters. You can see the parameters and performance metrics in a view.
So you change the parameters (in a prepopulated form), and on submission the form processing triggers the analytical server to restart the machine learning task.
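One simple way to wire that up is a job queue between web2py and the analytical server: the form handler enqueues a "retrain" message, and the server consumes it. This is a sketch under assumptions; a stdlib `queue.Queue` stands in for whatever broker or task framework (e.g. pulsar) actually sits between the two processes, and the parameter names are hypothetical:

```python
import json
import queue

# Stand-in for the broker shared with the analytical server.
task_queue = queue.Queue()

def submit_retrain(params):
    # In web2py this would run in the branch where form.process().accepted
    # is True, with params taken from form.vars.
    job = {"task": "retrain_model", "params": params}
    task_queue.put(json.dumps(job))
    return job

# The user edits the prepopulated form and submits new parameters:
submit_retrain({"n_estimators": 200, "max_depth": 5})

# The analytical server's worker loop would pick the job up:
job = json.loads(task_queue.get())
```

The key design point is that web2py only posts the message and returns immediately; the long-running training stays outside the request cycle.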
I hope we can continue this very interesting discussion here.
Kind Regards
Franziskos