Real-time/quick analytics reports?

otis.gos...@gmail.com

Mar 2, 2009, 5:48:07 PM

to CloudBase

Hello,

I have a basic question:
Is CloudBase suitable for, say, large-scale web-analytics applications,
and is it capable of producing quick/real-time reports? For example, if
the app is a web analytics app, a report might be: "show the top N most
popular search terms used by visitors from Germany who visited between
8 AM and 8 PM". If you use Google Analytics or any other such
service/tool, you'll know what I mean.

I'm asking because Hadoop is not designed for such real-time
operations, and since CloudBase is built on top of Hadoop I'm
wondering if/how it can be different?

Thanks,
Otis

Tarandeep Singh

Mar 2, 2009, 6:17:13 PM

to cloudba...@googlegroups.com

Hi Otis,

CloudBase is not meant for real-time queries. It provides a database abstraction layer on top of Hadoop which, coupled with its SQL interface, makes it easier to use Hadoop to query/mine your data.

We do use CloudBase to generate reports, but not real-time reports. With the CloudBase-1.2 release, it is possible to index table data, which helps reduce query execution time, but even then the performance won't be close to real-time/quick reporting.

-Taran

Leo Dagum

Mar 2, 2009, 7:13:42 PM

to cloudba...@googlegroups.com

Most real-time reports work off aggregate or rollup tables. Where CloudBase fits in is generating the aggregates as part of nightly batch processing; the real-time reporting itself is executed by a traditional RDBMS.

- leo
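
To make the rollup idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `search_rollup` table, its columns, the sample events, and the use of in-memory SQLite as a stand-in for the "traditional RDBMS". The batch step (which in this thread would be a CloudBase job) collapses raw events into per-(country, hour, term) counts, and the report query then reads only the small rollup table.

```python
import sqlite3
from collections import Counter

# Hypothetical raw visit events: (country, hour_of_day, search_term).
raw_events = [
    ("DE", 9, "hadoop"),
    ("DE", 10, "hadoop"),
    ("DE", 14, "sql"),
    ("DE", 22, "sql"),   # outside the 8 AM to 8 PM window
    ("US", 9, "hadoop"),
]

def build_rollup(events):
    """Batch step: collapse raw events into (country, hour, term, hits) rows."""
    counts = Counter(events)
    return [(c, h, t, n) for (c, h, t), n in counts.items()]

def top_terms(conn, country, start_hour, end_hour, n):
    """Serving step: answer the report from the rollup table only."""
    rows = conn.execute(
        "SELECT term, SUM(hits) AS total FROM search_rollup "
        "WHERE country = ? AND hour >= ? AND hour < ? "
        "GROUP BY term ORDER BY total DESC, term ASC LIMIT ?",
        (country, start_hour, end_hour, n),
    )
    return rows.fetchall()

# SQLite stands in for the external RDBMS (an assumption, not from the thread).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE search_rollup (country TEXT, hour INTEGER, term TEXT, hits INTEGER)"
)
conn.executemany(
    "INSERT INTO search_rollup VALUES (?, ?, ?, ?)", build_rollup(raw_events)
)
conn.commit()

# Top 2 search terms for visitors from Germany between 8 AM and 8 PM.
report = top_terms(conn, "DE", 8, 20, 2)
print(report)
```

The point of the split is that the expensive scan over raw data happens once per batch run, while each report only touches pre-aggregated rows.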

otis.gos...@gmail.com

Mar 3, 2009, 11:11:42 AM

to CloudBase

OK, so the idea is:

Data -> CloudBase (on top of Hadoop)
SQL -> CloudBase -> Aggregates/Rollups -> Import to external RDBMS
SQL -> RDBMS -> Real-time Report

Is this correct?

Would there be an additional stage between Aggregates/Rollups and Import
to external RDBMS? That is, where do Aggregates/Rollups live before
they are imported into an RDBMS? I can imagine this being up to the
implementer, but I'm wondering if you could recommend an approach?
For example, maybe Aggregates/Rollups would get stored in HDFS? Or in
custom CloudBase tables? Or stored in memory and loaded into the RDBMS
after the in-memory Aggregates/Rollups are created?

Thanks,
Otis

Tarandeep Singh

Mar 3, 2009, 12:17:46 PM

to cloudba...@googlegroups.com

On Tue, Mar 3, 2009 at 8:11 AM, otis.gos...@gmail.com <otis.gos...@gmail.com> wrote:

> OK, so the idea is:
>
> Data -> CloudBase (on top of Hadoop)
> SQL -> CloudBase -> Aggregates/Rollups -> Import to external RDBMS
> SQL -> RDBMS -> Real-time Report
>
> Is this correct?

Yes, this is correct.

> Would there be an additional stage between Aggregates/Rollups and Import
> to external RDBMS? That is, where do Aggregates/Rollups live before
> they are imported into an RDBMS? I can imagine this being up to the
> implementer, but I'm wondering if you could recommend an approach?
> For example, maybe Aggregates/Rollups would get stored in HDFS? Or in
> custom CloudBase tables? Or stored in memory and loaded into the RDBMS
> after the in-memory Aggregates/Rollups are created?

You have a few options for storing aggregates/rollups:

1) Store in HDFS as a new CloudBase table:
SELECT * INTO newTable FROM oldTable

2) Store on the local file system:
SELECT * INTO 'path/to/local/dir' FROM oldTable
You can then use RDBMS-specific tools (e.g. a bulk loader) to load the data into RDBMS tables.

3) Export the results directly into an RDBMS table:
INSERT INTO rdbmsTableName@databaseLinkName
SELECT * FROM tablename

A database link forms a bridge between CloudBase and the RDBMS using the RDBMS's JDBC driver. Please see the Database link section (under CloudBase objects) and the Insert Statement section (under DML statements) on the CloudBase website:
cloudbase.sourceforge.net

4) Insert the results into an already existing CloudBase table:
INSERT INTO cloudbaseTableName
SELECT * FROM tablename

Please note that you can specify column names in the INSERT statement in options (3) and (4). Refer to the documentation for details.

Thanks,
Taran
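
For option (2), the load step on the RDBMS side might look roughly like the Python sketch below. It is only an illustration under stated assumptions: the export is assumed to be tab-delimited text with four columns, the `search_rollup` table and `load_rollup` helper are hypothetical, and in-memory SQLite stands in for the target RDBMS. A real deployment would more likely use the database's own bulk loader, as mentioned above.

```python
import csv
import io
import sqlite3

def load_rollup(conn, lines, delimiter="\t"):
    """Bulk-load delimited rollup rows into the RDBMS in one transaction."""
    reader = csv.reader(lines, delimiter=delimiter)
    rows = [
        (country, int(hour), term, int(hits))
        for country, hour, term, hits in reader
    ]
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany("INSERT INTO search_rollup VALUES (?, ?, ?, ?)", rows)
    return len(rows)

# Stand-in for a file written under 'path/to/local/dir'.
# The tab-delimited, four-column layout is an assumption.
exported = io.StringIO("DE\t9\thadoop\t2\nDE\t14\tsql\t1\nUS\t9\thadoop\t1\n")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE search_rollup (country TEXT, hour INTEGER, term TEXT, hits INTEGER)"
)
loaded = load_rollup(conn, exported)
total_hits = conn.execute("SELECT SUM(hits) FROM search_rollup").fetchone()[0]
print(loaded, total_hits)
```

Loading all rows inside a single transaction keeps the report table from being visible in a half-loaded state if the import fails partway.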
