Thank you for your time to answer this question,
>
> The multidimensional database technologies commonly used in Online
> Analytical Processing (OLAP) generally provide efficient means for
> aggregation on one or more dimensions of data. A classic example, which you
> mention, is storing and doing analytic calculations over point of sale (POS)
> data.
>
> It is possible to build an OLAP system on top of Hypertable. Here's how you
> could do it for the POS example that you mention. Create two tables, POSRaw
> and POSRollup. The POSRaw table would contain all of the raw point of sale
> transaction information. The row key would be the timestamp of the
> transaction and would include the following column families:
>
> item
> price
> region
> store_id
> [...]
I'm assuming that you meant that the row key would be the timestamp
and the POSRaw
would contain the above column families?
> For the the sparse information about each product, you could have a column
> family called "attr" and use the column qualifiers (word after the colon) to
> specify each attribute. Since Hypertable stores information in a sparse
> representation, it is well-suited for this kind of data. So, the qualified
> "attr" columsn might include the following:
>
> attr:color
> attr:size
> attr:count
> attr:weight
> attr:length
> attr:units
> [...]
Ok, so this is a completely separate table with products only, which
is modelled using
the star schema and the proper record would be selected by looking up
the item using
'item' column family (possibly the 'id' column within that family)
from the POSRaw table.
Is that correct?
> With the 1.0 release, the query language will be limited to simple SELECT
> with a predicate. The next major release after 1.0 will contain a richer
> query language that will include a full suite of basic stats functions for
> generating summaries.
Is hypertable OK with building Index tables which contain ORDER BY indexes
for some subsets of columns from some subset of column families?
If not how would you do the ORDER BY selects at the current state of hypertable?
Mateusz
>
> It is possible to build an OLAP system on top of Hypertable. Here's how youI'm assuming that you meant that the row key would be the timestamp
> could do it for the POS example that you mention. Create two tables, POSRaw
> and POSRollup. The POSRaw table would contain all of the raw point of sale
> transaction information. The row key would be the timestamp of the
> transaction and would include the following column families:
>
> item
> price
> region
> store_id
> [...]
and the POSRaw
would contain the above column families?
> For the the sparse information about each product, you could have a columnOk, so this is a completely separate table with products only, which
> family called "attr" and use the column qualifiers (word after the colon) to
> specify each attribute. Since Hypertable stores information in a sparse
> representation, it is well-suited for this kind of data. So, the qualified
> "attr" columsn might include the following:
>
> attr:color
> attr:size
> attr:count
> attr:weight
> attr:length
> attr:units
> [...]
is modelled using
the star schema and the proper record would be selected by looking up
the item using
'item' column family (possibly the 'id' column within that family)
from the POSRaw table.
Is that correct?
Hello Doug !
> Sorry for the delayed response. I don't have direct experience with data
> wherehousing and OLAP technology, so I'm having to review some of the
> literature. Comments inline ...
Again, thank you very much for your time spent on this answer.
Ah, so that's the way. I'm definitely going to try this out and share some
information about my experience wrt to compression efficiency.
> Once you have this denormalized POSRaw table, you could create materialized
> views using map-reduce. The map-reduce system would have to be able to read
> from a Hypertable and write to a Hypertable (easily implementable
> extensions). So for example, let's say you want to answer questions about
> total sales per product. You could write a map-reduce program that operated
> on the POSRaw table and mapped each transaction into the following key/value
> pair
>
> key=(product, month) value=1
>
> The reduce() function would aggregate (sum together) the monthly sales
> volume for each product. The output of the reduce() function could be
> written directly into Hypertable. This new table, TotalSalesByProduct,
> would be a materialized view of the raw data, where the row key would be
> (product, month) and the one column would be sales volume.
I'm still trying to connect the dots and this seems to me as the most
appropriate solution to the problem. I'm definitely going to try this out
very very soon.
> Does this sound like a solution that would work for you?
Definitely! Thank you very much for the comprehensive explanation.
As soon as implement this in practice I'm going to share my experiences
with the list.
best regards,
Mateusz Berezecki