List partitioning

Lee

Dec 1, 2007, 9:05:14 PM

<BackgroundInfo>:

Oversimplifying for the sake of brevity:

1. An "RDF Triple" is simply a tuple consisting of a subject, predicate
and object, thus (S,P,O).

2. A "triple store" is a table of triples.

They are of interest when they get to the Mega Row size and beyond.

3. Here is a snippet of a discussion about "MPT", a particular
implementation of a triple store:

The traditional way of storing RDF in a relational database is to use a
"big table of triples". Variations on this basic approach are employed
by popular RDF storage engines including Jena, Sesame, and 3store.

Our approach, MPT ("Mapped Predicate Tables"), is different. Recognizing
that the number of relationship types in real RDF data is much lower
than the number of nodes, MPT distributes triples across several tables,
each holding all the relationships of a certain type. This design offers
efficient query plans for complex queries as well as an opportunity to
scale across storage devices.

</BackgroundInfo>

I am not a DBA; however, it seems to me that what our heroes have done is
to invent (reinvent?) the idea of table partitioning.

Their insight into their data is that the number of "predicates" is
typically way smaller (they say 50, but let's say 50 to 1000 different
predicates) than the number of triples (10s or even 100s of MegaRows).

But couldn't we get the same effect in Oracle by doing list partitioning
(one partition for every predicate) or range partitioning (one partition
for each group of predicates, say one partition for predicates starting
with a-h, another for predicates starting with i-m, etc.)?
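
To make the two schemes concrete, here is a minimal Python sketch of the
routing logic only; it is not Oracle syntax and says nothing about how
Oracle stores partitions. The sample predicates, the partition names, and
the catch-all n-z range are assumptions made up for illustration; the a-h
and i-m ranges come from the paragraph above.

```python
from collections import defaultdict

# Hypothetical sample triples (subject, predicate, object).
TRIPLES = [
    ("s1", "author", "o1"),
    ("s2", "title", "o2"),
    ("s3", "author", "o3"),
    ("s4", "isPartOf", "o4"),
    ("s5", "memberOf", "o5"),
]


def list_partition(predicate):
    """List partitioning: each distinct predicate gets its own partition."""
    return "p_" + predicate


# Range partitioning: one partition per alphabetical range of predicates.
# Each entry is (inclusive upper bound on the first letter, partition name).
# The n-z range is an assumed catch-all, not something from the post.
RANGES = [("h", "p_a_h"), ("m", "p_i_m"), ("z", "p_n_z")]


def range_partition(predicate):
    """Return the first range whose upper bound covers the first letter."""
    first = predicate[0].lower()
    for upper, name in RANGES:
        if first <= upper:
            return name
    raise ValueError("no partition for predicate %r" % predicate)


def distribute(triples, route):
    """Group triples by the partition name that `route` assigns."""
    partitions = defaultdict(list)
    for s, p, o in triples:
        partitions[route(p)].append((s, p, o))
    return dict(partitions)


by_list = distribute(TRIPLES, list_partition)    # four partitions
by_range = distribute(TRIPLES, range_partition)  # three partitions
```

With this sample data the list scheme yields one partition per predicate,
while the range scheme puts "isPartOf" and "memberOf" together in the i-m
partition.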

Oh yes, I know that Oracle provides its own triple store as part of the
RDF support which in turn is part of Oracle Spatial. Leave that aside
for the moment, unless someone here has knowledge of some of the
internals about how the Oracle triple store works.

This communication is about inviting informed comments about how one
might use partitioning to advantage in the context above.

So what do you say? If I discovered that I had a table of three columns
and Mega rows, but I could say that one of the columns had no more than
(50? 100? 1000?) distinct values, could I use partitioning to good
effect, and if so, what sort? That is, are more partitions always better,
or is a smaller but uniformly populated set better, or what?

csn

Dec 2, 2007, 5:20:29 AM

Aren't you mixing logical and physical here?... partitioning is the
physical implementation.

Lee

Dec 2, 2007, 11:18:45 AM

csn wrote:
> Aren't you mixing logical and physical here?... partitioning is the
> physical implementation.

No, not at all.

It may be clearer in the full paper. I just showed a snippet to give
context to what I hoped would be the main topic, that is, whether the
implementation of a large triple store would be a good candidate for
partitioning, and if so, to tap the wisdom of this group about the best
way to go about it.

The authors are quite clear that they see the size of their table as the
problem, and that the cure is to break the table into several (over
fifty, in their case) smaller tables and then to hide the dirty deed
from the end user by writing a Java wrapper around the whole thing so
that the end user "sees" the data as if it were the original one huge
table and writes queries against it accordingly.

Naturally, making several physical tables appear to be one larger table
involves some physical to logical mapping (call it "virtuality") but
that's just what happens, as I understand it, with Oracle's partitioned
tables, except that the virtuality (making the many tables appear to be
one table) is built into the database and not tacked on as an external
application layer wrapper.
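
As a rough illustration of the "several small tables made to look like
one" arrangement described above, here is a minimal Python sketch using
the standard sqlite3 module. It is not the MPT code and not how Oracle
implements partitions: the table names (t_author, t_title), the view name
(triples), and the sample rows are hypothetical, and a SQLite view stands
in for the Java wrapper.

```python
import sqlite3

# Hypothetical predicates; each gets its own two-column table.
PREDICATES = ["author", "title"]

conn = sqlite3.connect(":memory:")
for p in PREDICATES:
    conn.execute("CREATE TABLE t_%s (subject TEXT, object TEXT)" % p)

# Made-up sample data.
conn.execute("INSERT INTO t_author VALUES ('book1', 'Smith')")
conn.execute("INSERT INTO t_title VALUES ('book1', 'Triples')")
conn.execute("INSERT INTO t_title VALUES ('book2', 'Partitions')")

# The "wrapper": a view that glues the small tables back into one
# logical (subject, predicate, object) table.
selects = [
    "SELECT subject, '%s' AS predicate, object FROM t_%s" % (p, p)
    for p in PREDICATES
]
conn.execute("CREATE VIEW triples AS " + " UNION ALL ".join(selects))

# Queries are written against the single logical table.
all_rows = conn.execute(
    "SELECT subject, predicate, object FROM triples "
    "ORDER BY subject, predicate"
).fetchall()
titles = conn.execute(
    "SELECT subject, object FROM triples "
    "WHERE predicate = 'title' ORDER BY subject"
).fetchall()
```

The person writing the query sees only the "triples" view; the split into
per-predicate tables stays hidden behind it.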

csn

Dec 2, 2007, 9:55:18 PM

Partitioning is built into the DB, yes. This is transparent to SQL, i.e.
the SQL does not know or care whether a table is partitioned or not.
Having said that, there are SQL extensions that can access the
partitions directly.

Tables (and indexes) may be partitioned by range, hash, list, and
composites thereof. The number of columns is not relevant, but the
total table size (rows * columns * column_size) is, as the main
reasons for partitioning are administration and performance. If you
partition say by range on month, then it is a simple administrative
matter to drop off the oldest partition to keep a fixed number of
rotating months in the partitioned table. If you partition say by list
and query for a particular value, then query performance is improved
because you only need to access the partition that contains that
value.
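
The two benefits mentioned above (reading only the relevant partition,
and dropping the oldest partition as a unit) can be sketched with a toy
Python model. This is only an illustration under stated assumptions: the
class PartitionedTable, its methods, and the month keys are hypothetical
and do not reflect Oracle's internals or syntax.

```python
class PartitionedTable:
    """Toy model: rows are stored in one list per partition key."""

    def __init__(self):
        self.partitions = {}
        self.rows_scanned = 0  # counts rows read by select()

    def insert(self, key, row):
        """Route the row to the partition for `key`, creating it if needed."""
        self.partitions.setdefault(key, []).append(row)

    def select(self, key):
        """Pruned lookup: only the partition holding `key` is read."""
        rows = self.partitions.get(key, [])
        self.rows_scanned += len(rows)
        return list(rows)

    def drop_partition(self, key):
        """Discard a whole partition; return how many rows it held."""
        return len(self.partitions.pop(key, []))


table = PartitionedTable()
# Made-up data: 3 rows for October, 2 for November, 4 for December.
for month, n in [("2007-10", 3), ("2007-11", 2), ("2007-12", 4)]:
    for i in range(n):
        table.insert(month, (month, i))

december = table.select("2007-12")      # reads 4 of the 9 rows
oldest = min(table.partitions)          # ISO month strings sort by date
dropped = table.drop_partition(oldest)  # removes the October partition
```

In this model the December query never looks at the other two months, and
dropping October leaves the November and December partitions untouched.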

More info can be found here: http://download.oracle.com/docs/cd/B28359_01/server.111/b32024/partition.htm#g471747