RDF Alchemy for Scientific Data

percious

Feb 9, 2008, 10:28:08 AM

to rdfalchemy-dev

Phil,

I put your project in the back of my mind, and now I have a potential
use for your software. I'd like to see it used for storing arbitrary
scientific data, where the metadata may change from project to
project, or even over time as projects evolve. Furthermore, it should
be possible to relate different data sets to one another based on the
data they have in common.

The problem is that the amount of raw data may exceed terabytes in
some instances, and it would be valuable to have searches that are
still fast whether there are 1,000 records or 100,000 records. Is this
practical? I see you are using SleepyCat; I am assuming by this you
mean bdb?

Anyway, interested to know what you think.

-chris

Philip Cooper

Feb 10, 2008, 4:55:38 PM

to rdfalch...@googlegroups.com

percious at about 2/9/08 8:28 AM said:
> .. potential idea ...

> storing arbitrary scientific data,
> where the meta data may change from project to project,
> or even over time as projects evolve.

Sounds like a great application for a triplestore. Without it, your
relational tables would keep changing and cause havoc and obsolescence
in your code.
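
To make that concrete, here is a rough sketch of why changing metadata
is painless when everything is a (subject, predicate, object) triple.
This is a toy in-memory store using only the Python standard library;
the class TinyTripleStore, its methods, and the run/predicate names are
all made up for illustration and are not the rdflib or RDFAlchemy API.

```python
from collections import defaultdict


class TinyTripleStore:
    """Toy in-memory triple store (hypothetical, for illustration only)."""

    def __init__(self):
        # Two simple indexes so lookups by subject or by predicate
        # do not have to scan every triple.
        self._by_subject = defaultdict(set)
        self._by_predicate = defaultdict(set)

    def add(self, s, p, o):
        triple = (s, p, o)
        self._by_subject[s].add(triple)
        self._by_predicate[p].add(triple)

    def triples(self, s=None, p=None, o=None):
        """Return the set of triples matching the pattern; None is a wildcard."""
        if s is not None:
            candidates = self._by_subject.get(s, set())
        elif p is not None:
            candidates = self._by_predicate.get(p, set())
        else:
            candidates = set().union(*self._by_subject.values())
        return {
            t for t in candidates
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        }


store = TinyTripleStore()
store.add("run:1", "temperature_K", "293.5")
store.add("run:1", "instrument", "spectrometer-A")
store.add("run:2", "temperature_K", "301.0")
# A later project records a new kind of metadata: no schema migration,
# just another predicate.
store.add("run:2", "pressure_kPa", "101.3")

print(store.triples(p="temperature_K"))
```

Adding pressure_kPa did not touch the runs that lack it, which is the
point: the "schema" is just whatever predicates happen to be in use.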

> The problem is that the amount of raw data may exceed terabytes in
> some instances, and it would be valuable to have searches that are
> still fast when there is 1,000 records vs. 100,000 records. Is this
> practical?

Sure. There are triplestores running with hundreds of millions of
triples, and larger ones have been tested.

> I see you are using SleepyCat, I am assuming by this you
> mean bdb?

Yes, but check out the rdfalchemy.engine module
http://www.openvest.com/public/docs/rdfalchemy/api/rdfalchemy.engine-module.html
You can also use a mysql (good for huge apps) or zodb (questionable
for size) backend.

Also, you are not limited to rdflib stores as the current trunk also
allows read/write access to a Sesame triplestore. See my posting
http://groups.google.com/group/rdfalchemy-dev/browse_thread/thread/b43143fbda3118de
about adding Jena and Sesame access via a jython branch. If there is
a triplestore out there, there's a chance we can get you full access.

>
> Anyway, interested to know what you think.
>

I think you are on the right track. The name change of the class
rdfObject to rdfSubject reflects a more "SubjectOriented" programming
paradigm. The data is the data, and it is our code that changes.
Relational structures force too much structure on the data, and
object-oriented structures require advance knowledge of how the data
will be used *forever*.
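
As a rough illustration of the "SubjectOriented" idea (a sketch only,
not RDFAlchemy's actual implementation): the classes below are thin,
replaceable views over a shared set of triples. The names Predicate,
Subject, Sample and Specimen are hypothetical; only the standard
library is used.

```python
class Predicate:
    """Hypothetical descriptor mapping an attribute to one predicate."""

    def __init__(self, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        # Look the value up in the shared triples at access time.
        for s, p, o in obj.db:
            if s == obj.subject and p == self.name:
                return o
        return None

    def __set__(self, obj, value):
        # Replace any existing value for this (subject, predicate) pair.
        old = {t for t in obj.db if t[0] == obj.subject and t[1] == self.name}
        obj.db.difference_update(old)
        obj.db.add((obj.subject, self.name, value))


class Subject:
    db = set()  # shared triples; the data lives here, not in the objects

    def __init__(self, subject):
        self.subject = subject


class Sample(Subject):
    label = Predicate("label")
    mass_g = Predicate("mass_g")


# A second view written later, for a different use of the same data.
class Specimen(Subject):
    label = Predicate("label")
    origin = Predicate("origin")


s = Sample("sample:42")
s.label = "quartz"
s.mass_g = 12.5

v = Specimen("sample:42")
v.origin = "site-B"
print(v.label, v.origin)
```

Both classes read and write the same underlying triples, so adding the
Specimen view later required no change to the stored data or to Sample.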