RDF Schema


Herbi

Aug 9, 2013, 5:31:22 AM
to brightsta...@googlegroups.com
Hi,
how do I define an RDF schema in BrightstarDB?

I'm working on the implementation of a document management system, and I want to store the metadata of documents in an RDF triple store like BrightstarDB.
Besides the metadata of documents, I want to connect a document to "DocumentClasses" (for classification) and "Depots" (for structuring).

For example: I have a PDF document, which is a bill. In object-oriented notation I want to have the following classes:
 - class "Document" (with properties "Filename", "ArchiveDate", "FileType",...)
 - class "Bill" (with properties "BillNr", "BillDate", "Amount"...). "Bill" is a DocumentClass, other DocumentClasses can be e.g. "Email", "Offer",...
 - class "Customer" (with properties "CustomerNr", "Name",...). "Customer" is a Depot, other depots can be "Project", "Article",...

1. How do I define such a schema in RDF?
2. Is an RDF triplestore appropriate for storing such data?
3. Can I make the database accept only data that fits the defined RDF schema?

Khalil Ahmed

Aug 9, 2013, 5:54:59 AM
to brightsta...@googlegroups.com
Hi Herbi

If you want to use native RDF technologies, you can use RDF Schema or OWL to express your classes and their properties. These are both defined in terms of RDF, so the schema is expressed in triples which you can store in BrightstarDB. 
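To make "the schema is expressed in triples" concrete, here is a minimal sketch in plain Python that builds a few RDF Schema statements for the classes in the question and prints them as N-Triples. The `http://example.org/dms#` namespace, the lower-cased property names, and the choice to model "Bill" as a subclass of "DocumentClass" (and "Customer" as a subclass of "Depot") are assumptions made for illustration; only the `rdf:`, `rdfs:` and `xsd:` IRIs are standard. How the resulting triples are loaded into BrightstarDB is not shown.

```python
# Hypothetical namespace for the document management system (an assumption).
DMS = "http://example.org/dms#"
# Standard W3C vocabularies.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def class_triples(name, parent=None):
    """Declare an rdfs:Class, optionally as a subclass of another class."""
    triples = [(DMS + name, RDF + "type", RDFS + "Class")]
    if parent is not None:
        triples.append((DMS + name, RDFS + "subClassOf", DMS + parent))
    return triples


def property_triples(name, domain, range_iri):
    """Declare an rdf:Property with an rdfs:domain and an rdfs:range."""
    prop = DMS + name
    return [
        (prop, RDF + "type", RDF + "Property"),
        (prop, RDFS + "domain", DMS + domain),
        (prop, RDFS + "range", range_iri),
    ]


def to_ntriples(triples):
    """Serialize IRI-only triples as N-Triples lines."""
    return "\n".join("<%s> <%s> <%s> ." % t for t in triples)


schema = []
schema += class_triples("Document")
schema += class_triples("DocumentClass")
schema += class_triples("Depot")
schema += class_triples("Bill", parent="DocumentClass")  # modeling assumption
schema += class_triples("Customer", parent="Depot")      # modeling assumption
schema += property_triples("filename", "Document", XSD + "string")
schema += property_triples("archiveDate", "Document", XSD + "dateTime")
schema += property_triples("billNr", "Bill", XSD + "integer")
schema += property_triples("customerNr", "Customer", XSD + "integer")

print(to_ntriples(schema))
```

The output is ordinary RDF data, so the schema can sit in the same store as the document metadata it describes.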

Enforcing the schema is a different matter though. RDF makes an "open-world" assumption, so an RDF schema is not used to constrain instances but rather to enable a reasoner to deduce things about those instances based on the properties they have. So if the schema declares "Customer" as the domain of "CustomerNr" and a resource has a "CustomerNr" property, a reasoner can infer that the resource is an instance of "Customer"; a resource can have multiple overlapping types. This is effectively the reverse of the OO situation, where a class defines the properties that an instance has (e.g. an instance of "Customer" must have a property "CustomerNr"). The main reason for this difference is that RDF is intended to support data integration at web scale, where you cannot have prior knowledge of what kinds of assertions might be made about resources you create.
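The "deduce rather than constrain" behaviour can be sketched with the RDFS domain rule: if a property has an `rdfs:domain` of class C, then any resource that uses the property is inferred to be of type C. The snippet below is a toy illustration in plain Python, not BrightstarDB's reasoner or any particular library; the `dms:` names are made up and prefixed names stand in for full IRIs.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"


def infer_types(triples):
    """Apply the RDFS domain rule: (p rdfs:domain C) and (s p o) => (s rdf:type C)."""
    # Collect the declared domain classes for each property.
    domains = {}
    for s, p, o in triples:
        if p == RDFS_DOMAIN:
            domains.setdefault(s, set()).add(o)
    # Every use of a property with a declared domain yields a type statement.
    inferred = set()
    for s, p, o in triples:
        for cls in domains.get(p, ()):
            inferred.add((s, RDF_TYPE, cls))
    # Return only statements that were not already asserted.
    return inferred - set(triples)


data = {
    # Schema statements (hypothetical names).
    ("dms:customerNr", RDFS_DOMAIN, "dms:Customer"),
    ("dms:billNr", RDFS_DOMAIN, "dms:Bill"),
    # Instance data: no explicit rdf:type anywhere.
    ("dms:acme", "dms:customerNr", "4711"),
    ("dms:acme", "dms:billNr", "12"),
}

for triple in sorted(infer_types(data)):
    print(triple)
```

Note that nothing is rejected here: a resource carrying both properties is simply inferred to be both a "Bill" and a "Customer", which is exactly the overlap of types described above.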

Because of this open-world assumption, BrightstarDB doesn't have any notion of database-level constraints, so all constraint validation has to be handled by the application.
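A minimal sketch of what application-side validation could look like before anything is written to the store. The rule table, the `validate` helper, and the property names are all hypothetical; this is plain Python and not part of any BrightstarDB API.

```python
from datetime import date

# Hypothetical rules: required properties per class, each with a parser
# that raises TypeError or ValueError when the value is unacceptable.
REQUIRED = {
    "Bill": {"billNr": int, "billDate": date.fromisoformat, "amount": float},
    "Customer": {"customerNr": int, "name": str},
}


def validate(class_name, properties):
    """Return a list of problems; an empty list means the resource may be stored."""
    problems = []
    for prop, parse in REQUIRED[class_name].items():
        if prop not in properties:
            problems.append("missing " + prop)
            continue
        try:
            parse(properties[prop])
        except (TypeError, ValueError):
            problems.append("bad value for " + prop)
    return problems


ok = {"billNr": "12", "billDate": "2013-08-09", "amount": "99.50", "note": "extra"}
broken = {"billNr": "twelve", "amount": "1"}

print(validate("Bill", ok))
print(validate("Bill", broken))
```

Properties that are not listed in the rules (such as `note` above) are deliberately allowed, so the check constrains what the application cares about without giving up the extensibility of the underlying RDF data.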

The other approach with BrightstarDB would be to use our "entity framework". This allows you to define C# types (as interfaces) for your entities, so you are much closer to the OO world if you go down this route. You also then gain access to LINQ as a query language, which may be useful to you. Your application would still need to do validation, but you at least have your OO model to use to perform that validation. The data is still stored in RDF and you can still add more properties that aren't mapped to your C# interfaces, so data integration is not impaired by using the entity framework; it is just maybe a bit more comfortable and natural for C# developers :)

Cheers

Kal


--
You received this message because you are subscribed to the Google Groups "BrightstarDB Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to brightstardb-us...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 



--
Kal Ahmed
Director, Networked Planet Limited
e: kal....@networkedplanet.com
w: www.networkedplanet.com

Herbi

Aug 9, 2013, 7:35:47 AM
to brightsta...@googlegroups.com
Hey Kal,
thx for the detailed answer.

I'm comparing different types of NoSQL databases to find the best one for my needs, so I made a pros/cons list comparing BrightstarDB and RavenDB (which is a NoSQL document store):
RavenDB and BrightstarDB are both ACID compliant and scalable.
The differences are in extensibility and search capabilities:

 - extensibility: Although both are schemaless, RavenDB has an implicit schema, which is defined by the stored aggregates. BrightstarDB is therefore easier to extend, because it has no schema at all. For instance, combining different domains is easy, which is very useful in my case: different applications with different domains want to use one document management system (DMS), so the DMS must be able to store the metadata of all domains. The application is then very open and also complies with a standard.

 - search capabilities: RavenDB uses Lucene.net to provide fast search capabilities. As far as I have read, most triple stores save everything as a string. Is this also the case for BrightstarDB? How can I perform, for instance, range searches using SPARQL (e.g. bills with BillNr > 10 AND BillNr < 20)? What about date-time ranges (e.g. between 1.1.2013 and 1.2.2013)? Is this supported on the server side?
Additionally, as far as I understand, there is only one index (over the "statements") for the whole database in triplestores, right? Since RavenDB lets me define my own indexes over the important parts of the data (using the implicit schema), it might perform better than a triplestore, right?

So far I think the big advantage of using BrightstarDB is the possibility to easily combine and extend different domains in one database.

thx
Herbi

Khalil Ahmed

Aug 9, 2013, 8:07:59 AM
to brightsta...@googlegroups.com
Hi Herbi

I think your analysis is pretty much spot on. BrightstarDB does support range queries over certain supported data-types - these include float/double, int/long, date/time and of course string. However, we don't currently have specific indexes so range queries tend to have to iterate all matches and filter on the range - which obviously could be slow for some queries. Additional indexing and / or integrated Lucene support are things I would be interested in adding to BrightstarDB in the future. The indexes we do have are essentially over the statements to enable us to quickly match on a subject/predicate or predicate/object pair (and reasonably fast to match when you only have one of subject, predicate or object). As you say RavenDB has the ability for you to define your own indexes, which can lead to some nice performance when you get the indexing right for your application - though that comes at the price of having to define and redefine those indexes as the shape of your data changes.
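To illustrate the trade-off described above, here is a toy in-memory sketch in plain Python: exact matches on a subject/predicate or predicate/object pair are direct index lookups, while a range query has to visit every statement for the predicate and filter. The class, its index layout, and the `dms:` names are invented for illustration and say nothing about how BrightstarDB is implemented internally. The `RANGE_QUERY` string shows what such a filter generally looks like in standard SPARQL 1.1, assuming the bill numbers are stored as typed numeric literals.

```python
from collections import defaultdict
from datetime import date

# Standard SPARQL 1.1 FILTER syntax; the dms: namespace is hypothetical.
RANGE_QUERY = """
PREFIX dms: <http://example.org/dms#>
SELECT ?bill WHERE {
  ?bill dms:billNr ?nr .
  FILTER (?nr > 10 && ?nr < 20)
}
"""


class TinyTripleStore:
    """Toy store with subject/predicate and predicate/object indexes."""

    def __init__(self):
        self.by_sp = defaultdict(set)  # (subject, predicate) -> objects
        self.by_po = defaultdict(set)  # (predicate, object) -> subjects
        self.by_p = defaultdict(set)   # predicate -> (subject, object) pairs

    def add(self, s, p, o):
        self.by_sp[(s, p)].add(o)
        self.by_po[(p, o)].add(s)
        self.by_p[p].add((s, o))

    def objects(self, s, p):
        """Direct index lookup on a subject/predicate pair."""
        return set(self.by_sp.get((s, p), ()))

    def subjects(self, p, o):
        """Direct index lookup on a predicate/object pair."""
        return set(self.by_po.get((p, o), ()))

    def range_query(self, p, low, high):
        """No range index: scan every statement with predicate p, then filter."""
        return sorted(s for s, o in self.by_p.get(p, ()) if low < o < high)


store = TinyTripleStore()
for nr in (5, 12, 19, 20):
    store.add("bill%d" % nr, "billNr", nr)
store.add("bill12", "billDate", date(2013, 1, 15))
store.add("bill19", "billDate", date(2013, 3, 1))

print(store.subjects("billNr", 12))
print(store.range_query("billNr", 10, 20))
print(store.range_query("billDate", date(2013, 1, 1), date(2013, 2, 1)))
```

The cost of `range_query` grows with the number of statements that use the predicate, which is why a dedicated range index or Lucene integration would help for this kind of query.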

Cheers

Kal

