Hi Enayat,
thanks for your post.
Let me briefly say a few words about CumulusRDF (you can find more detailed information in our wiki [1]).
CumulusRDF is an RDF store that uses Apache Cassandra as its underlying storage. Apache Cassandra is a solid, proven NoSQL (specifically, column-oriented) store for managing huge volumes of data. Here [3] you can see the medium and large companies that use Cassandra.
The first way of using CumulusRDF is as an HTTP service (i.e. a web application providing SPARQL 1.1 and other data services over the HTTP protocol).
In this case CumulusRDF runs as a web application and provides a REST interface (i.e. you will be able to load, update and query your data); it needs:
- a servlet engine or an application server (e.g. Apache Tomcat, Jetty, JBoss, Oracle WebLogic, IBM WebSphere, GlassFish)
- a Cassandra ring. With "ring" I mean a cluster of Cassandra nodes, which could also consist of a single node
To get there you can follow one of these two alternatives:
- User
- Download and start Cassandra 1.2.x (a single node is fine for testing and trying things out)
- Download and start Tomcat 6.x or later (any other servlet engine supporting the Servlet 2.5 spec is fine too)
- (I assume you already have a JVM 1.6)
- Deploy the CumulusRDF WAR in the servlet engine (in Tomcat it is just a matter of copying the WAR archive into the webapps folder)
- Once you have made sure everything is working, use the interface to load some data and query it
- Technical user (requires no download of external middleware like Cassandra or Tomcat)
- assuming you have
- JDK 1.6
- Maven 3.x
- an SVN client
- check out the latest stable version (i.e. trunk) from our repository [4]
- open a shell or a DOS prompt and type
- mvn clean cassandra:stop cassandra:start tomcat7:run
The second way is faster but requires a few more technical steps. In either case, at the end of the process you will have:
- a running Cassandra node
- a running servlet engine
- a deployed CumulusRDF web application
So you can start loading and querying data.
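To make the HTTP approach a bit more concrete, here is a minimal Java sketch that loads one triple and reads it back through the SPARQL endpoint. The base URL, the context path and the assumption that the endpoint accepts SPARQL 1.1 Update as a form POST are guesses I made for the sketch, so please double-check them against the GettingStarted page [1] for your actual deployment.

import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.util.Scanner;

public class CumulusHttpSketch {

    // NOTE: base URL and context path are assumptions made for this sketch;
    // check the GettingStarted page [1] for the actual URLs of your deployment.
    private static final String ENDPOINT = "http://localhost:8080/cumulusrdf/sparql";

    public static void main(String[] args) throws Exception {
        // 1) Load one triple with a SPARQL 1.1 Update (INSERT DATA) sent as a form POST.
        String update = "INSERT DATA { <http://example.org/s> <http://example.org/p> \"o\" }";
        HttpURLConnection post = (HttpURLConnection) new URL(ENDPOINT).openConnection();
        post.setRequestMethod("POST");
        post.setDoOutput(true);
        post.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        OutputStream out = post.getOutputStream();
        out.write(("update=" + URLEncoder.encode(update, "UTF-8")).getBytes("UTF-8"));
        out.close();
        System.out.println("Update HTTP status: " + post.getResponseCode());

        // 2) Query the data back via the SPARQL 1.1 Protocol (GET with a query parameter).
        String query = "SELECT * WHERE { ?s ?p ?o } LIMIT 10";
        HttpURLConnection get = (HttpURLConnection)
                new URL(ENDPOINT + "?query=" + URLEncoder.encode(query, "UTF-8")).openConnection();
        get.setRequestProperty("Accept", "application/sparql-results+json");
        InputStream in = get.getInputStream();
        Scanner scanner = new Scanner(in, "UTF-8").useDelimiter("\\A");
        System.out.println(scanner.hasNext() ? scanner.next() : "");
        scanner.close();
    }
}

Any other HTTP client (curl, for example) works the same way; the point is simply that everything goes through standard SPARQL 1.1 over HTTP.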
---------------------------
Having said that, CumulusRDF can also be used as an API in your code for loading (and also querying) data; this way things are a bit more performant because basically no HTTP transfer is involved. As a last note, you can also use a mixed approach: load data fast by embedding the CumulusRDF client API, and publicly expose the loaded data through the SPARQL / HTTP approach.
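Just to sketch what the embedded approach could look like: the snippet below uses the Sesame (openrdf) API around a CumulusRDF store. Be aware that the CumulusRDF-specific class names (CumulusRDFSail, TripleStore) and the Cassandra connection parameters are placeholders written from memory, not the verified API, so please check the wiki [1] and the sources [4] for the real classes before relying on it.

import java.io.File;

import org.openrdf.query.QueryLanguage;
import org.openrdf.query.TupleQueryResult;
import org.openrdf.repository.Repository;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.repository.sail.SailRepository;
import org.openrdf.rio.RDFFormat;

public class CumulusEmbeddedSketch {

    public static void main(String[] args) throws Exception {
        // PLACEHOLDERS: CumulusRDFSail / TripleStore and the "localhost:9160" / "KeySpace"
        // arguments are assumptions, not the verified CumulusRDF API -- see the sources [4].
        CumulusRDFSail sail = new CumulusRDFSail(new TripleStore("localhost:9160", "KeySpace"));
        Repository repository = new SailRepository(sail);
        repository.initialize();

        RepositoryConnection con = repository.getConnection();
        try {
            // Bulk load straight from a local file: no HTTP round trip involved.
            con.add(new File("/path/to/data.nt"), "http://example.org/", RDFFormat.NTRIPLES);

            // ...and query through the same connection with plain SPARQL.
            TupleQueryResult result = con.prepareTupleQuery(
                    QueryLanguage.SPARQL, "SELECT * WHERE { ?s ?p ?o } LIMIT 10").evaluate();
            while (result.hasNext()) {
                System.out.println(result.next());
            }
            result.close();
        } finally {
            con.close();
            repository.shutDown();
        }
    }
}

Everything after the Sail construction is plain Sesame API, so the mixed approach is just a matter of running this kind of loader next to the deployed web application.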
Let me know if you run into any problems with the above... I'll be happy to help you.
Best,
Andrea

[1] https://code.google.com/p/cumulusrdf/wiki/GettingStarted
[2] http://cassandra.apache.org/
[3] http://planetcassandra.org/companies/
[4] https://code.google.com/p/cumulusrdf/source/checkout
Hi,
You're right. At the moment the official documentation consists of the papers Andreas indicated in his previous email.
On top of that, we have added and are still adding a lot of improvements, so we will shortly come out with new benchmarks. In 1.1.x (the next release) we have a dedicated benchmarking module that we will use to provide fresh, up-to-date benchmark data.
Can I ask you:
- what is (more or less) the expected amount of data you have to manage?
- what is (more or less) the expected number of queries per second you should support?
Best,
Andrea