Hello All -
WordNet is a large database of words, and it includes lexical and semantic relationships between the lemmas (words) and synsets (synonym sets). I loaded it into Neo4j, along with all of the semantic and lexical relationships.
Originally I was going to load an RDF version (http://semanticweb.cs.vu.nl/lod/wn30/) through SailGraph, but I decided to write a program to load it directly out of NLTK (http://www.nltk.org/) into Neo4j so that I could easily modify it if needed. This process also helped me understand WordNet's structure.
Loading over REST through the normal transactional API ran at about 50 entries per second. This was taking too long, and I wanted to be able to reload the data if we needed to change something without it taking hours.
To speed up the process, I wanted to use the new Neo4j Batch Loader that Marko just created, but it's written in Java and NLTK is a CPython library that won't work with Jython. I decided to use this as an opportunity to experiment with ZeroMQ. And so I wrote Lightsocket.
Lightsocket is a lightweight socket server that Bulbs uses to plug into the JVM. But it doesn't require that you use Bulbs or even a Python client -- you can use any language that has a ZeroMQ binding (http://www.zeromq.org/bindings:c).
It's written in Jython (for now), and it was inserting ~3,000 entries per second from a single Python client on my dev server, and going up to 10K/sec with multiple clients.
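To put those rates in perspective, here is a quick back-of-the-envelope calculation. It's only a sketch: the entry count is a made-up round number for illustration (not the actual size of my WordNet load), and the function name is hypothetical. Only the three rates come from the numbers above.

```python
def load_time_seconds(entries, rate_per_second):
    """Return how long a load takes at a steady insert rate."""
    if rate_per_second <= 0:
        raise ValueError("rate_per_second must be positive")
    return entries / rate_per_second


# ASSUMPTION: hypothetical entry count, chosen only to make the math easy.
ENTRIES = 500_000

# Rates quoted in this post (entries per second).
RATES = {
    "REST, transactional API": 50,
    "Lightsocket, one client": 3_000,
    "Lightsocket, many clients": 10_000,
}

for label, rate in RATES.items():
    seconds = load_time_seconds(ENTRIES, rate)
    print(f"{label:>26}: {seconds:8.0f} s ({seconds / 3600:.2f} h)")
```

For that hypothetical half-million entries, the difference is a multi-hour load over REST versus a few minutes (or less) through Lightsocket, which is what makes reloading after a schema change practical.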
I modeled WordNet in Bulbs (http://bulbflow.com/) and created a library for interacting with it through Rexster. That library is called WordGraph, and I'll post it in the next few days.
I also used this as an opportunity to experiment with binary serialization and RPC protocols, primarily MessagePack. JSON serialization is one of the biggest performance killers, and so I wanted to see how much we could improve upon that by using MessagePack.
Unfortunately the MessagePack documentation is somewhat lacking, and it's not a simple drop-in with Jython. So right now Lightsocket is still using JSON, but I am continuing to experiment with MessagePack.
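One way to keep the wire format swappable while experimenting is to hide it behind a small codec object. This is a minimal sketch in CPython, not Lightsocket's actual code: the class names, `pick_codec`, and the request keys (`resource`, `method`, `params`) are all hypothetical. It defaults to the standard library's json and only uses the third-party msgpack package if it happens to be installed and is asked for.

```python
import json

try:
    import msgpack  # optional third-party package; the sketch works without it
except ImportError:
    msgpack = None


class JsonCodec:
    """Encode messages as compact UTF-8 JSON bytes."""
    name = "json"

    def dumps(self, obj):
        return json.dumps(obj, separators=(",", ":")).encode("utf-8")

    def loads(self, data):
        return json.loads(data.decode("utf-8"))


class MsgpackCodec:
    """Encode messages as MessagePack bytes (needs the msgpack package)."""
    name = "msgpack"

    def dumps(self, obj):
        return msgpack.packb(obj, use_bin_type=True)

    def loads(self, data):
        return msgpack.unpackb(data, raw=False)


def pick_codec(prefer_binary=False):
    """Return the binary codec only when requested and available."""
    if prefer_binary and msgpack is not None:
        return MsgpackCodec()
    return JsonCodec()


# ASSUMPTION: an illustrative request envelope, not Lightsocket's real format.
request = {
    "resource": "wordnet",
    "method": "create_vertex",
    "params": {"lemma": "dog", "pos": "n"},
}

codec = pick_codec()
wire = codec.dumps(request)          # bytes ready to send over a socket
assert codec.loads(wire) == request  # round trip preserves the message
print(codec.name, len(wire), "bytes")
```

With this shape, client and server only have to agree on which codec to use, so switching from JSON to MessagePack later is a one-line change at each end.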
Soon I'll put up more detailed docs at http://lightsocket.bulbflow.com/, but if you want to play around with it, look at the README on GitHub along with the comments in the example client and example resource.
- James