Import DBpedia data into Neo4j using Clojure


Himakshi Mangal

Dec 2, 2013, 5:11:53 AM
to clo...@googlegroups.com
Hi,

I am using Clojure to import DBpedia data into Neo4j.

Here's the code:
(ns opal.dbpedia
  (:use [clojure.tools.logging :only [log]])
  (:require [clojure.java.io :as io])
  (:import [uk.ac.manchester.cs.owl.owlapi.turtle.parser TurtleParser]
           [org.neo4j.unsafe.batchinsert BatchInserters]
           [org.neo4j.graphdb DynamicRelationshipType]))

;; PARSING METHODS

(defn get-next-tuple
  [parser]
  (let [last-item (atom nil)
        tuple (atom [])]
    (while (and (not= "." @last-item)
                (not= "" @last-item))
      (reset! last-item
              (-> parser
                (.getNextToken)
                (.toString)))
      (swap! tuple conj @last-item))
    (when-not (empty? (first @tuple)) ; .getNextToken returns "" once you are out of data
      @tuple)))

(defn seq-of-parser
  [parser]
  (if-let [next-tuple (get-next-tuple parser)]
    (lazy-cat [next-tuple]
              (seq-of-parser parser))))

(defn parse-file
  [filename]
  (seq-of-parser
    (TurtleParser.
      (io/input-stream filename))))

;; BATCH UPSERT METHODS

(def id-map (atom nil))
(defn insert-resource-node!
  [inserter res]
  (if-let [id (get @id-map res)]
    ; If the resource has already been added, just return the id.
    id
    ; Otherwise, create a node for the resource and remember its id for later.
    (let [id (.createNode inserter {"resource" res})]
      (swap! id-map #(assoc! % res id))
      id)))

(defn connect-resource-nodes!
  [inserter node1 node2 label]
  (let [relationship (DynamicRelationshipType/withName label)]
    (.createRelationship inserter node1 node2 relationship nil)))

(defn insert-tuple!
  [inserter tuple]
  ; Get the resource and label names out of the tuple.
  (let [[resource-1 label resource-2 & _ ] tuple
        ; Upsert the resource nodes.
        node-1 (insert-resource-node! inserter resource-1)
        node-2 (insert-resource-node! inserter resource-2)]
    ; Connect the nodes with an edge.
    (connect-resource-nodes! inserter node-1 node-2 label)))

(defn -main [graph-path & files]
  (let [inserter (BatchInserters/inserter graph-path)]
    (doseq [file files]
      (log :debug (str "Loading file: " file))
      (let [c (atom 0)]
        (doseq [tuple (parse-file file)]
          (if (= (mod @c 10000) 0)
            (log :debug (str file ": " @c)))
          (swap! c inc)
          (insert-tuple! inserter tuple))))
    (log :debug "Loading complete.")
    (log :debug "Shutting down.")
    (.shutdown inserter)
    (log :debug "Shutdown complete!")))

I am getting the following errors:

IllegalAccessError Transient used by non-owner thread  clojure.lang.PersistentArrayMap$TransientArrayMap.ensureEditable (PersistentArrayMap.java:449) 

and


IllegalArgumentException No matching method found: createNode for class org.neo4j.unsafe.batchinsert.BatchInserterImpl  clojure.lang.Reflector.invokeMatchingMethod


Can anyone please help me with this? Am I doing something wrong, or am I missing something? I am completely new to Clojure. Is there a working example for this?


Please help.

Thanks


Joel Holdbrooks

Dec 2, 2013, 9:43:34 PM
to clo...@googlegroups.com
I'm not certain where the transient error is coming from, but as far as Neo4j is concerned, have you considered using the neocons library to help with your import? It provides a decent wrapper for working with Neo4j and may spare you some headaches. IIRC it does batch inserts. Have a look: https://github.com/michaelklishin/neocons

Joseph Guhlin

Dec 3, 2013, 10:34:16 AM
to clo...@googlegroups.com
What version of neo4j are you using?

In your swap! call you are mixing transients and atoms:
(swap! id-map #(assoc! % res id)) 
should be:
(swap! id-map assoc res id)

swap! calls the given fn with the current value of the atom id-map as its first argument, followed by the remaining arguments, and then sets the atom to whatever the fn returns. So (swap! id-map assoc res id) computes (assoc @id-map res id) and stores the result as the atom's new value.
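The corrected swap! can be tried in isolation with a minimal sketch like the one below, assuming the atom holds a plain persistent map (never a transient). `fake-create-node` is a hypothetical stand-in for `(.createNode inserter ...)` so the snippet runs without Neo4j; it just hands out increasing ids.

```clojure
;; Hypothetical stand-in for (.createNode inserter props): returns a new id.
(def next-id (atom 0))
(defn fake-create-node [_props] (swap! next-id inc))

;; The atom holds an ordinary persistent map, never a transient.
(def id-map (atom {}))

(defn insert-resource-node!
  [create-node res]
  (if-let [id (get @id-map res)]
    ;; Already created: reuse the remembered id.
    id
    ;; Otherwise create the node and remember its id.
    ;; (swap! id-map assoc res id) stores (assoc @id-map res id).
    (let [id (create-node {"resource" res})]
      (swap! id-map assoc res id)
      id)))

(insert-resource-node! fake-create-node "dbpedia:Berlin") ; creates id 1
(insert-resource-node! fake-create-node "dbpedia:Berlin") ; reuses id 1
(insert-resource-node! fake-create-node "dbpedia:Paris")  ; creates id 2
@id-map ; => {"dbpedia:Berlin" 1, "dbpedia:Paris" 2}
```

The separate get and swap! are fine for the single-threaded doseq loop in -main; if several threads inserted concurrently, the check and the update would need to happen inside one swap!.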

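I'm still not certain why you are having issues with .createNode on the inserter. If you are on 2.0.0.*, you may need to pass a labels array: in 2.0 the BatchInserter's createNode takes a Label... varargs parameter, and Clojure interop has to pass Java varargs as an explicit array. You can use this code to create it and pass it: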

(into-array org.neo4j.graphdb.Label labels)


Make sure your imports include everything you need. Mine look like this (in the ns declaration):
  (:import (org.neo4j.graphdb NotFoundException
                              NotInTransactionException
                              RelationshipType
                              DynamicLabel
                              Label)
           (org.neo4j.kernel EmbeddedGraphDatabase
                             AbstractGraphDatabase)
           (org.neo4j.unsafe.batchinsert BatchInserter
                                         BatchInserters
                                         BatchInserterIndexProvider
                                         BatchInserterIndex)
           (org.neo4j.index.lucene.unsafe.batchinsert LuceneBatchInserterIndexProvider)))


For reference, my create-node fn calls it like this:
(.createNode db
             (-convert-props node-properties)
             (into-array org.neo4j.graphdb.Label labels))

It's part of a larger series of fns that return a promise for each queued node and only create a relationship once all of those promises have been fulfilled; otherwise I would post more code. I've been meaning to write a simple tutorial, though.
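Since that code isn't posted, here is only a rough, hypothetical illustration of the promise-per-node pattern described: each resource gets a promise that is delivered with its node id once the node exists, relationships are queued, and a relationship is created only when both endpoint promises are realized. All names (`node-promise!`, `node-created!`, `queue-relationship!`, `flush-relationships!`, `record-rel!`) are made up for the sketch, and `create-rel!` stands in for something like `connect-resource-nodes!`. It stays single-threaded on purpose, since the batch inserter is not thread-safe.

```clojure
;; Hypothetical sketch of the pattern, not the poster's actual code.
(def node-ids (atom {}))      ; resource -> promise of its node id
(def pending-rels (atom []))  ; queued [res-1 res-2 label] triples

(defn node-promise!
  "Returns the promise for res, registering a new one if needed."
  [res]
  (-> (swap! node-ids #(if (contains? % res) % (assoc % res (promise))))
      (get res)))

(defn node-created!
  "Delivers the node id for res once the node has been created."
  [res id]
  (deliver (node-promise! res) id))

(defn queue-relationship!
  [res-1 res-2 label]
  (node-promise! res-1)
  (node-promise! res-2)
  (swap! pending-rels conj [res-1 res-2 label]))

(defn flush-relationships!
  "Calls create-rel! for every queued relationship whose two endpoint
  promises have been delivered; the rest stay queued."
  [create-rel!]
  (let [rels    @pending-rels
        ready?  (fn [[r1 r2 _]]
                  (and (realized? (node-promise! r1))
                       (realized? (node-promise! r2))))
        ready   (filter ready? rels)
        waiting (remove ready? rels)]
    (doseq [[r1 r2 label] ready]
      (create-rel! @(node-promise! r1) @(node-promise! r2) label))
    (reset! pending-rels (vec waiting))))

;; Usage with a recording stand-in for the real relationship insert.
(def created (atom []))
(defn record-rel! [id-1 id-2 label] (swap! created conj [id-1 id-2 label]))

(queue-relationship! "dbpedia:Berlin" "dbpedia:Germany" "capitalOf")
(node-created! "dbpedia:Berlin" 1)
(flush-relationships! record-rel!) ; Germany has no id yet, nothing created
(node-created! "dbpedia:Germany" 2)
(flush-relationships! record-rel!)
@created ; => [[1 2 "capitalOf"]]
```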

Neocons is a great library, but it uses the REST API, which may be too slow or may not meet your requirements.

Let me know how it goes.

Best,
--Joseph

Himakshi Mangal

Dec 6, 2013, 1:56:40 AM
to clo...@googlegroups.com
Hi Joseph Guhlin,

Thanks, your idea helped, and I was able to send some sample data to my Neo4j database.

Thank you very much... :)

Joseph Guhlin

Dec 6, 2013, 2:45:26 PM
to clo...@googlegroups.com
Glad it worked! If you have any further questions, feel free to ask. I'm using Neo4j extensively, and it and Clojure seem to be a perfect match these days, especially on very large datasets.

--Joseph

Himakshi Mangal

Dec 10, 2013, 12:45:55 AM
to clo...@googlegroups.com
Hi,

I have a query regarding Clojure and Neo4j.

I started the program to load the dataset and allocated 4 GB of RAM to it.

But after processing around 272,000 records, the process hangs and nothing happens. If I then stop the process and copy the data into the Neo4j folder, I get this error:
 'neostore' does not contain a store version, please ensure that the original database was shut down in a clean state.


Can you please advise what can be done about that and how I should proceed?

Many Thanks,

Kind Regards
Himakshi

