Bulk upsert / upserting in general


Martin Klepsch

May 10, 2014, 6:57:13 AM
to clojure-el...@googlegroups.com
Hello,

I'm parsing a large number (around 1M) of small XML snippets into maps that I want to index with Elastisch.
Right now I parse them in chunks and insert them 2000 at a time using bulk operations.

The problem I've now noticed is that this bulk insertion doesn't update an existing document
with a matching ID. After looking into the Elasticsearch docs, I understand
that bulk upserts are generally possible, but I'd have to construct them on my own, as there
are no fns in Elastisch to generate bulk-upsert operations.

I then saw that the native client provides an upsert fn for single documents
(clojurewerkz.elastisch.native.document/upsert). On a side note: is there a reason
why this is not implemented for the rest client?

Given the amount of data I want to index (ideally as fast as possible) I'm not sure
which route to take.

Using the native client (with its higher throughput), could upserting single documents work just fine?

Or would I be better off generating a bulk upsert request?

Thanks for any help, I hope my problem description makes sense :)

Martin Klepsch

May 12, 2014, 5:43:21 AM
to clojure-el...@googlegroups.com
I figured out how to do it after some helpful hints from karmi on IRC (#elasticsearch).

The following code can generate operations that can be executed with Elastisch's bulk function.

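;; Keys that belong in the bulk action line rather than in the document body.
(def ^:private special-operation-keys
  [:_index :_type :_id :_routing :_percolate :_parent :_timestamp :_ttl])

;; Builds the action line of an "update" operation from the document's metadata keys.
(defn upsert-operation [doc]
  {"update" (select-keys doc special-operation-keys)})

;; Builds the update payload; :doc_as_upsert makes Elasticsearch insert the
;; document when no document with that ID exists yet.
(defn upsert-document [doc]
  {:doc (dissoc doc :_index :_type)
   :doc_as_upsert true})

(defn bulk-upsert
  "Generates the content for a bulk upsert operation: each action line
  is followed by its update payload."
  ([documents]
     (let [operations (map upsert-operation documents)
           documents  (map upsert-document documents)]
       (interleave operations documents))))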
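
As a rough illustration of what these functions produce, here is a self-contained sketch that restates the helpers (with the name upsert-document used consistently) and runs them on two made-up documents. The index name "articles", the type "article", and the :title field are assumptions for the example only. The resulting sequence is what would be handed to Elastisch's bulk function; that call is left out here because its exact signature depends on the Elastisch version.

```clojure
;; Restated helpers so this snippet runs on its own in a REPL.
(def special-operation-keys
  [:_index :_type :_id :_routing :_percolate :_parent :_timestamp :_ttl])

(defn upsert-operation [doc]
  {"update" (select-keys doc special-operation-keys)})

(defn upsert-document [doc]
  {:doc (dissoc doc :_index :_type)
   :doc_as_upsert true})

(defn bulk-upsert [documents]
  (interleave (map upsert-operation documents)
              (map upsert-document documents)))

;; Hypothetical sample data, not taken from the thread.
(def sample-docs
  [{:_index "articles" :_type "article" :_id "1" :title "First"}
   {:_index "articles" :_type "article" :_id "2" :title "Second"}])

;; Each document yields two entries: an action line, then its payload.
(def ops (vec (bulk-upsert sample-docs)))

;; (first ops)  ; the action line for document "1"
;; (second ops) ; the payload for document "1"
```

Note that only :_index and :_type are removed from the payload, so :_id stays inside :doc as well as in the action line.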

I wonder if it would make sense to include parts of this (bulk-update potentially?) in Elastisch?

Cheers!

David Smith

Feb 5, 2015, 5:10:20 AM
to clojure-el...@googlegroups.com
Hi Martin, I'm about to copy and paste your code. Why don't you open a PR for this?

Martin Klepsch

Feb 5, 2015, 12:33:03 PM
to clojure-el...@googlegroups.com
I'm not using Elasticsearch anymore, so testing a PR etc. is out of my reach.
Feel free to take the code and use it to open a PR yourself :-)

Great to hear this has been helpful to someone :)