partial or atomic updates in Solr 4.0-BETA

Aron

Sep 26, 2012, 2:32:22 PM
to sol...@googlegroups.com
I've been using solrpy with Solr 3.6.1, and it's been working great. I've recently upgraded to Solr 4.0-BETA because it has some new features I need. In particular, it has the ability to do partial or atomic updates (i.e., the ability to add, remove, change, and increment fields of an existing document without having to send in the complete document again).

For instance, suppose I have the following schema:

 <fields>
 <field name="id" type="string" indexed="true" stored="true" required="true" />
 <field name="name" type="text_general" indexed="true" stored="true"/>
 <field name="price" type="float" indexed="true" stored="true" />
 <field name="description" type="text_general" indexed="true" stored="true" />
 <field name="_version_" type="long" indexed="true" stored="true"/>
 </fields>


And, suppose I have a single document with ID=1 in this index that is returned with a *:* query.

In Solr 4, you can update a single field (e.g., the price of this single document) like this:

curl 'localhost:8983/solr/update?commit=true' -H 'Content-type:application/json' -d '[{"id":"1","price":{"set":100}}]'

In Solr 4, this updates only the price of the document with ID=1 and leaves the remaining fields intact. (Under the hood, Solr 4 saves the other fields, deletes the document, and re-indexes the entire document with all of its fields, not just the id and price.) In Solr 3, on the other hand, the same request deletes the document with ID=1 and adds another entry with only two fields set (ID=1 and PRICE=100); that is, the remaining fields (e.g., name and description) that were set when the document was first indexed are lost.
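For anyone who wants to try the same request from Python without shelling out to curl, here is a minimal sketch using only the standard library. The URL is taken from the curl command above. The helper names (build_atomic_update, build_request) are made up for illustration and are not part of solrpy.

```python
import json
import urllib.request

# Assumed endpoint, copied from the curl example in this post.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update?commit=true"


def build_atomic_update(doc_id, **modifiers):
    """Build the JSON body for an atomic update of one document.

    Each keyword maps a field name to an (operation, value) pair,
    e.g. price=("set", 100).
    """
    doc = {"id": doc_id}
    for field, (op, value) in modifiers.items():
        doc[field] = {op: value}
    return json.dumps([doc]).encode("utf-8")


def build_request(body, url=SOLR_UPDATE_URL):
    """Wrap the body in a POST request with the JSON content type."""
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


body = build_atomic_update("1", price=("set", 100))
request = build_request(body)

# To actually send it to a running Solr 4 instance, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

The request is only built here, not sent, so the snippet runs without a Solr server.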

The problem is that when I use solrpy with Solr 4.0-BETA and call the add function (instead of calling curl as shown above), I get the Solr 3 behavior, even though I'm using Solr 4. In other words, add(id=1, price=100) doesn't update just the price field. Rather, it overwrites the document with only the ID and PRICE fields set (the other fields, like name and description, are deleted).

Does anyone have any suggestions on how to do the partial updates available in Solr 4 using solrpy?

Thanks.



Joel Nothman

Oct 27, 2012, 9:18:41 PM
to sol...@googlegroups.com
Hi Aron,

I'd like this feature too. It is not currently available in solrpy, but I'd be happy to implement it -- at least in a solrpy fork -- if:
  • we can be assured that the 4.0-BETA interfaces are stable enough.
  • we can agree on how we would like it to look from a Python perspective.
On the first point:

The feature is at present poorly documented. At the moment, solrpy only sends update instructions in XML, for which the syntax looks like:

<field name="price" update="set">300</field>

I'm not really a big fan of this syntax, or the JSON, because it makes things very verbose if you want to add a single field to thousands of documents. I also don't understand why the distinction between an 'insert' and 'update' operation is made implicitly by providing information on fields.
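To make the XML form concrete, here is a rough sketch that builds a complete update message around a field element like the one above. The function name build_xml_update is hypothetical, and this is not how solrpy generates its XML internally. It only shows the shape of the message, assuming the usual <add><doc> wrapper.

```python
import xml.etree.ElementTree as ET


def build_xml_update(doc_id, updates):
    """Return an <add> message that applies atomic updates to one document.

    `updates` maps field name -> (operation, value),
    e.g. {"price": ("set", 300)}.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    # The unique key is sent as a plain field so Solr knows which document to modify.
    id_field = ET.SubElement(doc, "field", name="id")
    id_field.text = str(doc_id)
    for name, (op, value) in updates.items():
        # The update attribute carries the operation (set, add, inc).
        field = ET.SubElement(doc, "field", name=name, update=op)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


xml_message = build_xml_update(1, {"price": ("set", 300)})
print(xml_message)
```

Note that every updated field repeats both its name and its operation, which is where the verbosity comes from when the same field is updated on thousands of documents.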

I also don't like the JSON syntax because [{"id":"1","price":{"set":100}}] makes it appear as if you could simultaneously set, add and inc a field by writing [{"id":"1","price":{"set":100, "add":10, "inc": 1}}]. Nonsense! Semantically, we're trying to say [{"id":"1","price":{"op": "set", "val": 100}}], but that's also horrendously verbose.

On the second point:

Obviously, conn.add(id=1, price=100) doesn't provide sufficient information for solrpy to send this as a 'set' operation. There are a few ways we could make this look:

  1. conn.add(id=1, price__set=100)
  2. conn.add(id=1, price={'set': 100})
  3. conn.add(id=1, __set={'price': 100})
  4. conn.updater(price='set').add(id=1, price=100)
The first is Django-inspired, and so will be familiar to many Python devs. The second is close to the JSON syntax, meaning that if solrpy moves to JSON update requests, it'll be trivial to pass it on to the server. The third has the advantage of being succinct when there are multiple fields being updated, because you can say __set={'price': 100, 'name': "Better name", 'description': "Better description"}, etc.

The last is my favourite as a programmatic interface, because it is clear semantically and robust to a change in the Solr syntax, especially if better accommodating batch updates.
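To illustrate the shape of option 4, here is a small self-contained sketch. The Connection class is a stand-in that just records what would be sent, and both classes are hypothetical. This is not solrpy code, only one way the interface could behave, shown with the JSON-style payload for brevity.

```python
class Updater:
    """Hypothetical helper: remembers which operation applies to which field."""

    # The "/" keeps self and send positional-only, so a Solr field
    # that happens to be called "send" or "self" cannot clash with them.
    def __init__(self, send, /, **field_ops):
        self._send = send            # callable that receives a list of docs
        self._field_ops = field_ops  # e.g. {"price": "set"}

    def add(self, /, **fields):
        doc = {}
        for name, value in fields.items():
            op = self._field_ops.get(name)
            # Fields without a declared operation (such as the id) pass through unchanged.
            doc[name] = {op: value} if op else value
        return self._send([doc])


class Connection:
    """Stand-in for a real connection; it only records outgoing payloads."""

    def __init__(self):
        self.sent = []

    def updater(self, /, **field_ops):
        return Updater(self.sent.append, **field_ops)


conn = Connection()
conn.updater(price="set").add(id=1, price=100)
print(conn.sent)
```

Because the operation is declared once on the updater rather than per document, the same updater can be reused for a whole batch, and only the wire format inside add would need to change if the Solr syntax does.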

Thoughts?

Cheers,

- Joel

Joel Nothman

Oct 27, 2012, 10:57:26 PM
to sol...@googlegroups.com
On Sun, 28 Oct 2012 12:18:41 +1100, Joel Nothman <joel.n...@gmail.com>
wrote:

> 4. conn.updater(price='set').add(id=1, price=100)

I've implemented this approach in:
http://code.google.com/r/joelnothman-solrpy/source/list?name=features%2Fpartialupdate

~J

Artashes Aghajanyan

Jul 30, 2013, 2:28:46 PM
to sol...@googlegroups.com, jnot...@student.usyd.edu.au