Updating a document (with a new facet value)


Thibaut

Nov 29, 2011, 2:15:38 PM
to zoie
An example will probably be easier:

Say, for example, I have a user and I want to store which schools he went
to (for faceted search).

I would use a multi-valued facet field in my user document, called for
example user_schools, that stores the school names.

Now I can create that document fine, but I'm not sure what the proper
approach is for updating the facet values: say at some point I want
to add (or remove) a school for that existing user.

Do I have to:
- find the existing doc, update the user_schools values, and save the
document, or
- can I just send a save request with the doc's unique id and only the new
value for that facet field (since in theory, if the id already exists,
it should just update it)? ... but how would that work for
removing a facet value?

Are there any examples of updating a Zoie document, in particular its facet values?

Thanks.

Lei Wang

Nov 29, 2011, 2:23:10 PM
to zo...@googlegroups.com
For now, you have to find the doc first, update it with the new values, and save it back to Zoie.

Partial updates are not supported yet.
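A minimal sketch of that read-modify-write cycle for the user_schools example, assuming a plain in-memory map stands in for the index and that saving a document under an existing uid replaces it entirely. UserDoc, FakeIndex, addSchool and removeSchool are hypothetical names for illustration, not Zoie API; with Zoie the save step would be a full replacement ZoieIndexable consumed as a new data event. Removal falls out naturally: the replacement document simply omits the value.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins for illustration only; not part of the Zoie API.
class FacetUpdateSketch {

    // A user document with a multi-valued facet field (user_schools).
    static final class UserDoc {
        final long uid;
        final Set<String> userSchools;

        UserDoc(long uid, Set<String> userSchools) {
            this.uid = uid;
            this.userSchools = new LinkedHashSet<>(userSchools); // defensive copy
        }
    }

    // Stand-in for the index: saving a doc under an existing uid replaces the whole doc.
    static final class FakeIndex {
        private final Map<Long, UserDoc> docs = new HashMap<>();

        void save(UserDoc doc) { docs.put(doc.uid, doc); }

        UserDoc find(long uid) {
            UserDoc doc = docs.get(uid);
            if (doc == null) throw new IllegalArgumentException("no doc with uid " + uid);
            return doc;
        }
    }

    // Read-modify-write: fetch the whole doc, change the facet values, save the whole doc back.
    static void addSchool(FakeIndex index, long uid, String school) {
        Set<String> schools = new LinkedHashSet<>(index.find(uid).userSchools);
        schools.add(school);
        index.save(new UserDoc(uid, schools));
    }

    // Removal is the same cycle: the replacement doc simply omits the value.
    static void removeSchool(FakeIndex index, long uid, String school) {
        Set<String> schools = new LinkedHashSet<>(index.find(uid).userSchools);
        schools.remove(school);
        index.save(new UserDoc(uid, schools));
    }

    public static void main(String[] args) {
        FakeIndex index = new FakeIndex();
        index.save(new UserDoc(42L, Set.of("MIT")));
        addSchool(index, 42L, "Stanford");
        removeSchool(index, 42L, "MIT");
        System.out.println(index.find(42L).userSchools); // prints [Stanford]
    }
}
```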

- Lei



Thibaut Colar

Nov 29, 2011, 4:46:15 PM
to zo...@googlegroups.com
OK, that's what I'm doing, but here are two more questions regarding that:

- When I read the document via a BrowseResult and add some fields, can I save that updated document directly, or do I have to copy all the fields again into a ZoieIndexable object to be able to save it (and create a consumable event from it)?

- It seems that if the doc was just created before I try to update it, the search might fail to find it unless I call indexingSystem.refreshDiskReader() and indexingSystem.flushEventsToMemoryIndex(). Of course, doing it that way makes it much, much slower. Am I missing something?

Thanks

Lei Wang

Nov 29, 2011, 5:04:05 PM
to zo...@googlegroups.com
BrowseResult is only for searching; you have to create ZoieIndexable objects for indexing.

Also, indexing is asynchronous, so it takes a while for new data to show up in your search results. If you want synchronous indexing, call syncWithVersion after consuming your data.
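To illustrate what "asynchronous" means here, a small self-contained model: consume() only enqueues an event, a background thread makes it searchable later, and syncWithVersion(version, timeout) blocks until that version has been indexed or the timeout expires. AsyncIndexerSketch and its methods are hypothetical and only mimic the behavior described above; they are not Zoie's implementation, and versions are simplified to longs.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical model of asynchronous indexing with a version barrier; not Zoie's implementation.
class AsyncIndexerSketch {

    // An indexing event: document uid, payload, and its (increasing) version.
    static final class Event {
        final long uid;
        final String body;
        final long version;

        Event(long uid, String body, long version) {
            this.uid = uid;
            this.body = body;
            this.version = version;
        }
    }

    private final BlockingQueue<Event> queue = new LinkedBlockingQueue<>();
    private final Map<Long, String> searchable = new ConcurrentHashMap<>();
    private final Object lock = new Object();
    private long indexedVersion = -1; // highest version indexed so far, guarded by lock

    AsyncIndexerSketch() {
        Thread worker = new Thread(this::runLoop, "indexer");
        worker.setDaemon(true); // let the JVM exit without stopping the worker
        worker.start();
    }

    // Only enqueues: the document is usually not searchable yet when this returns.
    void consume(Event e) { queue.add(e); }

    String search(long uid) { return searchable.get(uid); }

    // Blocks until an event with version >= target is indexed, or throws after timeoutMs.
    void syncWithVersion(long target, long timeoutMs) throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        synchronized (lock) {
            while (indexedVersion < target) {
                long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMs <= 0) {
                    throw new TimeoutException("version " + target + " not indexed in time");
                }
                lock.wait(remainingMs);
            }
        }
    }

    // Background loop: takes events in order, "indexes" them, then publishes the new version.
    private void runLoop() {
        try {
            while (true) {
                Event e = queue.take();
                Thread.sleep(10); // simulated indexing latency
                searchable.put(e.uid, e.body);
                synchronized (lock) {
                    indexedVersion = Math.max(indexedVersion, e.version);
                    lock.notifyAll();
                }
            }
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws Exception {
        AsyncIndexerSketch indexer = new AsyncIndexerSketch();
        indexer.consume(new Event(1L, "user 1", 1L));
        indexer.syncWithVersion(1L, 1000); // block until version 1 is indexed
        System.out.println(indexer.search(1L)); // prints user 1
    }
}
```

Waiting on a version rather than forcing flushes keeps the normal batching path intact: only the caller that needs read-after-write consistency pays the wait.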

- Lei

Thibaut

Dec 5, 2011, 6:15:04 PM
to zoie
Sorry if that's a silly question, but I'd like to make sure of the
proper way to use "version", in particular in relation to
syncWithVersion() and indexing documents.

- Is version meant to be incremented (+1) at each document modification?
- Is version meant to be incremented only when the document "model" has
changed (i.e., new fields)?
- Does ZoieSystem.getCurrentVersion() have anything to do with this?

In general, what does version refer to, and what is the suggested way
to maintain/use it?
If some doc explains this, let me know.

Thanks.

Lei Wang

Dec 5, 2011, 7:29:16 PM
to zo...@googlegroups.com
Versions are comparable, ever-increasing strings that you have to define in your own application.
They relate only to the indexing data events and have nothing to do with the indexed data itself.

For example, if you have three indexing events (d1, v1), (d2, v2), (d3, v3), you have to ensure v1 < v2 < v3.

And d3 can be an update event for d1, which overwrites the original d1 data.

If you call syncWithVersion(v3, timeout), the call will block until event 3 is indexed or the call times out.
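One practical consequence of versions being strings: if they are compared as plain strings, decimal counters do not sort numerically ("10" sorts before "9"), which would break v1 < v2 < v3. A minimal sketch, assuming lexicographic comparison and that fixed-width zero padding suits your application; VersionSketch, format and strictlyIncreasing are hypothetical helpers, and if your setup compares versions with a custom comparator this may not apply.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helpers showing why version strings need an order-preserving encoding.
class VersionSketch {

    // Fixed-width, zero-padded decimal (19 digits fits any non-negative long),
    // so comparing the strings gives the same order as comparing the numbers.
    static String format(long eventId) {
        if (eventId < 0) throw new IllegalArgumentException("event ids must be non-negative");
        return String.format(Locale.ROOT, "%019d", eventId); // ROOT: always ASCII digits
    }

    // True if each version string is strictly greater than the one before it.
    static boolean strictlyIncreasing(List<String> versions) {
        for (int i = 1; i < versions.size(); i++) {
            if (versions.get(i - 1).compareTo(versions.get(i)) >= 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> naive = List.of("8", "9", "10");
        List<String> padded = new ArrayList<>();
        for (long id : new long[] {8, 9, 10}) padded.add(format(id));
        System.out.println(strictlyIncreasing(naive));  // false: "10" sorts before "9"
        System.out.println(strictlyIncreasing(padded)); // true
    }
}
```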

- Lei

Thibaut Colar

Dec 5, 2011, 7:38:40 PM
to zo...@googlegroups.com
I see, so it's just an event id basically.

Makes sense and simple enough.

Is it fine to just start the event id at zero at application start, or would it be better to persist that number at shutdown and continue from there when restarting?

Thanks

Lei Wang

Dec 5, 2011, 8:10:06 PM
to zo...@googlegroups.com
You have to persist the number: it must keep increasing, even after you restart your system.
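A minimal sketch of a restart-safe counter, assuming a single writer process and a local file; the class name and file format are hypothetical. Each new version is written to a temporary file and renamed over the old one before it is handed out, so a restart resumes above the last issued value.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical restart-safe version counter backed by a small text file.
class PersistentVersionCounter {
    private final Path file;
    private long last; // last version handed out (and already persisted)

    // Resumes from the persisted value, or starts at 0 if the file does not exist yet.
    PersistentVersionCounter(Path file) throws IOException {
        this.file = file;
        this.last = Files.exists(file)
                ? Long.parseLong(new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim())
                : 0L;
    }

    // Persists the next version before returning it, so a restart can never reuse it.
    synchronized long next() throws IOException {
        long candidate = last + 1;
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        Files.write(tmp, Long.toString(candidate).getBytes(StandardCharsets.UTF_8));
        // Rename over the old file so readers never see a half-written value
        // (atomic on filesystems that support atomic rename).
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
        last = candidate;
        return candidate;
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempDirectory("zoie-version").resolve("version.txt");
        PersistentVersionCounter first = new PersistentVersionCounter(f);
        first.next();
        first.next();
        PersistentVersionCounter afterRestart = new PersistentVersionCounter(f);
        System.out.println(afterRestart.next()); // prints 3
    }
}
```

Writing the file for every event is simple but slow; a common variation is to reserve a block of ids at a time (persist last + 1000, hand out ids from memory) and accept gaps after a crash, which still keeps versions increasing.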

- Lei

Thibaut

Dec 7, 2011, 7:58:37 PM
to zoie
OK, got all that working.

Now one more question:
You said: "BrowseResult is only for searching, you have to create
ZoieIndexable objects for indexing."

Is there an existing facility to create a ZoieIndexable from a
BrowseResult, or do I have to roll my own? (Or alternatively, can I
find/get a ZoieIndexable directly by its id rather than having to
deal with the BrowseResult?)

Thanks.


Lei Wang

Dec 7, 2011, 8:28:22 PM
to zo...@googlegroups.com
You have to write your own; we do not have one right now.

And you cannot get a ZoieIndexable directly from its id. You have to build it from other data sources (a BrowseResult, for example, if your BrowseResult has enough information).
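A rough sketch of the "roll your own" conversion, assuming you can pull a hit's field values into a Map<String, String[]> (how you get them depends on your bobo setup and which fields are stored). IndexableDoc and fromHit are hypothetical; the real target would be your own ZoieIndexable implementation. The required-field check matters because a rebuilt document replaces the old one, so any field the hit does not carry would be silently lost.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical conversion from a search hit's field values to a re-indexable document.
class HitToIndexableSketch {

    // Illustrative stand-in for your own ZoieIndexable implementation: uid plus multi-valued fields.
    static final class IndexableDoc {
        final long uid;
        final Map<String, List<String>> fields;

        IndexableDoc(long uid, Map<String, List<String>> fields) {
            this.uid = uid;
            this.fields = fields;
        }
    }

    // Copies every field value from the hit into mutable lists (so facet values can be edited),
    // and refuses to build a doc if a field needed for re-indexing is missing from the hit.
    static IndexableDoc fromHit(long uid, Map<String, String[]> hitFields, List<String> requiredFields) {
        Map<String, List<String>> copy = new LinkedHashMap<>();
        for (Map.Entry<String, String[]> e : hitFields.entrySet()) {
            copy.put(e.getKey(), new ArrayList<>(Arrays.asList(e.getValue())));
        }
        for (String name : requiredFields) {
            if (!copy.containsKey(name)) {
                throw new IllegalStateException("hit lacks field '" + name + "'; cannot rebuild doc " + uid);
            }
        }
        return new IndexableDoc(uid, copy);
    }

    public static void main(String[] args) {
        Map<String, String[]> hit = new LinkedHashMap<>();
        hit.put("user_schools", new String[] {"MIT"});
        IndexableDoc doc = fromHit(42L, hit, List.of("user_schools"));
        doc.fields.get("user_schools").add("Stanford"); // edit, then hand the doc to your indexer
        System.out.println(doc.fields); // prints {user_schools=[MIT, Stanford]}
    }
}
```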

- Lei

Thibaut

Dec 7, 2011, 8:48:38 PM
to zoie
OK.

It's not a problem, but I'm a bit surprised: don't you guys ever have
to update a document like I want to do here?

Or do you actually do this with your own implementations, which
just aren't part of the OSS project (zoie/bobo)?

Thanks.

Lei Wang

Dec 7, 2011, 8:54:59 PM
to zo...@googlegroups.com
We use Sensei, which is built on top of bobo and Zoie, and you can get the original document by its id from Sensei.

Also, we are planning to add partial update support to Sensei in a few weeks. Maybe you can take a look at Sensei and see if it works for your project.

- Lei

Andy

Jan 20, 2012, 3:26:53 AM
to zoie
>
> also, we are planing to add partial updates support in a few weeks to
> sensei. may be you can take a look at sensei. see if it works for your
> project.

Is partial update included in Sensei 1.0?

Also can you explain a bit more how that feature works? My documents
have an integer field "vote_count" that holds the number of votes each
document receives. I use it to rank search results. "vote_count" is
being updated constantly. Can I use partial update to just update the
value of "vote_count" without reindexing the entire document?

Lei Wang

Jan 20, 2012, 1:34:33 PM
to zo...@googlegroups.com
It's now in the latest Sensei: you can update one field by setting the data event type to "update", for example {"id": 123, "_type": "update", "vote_count": 321}.

But we have not finalized the implementation; it may change in the future.

- Lei

Andy

Jan 22, 2012, 3:55:16 AM
to zoie

> It's now in latest sensei, you can update one field by giving the data
> event type to "update". {"id": 123, "_type": "update", "vote_count": 321}
> for example.

What's the syntax for updating multiple fields? Say I have another
field "tags" which is used for faceting, and I want to update both
vote_count and tags. Is it something like:

{"id": 123, "_type": "update", "vote_count": 321, "tags": "java, python"}

And in this case the actual document itself would not be re-indexed,
correct?

> but we have not finalized the implementation, it may change someday in the
> future.

Would you recommend using partial update then? Does it offer better
performance than replacing the entire document?

Thanks

John Wang

Jan 22, 2012, 11:40:50 AM
to zo...@googlegroups.com
Hi Andy:

     The partial update implementation is essentially a get(uid), followed by updating the changed fields, then a delete and re-add of the whole document.

     So performance-wise there is no benefit. 

     It is for convenience.
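To make the performance point concrete, a minimal model of update-by-merge as described above (get by uid, overwrite the changed fields, delete, re-add), using the event shape quoted earlier in this thread and treating documents as plain maps. PartialUpdateSketch is hypothetical, not Sensei's code. Even a one-field update ends with the whole merged document being indexed again, which is why it buys convenience rather than speed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of "partial update" as get(uid) + merge + delete + re-add; not Sensei's code.
class PartialUpdateSketch {
    private final Map<Long, Map<String, Object>> store = new HashMap<>();
    int fullReindexCount = 0; // number of whole documents written to the "index"

    void add(long uid, Map<String, Object> doc) {
        store.put(uid, new HashMap<>(doc));
        fullReindexCount++;
    }

    void delete(long uid) { store.remove(uid); }

    Map<String, Object> get(long uid) { return store.get(uid); }

    // Applies an event such as {"id": 123, "_type": "update", "vote_count": 321}.
    void applyUpdate(Map<String, Object> event) {
        long uid = ((Number) event.get("id")).longValue();
        Map<String, Object> current = get(uid);
        if (current == null) throw new IllegalArgumentException("no document with id " + uid);
        Map<String, Object> merged = new HashMap<>(current);
        for (Map.Entry<String, Object> e : event.entrySet()) {
            String key = e.getKey();
            // Copy only the payload fields; "id" and "_type" describe the event itself.
            if (!key.equals("id") && !key.equals("_type")) merged.put(key, e.getValue());
        }
        delete(uid);
        add(uid, merged); // the whole merged document is indexed again
    }

    public static void main(String[] args) {
        PartialUpdateSketch index = new PartialUpdateSketch();
        Map<String, Object> doc = new HashMap<>();
        doc.put("title", "hello");
        doc.put("vote_count", 5);
        index.add(123L, doc);

        Map<String, Object> event = new HashMap<>();
        event.put("id", 123);
        event.put("_type", "update");
        event.put("vote_count", 321);
        index.applyUpdate(event);

        System.out.println(index.get(123L).get("vote_count")); // prints 321
        System.out.println(index.fullReindexCount);             // prints 2
    }
}
```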

-John

Andy

Jan 23, 2012, 9:53:52 AM
to zoie
Hi John,

Is there any plan to implement true partial updates in Sensei, where
only the specified fields are changed without re-indexing the entire
document? It'd be very useful for frequently updated fields such as
"vote_count".

IndexTank FAQ mentions "document variables can be updated without
having to re-index the whole document and they can be updated very
quickly and at a very rapid pace without affecting the index
performance." Any plan to incorporate that functionality into Sensei?

John Wang

Jan 23, 2012, 10:56:20 AM
to zo...@googlegroups.com
Hi Andy:

     True partial updates to an inverted index (which is the persistence layer for Sensei) are very difficult.

     The IndexTank developers on this mailing list can elaborate on or correct this:

     What IndexTank has is something called variables, which live in memory, outside the underlying index. The idea is that for things like voting, high durability is not necessary, so giving it up in exchange for update performance is a good trade-off.

      We intend to incorporate a similar idea in Sensei: instead of voting, we are planning to implement an activity index that lives alongside the Sensei index, at the node level. The call would be something like updateActivityCount, and the counts would then be factored into faceting and relevance calculation.

       There is no hard date for this yet, but we plan to work on this soon.
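The general idea can be sketched as a side store, assuming counts live in an in-memory ConcurrentHashMap and are combined with the text relevance score at query time. This only illustrates the trade-off described above (fast, non-durable updates that never touch the inverted index); updateActivityCount here is a hypothetical method echoing the name floated in this thread, not a Sensei API, and the log1p boost is an arbitrary example formula. Counts are lost on restart unless you persist them elsewhere.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical in-memory side store for fast-changing counts (e.g. votes), kept outside the index.
class ActivityScoreSketch {
    private final Map<Long, LongAdder> counts = new ConcurrentHashMap<>();

    // Cheap, non-durable update: the inverted index is never touched.
    void updateActivityCount(long uid, long delta) {
        counts.computeIfAbsent(uid, k -> new LongAdder()).add(delta);
    }

    long activityCount(long uid) {
        LongAdder a = counts.get(uid);
        return a == null ? 0L : a.sum();
    }

    // A search hit with its text relevance score from the index.
    static final class Hit {
        final long uid;
        final double textScore;

        Hit(long uid, double textScore) {
            this.uid = uid;
            this.textScore = textScore;
        }
    }

    // Combined score; the log1p boost is an arbitrary example, and negative totals are clamped to 0.
    double score(Hit h) {
        return h.textScore * (1.0 + Math.log1p(Math.max(0L, activityCount(h.uid))));
    }

    // Re-ranks hits at query time using the current in-memory counts, highest score first.
    List<Hit> rank(List<Hit> hits) {
        List<Hit> ranked = new ArrayList<>(hits);
        ranked.sort(Comparator.comparingDouble((Hit h) -> score(h)).reversed());
        return ranked;
    }

    public static void main(String[] args) {
        ActivityScoreSketch activity = new ActivityScoreSketch();
        activity.updateActivityCount(2L, 100);
        List<Hit> ranked = activity.rank(List.of(new Hit(1L, 1.0), new Hit(2L, 0.9)));
        System.out.println(ranked.get(0).uid); // prints 2
    }
}
```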

-John

chen jun

Jan 28, 2012, 9:43:44 PM
to zo...@googlegroups.com
Hi John Wang:
     On this first workday after the Spring Festival, I'm so excited to hear that Sensei will bring updateActivityCount to the node level. Some developers have implemented this outside the Sensei index (e.g., using Redis to store the counts and fetching them by id when sorting), but that is still not as convenient as having it at the Sensei node level.
      Thanks to John Wang and the Sensei development team; it will help a lot!

2012/1/23 John Wang <john...@gmail.com>