View update daemon not working with updateInterval

Lasse Schou

May 2, 2013, 6:05:10 AM
to couc...@googlegroups.com
I'm testing Couchbase Server 2.0 for a use case that heavily depends on the views, and in particular the freshness of the views. I've been experimenting with the updateInterval and updateMinChanges parameters. It wasn't clear from the docs whether EITHER of the two criteria had to be met before the view update daemon would trigger an update, or BOTH of them. Couchbase replied that the update daemon triggers an update when the first of the two criteria is met (every X seconds OR after Y document changes).

However, that's not the behavior I see. I can make updateMinChanges work fine, but the updateInterval parameter doesn't seem to have any effect. The views aren't updated automatically.

Have you seen this before? And do you know how to check the status of the daemon, and perhaps the logs?

Thanks,

Lasse Schou

Aliaksey Kandratsenka

May 2, 2013, 2:17:10 PM
to couc...@googlegroups.com
Looks like there was a misunderstanding.

Here's how it works internally.

Every updateInterval milliseconds, the daemon checks whether the index file is more than updateMinChanges changes behind the .couch files (which are themselves behind the in-memory source of truth, potentially by tens of seconds). If it is, it triggers a view update.
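A minimal Python sketch of the rule Aliaksey describes, assuming the daemon simply compares two sequence numbers on each timer tick. The names IndexState, indexed_seq, persisted_seq, should_update, and simulate are hypothetical and are not Couchbase internals. The point it illustrates: updateInterval only sets how often the check runs, and the updateMinChanges threshold must still be exceeded for anything to happen.

```python
from dataclasses import dataclass


@dataclass
class IndexState:
    # Hypothetical counters: the last sequence number reflected in the view
    # index, and the last sequence number persisted to the .couch files.
    indexed_seq: int
    persisted_seq: int


def should_update(state: IndexState, update_min_changes: int) -> bool:
    """The check that runs once per updateInterval, as described above."""
    backlog = state.persisted_seq - state.indexed_seq
    # Fires only if the index lags by MORE than updateMinChanges;
    # the interval by itself never forces an update.
    return backlog > update_min_changes


def simulate(ticks: list[int], update_min_changes: int) -> list[int]:
    """Each entry in ticks is the persisted seq seen at one timer tick.
    Returns the tick indices at which a view update would be triggered."""
    state = IndexState(indexed_seq=0, persisted_seq=0)
    fired = []
    for i, persisted in enumerate(ticks):
        state.persisted_seq = persisted
        if should_update(state, update_min_changes):
            state.indexed_seq = state.persisted_seq  # index catches up
            fired.append(i)
    return fired
```

With a threshold of 5000, simulate([1, 1, 1], 5000) triggers no update however many ticks pass, which matches the behavior Lasse reported: a single changed document never gets indexed by the timer alone.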

Sorry for the confusion. Hope that helps.

Lasse Schou

May 2, 2013, 4:15:35 PM
to couc...@googlegroups.com
Ok, this is great news, because that's exactly the behavior I've seen, as opposed to the information I got from Perry Krug. He said:

"To answer your question, those two metrics trigger an automatic index update on whichever is met first. So if you have it set to 5 seconds and only one document is changed, it will happen on the 5 seconds. If 5000 changes happen before that 5 seconds is up, it will also be triggered."

What Perry explained seemed more intuitive, but not what I saw when testing.
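For comparison with the behavior Aliaksey described, here is a sketch of the "whichever is met first" semantics in Perry's explanation, which is not how the 2.0 daemon turned out to work. It is a hypothetical model: the timer is assumed to run on a fixed schedule and to fire only when changes are pending, and the change-count trigger fires as soon as min_changes changes have accumulated.

```python
def updates_whichever_first(change_times: list[float], interval: float,
                            min_changes: int, horizon: float) -> list[float]:
    """Hypothetical OR semantics: update when the interval elapses with any
    pending changes, or immediately once min_changes changes are pending.
    Returns the times at which updates would fire, up to horizon."""
    fired = []
    pending = 0
    next_tick = interval
    for t in sorted(change_times):
        # Handle timer ticks that happen before this change arrives.
        while next_tick <= t:
            if pending:
                fired.append(next_tick)
                pending = 0
            next_tick += interval
        pending += 1
        if pending >= min_changes:  # count threshold reached first
            fired.append(t)
            pending = 0
    # Remaining timer ticks after the last change.
    while next_tick <= horizon:
        if pending:
            fired.append(next_tick)
            pending = 0
        next_tick += interval
    return fired
```

Under this model, updates_whichever_first([1.0], 5.0, 5000, 10.0) fires at 5.0, which is Perry's "it will happen on the 5 seconds" case. That is exactly the update Lasse never observed in testing.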

Would be great if this could be added to the documentation.

Bibhas B

Mar 17, 2014, 9:18:04 AM
to couc...@googlegroups.com
FYI, the 2.5 documentation seems to have been fixed. It correctly describes the updateInterval parameter.

Lasse Schou

Mar 17, 2014, 9:23:52 AM
to couc...@googlegroups.com
Cool, thanks.

Jason Fill

Mar 27, 2014, 5:36:45 PM
to couc...@googlegroups.com
Lasse,

I was interested in what you came up with here.  We are in a situation where we really need to be able to read the updated data from the views almost instantly and are trying to figure out best practices.  I posted more details over on the Couchbase site (http://www.couchbase.com/communities/q-and-a/getting-most-updatedreliable-data-back-view) on some options we are considering - but really curious to see how you handled it.

Any info you (or anyone) can provide would be greatly appreciated.

Thanks,

Jason

Lasse Schou

Mar 28, 2014, 5:06:34 AM
to couc...@googlegroups.com
Hi,

After reading your use case I would consider using stale=false. It's a synchronous call that takes a little while longer, but the performance hit on the cluster shouldn't be that bad. Of course this depends on how often you call the view.
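A minimal sketch of what a stale=false request looks like against the Couchbase 2.x view REST endpoint, which listens on port 8092 by default. The host, bucket, design document, and view names used here are placeholders, and view_url and query_view_fresh are hypothetical helpers, not part of any SDK.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def view_url(host: str, bucket: str, ddoc: str, view: str, **params: str) -> str:
    """Build a query URL for the Couchbase view REST API (port 8092)."""
    path = f"/{quote(bucket)}/_design/{quote(ddoc)}/_view/{quote(view)}"
    query = f"?{urlencode(params)}" if params else ""
    return f"http://{host}:8092{path}{query}"


def query_view_fresh(host: str, bucket: str, ddoc: str, view: str) -> list:
    # stale=false makes the server bring the index up to date before it
    # answers, so this call blocks while the pending changes are indexed.
    url = view_url(host, bucket, ddoc, view, stale="false")
    with urlopen(url) as resp:
        return json.load(resp)["rows"]
```

One caveat, consistent with Aliaksey's note that the .couch files lag the in-memory data: in 2.x the index is built from what has been persisted to disk, so a document that is still only in memory may be missing even from a stale=false result.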

You mentioned that it's important that a city is included in the view right after inserting it. This sounds like the view is requested by a user who just inserted a city. If other users requested the view, would it be a problem if they didn't see the city until a few seconds later? If not, you could even consider manually adding the just-added city to the view results if it's not there already, and then not using stale=false. Managing big data systems is a matter of embracing and designing for stale data.
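A sketch of the "add the just-added city manually" idea, assuming view rows arrive as dicts with id, key, and value fields (the shape Couchbase view rows use). The helper with_own_write and the sample city ids are hypothetical. Sorting with Python's default ordering only approximates the server's view collation.

```python
from typing import Any

Row = dict[str, Any]


def with_own_write(rows: list[Row], new_row: Row) -> list[Row]:
    """Return the (possibly stale) view rows, adding the caller's own
    just-written row if the index doesn't contain it yet. Rows are matched
    by document id and re-sorted by key, roughly as a view would order them."""
    if any(r["id"] == new_row["id"] for r in rows):
        return rows  # the index already caught up; nothing to patch
    return sorted(rows + [new_row], key=lambda r: r["key"])
```

Only the user who did the write sees the patched result; everyone else reads the view as-is and picks the city up on the next index update.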

I hope this helps.

Jason Fill

Mar 28, 2014, 10:49:14 AM
to couc...@googlegroups.com
Lasse,

Thank you for your reply.  I think the best approach might be to query all the city-related views with stale=false as part of the save process, whenever a city is created or changed.  There will be fewer changes to cities than reads of them, so I would hate for every read to take a hit when we can just force an index update on save.

The only issue with users seeing items a few seconds later is that the product is entirely API-based, so we are not sure how customers will use the API.  They might create a process where they insert a record and the next call lists the records.  In that case they would get odd results, because the item they just inserted would not be in the list yet.

So the save process might be slightly longer in this setup, but I think that will be OK, since the reads are what really need to be fast; there will be many more of those.

One quick follow-up.  From your experience (or anyone else's), roughly how long does a simple view over 2 million records take to return when specifying stale=false?  Are we talking 5 seconds, 10+ seconds, etc.?  I'm trying to get a feel for how big a hit we'll take once we get into a larger number of records.
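A sketch of the save path Jason describes, with the storage write and the view query passed in as callables so it stays independent of any particular SDK. save_city, store, query_view, and the view names are hypothetical. Because 2.x indexes from disk, a real store step would also need to wait until the document is persisted before the stale=false queries can see it.

```python
from typing import Callable, Iterable


def save_city(doc_id: str, doc: dict,
              store: Callable[[str, dict], None],
              query_view: Callable[[str, str], list],
              city_views: Iterable[str]) -> None:
    """Hypothetical save path: write the document, then query every
    city-related view with stale=false so the indexing cost is paid on the
    (rarer) write instead of on each read."""
    store(doc_id, doc)
    for view in city_views:
        # The rows are discarded; the query is made only for its side
        # effect of bringing the index up to date.
        query_view(view, "false")
```

Reads can then use stale=ok and return immediately, with the caveat that writes from other clients made after this refresh may still be missing from their results.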

Again, thanks for the time!

Lasse Schou

Mar 28, 2014, 10:59:06 AM
to couc...@googlegroups.com
Hi,

First of all, I would design my API to be CQRS (Command Query Responsibility Segregation), meaning that writing and reading are separate operations that are never combined. Once you write something to the API, there's no guarantee that the items will be available if you read from the API a few ms later. If you document that this is how your API works, your clients should be able to deal with it. According to the CAP theorem, you can only have two of the three properties: consistency, availability, and partition tolerance. When dealing with big data, you sacrifice consistency and get eventual consistency instead.

To answer your question, the update time does not depend on the total number of elements in your bucket, only on the number of new or changed elements since the last view update. This is because Couchbase views use incremental map/reduce (unlike Hadoop).
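A toy Python model of why incremental indexing makes the update cost track the number of changes rather than the bucket size. It assumes a changes feed of (seq, doc_id, doc) tuples ordered by sequence number; IncrementalIndex and its fields are hypothetical and greatly simplified compared with Couchbase's actual indexer.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class IncrementalIndex:
    """Toy incremental map index that remembers the last processed sequence
    number, so each update only runs map over changes newer than that."""
    map_fn: Callable[[dict], list]
    last_seq: int = 0
    entries: dict = field(default_factory=dict)  # doc_id -> emitted keys

    def update(self, changes: list[tuple[int, str, dict]]) -> int:
        """Apply the feed; returns how many documents were (re)mapped."""
        processed = 0
        for seq, doc_id, doc in changes:
            # A real by-sequence index would start reading after last_seq;
            # this toy skips old entries for clarity.
            if seq <= self.last_seq:
                continue
            # Re-mapping a changed doc replaces its previously emitted keys.
            self.entries[doc_id] = self.map_fn(doc)
            self.last_seq = seq
            processed += 1
        return processed
```

After an initial build over 1000 documents, a later update with three new changes runs the map function three times, not 1003 times. The same reasoning applies to a 2-million-document bucket: a stale=false query pays only for the backlog since the last update.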

I haven't seen latencies over 1-2 seconds with stale=false, but I would still consider it an unfortunate design decision for your API.

Lasse