Distinct, unique count of users with map reduce

Thomas

Dec 27, 2013, 11:05:48 AM
to couc...@googlegroups.com
Hi,

I was googling around for a map/reduce solution that performs a distinct (unique) count of user ids, but I wasn't able to find a concrete answer on this topic. My use case is to count the unique users per day, for example, as well as by other criteria.

For example, I have the following JSON event documents:


{"user":"user1", "color":"blue"}
{"user":"user1", "color":"blue"}
{"user":"user1", "color":"red"}
{"user":"user2", "color":"blue"}


With my map/reduce view I want to compute the following:

* number of distinct users per color

{"color": "blue", "count": 2}
{"color": "red", "count": 1}

In SQL terms:

select color, count(distinct user) as users from test group by color
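For reference, the minimal Python sketch below (plain Python, not Couchbase code) spells out what this SQL query computes on the four sample documents above. The function name `distinct_users_per_color` is a hypothetical helper.

```python
from collections import defaultdict

# The four sample event documents from the question.
EVENTS = [
    {"user": "user1", "color": "blue"},
    {"user": "user1", "color": "blue"},
    {"user": "user1", "color": "red"},
    {"user": "user2", "color": "blue"},
]


def distinct_users_per_color(events):
    """Equivalent of: select color, count(distinct user) from test group by color."""
    users_by_color = defaultdict(set)  # a set absorbs the repeated (user1, blue) event
    for doc in events:
        users_by_color[doc["color"]].add(doc["user"])
    return {color: len(users) for color, users in users_by_color.items()}


print(distinct_users_per_color(EVENTS))  # {'blue': 2, 'red': 1}
```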

Thanks




Chad Kouse

Dec 27, 2013, 11:27:22 AM
to couc...@googlegroups.com, couc...@googlegroups.com
You would need to either start with unique user IDs or reduce this data twice (I'm not sure whether Couchbase lets you create what is essentially a "subview").

Basically, you'd need to reduce this data to only one row per user/color combination (group level 2 with a key of [user, color]) and then reduce it again with the simple built-in _count reducer, using the color as the key.
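A minimal Python sketch of the two reductions described above. It models the data flow only and is not actual Couchbase view code. It assumes a map function that emits `[user, color]` as the key, the built-in `_count` reducer queried at group level 2 for the first pass, and a second count keyed on color. The helper names `first_pass` and `second_pass` are hypothetical.

```python
from collections import Counter

# The four sample event documents from the question.
EVENTS = [
    {"user": "user1", "color": "blue"},
    {"user": "user1", "color": "blue"},
    {"user": "user1", "color": "red"},
    {"user": "user2", "color": "blue"},
]


def first_pass(events):
    """Map emits [user, color]; _count at group level 2 gives one row per pair."""
    counts = Counter((doc["user"], doc["color"]) for doc in events)
    # Sorted to mimic the key order in which a view returns its rows.
    return sorted(counts.items())


def second_pass(rows):
    """Re-key every first-pass row by its color and count the rows per color."""
    return dict(Counter(color for (_user, color), _n in rows))


rows = first_pass(EVENTS)
print(rows)  # [(('user1', 'blue'), 2), (('user1', 'red'), 1), (('user2', 'blue'), 1)]
print(second_pass(rows))  # {'blue': 2, 'red': 1}
```

The first pass still reports how many events each user had per color (the `2` for user1/blue). The second pass ignores that value and counts rows instead, which is what turns a plain count into a distinct count.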

--chad


--
You received this message because you are subscribed to the Google Groups "Couchbase" group.
To unsubscribe from this group and stop receiving emails from it, send an email to couchbase+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Chad Kouse

Dec 27, 2013, 11:28:38 AM
to couc...@googlegroups.com, couc...@googlegroups.com
(Sorry, in my first sentence I meant: start with unique user/color combinations.)
--chad


Gerald

Dec 28, 2013, 10:40:36 AM
to couc...@googlegroups.com
FYI,

Your SQL statement also works in N1QL:


select color, count(distinct user) as users from test group by color


Thomas

Dec 30, 2013, 4:32:06 AM
to couc...@googlegroups.com
Hi Gerald,

Yes, indeed, I have already tested it successfully, but I'm having difficulty implementing it with map/reduce :S Is it possible? I cannot find a simple way, unless it can be done with two map/reduce jobs: one for the distinct users and one for counting the results.

If I may ask, how is it implemented in N1QL?

Thanks for your time again
Thomas

Gerald Sangudi

Dec 30, 2013, 2:51:01 PM
to couc...@googlegroups.com
Yes, you can do two passes in map-reduce. One pass to do grouping / distinct, and then another pass to do the counting.

The N1QL implementation doesn't map directly to map-reduce. But the general approach is to have a grouping phase and an aggregation phase.
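A generic sketch of the grouping-then-aggregation approach, assuming the rows fit in memory. It illustrates the two phases only and is not N1QL's actual implementation; `group_then_aggregate` and `count_distinct` are hypothetical helpers. Swapping the aggregate function shows why `count(distinct user)` needs the grouping phase to keep the individual values rather than a running count.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, Iterable, List

Row = Dict[str, Any]


def group_then_aggregate(
    rows: Iterable[Row],
    key_fn: Callable[[Row], Hashable],
    value_fn: Callable[[Row], Any],
    aggregate: Callable[[List[Any]], Any],
) -> Dict[Hashable, Any]:
    # Grouping phase: collect the values that belong to each group key.
    groups: Dict[Hashable, List[Any]] = defaultdict(list)
    for row in rows:
        groups[key_fn(row)].append(value_fn(row))
    # Aggregation phase: reduce each group's values to a single result.
    return {key: aggregate(values) for key, values in groups.items()}


def count_distinct(values: List[Any]) -> int:
    return len(set(values))


EVENTS = [
    {"user": "user1", "color": "blue"},
    {"user": "user1", "color": "blue"},
    {"user": "user1", "color": "red"},
    {"user": "user2", "color": "blue"},
]

print(group_then_aggregate(EVENTS, lambda d: d["color"], lambda d: d["user"], count_distinct))
# {'blue': 2, 'red': 1}
print(group_then_aggregate(EVENTS, lambda d: d["color"], lambda d: d["user"], len))
# {'blue': 3, 'red': 1}
```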


Thomas

Dec 31, 2013, 2:29:57 AM
to couc...@googlegroups.com
Hi Gerald,

Thx again

How is it possible to do two map/reduces in Couchbase? Can you chain view results, so that the output of the first map/reduce becomes the input of the second? I run the Community Edition; is this maybe a feature of the Enterprise Edition?

T.

Thomas

Dec 31, 2013, 7:02:27 AM
to couc...@googlegroups.com
Hi, and sorry for the spam. I have also found this post, which says that it is not possible to run a second map/reduce on the results of a first map/reduce. Is that correct?


Any ideas on how to implement this?

Thanks

Gerald Sangudi

Dec 31, 2013, 1:11:18 PM
to couc...@googlegroups.com
Thomas,

It seems that you have two choices:

- N1QL

- write the first map-reduce view, query it and store the results in documents, then write a second map-reduce view over those documents
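A sketch of the second option, simulated in Python with a dict standing in for the bucket. The document key scheme `user_color::<user>::<color>` and the `type` field are hypothetical. The second view's map function is shown in a docstring in the Couchbase `function (doc, meta)` / `emit` form, paired with the built-in `_count` reducer.

```python
from collections import Counter

# The four sample event documents from the question.
EVENTS = [
    {"user": "user1", "color": "blue"},
    {"user": "user1", "color": "blue"},
    {"user": "user1", "color": "red"},
    {"user": "user2", "color": "blue"},
]


def first_view_rows(events):
    """Stand-in for querying view 1: key [user, color], _count reduce, group level 2."""
    return sorted(Counter((doc["user"], doc["color"]) for doc in events).items())


def materialize(rows, bucket):
    """Store each first-view row as its own document (hypothetical key scheme)."""
    for (user, color), n in rows:
        bucket[f"user_color::{user}::{color}"] = {
            "type": "user_color", "user": user, "color": color, "events": n,
        }


def second_view(bucket):
    """Stand-in for view 2, whose map function (JavaScript) would be:
         function (doc, meta) {
           if (doc.type == "user_color") { emit(doc.color, null); }
         }
    paired with the built-in _count reduce, grouped by key."""
    return dict(Counter(doc["color"] for doc in bucket.values()
                        if doc.get("type") == "user_color"))


# The raw events live in the same bucket; they have no "type", so view 2 skips them.
bucket = {f"event::{i}": doc for i, doc in enumerate(EVENTS)}
materialize(first_view_rows(EVENTS), bucket)
print(second_view(bucket))  # {'blue': 2, 'red': 1}
```

The materialized documents have to be refreshed whenever new events arrive, so this suits periodic (for example daily) batch counts rather than live numbers.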


Chad Kouse

Dec 31, 2013, 4:17:17 PM
to couc...@googlegroups.com
Or maybe you can modify your original data set so it doesn’t have those duplicates? Not sure that’s possible.
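One way to make the data set duplicate-free, sketched under the assumption that you control the write path: alongside each raw event, write a marker document whose key is derived from the color and user, so repeated events overwrite the same marker. The key prefix `seen::`, the `type` field and `record_event` are hypothetical, and a dict stands in for the bucket. For per-day counts the same idea would include the day in the marker key.

```python
from collections import Counter


def record_event(bucket, event_id, event):
    """Hypothetical write path: store the raw event plus a deduplicated marker doc."""
    bucket[f"event::{event_id}"] = event
    # Deterministic key: a repeated (color, user) event overwrites the same marker.
    marker_id = f"seen::{event['color']}::{event['user']}"
    bucket[marker_id] = {"type": "seen", "color": event["color"], "user": event["user"]}


def count_markers_by_color(bucket):
    # Stand-in for a view emitting doc.color for "seen" docs with the _count reducer.
    return dict(Counter(d["color"] for d in bucket.values() if d.get("type") == "seen"))


EVENTS = [
    {"user": "user1", "color": "blue"},
    {"user": "user1", "color": "blue"},
    {"user": "user1", "color": "red"},
    {"user": "user2", "color": "blue"},
]

bucket = {}
for i, event in enumerate(EVENTS):
    record_event(bucket, i, event)

print(sum(1 for key in bucket if key.startswith("seen::")))  # 3
print(count_markers_by_color(bucket))  # {'blue': 2, 'red': 1}
```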

-- chad

Matt Ingenthron

Dec 31, 2013, 4:55:06 PM
to couc...@googlegroups.com
Hi Thomas,

The idea here is that you'd have two or more views and you'd implement the join or aggregation logic on the client side.

At some level, that is in effect what N1QL would be doing for you. With current GA software, you'll need to implement some of that logic on the client side. It's not too bad.
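A sketch of the client-side aggregation, assuming a view whose map function emits `[doc.color, doc.user]` with the built-in `_count` reducer, queried with `group_level=2`. The response body below is hand-written to match the sample documents, not captured from a server; only its `rows`/`key`/`value` shape follows the view response format.

```python
import json
from itertools import groupby

# Illustrative response for a view with map function (JavaScript):
#   function (doc, meta) { emit([doc.color, doc.user], null); }
# plus the built-in _count reduce, queried with group_level=2 (assumed, not captured).
RESPONSE = """
{"rows": [
  {"key": ["blue", "user1"], "value": 2},
  {"key": ["blue", "user2"], "value": 1},
  {"key": ["red", "user1"], "value": 1}
]}
"""


def distinct_users_per_color(body):
    rows = json.loads(body)["rows"]
    # Each row is one distinct (color, user) pair; rows arrive sorted by key,
    # so all rows for one color are adjacent and groupby can count them.
    return {
        color: sum(1 for _ in group)
        for color, group in groupby(rows, key=lambda row: row["key"][0])
    }


print(distinct_users_per_color(RESPONSE))  # {'blue': 2, 'red': 1}
```

Putting the color first in the key keeps all rows for one color adjacent, so the same view could presumably also be restricted to a single color with `startkey`/`endkey` range queries.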

Matt

-- 
Matt Ingenthron
Couchbase, Inc.

Thomas

Jan 2, 2014, 3:42:51 AM
to couc...@googlegroups.com
Thank you for your replies, and I wish you a happy new year :)

Currently N1QL seems the better choice for me, and I will put it to the test.

T.

