//This is hanging and taking forever, over 10 minutes now
There are 21 million documents in this collection, and for a given ts=1350585328 there are 26 records. The goal here is to count how many unique cookies are in the matched records. So it should find all the matching documents, 26 of them, then drop them into buckets based on cookie, and then count the buckets. I would presume that it does the find, based on the query, FIRST, and then runs the map/reduce over those returned values. If that's the case, the size of the collection shouldn't matter at all. It did the count() in the second line in the blink of an eye.
I have this indexed on ts, and I have a three shard setup with three replica sets.
Why is this taking so long? For prod, I'm going to open up ts to be a range, so it will have many many more matched documents than just 26.
To be clearer: the query should be in the same object as "out", i.e. {out: {replace: "garbage"}, query: {ts: 1350585328}}
Your current MR command probably doesn't use the query, so it must scan all of the documents in the collection. The documentation admittedly doesn't do a great job of illustrating the correct syntax, and I will see if we can make the docs clearer.
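For illustration, here is a rough sketch of what the full call might look like with the query placed next to "out". The original command isn't shown in this thread, so the collection name `events`, the `cookie` field name, and the map/reduce bodies are assumptions, not your actual code.

```javascript
// Map: emit one key per cookie value.
// Assumes each document has a `cookie` field (name inferred from the question).
function mapFn() {
  emit(this.cookie, 1);
}

// Reduce: sum the counts emitted for a single cookie.
function reduceFn(key, values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
}

// `query` goes in the SAME options object as `out`, so only the
// matching documents (found via the ts index) are fed to map.
var options = {
  out: { replace: "garbage" },
  query: { ts: 1350585328 }
};

// Only runs inside the mongo shell, where `db` is defined.
// `events` is a hypothetical collection name.
if (typeof db !== "undefined") {
  db.events.mapReduce(mapFn, reduceFn, options);
  // One output document per distinct cookie among the matches.
  print(db.garbage.count());
}
```

For the range case you mention for prod, the same shape should work with something like query: {ts: {$gte: start, $lt: end}}, where `start` and `end` are placeholders for your own bounds.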