How to query first few rows where sum of specific column is greater or equal than C

130 views

Skip to first unread message

PAN, Myautsai

unread,

Aug 7, 2016, 10:40:33 PM8/7/16

to mongodb-user

Hi there,

I've been searching and think about this question in a few days, but I still can't find the answer. Then I think maybe I should ask in the mail list. My question is as follows:

Suppose we have a mongodb collection with 2 columns: id, c

1,2

2,6

3,1

...

Now I would like to select first few rows where sum of column c is greater or equal than C

In the above case,

if C=1, return first 1 row.

if C=8, return first 2 rows.

if C=9, return first 3 rows.

```python

def find_the_correct_ending_index():

i = 0

sum_count = 0

while sum_count < C and i < len(rows):

sum_count += rows[i].c

i += 1

return i

```

It looks impossible to such query in mongodb for me. I'd appreciate if anybody have any idea.

NOTE: I also asked this question in stackoverflow: http://stackoverflow.com/questions/38807271/mongodb-query-first-few-rows-where-sum-of-specific-column-is-greater-or-equal-t

Kevin Adistambha

unread,

Aug 15, 2016, 2:36:41 AM8/15/16

to mongodb-user

Hi,

Suppose we have a mongodb collection with 2 columns: id, c
1,2
2,6
3,1

Now I would like to select first few rows where sum of column c is greater or equal than C

It looks impossible to such query in mongodb for me. I’d appreciate if anybody have any idea.

What is the use case that you are trying to solve?

My understanding is that your requirement implies that each document is not a separate entity, but depends upon all other documents with a lower id. Is this correct?

If this is correct, then I am not aware if there is any built-in feature in MongoDB that lets you create this kind of relationship between rows like you described.

Having said that, if applicable, you could potentially add an additional field (e.g. sum) that pre-calculates the cumulative sum of the column c of the current document and all previous documents. Using your example:

> db.test.insert({_id: 1, c: 2, sum: 2})
> db.test.insert({_id: 2, c: 6, sum: 8})
> db.test.insert({_id: 3, c: 1, sum: 9})

and you could satisfy one of your example queries like (in the mongo shell):

> var maxid = db.test.find({sum: 8}).next()._id  // find the maximum _id having the required sum
> db.test.find({_id: {$lte: maxid}})
{"_id": 1, "c": 2, "sum": 2}
{"_id": 2, "c": 6, "sum": 8}

or in one line:

> db.test.find({ _id: { $lte: db.test.find({sum:8}).next()._id } })
{"_id": 1, "c": 2, "sum": 2}
{"_id": 2, "c": 6, "sum": 8}

Please note that for this method to be performant, documents should only be added to the end of the collection (i.e. having a higher _id compared to the last inserted document). Otherwise, the sum field must be recalculated for the whole collection.