MongoDB NodeJS aggregations - stops finding data after a while

Rodolfo Perottoni

Mar 23, 2018, 6:16:59 AM
to mongodb-user
Bit of a tricky issue I'm facing and can't find any clue of what might be going on.
My backend is an Express application exposing many endpoints that interact with MongoDB through the Node.js driver.
I have set up a cron job that calls one of my endpoints every 15 minutes, and this specific endpoint executes a huge aggregation pipeline.

The issue that I'm facing is that after 4-5 hours this query will simply stop returning documents.

Important considerations:
- My crontab is running without any problem. This endpoint is sending a message to a private slack channel and I can see it popping up every 15 minutes like I wanted.
- I know that my aggregation stopped working because this message that I receive in Slack tells me how many documents were found by that query.
- I also log this aggregation's result to a separate collection, so the connection with my DB is not a problem.

CATCH 1: I can copy this same aggregation command (with the Express app still running) and put it into Robo 3T/Mongo Shell, execute it and STILL get results out of it. It's as if my application simply can't see that data in the database.
CATCH 2: As soon as I restart my express application this aggregation will find all the data that has been "stuck" since the past few hours. This makes me think that the MongoDB node driver might be the culprit.

I've found a guy with the same issue on StackOverflow but no one has answered that question yet.
The only difference between this issue and mine is that I don't use Mongoose.

Thanks everyone.

Kevin Adistambha

Apr 3, 2018, 1:29:59 AM
to mongodb-user

Hi

Could you please post more information about your issue:

> I know that my aggregation stopped working because this message that I receive in Slack tells me how many documents were found by that query.

My understanding is your end point continues to send a message to Slack every 15 minutes but the count of documents returned by your aggregation query is 0 after 4-5 hours. Is this correct?

> I also log this aggregation’s result to a separate collection, so the connection with my DB is not a problem.

How are you logging the result into a separate collection? Are you using the $out operator?

> I can copy this same aggregation command (with the Express app still running) and put it into Robo 3T/Mongo Shell, execute it and STILL get results out of it.

Please post the aggregation query in question, along with some example documents. How long does the aggregation usually run for? Does it take more than 15 minutes at some times?

> As soon as I restart my express application this aggregation will find all the data that has been “stuck” since the past few hours

Please post a small example of the endpoint code. It would be very helpful if other people can reproduce the issue you’re seeing.

Could you also post the node driver version you’re using?

Are you seeing the same issue using a local deployment as well, or is it only when you’re connecting to Atlas?

Best regards
Kevin

Rodolfo Perottoni

Apr 3, 2018, 10:34:56 PM
to mongodb-user
Pipeline attached to this message
pipeline.txt

Rodolfo Perottoni

Apr 3, 2018, 10:34:56 PM
to mongodb-user
Hi Kevin. Thanks for coming back to me.

> My understanding is your end point continues to send a message to Slack every 15 minutes but the count of documents returned by your aggregation query is 0 after 4-5 hours. Is this correct?
 
Correct. 

> How are you logging the result into a separate collection? Are you using the $out operator?
 
No, I'm manually logging it within my application's code. E.g. calling *insertOne* in a separate logs collection.

> Please post the aggregation query in question, along with some example documents. How long does the aggregation usually run for? Does it take more than 15 minutes at some times?

You can find the aggregation attached to this reply. Unfortunately I can't disclose the structure of my database objects.
And no, it never took more than 300-500ms to run.

> Please post a small example of the endpoint code. It would be very helpful if other people can reproduce the issue you’re seeing.

While trying to fix this issue, this was one of my first assumptions. So I moved my whole query to an empty Express route, just like the ones you see in their docs, to test whether it had something to do with my protected routes or business logic. This is what it would look like (I can't post the original code), e.g.:

app.post('/route', async (req, res) => {
  const results = await collection.aggregate(<pipeline>).toArray();
  <my business logic that uses the "results" object>
});

I verified that this issue happens on both versions 2.2 and 3.0 of the Node driver.
I tried upgrading my Atlas server to M10 to make sure this was not a free tier limitation, and the issue was still there after 4-5 hours of server uptime.

On a side note, I already "translated" my aggregation pipeline into individual queries (had to deliver this functionality ASAP) and everything is working fine. 
This is proof enough to me that the problem lives in the aggregation framework.
Let me know if there's any other information that might help you on this.

Kevin Adistambha

May 7, 2018, 2:16:51 AM
to mongodb-user

Hi Rodolfo

Apologies for the delay in getting back to you, but nothing jumps out at me. However, the fact that it seems to freeze after a certain amount of time, while a manually run query still returns something, suggests that it is unlikely that anything is wrong with the query itself.

If you’re still having this issue and would like to dig deeper, I would suggest trying the following:

  • Try running the code by pointing the application toward a local MongoDB instance instead of Atlas, and see if it also stops after 4-5 hours. If it does, then there could be something in the interaction of Atlas and the Javascript code.
  • Try running the periodic querying using another driver, e.g. the Python driver to check if it also stops after 4-5 hours. If it does, then the node driver is not the issue.
  • After the application stops receiving results in 4-5 hours, does an application restart immediately resume the stream of results? If it does, then it could be some timeout, or some limitation being hit.
  • Any other method you could think of to narrow down the issue will be helpful.
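
As one possible way to narrow things down along these lines, the application could record, for every scheduled run, the exact pipeline it passed to the driver together with the result count. A logged pipeline from a run that returned 0 documents can then be compared with the one pasted into the shell. This is only a sketch: `describeRun` is a hypothetical helper, and the `createdAt` field in the example is made up.

```javascript
// Hypothetical helper, not part of the MongoDB driver.
// Builds a log entry describing one run of the aggregation.
function describeRun(pipeline, results, now = new Date()) {
  return {
    ranAt: now.toISOString(),
    // JSON.stringify turns Date values into ISO strings, so the exact
    // values the application passed to the driver are visible in the entry.
    pipeline: JSON.stringify(pipeline),
    count: results.length,
  };
}

// Example with a made-up pipeline and an empty result set.
const entry = describeRun(
  [{ $match: { createdAt: { $gte: new Date(0) } } }],
  [],
  new Date(0)
);
console.log(entry);
```

The entry could be written to the same logs collection that already receives the aggregation result.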

Best regards
Kevin

miquel ferrando

Aug 16, 2018, 3:02:11 AM
to mongodb-user
Hi Rodolfo

Did you find out how to solve that problem? I'm having the same issue, but in my case it is with a find, not an aggregate.

I also have an API that calls my MongoDB (I use Mongoose) and it stops returning documents even with an empty find.

In my case I saw that the issue occurs with certain collections while the other ones keep working, and after restarting the API everything starts working again.

I think that the issue might be related to the driver.

Regards Miquel.

Rodolfo Perottoni

Aug 16, 2018, 7:04:41 PM
to mongodb-user
Hi Miquel. 
I haven't found a solution at all after 1 month of investigation and frustration. I also had a chat with MongoDB through the phone and all they did was push me an enterprise plan so I could get "enterprise-grade" support.
We (business) decided not to spend more time on it and gave up on aggregations - all of our business logic now runs with find queries. No issues since then. 


Wan Bachtiar

Aug 19, 2018, 9:49:29 PM
to mongodb-user

> app.post('/route', async (req, res) => { const results = await collection.aggregate().toArray(); });

Hi Rodolfo,

Do you wrap the await aggregation in a try/catch ?
Just in case there’s any error with your code before the return of the endpoint response.
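
A minimal sketch of what such a wrapper could look like, so that a failed aggregation is reported instead of looking like an empty result. The `handleRoute` function and the stub collections are hypothetical; the stubs only mimic the `aggregate(...).toArray()` shape so the sketch runs without a database.

```javascript
// Hypothetical handler body: runs the aggregation and reports failures explicitly.
async function handleRoute(collection, pipeline, log = console.error) {
  try {
    const results = await collection.aggregate(pipeline).toArray();
    return { ok: true, count: results.length };
  } catch (err) {
    // Any rejection from toArray() ends up here instead of being lost.
    log(err);
    return { ok: false, error: err.message };
  }
}

// Stub collections standing in for a real driver collection.
const workingCollection = {
  aggregate: () => ({ toArray: async () => [{ _id: 1 }, { _id: 2 }] }),
};
const failingCollection = {
  aggregate: () => ({ toArray: async () => { throw new Error('boom'); } }),
};

handleRoute(workingCollection, []).then((r) => console.log(r));
handleRoute(failingCollection, [], () => {}).then((r) => console.log(r));
```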

Also do you verify whether session always has a value ?
Do you check whether the conditional $filter for session (dynamic periodic date range) always return a value ?
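
To illustrate why this question matters: a date-based filter can silently stop matching in a long-running process if the range is computed once at startup rather than on every request. Whether that applies to the pipeline in this thread is not known. The sketch below only shows the difference; the `createdAt` field, the 15-minute window, and the `buildPipeline` name are assumptions made for the example.

```javascript
// Assumed window length; not taken from the original pipeline.
const WINDOW_MS = 15 * 60 * 1000;

// Pattern to avoid: the bounds are computed once, when the module is loaded,
// so every later request reuses a window that drifts further into the past.
const loadedAt = new Date();
const stalePipeline = [
  {
    $match: {
      createdAt: { $gte: new Date(loadedAt.getTime() - WINDOW_MS), $lt: loadedAt },
    },
  },
];

// Safer pattern: rebuild the pipeline on every call so the window tracks the clock.
function buildPipeline(now = new Date()) {
  const since = new Date(now.getTime() - WINDOW_MS);
  return [{ $match: { createdAt: { $gte: since, $lt: now } } }];
}

console.log(JSON.stringify(buildPipeline()));
```

A process restart recomputes the module-level values, which is why a stale range would appear to be fixed by restarting the application.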

> had a chat with MongoDB through the phone and all they did was push me an enterprise plan so I could get “enterprise-grade” support.

I would think that is a reasonable approach, as you said that you can’t disclose some of the proprietary code and environment. Signing an official support contract would allow MongoDB support to look further into your use case with some sort of NDA in place.

Best regards,
Wan.

Rodolfo Perottoni

Aug 19, 2018, 10:17:06 PM
to mongodb-user
Hi Wan.
 

> Do you wrap the await aggregation in a try/catch ?
> Just in case there’s any error with your code before the return of the endpoint response.


Yep, the code had way more than just an aggregate().toArray(). I do have every route on our API wrapped in an error handling library, and the aggregation never threw any errors.

> Also do you verify whether session always has a value ?
> Do you check whether the conditional $filter for session (dynamic periodic date range) always return a value ?

I replicated this issue on a new Atlas environment during my tests, and my query kept returning values for only 2 to 3 hours. In my first message in this discussion I outlined how I could still execute the query in the Mongo Shell and get results, which is why I'm pretty sure there's something wrong with the NodeJS driver.


This issue has driven me nuts. Every time I had to explain it to someone the outcome is always the same - doubtful or skeptical questions. And I get it, it sounds like either I'm lying or just a beginner in this ecosystem. I'd react the same way if someone told me such an issue exists.
After more than 20 hours of work and debugging to confirm that this is a NodeJS driver issue, I'm going to put the final nail in this coffin by withdrawing from this post and leaving it for posterity. I wish I had an enterprise plan so that someone from MongoDB could look at this, but at this stage of our business that is simply out of the question.

Thanks everyone for your input. Over and out.

miquel ferrando

Aug 20, 2018, 6:06:26 PM
to mongodb-user
Hi Rodolfo.

In our case that happens with finds :(

Wan Bachtiar

Aug 21, 2018, 2:24:58 AM
to mongodb-user

> Every time I had to explain it to someone the outcome is always the same - doubtful or skeptical questions. And I get it, it sounds like either I’m lying or just a beginner in this ecosystem. I’d react the same way if someone told me such an issue exists.

Hi Rodolfo,

This problem doesn’t appear to be trivially reproducible and without further details it’s likely there won’t be any progress. It would be great if you could provide a minimal, complete, and reproducible code example.

If this is a bug in the MongoDB Node.JS driver, it would be preferable for a bug report to come out of this discussion so that it can be fixed for others too.

> In our case that happens with finds :(

Miquel,
To help focus on the details of your issue, please start a new discussion with relevant details including:

  • Mongoose version
  • MongoDB Node.JS driver version
  • A minimal, complete code that would be able to test/reproduce the issue
  • MongoDB server version
  • Any mongod log events around the time of the find not returning any data

Regards,
Wan.

miquel ferrando

Aug 21, 2018, 4:11:59 AM
to mongodb-user
Hi Wan,
These are the versions in my environment.

Mongoose:
I tried several versions of Mongoose and it happens with all of them:
mongoose 5.0.9 (which uses mongodb 3.0.4)
mongoose 5.2.7 & 5.2.8 (mongodb 3.1.1)

MongoDB server 3.6.3

Node.JS version 9.10.1

Server Ubuntu 16.04.4 LTS

About my scenario: it happens with several collections, but I will describe one.
We have an API (Node.JS & Mongoose) and some services (Node.JS) that call our API.
In one of those services we have two processes. The first inserts documents with a ReportRequestID that an Amazon API provides us and a ReportReadyDate (about 1 minute after the date of the insert). The second (effectively an infinite loop) queries the DB every minute for documents with ReportReadyDate < now; when a report is ready, we process it and delete the document. The query asking for the documents is the one that fails, and the documents in the collection start to pile up (we can see that through Robo 3T and the mongo shell). Even if we call our API from another service, even with an empty query, it doesn't return any of those documents until we restart the API.

I'm pretty sure that the query is not the problem, because it also happens with other collections with much simpler queries (like find id == queryId), but this is the case that I have studied the most because it is one of the most critical.

Our query is constructed from the parameters of a GET request, but in this case the resulting query looks like this:

Mongoose Log:
amz_reports.find(
   {
       $and: [
           { reportId: { $exists: false } },
           {
               $or: [
                   { readyDate: { $exists: false } },
                   {
                       readyDate: {
                           $lt: new Date('Tue, 21 Aug 2018 07:53:16 GMT')
                       }
                   }
               ]
           },
           {
               $or: [
                   { lastChecked: { $exists: false } },
                   {
                       lastChecked: {
                           $lt: new Date('Mon, 20 Aug 2018 07:53:16 GMT')
                       }
                   }
               ]
           }
       ]
   },
   { sort: { submitDate: 1 }, limit: 1, fields: {} }
)


And the code that calls it is like 

amz_reports.find(query, project).limit(1).sort({ submitDate: 1 }).lean().exec((err, results) => {
   if (err) Promise.reject(err);
   else Promise.resolve(results);
});
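
A side note on the snippet above, in case the real code is shaped the same way: calling `Promise.reject(err)` or `Promise.resolve(results)` inside the callback creates a new promise that nothing is waiting on, so the caller receives neither the results nor the error. A minimal sketch of one way to hand the outcome back to the caller, assuming a callback-style `exec` like the one shown. The `runQuery` name and the stub query objects are hypothetical and only mimic that shape so the sketch runs without a database.

```javascript
// Hypothetical wrapper: returns a promise that settles with the callback's outcome.
function runQuery(query) {
  return new Promise((resolve, reject) => {
    query.exec((err, results) => {
      if (err) reject(err);
      else resolve(results);
    });
  });
}

// Stub queries standing in for a Mongoose Query object.
const okQuery = { exec: (cb) => cb(null, [{ reportRequestId: 'abc' }]) };
const failingQuery = { exec: (cb) => cb(new Error('query failed')) };

runQuery(okQuery).then((docs) => console.log('found', docs.length));
runQuery(failingQuery).catch((err) => console.log('error:', err.message));
```

With Mongoose 5, calling `exec()` without a callback already returns a promise, so awaiting `exec()` directly is an alternative to a hand-written wrapper.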

Thanks for your support Wan.

Regards, Miquel.

Daniele Tassone

Aug 22, 2018, 9:34:07 AM
to mongodb-user
Rodolfo,

Can you post the MongoDB connection string (without the user/password)?

D.

Wan Bachtiar

Aug 27, 2018, 2:21:22 AM
to mongodb-user

Hi Miquel,

Could you please start a new discussion thread (new post) to avoid mixing it with Rodolfo's question about aggregation?

I’ve tried your setup briefly, but as mentioned previously this is not trivial to reproduce.
In the new discussion post, please provide a minimal and complete code that would be able to test the issue.

Regards,
Wan.
