Big Data Aggregation Process optimization

Boas Enkler

May 11, 2016, 3:27:31 AM5/11/16
to mongodb-user
We have a nightly process running on one of our datasets.

This process primarily aggregates data and builds an in-memory graph representation of it. The data is not change-critical, and eventual consistency would not be a problem. The primary requirement is that it should be fast.

It iterates through a collection that has about 10 million entries. The current structure requires a callback to be invoked for each entry.

The code currently looks like this:

IAsyncCursor<StoredConnectionMemorySet> cursor =
    await collection.FindAsync(filter.Gte(i => i.DepartureDate, minDate.Date));

while (await cursor.MoveNextAsync())
{
    IEnumerable<StoredConnectionMemorySet> batch = cursor.Current;

    foreach (StoredConnectionMemorySet connectionSet in batch)
    {
        // Client-side age check; entries older than MaxAge are discarded here.
        if (connectionSet.Modified.AddDays(MaxAge) > timestamp)
        {
            handler(connectionSet, connectionSet.Connections.Where(x => x.IsValidConnection()));
        }
    }
}

Now my question: are there any performance optimizations / settings I should consider?
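One direction I have been considering is pushing the client-side age check into the server-side query, so documents older than MaxAge never leave the database. The sketch below assumes the same StoredConnectionMemorySet type, handler, and MaxAge/minDate/timestamp values as above, and that Modified is a DateTime field; the BatchSize value is hypothetical and would need tuning:

```csharp
// Sketch only: the age check Modified.AddDays(MaxAge) > timestamp is
// equivalent to Modified > timestamp.AddDays(-MaxAge), which the server
// can evaluate, so filtered-out documents are never sent over the wire.
var builder = Builders<StoredConnectionMemorySet>.Filter;
var serverFilter = builder.And(
    builder.Gte(i => i.DepartureDate, minDate.Date),
    builder.Gt(i => i.Modified, timestamp.AddDays(-MaxAge)));

var options = new FindOptions<StoredConnectionMemorySet>
{
    BatchSize = 1000 // hypothetical value; tune against memory and round-trips
};

using (IAsyncCursor<StoredConnectionMemorySet> cursor =
           await collection.FindAsync(serverFilter, options))
{
    while (await cursor.MoveNextAsync())
    {
        foreach (StoredConnectionMemorySet connectionSet in cursor.Current)
        {
            // No client-side date check needed any more.
            handler(connectionSet,
                    connectionSet.Connections.Where(x => x.IsValidConnection()));
        }
    }
}
```

If this is viable, I assume a compound index covering DepartureDate and Modified would help the server evaluate the combined filter, but I would appreciate confirmation.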

