MongoDB problems and typical usage

70 views
Skip to first unread message

Luca Bruno

unread,
Mar 24, 2017, 5:51:27 PM3/24/17
to mongodb-user

i'm a beginner in MongoDB so i have a couple of question about it. First of all my code is written in PHP. My context of use is as follows:

  • I have like 80K documents stored in one collection
  • Each document has this structure ( Key can be from 1 to 20 an different from each document ):


{
"_id": "58c91c8059b0a500083dab0c",
"year": "2017",
"month": "03",
"day": "15",
"data": {
"key-1": "value-1",
"key-2": "value-2",
"key-n": "value-n" 
}
}
 

I would like to get all document with a specific key and sum their value. So i used this code:


$manager = new \MongoDB\Driver\Manager( MONGO_SERVER );

$filter = array ( 
    "year" => $_POST['time_year'],
    "month" => $_POST['time_month'],
    "day" => $_POST['time_day']
);

$options = [
    'projection' => ['_id' => 0],
];

$query = new \MongoDB\Driver\Query($filter, $options);
$rows = $manager->executeQuery('REPORT.productionbacklog', $query);

foreach ($rows as $entry) {
    [..]
}

But i'm getting this error:


Allowed memory size of 134217728 bytes exhausted (tried to allocate 20480 bytes)


So, from this problem i have this questions:

  1. With MapReduce could be possible avoid this problem?
    • I looked to some example of MapReduce with PHP but every example used MongoCode. Making some research i discovered the class MongoCode is deprecated and there's no alternative.
    • There is another way to get the same results with another way ( MongoDB\Driver )?
  2. Is PHP the best language for use MongoDB? Node.js could be a better choise?
  3. Storing a bigger numbers of files ( 80K each day ) could be a problem for MongoDB?

Any suggestions could be help. Thanks!

P.S. Sorry for my bad english.

Jeremy Mikola

unread,
Mar 25, 2017, 9:01:04 PM3/25/17
to mongod...@googlegroups.com
The MongoCode documentation states that MongoDB\BSON\Javascript is the alternative class in the newer "mongodb" extension. That class may be used any time you need to supply a Javascript BSON type (e.g. mapReduce command's map, reduce, and finalize options).

We've yet to implement a mapReduce helper in the MongoDB PHP Library (PHPLIB-53 has this slated for the upcoming 1.2.0 release); however, the upgrade guide presents an example to execute mapReduce commands using the library. That example can easily be translated to use the driver's Manager::executeCommand() method.

Based on what you shared above, unless you've uncovered a memory leak in the driver, you have a general memory consumption issue. Are you collecting the database results in memory (e.g. some global array) during iteration? It's quite possible that your result set exceeds PHP's memory limit, in which case you'll need to process them iteratively without accumulating the documents in memory.

Switching to mapReduce may indirectly solve the issue. By default, mapReduce returns its results inline (in an array field within the command's response document). Since MongoDB is limited to 16MB documents, you'd have no problem fitting that into memory; however, I expect your actual result set would exceed this. Alternatively, mapReduce can output its results to a collection, which would require another query as you're doing now.

Here a few steps I would suggest given your current approach:

  • Review how you're collecting results during iteration and examine how memory is accumulating.
  • Consider using a more restrictive projection to disregard fields in the document that you may not actually need. Currently, you're only discarding the _id field, but the data field may be quite large and contain fields that aren't relevant. If you're able to restrict your projection to only include specific fields, you should be better off.
  • Do not directly incorporate request parameters (e.g. $_POST['time_year']) into a query filter. This exposes you to request injection (see this article in the documentation). I would at least advise that you cast those values to expected types (e.g. (string) or (int)) before using them, which would prevent someone from injecting a PHP array and thus unexpected query criteria.
  • Since the year, month, and day fields are integers, I would suggest you store them as actual integers instead of strings to maximize indexing/storage efficiency.
  1. Is PHP the best language for use MongoDB? Node.js could be a better choise?
The general advise is to use what language you are most comfortable with. The various MongoDB drivers roughly share the same feature set; however, the PHP driver is single-threaded and does not have connecting pools as found in other drivers. It's connection handling is thoroughly described on this page.
 
  1. Storing a bigger numbers of files ( 80K each day ) could be a problem for MongoDB?
There is no one-size-fits-all answer, as this depends entirely on your schema, indexing, application, and deployment. There are numerous write-ups of users handling far greater amounts of data (see: this old blog post and use cases for additional references).


Message has been deleted

Luca Bruno

unread,
Mar 27, 2017, 3:36:52 AM3/27/17
to mongodb-user
Hi, thanks for the answer, I will consider your suggestions.

Luca
Reply all
Reply to author
Forward
0 new messages