MongoDB bulk write memory cost


sharmasi...@gmail.com

Jul 29, 2019, 11:43:29 AM
to golang-nuts
I'm using the official Go driver for MongoDB. I read a CSV file up to a fixed number of lines, parse the data, and insert it into the DB. Say I have 10K contacts and readLimit is 1000: I read 1000 lines from the CSV, do a bulk write with ordered=false, and repeat. I assumed this would use constant memory, since the data size for each bulk write is the same. That is not the case: the memory consumed by the bulk write grows significantly with the size of the CSV. Can anyone explain this?

Here's some data I collected (contacts in the CSV - memory consumed):

batchSize = 1000

10K - 14 MB
20K - 30 MB
30K - 59 MB
40K - 137 MB
50K - 241 MB

Robert Engels

Jul 29, 2019, 1:01:23 PM
to sharmasi...@gmail.com, golang-nuts
Did you try testing with the MongoDB driver code removed, that is, with just the CSV processing?

Then maybe you can post the code as a simplified test.


Jim Robinson

Jul 30, 2019, 12:46:58 AM
to golang-nuts
How are you determining that the memory is consumed by the bulk write?

Based on my own experience, I doubt that it's an mgo driver leak.

I've got the mgo.v2 driver in use for a project that has been running as a daemon process for about 2 months now. It periodically spins up and writes bulk records into mongo. It's capped at 1k updates per call to Run, but it runs off an input queue, so it might run a number of 1k updates in a row. When I dump the memory profile, it shows effectively flat memory usage.

The relevant section of my code is:

        // send active values to mongodb for update
        bulk := session.DB(dao.database).C(dao.collection).Bulk()
        bulk.Unordered()
        for _, v := range apply {
                field := fmt.Sprintf("%s.%s", v.Register, v.Monitor)
                bulk.Update(bson.M{"_id": v.Path}, bson.M{"$push": bson.M{field: v}})
        }

        _, err := bulk.Run()

        // send any errors that we received and close/nil the channels for those
        // resources
        if err != nil {
                bulkErr, ok := err.(*mgo.BulkError)
                if ok {
                        for _, caseErr := range bulkErr.Cases() {

I've attached a pprof snapshot to show what the memory profile looks like after it's been running for ~60 days.

profile001.png