Inserting the value in Collection schema in MongoDB from different data frames using python.

18 views

Skip to first unread message

Aman Sinha

unread,

Jul 23, 2017, 5:36:58 PM7/23/17

to mongodb-user

I have the three data frame and I want to select the subset of each data frame based from the specific value in the column.

data1 = pd.read_csv('path_of_the_file', sep=",") data2 = pd.read_csv('path_of_the_file', sep=",") data3 = pd.read_csv('path_of_the_file', sep=",")

there is variable named "MAP" in all the data frame's . I want to create subset from all of the data frame like: From data1 val1 val2 val3 MAP 12 78 45 101010101 13 71 76 101010101From data2 val4 val5 val6 MAP 12 7190 89 101010101 13 2353 78 101010101 12 0 65 101010101From data3 val7 val8 val9 MAP 1 78 0 101010101I want to insert this values of each row all together in my mongo db Scheme. This is what i have tried `

uniqueVar_detail = data2.MAP.unique()
for i in range(0,len(uniqueVar_detail)):
    data1= data1[data1['MAP'] == uniqueVar_detail[i]]
    data2 = data2[data2['MAP'] == uniqueVar_detail[i]]
    data3 = data3[data3['MAP'] == uniqueVar_detail[i]]
    db.EVR.insert_one({"MAP_MESSAGE_ID":str(data1.MAP),
                       "data1" : [{
                         "val1":str (data1.val1),
                          "val2":str(data1.val2,
                          "val3" :str (data1.val3)
                        }],
                         "data2":[{
                          "val4" : str(data2.val4),
                           "val5": str(data2.val5),
                           "val6": str(data2.val6)
                            }],
                         "data3" :[{
                          "val7" : str(data3.val7),
                          "val8" : str(data3.val8),
                          "val9" : str(data3.val9)
                        }]
                       })

While Inserting I am also inserting the index of row. Like inserting in "MAP" variable 960 101010101 , 960 is also inserted. Any suggestion or some changes I need to do with the code?

Wan Bachtiar

unread,

Aug 6, 2017, 9:32:15 PM8/6/17

to mongodb-user

I want to insert this values of each row all together in my mongo db Scheme.

Hi Aman,

Please consider how you’re going to query or retrieve these data later from MongoDB.
I would strongly recommend to review Data Modeling Introduction to see more examples of how documents are modelled.
For example, consider if you have many rows of MAP 101010101 stored into a single document

Any suggestion or some changes I need to do with the code?

uniqueVar_detail = data2.MAP.unique()

I’d like to point out that you’re assuming data2 will have all of the unique MAP id from data1 and data3.

In the code below:

data1 = data1[data1['MAP'] == uniqueVar_detail[i]]
data2 = data2[data2['MAP'] == uniqueVar_detail[i]]
data3 = data3[data3['MAP'] == uniqueVar_detail[i]]

You’re re-assigning the variable data1, data2, and data3 which affects the second iteration of the loop.

str(data1.MAP)

The line above is actually converting an object of pandas Series into a string. i.e.

0    1010\n1    1010\nName: map, dtype: int64

“data1” : [{“val1”:str (data1.val1), …

Note that depending on your use case, having an array of subdocuments may create difficulty in querying the data later on. Especially if you’re interested in saving the index of the variable MAP.

Given your provided data example, and without drastic changes to your schema, see an example code below to insert CSV data into MongoDB:

# Add your own error handling as appropriate
import json 

for unique_id in data2.MAP.unique():
    d1 = data1[data1['MAP'] == unique_id]
    d2 = data2[data2['MAP'] == unique_id]
    d3 = data3[data3['MAP'] == unique_id]

    d1_json = json.loads(d1.to_json(orient="index"))
    subdoc1 = {}
    for ix, value in d1_json.iteritems():
        subdoc1[str(ix)] = value

    d2_json = json.loads(d2.to_json(orient="index"))
    subdoc2 = {}
    for ix, value in d2_json.iteritems():
        subdoc2[str(ix)] = value 

    d3_json = json.loads(d3.to_json(orient="index"))
    subdoc3 = {}
    for ix, value in d3_json.iteritems():
        subdoc3[str(ix)] = value 

    db.collectionName.insert_one({"map_message_id": unique_id, 
           "data1": subdoc1, 
           "data2": subdoc2, 
           "data3": subdoc3 })

The content of document MAP 101010101 would look like below :

{
  "_id": ObjectId("..."),
  "map_message_id": 101010101,
  "data1": {
    "1": {"map": 101010101, "val3": 76, "val2": 71, "val1": 13 },
    "0": {"map": 101010101, "val3": 45, "val2": 78, "val1": 12 }
  },
  "data3": {
    "0": {"map": 101010101, "val9": 9, "val8": 78, "val7": 1 }
  },
  "data2": {
    "1": {"map": 101010101, "val6": 78, "val5": 2353, "val4": 13 },
    "0": {"map": 101010101, "val6": 89, "val5": 7190, "val4": 12 },
    "2": {"map": 101010101, "val6": 65, "val5": 0, "val4": 12 }
  }
}