I have the three data frame and I want to select the subset of each data frame based from the specific value in the column.
data1 = pd.read_csv('path_of_the_file', sep=",")
data2 = pd.read_csv('path_of_the_file', sep=",")
data3 = pd.read_csv('path_of_the_file', sep=",")
there is variable named "MAP" in all the data frame's . I want to create subset from all of the data frame like: From data1
val1 val2 val3 MAP
12 78 45 101010101
13 71 76 101010101
From data2
val4 val5 val6 MAP
12 7190 89 101010101
13 2353 78 101010101
12 0 65 101010101
From data3
val7 val8 val9 MAP
1 78 0 101010101
I want to insert this values of each row all together in my mongo db Scheme. This is what i have tried `
uniqueVar_detail = data2.MAP.unique()
for i in range(0,len(uniqueVar_detail)):
data1= data1[data1['MAP'] == uniqueVar_detail[i]]
data2 = data2[data2['MAP'] == uniqueVar_detail[i]]
data3 = data3[data3['MAP'] == uniqueVar_detail[i]]
db.EVR.insert_one({"MAP_MESSAGE_ID":str(data1.MAP),
"data1" : [{
"val1":str (data1.val1),
"val2":str(data1.val2,
"val3" :str (data1.val3)
}],
"data2":[{
"val4" : str(data2.val4),
"val5": str(data2.val5),
"val6": str(data2.val6)
}],
"data3" :[{
"val7" : str(data3.val7),
"val8" : str(data3.val8),
"val9" : str(data3.val9)
}]
})
`
While Inserting I am also inserting the index of row. Like inserting in "MAP" variable 960 101010101
, 960 is also inserted. Any suggestion or some changes I need to do with the code?
I want to insert this values of each row all together in my mongo db Scheme.
Hi Aman,
Please consider how you’re going to query or retrieve these data later from MongoDB.
I would strongly recommend to review Data Modeling Introduction to see more examples of how documents are modelled.
For example, consider if you have many rows of MAP
101010101 stored into a single document
Any suggestion or some changes I need to do with the code?
uniqueVar_detail = data2.MAP.unique()
I’d like to point out that you’re assuming data2
will have all of the unique MAP
id from data1
and data3
.
In the code below:
data1 = data1[data1['MAP'] == uniqueVar_detail[i]]
data2 = data2[data2['MAP'] == uniqueVar_detail[i]]
data3 = data3[data3['MAP'] == uniqueVar_detail[i]]
You’re re-assigning the variable data1
, data2
, and data3
which affects the second iteration of the loop.
str(data1.MAP)
The line above is actually converting an object of pandas Series into a string. i.e.
0 1010\n1 1010\nName: map, dtype: int64
“data1” : [{“val1”:str (data1.val1), …
Note that depending on your use case, having an array of subdocuments may create difficulty in querying the data later on. Especially if you’re interested in saving the index
of the variable MAP
.
Given your provided data example, and without drastic changes to your schema, see an example code below to insert CSV data into MongoDB:
# Add your own error handling as appropriate
import json
for unique_id in data2.MAP.unique():
d1 = data1[data1['MAP'] == unique_id]
d2 = data2[data2['MAP'] == unique_id]
d3 = data3[data3['MAP'] == unique_id]
d1_json = json.loads(d1.to_json(orient="index"))
subdoc1 = {}
for ix, value in d1_json.iteritems():
subdoc1[str(ix)] = value
d2_json = json.loads(d2.to_json(orient="index"))
subdoc2 = {}
for ix, value in d2_json.iteritems():
subdoc2[str(ix)] = value
d3_json = json.loads(d3.to_json(orient="index"))
subdoc3 = {}
for ix, value in d3_json.iteritems():
subdoc3[str(ix)] = value
db.collectionName.insert_one({"map_message_id": unique_id,
"data1": subdoc1,
"data2": subdoc2,
"data3": subdoc3 })
The content of document MAP 101010101
would look like below :
{
"_id": ObjectId("..."),
"map_message_id": 101010101,
"data1": {
"1": {"map": 101010101, "val3": 76, "val2": 71, "val1": 13 },
"0": {"map": 101010101, "val3": 45, "val2": 78, "val1": 12 }
},
"data3": {
"0": {"map": 101010101, "val9": 9, "val8": 78, "val7": 1 }
},
"data2": {
"1": {"map": 101010101, "val6": 78, "val5": 2353, "val4": 13 },
"0": {"map": 101010101, "val6": 89, "val5": 7190, "val4": 12 },
"2": {"map": 101010101, "val6": 65, "val5": 0, "val4": 12 }
}
}
See also DataFrame.to_json
Regards,
Wan.