Hello Kate,
I don't see a way to do all this within the database. You would have to pull each document to the client side, calculate the score there, and then update the document with the score. Below are the steps in more detail (in Python/pymongo) and the code. Please use the code only as a guideline and double-check it, it may be slightly different for you, depending on your setup, mongod port, etc.
1. import the scores from the csv file and create a dictionary in python with the word as key and the score as value.
2. get a cursor to all documents in the sentences collection that don't have a score yet (this is important because it allows you to restart the script if it doesn't complete in one run. If you are updating 40 million documents, this can take a long time)
3. go through each document, calculate the score (ignoring words that don't have a score) and update the document with the score.
from pymongo import Connection
import csv
# create csv reader to import scores
reader = csv.reader(open('scores.txt', 'r'))
# skip header in csv file
headerline = reader.next()
# read all scores and create scores dictionary
scores = {}
for line in reader:
scores[line[0]] = int(line[1])
# establish connection to mongodb
con = Connection(port=30000) # add your host/port here if not using default values
db = con['test'] # use the correct database (here: test)
# get a cursor to all documents in the collection that don't have score yet
cursor = db.sentences.find({'score': {'$exists': False}})
# iterate over all documents in the cursor
for doc in cursor:
# calculate score
sentence = doc['text']
words = sentence.split()
score = sum([scores[w] for w in words if w in scores])
# update document
db.sentences.update({'_id': doc['_id']}, {'$set': {'score': score}})
I hope this helps you solve your problem.
Best regards,
Thomas