It's been a while since I did it. As I recall it was a highly iterative process. It went something like:
1. Run through the species list, looking for species that are not in the OTM database.
2. Review the list. For species that are not in the OTM list but should be (a slight spelling difference, for example), change the import code to clean up the names so they match.
3. Repeat until the list of species not in the OTM db is clean (the species really aren't in the db).
4. Add the clean list of species to the db.
5. Import trees.
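A minimal sketch of that review loop, under some loud assumptions: `find_unmatched`, `normalize`, and the sample names are all hypothetical, `known_species` stands in for whatever query returns the scientific names already in the OTM database, and the `difflib` suggestions are just one way to spot near-misses, not necessarily how the original import did it.

```python
import difflib


def normalize(name):
    """Hypothetical cleanup rules; extend these as the review turns up mismatches."""
    # Collapse runs of whitespace and treat a backtick as a straight quote.
    name = " ".join(name.split())
    return name.replace("`", "'")


def find_unmatched(import_names, known_species):
    """Map each cleaned name that is not in known_species to its close matches."""
    known_lower = {s.lower(): s for s in known_species}
    unmatched = {}
    for raw in import_names:
        cleaned = normalize(raw)
        if cleaned.lower() in known_lower:
            continue
        # Near-misses suggest a spelling fix belongs in normalize();
        # no suggestions means the species probably really is new.
        close = difflib.get_close_matches(
            cleaned.lower(), list(known_lower), n=3, cutoff=0.85
        )
        unmatched[cleaned] = [known_lower[c] for c in close]
    return unmatched


# Made-up sample data, for illustration only.
known = ["Acer rubrum", "Quercus alba"]
incoming = ["Acer  rubrum", "Quercus albaa", "Ginkgo biloba"]
for name, suggestions in sorted(find_unmatched(incoming, known).items()):
    print(name, "->", suggestions or "not in db")
```

Each pass, names that come back with suggestions are candidates for a new cleanup rule; once only genuinely new species remain, that list is what gets added to the db.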
Expect to write a lot of special-case code for your dataset. For example, my main loop starts like this (and proceeds for about 50 more lines before trying to look up the species name):
    for name in sdot_names:
        sname = name.scientific_name
        cultivar = ''
        terms = sname.split()
        status = 'unknown'
        # skip empty or single-word names and non-tree entries
        if len(terms) < 2 or sname == 'Impervious surface':
            status = 'unknown'
            continue
        elif terms[1] == 'sp.' or terms[1] == 'spp.':
            status = 'generic species'
        elif len(terms) == 2 or terms[1][0] == '\'' or terms[1][0] == '`':
            if terms[1][0] == '\'' or terms[1][0] == '`':  # second word starts with a quote mark
                status = 'cultivar'
                cultivar = " ".join(terms[1:])
                self.stdout.write("\tcultivar:'" + cultivar + "'\n")
            else:
                status = 'species'
        # from here on there are at least three terms
        elif terms[1] == 'x' or terms[0] == 'x' or terms[2] == 'x':
            if len(terms) == 3:
                status = 'hybrid'
            else:
                status = 'hybrid cultivar'
        elif len(terms) > 2:
            status = 'cultivar'
            cultivar = ' '.join(terms[2:])
Good luck!
- Alan