mass tree import


Mathew Brown

Oct 9, 2014, 11:42:08 AM
to opentree...@googlegroups.com
Hi there,

I'm attempting to import a large tree database (about 500,000 trees) and I'm having trouble getting the trees saved as the correct species. I've done some matching by species and/or genus name and by common name, which works most of the time, but there are cases where trees are saved as the incorrect species. It's mostly due to the city database I'm using, which has records like botanical_field = 'Tilia sp.', common_nam = 'LINDEN'.

Has anyone else had to fight through data like this? If so, it would be great to see your code.

Thanks
 


Alan Humphrey

Oct 9, 2014, 1:23:56 PM
to opentree...@googlegroups.com
It's been a while since I did it. As I recall it was a highly iterative process. It went something like:

1. Run through the species list, looking for species that are not in the OTM database.
2. Review the list. For species that are not in the OTM list but should be (a slight spelling difference, for example), change the import code to clean up the names so they match.
3. Repeat until the list of species not in the OTM db is clean (the species really aren't in the db).
4. Add the clean list of species to the db.
5. Import trees.
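
A rough sketch of the "find what doesn't match, clean up, repeat" loop described above. This is not the original import code: `clean_name`, `unmatched_names`, and the `CORRECTIONS` table are hypothetical names, and the known species are assumed to be available as a plain set of name strings.

```python
import re

# Hypothetical spelling fixes, added one by one while reviewing the unmatched list.
CORRECTIONS = {
    "Acer platanoide": "Acer platanoides",
}

def clean_name(raw):
    """Collapse whitespace, capitalize the first letter, apply known corrections."""
    name = re.sub(r"\s+", " ", raw).strip()
    if name:
        name = name[0].upper() + name[1:]
    return CORRECTIONS.get(name, name)

def unmatched_names(source_names, known_species):
    """Return the cleaned source names that are absent from known_species, sorted."""
    cleaned = {clean_name(n) for n in source_names}
    cleaned.discard("")  # ignore blank entries
    return sorted(cleaned - set(known_species))

# Made-up sample data, for illustration only.
known = {"Acer platanoides", "Tilia cordata"}
source = ["acer  platanoides", "Acer platanoide", "Tilia cordata ", "Quercus rubra"]
print(unmatched_names(source, known))
```

Each pass either adds a rule to the cleanup code or leaves a name in the list because it really is missing from the db. Once only the second kind remains, that list is what gets added before the import.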

Expect to write a lot of special-case code for your dataset. For example, my main loop starts like this (and proceeds for about 50 more lines before trying to look up the species name):

for name in sdot_names:
    sname = name.scientific_name
    cultivar = ''
    terms = sname.split()
    status = 'unknown'
    # "<= 1" rather than "== 1" so that an empty name is skipped too
    # instead of raising IndexError on terms[1] below
    if len(terms) <= 1 or sname == 'Impervious surface':
        status = 'unknown'
        continue
    elif terms[1] == 'sp.' or terms[1] == 'spp.':
        status = 'generic species'
    elif len(terms) == 2 or terms[1][0] == '\'' or terms[1][0] == '`':
        if terms[1][0] == '\'' or terms[1][0] == '`':  # second word starts with a quote mark
            status = 'cultivar'
            cultivar = " ".join(terms[1:])
            self.stdout.write("\tcultivar:'" + cultivar + "'\n")
        else:
            status = 'species'
    elif terms[1] == 'x' or terms[0] == 'x' or terms[2] == 'x':
        # at this point there are at least three terms, so terms[2] is safe
        if len(terms) == 3:
            status = 'hybrid'
        else:
            status = 'hybrid cultivar'
    elif len(terms) > 2:
        status = 'cultivar'
        cultivar = ' '.join(terms[2:])
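
For the 'Tilia sp.' / 'LINDEN' case from the original question, one possible approach is to treat 'sp.' and 'spp.' names as genus-level records and never guess a species from the common name. The sketch below is only an illustration: `resolve_species` is a hypothetical helper, and it assumes (this is not confirmed by OTM) that the lookup table stores genus-level entries under the bare genus name.

```python
def resolve_species(scientific_name, species_by_name):
    """Return the key in species_by_name to use for this name, or None.

    Assumption: species_by_name maps a name string to a species record, and
    genus-level records are stored under the bare genus (e.g. "Tilia").
    """
    terms = scientific_name.split()
    if not terms:
        return None
    genus = terms[0]
    if len(terms) >= 2 and terms[1] in ("sp.", "spp."):
        # Generic species: accept only a genus-level record, never guess a species.
        return genus if genus in species_by_name else None
    binomial = " ".join(terms[:2])
    # Exact match or nothing; anything unmatched goes back to the review list.
    return binomial if binomial in species_by_name else None

# Made-up lookup table, for illustration only.
species = {"Tilia": "genus-level record", "Tilia cordata": "species record"}
print(resolve_species("Tilia sp.", species))
print(resolve_species("Tilia cordata", species))
print(resolve_species("Tilia americana", species))
```

Returning None for anything that does not match exactly keeps doubtful rows in the review list instead of saving them under the wrong species.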

Good luck!

- Alan

--
You received this message because you are subscribed to the Google Groups "opentreemap-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to opentreemap-us...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Mathew Brown

Oct 9, 2014, 1:42:32 PM
to opentree...@googlegroups.com
Yeah, I figured as much. I think my jumble of code is getting there. Thanks!


Michał Stępniewski

Mar 4, 2017, 1:44:35 PM
to opentreemap-user, mathewbr...@gmail.com
So, do you guys know how to mass import trees directly into the database?