mass tree import

Mathew Brown

Oct 9, 2014, 11:42:08 AM

to opentree...@googlegroups.com

Hi there,

I'm attempting to import a large tree database (about 500,000) and I'm having trouble getting the trees saved as the correct species. I've done some matching by species and/or genus name and common name, which works most of the time, but there are cases where trees are saved as the wrong species. It's mostly due to the city database I'm using; for example, botanical_field = 'Tilia sp.', common_nam = 'LINDEN'.

Has anyone else had to fight through data like this? If so, it would be great to see your code.

Thanks

Alan Humphrey

Oct 9, 2014, 1:23:56 PM

to opentree...@googlegroups.com

It's been a while since I did it. As I recall it was a highly iterative process. It went something like:

1. Run through the species list, looking for species that are not in the OTM database.

2. Review the list. For species that are not in the OTM list but should be (a slight spelling difference, for example), change the import code to clean up the names so they match.

3. Repeat until the list of species not in the OTM db is clean (the species really aren't in the db).

4. Add the clean list of species to the db.

5. Import trees.
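The first three steps above can be sketched roughly as follows. This is only an illustration, not the code from the original import: name_key and unmatched_names are hypothetical helpers, the cleanup rules (collapsing whitespace, ignoring case, treating 'sp', 'spp' and 'spp.' as 'sp.') are assumptions, and the known species list is assumed to have been loaded from the OTM database separately.

```python
import re
from collections import Counter


def name_key(raw):
    """Build a comparison key for a scientific name (assumed cleanup rules)."""
    name = re.sub(r"\s+", " ", raw.strip())          # collapse runs of whitespace
    # Treat a trailing 'sp', 'sp.', 'spp' or 'spp.' as the same genus-only marker.
    name = re.sub(r"\bspp?\.?$", "sp.", name, flags=re.IGNORECASE)
    return name.casefold()                           # compare case-insensitively


def unmatched_names(import_names, known_names):
    """Count raw names from the import that have no match in the known list."""
    known = {name_key(n) for n in known_names}
    misses = Counter()
    for raw in import_names:
        if name_key(raw) not in known:
            misses[raw] += 1                         # keep the raw spelling for review
    return misses


if __name__ == "__main__":
    known = ["Tilia sp.", "Acer rubrum"]             # stand-in for the OTM species list
    incoming = ["TILIA  SPP.", "Acer rubrum", "Acer rubrun", "Acer rubrun", "tilia sp"]
    for raw, count in unmatched_names(incoming, known).most_common():
        print(count, raw)
```

Each pass, review the printed names, add a cleanup rule to name_key for the ones that are really just spelling variants, and run it again until only genuinely missing species remain.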

Expect to write a lot of special-case code for your dataset. For example, my main loop starts like this (and proceeds for about 50 more lines before trying to look up the species name):

for name in sdot_names:
    sname = name.scientific_name
    cultivar = ''
    terms = sname.split()
    status = 'unknown'
    if len(terms) < 2 or sname == 'Impervious surface':
        # nothing usable to look up (also guards terms[1] against empty names)
        status = 'unknown'
        continue
    elif terms[1] == 'sp.' or terms[1] == 'spp.':
        # genus only, e.g. 'Tilia sp.'
        status = 'generic species'
    elif len(terms) == 2 or terms[1][0] == '\'' or terms[1][0] == '`':
        if terms[1][0] == '\'' or terms[1][0] == '`':  # second word starts with a quote mark
            status = 'cultivar'
            cultivar = " ".join(terms[1:])
            self.stdout.write("\tcultivar:'" + cultivar + "'\n")
        else:
            status = 'species'
    elif terms[1] == 'x' or terms[0] == 'x' or terms[2] == 'x':
        # at least three terms at this point, so terms[2] is safe
        if len(terms) == 3:
            status = 'hybrid'
        else:
            status = 'hybrid cultivar'
    elif len(terms) > 2:
        status = 'cultivar'
        cultivar = ' '.join(terms[2:])
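To try the same branching on sample names outside the import command, it can be wrapped in a small function. This is a sketch under assumptions: classify is a hypothetical name, it returns (status, cultivar) instead of setting loop variables, and it covers only the part of the loop shown here, not the remaining special cases or the species lookup.

```python
def classify(sname):
    """Return (status, cultivar) for a scientific name, following the loop above."""
    terms = sname.split()
    quotes = ("'", "`")
    if len(terms) < 2 or sname == 'Impervious surface':
        return 'unknown', ''                      # empty, single word, or non-tree entry
    if terms[1] in ('sp.', 'spp.'):
        return 'generic species', ''              # genus only
    if terms[1][0] in quotes:
        return 'cultivar', ' '.join(terms[1:])    # genus followed by a quoted cultivar
    if len(terms) == 2:
        return 'species', ''                      # plain genus + species
    if 'x' in terms[:3]:
        # a hybrid marker in any of the first three positions
        return ('hybrid' if len(terms) == 3 else 'hybrid cultivar'), ''
    return 'cultivar', ' '.join(terms[2:])        # genus + species + cultivar words


if __name__ == "__main__":
    samples = ["Tilia sp.", "Acer rubrum", "Acer rubrum 'Red Sunset'",
               "Platanus x acerifolia", "Prunus 'Kwanzan'", "LINDEN"]
    for s in samples:
        print(s, '->', classify(s))
```

Running a few hundred distinct names from the city data through something like this and eyeballing the 'unknown' and 'cultivar' results is a quick way to find the next special case to handle.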

Good luck!

- Alan

--
You received this message because you are subscribed to the Google Groups "opentreemap-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to opentreemap-us...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Mathew Brown

Oct 9, 2014, 1:42:32 PM

to opentree...@googlegroups.com

Yeah, I figured as much. I think my jumble of code is getting there. Thanks!


Michał Stępniewski

Mar 4, 2017, 1:44:35 PM

to opentreemap-user, mathewbr...@gmail.com

So do you guys know how to mass import trees directly into the database?
