automated species identification

Ken-ichi
Apr 25, 2017, 6:04:16 PM
to inaturalist
Hey folks,

I'd like to share something our team and our collaborators have been
working hard on for a few months: automated species identification!
Here's the demo for you to play with:

https://www.inaturalist.org/computer_vision_demo

What's going on here? We're using technology similar to facial
recognition systems, except we're training it to recognize species
from the photos we all upload to iNat and identify, and we're
incorporating iNat observational data to rank results based on date
and location. Currently we've trained it on species that have Research
Grade observations by 20 or more unique people. That means it can
recognize commonly-observed RG stuff like Turkey Vultures and
California poppies, but it doesn't recognize dogs, cats, humans,
obscure carabid beetles, dumbo octopus, etc. (it actually seems to
think all humans are lizards... b/c so many lizards are photographed
with a human hand in frame, or b/c the TV show V was totally accurate
and the computer knows more than we do).
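
To make that ranking idea concrete, here's a minimal sketch of how vision scores might be blended with nearby-observation frequencies - purely illustrative, with invented names, weights, and data shapes, not iNat's actual code:

from collections import Counter

def rank_suggestions(vision_scores, nearby_counts, weight=0.25):
    # vision_scores: {taxon: classifier score in [0, 1]}
    # nearby_counts: {taxon: observations near this date and place}
    total_nearby = sum(nearby_counts.values()) or 1
    ranked = []
    for taxon, score in vision_scores.items():
        prior = nearby_counts.get(taxon, 0) / total_nearby  # local frequency
        ranked.append((taxon, (1 - weight) * score + weight * prior))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

# The classifier slightly prefers a look-alike, but local observation
# data tips the ranking toward the commonly observed species.
vision = {"Eschscholzia californica": 0.48, "Eschscholzia caespitosa": 0.52}
nearby = Counter({"Eschscholzia californica": 190, "Eschscholzia caespitosa": 10})
print(rank_suggestions(vision, nearby))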

Right now it's just a demo, with a lot of flaws, but also a lot of
promise, and we're curious: how do you feel about this? Do you think
you'd find this useful? Our guess is that it will be enormously
helpful for folks who are just getting into natural history, because
if we get it working right, it could deliver an instantaneous array of
good options for very common things. In its current state, it might
not be that useful for experienced naturalists who are identifying
less commonly-observed things, but you can imagine a future when we
have enough data for it to get really good at identifying things like
moths.

Anyway, thoughts? Bugs? Working well in some parts of the world and not others?

-ken-ichi

P.S. The demo works on most mobile browsers, which usually let you
choose an image from the camera, so this already works pretty well
outside if you have good reception. It's only submitting a small
version of the photo you take so it shouldn't eat up too much data.
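
For the curious, shrinking a photo client-side before upload is simple; a sketch along these lines (the size and quality values are guesses, not what the demo actually uses):

from io import BytesIO
from PIL import Image  # pip install Pillow

def shrink_for_upload(path, max_edge=640, quality=85):
    # Resize so the longest edge is at most max_edge pixels, then
    # re-encode as JPEG; a multi-MB phone photo drops to tens of kB.
    img = Image.open(path)
    img.thumbnail((max_edge, max_edge))  # preserves aspect ratio
    buf = BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    return buf.getvalue()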

James Bailey
Apr 25, 2017, 6:45:39 PM
to iNaturalist
I guess this is the obligatory post on being careful with some species. For instance, California poppy is just one of many poppy species that often can't be distinguished from the top of the flower.

A cool demo though!!

James Bailey
Apr 25, 2017, 6:47:22 PM
to iNaturalist
And it does give a great list of American species for photos from Australia and so on...but it is very promising!

Ken-ichi
Apr 25, 2017, 7:16:18 PM
to inaturalist
Yup, both valid issues largely due to limited data, or biases in the
data. It did not do a good job with
http://www.inaturalist.org/observations/5395632, for example, but
*most* poppy observations that go through it are going to be CA
poppies. The Australian issue is tougher. If we get vision
recommendations of species that have never been observed in the
reported location, do we show them as a way to suggest leads, or are
they just confusing? Do we attempt to infer a higher level taxon from
the results and show observation-based results of that? Still figuring
these things out.
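
One way to "infer a higher level taxon" is to accumulate the results' scores up their shared ancestor chains and report the deepest taxon that clears a confidence bar. A toy sketch of that idea (simplified ancestries, invented threshold - not how the demo actually works):

def rollup(suggestions, min_conf=0.8):
    # suggestions: list of (ancestor_chain, score), chains root-first
    best = max(suggestions, key=lambda s: s[1])
    if best[1] >= min_conf:
        return best[0][-1]  # a single species is confident enough
    totals = {}
    for chain, score in suggestions:
        for taxon in chain:
            totals[taxon] = totals.get(taxon, 0.0) + score
    # Walk the top result's chain from species toward root and stop at
    # the deepest ancestor with enough combined support.
    for taxon in reversed(best[0]):
        if totals[taxon] >= min_conf:
            return taxon
    return "Life"

results = [
    (["Aves", "Passeriformes", "Parulidae", "Setophaga ruticilla"], 0.45),
    (["Aves", "Passeriformes", "Parulidae", "Setophaga petechia"], 0.40),
    (["Aves", "Passeriformes", "Tyrannidae", "Sayornis phoebe"], 0.10),
]
print(rollup(results))  # -> "Parulidae": only confident to family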

Charlie Hohn
Apr 25, 2017, 8:40:47 PM
to iNaturalist
whaaaa! I am skeptical of course, but this could be really helpful for common species. I am envisioning something like 'suggested species ID' in a little frame or popup, or as part of identotron or something, rather than it actually offering an ID the way a user does, I think. But I'm super interested and excited to see where it goes. 

It seems way better at doing this than Leafsnap (no offense to Leafsnap, but it was made several years ago, etc). It successfully suggested crocus and burdock IDs, seemed to be helpful for a moth, and thought a picture of my newborn baby daughter from last year was a ringneck snake! (So I guess she is not a lizard.)

I will play with it more, of course.

Charlie Hohn
Apr 25, 2017, 8:44:49 PM
to iNaturalist
It did flag white fawn lily as "seen nearby" for a trout lily leaf, when the closest observation is several hundred miles away and the species definitely isn't found here. It might be good either to tighten that "nearby" radius or else have it say how far away ("nearest observation 40 miles away", etc.)
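
That "nearest observation X miles away" idea is cheap to compute; a sketch (the 30-mile cutoff is just the figure floated in this thread, and the data shapes are invented):

from math import radians, sin, cos, asin, sqrt

def haversine_miles(lat1, lon1, lat2, lon2):
    # Great-circle distance between two lat/lon points, in miles.
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + \
        cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 3958.8 * asin(sqrt(a))

def nearest_observation(target, observations, nearby_miles=30):
    # Returns the closest record and whether it counts as "seen nearby".
    dist, obs = min(((haversine_miles(*target, *o["coords"]), o)
                     for o in observations), key=lambda t: t[0])
    return dist, obs, dist <= nearby_miles

obs = [{"species": "Erythronium albidum", "coords": (44.95, -93.10)}]
d, o, nearby = nearest_observation((44.26, -72.58), obs)  # Montpelier, VT
print(f"nearest {o['species']}: {d:.0f} miles away; nearby={nearby}")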

krancmm
Apr 25, 2017, 8:53:22 PM
to iNaturalist
Pretty neat, for newer users! I tried it with 2 common species of butterflies (1 partially hidden in grass), 3 obvious moths, 2 obvious birds, and 3 plants, and the first choice was correct in each case but one plant.

I really don't like getting possibilities from other parts of the world - that, to me, is both confusing and encourages weird IDs. I'd also like to see it specified what counts as "seen nearby" - county, country, km/miles...

At least from the ones I added, since the first selection was correct in all but 1 of 10, it seems more useful to "bump" uncertain results to the next higher taxon AND also show "seen nearby" species as additional choices.

This probably needs to be tested by users who are taking rather blurry shots from a distance, where even a cropped photo may not be sufficient for an ID... I ran into a lot of that during the City Nature Challenge.

Monica

Charlie Hohn
Apr 25, 2017, 9:01:17 PM
to iNaturalist
Is there a way to run it on existing iNat observations?

Patrick Leary
Apr 25, 2017, 9:26:11 PM
to iNaturalist
Charlie, in this version of the demo the only thing you can do is upload photos. But there's nothing preventing you from uploading a photo from an existing observation. Just keep in mind this may not give the most accurate results. The algorithm has been trained with many photos from existing observations, and should be excellent at identifying those photos. So if you happen to choose a photo that was used in training, the results will be abnormally confident about the proper identification. For now, anything from the last couple weeks won't have been included in training, but that won't be the case for long.
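
Patrick's caveat suggests a simple rule of thumb for self-testing: only score the demo on photos newer than the model's training snapshot. A sketch with invented field names and cutoff date:

from datetime import date

def safe_eval_split(observations, training_cutoff):
    # Photos at or before the cutoff may have been seen in training and
    # will look abnormally confident; only newer ones are a fair test.
    train = [o for o in observations if o["observed_on"] <= training_cutoff]
    fair_test = [o for o in observations if o["observed_on"] > training_cutoff]
    return train, fair_test

cutoff = date(2017, 4, 11)  # e.g. "anything from the last couple weeks"
observations = [
    {"id": 1, "observed_on": date(2017, 3, 2)},
    {"id": 2, "observed_on": date(2017, 4, 20)},
]
_, fair_test = safe_eval_split(observations, cutoff)
print([o["id"] for o in fair_test])  # -> [2]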

Charlie Hohn
Apr 25, 2017, 9:31:44 PM
to iNaturalist
Oh good point. My idea was that it would be neat to try it on things I observed when traveling far from home, because most likely those flowers I saw in Colorado or wherever aren't that rare, I just don't know them. But they are way buried in my photo files. I'll just wait on it for now and try it out with regular photos, but thanks!

krancmm
Apr 25, 2017, 9:39:49 PM
to iNaturalist
Patrick,

All the ones I uploaded for testing were from within the last week, and I got 9 of 10 perfectly IDed. But I also use a DSLR, crop, etc.

In your testing, does the image quality make a huge difference?

Monica

Cullen Hanks
Apr 25, 2017, 10:25:34 PM
to inatu...@googlegroups.com
Charlie,

Just save the image from the obs, and run it. (For now)

This is really exciting!!

-Cullen

James Bailey
Apr 26, 2017, 12:50:31 AM
to iNaturalist
Vagrant records are unusual, so I'd support prioritizing species already well known from the country. That would solve the problem of a list of American species coming up for Australian photos; right now it doesn't seem like specifying location really matters. It did recognize a Henslow's sparrow from a photo, which was impressive, and a nearly entirely leucistic American robin. Saltmarsh sparrow gave it trouble though.

Maybe you could append similar matched species from other countries at the bottom of the page or such.

Ken-ichi
Apr 26, 2017, 3:50:31 PM
to inaturalist
Definitely more we can do with geography James, though as folks have
pointed out in this thread, the appropriate scale is always tricky. If
you're in San Diego you do want to exclude results from India but you
wouldn't want to exclude results from Mexico. You could only show taxa
that have been observed within a given radius of the target
coordinates, but how big should the radius be? It'll take some
tinkering.

Monica, regarding photo quality and cropping, this is still a pretty
big unknown for us; we haven't done much quantitative testing on how
much camera make and model or blurriness affect the results.
Anecdotally, I'm finding that they don't, particularly with taxa that
it's very good at, like lady beetles. Cropping photos so that they
mostly contain the subject does seem to make a big difference. If you
photograph a lady beetle on a flower, it often identifies the flower
and not the beetle.

Charlie Hohn
Apr 26, 2017, 4:01:12 PM
to iNaturalist
My take is it would be a good idea to reduce the radius of "nearby" by a significant margin. Most fun would be if it indicated nearest-record distance, but that may be tricky or not worth programming... I'd limit "nearby" to at the MOST 100 miles; probably 30 is better, at least in the future as our data pool increases. For plants anyway; for animals it seems to matter less since they move around. I just found it a bit odd that a plant observed in Montpelier, Vermont, in the center of the state, was showing a species as "nearby" that, as far as we know, is not found anywhere in the state. But not a huge deal, especially if the algorithm is good and Atlases further help us with out-of-range mapping.

Mike Bear
Apr 26, 2017, 10:20:38 PM
to iNaturalist
Not bad! I put in a rather obscure marine species (a tube-dwelling anemone) and it got it (mostly) right: it recognized it as an anemone.

Impressive! 

Christopher Tracey
Apr 26, 2017, 11:04:17 PM
to inatu...@googlegroups.com
I've been testing this for the past day with a number of different species and WOW! Really impressive work. It gets a lot of things correct or fairly close. I never thought this would be possible at this scale.


QuestaGame
Apr 27, 2017, 2:05:08 AM
to iNaturalist
Hi Ken, 

Just saw this. 

I want to disagree with some assumptions here, without being disagreeable.

I feel that iNaturalist has not addressed some of the concerns I've raised about computer vision; and, by not openly discussing the issue, is being somewhat disingenuous in its approach. 


When was this posted? When I search through the iNat forum, I see nothing about it, considering Alex has been testing it as a "side project" since mid-2016.

Some of the text in the post is misleading. For example, it seems to suggest that computer vision is meant to reduce the "increasing burden on a relatively small group of identifiers." It says, "fortunately, there have been major advances in Machine learning approaches..."

In fact, "fortunately there have been major advances" in other technologies as well - particularly advances in collective intelligence and online economic models. Alex and iNaturalist are simply not aware of these models. (Silicon Valley doesn't promote them). 

- in which I raised questions about the technology. I received no input from your team. I'd be interested to know if iNat made its users aware that their photos would be used to train AI, or how it might credit these users, and the experts who got them to research grade?

I'm quite aware that I could be a lone voice here, but Scott's post - https://www.inaturalist.org/pages/computer_vision_demo - says that iNat has an average response time for identifying species of 18 days. 

The QGame model has an average time of roughly 6 days, with 50% of sightings identified in 24 hours.

In other words, the technology to reduce identification time already exists. It's more cost-effective, more accurate, and has a better social impact. And it's getting smarter and better every day.

Computer vision technology could have some great applications - but it's not the best solution for the problem you've identified; indeed, I can think of several negative outcomes. That said, it could be great, for example, at categorising the millions of older sightings in digital libraries. But I don't feel - and I'm happy to explain further - that it's an appropriate solution for reducing or maintaining the identification time lag (18 days).

Put simply, there are better, more empowering solutions that will have longer-lasting, more sustainable, and socially beneficial outcomes.

So I'm hereby, for the record, officially raising an objection (sorry, Alex). 

I hope the objection won't be ignored.
 
Kind regards,
Andrew

Charlie Hohn
Apr 27, 2017, 7:49:45 AM
to iNaturalist
Just because someone doesn't respond to your post on here doesn't mean people are being disingenuous. That's not really an "agreeably disagreeing" thing to say, imho.

If you put stuff on the internet, machines and computers can see it. Even if iNat didn't do it, someone else would. Which isn't a justification in and of itself, of course, but I'd be surprised if anyone really cares that a computer used a photo to help identify others of the species. Humans do it all the time.

I don't see using iNat to support a gamification website (and tossing stuff on iNat without talking to people first) as less exploitative than using publicly posted photos to identify plants. So it's hard for me to take this too seriously.

That being said, I have been told I argue too much on here, so that's all I will add to this post for now.

Ken-ichi
Apr 27, 2017, 3:26:55 PM
to inaturalist
Hey Andrew,

Objection acknowledged! And thanks for the thoughts, I think these are
good topics, even if I disagree with you on some points. If I'm
reading you correctly, you have two specific critiques:

1) Computer vision is not the best way to reduce time to
identification (TTID), and we should instead be adopting techniques
QuestaGame has proven to be more efficient.

First, iNat is not just about reducing time to identification. Our
mission is to connect people to nature through technology. Maybe a
more gamified experience would reduce average TTID to 6 days, but
would it do a better job helping people to connect with what they're
seeing outside? If you're standing in front of a flower and wondering
how you would figure out what it is, do you think to yourself "I need
something like Shazam," or do you think to yourself, "I need something
like Candy Crush"? Personally, I think the CV approach better meets
people's expectations about how their phones will help them learn
about the outdoors. I've watched several people use iNat for the first
time and be let down when they realize it's *not* like Shazam and they
have to wait for people to take a look at their observation. I'm not
asking you to agree with me on what the better approach is, but I hope
you'll agree that it's not obvious how best to help people connect to
nature, and that it isn't necessarily about time to receiving
feedback.

Second, the potential TTID of a CV system is measured in seconds, not
days. If it takes longer than that, you ignore the flower and go back
to playing Pokemon (or, ideally, look at another flower). There are
going to be problems with accuracy, network conditions, technology,
etc, but in the ideal scenario, someone with no background in natural
history should be able to point their phone at a flower and learn its
name in an instant. It probably won't work that well all the time, but
we're talking about a different order of magnitude in terms of TTID,
so I don't think it's fruitful to get hung up on what the faster
approach is.

2) Photographers and identifiers should be credited with training the
model, or at least notified

This is trickier and definitely warrants more discussion. On the
subject of attribution, we obviously can't credit every single person
who contributed to every single classification event: there could be
thousands, and the structure of neural networks prevents us from even
knowing which training images contributed to any particular
classification. But I do think there's some potential for
incorporating (and thanking) some of the people most likely to have
contributed to a CV identification. For example, in an app, we could
have something at the bottom that says, "These results brought to you
by susanhewitt, aztekium, greglasley, and others in the iNat
community", and we could get these names from the top identifiers and
observers of taxa in the result set. Something to think about.
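
That credit line could be assembled from identification counts alone; a sketch of the idea (data shapes and counts are invented, not from iNat's database):

from collections import Counter

def credit_line(result_taxa, id_counts, top_n=3):
    # id_counts: {taxon: Counter({username: identifications made})}
    combined = Counter()
    for taxon in result_taxa:
        combined += id_counts.get(taxon, Counter())
    names = [user for user, _ in combined.most_common(top_n)]
    return ("These results brought to you by " + ", ".join(names)
            + ", and others in the iNat community")

id_counts = {
    "Hippodamia convergens": Counter(susanhewitt=42, greglasley=17),
    "Coccinella septempunctata": Counter(aztekium=23, susanhewitt=11),
}
print(credit_line(["Hippodamia convergens", "Coccinella septempunctata"],
                  id_counts))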

On the subject of using people's data for something they didn't
expect, I think that's terra incognita and I look forward to reading
the white paper you guys are working on. Personally I agree with
Charlie, and I don't really care if a computer is looking at my photos
to learn how to identify plants, since humans do it all the time, and
the computer will eventually use that knowledge to help out other
humans in their own learning process. I can see how some people might
be freaked out by it though, so if anyone out there feels that way,
let us know why! Again, we're still developing this, so the more we
know how people feel at the early stages the more we can accommodate
that feedback.

-ken-ichi

QuestaGame
Apr 28, 2017, 12:30:12 AM
to iNaturalist
Hi Ken-ichi,

Thanks for this.

I find it ironic that of all people, I'm the one objecting. QGame, after all, certainly stands to gain from this tech.

But I think this is an important issue of our time - and gets to the heart of our mission - "connecting people to nature through technology."

I'll try to keep this brief by summarising what I think are your points (correct me if I'm wrong):

1) The question is AI vs Gaming

Gaming is not part of it. In fact, QGame's ID system is not gamified. Rather, it's AI vs. CI (collective intelligence). So I don't think the Shazam<>Candy Crush analogy is relevant here.

2) AI is orders of magnitude faster than a CI solution

This is not true. CI (collective intelligence) is faster now and is moving toward instantaneous feedback - even faster (and far more accurate) than AI is likely to be when you factor in the requirements on the user to make the data AI-intelligible. 

3) A computer vision technology that creates a "Shazam for Nature" can help "connect people to nature through technology."

Apart from your own observations, do you have evidence for this? I haven't seen it; but I've read studies in communications science that suggest the opposite effect. 

Taking your points and mine above - I do wonder if there's a middle ground, combining AI and CI in a smart way to connect people to nature. As Scott has written, iNat has a "relatively small group of identifiers." Maybe we need to address that? What if CI could vastly increase that number? 

4) Copyright and AI-training is still "terra incognita"

We're not the first to arrive at such terra (I think of Captain Cook's voyage; I think of pre-public Google). AI has been around a long time. I sometimes wonder, when people's jobs are involved (especially tech people like us), if the ethical thing is often the inconvenient thing - and therefore left undecided.

Which brings us to some important questions I have for you and iNat:


1) Who owns this technology?

2) What potential negative impacts have been identified? (Surely there are some)

3) Who has rights to use it?

4) Who has rights to shut it down if its impacts prove negative? (Could it even be shut down?)

5) Now that our images and expertise have been used to train the computer vision, do we have any rights over the use? Will private companies such as Facebook, Google, Amazon etc get to use it? How do we know? If they can use it, will they reimburse us?

6) Are there better ways to use AI (I think there are) - e.g. ways to increase the expert pool and people's engagement with nature - and if so, how can we best work to develop these?

7) Have legal and/or ethics experts been consulted by your team? What did they say?

Ken-ichi - I don't think it's about some people being "freaked out." I just read this article in Wired today - worth the read - https://www.wired.com/2017/04/hey-computer-scientists-stop-hating-humanities/

Ultimately, I haven't seen enough discussion about all the other ways AI might be used to improve the iNaturalist experience to justify the trendy "Shazam of nature" approach.

It seems to me there's so much more that can be done, at lower cost, with less risk of negative impacts, far greater potential to connect people with nature, and more sustainable outcomes for people and the environment in the future.

Thanks for listening and keen to hear a broader pool of responses from people of diverse backgrounds and disciplines. 

Cheers,
-A

Charlie Hohn
Apr 28, 2017, 7:44:16 AM
to iNaturalist
I am going to take a crack at a better response to this now, because the baby slept through the night and I feel not one bit ornery.

(To be clear, I am a longtime iNat user and a curator but am not admin in any way).

I think, to your points about AI and these kinds of algorithms, they are VERY good questions - when looking at the Internet as a whole. Our faces are online all the time, for many people, and even people who try to avoid it are now on various security cams and such. We are very close to the world depicted in the movie "Minority Report", where a camera or scanner recognizes you when you walk into a store and gives you personal ads. This also pretty much removes any expectation of location privacy. And since we are currently (imho) moving towards the corporate-dominated dystopia rather than some other one, that has some huge implications in terms of corporations getting much, much more powerful and intrusive in our lives. Because it happens relatively slowly and because they also offer "free" products, we don't really notice it as much as we should. So yeah, in a perfect world where corporations didn't control the government, we'd probably have some strong rules as to how this tech could be used, and it would be monitored and restricted.

In the world we live in, we are barreling full force into this. Which probably isn't a good thing. That being said, using this annoying, intrusive technology to IDENTIFY SPECIES strikes me as almost a countermeasure. Taking something potentially harmful and deriving a great deal of good from it. Increasing public understanding of and connection to nature, like Ken-ichi said. Literally moving towards the "tricorders" of Star Trek. And on a weird hypothetical scifi note, if we really do someday move towards some sort of "singularity" where the AI (or AIs) becomes self-aware... I'd WANT the AI to have some sort of connection to and knowledge of the planet and life it is abstractly and distantly but still crucially connected to. I just don't see the downside here.

In terms of image privacy... make no mistake: ANY image you post on the Internet is getting combed through by dozens if not thousands of these bots. Your wedding pictures... security camera images on connected devices... landscape shots and scenery, aerial photos, and yes, photos of plants and animals. Regardless of whether iNat does this, other bots will be combing through iNat photos. Is it ethical? Maybe not. But that's what the Internet does now. Us not doing it doesn't mean others won't. There's literally no way to prevent it other than not using iNaturalist or ANY other website that has photos of any kind. So along with our spying computers, I would rather have a world where Google Street View can identify plants as they drive, and where our smartphones can not only identify a plant but connect someone to a community of many others in the area excited about that plant. From both a conservation data standpoint and a cultural standpoint, the benefits of those far outweigh the costs.

On my other note... I admit I don't fully understand QuestaGame. I am not at all opposed to it connecting to iNat. However, my first introduction to it was when a bunch of observations were being posted, then deleted right after they got an ID - a huge breach of etiquette on iNat and not a good-faith part of the community. I don't bring this up because I am still mad about it; I don't think it was meant that way. However, I am truly confused as to how allowing an algorithm to see publicly posted photos is bad, but accessing an Internet community to get IDs without a detailed introduction posted to the entire community first is good. In terms of AI vs CI scenarios, I am also a bit confused. You are making it sound like QuestaGame was banned from iNaturalist or something, and I didn't think that was true. How is any of this getting in the way of you being able to use CI? Why do you feel threatened by this in particular?

To be clear again, I hope QG succeeds; I hope someday it expands to New England or CA plants so I can generate some donations from iNat, etc. But I hope it succeeds as a member of this community who is an active participant and who wants overall growth, rather than something that feels like it needs to "defend" its own interests from "competition" from other sources of species ID. I don't think we want that kind of thing. The more observations and accurate IDs the better. I'm not yet compelled by your argument that this is a bad thing. And I find your approach a little pushy. Like I said, I sometimes get too riled up in here and need to take breaks/step away so I don't annoy people with excessive opinions. But I've been part of this community since 2011. You have been a very detached part of this community for much less time, and appear to be coming in with very strong opinions that we as a community should not implement this new feature. I am having trouble understanding why, or for what reasons other than self-interest... I believe you have valid concerns, but I can't figure out what they are!

Anyway hopefully this was a better post. Again I don't want to discourage you or QG from being a part of iNat. But I think if you want to have a large influence on policy here, more so than me or any other user, you need to somehow formalize the connection with the site like other data partners do and have a well-written FAQ and description of what you are and what you are doing. I for one am still confused about that.

Thank you!

Scott Loarie
Apr 28, 2017, 8:55:43 AM
to inatu...@googlegroups.com
My 2-cents:

I walked around the natural part of the New York Botanical Garden yesterday with super-botanist Daniel Atha (http://www.inaturalist.org/people/20600),
testing out the computer vision demo on a handful of observations: http://www.inaturalist.org/calendar/loarie/2017/4/27

New York is a place where I have pretty much 0 plant knowledge, so the interaction was mostly like this: 
Daniel: "oh here's a good one, try it on this"
Me: "it says Prunus serotina?"
Daniel: "that's right!"

I have to say, we were both pretty floored. It got nearly all of them right. 

It even got this Sedge right, which is still bizarre to me:

And these trees just from the bark:

It did totally fail on 3 (14%) like this one: 

But all in all, both Daniel and I were amazed by how accurate this machine learning technology is. 

As Charlie said, this technology is here like it or not. I guarantee you you're going to see a hundred AI-only identification apps coming online in the next couple years. Pretty soon, your phone will come with an all-purpose identifier built in that identifies everything from cars, to people, to species.

But, there are a ton of reasons why these AI-only identifiers are going to get things wrong. This means a lot of these apps will be misleading at best and at worst will make it harder to learn about nature and create high quality data for science. That's why I think we have a real opportunity here to thoughtfully integrate AI into the existing iNaturalist CI (collective intelligence, as Andrew says) to provide a better system for teaching people about nature and vetting data quality than either could do alone.

For example, when the AI is wrong, like this Fumewort (http://www.inaturalist.org/observations/5989406), which it thinks is a Geranium, I had Daniel standing next to me to immediately correct me, but without him I probably would have chosen Geranium (actually maybe I'll give myself more credit than that, but some people would). We have to make sure the AI isn't training itself on its own mistakes (e.g. letting the AI take a Fumewort image that it improperly labeled as Geranium and use it to teach itself what Geraniums look like). As poorly AI-labeled images of species start rapidly populating your Google image search, iNat could be a place where we can be more rigorous about, and confident in, the labels because they've been checked by people too.
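
The feedback-loop guard Scott describes can be as simple as refusing to train on labels that lack independent human support; a sketch with invented fields, not iNat's actual pipeline:

def human_vetted(observations, min_independent_ids=2):
    # Keep observations whose label is backed by at least N identifications
    # made by people, ignoring any that merely agreed with the machine's
    # own suggestion - so the model never trains on its own mistakes.
    vetted = []
    for obs in observations:
        independent = [i for i in obs["identifications"]
                       if not i.get("from_cv_suggestion")]
        if len(independent) >= min_independent_ids:
            vetted.append(obs)
    return vetted

observations = [
    {"id": 1, "identifications": [{"user": "a"}, {"user": "b"}]},
    {"id": 2, "identifications": [{"user": "c", "from_cv_suggestion": True}]},
]
print([o["id"] for o in human_vetted(observations)])  # -> [1]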

Fortunately, with iNat, we can use the existing CI to make sure that the AI is learning from people with expertise and not from itself. I agree with Andrew that we've only scratched the surface of how CI technology could best be put to use on iNat, and these advances in AI put all the more pressure on us to bring CI forward ASAP as well. For example, IMO we need to shore up "research grade" to make it quantitative (e.g. RG means >=99.9% accuracy or something) so we can be more rigorous about the quality of IDs and how AI is influencing things (or not). That's the goal behind http://www.inaturalist.org/pages/identification_quality_experiment, which is rolling along: to start coming up with a more data-driven understanding of CI on the site so we can improve it.

Similarly, it's one thing to have an app like this work in New York City, where we have tons of data to train up these models. I bet a similar trial run of the computer vision demo in Panama would be pretty disappointing (someone please try and report!). For these models to work well in the parts of the world with the most biodiversity and the least data, we first need a lot more observations and, more importantly, a ton of identifications provided by knowledgeable naturalists! Without more observations and IDs provided by real people, the AI won't have the training data it needs to perform well for these taxa/regions.

I guess I just don't see these as competing things; AI will suck without CI. We have an opportunity with iNaturalist to build an AI that is more thoughtfully integrated with CI and thus more accurate and useful. This will likely be even more important as other sloppy species-identification AIs appear and proliferate (they will).

But also, I think it's hard to argue that AI can't play a role in technology that helps people become better naturalists and create high quality data for science. These are the goals of iNaturalist; if AI can help us better achieve them, then I'm all for it.

In case anyone is interested, here are the taxa from my trial run that the computer vision demo got correct to species:

lesser celandine Ficaria verna
eastern skunk cabbage Symplocarpus foetidus
tulip tree Liriodendron tulipifera
American Bladdernut Staphylea trifolia
jumpseed Persicaria virginiana
multiflora rose Rosa multiflora
yellow trout lily Erythronium americanum
Small White Leek Allium tricoccum
American sweetgum Liquidambar styraciflua
Cow Parsley Anthriscus sylvestris
common blue violet Viola sororia
red deadnettle Lamium purpureum
Pennsylvania Sedge Carex pensylvanica
American Hornbeam Carpinus caroliniana
black cherry Prunus serotina

to genus:

Japanese angelica tree Aralia elata
Pignut Hickory Carya glabra
Choke Cherry Prunus virginiana

to family:

Apples Genus Malus

and completely failed on:

star magnolia Magnolia stellata
Bur-cucumber Sicyos angulatus
Incised Fumewort Corydalis incisa




--
--------------------------------------------------
Scott R. Loarie, Ph.D.
Co-director, iNaturalist.org
California Academy of Sciences
55 Music Concourse Dr
San Francisco, CA 94118
--------------------------------------------------

Paul Bailey
Apr 28, 2017, 10:47:16 AM
to iNaturalist
Hi Scott,

I'm in South Korea and started playing around with the Computer Vision Demo today. Is there a specific way you'd prefer to see feedback -- a certain number of observations to start with, for example?

Outside of a few areas - mainly birds - I really only get help from a couple of iNaturalist users (whom I greatly appreciate), so being able to receive feedback from an AI program could be rather helpful. For identifications of things with many different groupings (as a non-expert, beetles come to mind here), it seems like it would be rather useful for narrowing the range from which to search for more information on my own.

Julien Renoult
Apr 28, 2017, 12:03:01 PM
to iNaturalist
Hi all. Congrats to the entire iNat team for this great advance. Even if the tool is not working perfectly now, this is definitely a promising way to go forward in species identification and a great way to do citizen science. To those who remain skeptical about computer vision, I just want to say CV has now become incredibly powerful in object/species/individual recognition; the critical point that determines its success is the number of training images. Personally, this tool motivates me to post more pictures of even not-so-rare species (I have to admit I am a "lister"), and to do more identifications on other observations.
Good job!

Ken-ichi
Apr 28, 2017, 2:46:59 PM
to inaturalist
I want to reply to some of Andrew's specific questions that haven't
been addressed.

> 1) Who owns this technology?

The California Academy of Sciences, a 501(c)3 non-profit organization
in California, USA.

> 2) What potential negative impacts have been identified? (Surely there are some)

First, inaccurate identifications, which have all kinds of other
implied negative outcomes: identifying humans of certain races as
animals and making them feel like the subject of racism (this has
already happened with Google), people eating deadly mushrooms b/c they
were misidentified as edibles, people killing harmless organisms
misidentified as pests, etc. I think these are all avoidable if we are
honest about the limitations of this technology: it is not right 100%
of the time, so we should never present it as some all-knowing oracle.
It's a tool that gives you suggestions based on image similarity, just
like a graphing application can show you a trend in a series of
numbers. Plus, these are all potential problems with our existing
approach to identification.

More specifically, you could imagine a "robots are coming for our
jobs" version of this where we just assume the AI is always right and
let it override human opinions, and gradually push all the humans out
of iNat until it's just a monolithic natural history Skynet. That
would be exploitative, complete mission failure for us (can't connect
people to nature if you drive all the people away), and frankly
counterproductive if we *wanted* a perfect identification robot, for
all the reasons Scott described above, b/c we don't and may never have
enough training data to achieve that goal.

> 3) Who has rights to use it?

Legally, anyone the California Academy of Sciences chooses to grant
such usage rights. Effectively that means we, the iNat team, are
granting that right, just as we do with the rest of iNat.

> 4) Who has rights to shut it down if its impacts prove negative? (Could it even be shut down?)

See 1, and yes, it can be shut down.

> 5) Now that our images and expertise have been used to train the computer vision, do we have any rights over the use? Will private companies such as Facebook, Google, Amazon etc get to use it? How do we know? If they can use it, will they reimburse us?

If you're imagining a situation where we sell a service based on our
CV model to someone you don't like (a national military, some horrible
ag company, etc), that's totally possible, and an example of the legal
terra incognita I spoke of before. My not-a-lawyer suspicion is that
in the US, if the purveyor of such a service is not actually
re-publishing the creative works that constituted the training data,
then there is no violation of copyright. Other jurisdictions have
different copyright interpretations, but I strongly suspect that this
issue is well ahead of the law in most places. This is certainly
something to consider, but I don't think anything is settled legally,
nor do I think it's a situation where there's a clear right or wrong
way to do it.

Regarding notification and reimbursement, we don't know yet, but
consider that almost every single commercial website and app you use
is already doing something similar by aggregating, analyzing, and
packaging your behavior on their services and selling it to others who
want to understand behavior to better sell things (or run political
campaigns, etc). You get reimbursed by getting to use their services
for free, and you don't get any notification beyond the warnings in
their Privacy Policy and Terms of Service. Andrew, you are using
GMail, so every word you type is being used by Google to sell ads (or
if you've opted out of that, just your usage is being used to sell
ads) and they are not crediting you or reimbursing you with anything
other than their service, yet you continue to use it. Furthermore,
this is a public forum (just like iNat) so who knows who is using our
words or for what purpose.

I have personally been approached in the past by companies seeking to
purchase such data from us, and I have turned them away because I
think that's creepy and I don't think any iNat user thinks that's part
of the deal. To be clear, we don't sell your personal data. Frankly,
though, I think using photos as training data for an AI is
categorically different. It doesn't allow people to discover that
25-year-old women in California who watch Game of Thrones are 50% more
likely than average to vote libertarian or something. Instead it might
show some biases like "photographers in California don't photograph
harvestmen much OR the iNat community is bad at identifying harvestmen
photos to species," which seems relatively harmless. I'm also not
opposed to selling a service like this, not to Monsanto (ew), but
maybe to researchers operating camera trap arrays or something.

> 6) Are there better ways to use AI (I think there are) - e.g. ways to increase the expert pool and people's engagement with nature - and if so, how can we best work to develop these?

Happy to hear your ideas. One way is, like Scott pointed out, reducing
the flood of common species so experts can focus on the rarities the
AI hasn't seen enough to know about.

> 7) Have legal and/or ethics experts been consulted by your team? What did they say?

Not yet, but that doesn't imply that we're not thinking about these
things, or ignoring them because they're inconvenient. Keep in mind
that those of us who work on iNat also use iNat (cue the Hair Club For
Men jokes...), and we have all these same concerns ourselves. For me,
the most important ethical aspect of all this is that we are using the
data naturalists have produced to serve naturalists, or better yet to
recruit new naturalists. We are not exploiting people to get rich or
inflate our egos.

-ken-ichi

jesse rorabaugh
Apr 28, 2017, 3:38:23 PM
to iNaturalist
Five more years of development and this becomes absurdly powerful.
  • How long until you can fly a drone over a park, take high resolution photographs, and identify and map the vast majority of plants with the click of a button?
  • How long until it is released on Google Street View to find every identifiable picture of an organism?

There are probably some ways to use this for evil, but it is an inevitable outcome of the absurd power computers have these days, so it doesn't seem worth fighting.

Charlie Hohn
Apr 28, 2017, 3:49:45 PM
to iNaturalist
They have already used spectral analysis to identify trees over the Amazon. I suspect that with a few grad students good with modeling, two or three good drones, and a nice foliage season with clear weather, one could probably map 90% of the tree species in a Vermont park to 90% accuracy. The darn things color-code themselves, to say nothing of what they look like in spectra the human eye can't see, phenology timing, shape, texture, etc.

One of the things I do for a living is map that stuff, so yeah, maybe a robot drone will steal my job. But right now I have so much work to do with so few resources that I can afford to share some work with a drone or two or 50. At the end of the day, as Ken-ichi was touching on, human naturalism and conservation is ultimately oriented towards things humans value, so barring sci-fi stuff like sentient AI (and maybe even in that case), you will always need humans on the ground to do human-based field naturalist, ecology, education, and conservation jobs. And maybe instead of counting sedge dongles in a plot, I can have a drone do that and spend more time doing what the human brain evolved over countless millennia to do: look at data and employ our powerful senses and intuitions to interpret, share, and ultimately positively interact with the ecosystems we share the world with. Considering that is literally what our species is optimized to do, I think it will actually be the last job a robot is able to take from us. And by the time we are at that point, if we make it that far, our species will be a totally different thing altogether anyway.

AfriBats
Apr 28, 2017, 3:54:53 PM
to iNaturalist
Fascinating development, interesting discussion. I think Andrew should be given credit for bringing up some of the issues in his other post, to which there was no reaction at all. I think he raises important issues (as do others here), but it's probably not about who's been using iNat for how long...

Also nice to see that QG rewards iNat's expertise. Looks like there's plenty of room to explore and develop these things together.

Jakob

swhit...@yahoo.com
Apr 28, 2017, 9:35:29 PM
to iNaturalist
Wow, this tool is amazing. I just did a test on 35 different moth species that I've photographed in the last two weeks.

It correctly identified 23 photos to species (the first suggested species in the results list was correct.)

The correct species was near the top of the list for 3 photos (correct species was 2nd, 4th, and 5th in the results list.)

It correctly identified 1 to genus Plagodis, 1 to family Geometers, 4 to subfamily Ennominae, and 1 to superfamily Owlets and Allies. There were only 2 identified to incorrect moth groups.

That's pretty amazing, especially considering that some of these were tricky or uncommon species.

James Bailey
Apr 29, 2017, 12:25:21 AM
to iNaturalist
I managed to trip it up with Cassin's vireo from Google Images. It said it was plumbeous :)

Scott Loarie
Apr 29, 2017, 8:53:56 AM
to inatu...@googlegroups.com
Interesting - did you include a location? Cassin's vireo, plumbeous vireo, and blue-headed vireo all used to be considered the same species, "solitary vireo," and were split largely based on range. I would think that's an example where location would help the demo distinguish them.

On Fri, Apr 28, 2017 at 9:25 PM, 'James Bailey' via iNaturalist <inatu...@googlegroups.com> wrote:
I managed to trip it up with Cassin's vireo from Google Images. It said it was plumbeous :)


QuestaGame
Apr 29, 2017, 8:54:58 AM
to iNaturalist

Thanks, Ken-ichi (and others). 


Great discussion. Appreciate the answers, responses, participation - all the time you’re taking. (Charlie, you’re up late!). 


I know everyone’s busy and the last thing we want these days is more to think about.


I think Scott is spot on with his comments re the CV tool. The philosopher Daniel C. Dennett puts it nicely:


“The real danger [of AI] is that we will over-estimate the comprehension of our latest thinking tools, prematurely ceding authority to them far beyond their competence…”


Personally I see tremendous potential with AI. I love playing around with machine learning tools like TensorFlow; and btw Ken-ichi, GMail believes I’m a 90-year-old mermaid (fun to fool the ad algorithms!). 


I’ve also tested at least a half dozen “Shazam for nature” apps before this one. And Scott is right - there's plenty more afoot. They get good publicity. The media laps it up.


In a recent survey, we asked this question to users of one such CV tool - Merlin Bird ID:


“If Merlin Bird ID gave you a choice between (a) training the AI how to identify birds and, (b) in the same manner, with equal ease, training less advanced users how to identify birds, which would you choose?”


Sixty-five percent of the respondents said they’d rather train less advanced users.


Even if we disregard this survey, I think we have to ask - while we’re developing some cool CV, to what degree are we also implementing functionality that encourages people to train people? 


In this month alone, private venture funds have invested $363 million in AI/machine learning. Each month the amount goes up. That's quite an AI tidal wave on its way.


If AI distracts us from enhancing tech that gets people teaching people about nature, this could be a negative outcome. 


Let me be clear - iNat is a major achievement. It's the biggest and most advanced naturalist data collection system in the world. A result of very hard work, by dedicated people. And with this achievement, of course, comes risks and responsibilities - which is why I think the team looks so fit - the growing weight on their shoulders. :-) 


There are also people in the iNat community who can probably identify all 13,750 species in Scott’s dataset. (I remember us marvelling, Ken-ichi, over one such person - and Chris, if you think this CV tool is impressive, then experts like this are wonders of the world!).


I think we’re all agreed that iNat - and society in general - would benefit from more people with knowledge like this. No? 


Which means we should be careful not to take these minds for granted. With iNat my sense is that CAS is funding coders - would you agree? - but not the experts who are providing the nature knowledge. Or am I wrong? Meanwhile, here we are with scientists marching in the streets. Half the Great Barrier Reef has perished. The highest rate of extinction in 66 million years. The message of iNat's CV investment - to the next generation at least - is that we value coding skills. Whereas CI investment (a drop in the ocean compared to AI) values nature expertise.


The influence of iNat should not be underestimated. Perhaps QGame's tech is a kind of balancing off-set to CV? (A kind of Ed-tech play?). I hadn’t thought of it that way - but it could be the case. And maybe, Charlie, that's my concern: We have 50,000 kids here who are ready and eager to learn how to identify all 13,750 species (I suspect a few of them could do it now). They’re quick learners. It's easy for them. But maybe the message we/iNat/CAS are sending is that learning to code is more valuable than learning to identify species.

I’m not saying that encouraging kids to learn coding can’t also help connect them to nature. But I wonder - and I appreciate your acknowledgment, Scott - that without the CI element, if there's a danger, instead of “connecting humans to nature through technology,” we might end up doing something different, like connecting technology to nature through humans.


P.S. Scott - btw, re http://www.inaturalist.org/pages/identification_quality_experiment - we’re working with the Australian National University and others on a similar project. It could be interesting to test different "crowd" systems - to see what elements of CI, if any, make a difference in speed and accuracy.

Charlie Hohn
Apr 29, 2017, 9:09:35 AM
to iNaturalist
Thanks for a great response. I actually think the difference between our views here is that I don't think this little thing is taking away from or diminishing the phenomenon of experts helping novices or training people. That's the heart of iNat, which has been growing exponentially, and this AI thing is just a tiny side project. From what I can tell it was funded by outside sources, not CAS. If iNat announced they would stop developing everything except the AI, yes, I'd be pissed too, but I wasn't seeing that. And I wasn't seeing anything about iNat telling kids to code instead of learning plants; I'm not sure where that came from. Like some others here I think the tech would be really helpful if used well. As for the focus of iNat, there are ways I would prefer it be a little bit different, as I've posted about here before, but you can't give everyone everything they want, and there are so many users now! Overall I think the devs are doing an amazing job and the community is too. I can't believe this common species ID thing would harm the community.

To address another comment, I wasn't trying to say people who had used iNat longer should have more say in what happens! It isn't some sort of pecking-order thing. I was just saying it would be good to be very familiar with the community before leveling certain criticisms at it - for instance, that people are being disingenuous. But I understand what you are saying now, and even if I don't really agree, I think it's important to think about.

Now, time to go outside! It's finally spring :)

snuroo
Apr 29, 2017, 10:28:02 AM
to iNaturalist
A very stimulating conversation - feels like the edge of something quite extraordinary. I just want to wade in with something practical - will we be able to add multiple photos per observation? For humans, plant ID is so much easier when you have shots of habit, flowers, leaf arrangement, leaf shape etc. Can the AI handle that? Fascinating stuff, great work guys.


Charlie Hohn
Apr 29, 2017, 1:01:21 PM
to iNaturalist
I just put in a plant from Paraguay with no idea at all and I am pretty sure it got at least the genus right. I was pretty shocked.

Ben Phalan
Apr 29, 2017, 2:56:45 PM
to iNaturalist
This is an exciting development! All credit to those raising important ethical questions - they deserve discussion. One can ask some similar questions of iNaturalist even without this technology. Does it make us more likely to reach for a smartphone rather than simply observing and appreciating the species around us? This is something I worry about, but on balance I think the benefits far outweigh the costs, in allowing people to access such detailed information about the species around them, and to get to know them more easily than they ever could before. I certainly appreciate the species around me more for getting to know them more intimately.

I tried the demo for some species from Brazil, with mixed results - see below. I'm sure this will improve as we get enough observations of these species to train the algorithms. For now, I guess none of the species (and in some cases, probably no species in the genus) has sufficient observations in the database.

For ten birds (the ten for which I have most recently taken good quality photos), it got correct family-level ID for one, correct order-level ID for 4, and did not suggest an ID for 3 (for one of these, the first result had the correct genus; for another, the correct family; and for the third, the right family was represented in the top ten). It suggested one incorrect genus-level ID (for a hummingbird), and one incorrect family-level ID (placing a tyrannulet in the New World Warblers - which is not unreasonable for a newbie to neotropical birds!)

For ten invertebrates (mostly butterflies), it got (as far as I can tell) correct genus-level IDs for 4, correct subfamily for 2, and correct order for 2, and did not suggest an ID for 1 (the top result was in the wrong order, so that was a good call). It got one completely wrong, suggesting a jumping spider genus for a photo of a velvet ant.

I'm impressed by how good it is, and it is striking a pretty good balance between overconfidence and overcaution. One suggestion is that it could make use of the taxonomy tree to highlight species it has not learned yet (e.g. when it identifies a Lasaia sp. it could let us know there are 7 Lasaia species in the taxonomy tree, of which it only has enough images to know one).
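
Ben's coverage idea only needs the taxonomy tree and the trained species list; a sketch (the species names and counts are invented for illustration):

def coverage_note(genus, taxonomy, trained_species):
    # taxonomy: {genus: [species, ...]}; trained_species: set of species
    # the model has enough photos to recognize.
    species = taxonomy.get(genus, [])
    known = [s for s in species if s in trained_species]
    return (f"There are {len(species)} {genus} species in the taxonomy "
            f"tree, of which the model only knows {len(known)}.")

taxonomy = {"Lasaia": [f"Lasaia sp. {i}" for i in range(1, 8)]}
print(coverage_note("Lasaia", taxonomy, {"Lasaia sp. 1"}))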

Another refinement could be for it to "know" which geographic and taxonomic areas iNat has less comprehensive coverage for, perhaps by using species accumulation curves, and to be more cautious in such areas. In the neotropics, observations in iNaturalist represent only a small fraction of the species present, whereas for birds or butterflies in North America, I'd guess that most species are represented by now. It would also be nice to be able to provide feedback by letting the system know when it suggests something wrong.

Very interesting. I look forward to seeing it develop.

Detailed results - birds:

Crescent-chested Puffbird (Malacoptila striata) - “We're not confident enough to make a recommendation, but here are our top 10 results” Top result: White-whiskered Puffbird (Malacoptila panamensis)

Chestnut-capped Blackbird (Chrysomus ruficapillus) - “We're pretty sure this is in the order Perching Birds” Top result: Brown-headed Cowbird (Molothrus ater)

Masked Water-tyrant (Fluvicola nengeta) - “We're pretty sure this is in the order Perching Birds” Top result: Northern Wheatear (Oenanthe oenanthe)

White-eyed Parakeet (Psittacara leucophthalmus) - “We're not confident enough to make a recommendation, but here are our top 10 results” Top result: White-fronted Parrot (Amazona albifrons)

Picazuro Pigeon (Patagioenas picazuro) - “We're pretty sure this is in the family Pigeons and Doves” Top result: Rock Pigeon (Columba livia)

Sapphire-spangled Emerald (Amazilia lactea) - “We're pretty sure this is in the genus Cynanthus” Top result:  Broad-billed Hummingbird (Cynanthus latirostris)

Planalto Tyrannulet (Phyllomyias fasciatus) - “We're pretty sure this is in the family New World Warblers” Top result: Orange-crowned Warbler (Oreothlypis celata)

White-barred Piculet (Picumnus cirratus) - “We're not confident enough to make a recommendation, but here are our top 10 results” Top result: Shining Bronze-Cuckoo (Chrysococcyx lucidus) [plus several woodpeckers further down]

Yellow-olive Flycatcher (Tolmomyias sulphurescens) - “We're pretty sure this is in the order Perching Birds” Top result:  MacGillivray's Warbler (Geothlypis tolmiei)

Flame-crested Tanager (Lanio cristatus) - “We're pretty sure this is in the order Perching Birds” Top result: Bronzed Cowbird (Molothrus aeneus) [several New Zealand endemic bird species in the top ten]

Invertebrates

Skipper butterfly (species unknown) - “We're pretty sure this is in the genus White-Skippers” Top result:  Laviana White-Skipper (Heliopetes laviana) [looks like the right genus to me]

Nymphalid butterfly - “We're pretty sure this is in the order Butterflies and Moths”. Top result: Polyphemus Moth (Antheraea polyphemus) [not really similar]

Lycaenid butterfly - “We're pretty sure this is in the order Butterflies and Moths” Top result:  Juniper Hairstreak (Callophrys gryneus) [which is the correct family]

Hesperid butterfly - “We're pretty sure this is in the subfamily Spread-wing Skippers” Top result:  Glazed Pellicia (Pellicia arina) [not this species, but the subfamily seems correct]

Lycaenid butterfly - “We're pretty sure this is in the genus Hemiargus” Top result: Ceraunus Blue (Hemiargus ceraunus) [genus seems correct]

Metalmark butterfly - “We're pretty sure this is in the genus Lasaia” Top result: Blue Metalmark (Lasaia sula) [not this species, but genus appears to be correct]

Callicore sp. butterfly - “We're pretty sure this is in the subfamily Tropical Brushfoots” Top result: Anna’s Eighty-eight (Diaethria anna) [not this genus, but correct subfamily]

Large centipede (species unknown) - “We're pretty sure this is in the genus Scolopendra” Top result: Vietnamese Centipede (Scolopendra subspinipes) [seems reasonable]

Iridescent green bee (species unknown) - “We're not confident enough to make a recommendation, but here are our top 10 results” Top result:  Drone Fly (Eristalis tenax)

Velvet ant (species unknown) - “We're pretty sure this is in the genus Phidippus” Top result:  Cardinal Jumper (Phidippus cardinalis). [Wrong Class! But two velvet ants did appear in the top ten results]


I also threw in a (rather poor) photo of a mystery insect. I have no idea even what Order this is from. The system placed it in Diptera, but I don't think it is. Any suggestions from the collective intelligence model gratefully received! https://www.inaturalist.org/observations/3644911

QuestaGame

unread,
Apr 30, 2017, 12:27:55 AM4/30/17
to iNaturalist

Charlie - yes, CAS has paid for the coding. It hasn’t paid (as far as I’m aware) for super-botanist Daniel Atha to help with Scott's testing in New York, or am I wrong? And if it has, what about all the other super-naturalists on iNat who have helped train the AI? Is it paying for them? One thing I’m confident in - it’s at least as hard to become a super-botanist as it is to become a machine-learning coder.


The Aboriginals in Australia learned this lesson the hard way, and are doing all they can to ensure it doesn’t happen again. Come visit us in Oz; we’ll go to AIATSIS (aiatsis.gov.au), and you can see all the regulations that have resulted from centuries of exploiting people’s intellectual rights. 


Looks like the rest of us may need to learn the lesson as well. And yes, we can be defeatist about it. All you have to do is compare the two studies going on here - Scott’s ID Quality Experiment vs the Google-sponsored CVPR 2017 - and you can see the massive disparity in financial commitment.


But I’m really encouraged by this discussion. Clearly we humans, being open and inclusive and loving each other, have the ability to think things through and offer creative solutions. It will probably take us at least 10 years to figure out the lessons of what we’re doing now. (Communications scientists, btw, predicted the “Fake News” problem at least a decade before it happened). I'd be happy to draw up some proposed guidelines for people to consider. I hope, if anything, we’re spawning some fresh thinking with this thread.


Thanks again to everyone for sharing and listening.


-Andrew

Charlie Hohn

unread,
Apr 30, 2017, 7:59:55 AM4/30/17
to iNaturalist
it seems you are assuming us botanists are being somehow secretly scammed out of IDs instead of, um, willingly providing them? No one is under the impression that we are going to get paid. We do it because we want to. If you can find a way to get us paid that's great, but considering we barely get paid for doing actual professional botany, I'm not holding my breath. Anyway, why not let us make our own decisions? I help ID plants because I want to increase the pool of knowledge and help people. This doesn't interfere with that. Others may have other reasons, but those are theirs to choose.

Drawing a line from me or other botanists being 'exploited' for IDs used to design an algorithm to the horrible exploitation of Aboriginal people is pretty darn sketchy in my opinion. No one has systematically committed genocide against botanists that I am aware of.

The lack of money for botany isn't because other science is stealing it. The lack of money is because our economy is broken and rigged, and money is going to crap like lawns, to letting rich people hoard massive amounts of money, and to bombing people. I personally don't believe we should squabble for crumbs within science. Instead of getting mad that a coding project is funded better than my wetlands monitoring, I look at the actual root cause - that bomb they dropped on Afghanistan, I'm pretty sure, could literally fund my work for a thousand years.

Charlie Hohn

unread,
Apr 30, 2017, 8:15:59 AM4/30/17
to iNaturalist
so let me ask you this, Andrew. Are you making any money at all from the work you are doing, including iNat IDs? When QuestaGame sends money to iNat, do you get some too? My photos are licensed non-commercial. I don't care if a research algorithm uses them, but I'm starting to wonder if I want QuestaGame using them.

With all due respect, I'm more worried about QuestaGame drawing up guidelines on how to use iNat data than I am about photo algorithms. Andrew, I urge you to step back a bit. I understand that you have strong feelings about this and maybe feel that it threatens your project, but other than you, the response from the iNat community has been mostly positive. Please don't come into our community and start telling us how we should use our own data.

Paul Bailey

unread,
Apr 30, 2017, 10:19:12 AM4/30/17
to iNaturalist
Some results from South Korea:

1. http://www.inaturalist.org/observations/4571275
Butterfly: Asian Comma (Polygonia c-aureum)
"We're pretty sure this is in the genus Commas" -- correct genus, but even though I put the location as 'South Korea' I received several North American species as guesses. P. c-aureum was included in 'Top 10'.

2. http://www.inaturalist.org/observations/5673774

Hover fly: Genus Helophilus (possibly Helophilus virgatus)
"We're pretty sure this is in the subfamily Drone Flies." -- correct subfamily. Two of the suggestions were for species found in the Americas, I suppose based on visual cues rather than range.

3. https://www.inaturalist.org/observations/5890690
Jumping spider: Evarcha albaria
"We're pretty sure this is in the genus Phidippus" -- incorrect guess. 9/10 suggestions were jumping spiders. Again, some suggested matches only have a distribution in the Americas.

4. https://www.inaturalist.org/observations/6004396
Unknown species of mayfly / Order Ephemeroptera
"We're pretty sure this is in the genus Muscoid Flies and Allies" -- incorrect guess. Suggestions include tachinid flies, house flies, bottle flies, hover flies, a bush cricket, and a dragonfly. I'm guessing the large eyes make identification a little more difficult for the AI.

5. https://www.inaturalist.org/observations/5990806
Butterfly: Short-tailed Blue (Everes argiades)
"We're pretty sure this is in the genus Cupido" -- incorrect guess, but close. All of the results are Blues and the Short-tailed Blue is fourth in the list of suggestions. And again, lots of North American species listed as suggestions.

6. https://www.inaturalist.org/observations/6003362
Stink bug: Sloe bug (Dolycoris baccarum)
"We're pretty sure this is in the genus Dolycoris" -- correct genus. First result is D. baccarum, with 9/10 of the results being stink bugs.

Michael Ellis

unread,
Apr 30, 2017, 1:01:40 PM4/30/17
to iNaturalist
I greatly applaud these efforts and the amount of work and funding invested in this project. I do not necessarily believe the imaging data should be restricted to pre-existing iNaturalist observations.  There are many peer-reviewed species photos in other Creative Commons projects.

I believe one of the main things holding back the general public from fully understanding and appreciating nature is how hard it can be for the average Joe to identify a species.  If we make it easier for millennials and people of the modern world to identify wildflowers, we will bring about a much better understanding and (hopefully) appreciation of the natural world.

I understand this is controversial, but I believe technology is moving quickly in this direction, and our only option is to embrace it and try to see all of the advantages it can have. I think this technology must be labeled as "Experimental" for many years if it ever makes it onto the iNaturalist platform. It must be very clearly explained that the identifications are merely a list of suggestions, which is not comprehensive and must be reviewed by someone with experience to confirm or verify an identification.

So far it is saving me a lot of time getting to the genus or species level of observations I was never able to figure out before. I consult a field guide or other reference to determine whether or not I might accept one of the automated suggestions.

TLDR; This system provides automated species identification suggestions rather than actual identification, and that must be made clear.

Michael Ellis

unread,
Apr 30, 2017, 2:16:35 PM4/30/17
to iNaturalist
I would love to see the option to process multiple photos to triangulate into a potentially more accurate ID. For example: if you have a photo of the leaves AND the flowers, the system might be able to give a better identification suggestion.
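A minimal sketch of how that combination could work, assuming each photo gets its own classifier pass returning a {species: score} map (all names here are hypothetical, not iNat's actual pipeline):

```python
from collections import defaultdict

def combine_photo_predictions(photos, classify):
    """Average per-species vision scores across all photos of one
    observation, so leaves and flowers each get a vote.

    `classify(photo)` is assumed to return {species: score}, e.g.
    softmax outputs; a species absent from a photo's output counts
    as 0 for that photo.
    """
    totals = defaultdict(float)
    for photo in photos:
        for species, score in classify(photo).items():
            totals[species] += score
    n = len(photos)
    # Best combined suggestion first
    return sorted(((sp, s / n) for sp, s in totals.items()),
                  key=lambda kv: kv[1], reverse=True)
```

Averaging is the simplest choice; taking the max per species would instead reward the single most diagnostic photo.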

I could see this system being incorporated quite nicely into "Identotron".

James Bailey

unread,
Apr 30, 2017, 10:06:53 PM4/30/17
to iNaturalist
Both Cassin's and Plumbeous occur in California, so location is not helpful there.

Plumbeous is a diluted version of Cassin's, with none of its bright green or yellow. So it would really have been a test of whether it could detect small colour differences.

James


On Saturday, April 29, 2017 at 5:53:56 AM UTC-7, Scott Loarie wrote:
interesting, did you include a location? Cassin's vireo, plumbeous vireo and blue-headed vireo all used to be considered the same species, "solitary vireo," and were split largely based on range. I would think that's an example where location would help the demo distinguish.
On Fri, Apr 28, 2017 at 9:25 PM, 'James Bailey' via iNaturalist <inatu...@googlegroups.com> wrote:
I managed to trip it up with Cassin's vireo from google images. It said it was plumbeous :)


James Bailey

unread,
Apr 30, 2017, 10:13:16 PM4/30/17
to iNaturalist
Hit send too early -- sorry if it sounded like I was shooting down your suggestion of adding location.

QuestaGame

unread,
Apr 30, 2017, 10:34:45 PM4/30/17
to iNaturalist

Hm, I think “guidelines” was the wrong word. I didn’t mean “guidelines” in a restrictive sense. I meant a kind of headlamp - not slowing AI down (QGame is fully on board with AI), but trying to shed light on the “terra incognita" that seems to create so much confusion.


Anyway - this is my last post for the thread. I believe there are AI solutions that can get us where we want to go - a society that values nature - quickly and cheaply. We just need to think more openly, objectively, creatively - and include as many different viewpoints as possible. So - onward! :-) 

Charlie Hohn

unread,
May 1, 2017, 7:11:36 AM5/1/17
to iNaturalist
sounds good. I apologize for being too down on QuestaGame. I hope the collaboration between iNat and QG is long and fruitful, even though I don't fully understand it :)

SummitMetroParks-NaturalResources

unread,
May 3, 2017, 12:30:53 PM5/3/17
to iNaturalist
Well, we tried it with a red-spotted newt, toadshade (a yellow-phase trillium), and a tachinid fly, and it was spot on for the first two but thought the tachinid was a housefly.  Very impressive and exciting.  I look forward to its release. . . in the meantime I think I'll start preparing my résumé : )  Rob Curtis

CW Gan

unread,
May 3, 2017, 8:58:22 PM5/3/17
to iNaturalist
We have a prototype that IDs butterflies; it is semi-automated in that the photo has to be oriented in a certain direction, and the accuracy is quite good. I think if you can support semi-automated ID - allowing the user to crop and orient the photo - the accuracy will be much better.

jesse rorabaugh

unread,
May 5, 2017, 12:48:13 AM5/5/17
to iNaturalist
This seems like it would be better if it could include pictures from observations which would be research grade were it not for iNaturalist's rules.

There are a lot of species which have been confirmed to genus but are not possible to get to species from almost any photo. If the genus was confirmed by a second person, though, it seems like it would be good to train the computer on those photos.

Also, there are species where large numbers of captive individuals have been submitted but not many wild ones. Being able to go around a garden and ID the plants with this system would be a nice application. Unless it trains on cucumbers, tomatoes, and other cultivated plants, though, it will be difficult to get there.

Charlie Hohn

unread,
May 5, 2017, 7:24:25 AM5/5/17
to iNaturalist
yeah, good point, it would be nice to get captive/cultivated species on here more, especially since so many newbies add them. And it could add motivation to go out and add some, for the photos. I know species sometimes look different in the wild than in cultivation, but I doubt that would mess too much with the things the algorithm cares about.

I think a cropping or frame-tagging tool would help it a lot too, since there are so many observations (of plants especially) with more than one species present. It sounds like that might be a pain to program, or not a priority, though.

meu...@landcareresearch.co.nz

unread,
May 5, 2017, 4:41:41 PM5/5/17
to iNaturalist
yet another reason to remove this arbitrary rule that cultivated/domesticated records shall not be research grade!!!!  I suppose some day someone will explain it :-)

Scott Loarie

unread,
May 5, 2017, 4:47:05 PM5/5/17
to inatu...@googlegroups.com
We're still doing lots of experiments on what works and what doesn't. For the record, the next version of the model will include training data from captive/cultivated photos. We're also including a lot more training data (at the expense of testing data) as an experiment.  But we're still sticking with species as labels at the moment, so we don't have any labels at the genus rank, etc. We've had a lot of discussion about data linked to taxa other than species, but it gets complicated if that means multiple models, or overlapping labels, etc. So for now, we've just been working with species.


janetwright

unread,
Sep 25, 2017, 12:10:03 AM9/25/17
to iNaturalist
I think this is a great feature (and works impressively) but I have an example where it is causing a problem. 

Smilax (greenbriers) are very popular observation subjects all over North America.  The top recommendation for many photos of Smilax is coming up as Smilax coriacea.  People are choosing that as their ID suggestion without checking it, but S. coriacea is a narrowly distributed species found only in the West Indies and south Florida.  Is there a way to get this changed?  I would think Smilax rotundifolia would be a good one to build into the recommendation, as it is pretty consistent-looking, widely distributed, and frequently observed.  Thanks!

Charlie Hohn

unread,
Sep 25, 2017, 7:32:20 AM9/25/17
to iNaturalist
There's been some talk (in this forum anyway) about modifying its spatial sense, i.e. not letting it recommend things that aren't found anywhere near the user. That would help with this.

Cullen Hanks

unread,
Sep 25, 2017, 8:38:32 AM9/25/17
to inatu...@googlegroups.com
If I recall, the question was at what scale?  The app already indicates when a species has been observed nearby; perhaps this could be highlighted more prominently.

Another way to deal with this would be to get a notification (flag) when you ID an observation to species in a place (state or county level or higher) where there are no RG observations.  This would be beneficial on two levels.  First, it would be an alert that maybe you should double-check your ID.  Second, it would acknowledge and draw awareness to the significance of the observation.
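A toy sketch of that check (the data structures and names here are invented for illustration, not iNat's actual schema):

```python
def should_flag_identification(species, place, rg_index):
    """Flag an ID that names a species with no Research Grade record
    in the surrounding place (county, state, ...)."""
    return (species, place) not in rg_index

# rg_index would be built from existing Research Grade observations;
# this toy version is just for illustration.
rg_index = {("Smilax rotundifolia", "Vermont")}
if should_flag_identification("Smilax coriacea", "Vermont", rg_index):
    print("No Research Grade records of this species here yet -- "
          "double-check the ID, or celebrate a notable record!")
```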

-Cullen



stanw...@gmail.com

unread,
Sep 25, 2017, 1:20:24 PM9/25/17
to iNaturalist
I recognize all the potential limitations, but I was amazed: I just put in a couple of my photos (Texas Kidneywood and Whitebrush) and it IDed them perfectly.

I also realize this is a huge endeavor and it will not work with very similar species, but I believe for the beginner it has great potential.

Thank you so much for pursuing this.

Gonzalo Zepeda

unread,
Sep 28, 2017, 12:41:28 PM9/28/17
to iNaturalist
I think this is a great achievement. Congratulations.

stan & wendy drezek

unread,
Sep 28, 2017, 4:31:49 PM9/28/17
to inatu...@googlegroups.com
I have since tried automated species identification on 10 relatively common plants:  Western Soapberry, Mealy Bluestem, Buffalo Bur, Wild Petunia, Rock Rose, Frostweed, Parthenium hysterophorus, Prickly Pear, Spiny Hackberry, and Purple Passionflower.

In six of the ten it nailed it.  In two more it gave the correct species as a second choice; in one it was the third choice.  In only one case did none of the suggestions match.

I believe it is most amazing.

stan




--
6 Westelm Garden
SAT 78230-2632
1-210-493-0939 Wendy Drezek
1-210-464-1365 Stan Drezek

James Bailey

unread,
Sep 30, 2017, 1:36:57 PM9/30/17
to iNaturalist
It has issues with insects, suggesting superficially similar options but not the right one, which is often 4th or 5th down the list, or not shown at all. There are quite a few research grade obs of these species.

It does remarkably well on plants, however.

Alice Abela

unread,
Oct 2, 2017, 2:46:06 AM10/2/17
to iNaturalist
Some less definitive language when delivering IDs might be good. "Pretty sure" seems a bit too confident for inverts at this point. I got the following response after identifying a mantid posted as an orthopteran:

"We also thought it was a mantid, but accepted the "best choice" provided by the app."

I ran it through a few pics. I was really impressed that it correctly got "Trimerotropis" for a grasshopper I tested it against. It was able to identify flies as flies, but it provided species suggestions that weren't in the same family. I got the same result for wasps (it was able to get the family right on one, but the species it was pretty sure of was in a different subfamily). It seems to be fairly good at getting things down to order level (mantid notwithstanding). Maybe, unless it can get refined further, it could say something like: "we're pretty sure it's in X order, here are some possibilities..."
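One way such a fallback could be implemented - a sketch under assumed inputs, not the demo's actual cutoff logic - is to pool the model's per-species scores up the taxonomy until some rank collects enough of the confidence:

```python
def rollup_suggestion(species_scores, lineages, threshold=0.8):
    """Pool per-species scores upward through the taxonomy and return
    the lowest-rank taxon that collects `threshold` of the confidence.

    `species_scores`: {species: score} from the vision model.
    `lineages`: {species: [genus, family, order, ...]}, all lists the
    same length -- both assumed inputs for this sketch.
    """
    n_ranks = len(next(iter(lineages.values())))
    for rank in range(n_ranks):  # genus first, then family, order, ...
        mass = {}
        for sp, score in species_scores.items():
            taxon = lineages[sp][rank]
            mass[taxon] = mass.get(taxon, 0.0) + score
        best, total = max(mass.items(), key=lambda kv: kv[1])
        if total >= threshold:
            return best  # "we're pretty sure this is in ..."
    return None  # "not confident enough to make a recommendation"
```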

But, I can also see people not thinking critically about the auto-ID, and agreeing with it, and other people agreeing with them, increasing the number of errors in research grade identifications. I think this may be solvable in part by tweaking the language, or limiting how low it goes taxonomically.

It is leagues ahead of whatever system they have generating tags on Flickr: that one can't tell the difference between grasshoppers and lizards.

-Alice

James Bailey

unread,
Oct 3, 2017, 3:24:09 PM10/3/17
to iNaturalist
For ANYTHING non-butterfly or moth, I suggest it defaults to the family for ID (or subfamily if it is really sure, maybe genus in some cases?). At least at this point in time.

stan & wendy drezek

unread,
Oct 3, 2017, 3:35:23 PM10/3/17
to inatu...@googlegroups.com
I am not sure what the best default may be.  First, I believe the ID is pitched as a suggestion more than a definitive ID.  Second, in the case of common plants, my experience is that it has been excellent, correctly identifying about 60% of the species, having another 20% in its list, and blowing only about 20%.  Since the person new to nature is going to benefit, it seems to me that at least for plants we should let it go to species if it "wants".

I continue to be amazed at just how good it is.  Obviously with many, many genera where the differences among the species are relatively technical and hard if not impossible to see in a photo, the ID tool isn't going to be a lot of help.  But I really think for the beginning user it is a great help.  Again, though, it should be emphatically pitched as suggestions and not formal IDs.

stan




Charlie Hohn

unread,
Oct 3, 2017, 5:04:25 PM10/3/17
to iNaturalist
for plants, identifying to family makes no sense, especially since the algorithm isn't working through a key and identifying things to family level before genus and species, but looking at each species independently.  I.e., I think it would have an easier time identifying New England Aster than knowing it's in Asteraceae (without guessing the species) based on its visual characteristics. As mentioned above, they should still be presented as suggestions, but for plants, I don't think 'backing up to family' will actually result in better IDs.

I think there are two main issues here. One is the location filter. I just got a plant suggested as 'seen nearby', and looking at the map, it was in New Jersey. I am in Vermont. That's not nearby. And it would be even less 'nearby' in places like California with more spatial variability. Things should only be suggested if they are within 100 miles or something, or at least anything else should have a big red OUT OF RANGE sticker on it.
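A "within 100 miles" test like this is easy to sketch (the ~160 km radius and the data layout are assumptions, not how iNat actually stores records):

```python
from math import radians, sin, cos, asin, sqrt

def km_between(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(h))

def truly_seen_nearby(obs_point, prior_points, radius_km=160):
    """True only if a prior record of the species is within ~100 miles."""
    return any(km_between(obs_point, p) <= radius_km for p in prior_points)
```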

The second issue is people just accepting the IDs without looking at them. As usual, it's mostly students using the site under 'duress' who do this. It was bad enough when they were assigned to use iNat and some of them put in minimal effort, but now, since the app identifies the species for them (often incorrectly), many don't even try to key things out beyond that. The student issue is a separate one that's been discussed in detail, but in terms of the algorithm, I think it really does need to display something different when the ID comes from it.  Even though it's annoying for us other users who use it properly.  Alas.



Alice Abela

unread,
Oct 4, 2017, 2:05:32 AM10/4/17
to iNaturalist
I can see it working well for plants, but I think insects are going to be a problem for species-level identification.

I don't know anything about programming, but I think Donald Rumsfeld's quote is apt:
"...there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns – the ones we don't know we don't know. And if one looks throughout the history of our country and other free countries, it is the latter category that tend to be the difficult ones."

Unlike with plants, when you get into insects I would think there are a lot of "unknown unknowns" from an image-recognition standpoint. With very little effort, pretty much anyone can go out and photograph a species that's never been photographed before, or even a species that's never been described. I went down to a popular local beach and photographed three species of fly that were undescribed at the time. The computer was pretty sure they were in the genus Stichopogon; none of the species options were in the right family, and most were 5-10+ times larger. Maybe having the user approximate size would help somewhat? So far, flies seem to be its biggest blind spot. It does, however, correctly identify them as flies, which is pretty impressive (better than a lot of people). I tried to fool it with a wasp mimic; it got that it was a fly, but suggested a subfamily in the wrong family. I think with so many of the family-level splits depending on things like wing venation, it'd be really hard to separate things like black-and-yellow drone flies and soldier flies based on gestalt (I threw in a soldier fly; it went with a drone fly). But I imagine eventually (if it's not already happening) it could factor in things like wing venation as a check.

I tried to stick to tightly cropped, non-busy images when I tried it against some other inverts. It only returned spiders as options when I tried it on a solifugid, but it did get one of the three psyllids I tried it on right (it thought another one was a meadow spittlebug [right order] and couldn't make a determination on the third). Various pale rhaphidophorids stumped it (but I really prefer this over a bad ID; the top visual matches were beach hoppers). It was pretty sure my Rhaphiomidas parkeri was an orthopteran. It couldn't make a call on my southern mole cricket (its first result was an Atlantic ghost crab, but the number two result was a southern mole cricket [pretty impressive]; I wonder if the sand was throwing it off - mine was on sand and it returned a lot of crabs, and most of the mole cricket images I've run across on iNaturalist are on white sheets). It couldn't make a call on my Odontophotopsis sp. either, but the top two results were still velvet ants, which was again really impressive; on the flip side, there were also three species of plover, a coyote, a rabbit, a crab, and two beetles in the top 10. In its defense, the jpegs I had sitting around to try it against were not species that are well represented (or represented at all) on iNaturalist in terms of existing images.

GanCW

unread,
Oct 4, 2017, 10:30:35 AM10/4/17
to iNaturalist
I have tested it against moths and butterflies which I photographed in Singapore and Malaysia. In almost every case, Identotron is able to provide a more accurate ID than computer vision. I am not sure if this is because the demo system has a smaller data set?

As pointed out by others, there are very similar-looking species from different taxa, and the AI may not be able to recommend the correct species, or even the correct higher-level taxon, due to a lack of data or information that is not available to the system. However, if we could provide additional parameters such as size, or a taxon - order, family, genus - to help the system narrow the data set, it might be able to provide more accurate suggestions.

James Bailey

unread,
Oct 4, 2017, 3:11:25 PM10/4/17
to iNaturalist
For clarification, I was only talking about insects above. Plants it does fine recognizing.

Much of the problem with insects is that in many difficult groups (for instance ichneumonid wasps), only a handful of species have research grade observations. So the automated system knows what those species are, but not what ichneumonids are in general, because it only trains on research grade observations.

bouteloua

unread,
Oct 4, 2017, 5:38:52 PM10/4/17
to iNaturalist
For plants it does OK, but I'll also echo that so many misidentifications as California and New Zealand plants are showing up in the Chicago region. It is really frustrating when 30 students armed with an imperfect AI misidentify the same plants and then all agree with each other's observations...

emra...@gmail.com

unread,
Oct 5, 2017, 1:46:23 PM10/5/17
to iNaturalist
I haven't followed this whole thread or used the new feature much, but the few times I've used it (almost by accident) I've been very impressed with the results. A couple of times I've had a batch of observations ready to go except for one that I hadn't identified, stuck the cursor in the "Species Name" box, and then started trying to figure out what it was in another tab, only to return and discover that the auto-suggest had come up with a better match than anything I had been able to find. It's definitely not perfect, but I applaud the work that has gone into it. As an example, I was pretty sure that one observation was some kind of wasp, but couldn't get any closer with field guides or internet searches; then I flipped back to the tab in which I was adding observations and discovered that the AI had figured out what it was, and a quick check on BugGuide confirmed that it was almost certainly correct. Computer magic. Thanks, iNaturalist team!

GanCW

unread,
Oct 6, 2017, 9:20:17 AM10/6/17
to iNaturalist
How do I get to the AI identification after an observation has been submitted?

Charlie Hohn

unread,
Oct 6, 2017, 4:33:15 PM10/6/17
to iNaturalist
on the website, if you click on the place where you'd add an ID, it (usually) runs the algorithm and will show you the top choices there.

GanCW

unread,
Oct 7, 2017, 4:49:23 AM10/7/17
to iNaturalist
Say, from this page, how do I get to the AI ID page?

Chris Cheatle

unread,
Oct 7, 2017, 9:07:20 AM10/7/17
to iNaturalist
I'm not sure about a "page", but click the Suggest An Identification tab (even if it is your own observation and you have already added an ID); when it is active, click into the Species Name box, and in a moment the suggestions will begin to populate.

jesse rorabaugh

unread,
Oct 8, 2017, 8:36:42 PM10/8/17
to iNaturalist
As far as I can tell, the big problem with arthropods is that there are just too few species with over 20 observations for the algorithm to make much progress. Flies make a good example. There are about 125,000 described species of fly, 3,048 of which have been submitted to iNaturalist. However, only 139 species have twenty research grade submissions. Of those 139 species, 60 are hover flies. If the fly you photograph is one of the 139 species in the algorithm, I suspect it will correctly identify it quite often. Chances are the fly you found isn't on the list, though. It gets much worse with more obscure groups of arthropods. Only one species of springtail, one species of thrips, ten species of aphids, and four species of scale insect make the cutoff.
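That cutoff amounts to a simple filter over Research Grade records. A sketch counting distinct observers per species (the input format is an assumption; the bar of 20 unique people follows this thread's description of the demo):

```python
from collections import defaultdict

def trainable_species(rg_records, min_observers=20):
    """Return species with Research Grade records from at least
    `min_observers` distinct people.

    `rg_records` is assumed to be an iterable of (species, observer)
    pairs taken from Research Grade observations.
    """
    observers = defaultdict(set)
    for species, observer in rg_records:
        observers[species].add(observer)
    return {sp for sp, people in observers.items()
            if len(people) >= min_observers}
```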

I have been actively trying to push a lot of common arthropods over that twenty limit by regularly resubmitting many underrepresented species. At the rate submissions come in, it does seem like three years from now this will not be nearly as much of an issue.

There is also the problem that the algorithm seems not to realize how bad the data set is for arthropods. It gives a huge number of guesses for genus or family that are way off while stating that it is "pretty sure."

Janet Wright

unread,
Oct 8, 2017, 8:41:31 PM10/8/17
to inatu...@googlegroups.com
Thanks, Jesse, for the insights on how the process works for insects.  I had thought it pointless to submit multiple observations of familiar insects, but I see now why it’s useful.  



Chris Cheatle

unread,
Oct 8, 2017, 9:55:43 PM10/8/17
to iNaturalist
I would think another benefit of multiple submissions of common things is that if they are able to better integrate geographic or range intelligence into the suggestion algorithm, those "common" species observations will give better clarity on ranges.

GanCW

unread,
Oct 9, 2017, 1:39:29 AM10/9/17
to iNaturalist
Hi Chris,
Thanks. I didn't realise the Suggest an Identification page was auto-IDing the species, as I usually just type something there before the system has a chance to suggest!

GanCW

unread,
Oct 9, 2017, 1:45:50 AM10/9/17
to iNaturalist
Jesse,
Fully agree. There is just insufficient data for insects, and most users use iNat as a means to store their life list, so they do not upload similar species.  However, we now have quite a few users in Singapore and Malaysia who regularly submit observations, and the IDs will only improve over time.  Another benefit of having similar observations is that we get views of the species from different angles, and not just the perfect angle as in a typical reference or guidebook.

Charlie Hohn

unread,
Oct 9, 2017, 8:13:24 AM10/9/17
to iNaturalist
I dunno, for plants I think that repeated observations of common species are REALLY important. Often common species stop being common due to things like plant diseases, or sometimes an area of habitat is lost and we don't even know what used to grow there. This is true for vast swaths of California. I know insects move around more than plants, but even still. When things are affected by phenology, it's helpful to track them over time. With plants, if it's not phenology related, I usually won't keep putting the same population of plants in again and again. But other than that... have at it if you want to. It isn't hurting anything and could be helpful.

AfriBats

unread,
Oct 9, 2017, 9:58:02 AM10/9/17
to iNaturalist
Maybe worth cross-linking to this thread, which raised a couple of concerns regarding AI identifications, and how they are handled by the system.

AfriBats

unread,
Oct 9, 2017, 10:36:24 AM10/9/17
to iNaturalist
And here's a good example where a user is apparently trusting the IDs without checking, requiring loads of effort to bring the IDs back on track: www.inaturalist.org/observations/jinc249

I really think something needs to be changed, as marvellous and useful as this new tool is.

Jakob

GanCW

unread,
Oct 9, 2017, 11:17:59 AM10/9/17
to iNaturalist
For this observation, Compare gives a correct ID, while Suggest an ID does not have the correct ID in the suggestion list.

Charlie Hohn

unread,
Oct 9, 2017, 11:18:39 AM10/9/17
to iNaturalist
pretty much all the 'under duress' students in California do this exact thing, especially my favorite ones down in SoCal. Because CA has a lot of iNat obs, it is a little better than when people try it elsewhere, but it's still a bummer.

I love the algorithm - I think it is really neat - but if the purpose is to help get things IDed to research grade accurately, it's not fulfilling that goal. It's instead making it a lot harder. When I do IDs for SoCal plants, I spend a lot of time correcting these. And even still, I miss most of the ones that are marked as research grade due to students agreeing with each other when they shouldn't.

Pat Lorch

unread,
Oct 9, 2017, 5:29:39 PM10/9/17
to iNaturalist
I have been loving the AI.  We did a recent bioblitz in one of the Cleveland Metroparks (North Chagrin Reservation), meant to compare with a thesis done in the same woods in 1935 by A.B. Williams (https://clevelandmetroparks.com/parks/learn/blogs/notes-from-the-field/2013/may-2013/our-first-naturalist).  We (roughly 7 of us) made over 200 observations of roughly 154 species in iNat.  I spent several hours with Bill Kurpiewski, an expert on local fungi; he identified fungi and I photographed them and entered them in iNat.  He found 58 species of fungi.  The AI got most of them to species. Very impressive.

One thing that would help with fungi (and something similar probably applies to other groups that need specialized photos for a good ID) would be to have iNat suggest a picture of the underside of mushrooms.

My experience with the AI has gotten me interested in trying to add my own fungus observations, but I am not sure it provides an incentive to learn to ID fungi myself.  Has anyone thought about whether the AI is a disincentive for naturalists to learn a new taxon?

Charlie Hohn

unread,
Oct 12, 2017, 3:23:25 PM10/12/17
to iNaturalist
Here's an odd one, attached. I observed a beech tree in literally the same place as several other observations of the species. But it suggested something else, found in New Zealand - Mountain Horopito.  Weirder, it doesn't even mention beech as observed nearby, when it was observed within 50 feet of this point. I've had others say 'observed nearby' when the nearest other one was 100 miles away.  So to say the least, the geographical proximity feature needs to be updated, especially in the case of species like beech and Mountain Horopito, both of which have tons of observations.

I also see several other Mountain Horopito observations in the eastern US... I'm guessing they are all wrong, as they are all recent and presumably algorithm-fed.
Capture.PNG

phidippu...@gmail.com

unread,
Oct 14, 2017, 12:38:57 AM10/14/17
to iNaturalist
If several images of an organism are uploaded (different angles, magnifications, body/plant parts, habitat impressions, etc.), does automated species identification take all of these pics into account and make a suggestion based on an 'average or median best fit', or does it just look at the first image, or pick out the most recognizable one?

On a different note, as has been suggested before, please make it clear in the interface (especially important for new users) that the suggested ID should be taken with caution, and change the phrase 'we're pretty sure' to something less presumptuous :)

GanCW

unread,
Oct 14, 2017, 11:43:31 PM10/14/17
to iNaturalist
Hi ken-ichi,
is the AI identification really using geographical information when recommending IDs?

I am asking because when I try to use it to ID Faunis canens (a species that is found only in South-East Asia) in this observation:

the system recommends Archaeoprepona demophon, a species that is found in the Americas and has no observation records in South-East Asia!




Why is it suggesting a species that is not found where the observation was recorded, while there are many research grade observations of Faunis canens in the iNat database?


Ken-ichi Ueda

unread,
Oct 15, 2017, 12:43:22 AM10/15/17
to iNaturalist
It is using nearby records, but only to insert some frequently-observed taxa and weight the results, *not* to exclude results. The reason it's including A. demophon in the results is that it thinks your photo looks like images of that species on iNat. The reason it's not excluding that taxon from the suggestions, even though it hasn't been observed nearby, is that we felt it was more important to include some suggestions, even ones we know are unlikely due to a lack of nearby records, to handle cases where people are observing in areas where we have no data at all. Let's say you're trying to get suggestions for a butterfly in Siberia. iNat knows almost nothing about butterflies in Siberia, so we think it's better to show some unlikely suggestions instead of zero suggestions. That way a novice might figure out that it's in the family Nymphalidae or something, even if none of the suggestions look exactly right.
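In spirit, "weight but don't exclude" could look like the following sketch (the log scaling and boost parameter are invented for illustration; Ken-ichi doesn't describe the actual formula):

```python
from math import log

def rerank(vision_scores, nearby_counts, boost=0.5):
    """Boost taxa frequently observed nearby without dropping taxa
    that have never been seen there (weighting, not excluding)."""
    ranked = {
        taxon: score * (1 + boost * log(1 + nearby_counts.get(taxon, 0)))
        for taxon, score in vision_scores.items()
    }
    return sorted(ranked.items(), key=lambda kv: kv[1], reverse=True)
```

Because `log(1 + 0) == 0`, a taxon with zero nearby records keeps its raw vision score rather than vanishing from the list.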

I think a better critique here is: why is it suggesting A. demophon as the top result? The answer is that at the time we trained the current computer vision system, we didn't think we had enough records to train it on F. canens, so it doesn't even know about that species, and thus never includes it in suggestions. The current system was trained on an export from 12 May 2017, so you can get a sense for the available records at something like https://www.inaturalist.org/observations?d2=2017-05-12&locale=en&place_id=any&preferred_place_id=14&quality_grade=research&subview=grid&taxon_id=415281&view=observers. When we decide whether to include a species, we look at the number of people who have made Research Grade obs of that species (or obs that would be RG except they're captive). If that number is less than 10, that species isn't even included in the training. I think the fact that you're seeing 11 unique observers in the obs search results is b/c some IDs were added since then. Also, even when we have a species with 10 people who have observed it, we randomly partition those observers into three groups. One group's photos we use for actually training the system; another group's photos we use for validating the system while it's training, so we have some sense of when to stop training (e.g. there's no use continuing to train it if it's 99.9999% accurate on everything); and a third group's photos are used for testing after the entire training is done. So if you have 10 people who have observed a species, but 1 person has contributed 100 photos and everyone else just 1 each, that 100-photo person might end up in the testing group and we wouldn't even use their photos in the training process.
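The observer-level partition described above might be sketched like this (the deterministic hash and the 80/10/10 ratios are illustrative assumptions, not the actual implementation):

```python
import hashlib

def observer_bucket(observer_id, ratios=(0.8, 0.1, 0.1)):
    """Assign an observer to train/validation/test.

    Splitting by observer rather than by photo keeps one person's
    photos from leaking between training and evaluation sets.
    """
    digest = hashlib.md5(str(observer_id).encode()).hexdigest()
    u = int(digest, 16) % 10_000 / 10_000  # pseudo-uniform in [0, 1)
    if u < ratios[0]:
        return "train"
    if u < ratios[0] + ratios[1]:
        return "validation"
    return "test"
```

This also shows why a prolific observer's photos can end up entirely in the test split: the unit of assignment is the person, not the photo.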

I realize that's complicated and kind of confusing (to be honest, I don't understand the entire process myself), but the remedy is simple: get more people to add more photos of both species, and the system will learn to tell them apart. ~50 observations of a species is not a lot as far as the training is concerned, especially if only a few people have added most of those observations. We are re-training the system right now on new data from an export on 28 August, and that version of the suggestions will include F. canens, b/c we've acquired more records by more people since last May.

-ken-ichi

GanCW

unread,
Oct 15, 2017, 9:07:12 AM10/15/17
to iNaturalist
Hi ken-ichi,
Thanks for the explanation, which is very helpful. Now that we know the magic number '10', I will try to reach it for as many species as possible.

How frequently does the computer vision system get re-trained with newer data sets?

gan

Ken-ichi

unread,
Oct 15, 2017, 9:52:54 AM10/15/17
to inaturalist
10 people is the lower limit, but the magic number is really "more." We're planning on retraining every few months. It's a very computationally intensive process and on our current hardware each training takes a few weeks.


Charlie Hohn

unread,
Oct 15, 2017, 10:17:11 AM10/15/17
to iNaturalist
I know that in the long term the neat work going on with the Atlases will come into play. In the short term, I think it would really help to have some sort of alert on the app and/or the website when the nearest observed example is far away - maybe one for 100 and one for 1000 km - and maybe even a filter in ID Please to find these 'distant' observations. It would offer another benefit: some of these outliers are real, and represent range outliers or new invasives we really want to track.

Otherwise... the issue with odd species being identified way out of range is getting worse, and I fear it will be really hard to keep up with as the site grows. I spend a fair bit of time chasing around IDs of common New England or California plants that pop up way out of range. It's kind of a bummer; I know I am obsessed with the range maps, but it makes them less usable and also obscures those neat real outliers. For instance, someone found some sugar maples growing in the mountains in Utah (planted, but still neat and weird) that would get missed if they were buried in 3000 observations of sycamore or whatever out there.

And yeah, as the algorithm gets better, it will happen less, but I still find it hard to believe it won't happen at all. Then again, the extent to which the algorithm works NOW is something I thought was decades away a year ago, so who knows.

GanCW

unread,
Oct 16, 2017, 12:29:40 AM10/16/17
to iNaturalist
Yes, the more the better, but right now many species don't even have 10, so my focus is to get as many species as possible to 10, so that beginners can use Computer Vision to assist them with IDs.

David K

unread,
Oct 20, 2017, 8:55:26 PM10/20/17
to iNaturalist

1) It's a very computationally intensive process and on our current hardware each training takes a few weeks.

I am not a computer specialist, so this could be an overly naive question, but given the effort involved, are you retraining on all eligible species, or just select species?  For example, given that Mallards, Monarchs, and 10 other species all have >10,000 observations, is there any point in devoting processing time to groups like that? Or even to those with 500 or 1,000 observations?

I would assume that there is a sweet spot in the distribution of species, with those having between, for example, 10 and 100 observations providing the greatest reward if computer resources are constrained (I'm making these boundaries up for discussion purposes; the point is whether there is such a subset, not whether it's bounded by 10-100 or 50-250).  Or do you have to process the whole dataset every time you train the system?

2) Are you able to expand the training set by accessing curated photos from places like BugGuide, MPG, EOL, or any useful regional/taxonomic specialists?

David

tony rebelo

unread,
Oct 21, 2017, 3:43:32 AM10/21/17
to iNaturalist
If iNat goes a reputation-based route, might it perhaps be best to train on observations with the highest ID scores?

Ken-ichi

unread,
Oct 21, 2017, 3:13:04 PM10/21/17
to inaturalist
We train on all species for which 10 or more people have made Research Grade observations, which narrows things down a bit. To my knowledge, you can't really rebuild a model piecemeal like you suggest, i.e. carve off the Monarch part of the model but retrain on weird pill bugs b/c those could use improvement. The system is learning from all images, so you need to train with the entire dataset.

We haven't reached out beyond iNaturalist for training data, and it's
debatable whether or not that would really improve the system. Photos
from sites like BugGuide almost certainly would, because like iNat
they're also mostly in situ photos taken by (relatively) normal people
with normal camera setups. MPG data might actually really screw the
model up for moths b/c all the in situ photos are from the same angle
and there are tons of images of pinned specimens, which have pretty
different visual properties than the kind of photos that are usually
posted to iNat. On top of all that, we don't really have the right to
access that information, and even if we did, most of these sites don't
make their photo data easily available (EOL does, though).

Alex Rebelo

unread,
Oct 22, 2017, 1:31:45 PM10/22/17
to iNaturalist
Very cool, although it can struggle with some African taxa (probably because of the lack of RG training observations).
Would it not make sense to have an option to only show taxa found in the area of the locality? (Obviously it is also useful to see all matches in some circumstances.)

GanCW

unread,
Oct 23, 2017, 11:40:23 AM10/23/17
to iNaturalist
Yes, having the option to only show taxa found in the area of the locality would be awesome!

Sam McNally

unread,
Nov 8, 2017, 4:14:39 PM11/8/17
to iNaturalist
A bit new to this thread, and apologies if this has already been asked, but could there be a way to incorporate establishment data into the suggestions? E.g., if a species is listed as "Endemic" on the California checklist, perhaps it shouldn't be suggested for an observation from Borneo, or should at least be given lower priority. Also, is there a way we users can help "teach" the AI when it gets something wrong, or conversely "praise" it when it gets something right (negative and positive reinforcement)?  I've seen several beetle grubs ID'd as Xystocheir dissecta (a millipede), and I suspect faulty auto-ID is the culprit.

Thanks,

Murray Dawson

unread,
Feb 22, 2018, 4:12:01 AM2/22/18
to iNaturalist
The page at https://www.inaturalist.org/computer_vision_demo has just started failing to upload images. I've tried different browsers, devices, and images, but it's definitely not working at the moment.

I do hope this page is going to be kept working, as it's sometimes really useful for identifying an image before uploading - for naming files after the species, for example, as part of a workflow.

Would be great to know what's up with it at the moment.

Tony Iwane

unread,
Feb 22, 2018, 11:21:05 AM2/22/18
to iNaturalist
Hi Murray,

The computer vision model has been integrated into the standard upload page for a while now; does that not work for your workflow?

Tony