AI bad fail


cray man

Sep 22, 2017, 1:10:42 PM
to iNaturalist
In the following observation, the AI suggested a New Zealand cicada species for something occurring in the United States (Ohio), and the observer confirmed it without knowing this.

https://www.inaturalist.org/observations/7800670

I believe I've seen a couple other US cicadas IDed as this species. 

First off, a molted shell is not readily identifiable to species, and secondly, a New Zealand species should not be a top choice for something in the United States, especially since cicadas are very unlikely to be introduced into other areas.

Maybe provide feedback to the AI developers.

Scott Loarie

Sep 22, 2017, 1:31:22 PM
to inatu...@googlegroups.com
Hi Dan,

At the moment, the AI only has ~20,000 species to suggest based on 'visually similar' - these are the ones for which we have enough data to include in the model.
We are also using location data to add more suggestions ('seen nearby'). But we're currently not using location data to remove species.

I agree it would be cool to prune suggestions based on what doesn't occur nearby, and it's a big priority at least for me. But it's not trivial. We can't use observations to throw out species, since they are not complete representations of a species' range, so likely we'll need to use taxon_ranges or listed_taxa or something new for this...
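Such range-based pruning could be sketched roughly as follows. Everything here (the function, the data shapes, the taxon names) is invented for illustration and is not iNaturalist's actual code:

```python
# Hypothetical sketch of pruning AI suggestions with range data
# (e.g. taxon_ranges or listed_taxa); invented for illustration only.

def prune_out_of_range(suggestions, place, range_places):
    """Keep only taxa whose known range includes the observation's place.

    range_places maps taxon -> set of places where it is known to occur.
    Taxa with no range data are kept, since range data is incomplete:
    absence of data isn't evidence of absence.
    """
    kept = []
    for taxon in suggestions:
        places = range_places.get(taxon)
        if places is None or place in places:
            kept.append(taxon)
    return kept

# Example: a New Zealand cicada should be dropped for an Ohio observation.
suggestions = prune_out_of_range(
    ["Amphipsalta zelandica", "Neotibicen tibicen"],
    "Ohio",
    {"Amphipsalta zelandica": {"New Zealand"},
     "Neotibicen tibicen": {"Ohio", "Indiana"}},
)
```

The hard part is where `range_places` would come from, since observations alone under-represent true ranges.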

We should all be aware that the AI might increase the number of bad IDs because of the issues you mention: (1) a tendency for people to blindly follow the suggestions, combined with (2) suggestions that aren't always perfect. We are monitoring this and thinking of improvements for each of these.

In the meantime, this means we as IDers all need to be vigilant. One feature I'm a personal fan of is Atlases, which can be used to surface out-of-range observations like the one you mention. For example, I just made one for Chorus Cicada (Amphipsalta zelandica) https://www.inaturalist.org/atlases/11476 and it is indeed marked as having out-of-range observations. You can read more about atlases here https://www.inaturalist.org/pages/atlases and please feel free to help resolve atlases in the marked atlas feed, which you can find here: https://www.inaturalist.org/atlases/

Longer term, we're currently working on ways to evaluate/improve 'Research Grade' so that it is tied to the predictive accuracy of identifications, potentially including things like user-earned reputation, rather than simple consensus: https://www.inaturalist.org/pages/identification_quality_experiment

We're also working on many ways to improve the accuracy of the suggestions, both on the 'visually similar' front and the spatio-temporal 'seen nearby' front.



--
You received this message because you are subscribed to the Google Groups "iNaturalist" group.
To unsubscribe from this group and stop receiving emails from it, send an email to inaturalist+unsubscribe@googlegroups.com.
To post to this group, send email to inatu...@googlegroups.com.
Visit this group at https://groups.google.com/group/inaturalist.
For more options, visit https://groups.google.com/d/optout.



--
--------------------------------------------------
Scott R. Loarie, Ph.D.
Co-director, iNaturalist.org
California Academy of Sciences
55 Music Concourse Dr
San Francisco, CA 94118
--------------------------------------------------

Matt Goff

Sep 22, 2017, 2:43:11 PM
to inaturalist

Is there a way to indicate when an initial observation ID was selected from the first option the AI presented? I'm seeing observations come in (especially from high school students using iNaturalist for class) that are suspiciously precise. That is, things identified to species that are challenging (primarily mushrooms, but also some terrestrial arthropods) based on limited photos/info.

I often don't know enough myself to confidently disagree with an observation, even when I'm pretty sure it's wrong. I'm not sure if this should matter, but I do feel like if I knew the initial observation was based on the AI top suggestion, I would be more willing to disagree, even when I'm not sure what it is. At least for now, the AI doesn't seem to be especially reliable for a number of groups in south coastal Alaska. (Probably because too many things that regularly occur here are not in the ~20,000 species available to it.)

A related issue I know others have brought up before is the agreement that classmates will sometimes give each other, thereby lifting a questionable observation to research grade. Especially in the case where they're both using the AI (perhaps having both taken pictures of the same organism) - that's really only one (independent) ID, so research grade is perhaps problematic.

Thanks,

Matt


Scott Loarie

Sep 22, 2017, 2:48:49 PM
to inatu...@googlegroups.com
>Is there a way to indicate when an initial observation id was selected from the first option the AI presented? 

This is being tracked and stored but not displayed

Charlie Hohn

Sep 22, 2017, 2:51:31 PM
to iNaturalist
Yeah, I personally think a very broad filter of at most 1000 miles or something should be applied to the ID feature. It makes sense that this is hard to do, considering iNat isn't represented in a geographically even way. But I've definitely had the algorithm suggest California plants in Vermont, and perhaps more importantly, the student project in LA that was already generating a lot of problematic data is now throwing out all kinds of wayyyy-out-of-range IDs, presumably because of this. I know it's a tricky one to solve! But for what it's worth, to me the problems with having things suggested that are >1000 miles away outweigh the positives, especially for taxa that tend to be more geographically restricted, such as plants.



Charlie Hohn

Sep 22, 2017, 3:21:50 PM
to iNaturalist
Also, in terms of your point about students, Cray Man, there are several places it's been discussed if you want to join in those conversations.  
to link just a few.
and the study that Scott mentioned. 
It's a known issue though usually just limited to a few areas. A lot of them can be flagged as captive/cultivated which removes them from RG anyway.

cray man

Sep 22, 2017, 3:37:57 PM
to iNaturalist
Thanks Scott, good information. I need to dig into these atlases when I get some time.

AfriBats

Sep 23, 2017, 1:48:41 PM
to iNaturalist

> Is there a way to indicate when an initial observation id was selected from the first option the AI presented?

I think this is an excellent and super important suggestion - please make it visible which IDs have been made by the AI. I would even favour a solution where AI identifications show up in a different way, e.g. like a placeholder, and thus without the sometimes all-too-easy possibility of agreeing with that initial AI identification. AI identification is a brilliant and exciting tool, but I think it's also risky in the way it is currently handled (increasing false RG observations), at least until the AI's error rates have been assessed, and additional possibilities to exclude highly unlikely IDs have been implemented.

Jakob

Ben Phalan

Sep 23, 2017, 3:15:48 PM
to iNaturalist
Following from these suggestions, how about simply* disabling the "Agree" option for identifications made using the AI (or at least for those where the observer chose a species from the top ten suggestions, rather than the AI's best suggestion, which is typically not a species-level ID)? This would reduce the number of incorrect agreements.

*I don't know how simple this would be to program

Scott Loarie

Sep 23, 2017, 3:44:42 PM
to inatu...@googlegroups.com
One thing to consider is that using the AI doesn't necessarily mean people are just blindly following. I find it, like range maps and other tools, useful to help my IDs, so I use it a lot, but that doesn't mean I'm just blindly following. I also find it useful as an 'autocomplete' so I don't have to spend time typing in the whole name of species I know.


AfriBats

Sep 23, 2017, 4:02:59 PM
to iNaturalist



> one thing to consider is that using the AI doesn't necessarily mean people are just blindly following.

Fully agreed, I've happily used it a couple of times, and hopefully made sensible choices. However, bad fails like the one triggering this thread indicate that some tweaking is necessary. Users uncritically confirming species IDs is already an issue in groups where there's little or no checking by experts, and this is certainly being exacerbated by the AI's seductive IDs...

Ben Phalan

Sep 23, 2017, 6:25:16 PM
to iNaturalist
Agreed, I use it like this too. But it might be worth the cost of having slightly fewer Research Grade observations if we can reduce the number of incorrect RG observations.



bouteloua

Sep 23, 2017, 10:40:36 PM
to iNaturalist
When I am fairly certain of an ID, I still specifically don't choose the ID suggested by the AI, because I know that the site is recording that selection; I just type it in. I agree that it should be displayed to the public when an ID was selected from the AI suggestions.

Donald Hobern

Sep 24, 2017, 7:10:41 AM
to iNaturalist
Similar cross-continent misidentifications are occurring in many groups, and they take more work to catch because it's easy to confirm an identification for something highly recognisable and not notice where it came from.

I would suggest tuning the AI as Charlie proposes (although 1000 km is still a wide window), but perhaps proposing the matching family in cases where the AI finds a close match but the range is wrong.

Donald


Charlie Hohn

Sep 24, 2017, 8:10:30 AM
to iNaturalist
My initial distance buffer proposal was a lot smaller (100 k?) but people didn't like it so I posted 1000 :)

I've used the algorithm as a form of autocorrect too. As Scott says, the whole research grade system may get overhauled soon, and presumably that would be addressed. I'd use the algorithm more except connectivity is often slow in Vermont, and when you do many 100s of observations a month it does burn through some bandwidth (unless I do it over wifi later).

bouteloua

Sep 25, 2017, 12:38:53 PM
to iNaturalist
I spend so much time fixing mistakes based on the AI that I literally just said aloud "oh noooo" when I saw that it's incorporated into the Suggest an ID dropdown. :\

AfriBats

Nov 1, 2017, 8:13:05 AM
to iNaturalist

tony rebelo

Nov 1, 2017, 9:07:20 AM
to iNaturalist
This is not just an AI problem.
We have a field guide to insects in southern Africa that in many cases forgets to tell users that the species illustrated has another 40 similar species in the genus, and 5 very similar genera with dozens of species. Many users of the field guide confidently and happily make the ID from the picture in total ignorance of the fact that it may well be one of another 100 or 200 other species.
A lot of this comes down to experience, and a reputation system will allow other users to see how likely a particular user is to make such a mistake.

Ta
T



Donald Hobern

Nov 1, 2017, 10:41:00 AM
to inatu...@googlegroups.com
I think this one may indeed be an AI problem. It really is essential that the AI does not recommend a species unless it has been recorded, without AI support, as a research-grade observation in the region where the new observation occurs. Uploading Larentiinae in Denmark triggers a cascade of North American species suggestions from multiple families, none of them found in Europe. I think the AI should be tuned to bump up to higher taxa until it encounters a taxon with a suitable range.

The problem otherwise is that the AI may, e.g., be used by more than one person to get the same out-of-region identification, and for them to start agreeing with each other's IDs, establishing an incorrect model for others to follow. A tipping point could easily be reached where more people think the species is one that occurs elsewhere than there are "experts" to correct the misconception.

Donald

Calebcam

Nov 1, 2017, 12:34:51 PM
to iNaturalist
Wow, that's pretty bad, Jakob. Maybe the AI should take location into account? Or does it already? I agree with Donald: the AI should be tuned to bump up to higher taxa, especially for insect/plant species.

Caleb

Scott Loarie

Nov 1, 2017, 12:49:42 PM
to inatu...@googlegroups.com
Hi folks,

I thought it might be helpful to describe what we're currently doing.

When the AI looks at a photo, we get back a vector of 'Vision' probabilities associated with all ~20,000 species in the model.

We then try to determine whether the top choices all share a common ancestor. If not, we just return the top 10 Vision results.

In the attached figure, the top 8 Vision results are all in the genus Aspidoscelis, so we use that as the common ancestor.

We then use a radius around the location of the observation and the common ancestor to get a vector of 'Frequencies' based on verifiable observations.

Note the three colors:

Grey: 'seen nearby' (i.e. in Frequencies) but not 'visually similar' (i.e. not in Vision)

Red: 'visually similar' but not 'seen nearby'

Green: 'visually similar' & 'seen nearby'

We do our best to order these and return the top 10, but the important point is that we're currently considering Green, Grey, and Red results to be included in the top 10.

We certainly could exclude Red, but our experiments showed that including these helped performance overall. I think this is due to the fact that our frequency data is so incomplete. But it would be easy to do this.

An alternative route I've been experimenting with is using atlases/listings (rather than frequency data) to restrict the results. For example, if iNat knew that there were only 300 amphibians in Australia (regardless of whether they were in the frequency data by having been observed), we could use this data to restrict suggestions to this set. But I admit that while it would be doable to get this working globally on Vertebrates, the data probably isn't there for any meaningful subset of plants or insects.
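The merge of the Vision and Frequencies vectors described here could be sketched as below. The data structures are hypothetical, and the ranking rule is a guess (the actual ordering isn't specified), so treat this as illustration rather than iNaturalist's implementation:

```python
# Sketch of merging Vision and Frequencies into green / grey / red
# suggestions. Data structures and ranking are hypothetical.

def suggest(vision_scores, nearby_freqs, top_n=10):
    """vision_scores: {taxon: probability} from the vision model.
    nearby_freqs: {taxon: observation count within the radius}."""
    results = []
    for taxon in set(vision_scores) | set(nearby_freqs):
        if taxon in vision_scores and taxon in nearby_freqs:
            color = "green"  # visually similar & seen nearby
        elif taxon in vision_scores:
            color = "red"    # visually similar, not seen nearby
        else:
            color = "grey"   # seen nearby, not visually similar
        results.append((taxon, color, vision_scores.get(taxon, 0.0)))

    # Guessed ranking: green first, then grey, then red,
    # breaking ties by vision probability.
    rank = {"green": 0, "grey": 1, "red": 2}
    results.sort(key=lambda r: (rank[r[1]], -r[2]))
    return results[:top_n]
```

Excluding Red results, as discussed in the thread, would then just be a filter on the colour field before truncating to the top 10.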

Scott



Screen Shot 2017-11-01 at 9.26.29 AM.png

Charlie Hohn

Nov 1, 2017, 1:18:40 PM
to iNaturalist
My proposal (which you may not like, i dunno):

-For now include only green in results. I'm chasing around a lot of California and Vermont plants that show up all over the world, and it's reducing the value of the site and using up time that could be used for verifying other IDs. The algorithm is hindering, rather than helping, with accurate research grade observations IMHO.  That doesn't mean I am anti-algorithm, I really like it. But if a main purpose is to get better research grade observations i don't think it is fulfilling that goal. Anyhow the 'green' radius seems pretty huge, I've had things show up as 'seen nearby' that are ~100 miles away. You could work in species lists from places like CA that have them if you want.

-If nothing 'green' shows up, and nothing matches, just have it not show anything. Have a message like 'iNaturalist doesn't know this species! Recruit friends to add more data from this area to get better results' or whatever. I think we can all agree that no ID is better than an incorrect ID, and if the area you are in doesn't have enough data to select from nearby species, it probably doesn't have enough data for the algorithm to work properly anyway.

-Concentrate on getting more atlases set up. In particular, I stopped making atlases because my understanding was we needed to cite our range maps to some external document or publication, which frankly, I don't really have the time and resources to do. I propose for now we loosen these requirements, because an imperfect atlas is still better than nothing. (It isn't bad data, it's an imprecise range map, and all range maps are estimates anyhow.) Let us build atlases based on iNat and GBIF observations, and later we can go back and connect them to some exterior source or update them if need be. Or, alternatively, if you can find comprehensive range maps, import them to Atlases somehow. Also add atlas functionality for things like 'waifs' or recent escapes that aren't necessarily in range, or else let us flag observations so they don't trigger the atlas again and again (e.g. the weird hollyleaf cherry in Napa that was planted and might be spreading, but isn't really in range).

-When someone reports something outside atlas range, have a more persistent notification pop up. Have it say something like 'This isn't known to occur in this area! Either it is not the correct ID, it's something a human put there, or you found a new population! Please make sure to double-check your ID.' And finish creating the system where we can get notifications for those. For instance, I'd love to see a notification any time an out-of-range plant pops up in Vermont. 95% are cultivated or mis-IDed, but 5% or whatever are neat new discoveries or new invasives that need action taken.

 That's my opinion anyhow.

AfriBats

Nov 2, 2017, 9:19:40 AM
to iNaturalist
Thanks for this background, Scott, it helps a lot to understand how these IDs are being generated.

I think this thread shows widespread and serious concern about these AI-based IDs. Including only the green results, and otherwise offering higher taxa, might be a good interim solution. Also, please consider making AI IDs visible on the web page (maybe even searchable).



> I think we can all agree that no ID is better than an incorrect ID

Absolutely! Especially if others agree with incorrect species IDs, and false RG observations then slip under the "Needs ID" radar.

Please do something about it sooner rather than later - I share Donald's concern that this could quickly swamp iNat, or at least generate an unacceptable level of incorrectly IDed RG observations.

Jakob

AfriBats

Nov 2, 2017, 9:41:23 AM
to iNaturalist
Maybe these 'Red' cases ('visually similar' but not 'seen nearby') could be handled by suggesting a higher taxon that has been observed in the region, and adding the lower taxon as a comment.

So rather than offering the possibility to ID a plant in Australia as Umbellularia californica (= so far observed only in North America), it could suggest "Lauraceae", and then include an automatic comment "You might want to check out Umbellularia californica, which is visually similar, but hasn't been found in the region around your observation". Users would then need to manually enter the lower ID via the browser interface, making it harder to accept these wild suggestions.
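That fallback could be sketched like this. The taxonomy table and helper function are illustrative only, not iNaturalist's actual data model:

```python
# Sketch of walking up the taxonomy until we hit a taxon that has
# actually been observed nearby; purely illustrative.

PARENT = {  # child taxon -> parent taxon (toy fragment)
    "Umbellularia californica": "Umbellularia",
    "Umbellularia": "Lauraceae",
    "Lauraceae": "Laurales",
}

def safe_suggestion(taxon, seen_nearby):
    """Return (taxon_to_suggest, optional_comment).

    seen_nearby is the set of taxa with verifiable observations
    in the region of the new observation."""
    if taxon in seen_nearby:
        return taxon, None
    # Walk up the ancestry until we find a taxon seen nearby.
    ancestor = PARENT.get(taxon)
    while ancestor is not None and ancestor not in seen_nearby:
        ancestor = PARENT.get(ancestor)
    comment = ("You might want to check out {}, which is visually "
               "similar but hasn't been found in the region around "
               "your observation.".format(taxon))
    return ancestor, comment
```

For a plant in Australia this would suggest "Lauraceae" (assuming Lauraceae has been seen nearby) with the comment attached, rather than offering Umbellularia californica directly.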

Chris Cheatle

Nov 2, 2017, 9:47:41 AM
to iNaturalist
Two minor notes/comments. Any solution attempting to differentiate the categories should not be based solely on colour coding:
- many users either will not know or will not take the time to learn what the colours mean.
- additionally, assuming a normal distribution of users, roughly 5% of users (10% of the men and a smaller percentage of the women), including the one writing this comment, are colour blind, so will struggle even to take note of the colours.

I too think an explicit warning should be generated when choosing something out of iNat's known ranges; it should ideally prompt that the species selected is not currently known, from iNat data, to occur in the selected location. It would be best to say "known", not that "it does not occur", as of course there are vagrants, invasives, etc.

Like others have commented, we've seen a spike in out-of-range submissions here in Canada as well; one in particular is a raft of Eurasian Magpie submissions.

Charlie Hohn

Nov 2, 2017, 10:09:32 AM
to iNaturalist
What is really striking here is the level of agreement among several iNat users. I know Jakob and I sometimes have different visions for how iNat should work, but this time we are solidly on the same page. Seems others are too. Something to consider. I know this isn't really a democracy, but I think there's a pretty strong message here...

Cullen Hanks

Nov 2, 2017, 11:39:23 AM
to inatu...@googlegroups.com
Charlie et al,

This is a good discussion. I want to add a couple of thoughts to an "out of range" or "new to area" alert, which I think would be applicable to more than just the interaction with suggested species. It would be valuable anytime a species is being documented out of range or in a new area. It would encourage:

1. Caution: Are you sure? You'd better check again and make sure it isn't a look-alike species known from this area. If you aren't sure, you should probably back it up to a higher taxon.

2. Documentation: If it is, make sure you do a good job of documenting it!  If you still can, consider collecting additional documentation.

3. Data Awareness: Wow, this is a new record for this area. Observations like this are really valuable. This might be rare here, or, if it is a common species, it's an indication that this taxon needs more attention in this region!

4. Kudos: Great job, you just helped document a new population for iNat... Keep up the good work!


Of course this would also be invaluable for curation. Say, hypothetically, you are interested in plants in Vermont: imagine focusing your curation time on new plant taxa for any county in Vermont.


However, as Scott indicated, the devil is in the details.

Best,

Cullen






Calebcam

Nov 2, 2017, 4:16:30 PM
to iNaturalist
I agree. I just saw an observation the other day IDed as a plant that was actually an animal, and I think that was the AI. It could easily (and quickly!) swamp iNat, like Jakob said, and it just means lots of bad IDs are floating around that somebody has to go fix. Now when I go and start IDing some of the most recent reptiles, I see plants... and birds, and stuff I didn't see 2 months ago (and I think that's the AI's fault, not iNat users').

Caleb 

Tony Iwane

unread,
Nov 2, 2017, 4:29:37 PM11/2/17
to iNaturalist
Caleb, can you share links to some of those observations, if possible?

Tony

Calebcam

unread,
Nov 2, 2017, 7:16:42 PM11/2/17
to iNaturalist
I'll look for them Tony, but they are probably long gone now.

Caleb

Calebcam

unread,
Nov 2, 2017, 8:28:48 PM11/2/17
to iNaturalist
I didn't see the one that I remember so clearly... it was a human but IDed as a lizard. Here are a few I found that seem like they are AI, though:

https://www.inaturalist.org/observations/8635146

Caleb

Calebcam

unread,
Nov 2, 2017, 8:40:02 PM11/2/17
to iNaturalist
Just for fun, I am going to upload some of my obs (I took some pics of local fungi) and use the AI as the IDer. We'll see how many are right/wrong!

Caleb

Ben Phalan

unread,
Nov 2, 2017, 11:04:02 PM11/2/17
to iNaturalist
Hi Tony,

Not as extreme, but I've been seeing several observations from Brazil misidentified as Tree Swallow, which does not occur in South America. The community is correcting these, but I could see how this could get out of hand for taxa where there are not so many expert identifiers. Presumably the issue is that the training set is heavily biased towards North American species.

For example: https://www.inaturalist.org/observations/8114898

I support the suggestions by Cullen, Charlie and others to make it much less likely for the AI to suggest a species that has not been recorded in the country (or better, the sub-national unit) and/or to flag these observations as interesting and unexpected. For species where we have range maps (many vertebrates), this could be extended to any records outside of the mapped range.

Ben


James Bailey

unread,
Nov 3, 2017, 12:27:13 PM11/3/17
to iNaturalist
I'd actually be very interested in having a tag that shows the ID was suggested by the AI.

Tony Iwane

unread,
Nov 3, 2017, 1:25:14 PM11/3/17
to iNaturalist
Caleb, I think this one is a joke by a student, which sadly has happened since way before the AI was implemented. :/ I have no idea what happened here, since the AI returns spiders for me with this photo - perhaps a language thing? The other two seem to be AI-related, however. Ben, I agree those misidentified swallows are likely the AI's doing. I'll bring these suggestions to the team.

Tony

Calebcam

unread,
Nov 3, 2017, 8:16:30 PM11/3/17
to iNaturalist
Thanks Tony. I am doing a test to see just how many observations the AI can ID wrong. See:


This shows all my recent obs, and I used the AI on most of them (the ones marked "AI suggested ___" in the description are the AI's; the ones where I don't say anything are ones that I IDed myself).

Caleb

jesse rorabaugh

unread,
Nov 5, 2017, 1:03:38 AM11/5/17
to iNaturalist
A lot of these problems could be removed if the default were to only suggest the genus when the species has not been seen nearby. That way the computer still helps you make the ID, but we won't get to research grade unless a human figures out the species.
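The back-off rule jesse describes could be sketched like this. Again, this is only an illustration: the `Suggestion` record and `seen_nearby` flag are invented names, standing in for whatever the vision model actually returns.

```python
# Sketch of the proposed default: only surface a species-level suggestion
# when that species has records nearby; otherwise back off to the genus
# so a human has to finish the ID before anything reaches research grade.

from dataclasses import dataclass

@dataclass
class Suggestion:
    species: str       # e.g. "Melospiza melodia"
    genus: str         # e.g. "Melospiza"
    seen_nearby: bool  # does the model have nearby records for this species?

def safe_suggestion(s: Suggestion) -> str:
    """Return the species only when seen nearby, else the genus."""
    return s.species if s.seen_nearby else s.genus
```

James's follow-up point would change the fallback from genus to family, since visually similar genera make even a genus-level guess risky.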

Calebcam

unread,
Nov 5, 2017, 9:40:39 AM11/5/17
to iNaturalist

James Bailey

unread,
Nov 5, 2017, 2:27:44 PM11/5/17
to iNaturalist
Family level is safer if no species are known nearby, because genera are often very similar visually.

Ben Phalan

unread,
Nov 7, 2017, 2:49:58 PM11/7/17
to iNaturalist
Some more problems with the AI in Brazil...

Groove-billed Ani is not found in Brazil:
https://www.inaturalist.org/observations/8718982

Pale Thrush is not found in the Americas. Yes, the correct ID was also in the top ten, but the observer must have picked the photo they thought looked most similar:
https://www.inaturalist.org/observations/8715197

That's 2 out of 30 observations reviewed this morning, and perhaps most of the identifications made by the AI in those 30. If these sorts of errors are appearing in birds, which have an excellent training set, are relatively easy to ID from photos, and have many observers who can correct them, I fear to think how many errors are being introduced unnoticed and going RG in other taxa.

The weird thing about the first case is that the correct species, Smooth-billed Ani, is frequently enough observed that it should be in the training set, but was not in the top ten. Instead, all ten suggestions are of species which are not found in the geographic area where the observation was made, but are presumably well-represented in the training set.

The AI does not know what it does not know. Unless its suggestions are made more conservative (e.g. family-level suggestions, or only showing taxa known from the region), a lot of nonsense IDs will keep being added to iNaturalist.
Message has been deleted

Chris Cheatle

unread,
Nov 7, 2017, 3:59:30 PM11/7/17
to iNaturalist
Personally, I would prefer they be flagged so they can be found more easily, rather than requiring additional reviews for promotion.
  • As external users, we have no idea what the "success" rate of the AI identifications is. Is it 50%? 75%? 90%? 99.8%? All we are really seeing and highlighting are the apparent issues.
  • I don't yet see a significant issue with approvals of incorrect IDs, although I am sure some individual cases can be pointed out. It seems in most cases they get entered, caught by a reviewer, and corrected; I'm not sure many at all are actually making it to research grade as a result of someone agreeing with an incorrect AI ID.
  • I am not sure I personally see the value of requiring a Russian matryoshka-doll process of approvals. If someone enters an observation of a Blue Jay, regardless of whether the ID was theirs or via the AI, and someone else reviews and approves it, is there a systemic need for someone else to review the reviewing, and then someone to review the reviewing of the reviewing, etc.?
  • I think a possible solution is to add a third category under the Observations hyperlink: alongside the present "By You" and "Identify", add a "Validate" option (or something similar) that behaves exactly like Identify, except that its pool of observations is those promoted to Research Grade with 3 or fewer (or whatever number the community reaches as a consensus) ID agreements. That way, people who have specific concerns about adding supplemental IDs can do so easily in a familiar way, while leaving a system that works for the high majority of sightings as is.
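The "Validate" pool Chris proposes is essentially a filter over existing observations. A minimal sketch, assuming hypothetical `quality_grade` and `num_agreements` fields on each observation record (these names are invented for the example, not iNaturalist's actual schema):

```python
# Sketch of a "Validate" pool: same mechanics as the Identify pool, but
# drawing only research-grade observations that still have few agreeing
# IDs, so reviewers can focus supplemental checks where they matter.

def validate_pool(observations, max_agreements=3):
    """Return research-grade observations with at most `max_agreements` agreeing IDs."""
    return [o for o in observations
            if o["quality_grade"] == "research"
            and o["num_agreements"] <= max_agreements]
```

The `max_agreements` threshold is the knob Chris suggests leaving to community consensus.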

On Tuesday, November 7, 2017 at 3:26:16 PM UTC-5, paloma wrote:
I would like to see the observations made with AI require more "agrees" before becoming Research Grade. If iNaturalist Research Grade observations go to calflora.org, for example, I'm concerned that calflora.org's range maps for plants (as well as iNaturalist's maps on the taxon pages) are going to be made unreliable. If the AI is making so many mistakes and encouraging people to suddenly expand species' ranges (which I have noticed, too), that is going to be a problem.

And I agree with just making the AI choices at the family level until it gets those right (which I think would encourage learning taxonomy). I have been doing iNaturalist for a few years now, and I still often start my observations with "Asteraceae" for plants in that family. It seems like the AI misleads beginners into thinking that all species can be identified very easily, when it could be more of a learning tool for how to narrow the choices in a given area, step by step.

Charlie Hohn

unread,
Nov 7, 2017, 4:36:08 PM11/7/17
to iNaturalist
Maybe it is too complex, but after atlases are built out and such, what about a third ID or some other extra verification required for anything out of range? The biggest issue I am seeing is error caused by the algorithm not being range-limited at all. If you are in range and have a good pool of existing observations, the algorithm works quite well. In those cases it increases accuracy and I don't see any reason for an extra ID. But the out-of-range observations are bad and getting worse.
Message has been deleted

Charlie Hohn

unread,
Nov 7, 2017, 5:13:10 PM11/7/17
to iNaturalist
You can mark it as needing further ID, and then it will not get research grade until there are more IDs and the tag is voted off.

On Tuesday, November 7, 2017 at 5:05:30 PM UTC-5, paloma wrote:
I think the idea of flagging an observation "Validate" would be good. When I see an observation of a plant species in a genus apparently new to the county, which was rapidly agreed to and is now Research Grade, it would be nice to be able to flag that as something to be validated when I am suspicious but unqualified to decide whether it's right or not. And I would like this whether AI was used or not.
Message has been deleted

Charlie Hohn

unread,
Nov 7, 2017, 6:46:19 PM11/7/17
to iNaturalist
I don't see that much rubber stamping, and those who rubber stamp never think to unclick that box anyhow. 

On Tuesday, November 7, 2017 at 5:28:53 PM UTC-5, paloma wrote:
yes, I just thought the "validate" box would be searchable by people who are interested in the range problem specifically and would like to help on those; presumably that would be people qualified to make a good ID. When I've used "needs further ID", it seems like what often follows is, I suspect, just more rubber-stamping.
Message has been deleted

Charlie Hohn

unread,
Nov 7, 2017, 8:45:07 PM11/7/17
to iNaturalist
Perhaps rubber stamping is just worse in taxa/locations I don't see as often. I have seen it before, but from the way others talk about it, I think maybe it is more common in other areas.

On Tuesday, November 7, 2017 at 8:17:30 PM UTC-5, paloma wrote:
maybe it's a non-problem, then


Ben Phalan

unread,
Feb 6, 2018, 7:54:57 AM2/6/18
to iNaturalist
I'm noticing some problems with the "seen nearby" component of AI suggestions, in Brazil...

Song Sparrow is the top species suggestion here and is marked as "seen nearby". Song Sparrow does not occur in South America, and there are no (even erroneous) RG observations from anywhere remotely near this observation. So I wonder what has happened here:

https://www.inaturalist.org/observations/9747771

Yes, the AI does suggest Passeriformes, but it's all too tempting for a beginner to click on the top species suggestion.

In this second example, the only one of the top ten species suggestions marked "seen nearby" is Harris's Hawk:

https://www.inaturalist.org/observations/9747764

Yet, there are other species on the list which have been observed nearby, such as Roadside Hawk (which I believe is what is in the photo), but Roadside Hawk is /not/ marked "seen nearby". Why not?


