Overuse of AI suggestions - the (discouraging) Big Picture.


Chuck Sexton

Oct 28, 2018, 1:27:13 AM
to iNaturalist
I'm getting pretty frustrated with the unthinking selection of iNat AI-suggested IDs by relatively inexperienced observers. Far too many users--particularly novice users and anyone enamoured with the thought of AI help--are just blindly selecting the first suggested ID without any knowledge or investigation. And soooo many times the AI suggestion is way off. The result is that for many, many groups of plants, insects, etc., the database is getting totally clogged with IDs which are way off base. The problem is particularly acute for any of us (myself included) who are investigating new geographical areas outside our own comfort (or ecological) zone. I try to correct the miscues in my geographic and taxonomic areas of interest, but it has become a tidal wave of inaccuracy. I can't keep up with the corrections needed for even my small areas of interest.

This is a pretty radical suggestion, but could we disable the AI for a while, take a deep breath, and rethink what it's doing to the fundamental goals of iNat? I'm pretty discouraged about its utility at this point: for too many groups, the downside of incorrect IDs far outweighs the occasional enlightened (= correct) ID. IMHO, AI should be enlisted as a *last resort* for suggesting IDs, not as a first pass. I don't know how to institute that system-wide, but it needs to come with restricted use, pop-up warnings, and other constraints to ensure that it doesn't become a complete joke. (Some of the current suggested IDs would be pretty funny if they weren't so annoying.)

Chuck

Mark Tutty

Oct 28, 2018, 1:34:18 AM
to inatu...@googlegroups.com
Maybe we need AI-suggested IDs to not count towards RG? I.e., only human IDs would count.


Ralph Begley

Oct 28, 2018, 8:06:00 AM
to iNaturalist
Chuck,
I'm actually testing this feature.
I agree that it needs to mature, but disabling it would prevent my testing.

Thanks
Ralph

Ralph Begley

Oct 28, 2018, 8:32:40 AM
to iNaturalist
Chuck,
Perhaps disabling this feature in the "production" version and enabling it in the "beta" version would be an option.

Chris Cheatle

Oct 28, 2018, 9:07:42 AM
to iNaturalist
This has been discussed many times. I guess I'm concerned there is no evidence turning it off will improve the accuracy rate.

I suspect one of two things will happen: either way more records will go in with no ID, which adds a suite of other issues (just a different kind of correction needed, which is no less work than correcting the AI), or folks will use Google etc., search for 'black and grey bird' or what have you, and pick the first thing they think remotely looks like their observation.

Charlie Hohn

Oct 28, 2018, 9:39:06 AM
to inatu...@googlegroups.com
I too have noticed a decline in ID quality and have spent a bunch of time trying to fix a million pines in southern California that are misidentified. But here's my experience:

The ID algorithm does not make managing IDs harder, unless you count the fact that it's caused more people to use the site. However, it does not make IDing easier either, mostly because it doesn't deal well with range and it doesn't include captive/cultivated species, which in my observation make up a large chunk of the plant mis-IDs.

The biggest issue still seems to be students and other similar newbies adding massive amounts of 'low quality' content to the site, a problem that has been discussed here forever but IMHO needs more work. I still think we need separate student accounts, but maybe that isn't possible for other reasons. I also think we need to target outreach and feature development for a while at more 'power users', scientists, and super-naturalists, because I think the changing balance between those sorts of new users and newbies is too skewed towards newbies. It's just easier to recruit the newbies.

Here's what I'd propose:

- Run the algorithm on ALL observations automatically. Not as a pop-up; just have the ID list there on the page as 'ID Algorithm Suggestions'. If there is one that ranks highly (like with Seep) you could give it the ability to change the community ID, but it should have maybe 1% of the weight of a human identifier, so that any human can override it (see the sketch after this list).

- Still offer the algorithm as an option on the app (because it's so cool), but give it that 1% weight on the website. Then let the user go back and 'agree' with it if they review it and think it's right. Make it visible on the website that this is what happened, so we know whether the user is reviewing those suggestions or not.

- Create some sort of student accounts or similar. Maybe it's time to consider the internal merit-based reputation system again.

- Show a notification for any observation where there are no research grade observations of that species within 100 miles (or whatever). Not even as a questioning thing; sometimes these are neat range extensions. It would raise interest in these observations and also give people a chance to notice they either found something neat or had a wrong ID.

- Train the algorithm on captive/cultivated plants. If a main point of it is to help with low-level IDs, this is crucial. Most low-level IDs by 'duress students' and newbies are of captive/cultivated plants. Sometimes ~90% of them.

- Shift some of the marketing and development focus to attracting and retaining experts, power users, and naturalist hub citizens rather than 'everyone'.

I know I've said some of this before and apologize for being a broken record, but I think some of this would help a lot.
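A minimal sketch of how that 1% weighting could work. Everything here is invented for illustration (the weights, the function names, and my reading of iNat's two-thirds community ID rule); it is not iNaturalist's actual code:

```python
from collections import defaultdict

CV_WEIGHT = 0.01     # proposed weight for a computer-vision suggestion
HUMAN_WEIGHT = 1.0   # weight for a human identification

def community_taxon(identifications):
    """identifications: list of (taxon, is_cv) pairs for one observation."""
    scores = defaultdict(float)
    for taxon, is_cv in identifications:
        scores[taxon] += CV_WEIGHT if is_cv else HUMAN_WEIGHT
    taxon, score = max(scores.items(), key=lambda kv: kv[1])
    # require a two-thirds supermajority, as (I believe) iNat already does
    return taxon if score / sum(scores.values()) > 2 / 3 else None

# Alone, the CV vote can set the ID; one disagreeing human instantly wins.
print(community_taxon([("Pinus ponderosa", True)]))   # Pinus ponderosa
print(community_taxon([("Pinus ponderosa", True),
                       ("Pinus coulteri", False)]))   # Pinus coulteri
```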



--
============================
Charlie Hohn
Montpelier, Vermont

Ralph Begley

Oct 28, 2018, 11:13:57 AM
to iNaturalist
Charlie,
Very useful observations.

Chris Vynbos

Oct 28, 2018, 11:14:33 AM
to iNaturalist
I agree there is a growing problem. It may be worse for us in Southern Africa because the suggestions are often for taxa that don't even occur on our continent. We then have a situation where a really bad ID, often not even to the right family, needs more than two extra IDers to turn it around. The problem is made worse because the people using these AI suggestions don't respond to requests to revisit their IDs, and I suspect this is because they are using the app and aren't finding their mentions. I commented on this here:

Chuck Sexton

Oct 28, 2018, 1:12:55 PM
to iNaturalist
Charlie Hohn makes some good suggestions, but perhaps they enlarge the task I was addressing. As I rethink my frustration with the AI, the main issue may not be its overuse but its poor ID performance. In most cases I deal with, that in turn stems from a lack of geographic specificity, i.e. suggestions are apparently offered without regard to location (even at a continental scale--sorry, South Africa!). IF detailed geographic specificity were added in, I think the ID performance would be vastly improved. I would like to see the ID specificity narrowed to, at a minimum, the state/province level in North America, and preferably further (county or biogeographic region, for instance). IF an AI-identified taxon has not been previously documented in the subject geographic area, then it should be suppressed (see below).

A second aspect is that the user has no idea whether the AI is offering a sure bet or a wild guess. I expect it would need a complex algorithm to "keep score" of its own correct IDs, but some quantitative indication of the likelihood of a correct ID would be cautionary for any user. The suggested IDs might be offered in the drop-down list in a manner akin to what AncestryDNA offers for genetic matches:
"Genus Xxxxx sp., prob. 90-100%"
"Species Xxxxx yyyyy, prob. 90-95%"
"Species Xxxxx zzzzz, prob. 5-20%"
"Species Aaaaa bbbbb, prob. < 5%"
etc.
And the suggested IDs should be only as good as the AI's success rate. IF the algorithm knows it may not be close, it might indicate something like:
"Flowering Plants, prob. 75%"
"Family Asteraceae, prob. 5%"
"No further ID available."
That is, the suggested IDs should always be VERY conservative.

I understand that the AI is only as smart as its training thus far, so limiting the suggested IDs by geographic specificity and basing them on the AI's own track record would go a long way toward minimizing the junk IDs. (A rough sketch of both constraints appears below.)

I look forward to the day (!!!) when the suggested IDs might be displayed at this level:
"This is likely to be Xxxxx yyyyy, but that species hasn't been documented in your county yet; the nearest records are about 150 mi to the northeast. iNat isn't familiar with other species in this genus which might occur in your region. Your observation should not be confused with Zzzzzz sp., which is very similar but known only from South America."

Chris Vynbos

Oct 28, 2018, 2:44:54 PM
to iNaturalist
I think Charlie's suggestion of a reputation system is the easiest way out of the conundrum. It means you can have your cake and eat it: while the AI slowly improves, bad AI suggestions will have little impact if the person selecting them is a newbie with a fraction of a voting point. Newbies who stay and learn will slowly build up reputation points as their ID accuracy improves. We won't have to waste the time of people like Wongun, who currently has to implore people who are clueless about bugs to change their IDs (and still gets ignored), because he'd have built up a reputation that gives him ID power 10x that of a person who doesn't know a beetle from a bug. It would solve our Southern African conundrum too: so what if newbies on the app make dreadful IDs and then never change them? I can make a correction that negates their wrong ID (provided my reputation for that taxon shows I clearly know my stuff).
And lastly, so what if hundreds of kids come onto iNat every day and put up pics of their friends and ID them as lizards or slugs? Their IDs will not have an impact.


Chris Cheatle

Oct 28, 2018, 3:24:34 PM
to iNaturalist
Putting aside feelings on the implementation of a reputation system (personally I think it would have a net negative impact on the site; there are some positives, but they would be outweighed by the bad), I'd suggest the two discussions be kept separate, as having or not having a reputation system does not resolve the question of how the computer vision system should work.

Were there a reputation system in place, the same initial issues raised in the first post would still exist.

Charlie Hohn

Oct 28, 2018, 4:28:08 PM
to inatu...@googlegroups.com
Yeah, and just to clarify, I am not trying to say we should definitely do something like that. I just think it might be worth having some way to separate out people who are new and overconfident in their IDs--like I said, mostly students.

In terms of the initial ID issues, I feel like people proposed inexplicable and random IDs even before the algorithm existed. I really think it has more to do with the students than everything else combined. I don't think just having the link is enough. There need to be student accounts and maybe even a way to toss groups of people in there. I don't know. This problem is one of the worst on the site and it's getting worse every year.


Mark Tutty

Oct 28, 2018, 4:32:30 PM
to inatu...@googlegroups.com
There is already a reputation system in place!

When two users post differing IDs, I can look at both of those users and potentially recognise one of them as a regular who I see makes good IDs and seldom guesses. I am more likely to agree with them.

Even if both users "look the same" in terms of my respect for their past IDs, any comments they make will lend weight, at least as to whom I might agree with.

A duress poster isn't going to be around to change their ID, but they're not going to be reviewing many observations from others either, so their net impact is negligible.

Rather than trying to get an absent poster to change an ID, tagging others to add weight does work. There are maybe a dozen people I will tag, as I know they are regulars and sensible users who will be around to change their IDs should the dialog head that way...


Paul

Oct 28, 2018, 7:46:06 PM
to inatu...@googlegroups.com
I'm guilty of relying on that IDer a little too much in the past. Having just relocated to southern New Mexico, I've almost stopped using it; it keeps trying to take me to plants found only in California, not here. No joke about misidentified pines in California. After reading this thread I started reviewing Arizona and California IDs and was stunned. It's never fun, either, when someone chews you out for trying to help.


Chris Vynbos

Oct 29, 2018, 12:01:58 AM
to iNaturalist
Agreed, let's not make this thread about a reputation system. But in answer to the initial post ('getting pretty frustrated with the unthinking selection of iNat AI-suggested IDs by relatively inexperienced observers'), one could solve this without resorting to a reputation system by making all newbie users' IDs remain placeholders until agreed with by an established user. Once a newbie user has had (say) 5 of their IDs confirmed by established users, they too become an established user.
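A minimal sketch of that gating rule, with invented names (this is not iNat's real data model):

```python
CONFIRMATIONS_NEEDED = 5  # assumed number of confirmations to graduate

class User:
    def __init__(self, name):
        self.name = name
        self.confirmed_ids = 0

    @property
    def established(self):
        return self.confirmed_ids >= CONFIRMATIONS_NEEDED

def id_counts_toward_community(identifier):
    """A newbie's ID stays a placeholder until they are established."""
    return identifier.established

def on_agreement(author, agreeing_user):
    """Call when another user adds a matching ID to one of author's IDs."""
    if agreeing_user.established:
        author.confirmed_ids += 1

newbie, veteran = User("newbie"), User("veteran")
veteran.confirmed_ids = 100                  # already established
print(id_counts_toward_community(newbie))    # False
for _ in range(5):
    on_agreement(newbie, veteran)
print(id_counts_toward_community(newbie))    # True
```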

Andrew Gillespie

Oct 29, 2018, 4:34:03 AM
to iNaturalist
My reply might seem rude, so I want to be clear up front that I do not intend to offend anyone; I want to offer constructive criticism. I have read many posts complaining about incorrect IDs because of various issues: students, newbies, captive organisms, overuse of AI, and so on. Any of the various suggestions for improvement may or may not be good ideas, but all should be discussed as separate issues.

The change that will address the complaints lies in changing attitudes. The site is a community site that is not limited to the scientific community. The fact that it is open to all is part of what makes it such a fantastic site (in my opinion, the best in class). If you are getting frustrated, you need to lower your expectations. Everyone has to start somewhere, and you need to be patient with people who are learning. Either try to assist them in learning, or ignore them if facilitating is not your thing. Getting frustrated or upset has a negative impact on you but does little else. So choose not to let it bug you.

Chris Vynbos

Oct 29, 2018, 7:18:22 AM
to iNaturalist
@Andrew: We aren't complaining about people who are beginners and want to learn. Those people are quick to correct their mistakes, and they soon learn to make IDs at their level of certainty. These people are a pleasure to have on the site. The problem is those who are on the site under duress or just to play the fool. And then there's the AI issue that started this thread: it can lead inexperienced users to make badly wrong IDs. This is fine with inexperienced users in the first category, because they will pick up their mistakes and correct them. It's the other users, typically on the site only for a short time and never seen again, who give the rest of us a large workload. I don't know how many IDs you make a day, Andrew, but myself and a bunch of others in Southern Africa spend hours a day making IDs, trying to get our Southern African obs correct and valuable for research. When someone makes a useless ID thanks to the AI or being silly, and never corrects it, our workload for those obs is doubled, as we need more agreements than we normally would, and this simply slows the process down. As iNat grows the problem will grow, and there will be less time for us to give useful feedback to those inexperienced users who want to learn.

Charlie Hohn

Oct 29, 2018, 7:49:39 AM
to inatu...@googlegroups.com
I really think at least 95% of these issues are due to the duress student accounts. I don't get why iNat doesn't just ban, or at least heavily discourage, this use. Those sorts of assignments annoy the community, degrade the data, and honestly don't help the stated mission of connecting people with nature either. Setting aside the larger issue of what this says about the predominant education system out there... what value is it adding to the site? It annoys existing users, annoys the students themselves, and annoys the people using the data. I know we don't want to be super heavy-handed, but that aside, I think we can find a clever way to phase this out. Maybe heavily market a version of Seek for students and call it iNaturalist Classroom?
People are welcome if they want to be here and engage with the community, regardless of knowledge level. If they don't want to be here, we should go our separate ways.


Mark Tutty

Oct 29, 2018, 8:07:59 AM
to inatu...@googlegroups.com

If they are only on the site for a short time and never seen again, how can they give the rest of us a large workload?

 

My experience has been that a school group/class of duress posters, or an over-enthusiastic newcomer who hasn't quite got the knack yet, might hit the site with perhaps a hundred observations. It seems excessive at the time, but they typically only occur sporadically. We would be better served to target some training to the teachers and tutors in how to discourage those from the outset. I am personally tired of seeing students getting qualifications for half-a@#d efforts! Set a bar, and if they don't meet it, fail them!

 

I typically look at where the observations are being made, look for the nearest school, see if the uniforms match the kids in the photos, and then email the school administration complaining about their students' misuse of IT. The crap observations stop pretty quickly.

 

But regardless of my personal pet peeves: if there are 10 of us doing the confirmations and the observations are rubbish, then we each get 10 observations to mark as rubbish. We don't have to try and salvage every crap observation that comes up! If they are semi-useful, then we each ID maybe 30 to family or genus (i.e. cross-confirming to override the bad ID) and then switch them to "can't be improved". Just do as many as you are happy to do... some will do 40 and some 10... and it doesn't have to happen right now... take a break and come back when you feel good about doing some.

 

cheers
Mark Tutty
kiwif...@gmail.com


Andrew Gillespie

Oct 29, 2018, 8:33:16 AM
to iNaturalist
AI is actually not intelligence; it is statistics. The more data it is given, the higher the probability that it will eventually be right. That means that if it is not used, it will never improve. That said, I personally wouldn't miss the AI if it were removed.

Andrew Gillespie

Oct 29, 2018, 8:42:16 AM
to iNaturalist
@Chris. Could you give some examples of the problem observations?

Charlie Hohn

Oct 29, 2018, 9:00:42 AM
to inatu...@googlegroups.com
Mark, there are literally tens of thousands of them. There is one class that brings on 100 students each year and has each make 20+ observations. That's just one class. It's not trivial at all. Maybe it is in your area, but in the areas I look at, it comprises maybe 95% of the ID issues and maybe 50% of the ID work, because nearly all of the observations have the wrong ID or are captive/cultivated... And no, we don't have enough people to keep up with it; a look at the range map of most common plant species will show you that. Sugar maple and red oak in California... check. Coast live oak, a California endemic, all over the Southeast... check. (Well, I got rid of those, but there will be more soon.) I just spent a while weeding out dozens of ponderosa pine observations in the LA basin. That species can't survive there even if planted, and none were flagged as planted anyway. And these are just a few species; there are thousands more.

We've tried reaching out to teachers, and frankly it hasn't worked. Maybe there's a way to do so more vigorously, but I'm not sure how. I'm not going to track down and complain to schools, for a whole ton of reasons; I don't see that as a very good answer here. It's good that it hasn't been as much of an issue with your area or your taxa, but you should believe the many others here from many other areas saying it's a huge issue.

Tait Sougstad

Oct 29, 2018, 9:06:45 AM
to iNaturalist
I'll throw in to say that I agree with most of Charlie's ideas here. They would all likely plug quality leaks in their own way.

However, suppose iNat stays the way it is. I'd like everyone to take a 40,000-ft view for a moment. There are some negatives to the democratic way iNat is set up, in that some bad data can get introduced into the batch. However, there is a big positive that offsets it: you are not alone. None of us is responsible for a one-man crusade against bad data. Every time you make a contribution, you should feel good that you are volunteering your time for something that really matters, rather than melting your brain with cheap entertainment. You should not feel like the weight of the database rests on your shoulders!

I know there are times when an observation goes Research Grade from a bad ID and groupthink confirmation, and I want to do everything in my power to get it off RG but I just can't, darn it! I can't save the data! But in a century, when we are all dead and iNat has the most robust data on the planet, you just have to trust that there are other people who will contribute to the quality of pines in California, or sagebrush in Montana, or lichens in New Zealand. And why do they do that? There are altruistic reasons in contributing to the community, sure, but each time we do it we are also investing in our own skills, getting personal benefit and pleasure from the effort.

So, if you want to feel really good about contributing to iNat, you should make an effort to recruit the kinds of 'experts, power users, and naturalist hub citizens' that would continue to make this a rich community to be a part of for everyone! That would improve the quality of the data, and continue to train the AI to be a better assistant along the way.

Also, as a counterpoint to the 'poor AI quality' observation: it's just more impetus to keep training it. Some species are becoming very well developed. (I've suggested elsewhere having a meter on the species page that displays how well trained the AI is on that species.) I just went through 30 posts where I already knew the species, and the AI suggested the one I wanted at the top each time! Saved me some typing. Let's look at some of the good suggestions being put out here, but keep in mind that the main need we have is for more users!

Tait

Charlie Hohn

Oct 29, 2018, 9:27:21 AM
to inatu...@googlegroups.com
More users, but more users who are engaged in the community and want to be here. Not growth for growth's sake. 

Andrew Gillespie

Oct 29, 2018, 9:48:23 AM
to iNaturalist
I do largely agree with Tait. I mostly limit myself to South African plants; identifying them is actually really difficult. I have also contributed to the project for transcribing old paper records from herbaria. Even famous professional botanists have had their IDs corrected, sometimes decades later. I am actually impressed at how many of the iNat observations are correctly identified in short order.

Jeremy Hussell

Oct 29, 2018, 12:53:28 PM
to iNaturalist
I confess I don't understand why it's taking so long to get the A.I. to pay attention to location. Isn't it just a matter of passing it some more inputs during training (i.e. the lat/lon of the observation)? For that matter, why is the AI only being trained on the first image of each observation, instead of all of them?

Chris Cheatle

Oct 29, 2018, 1:30:17 PM
to iNaturalist
Just a couple of comments:
  • I suspect it is simply a matter of terminology, but I'm not aware it has ever been stated or confirmed that the CV tool is not trained on all photos in observations. 'Trained' to me means the process whereby the tool is taught what a species looks like. The user-facing implementation that users run only uses the position-1 photo in an observation, but that is the 'running', not the 'training'.
  • The tool does pay attention to location: it reports 'seen nearby' when run. It just does not seem to do so in a way that makes sense to many commenters here. There is a spread of belief about how much weight geography should get: some say that if a species is not known from the area of the observation it should not be recommended at all; others question exactly what 'seen nearby' should mean and how prominently it should be presented to the user.
  • Since we both live in Ontario, I'll give an example: there are no iNat records of Blue Jays within a couple of hundred kilometers of Timmins (likely due to a lack of users in the area), yet it is equally unquestionable that Blue Jays are numerous and common there. What should happen if a user submits a Blue Jay there and runs the computer vision? Some would answer that since iNat does not think it occurs within x kilometers, Blue Jay should not be listed, even if the visual match is 100%. Others would say list it, but highlight effectively to the user that it is not known there. Others would say the buffer used to measure whether Blue Jays are there is too narrow, and that perhaps the Ontario checklist should be the source, not submitted records.
  • Maybe the answer is to not display things that don't meet a certain threshold of visual match. I mean, if the #8 option that comes up when you run a Blue Jay is a Double-barred Finch with a visual match of 0.08, why show it? Just show a shorter list.
  • I don't know what the perfect answer is, just that there seems to be a strong consensus on the board that the current approach is not working well.

Jeremy Hussell

Oct 29, 2018, 2:42:09 PM
to iNaturalist
Here's the statement I was thinking about, about which photos are used for ID: https://groups.google.com/d/msg/inaturalist/gsI1PqAJv8M/0MbvAz3HCQAJ

Reading it, I conclude that the A.I. is trained on each and every RG photo *separately*, and only considers the first photo of new observations when suggesting an ID. My earlier question was meant to be along the lines of "Why isn't the A.I. trained on groups of photos (i.e. one group of photos per RG observation), so that suggestions can be generated by considering the whole group of photos in a new observation?" I expect the answer will be something like "Because considering an arbitrary number of photos would be too computationally expensive; suggestions have to be generated quickly." (Next question: would using, say, the first 3 photos of each observation be too computationally expensive?)

More importantly, I also read that as saying that the sort order of the suggestions is adjusted *afterwards*, based on whether there are nearby observations of the same organism--not that the lat/long/accuracy of observations is fed into either the training or the suggestion generator. So if I understand correctly, the location is not used to generate the list of suggestions, only to alter their ranks afterwards, which means that if the top 10 suggestions are all endemic to a different continent, adjusting the ranks won't help. It also seems likely that the date/time of the observation isn't fed into the training or the suggestion generation. My question is: why (assuming I actually understand what's going on) are the location and date not included in the inputs to the A.I. when it generates suggestions? They're both important parts of the information humans use to identify observations. Is the machine learning algorithm being used so specialized that it can only accept images as input?

There's more fun stuff in Tony's post, such as training the A.I. on non-research grade and casual observations (so that suggestions can be made for higher-level taxa and domestic/captive/cultivated organisms), but I think those won't have as much impact as using location as an *input* for the A.I. would.

Chris Cheatle

Oct 29, 2018, 3:18:04 PM
to iNaturalist
Supposedly the date is fed in (I doubt time is, but date is). Multiple times I've been told in answers here that the 'seen nearby' determination is a combination of geographically nearby sightings, based on the location of the observation, with sightings needing to be within +/- 30 calendar days of the day of the sighting in any year (i.e. a sighting on July 1st 2018 is compared against all sightings from June 1 to July 31 of any year).
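If that description is accurate, the date half of 'seen nearby' amounts to a day-of-year window that wraps across the year boundary. A sketch of my understanding (mine, not iNat's code):

```python
from datetime import date

def within_seasonal_window(obs: date, candidate: date, days: int = 30) -> bool:
    """True if candidate's day-of-year is within +/- `days` of obs's,
    in any year, wrapping across the year boundary."""
    diff = abs(obs.timetuple().tm_yday - candidate.timetuple().tm_yday)
    return min(diff, 365 - diff) <= days

print(within_seasonal_window(date(2018, 7, 1), date(2015, 6, 10)))   # True
print(within_seasonal_window(date(2018, 12, 20), date(2016, 1, 5)))  # True (wraps)
print(within_seasonal_window(date(2018, 7, 1), date(2016, 10, 1)))   # False
```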

It doesn't seem to work properly all the time, as I have indicated in other posts, which still does not seem to be officially accepted as a bug, but that's another issue.

Both location and date contribute to the list shown to the user, or more precisely to information embedded in that list, specifically the 'seen nearby' indication. Neither is an exclusionary element, however.

The order of operations seems to be: find visual matches, then determine whether they are meaningful spatially and temporally, not the other way around.

I'm not a computer vision expert, but my understanding is that a computer vision match algorithm is just that: "What does this look like, compared to what I have been trained on?" Further application of other elements such as time and location needs to be done separately.

The whole question still boils down to two simple elements in my mind:
- How do you determine which species are geographic and/or time matches to the sighting?
- What do you do when you get a high-percentage visual match that apparently is not a viable geographic or time option?

Jeremy Hussell

Oct 29, 2018, 4:33:50 PM
to iNaturalist
Computer vision classification algorithms classify images, almost by definition. Machine learning classification algorithms are more general: they can classify arbitrary inputs, which may include images or any combination of images and other data. I'm not yet clear on whether iNaturalist's "A.I." is a specific instance of a general machine learning algorithm, which could be retrained with more inputs, or a specialized image classification algorithm which cannot be adapted to take more input. In the latter case, taking date and location into account could only be done on an ad hoc basis after a list of suggestions has been generated, or by training several different classifiers for different locations (e.g. one classifier per continent or biogeographic realm). In the former case, a classifier could be trained to classify any image+location+date combination, instead of just an image.

In case it isn't crystal clear: post-processing of the list of suggestions--e.g. marking some of them as "Seen nearby" (which includes "and at roughly the same time of year", if I understand you right), and perhaps adjusting the order of results based on this--is not what I'm asking about. My questions are about what's being used as input to the classification algorithm and why that's hard to change, not about what happens afterwards. (The outputs are another matter altogether. The stuff Tony has mentioned about classifying to higher levels of the taxonomic hierarchy looks like original research to me; I'm not surprised that's taking a long time.)
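For what it's worth, here is the kind of multi-input classifier I have in mind, sketched in PyTorch. It is purely illustrative (invented dimensions, random stand-in image features) and says nothing about how iNaturalist's production model is actually built:

```python
import math
import torch
import torch.nn as nn

class ImagePlusContextClassifier(nn.Module):
    """Image features plus encoded location/date, classified jointly."""
    def __init__(self, num_taxa, image_feat_dim=2048, ctx_dim=64):
        super().__init__()
        # image_feat_dim stands in for pooled CNN backbone features
        self.context = nn.Sequential(nn.Linear(4, ctx_dim), nn.ReLU())
        self.head = nn.Linear(image_feat_dim + ctx_dim, num_taxa)

    def forward(self, image_features, lat, lon, day_of_year):
        # encode day-of-year cyclically so Dec 31 and Jan 1 are neighbours
        angle = 2 * math.pi * day_of_year / 365.0
        ctx = torch.stack([lat / 90.0, lon / 180.0,
                           torch.sin(angle), torch.cos(angle)], dim=-1)
        return self.head(torch.cat([image_features, self.context(ctx)], dim=-1))

model = ImagePlusContextClassifier(num_taxa=1000)
feats = torch.randn(2, 2048)                 # stand-in CNN features
lat = torch.tensor([44.3, -33.9])
lon = torch.tensor([-72.6, 18.4])
doy = torch.tensor([301.0, 28.0])
print(model(feats, lat, lon, doy).shape)     # torch.Size([2, 1000])
```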

Andrew Gillespie

Oct 30, 2018, 1:45:01 AM
to iNaturalist
I can understand why location is not part of the algorithm. It is inductive logic: a species won't be found in a location until it is.

Ralph Begley

Oct 30, 2018, 8:01:41 AM
to iNaturalist
Jeremy,
Here's an overview (you may already have seen this; apologies if you have):
https://www.inaturalist.org/pages/computer_vision_demo

Ian Toal

Oct 30, 2018, 12:34:41 PM
to iNaturalist
I'm coming late to this discussion (as always), but here are some of my thoughts. I don't use the AI; it does not work well with moths. A lot of the new accounts I see (presumably students) have 'Unknown' as the heading, in spite of the fact that, say, "Dragonfly" would be enough to get the observation into the system for ID. I don't know if it's possible or not, but if the AI system were limited to subfamily level etc., it would encourage new users to make a rudimentary ID before any lower IDs came up. This would force students to make at least a general ID (doing some research) before any further ID could be presented, but I guess that would make the 'Unknown' category obsolete. I do often trawl through the 'Identification' section and upgrade Unknowns to at least family to get them into the system, so that might be counter-productive. It's a tricky problem, balancing accessibility against identifications for research.

Ben Phalan

Nov 5, 2018, 12:35:03 PM
to iNaturalist
I would love to see the following, in relation to the AI:

1. Include among the presented options the common ancestor of the species matches above some threshold of confidence (e.g. order, family, or genus) -- see the sketch after this list.

2. Make the first (suggested) identification default to family level, with the thumbnail of the species that matches. The current default seems to be species.

3. Don't suggest a species-level match unless the match reaches a *very* high level of confidence (higher than at present).

4. Never include species which have not been seen nearby in the list of suggestions for species-level IDs; always suggest family level instead. Such records should always be confirmed by humans anyway.
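As a sketch of point 1, with a toy taxonomy (invented names and threshold; not iNat's data structures):

```python
def ancestors(taxon, parent):
    """Path from a taxon up to the root: species -> genus -> family -> ..."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path

def common_ancestor(suggestions, parent, threshold=0.05):
    """Lowest common ancestor of all matches above the confidence cutoff."""
    strong = [t for t, p in suggestions if p >= threshold]
    if not strong:
        return None
    paths = [ancestors(t, parent) for t in strong]
    shared = set(paths[0]).intersection(*map(set, paths[1:]))
    # the deepest shared node is the first element of any path that is shared
    return next(t for t in paths[0] if t in shared)

parent = {"Turdus rufiventris": "Turdus", "Turdus amaurochalinus": "Turdus",
          "Turdus": "Turdidae", "Turdidae": "Passeriformes"}
sugg = [("Turdus rufiventris", 0.40), ("Turdus amaurochalinus", 0.30)]
print(common_ancestor(sugg, parent))   # Turdus
```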

In relation to the final point, here is the latest daft AI suggestion from Brazil - I could post many similar examples if I thought it would help: https://www.inaturalist.org/observations/18125949

I realise most/all of these things have been suggested previously, but want to keep this topic in the sights of the developers. Thanks!

Ben

Reuven Martin

Nov 5, 2018, 12:57:43 PM
to iNaturalist
I have a hard time understanding what all the fuss is about here. What's the big deal if a complete newb in Brazil identifies a thrush as a thrasher? It's not a research grade observation, and for a newb who doesn't understand how to identify birds, they're using the available information to make a reasonable guess. Yeah, there are a lot of low-quality, misidentified observations, but that's a *good* thing. These are the exact people we should be reaching out to and assisting. Many of them no doubt learned about and tried iNat due to the automatic ID feature. Yes, they should probably take the suggestions with more of a grain of salt, but it doesn't really seem that much of an issue to me.

If such observations are reaching research grade, I haven't seen it much locally, and in any case that's an issue of people confirming things they aren't sure about, not an issue with the auto-ID.

Your suggestion 4 would cripple the feature, which is IMO one of the most useful and incredible pieces of technology I've used. 

Paul

Nov 5, 2018, 2:12:41 PM
to inatu...@googlegroups.com
👍👍👍


Ben Phalan

Nov 5, 2018, 5:53:10 PM
to iNaturalist
It's more an annoyance than a huge problem for birds, because most species can be identified from photos and there are plenty of people to go and correct errors. My main concern (as expressed in the previous thread I linked to) is for other taxa, such as insects, where this expertise is lacking and where other users are sometimes quick to "agree" without ruling out all the options. Given the volume of observations being made, I think it's highly likely that the AI plus trigger-happy agreeing is producing a larger number of junk Research Grade observations than would otherwise be the case. I agree with Reuven that people confirming things they're not sure about is part of the problem, but the AI gets them halfway there.

I don't think suggestion 4 would cripple the feature. Rather, it would require that the first observation of a species in a new area pass a slightly higher threshold (two human IDs) before becoming Research Grade. After that, the species would become eligible for ID by the AI in that area. The AI is incredible for species and areas where there are already many observations, such as California. But it is often very poor (at least at species level) in areas where most species have not been observed or have few observations, including most of the tropics.

Ben Phalan

Nov 5, 2018, 6:34:27 PM
to iNaturalist
To give some more concrete, non-bird examples: there are several mentioned in this thread: https://groups.google.com/forum/#!msg/inaturalist/PrjmKO9YvZ0/gKuo5j0nBgAJ;context-place=forum/inaturalist

They include green lacewings, velvet mites, cicadas, mosses and cockles. It's hard to get a sense of the scale of this problem, because the cases people have noticed and drawn attention to have also had the most effort put into fixing them. But in cases like these, the AI is working against, not with, human observers: constantly prompting newbies to add species-level IDs to observations in all parts of the world, based on a training set heavily biased towards a few regions such as the west coast of the United States. Having the AI suggest family-level IDs until there are Research Grade observations of the species within the same region would do a lot to reduce this problem.

The problem is also not only one of junk Research Grade observations, which are probably still unusual. It is also that fixing an inaccurate species-level ID requires a greater number of correct IDs. Again, for birds this is usually not a problem, but for taxa with few experts, and in parts of the world with few identifiers, this is very limiting.




Ben Phalan

Nov 5, 2018, 6:48:28 PM
to iNaturalist
Re-reading that other thread, I am reminded that the iNat team is working on this and will be making improvements to how the AI works (thanks, Scott et al.!). So I guess I should just be patient and wait for those changes to be rolled out...

Tait Sougstad

Nov 5, 2018, 7:48:54 PM
to inatu...@googlegroups.com
Also... encourage all of your friends and any one else who loves to identify stuff to join iNat and continue training the AI in species across the globe!


Chris Vynbos

Nov 5, 2018, 9:13:20 PM
to iNaturalist
Could we not have iNat recognise when users are trying to make IDs of species that are geographically out of range, and give the user a pop-up warning that must be dismissed before the ID is accepted? This wouldn't solve all the problems discussed in this thread, but it would solve some of them.

Mira Bowin

Nov 5, 2018, 10:26:33 PM
to iNaturalist
I'm just joining the google group and this is my first post. 

I thought I might share my experience and perspective as an individual whom you more experienced and reputable users might classify as an amateur or "newbie."

Reading this thread thoroughly, it seems I fall into the category of those who have a sincere interest in learning and improving but make mistakes. Perhaps the reasons for my mistakes are relevant and my feedback potentially helpful:

Mobile App:

- When I began using iNaturalist I was drawn to it because I am an eBird user and had recently begun using their app. I wanted a similar option for recording other life forms and contributing to citizen science data and research. Despite reading up on guidelines and rules (at least I thought) and making good-faith efforts, it wasn't until I accidentally identified an observation incorrectly (the species didn't occur on my continent and was incredibly similar to the correct choice) and it was brought to my attention that I fully understood the AI suggestions were NOT limited to my geographical area. I incorrectly assumed that because it identified nearby observations, it was tuned into my coordinates and wouldn't lead me astray.

eBird is only relevant here because someone coming from an app like that (like myself), where there are several "checks" preventing you from making erroneous IDs based on geography, could have a similar expectation of specificity or boundaries. For those not familiar with it, eBird provides a localized checklist and then rankings of rarity that may require additional boxes to be checked, which might still be followed up on by the regional overseer before becoming countable data (e.g. when we had a Western Tanager visiting). I realize these are vastly different platforms; I'm not suggesting copying eBird, just offering my experience.

- After I realized, through my own errors and the feedback of kind and patient pros, that the AI couldn't be relied upon to the extent I thought, I now use it to get a generalized sense of a suggestion to be cross-referenced with field guides and reputable sites, or to confirm what I already believe to be the case. I am also now in the habit of running each of my pictures (when I have several) separately through the AI suggestions, by switching my default picture, to look for consensus. If this had been suggested I would have done it from the start: different pictures of the same species will sometimes yield vastly different suggestions, and sometimes the same suggestion, which helps increase or decrease my ID confidence respectively.
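For the curious, my cross-checking habit amounts to roughly this. The get_suggestions function is hypothetical (no such public call exists that I know of); the point is just that agreement across photos raises confidence:

```python
from collections import Counter

def cross_photo_consensus(photos, get_suggestions, top_k=5):
    """Run each photo separately; count how often each taxon appears in
    the top-k suggestions. Taxa suggested for every photo are the most
    trustworthy."""
    votes = Counter()
    for photo in photos:
        for taxon in get_suggestions(photo)[:top_k]:
            votes[taxon] += 1
    return votes.most_common()

# Canned results standing in for three photos of the same plant:
fake = {"a.jpg": ["Rosa canina", "Rosa rugosa"],
        "b.jpg": ["Rosa canina", "Rubus idaeus"],
        "c.jpg": ["Rosa canina"]}
print(cross_photo_consensus(fake, lambda p: fake[p]))
# [('Rosa canina', 3), ('Rosa rugosa', 1), ('Rubus idaeus', 1)]
```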

Suggestions (echoes of previously mentioned ones):

- Offer a more specific heads-up to new app users: "Our AI is here to guide you, but it will frequently offer suggestions from other locales and may not be exact. It is not meant to replace your consultation of field guides or other professional, quality scientific resources when making an ID." (With less clinical language, perhaps.) If this is already offered somewhere, sorry if I missed it.

- Alerts that let users know they've chosen an ID that doesn't necessarily make sense geographically. I don't know that limiting the ability to choose it at all always makes sense, as some observations may have value in indicating the presence of a newly introduced or spreading species, invasive or otherwise.

Generally speaking, I'm not always sure whether I'm using the site correctly, and I have 100% good intentions even as I seemingly annoy some of the more pro users. Sometimes there is a struggle in wanting more information than the very introductory guides to posting and ID-ing provide, while not being able to follow the more technical conversations and news about taxon swapping etc. Perhaps I'm just occupying an amateur middle ground, but I'm hoping I'm welcome here even with past mistakes and occasional avoidable stupid current mistakes or accidental smartphone button-pushing errors (of which I've had several). For those I apologize, and I hope to continue to do better. I don't want to detract from good science; I want to be a contributor.

Hope this feedback is remotely helpful.

Respectfully,
Mira

Hurley, NY



Mark Tutty

Nov 6, 2018, 2:27:37 AM
to inatu...@googlegroups.com

Mira, you are most welcome on iNaturalist! Many of us are just like yourself, and to be honest, even for those who aren't, it's the diversity of people using it that I value most!

 

I have been wanting to start up evening courses at our local technical college, much like they do with camera clubs etc. I think iNaturalist is a wonderful tool, and there is terrific scope for improvement in how new users are introduced to it!

 

cheers
Mark Tutty
kiwif...@gmail.com


Chuck Sexton

Nov 6, 2018, 9:34:48 AM
to iNaturalist
Reuven,

As a frequent identifier on iNat, I spend perhaps 1/4 to 1/3 of my time trying to correct erroneous IDs which were offered by the computer vision function and accepted by the observer. For just my small areas of interest (e.g. moths and/or Texas observations), this commonly amounts to 10 or 20 observations a day that I try to correct. And I would estimate that a few to several each day are being "confirmed" to Research Grade in just my small niche. Two points I'd make:
-- IMHO, this constitutes a much larger error rate for casual and Research Grade observations than I consider acceptable; and
-- As a dedicated identifier with some technical expertise, it's a huge burden on my time on iNat--time I could be devoting to other iNat tasks, like working through my own backlog of observations to upload or researching newer ID challenges.

IF iNaturalist wants its database for any taxon to have utility for legitimate research purposes in the scientific or wider community, then reducing the error rates (e.g. "junk Research Grade observations") and thus lessening the burden on would-be researchers needing to vet such sightings should be priority goals.

I'll get down off my soap-box now...

Chuck


Upupa epops

Nov 6, 2018, 5:34:22 PM
to iNaturalist
I think I agree with Chuck that this is an issue worth being concerned about. I've personally seen examples where there are many RG observations of a well-known species when there are actually nearly identical species in the same range that haven't been properly eliminated. With more users identifying observations, those issues will all eventually be dealt with, but it does get in the way of more productive identifying.

Also, I don't think I've seen it mentioned here yet, but I think this will be really helpful for some of these issues: https://groups.google.com/forum/#!msg/inaturalist/_HKqs2XKzb8/prRS5LR7BAAJ

jdmore

Nov 8, 2018, 5:16:14 AM
to iNaturalist
Mira, that is a hugely helpful perspective.  Thank you for taking the time to put it out there.  It's a good reminder that all of us have been in the same shoes at some point, and that 99.9% of those here are good-willed volunteers of all experience levels, doing this for the love of it.

Not that we can't find ways to improve our tools and how they are used, as threads like this show. But we should acknowledge that the results will always be less perfect than we would prefer, and that that is a small price to pay for the community growing around this site.

There is still great science to be had from the data, and it will always require some level of finessing and clean-up first.

--Jim Morefield

pw...@capaccess.org

Nov 10, 2018, 6:08:59 PM
to iNaturalist


I just wanted to offer one hopeful aspect of the situation: it seems to me that getting lots of people to post pictures is building a really impressive database, even if things aren't identified correctly. If it weren't so easy, all that data would be lost forever, including pictures of a species lost to an area, a chance to see when a species first arrived, or, with rarities, evidence that it exists there at all. At some point, one could reasonably hope the AI feature could go back and help with some IDs better than it can now, and that would help the reviewers a lot. So on the whole, all this participation is a good thing.
But that said, PLEASE make the AI stop suggesting things not even found on the same continent as the observation; block it from including such wildly unlikely IDs in its list of suggestions. I fortunately found out early that it did this, by reading the information about the insect I had photographed, but highlighting this problem in the initial instructions would be good until it's fixed.

Patricia Wood