--
You received this message because you are subscribed to the Google Groups "cloud-vision-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cloud-vision-discuss+unsub...@googlegroups.com.
To post to this group, send email to cloud-vision-discuss@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cloud-vision-discuss/0f1d0b82-089c-4e5b-8319-d984b75b17d6%40googlegroups.com.
After spending a lot of time and money digging into the issue, some questions:

1. In the last year, TEXT_DETECTION has been changed to have a much lower rate of text recognition (recall) but much better accuracy (precision). How does Google communicate when such dramatic changes are made?

2. When run on a large corpus of family and travel photos, DOCUMENT_TEXT_DETECTION appears significantly better than TEXT_DETECTION, in both the number of photos with recognized text (recall) and the accuracy of that recognized text (precision), across many different kinds of text (signs, jerseys, license plates, car decals, book titles, documents). Given this, in what circumstances should TEXT_DETECTION ever be used?

Details

I reran TEXT_DETECTION on a collection of photos that had originally been run over a year ago and that had recognized text at that time. Subjectively examining a dozen or so, none of these now had any recognized text. But when I then ran DOCUMENT_TEXT_DETECTION on these dozen or so photos, most of them did have recognized text. This led me to question whether TEXT_DETECTION was even working any longer.

Some observations from a more careful comparison with a collection of 10K family and travel photos:

- Over the last year, TEXT_DETECTION appears to have been changed to recognize text in far fewer photos, but with much higher accuracy in the text it does recognize (lower recall but higher precision).
- DOCUMENT_TEXT_DETECTION is significantly better than the new TEXT_DETECTION in both recall and precision, on all kinds of text in photos (signs, jerseys, license plates, car decals, book titles, documents).
- The old TEXT_DETECTION from a year ago had a higher recall rate for stand-alone larger numbers (e.g. on jerseys or license plates) than either the new TEXT_DETECTION or DOCUMENT_TEXT_DETECTION.

Stats:

- The old TEXT_DETECTION recognized text in 10,527 photos.
- Of those 10,527 photos, the new TEXT_DETECTION recognized text in just 2915 photos (28%).
- Of those 10,527 photos, the new DOCUMENT_TEXT_DETECTION recognized text in 5497 photos (52%).
- The 5497 DOCUMENT_TEXT_DETECTION photos included 2808 of the 2915 TEXT_DETECTION photos (96%).

Manually examining large numbers of these photos shows that the old TEXT_DETECTION had much lower precision than the new TEXT_DETECTION and DOCUMENT_TEXT_DETECTION, and that the new TEXT_DETECTION has somewhat lower precision than the new DOCUMENT_TEXT_DETECTION. (I didn't do quantitative scoring.)

This spreadsheet contains the text for the three methods for all 10,527 photos: https://www.dropbox.com/s/abx0q77nech66mr/comparison-john.2018.06.08.xlsx?dl=0. A cell that is #N/A indicates no recognized text for that photo and method. Use Excel data filtering to include and exclude #N/A cells.
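
For anyone who wants to re-derive the stats above from the spreadsheet, here is a minimal sketch of the counting. It assumes each photo's cells have already been exported as plain strings, with "#N/A" meaning no recognized text. The names `has_text`, `summarize`, and `pct`, and the four sample rows, are made up for illustration and are not part of the original analysis or of any Google API.

```python
NO_TEXT = "#N/A"  # spreadsheet marker for "no recognized text"


def has_text(cell):
    """True if a spreadsheet cell holds recognized text."""
    return cell is not None and cell.strip() not in ("", NO_TEXT)


def summarize(rows):
    """Count photos with text per method.

    rows: iterable of (text_detection_cell, document_text_detection_cell).
    """
    counts = {"total": 0, "td": 0, "dtd": 0, "both": 0}
    for td_cell, dtd_cell in rows:
        t, d = has_text(td_cell), has_text(dtd_cell)
        counts["total"] += 1
        counts["td"] += int(t)
        counts["dtd"] += int(d)
        counts["both"] += int(t and d)
    return counts


def pct(part, whole):
    """Whole-number percentage, rounded as in the post."""
    return round(100 * part / whole)


# Hypothetical rows, only to show the input shape.
sample = [("STOP", "STOP"), ("#N/A", "Main St"), ("#N/A", "#N/A"), ("23", "#N/A")]
sample_counts = summarize(sample)

# Counts as posted above.
posted = {"total": 10527, "td": 2915, "dtd": 5497, "both": 2808}
td_share = pct(posted["td"], posted["total"])      # 28
dtd_share = pct(posted["dtd"], posted["total"])    # 52
td_in_dtd = pct(posted["both"], posted["td"])      # 96
td_only = posted["td"] - posted["both"]            # 107
neither = posted["total"] - (posted["td"] + posted["dtd"] - posted["both"])  # 4923
```

Going by the posted counts, only 107 photos had text from TEXT_DETECTION but not from DOCUMENT_TEXT_DETECTION, and 4923 of the original 10,527 had no text from either new method.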