My observation is that when it was first rolled out, the accuracy was good, although it would sometimes suggest very unlikely species, including ones that don't occur in the US, as possible IDs. Then it got better, and for a while (sorry, I have no dates) it was working extremely well, with high accuracy. A couple of years ago, though, its results started to get shaky at times. I don't know whether somewhere along the line the data set took in a bunch of false IDs from eBird submissions, where birders misidentified relatively common and therefore non-flagged species, so those IDs were never scrutinized. This hypothesis could be completely wrong, as I don't know Merlin's inner workings. However, I can say that since it implemented location-based IDs, it has often, but not always, been spotty and unreliable. Just last week it failed to recognize a loud and obvious Carolina Wren, and this is one of many missed and sometimes completely off-base IDs that have been happening over the past year or more. I would also say that being a bit too strict about individual locations (counties, etc.) has overall undermined rather than strengthened its capabilities. To give an example, if something rare but not impossible (say, a Lark Sparrow) showed up in my neck of the woods, I highly doubt that it would be recognized, even with a good recording. Others may of course have completely different takes and results. I'm still using it because its ID capability currently seems fair to pretty good to me.
Sean Smith