WebPhil wrote:
> I do a search for site:www.domain.com for some of my clients to check how many pages are indexed, but it's coming up with insanely different results.
> For example, one site I check on Google's search engine has 504 pages; the API returns 87.
> Another site returns 330 on the search engine; the API returns 1.
> Why the dramatic discrepancy?
> I don't think I've ever seen a result exactly the same from the search engine and the API.
I have the same problem. I use the API to retrieve the first 100 links, but they are definitely not the same as the ones returned by the Google search engine itself.
I have seen the same issue. I am just looking at estimated total counts. I was able to get closer search results by ensuring that all the restrictions were set exactly the same way in the API as on the web search, but there are still discrepancies in the result counts. I wonder if the API is being served from a different cluster of machines than the real Google?
One glaring issue: if you restrict a search by language and there are no results, the API returns results in all languages, and there is no way to tell this other than detecting that the language of the returned pages is different from the one specified.
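In case it helps anyone, here is a rough sketch of the "detect that the language is different" check described above. It does not call the API at all. It assumes you already have the results as a list of dicts with a "snippet" key (a made-up shape, not the API's real return type) and that you supply your own language detector. The toy_detect function below is only a stand-in for illustration; all the names here are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical result shape: {"url": ..., "snippet": ...}
Result = Dict[str, str]


def language_mismatches(results: List[Result], requested_lang: str,
                        detect_language: Callable[[str], str]) -> List[Result]:
    """Return the results whose detected language differs from the requested one."""
    return [r for r in results if detect_language(r["snippet"]) != requested_lang]


def looks_like_fallback(results: List[Result], requested_lang: str,
                        detect_language: Callable[[str], str],
                        threshold: float = 0.8) -> bool:
    """Guess whether the language restriction was silently dropped.

    Returns True when at least `threshold` of the results are in some
    other language. An empty result list is not treated as a fallback.
    """
    if not results:
        return False
    mismatched = language_mismatches(results, requested_lang, detect_language)
    return len(mismatched) / len(results) >= threshold


def toy_detect(text: str) -> str:
    """Toy detector for illustration only: spots a few German function words."""
    german_words = {"und", "der", "die", "das"}
    return "de" if german_words & set(text.lower().split()) else "en"
```

The 0.8 threshold is arbitrary. A real detector will misclassify short snippets now and then, so treating "most results are in the wrong language" as the signal is probably safer than requiring every single one to mismatch.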
I'm seeing the same discrepancy in results as well. Here's a search I just ran:
term: "widget"
via web site: about 3,290,000 results
via API: about 526,000 results
Just a small variance there... I also think that the API searches through a different cluster than the web clusters. Unfortunately I have a feeling that Google will just say "well, it's beta!" if asked about this.
And I think Google has shown what they think of the developers using the API by the complete lack of responses.
At this point I'm giving up on the Google API as a viable tool, since it's obviously never leaving its infancy. Even though it's been around longer than the other APIs, it's still the least reliable.
At least Yahoo's support staff respond, and are helpful. You'd think that somebody would look at the messages, but there are some here that are well over 3 weeks old and not replied to at all.
Let's hope that Google spend a little more time on the Developer tools and less time on their April Fools jokes in the future.
I've been experiencing the same problem for weeks (but it seems to have been going on for much longer). It appears that the API has access to an index of around 1.6 billion pages, compared to Google's 8 billion (does anyone have a more accurate figure?). I emailed their tech support over a month ago and got a reply saying that my email had been forwarded to the technical team, but I've had no further replies.
The API also seems to return an inconsistent number of pages, for example running the same query twice gave:
The query "reactive oxygen species" returned about 215000 results.
The query "reactive oxygen species" returned about 137000 results.
If anyone has any ideas of how to reduce any of these inconsistencies please let me know!
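This won't remove the inconsistency, but one way to at least see how large it is would be to repeat the same query a few times and summarize the estimates. A minimal sketch follows, assuming you wrap your own API call in a function that takes a query string and returns the estimated count. That fetch_estimate callable is hypothetical and is not part of the Google API; the only real data used is the pair of numbers quoted above.

```python
from statistics import median
from typing import Callable, Dict, List


def summarize_estimates(estimates: List[int]) -> Dict[str, float]:
    """Summarize repeated result-count estimates for one query."""
    if not estimates:
        raise ValueError("need at least one estimate")
    low, high = min(estimates), max(estimates)
    return {
        "min": low,
        "max": high,
        "median": median(estimates),
        # Gap between the extremes as a fraction of the largest estimate.
        "relative_spread": (high - low) / high if high else 0.0,
    }


def sample_estimates(fetch_estimate: Callable[[str], int], query: str,
                     runs: int = 5) -> List[int]:
    """Call the caller-supplied (hypothetical) fetch_estimate several times."""
    return [fetch_estimate(query) for _ in range(runs)]


# The two estimates reported above for "reactive oxygen species".
summary = summarize_estimates([215000, 137000])
```

For those two runs the relative spread works out to roughly 36%, so reporting the median of several runs (or the whole range) is probably more honest than quoting any single number.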
After 3 weeks I finally got a response from Google, and it's pretty much what was expected:
===============================
As you may know, the total number of results displayed when you search on Google is only an estimate, not an exact count. Any discrepancies you notice between a Google search using the Google Web APIs and a Google search on Google.com are likely the result of ongoing changes to our index and/or variations in the estimated number of results.
Regards,
The Google Team
===============================
molsen wrote:
> After 3 weeks I finally got a response from google, and it's pretty much what was expected:
> ===============================
> As you may know, the total number of results displayed when you search on Google is only an estimate, not an exact count. Any discrepancies you notice between a Google search using the Google Web APIs and a Google search on Google.com are likely the result of ongoing changes to our index and/or variations in the estimated number of results.
> Regards,
> The Google Team
> ===============================
I am finding the same unreliable results with the API. I have created a new word, and exactly 1 listing for it appears in a google.com search.
Since this word appeared on google.com I have been tracking its debut using the Google Web API. It has been over 100 days and there is still no sign of it. I suspect the inconsistencies have more to do with the API connecting to a caching server other than the one behind google.com.
So don't hold your breath; it looks as if we are entitled to periodic updates only and not realtime data.