searchhistogram gross underestimation?


Jean Véronis

Dec 7, 2012, 3:14:32 AM
to otte...@googlegroups.com
Hello,

Counthistogram returns the following values for the keyword QVEMF over the last 2 weeks (the results barely change with the count method used, target or citation):

12-12-07 183
12-12-06 87
12-12-05 98
12-12-04 151
12-12-03 300
12-12-02 766
12-12-01 14032
12-11-30 82
12-11-29 78
12-11-28 139
12-11-27 131
12-11-26 288
12-11-25 866
12-11-24 17870

However, we track the same keyword using GNIP, and the counts are wildly different and much higher. The two peaks, on 12-12-01 and 12-11-24, reach 57,996 and 126,446 respectively (including RTs).
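To put a number on the gap, the two peak days can be compared directly. This is plain arithmetic on the figures quoted above, not a Topsy or GNIP API call, and it assumes both sources use the same day boundaries (which the thread does not confirm).

```python
# Peak daily counts quoted above: Topsy histogram vs. GNIP (GNIP figures include RTs).
topsy_peaks = {"12-12-01": 14032, "12-11-24": 17870}
gnip_peaks = {"12-12-01": 57996, "12-11-24": 126446}

def coverage(topsy: dict, reference: dict) -> dict:
    """Return, per day, the fraction of the reference count that Topsy reports."""
    return {day: topsy[day] / reference[day] for day in topsy}

for day, ratio in coverage(topsy_peaks, gnip_peaks).items():
    print(f"{day}: Topsy reports {ratio:.1%} of the GNIP count")
```

On these two days, Topsy reports roughly a quarter and a seventh of the GNIP volume, respectively.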

The Topsy counts therefore seem grossly underestimated, unless I am missing something, of course.
Note that the Analytics interface gives different counts (still heavily underestimated), despite what the documentation seems to indicate ("The Topsy Analytics application is built on this API call.").

Many thanks for your help.
--jv

Jean Véronis

Dec 13, 2012, 11:09:58 AM
to otte...@googlegroups.com
Hi,

I wonder whether you have had a chance to look into this?

Thanks
--j

Vanessa Hsu

Dec 13, 2012, 12:48:14 PM
to otte...@googlegroups.com
Hi Jean,

Counts from our /searchhistogram call are estimates based on our significant-tweet index (only tweets that have been retweeted or contain a link), so they are expected to be lower.
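A minimal sketch of the inclusion rule as described here: a tweet counts only if it has been retweeted or contains a link. The `Tweet` record and both functions are hypothetical illustrations, not Topsy's data model or code, and the real index may apply further criteria.

```python
from dataclasses import dataclass, field

@dataclass
class Tweet:
    # Hypothetical minimal tweet record, for illustration only.
    text: str
    retweet_count: int = 0
    urls: list = field(default_factory=list)

def is_significant(tweet: Tweet) -> bool:
    """Rule as stated in the thread: retweeted at least once, or contains a link."""
    return tweet.retweet_count > 0 or len(tweet.urls) > 0

def significant_count(tweets) -> int:
    """Count only the tweets that pass the significance rule."""
    return sum(1 for t in tweets if is_significant(t))

sample = [
    Tweet("#QVEMF plain comment"),
    Tweet("#QVEMF retweeted comment", retweet_count=3),
    Tweet("#QVEMF with a link", urls=["http://example.com"]),
    Tweet("#QVEMF another plain comment"),
]
print(significant_count(sample), "of", len(sample))
```

Under this rule, a plain tweet with no link that is never retweeted is left out of the count, which is one way the histogram can fall well below a full-firehose count.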

We provide comprehensive, exact counts in Topsy Pro Analytics. With Topsy Pro, you can access these counts for any term or phrase, and either get up-to-the-minute counts or instantly go back more than 2 years.

Please let me know if you're interested in getting a free trial of Topsy Pro. I can also pull the numbers for QVEMF for your comparison.

Thanks,
Vanessa

Jean Véronis

Dec 13, 2012, 12:59:29 PM
to otte...@googlegroups.com
Hi Vanessa

This is a very useful piece of information. As you can see, the underestimation is extremely large, and the same is true for many other queries in that area (Social TV).

It is, however, important to know that the counts are exact with the Pro accounts. I'm actually consulting for a client who is considering acquiring a Pro account, and they would not have taken that step based on my previous tests.

If you could forward the actual counts for that keyword over the last 2 or 3 weeks, that would be a great help.

By the way, unrelated but also important: I couldn't find a way to query on #QVEMF instead of QVEMF (although in this case the difference is marginal).
Many thanks

--j


2012/12/13 Vanessa Hsu <vanes...@gmail.com>

Vanessa Hsu

Dec 13, 2012, 2:32:33 PM
to otte...@googlegroups.com
Hi Jean,

Below is the data for both terms. Please note that all daily counts are based on UTC time.

Day (UTC)   #QVEMF   QVEMF
11/13/12       462      47
11/14/12       322      47
11/15/12       599      92
11/16/12    143948    4339
11/17/12     29962    1061
11/18/12      1521     161
11/19/12       628      69
11/20/12       282      29
11/21/12       299      50
11/22/12       297      48
11/23/12    141379    4518
11/24/12      8026     624
11/25/12      1321     160
11/26/12       413      70
11/27/12       641      69
11/28/12       205      43
11/29/12       291      62
11/30/12    117725    3785
12/1/12       6908     472
12/2/12       1169     187
12/3/12        517     114
12/4/12        351      82
12/5/12        344      73
12/6/12        522     118
12/7/12     114358    4101
12/8/12       8114     747
12/9/12        971     150
12/10/12       498      83
12/11/12       556      94
12/12/12       258      91
12/13/12       207      32
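Because these daily counts are bucketed by UTC day (each day starting at 0:00 UTC), a tweet's UTC day can differ from its local calendar day, which matters when comparing against daily figures from another source. A minimal sketch of UTC bucketing, assuming timezone-aware timestamps; the fixed UTC+1 offset in the example is an assumption chosen only to show a day-boundary shift.

```python
from collections import Counter
from datetime import date, datetime, timedelta, timezone

def daily_counts_utc(timestamps):
    """Bucket timezone-aware timestamps by UTC calendar day (each day starts at 0:00 UTC)."""
    return Counter(ts.astimezone(timezone.utc).date() for ts in timestamps)

# Assumed fixed UTC+1 offset, for illustration only.
utc_plus_1 = timezone(timedelta(hours=1))
sample = [
    datetime(2012, 12, 1, 0, 30, tzinfo=utc_plus_1),  # 23:30 UTC on Nov 30
    datetime(2012, 12, 1, 1, 30, tzinfo=utc_plus_1),  # 00:30 UTC on Dec 1
]
counts = daily_counts_utc(sample)
print(counts[date(2012, 11, 30)], counts[date(2012, 12, 1)])
```

Both tweets fall on Dec 1 in local time, but one of them lands in the Nov 30 UTC bucket.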


Cheers,
Vanessa

Jean Véronis

Dec 13, 2012, 2:38:28 PM
to otte...@googlegroups.com
Wonderful!
Many thanks!
PS: just wondering, what would be the syntax to query a hashtag in the Otter API?

This doesn't seem to return very different results from
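One thing worth checking when passing a hashtag in a GET request: an unencoded `#` in a URL starts the fragment, so everything after it is never sent to the server. A minimal sketch of building the request with the term percent-encoded; the base URL and the `q` parameter name are assumptions for illustration, not confirmed in this thread, and whether the API then treats #QVEMF differently from QVEMF is up to the server.

```python
from urllib.parse import urlencode

# Assumed endpoint and parameter name, for illustration only.
BASE_URL = "http://otter.topsy.com/searchhistogram.json"

def build_query_url(term: str) -> str:
    """Build a GET URL with the search term percent-encoded in the query string."""
    # urlencode turns '#' into %23 (and spaces into '+'), so the hashtag reaches
    # the server instead of being parsed as the start of a URL fragment.
    return f"{BASE_URL}?{urlencode({'q': term})}"

print(build_query_url("#QVEMF"))
```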