User "andhye" has karma "null" in HNSearch DB and "1" on HN


Vanni

Jul 4, 2011, 10:22:53 AM
to hnse...@googlegroups.com
Hi HNSearch folks!

While playing with your cool API, I found a user whose karma is null in your DB:


{
...
    "results": [
        {
            "item": {
                "_id": "andhye",
                "about": null,
                "cache_ts": null,
                "create_ts": null,
                "karma": null,          <<<<<<<<<<<<
                "username": "andhye"     <<<<<<<<<<<<
            },
            "score": 1.0
        }
    ],
...
}

Regards,
Vanni Totaro

Zack Maril

Jul 4, 2011, 3:31:53 PM
to hnse...@googlegroups.com
Weird.

Take a look at this:
http://news.ycombinator.com/user?id=andhye

Compare it to the new account that I just made:
http://news.ycombinator.com/user?id=testforvanni

Both accounts appear to have the same history: 0 submissions and 0 comments => 0 karma. Apart from the username they should be identical, so any search that doesn't depend on the username should return both of them. But if you rerun your search, the test account doesn't show up. That could very well be indexing latency at this point. Still, if null karma were simply how accounts with 0 karma show up, there would probably be a bunch more accounts like this with no submissions or comments.

Sounds like there is an odd error somewhere. Maybe andhye deleted a submission or comment and that doesn't play nice with ThriftDB?

Warning: I am not a maker of the API, I just like exploring hacker news. Anything I say is probably very wrong.

Neat find though.

Vanni Totaro

Jul 4, 2011, 3:43:23 PM
to hnse...@googlegroups.com
Hi Zack,
IIRC every new user starts with 1 karma point (i.e. 0 submissions and 0 comments => 1 karma).
The "andhye" null karma bug has to be fixed by hand in the HNSearch DB. Still curious about the origin of this discrepancy :)
Regards,
Vanni

Andres Morey

Jul 5, 2011, 11:02:04 AM
to hnse...@googlegroups.com
"testforvanni" isn't in the index because HNSearch only indexes active users. A user who has never commented or submitted won't be in the DB at all. To keep user karma up to date, we re-crawl users whenever they post something new to HN.

If you want to check when a user was last updated, you can look at the 'cache_ts' attribute in the JSON object. The user "andhye" hadn't been crawled at all, which is why their karma was null. There are no submissions/comments for that user, so my guess is that they posted something at some point that later got deleted, which would explain why they're in the DB to begin with. We probably didn't crawl that user because their post got deleted before their username was added to the fetch queue.
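For anyone who wants to automate the 'cache_ts' check described above, here is a minimal sketch. It assumes the response has the same shape as the JSON quoted at the top of this thread ("results", "item", "cache_ts", "username"); the function name `uncrawled_users` and the inline sample are made up for illustration, and no real API call is made.

```python
import json


def uncrawled_users(response_text):
    """Return usernames whose record was never crawled (cache_ts is null).

    Hypothetical helper: assumes the "results" -> "item" layout quoted
    earlier in this thread.
    """
    payload = json.loads(response_text)
    names = []
    for result in payload.get("results", []):
        item = result.get("item", {})
        # A null cache_ts means the crawler has not fetched this user yet,
        # so fields such as karma are still null as well.
        if item.get("cache_ts") is None:
            names.append(item.get("username"))
    return names


# Sample trimmed from the response quoted in the first message.
sample = """
{
    "results": [
        {"item": {"_id": "andhye", "about": null, "cache_ts": null,
                  "create_ts": null, "karma": null, "username": "andhye"},
         "score": 1.0}
    ]
}
"""

print(uncrawled_users(sample))  # ['andhye']
```

A record with any non-null 'cache_ts' is skipped, so the result lists only users that are still waiting for their first crawl.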

I forced a crawl of that individual user so their karma should be up-to-date:

Vanni Totaro

Jul 5, 2011, 1:17:09 PM
to hnse...@googlegroups.com
Hi Andres!

Your point that HNSearch only indexes active users raises another discrepancy:

If I run these two commands in a Linux terminal:

# Get no. of users in users DB
# Result:
    "hits": 74609,

# Get no. of users with at least one item (submission or comment) in items DB
# Result:
73167

74609 != 73167... Why?

Regards,
Vanni Totaro

Andres Morey

Jul 5, 2011, 2:35:39 PM
to hnse...@googlegroups.com
The discrepancy is due to users whose submissions/comments were deleted.
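
To see which users account for the gap (74609 - 73167 = 1442, assuming every item author is also in the users DB), one could compare the two sets of usernames directly. This is only a rough sketch: the function name and the tiny sample lists are hypothetical, and how the two lists are fetched from the API is left out.

```python
def users_without_items(indexed_users, item_authors):
    """Return indexed usernames that author no surviving item.

    Hypothetical helper: both arguments are plain iterables of usernames,
    however they were obtained.
    """
    # Set difference: in the users DB but not among the item authors.
    return sorted(set(indexed_users) - set(item_authors))


# Made-up sample data, for illustration only.
indexed_users = ["andhye", "alice", "bob"]
item_authors = ["alice", "bob", "alice"]  # one entry per item, so names repeat

print(users_without_items(indexed_users, item_authors))  # ['andhye']
```

By the explanation above, the users it returns should be the ones whose submissions/comments were all deleted.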

Andres