Stream List from API Has Duplicate Entries


theGeekPirate

Feb 8, 2012, 9:47:51 PM
to Justin.tv API Developers
Example, at this moment:

Your list: http://api.justin.tv/api/stream/summary.xml?category=gaming

shows 1494 total current streams.


Using your API to retrieve a list (using an offset of +100 each loop)
also returns 1494 after ~10 seconds.
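
For anyone wanting to reproduce the loop described above, here is a minimal sketch of offset-based paging. It is not the original script: `fetchAllPages` and the `$fetchPage` callable are hypothetical names, and the fake data source stands in for real HTTP requests to the list endpoint.

```php
<?php
// Sketch: collect a paginated list by advancing the offset one page at a time.
// $fetchPage is a hypothetical callable (offset, limit) => array of records;
// in real use it would wrap an HTTP request to the list endpoint.
function fetchAllPages(callable $fetchPage, $limit = 100, $maxPages = 50)
{
    $all = array();
    for ($page = 0; $page < $maxPages; $page++) {
        $batch = $fetchPage($page * $limit, $limit);
        if (count($batch) === 0) {
            break; // an empty page means we ran past the end of the list
        }
        $all = array_merge($all, $batch);
        if (count($batch) < $limit) {
            break; // a short page is the last page
        }
    }
    return $all;
}

// Stand-in data source so the sketch runs without network access.
$fake = range(1, 250);
$fetcher = function ($offset, $limit) use ($fake) {
    return array_slice($fake, $offset, $limit);
};
$streams = fetchAllPages($fetcher);
echo count($streams), "\n";
```

The `$maxPages` cap is only a safety net so a misbehaving endpoint cannot keep the loop running forever.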

BUT, when I remove duplicates (because I can visually see them) using
(PHP):

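// Remove entries whose full contents are identical, keeping the first of each.
function removeDuplicatesFromMultiArray($multiArray)
{
    // Serialize each entry so whole records can be compared as strings.
    $serialized = array_map('serialize', $multiArray);
    $unique = array_unique($serialized);
    // Keep only the original entries whose keys survived array_unique().
    return array_intersect_key($multiArray, $unique);
}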

I see 1272 (which is the actual correct number).


Please take a look at your generated list, and see why it is producing
duplicates.

Example of a duplication (which I've formatted):
157 - Twitch - World of Warcraft - bloodlegion - Heroic Alt Raid #2!
Live Now. Heroic Dragonsoul 1 shotting on Alts.
157 - Twitch - World of Warcraft - bloodlegion - Heroic Alt Raid #2!
Live Now. Heroic Dragonsoul 1 shotting on Alts.
157 - Twitch - World of Warcraft - bloodlegion - Heroic Alt Raid #2!
Live Now. Heroic Dragonsoul 1 shotting on Alts.

Note: Most duplicates are happening around every 100ish entries
(sometimes visible around 101, 206 etc.)

Thanks!

theGeekPirate

Feb 8, 2012, 9:55:45 PM
to Justin.tv API Developers
These duplicates will not be right beside each other, but very close.
Easy way: Create a quick script that will only show you all duplicates
in the list.
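
A quick script along those lines might look like the sketch below. It reports each repeated entry together with the positions where it appears, which also makes it easy to check whether the repeats cluster near page boundaries. The `channel` field name and the sample records are assumptions for illustration, not the API's actual schema.

```php
<?php
// Sketch: report every key that occurs more than once, with the positions
// where it occurs. The 'channel' field name is an assumption for illustration.
function findDuplicatePositions(array $records, $keyField)
{
    $positions = array();
    foreach ($records as $index => $record) {
        $positions[$record[$keyField]][] = $index;
    }
    // Keep only keys seen at more than one position.
    return array_filter($positions, function ($where) {
        return count($where) > 1;
    });
}

$sample = array(
    array('channel' => 'alpha', 'viewers' => 900),
    array('channel' => 'beta',  'viewers' => 850),
    array('channel' => 'alpha', 'viewers' => 900),
);
print_r(findDuplicatePositions($sample, 'channel'));
```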

Max LaRue

Feb 8, 2012, 10:16:10 PM
to justintv-ap...@googlegroups.com
Consider this:

1.) The target stream has 1000 viewers, putting it 98th on the list.
2.) You query the API with limit=100&offset=0 and get the first 100 streams (including the target stream).
3.) After your request, the cached data (held for 60 seconds according to the JTV documentation) updates with new counts.
4.) Someone posts on Twitter and gets 100 new viewers, bumping them from 105th to 80th on the list.
5.) After a series of similar events between your first and second call, the target stream is now ranked 103rd.
6.) Now your request with limit=100&offset=100 comes in and returns the target stream again. It appears to be a duplicate, but it is actually an accurate representation of the data at that moment.

Note that not only do you get the target stream twice, you also miss the stream that jumped from 105th to 80th in between your two requests.

You said it yourself: Most duplicates are happening around every 100ish entries, which is consistent with this hypothesis.
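
Here is a small sketch of that hypothesis, using a page size of 3 instead of 100 so it fits on screen. The channel names and viewer counts are made up, and `rankedPage` is a hypothetical helper that mimics a list sorted by viewers; it says nothing about how the real API ranks or caches.

```php
<?php
// Sketch of the scenario above with a page size of 3 instead of 100.
// Channel names and viewer counts are made up for illustration.
function rankedPage(array $viewersByChannel, $offset, $limit)
{
    arsort($viewersByChannel); // rank by viewers, highest first
    return array_slice(array_keys($viewersByChannel), $offset, $limit);
}

$viewers = array('a' => 500, 'b' => 400, 'target' => 300, 'c' => 200, 'd' => 100);

$first = rankedPage($viewers, 0, 3);  // first page, before the ranking shifts

$viewers['c'] = 350;                  // 'c' gains viewers between the requests

$second = rankedPage($viewers, 3, 3); // second page, after the ranking shifts

$seen = array_merge($first, $second);
$duplicates = array_keys(array_filter(array_count_values($seen), function ($n) {
    return $n > 1;
}));
$missed = array_values(array_diff(array_keys($viewers), $seen));

echo 'duplicates: ', implode(', ', $duplicates), "\n";
echo 'missed: ', implode(', ', $missed), "\n";
```

In this run `target` shows up on both pages and `c` is never returned, even though every individual response was accurate when it was generated.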

Thoughts?




theGeekPirate

Feb 8, 2012, 10:41:19 PM
to Justin.tv API Developers
I have tested and reproduced this issue consistently... meaning if I run
it 6 times a minute, I get the exact same results as Twitch every time
(which rules out the "updating while fetching" hypothesis).

Also, I receive the exact same number as Twitch does, which tells me
that the information I request is as accurate as theirs.

And since I get the same number as Twitch _and_ I see duplicates, it
tells me that their numbers are incorrect (since logically they must
also contain duplicates).

Also: Their API doesn't return results sorted by viewers, although
it's close (as if the sort is cached for longer or something, it's
quite strange).

theGeekPirate

Feb 8, 2012, 10:49:33 PM
to Justin.tv API Developers
Note: Their http://api.justin.tv/api/stream/summary.xml?category=gaming
number updates differently than their API, so the numbers aren't
_always_ exact, although they differ by only +-10, as opposed to
+200-300 with the duplicates.

theGeekPirate

Feb 8, 2012, 10:53:48 PM
to Justin.tv API Developers
The duplicates all show that they have the same number of viewers, if
you still have doubts ^^

Latest test run: Twitch number: 1471 My number: 1475 (these include
duplicates)

theGeekPirate

Feb 8, 2012, 11:36:11 PM
to Justin.tv API Developers
Small Update:
PHP Code to remove duplicates from array updated for speed, displays
exact same results:

$combinedArray = array_unique($combinedArray, SORT_REGULAR);
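
One caveat with whole-record comparison: it only removes entries that match in every field. The duplicates reported in this thread had identical viewer counts, so it works here, but if a count changed between two page requests both copies would survive. A sketch of deduplicating on a single identifying field instead follows; `uniqueByField` is a hypothetical helper and the `channel` and `viewers` field names are assumptions.

```php
<?php
// Sketch: keep the first record seen for each channel, ignoring other fields.
// The 'channel' and 'viewers' field names are assumptions for illustration.
function uniqueByField(array $records, $keyField)
{
    $byKey = array();
    foreach ($records as $record) {
        $key = $record[$keyField];
        if (!isset($byKey[$key])) {
            $byKey[$key] = $record; // first occurrence wins
        }
    }
    return array_values($byKey);
}

$combinedArray = array(
    array('channel' => 'alpha', 'viewers' => 900),
    array('channel' => 'beta',  'viewers' => 850),
    array('channel' => 'alpha', 'viewers' => 905), // same stream, newer count
);

// Whole-record comparison treats the two 'alpha' rows as different.
echo count(array_unique($combinedArray, SORT_REGULAR)), "\n";
// Per-channel comparison collapses them into one.
echo count(uniqueByField($combinedArray, 'channel')), "\n";
```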

theGeekPirate

Feb 8, 2012, 11:41:01 PM
to Justin.tv API Developers
Although Max LaRue, I do agree with one thing.

If there's duplicates, it can definitely mean that we may be missing
streams as well.

theGeekPirate

Feb 9, 2012, 7:29:48 PM
to Justin.tv API Developers
SWEET, finally some proof I saw with my own eyes, only using the
twitch.tv url, no code involved (except to tell me which ones were
duplicated automatically to save some time).

Off _one_ page, I get the same name twice.

I copy and pasted the result here so you can see for yourself:
http://pastebin.com/WivWysRK

Something is definitely wrong with your API =b

theGeekPirate

Feb 9, 2012, 7:41:09 PM
to Justin.tv API Developers
Code to view duplicates, in case Twitch needs it for further testing:

$duplicates = array_diff_key($combinedArray,
array_unique($combinedArray, SORT_REGULAR));
echo '<pre>';
var_dump($duplicates);

theGeekPirate

Feb 9, 2012, 7:33:38 PM
to Justin.tv API Developers
Apologies if this is a double post, I wasn't able to see my previous
post.

I found duplicates on ONE page
(http://api.justin.tv/api/stream/list.xml?category=gaming&limit=100&offset=99)
using no code, showing me that there is in fact an issue with Twitch's API.

Copy+Pasta with instructions:
http://pastebin.com/WivWysRK

Essentially ctrl+F, then look for "deathhand2277", watch the
scrollbar, and keep pressing the "Next" button until you see the
scrollbar jump to the next record (will take 17 clicks).

There's two records that are exactly the same! Off one page request
from Twitch!

theGeekPirate

Feb 9, 2012, 7:45:25 PM
to Justin.tv API Developers
Just realised how bad the formatting was for the last pastebin link,
here's a new one: http://pastebin.com/JpXAx3ck

theGeekPirate

Feb 15, 2012, 7:41:39 AM
to Justin.tv API Developers
...does no one care that we aren't getting accurate data?

Mike Ossareh

Feb 15, 2012, 12:38:29 PM
to justintv-ap...@googlegroups.com
Hey,

We do care, but we also know that there are issues in our API, and we'll be creating a new one that better serves the needs of gaming/Twitch-related API users.

If it would help we could create a ticketing system for you to log such issues so that when we get around to the new one we can have a body of test cases.

Cheers,

mike


theGeekPirate

Feb 15, 2012, 3:55:22 PM
to Justin.tv API Developers
Hi Mike,

I was worried after not receiving any sort of confirmation after a
week! Your response is much appreciated.

A ticketing system for issues would be a fantastic addition, as there
definitely needs to be a way to sort/manage any issues which may arise
during this process.

I didn't realize you were updating your API because the current one
has issues, I assumed you were only extending/limiting functionality.

As far as I know, this stream duplication wasn't a problem before the
forced pagination (although I only started using your API a week
before this happened).

Please let me know if I can help in any way!

Mike Ossareh

Feb 15, 2012, 4:02:46 PM
to justintv-ap...@googlegroups.com
On Wed, Feb 15, 2012 at 12:55 PM, theGeekPirate <andymr...@gmail.com> wrote:
Hi Mike,

I was worried after not receiving any sort of confirmation after a
week! Your response is much appreciated.

A ticketing system for issues would be a fantastic addition, as there
definitely needs to be a way to sort/manage any issues which may arise
during this process.


I didn't realize you were updating your API because the current one
has issues, I assumed you were only extending/limiting functionality.

As far as I know, this stream duplication wasn't a problem before the
forced pagination (although I only started using your API a week
before this happened).

Please let me know if I can help in any way!

theGeekPirate

Feb 15, 2012, 6:04:12 PM
to Justin.tv API Developers
<3