Using setFollow With Ghetto collect.


wca...@gmail.com

Sep 9, 2015, 11:57:17 AM
to Phirehose Users
I've been playing with this feature with some limited success. I have Phirehose set up on a dedicated server with monitoring, so I have a rough idea of what has been happening on it.

I can follow 1 user with pretty good throughput from Twitter (tested with a test account I have; I would get most tweets and @replies in real time).

Now, as I build the list up past 5-10 accounts, it starts to choke and only delivers valid tweets in occasional bursts.

Keep in mind that I'm only getting a couple of tweets out (maybe 5-10 in an hour) and then radio silence for hours. I know this is not accurate, since I have followed these accounts individually and each one has HIGH traffic of @'s and RT's when followed by itself. So I know something is up with either Twitter's feed or something in the library.

TLDR:
I see the Phirehose ghetto collector using bandwidth talking to Twitter, but I'm getting nowhere near the amount of content expected for the bandwidth used.
Some hints as to where to look would be appreciated.

I've also started looking at the Twitter reference library (hbc @ https://github.com/twitter/hbc) to see what they are doing to parse the incoming JSON stream, as I think this may be the source of the trouble/breakage in tweets.


Thanks for the anticipated feedback!
--Kyle


Fenn Bailey

Sep 9, 2015, 7:58:21 PM
to phireho...@googlegroups.com
As a quick debug step, I'd switch to a very simple script that just prints the tweet to the screen and see if you get a bunch more tweets (that correlate closer to the bandwidth usage you're seeing).

That said, if you're concerned that some of the tweets aren't even making it to the enqueueStatus() method, then that's a separate (and more concerning) issue.

You could try using a different client entirely and compare the results but it would be somewhat surprising if Phirehose was missing certain tweets (though definitely theoretically possible if twitter has done something strange with the response format).


wca...@gmail.com

Sep 9, 2015, 10:20:32 PM
to Phirehose Users
Debugging has proved successful in showing that the problem is elsewhere in the chain of getting the tweets. I was able to decode the $status in enqueueStatus() and print out valid tweets. So the collector looks good; now on to debugging the consumer.
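
For anyone repeating this check, here is a minimal sketch of decoding and printing a status. describeStatus() is a hypothetical helper, not part of Phirehose, and it assumes that $status is the raw JSON string for one message and that tweets carry "text" and "user.screen_name" fields. It avoids type hints so that it also runs on older PHP 5.x builds.

```php
<?php
// Hypothetical debugging helper (not part of Phirehose).
// Takes one raw status string and returns a one-line summary of it.
function describeStatus($status)
{
    $data = json_decode($status, true);

    // json_decode() returns null for invalid JSON; scalars are not tweets either.
    if (!is_array($data)) {
        return 'undecodable payload (' . strlen($status) . ' bytes)';
    }

    // The stream can also carry non-tweet messages, so list the keys we got.
    if (!isset($data['user']['screen_name'], $data['text'])) {
        return 'non-tweet message with keys: ' . implode(',', array_keys($data));
    }

    return '@' . $data['user']['screen_name'] . ': ' . $data['text'];
}

// Inside enqueueStatus($status) one could call:
//     print describeStatus($status) . PHP_EOL;
print describeStatus('{"text":"hello","user":{"screen_name":"kyle"}}') . PHP_EOL;
// prints "@kyle: hello"
```

If the summaries printed this way keep pace with the bandwidth being used, the collector side is probably fine and the loss is further down the chain.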

Thanks for pointing me in the right direction!

I'll update if there are any oddities found along the rest of the way.

--Kyle

wca...@gmail.com

Sep 10, 2015, 12:43:36 AM
to Phirehose Users, wca...@gmail.com
OK, as an update: I noticed this posting: https://groups.google.com/forum/#!topic/phirehose-users/YEDD6LxPYqQ
It has your advice about checking the file and then adding a newline character to the end of $status when the collector's enqueueStatus() fputs it into the queue file.


For some reason, on an x64 build (Ubuntu 14.04.3 LTS with PHP 5.5.9), the only thing it checks against is whatever the native end-of-line character is.

So to fix the issue, make sure to use PHP's built-in constant for the end-of-line character:
PHP_EOL

So the code should now look like this:
fputs($this->getStream(), $status.PHP_EOL);

As a side note, this also fixes viewing one of the queue files in nano (previously all of the JSON objects were on a single line).
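
To illustrate why the terminator matters to a line-based reader, here is a small self-contained sketch. It assumes the consumer reads the queue file one line at a time with fgets(); countQueuedLines() is a hypothetical helper, and an in-memory php://temp stream stands in for the real queue file.

```php
<?php
// Hypothetical helper: write each status followed by $terminator to an
// in-memory stream, then count how many non-empty lines fgets() returns.
function countQueuedLines(array $statuses, $terminator)
{
    $stream = fopen('php://temp', 'w+');
    foreach ($statuses as $status) {
        fputs($stream, $status . $terminator);
    }

    rewind($stream);
    $lines = 0;
    // fgets() reads up to and including the next newline, or to end of file.
    while (($line = fgets($stream)) !== false) {
        if (trim($line) !== '') {
            $lines++;
        }
    }
    fclose($stream);

    return $lines;
}

$statuses = array('{"id":1}', '{"id":2}', '{"id":3}');
print countQueuedLines($statuses, '') . PHP_EOL;      // prints 1: all objects run together
print countQueuedLines($statuses, PHP_EOL) . PHP_EOL; // prints 3: one object per line
```

Without a terminator, the three objects come back as one concatenated line, which will not decode as a single JSON document. That would be consistent with a consumer that receives bytes but yields very few usable tweets.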

My speculation is that this has something to do with running on an x64 platform?

Thanks again for the help Fenn, hopefully this also helps others/gets patched.

--Kyle