Streaming API + PHP and Python

Chad Etzel

Jun 8, 2009, 4:16:03 PM
to twitter-deve...@googlegroups.com
Hi All,

I am stumped. For several days I have tried to write a simple PHP
script that can interact with the Streaming API by just pulling in the
stream and echoing out the contents. This is the script I have:

http://pastie.org/private/9owdxerouwhitz5nfacrw

Right now it just pulls in the feed and echoes it. I am not parsing
anything at the moment.

This works great for a while, then the fread will start timing out
every 60 seconds (I have set the stream_timeout to 60). It will do
this after a nondeterministic number of updates or bytes received.

netstat shows I am still connected to stream.twitter.com but Wireshark
shows that no new data is arriving.

I have tried this on 3 different machines (2 behind the same
NAT/firewall, and 1 remote server) all with the same results.

I even scraped together a simple python script which should do the
same thing here:

http://pastie.org/private/k0p5286ljlhdyurlagnq

Same results.... works for a while, then it "stops."

Strangely, if I use CURL or telnet to open a raw socket to /spritzer
or /gardenhose it stays up forever. I had a telnet socket open on
/spritzer all weekend with no disconnects...

In the PHP script, if I add code to detect the time-outs and
immediately disconnect the socket and reconnect, the updates start
flowing in again... This is nice for error checking, but I'd really
like to figure out a more robust solution.
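
For reference, the detect-and-reconnect workaround described above can be
sketched in Python roughly as follows. This is a minimal sketch, not the
pastie script: it assumes Python 3, and the names stream_lines and connect
are hypothetical. connect is any callable that returns a connected socket
with the HTTP request already sent.

```python
import socket
import time


def stream_lines(connect, read_timeout=60.0, max_reconnects=None, backoff=1.0):
    """Yield non-empty lines from a socket, reconnecting on timeout or EOF.

    `connect` is a caller-supplied (hypothetical) function that returns a
    connected socket-like object.
    """
    reconnects = 0
    while True:
        sock = connect()
        sock.settimeout(read_timeout)
        buf = b""
        try:
            while True:
                chunk = sock.recv(4096)
                if not chunk:
                    break  # the remote end closed the connection
                buf += chunk
                # Emit every complete line currently in the buffer.
                while b"\n" in buf:
                    line, buf = buf.split(b"\n", 1)
                    line = line.strip()
                    if line:  # skip blank lines
                        yield line
        except socket.timeout:
            pass  # no data within read_timeout seconds: treat as stalled
        finally:
            sock.close()
        reconnects += 1
        if max_reconnects is not None and reconnects > max_reconnects:
            return
        time.sleep(backoff)  # brief pause before reconnecting
```

A caller would loop over stream_lines(connect) and echo or parse each
line. The backoff pause is an added assumption to avoid hammering the
server with immediate reconnects.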

1) Can anyone find anything wrong with the scripts I've posted?

2) Does anyone have an example PHP script they are using to connect to
the Streaming API which stays up indefinitely?

I would like to thank John K at Twitter for helping me debug thus far.

Thanks,
-Chad

John Kalucki

Jun 8, 2009, 4:31:23 PM
to Twitter Development Talk
A theory: The PHP client has stopped reading data, for whatever
reason. The TCP buffers fill on the client host, the TCP window
closes, and wireshark shows no data flowing. netstat(1) will show the
number of bytes waiting in the local TCP buffer.
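
The buffer-filling effect in this theory can be reproduced locally. The
following is only an illustrative sketch in Python 3, with a hypothetical
function name (bytes_until_blocked). It uses socket.socketpair(), which on
Unix gives a local socket pair rather than a real TCP connection, so it is
an analogy for the TCP case and not a reproduction of it.

```python
import socket


def bytes_until_blocked(chunk_size=4096, limit=64 * 1024 * 1024):
    """Send to a peer that never reads and count bytes until send would block."""
    reader, writer = socket.socketpair()
    writer.setblocking(False)
    sent = 0
    try:
        while sent < limit:
            try:
                sent += writer.send(b"x" * chunk_size)
            except BlockingIOError:
                break  # buffers are full; the sender can make no progress
    finally:
        reader.close()
        writer.close()
    return sent
```

The returned count depends on the operating system's buffer sizes. Once
it is reached, nothing more flows until the reader drains its side, which
matches the "connected but no new data" symptom.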

Baseless speculation: There's a limitation in the chunked transfer
coding in the PHP client wherein it cannot support endless streams.
Some resource is exhausted or administratively limited (php.ini), and
the stream stops.
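
For context on the chunked transfer coding mentioned here: a client that
reads the raw socket has to strip the chunk framing itself. A minimal
Python 3 sketch of an incremental decoder follows. The name iter_chunked
is hypothetical, the input is assumed to be a well-formed chunked body
with the response headers already consumed, and trailers are ignored.

```python
def iter_chunked(fp):
    """Yield the payload of each chunk from a binary file-like object.

    `fp` could be the result of sock.makefile("rb"), positioned just after
    the HTTP response headers.
    """
    while True:
        size_line = fp.readline()
        if not size_line:
            return  # connection closed
        # The chunk size is hexadecimal; anything after ';' is an extension.
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            return  # last-chunk marker ends the body
        data = fp.read(size)
        if len(data) < size:
            return  # connection closed in the middle of a chunk
        fp.read(2)  # discard the CRLF that follows the chunk data
        yield data
```

Because it holds only one chunk at a time, a decoder of this shape has no
inherent limit on the length of the stream.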

-John Kalucki
Services, Twitter Inc.

Chad Etzel

Jun 8, 2009, 4:36:40 PM
to twitter-deve...@googlegroups.com
I thought those things, too... but the following things made me think otherwise:

a) The stream stops after a different number of updates/bytes each
time, and will happily go on forever if I put an error-catching loop
in the script.

b) The same thing is happening in the python script.

c) Curl/telnet works fine, so it's not a system resource depletion issue....

...still confused,
-Chad

jstrellner

Jun 8, 2009, 5:00:32 PM
to Twitter Development Talk
Hi Chad,

We too have noticed the same behavior in PHP. Initially I wrote
something very similar to your example, and noticed that I'd get a
random time's worth of data before it disconnected. Then I rewrote
it, which you can see at the URL below (modified to remove code
irrelevant to this discussion), but I am still seeing similar results. Now
it goes for 2-3 days, and then stops getting data.

I can see that the script is still running via "ps" on the command
line, and I can still see data going through the server, just PHP
doesn't process it anymore.

http://pastie.org/505012

I'd love to find out what is causing it. I do have a couple of
theories specific to my code that I am trying - the only thing that
sucks is that it is random, so the tests take a few minutes or days,
depending on when it feels like dying.

Let me know if this code works or helps you in any way. Feel free to
bounce any ideas off of me, maybe we can come up with a stable
solution.

-Joel



Chad Etzel

Jun 8, 2009, 5:25:40 PM
to twitter-deve...@googlegroups.com
Well, glad I'm not the only one :) But still a bummer it's happening...

Another strange thing is that this does *not* seem to happen with the
/follow streams. I have a PHP script running (same source, just
requesting /follow instead of /spritzer) that has been connected for
over 2 days. Of course, it may die at any moment, I'm not sure..

One big difference is that the throughput for that stream is much much
less than the /hose streams, and I'm wondering if the sheer volume of
bytes being pushed has something to do with it? That would be quite
sad.

I have PHP scripts acting as Jabber/XMPP clients that use similar
fsockopen/fread/fgets/fwrite mechanisms that have been up for months
at a time, so I know those socket connections *can* stay up a long
long time in theory.

-Chad

Jason Emerick

Jun 8, 2009, 8:36:17 PM
to twitter-deve...@googlegroups.com
Here is some rough python code that I quickly wrote last weekend to handle the json spritzer feed: http://gist.github.com/126173

During the 3 or so days that I ran it, I didn't notice it die at any time...

Jason Emerick

The information transmitted (including attachments) is covered by the Electronic Communications Privacy Act, 18 U.S.C. 2510-2521, is intended only for the person(s) or entity/entities to which it is addressed and may contain confidential and/or privileged material.  Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient(s) is prohibited.  If you received this in error, please contact the sender and delete the material from any computer.

Chad Etzel

Jun 8, 2009, 8:52:48 PM
to twitter-deve...@googlegroups.com
Hi Jason,

Thanks! I've tried it out, and it seems that it doesn't like unicode
characters? Here's the traceback I get:

Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python2.5/threading.py", line 486, in __bootstrap_inner
    self.run()
  File "spritzer.py", line 31, in run
    print '%s -- %s' % (t['user']['screen_name'], t['text'])
UnicodeEncodeError: 'ascii' codec can't encode characters in position 11-14: ordinal not in range(128)

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.5/threading.py", line 486, in __bootstrap_inner
    self.run()
  File "spritzer.py", line 31, in run
    print '%s -- %s' % (t['user']['screen_name'], t['text'])
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2625' in position 9: ordinal not in range(128)

I'm not fluent in Python, so I'm not sure of the Unicode
capabilities... but otherwise it looks like it's connecting and
receiving data.

-Chad

Nick Arnett

Jun 8, 2009, 10:15:16 PM
to twitter-deve...@googlegroups.com
Try calling encode("utf-8") on the strings before you do anything else with them.... but when you do, you may find that you have to add Python components.

In other words, if the string is foo, do this:

foo = foo.encode("utf-8")
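
Applied to the print line from the traceback, that might look like the
sketch below. It is written for Python 3 rather than the Python 2.5 shown
in the traceback, and the helper names format_status and print_status are
hypothetical. The idea is the same: encode the text to UTF-8 bytes
explicitly instead of relying on the terminal's default codec.

```python
import sys


def format_status(status):
    """Build 'screen_name -- text' and encode it as UTF-8 bytes."""
    line = "%s -- %s" % (status["user"]["screen_name"], status["text"])
    return line.encode("utf-8")


def print_status(status, out=None):
    """Write the encoded line to a binary stream (stdout's buffer by default)."""
    if out is None:
        out = sys.stdout.buffer
    out.write(format_status(status) + b"\n")
    out.flush()
```

Writing bytes to the underlying binary stream sidesteps the 'ascii' codec
entirely, so characters such as u'\u2625' no longer raise
UnicodeEncodeError.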

Nick

Jason Emerick

Jun 10, 2009, 10:21:42 AM
to twitter-deve...@googlegroups.com
I believe I had to set the default locale of my system to use UTF-8 by setting the appropriate environment variable.

I believe it was the following on an Ubuntu server:
LANG="en_US.UTF-8"

The other option as Nick pointed out is using the following:
foo = foo.encode("utf-8")
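
To check whether the locale setting has taken effect, a small Python 3
sketch like the following reports which encodings the interpreter picked
up. The function name report_encodings is hypothetical.

```python
import locale
import sys


def report_encodings():
    """Return the encodings Python derived from the environment."""
    return {
        # Encoding implied by the locale settings (LANG, LC_ALL, ...).
        "preferred": locale.getpreferredencoding(False),
        # Encoding used when printing text to standard output, if known.
        "stdout": getattr(sys.stdout, "encoding", None),
    }
```

If either value reports an ASCII codec, printing non-ASCII tweet text is
likely to fail in the way shown in the traceback.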

Jason Emerick

Joseph

Jul 24, 2009, 2:20:41 AM
to Twitter Development Talk
Have you resolved this problem? A suggestion: have you tried writing the
raw output to a file (starting a new file every hour, say), and then
having another script process the JSON?
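
A rough Python 3 sketch of that idea, assuming one JSON document per line.
The class name HourlyWriter and the file naming scheme are hypothetical.

```python
import os
import time


class HourlyWriter:
    """Append raw lines to a file, starting a new file each UTC hour."""

    def __init__(self, directory, prefix="stream", clock=time.time):
        self.directory = directory
        self.prefix = prefix
        self.clock = clock  # injectable so rotation can be tested
        self._path = None
        self._file = None

    def _path_for(self, timestamp):
        stamp = time.strftime("%Y%m%d-%H", time.gmtime(timestamp))
        return os.path.join(self.directory, "%s-%s.json" % (self.prefix, stamp))

    def write(self, line):
        """Write one raw line (bytes), rotating files when the hour changes."""
        path = self._path_for(self.clock())
        if path != self._path:
            self.close()
            self._file = open(path, "ab")
            self._path = path
        self._file.write(line.rstrip(b"\r\n") + b"\n")

    def close(self):
        if self._file is not None:
            self._file.close()
            self._file = None
            self._path = None
```

The reading loop then only calls write() for each line, and a separate
script can parse the completed hourly files at its own pace, so slow
parsing never holds up the socket reads.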
