accepting gzipped responses


Augusto Santos

Dec 29, 2010, 2:30:13 PM
to phireho...@googlegroups.com
Hi Folks,

Is there a way to make Phirehose read Twitter API content with compression enabled?
I'm receiving more than 1 GB of incoming data per day.

Thanks, Augusto.

--

Augusto Santos

Dec 30, 2010, 8:01:31 AM
to phireho...@googlegroups.com
Hi,

Has anyone made Phirehose support gzip/deflate encoding?

Thanks.
--

Fenn Bailey

Jan 3, 2011, 10:16:07 PM
to phireho...@googlegroups.com
Hi Augusto,

That's a great question - At the moment, Phirehose consumes uncompressed content by default. Assuming that the Twitter Streaming API supports serving content with gzip,deflate enabled, it would be possible to modify Phirehose to consume this.

Rather than using fopen, I believe we could set Phirehose to use a stream handler that's capable of decompressing responses (I think). 
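
One way to picture a "stream handler that's capable of decompressing" is PHP's built-in zlib.inflate stream filter. The sketch below is an illustration only, not Phirehose code: openInflatedStream() is a hypothetical helper, php://temp stands in for the network socket so the example runs offline, and the window value of 31 (15 + 16, meaning "expect a gzip wrapper") is assumed to be accepted by the PHP version in use. A real HTTP response that uses chunked transfer encoding would also need de-chunking before inflation.

```php
<?php
// Sketch: read gzip-compressed bytes through the zlib.inflate read filter.
// openInflatedStream() is a hypothetical helper, not part of Phirehose.
function openInflatedStream(string $gzipBytes)
{
    // php://temp plays the role of the socket for this offline demo.
    $fp = fopen('php://temp', 'w+b');
    fwrite($fp, $gzipBytes);
    rewind($fp);

    // window = 15 + 16 tells zlib to expect a gzip header and trailer.
    stream_filter_append($fp, 'zlib.inflate', STREAM_FILTER_READ, ['window' => 31]);

    return $fp;
}

$original = "{\"text\":\"first\"}\r\n{\"text\":\"second\"}\r\n";
$fp = openInflatedStream(gzencode($original));

// Every read now returns decompressed bytes.
$decoded = stream_get_contents($fp);
fclose($fp);
```

The appeal of a filter is that the rest of the reading code would not need to know the stream is compressed.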

Do you happen to know if the streaming API supports serving compressed content?

Cheers,

  Fenn.

John Kalucki

Jan 4, 2011, 12:41:45 AM
to phireho...@googlegroups.com
We don't currently support gzip on streams, but we'd like to.

-John

Karthik Murugan

Feb 6, 2012, 11:58:44 PM
to phireho...@googlegroups.com
Streaming API to support gzip compression from this week

Darren Cook

Feb 7, 2012, 12:46:55 AM
to phireho...@googlegroups.com
> Streaming API to support gzip compression from this week
>
> https://dev.twitter.com/blog/gzip-compression-for-stream-twitter-com-this-week

Thanks for the heads-up Karthik! It appears Phirehose does not send an
Accept-Encoding header; so nothing should break.
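
To make the opt-in nature of compression concrete, here is a minimal sketch of a request builder that only advertises gzip when asked, plus a check of the response headers. Both functions are hypothetical helpers written for illustration; the host and path are placeholders and this is not how Phirehose builds its request.

```php
<?php
// Hypothetical helper: builds raw HTTP request headers. A server only
// compresses the response when the client sends Accept-Encoding.
function buildStreamRequest(string $host, string $path, bool $useGzip): string
{
    $lines = [
        "GET {$path} HTTP/1.1",
        "Host: {$host}",
        "Accept: */*",
    ];
    if ($useGzip) {
        $lines[] = "Accept-Encoding: gzip";
    }
    // A blank line terminates the header block.
    return implode("\r\n", $lines) . "\r\n\r\n";
}

// Hypothetical helper: true when the response headers declare gzip content.
function responseIsGzipped(string $rawHeaders): bool
{
    return preg_match('/^Content-Encoding:\s*gzip\s*$/mi', $rawHeaders) === 1;
}

// Placeholder host and path, for illustration only.
$plain = buildStreamRequest('stream.example.com', '/sample.json', false);
$gzip  = buildStreamRequest('stream.example.com', '/sample.json', true);
```

Checking Content-Encoding on the response matters because a server is free to ignore the request header and reply uncompressed.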

But, how to add support for gzip to phirehose? This seems to depend on
if each tweet will be gzipped individually, or the whole stream.

If the former it should be quite easy. Just before this line:
$this->enqueueStatus($this->buff);

we would do:
if($this->usingGzip)$this->buff=gzdecode($this->buff);

But if the whole stream is gzipped, how does it work? Doesn't it need to
know what has come before, in order to be able to decode what has come
after? (*)
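
For the whole-stream case, the decoder does need to remember what came before, and zlib handles that with a context object that is fed one chunk at a time. The sketch below assumes a PHP version that provides inflate_init() and inflate_add() (they were added in PHP 7, so they were not available when this thread started). The sample data and the 16-byte chunk size are arbitrary choices for the demo.

```php
<?php
// Sketch: decompress one continuous gzip stream that arrives in
// arbitrary-sized chunks, as it would from a socket.
$original   = str_repeat("{\"text\":\"hello\"}\r\n", 50);
$compressed = gzencode($original);

// The context holds the decoder state (history window, bit position)
// between calls, so each chunk can be decoded as soon as it arrives.
$ctx = inflate_init(ZLIB_ENCODING_GZIP);

$decoded = '';
foreach (str_split($compressed, 16) as $chunk) {
    // Returns whatever output can be produced from the bytes seen so far.
    $decoded .= inflate_add($ctx, $chunk, ZLIB_SYNC_FLUSH);
}
```

Calling gzdecode() on each chunk separately would not work here, because a chunk cut from the middle of the stream is not a complete gzip document on its own.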

Well, PHP has gzopen() (http://php.net/gzopen). The comments in the PHP
manual page show a daisy-chain example:
$fp = fopen("compress.zlib://http://some.website.org/example.gz", "r");

But Phirehose uses fsockopen. Can that also cope with daisy-chaining
schemes? E.g. would this work?
$scheme = "compress.zlib://ssl://";
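
A quick way to investigate this is to ask PHP what it has registered. fsockopen() takes socket transports (tcp://, ssl:// and so on) in its hostname argument, whereas compress.zlib:// is a stream wrapper used by fopen() and friends. The sketch below only inspects the two registries; it assumes the zlib extension is loaded and does not open any connection.

```php
<?php
// Sketch: compare the stream wrapper registry with the socket
// transport registry.
$wrappers   = stream_get_wrappers();   // schemes usable with fopen()
$transports = stream_get_transports(); // schemes usable with fsockopen()

$isWrapper   = in_array('compress.zlib', $wrappers, true);
$isTransport = in_array('compress.zlib', $transports, true);

printf("compress.zlib is a wrapper: %s\n", $isWrapper ? 'yes' : 'no');
printf("compress.zlib is a transport: %s\n", $isTransport ? 'yes' : 'no');
```

Since compress.zlib is not a socket transport, prefixing it to the fsockopen scheme would not be expected to work; decompression would have to be layered on top of the socket some other way.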

Darren

*: A couple of StackOverflow questions that didn't really answer this
for me:
http://stackoverflow.com/questions/3469040/decoding-gzip-using-php-sockets
http://stackoverflow.com/questions/8395705/how-to-decompress-gzip-stream-chunk-by-chunk-using-php

--
Darren Cook, Software Researcher/Developer

http://dcook.org/work/ (About me and my work)
http://dcook.org/blogs.html (My blogs and articles)

Fenn Bailey

Feb 7, 2012, 1:40:13 AM
to phireho...@googlegroups.com
Hmmm, it's an interesting question.

Again, I feel like switching to curl would be a much simpler solution (plus we get proxy support, etc.); however, people have indicated concern with a curl dependency.

Has anyone actually got a real production environment that doesn't include curl support in PHP?

  Fenn.

Brian

Mar 21, 2012, 2:34:36 PM
to Phirehose Users
Any further thoughts on curl? Would love to control outbound IP for
the streaming API too...
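
With a curl transport, the outbound address would be a matter of one option. The sketch below is a hypothetical helper that only assembles an options array: streamCurlOptions() and its parameters are names invented for illustration, while CURLOPT_INTERFACE, CURLOPT_PROXY and CURLOPT_ENCODING are real curl options.

```php
<?php
// Hypothetical helper: assembles curl options for a streaming connection.
// Nothing here is Phirehose code.
function streamCurlOptions(string $url, ?string $outboundIp = null, ?string $proxy = null): array
{
    $opts = [
        CURLOPT_URL      => $url,
        // Sends Accept-Encoding: gzip and decodes the response transparently.
        CURLOPT_ENCODING => 'gzip',
    ];
    if ($outboundIp !== null) {
        // Accepts an interface name, an IP address or a host name.
        $opts[CURLOPT_INTERFACE] = $outboundIp;
    }
    if ($proxy !== null) {
        $opts[CURLOPT_PROXY] = $proxy;
    }
    return $opts;
}

// Usage (requires the curl extension):
//   $ch = curl_init();
//   curl_setopt_array($ch, streamCurlOptions($url, '192.0.2.10'));
```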


Augusto Santos

May 24, 2012, 5:16:16 PM
to phireho...@googlegroups.com
Old story, but... how to make Phirehose use curl?

Thanks.
--

Fenn Bailey

May 24, 2012, 8:35:38 PM
to phireho...@googlegroups.com
This is actually pretty easy to do (I simply don't have time right now to work on it).

Ostensibly, replacing this line:


and the subsequent HTTP POST request with a curl call, then replace the stream_select loop at:



It would, however, be slightly messy to support both curl AND fsockopen at the same time (you'd need an additional layer of abstraction).

I think I would prefer to switch Phirehose to use curl as the primary transport - I don't really foresee a case where users would have the ability to run background scripts indefinitely, but NOT have curl support.
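
As a rough picture of what a curl-based transport might look like, the sketch below pairs a write callback with a small line buffer. It is an assumption-laden illustration, not the Phirehose implementation: LineBuffer and openCurlStream() are hypothetical names, statuses are assumed to be delimited by "\r\n", and blank lines are assumed to be keep-alives that can be skipped.

```php
<?php
// Hypothetical sketch of a curl transport; not Phirehose code.
final class LineBuffer
{
    private $buff = '';
    private $onStatus;

    public function __construct(callable $onStatus)
    {
        $this->onStatus = $onStatus;
    }

    // Matches the CURLOPT_WRITEFUNCTION signature: curl aborts the
    // transfer unless the callback returns the number of bytes handled.
    public function write($ch, string $chunk): int
    {
        $this->buff .= $chunk;
        // Emit every complete line; keep any partial tail for the next call.
        while (($pos = strpos($this->buff, "\r\n")) !== false) {
            $line = substr($this->buff, 0, $pos);
            $this->buff = (string) substr($this->buff, $pos + 2);
            if ($line !== '') { // assumed keep-alive blank line
                ($this->onStatus)($line);
            }
        }
        return strlen($chunk);
    }
}

// Hypothetical helper: the caller runs curl_exec($ch), which blocks
// for as long as the stream stays open.
function openCurlStream(string $url, string $userPwd, LineBuffer $buffer)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_USERPWD       => $userPwd,
        CURLOPT_ENCODING      => 'gzip', // curl inflates before the callback
        CURLOPT_WRITEFUNCTION => [$buffer, 'write'],
    ]);
    return $ch;
}
```

Because curl decodes gzip before invoking the callback, the line-splitting code would stay the same whether or not compression is enabled.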

Cheers,

  Fenn.

Augusto Santos

May 25, 2012, 8:32:46 AM
to phireho...@googlegroups.com
Hi Fenn,

I don't foresee it either. I guess curl is simpler than php.

Thanks for the solution. 

Cheers,

Augusto.
--

Darren Cook

Jun 5, 2012, 8:23:48 AM
to phireho...@googlegroups.com
> This is actually pretty easy to do (I simply don't have time right now to
> work on it).

Hello Fenn,
When you (or someone) has time I'd love to see this done. Not least
because the curl multi_* functions are practically undocumented, and in
my experiences so far, are hard to get working.

(Or phrased more directly: show me the money Fenn!)

> It would, however, be slightly messy to support both curl AND fsockopen at
> the same time (you'd need an additional layer of abstraction).

You'd only need an abstraction layer if you think you might want to
introduce a 3rd approach in the future (*). With just two options, using
if/else clauses wrapped round the three blocks of code you identified works.

I.e.
if(self::transport=='curl'){
//...curl code...
}
else{
//...leave existing stream_select, etc. code here
}

I agree that this is "slightly messy", but with the emphasis on the
"slightly". And it allows us to introduce a major change in a way that
won't break backwards-compatibility.

Darren

*: Or if you write in java... from what I've seen enterprise java
programmers like to wrap an abstraction layer around everything, even if
there will only ever be a single approach ;-)

--
Darren Cook, Software Researcher/Developer

http://dcook.org/work/ (About me and my work)

Fenn Bailey

Jun 5, 2012, 7:50:35 PM
to phireho...@googlegroups.com
Hey hey!

> When you (or someone) has time I'd love to see this done. Not least
> because the curl multi_* functions are practically undocumented, and in
> my experiences so far, are hard to get working.


Yes, they are terribly badly documented, though I have a reasonable amount of experience working with them. As an interesting side effect of adding curl_multi_*, it would be fairly easy to add multiple simultaneous connections to a single Phirehose instance, i.e. something like:

$mp = new MultiPhirehose();
$mp->add(new MyFilterStream($user, $pass, $params));
$mp->add(new MyUserStream($user, $pass, $params));
$mp->consume();
 
> (Or phrased more directly: show me the money Fenn!)

:)
 

> It would, however, be slightly messy to support both curl AND fsockopen at
> the same time (you'd need an additional layer of abstraction).

> You'd only need an abstraction layer if you think you might want to
> introduce a 3rd approach in the future (*). With just two options, using
> if/else clauses wrapped round the three blocks of code you identified works.


Yeah, you're probably right - the trickiness would mostly be around the async IO loop, because you'd be replacing stream_select() with a potential curl_multi_select(), so your core loop semantics could be quite different.
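
To show how the loop shape differs, here is a minimal sketch of a curl_multi driver. With stream_select() you wait on sockets and then read them yourself; with curl_multi you repeatedly let curl do the reading (which fires any write callbacks) and only wait between rounds. consumeAll() is a hypothetical function name and the timeout values are arbitrary.

```php
<?php
// Hypothetical sketch of a curl_multi event loop; not Phirehose code.
// Returns the number of loop iterations, purely for demonstration.
function consumeAll(array $handles, float $selectTimeout = 1.0): int
{
    $mh = curl_multi_init();
    foreach ($handles as $ch) {
        curl_multi_add_handle($mh, $ch);
    }

    $iterations = 0;
    do {
        // Performs pending reads/writes and invokes write callbacks.
        $status = curl_multi_exec($mh, $running);

        // Wait for activity; back off briefly if select reports failure.
        if ($running > 0 && curl_multi_select($mh, $selectTimeout) === -1) {
            usleep(100000);
        }
        $iterations++;
    } while ($running > 0 && $status === CURLM_OK);

    foreach ($handles as $ch) {
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);

    return $iterations;
}
```

Each handle passed in would carry its own write callback, which is what would let several streams share one loop.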


> *: Or if you write in java... from what I've seen enterprise java
> programmers like to wrap an abstraction layer around everything, even if
> there will only ever be a single approach ;-)


Hah, so true. My overcomplicating-it thought was to ostensibly have 2 layers of hierarchy: your stream types, i.e. User, Filter, Sample, etc., and then your transport types: Fsock, CURL, ProxiedCURL, etc.

Probably way too complex, considering my original goal was to need only a single class, with a single method to consume the stream :)

  Fenn.

xeon...@gmail.com

Nov 6, 2012, 5:37:30 PM
to phireho...@googlegroups.com
I'm also interested in how we could use gzip encoding. I assume this would be most important for the full firehose if you had a host with less bandwidth and plenty of CPU cycles.

Perhaps every packet sent to the buffer could be passed through gzdecode(), if the data is being compressed at the packet level.

- David