Should I use publish/subscribe?

tianyuan

Jul 18, 2011, 2:31:13 AM
to Redis DB
I have about 100 web servers, and I want to collect all of their logs in one place,
so I am thinking about using publish/subscribe in Redis.

Every web server publishes to the same channel,
and one log processor subscribes to this channel and processes every log it
receives.

There are about 30 GB of logs every day.

Is this an appropriate approach?

When the subscriber gets a log, is it still in memory, or has it
completely disappeared?

Dvir Volk

Jul 18, 2011, 4:37:40 AM
to redi...@googlegroups.com
What will happen if the processor goes down for a second? You will lose all the data published in the meantime, because no one is subscribed to the channel.
IMHO this approach is better when you have many readers, but not ideal when you have many writers and one reader.
You can simply push to a list instead, and it will be safer.
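To make the difference concrete, here is a toy in-memory model in Python. No Redis server is involved, and the class names `ToyPubSub` and `ToyList` are hypothetical. It mimics the two semantics being compared: a publish reaches only the subscribers connected at that moment and keeps nothing, while a pushed list item waits until someone pops it.

```python
from collections import deque


class ToyPubSub:
    """Mimics Redis pub/sub delivery: fire-and-forget, nothing is stored."""

    def __init__(self):
        self.subscribers = {}  # channel -> list of subscriber inboxes

    def subscribe(self, channel):
        inbox = deque()
        self.subscribers.setdefault(channel, []).append(inbox)
        return inbox

    def publish(self, channel, message):
        # Like PUBLISH, return how many subscribers received the message;
        # with no subscribers the message is simply dropped.
        inboxes = self.subscribers.get(channel, [])
        for inbox in inboxes:
            inbox.append(message)
        return len(inboxes)


class ToyList:
    """Mimics a Redis list used as a queue: items wait until popped."""

    def __init__(self):
        self.items = deque()

    def lpush(self, message):
        self.items.appendleft(message)
        return len(self.items)

    def rpop(self):
        # Returns None (Redis: nil) when the list is empty.
        return self.items.pop() if self.items else None


pubsub, log_list = ToyPubSub(), ToyList()

# The log processor is down: nobody is subscribed yet.
delivered = pubsub.publish("logs", "GET /index 200")
log_list.lpush("GET /index 200")

# The processor comes back up.
inbox = pubsub.subscribe("logs")
print(delivered, list(inbox))  # prints: 0 []
print(log_list.rpop())         # prints: GET /index 200
```

The published line is gone for good, while the list still holds its copy for the restarted processor.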

But, off topic: I am using an unrelated open-source solution for log aggregation across my network, Facebook's Scribe.
It collects, filters, forwards (based on rules), and aggregates logs from multiple servers (tens of thousands at Facebook, ~20 in my case at www.doat.com).

The idea is that you have a local server running on each machine, with rules on how to forward messages based on their "category" (prefix), so this is very fast.
Then you have a central server that the local servers forward messages to. If you have thousands of servers, you can create a multi-tier tree of collectors, or send logs to different collectors based on their categories.
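A rough sketch of the category-based forwarding idea, written as plain Python rather than Scribe's actual configuration format. The rule table, the dot-separated category names, and the collector host names are all hypothetical. Each local agent picks an upstream collector by matching the message's category prefix, and falls back to a default central collector.

```python
# Hypothetical forwarding rules (not Scribe's config syntax):
# category prefix -> upstream collector host.
RULES = {
    "web.access": "collector-access.internal",
    "web": "collector-web.internal",
}
DEFAULT_COLLECTOR = "collector-default.internal"


def route(category: str) -> str:
    """Pick the collector a local agent would forward this category to.

    The most specific (longest) matching prefix wins; anything unmatched
    goes to the default central collector.
    """
    for prefix in sorted(RULES, key=len, reverse=True):
        if category == prefix or category.startswith(prefix + "."):
            return RULES[prefix]
    return DEFAULT_COLLECTOR


for cat in ["web.access", "web.error", "db.slowlog"]:
    print(cat, "->", route(cat))
```

Because routing is a local prefix lookup, each agent decides where a message goes without asking anyone, which is what keeps the per-machine step cheap.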




--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To post to this group, send email to redi...@googlegroups.com.
To unsubscribe from this group, send email to redis-db+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.


tianyuan

Jul 18, 2011, 5:21:58 AM
to redi...@googlegroups.com
Thanks for your reply.

Messages will be lost if there are no subscribers, right?

I think Scribe is too heavy for the simple logs of my fewer than 100 servers.

How about
LPUSH x_log "loglogloglogloglog"

and then
RPOP x_log
every second until I get nil?
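The polling scheme above (LPUSH on the producers, then RPOP until nil) can be modelled without a Redis server. In this sketch a `deque` stands in for the `x_log` list, and `lpush`, `rpop` and `drain` are hypothetical helpers. Pushing on the left and popping on the right gives first-in, first-out order, and the drain loop stops at the first empty pop (nil).

```python
from collections import deque

x_log = deque()  # stand-in for the Redis list "x_log"


def lpush(lst, value):
    lst.appendleft(value)  # LPUSH adds at the head (left)


def rpop(lst):
    # RPOP takes from the tail (right); None plays the role of nil.
    return lst.pop() if lst else None


def drain(lst):
    """Pop until nil, as in: RPOP x_log every second until I get nil."""
    batch = []
    while (item := rpop(lst)) is not None:
        batch.append(item)
    return batch


for line in ["log 1", "log 2", "log 3"]:
    lpush(x_log, line)

print(drain(x_log))  # prints: ['log 1', 'log 2', 'log 3']
print(drain(x_log))  # prints: []
```

Between polls the consumer sleeps, so a new line can wait up to one polling interval before it is picked up, and every empty poll is a wasted round trip.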

Dvir Volk

Jul 18, 2011, 5:36:01 AM
to redi...@googlegroups.com
a. I don't think Scribe is too heavy for your setup. It's been useful for me since I had 4 servers. It's a bit of a pain to compile, but once you've done that, setting it up is extremely straightforward. But it's up to you.

b. Use BLPOP/BRPOP in your client. That way your consumer just waits until there's something to read, or returns immediately if there already is.
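With BRPOP the consumer blocks inside the server instead of sleeping and polling. As a server-free illustration, Python's `queue.Queue.get(timeout=...)` has the same shape: it returns as soon as an item is available, or gives up once the timeout expires (BRPOP returns nil when its timeout expires, and a timeout of 0 blocks forever). The producer thread and the timing values are illustrative assumptions.

```python
import queue
import threading
import time

log_queue = queue.Queue()  # stand-in for the Redis list


def consumer(received, timeout=2.0):
    """Block until a log line arrives, like BRPOP with a timeout."""
    while True:
        try:
            line = log_queue.get(timeout=timeout)
        except queue.Empty:
            return  # like BRPOP returning nil after the timeout
        received.append(line)


def producer():
    for i in range(3):
        time.sleep(0.1)  # lines arrive over time, not in one burst
        log_queue.put(f"log {i}")


received = []
t_consumer = threading.Thread(target=consumer, args=(received, 0.5))
t_producer = threading.Thread(target=producer)
t_consumer.start()
t_producer.start()
t_producer.join()
t_consumer.join()
print(received)  # prints: ['log 0', 'log 1', 'log 2']
```

Each line is handed over as soon as it is pushed, with no polling interval and no empty round trips in between.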

Josiah Carlson

Jul 18, 2011, 12:04:02 PM
to redi...@googlegroups.com
I don't have any experience with Scribe, but I agree with Dvir that
you should use something that was designed with logging in mind.

We've used syslog-ng as it increases the log message limit to be much
larger than is allowed in standard syslog (we sometimes log json
blobs), it allows for transport over UDP, TCP, and SSL, it is a
drop-in replacement for syslog (so all of the logging tools that your
platform offers will still work), it offers filtering and redirection
of different log messages (these can get a little ugly to configure,
but it's not bad), etc.

While I am generally a fan of hacking Redis to do just about anything,
in the case of logging: pick one of the standard log packages
(syslog-ng, flume, scribe, etc.). They work great, automatically
include time stamps, origin information, etc., and won't blow up your
memory if your log collection process fails to run for one reason or
another.

Regards,
- Josiah
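A quick back-of-the-envelope check of the memory concern above. It uses the ~30 GB/day figure from the original question and assumes, for illustration, that the write rate is uniform over the day. Under that assumption, every hour the collector is down leaves about 1.25 GB queued in Redis memory.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def backlog_bytes(bytes_per_day: float, outage_seconds: float) -> float:
    """Bytes that pile up in a Redis list while the consumer is not popping.

    Assumes a uniform write rate; real traffic is burstier, so an outage
    at peak hours will be worse than this average.
    """
    return bytes_per_day * outage_seconds / SECONDS_PER_DAY


DAILY = 30e9  # ~30 GB of logs per day, from the question
print(f"average rate: {DAILY / SECONDS_PER_DAY / 1e3:.0f} kB/s")
print(f"1 hour outage: {backlog_bytes(DAILY, 3600) / 1e9:.2f} GB")
print(f"overnight (8 h): {backlog_bytes(DAILY, 8 * 3600) / 1e9:.1f} GB")
```

This prints about 347 kB/s, 1.25 GB and 10.0 GB, so an unnoticed overnight outage would need roughly 10 GB of free RAM on the Redis box.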

Dvir Volk

Jul 18, 2011, 12:54:41 PM
to redi...@googlegroups.com
Agreed.
BTW, we are using rsyslog as a system log aggregator, and Scribe for our application logs, which are analyzed and monitored very differently.

tianyuan

Jul 18, 2011, 9:20:47 PM
to redi...@googlegroups.com
OK, I will take a look at Scribe; it seems more reliable than hacking Redis.

Matt Ranney

Jul 19, 2011, 7:07:57 PM
to redi...@googlegroups.com
I've been doing something like this in production for a while, and it has been working really well for us. We don't publish everything to the same channel, though; instead, our channel names look like this:

process_name:level

One nice thing about publish/subscribe compared to RPUSH/BLPOP for logging is that pub/sub allows multiple readers. This is useful because you can attach different readers with different PSUBSCRIBE patterns to track down a problem and watch the data scroll by in real time.
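A toy sketch of that layout, with no Redis server involved. Channels are named `process_name:level`, and each reader registers a glob-style pattern the way PSUBSCRIBE does. Python's `fnmatch.fnmatchcase` stands in for Redis's glob matching here; it handles `*`, `?` and `[...]`, but not Redis's backslash escapes. The process names, messages, and the `deliver` helper are made up.

```python
from fnmatch import fnmatchcase


def deliver(channel, message, readers):
    """Fan a published message out to every reader whose pattern matches.

    Unlike a list-based queue, each matching reader gets its own copy,
    and a message that matches no pattern is simply dropped.
    """
    for pattern, inbox in readers.items():
        if fnmatchcase(channel, pattern):
            inbox.append((channel, message))


readers = {
    "*:error": [],     # all errors from every process
    "api_*:*": [],     # everything from the API processes
    "api_auth:*": [],  # a narrow, temporary debugging session
}

deliver("api_auth:error", "token expired", readers)
deliver("api_search:info", "query took 12ms", readers)
deliver("worker:debug", "job 42 started", readers)

for pattern, inbox in readers.items():
    print(pattern, len(inbox))
```

The `api_auth:error` line reaches three readers at once, which a single RPUSH/BLPOP queue cannot do, while `worker:debug` reaches nobody and is lost.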

Matt Billenstein

Jul 19, 2011, 8:30:53 PM
to redi...@googlegroups.com
Hmm, I like that idea for application debug logs and perhaps traceback logging/aggregation, but it seems like this could also be handled by a standard syslog handler sending to a central sink, no?

I guess the cool thing is that anyone can write a simple client to get at the data they want from a Redis instance on their private network, whereas they may not have the privileges to log in to the central log server in a larger organization.
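For comparison, the "standard syslog handler" route needs nothing beyond the Python standard library. `logging.handlers.SysLogHandler` sends each record over UDP to a central syslog daemon such as rsyslog or syslog-ng. The logger name `myapp`, the message format, and the host `loghost.example` used below are placeholders; 514 is the conventional syslog UDP port.

```python
import logging
import logging.handlers


def make_logger(host: str, port: int = 514) -> logging.Logger:
    """Return a logger that ships records to a remote syslog sink over UDP."""
    handler = logging.handlers.SysLogHandler(address=(host, port))
    # "myapp:" acts as the syslog tag so the central sink can filter on it.
    handler.setFormatter(logging.Formatter("myapp: %(levelname)s %(message)s"))
    logger = logging.getLogger("myapp")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

Usage would be along the lines of `make_logger("loghost.example").info("GET /index 200")`. The datagram carries a `<PRI>` prefix (14 for facility "user" at level INFO), so the central daemon can route it by facility and severity.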

tianyuan

Jul 20, 2011, 2:47:59 AM
to redi...@googlegroups.com
What if some messages are published but the processors are too busy to handle them?

Josiah Carlson

Jul 20, 2011, 3:17:08 AM
to redi...@googlegroups.com
Are you referring to Redis pubsub, or are you referring to something else?

- Josiah

On Tue, Jul 19, 2011 at 11:47 PM, tianyuan <iamti...@gmail.com> wrote:
> what if some messages are published but the processers are too busy to
> handler them?
>

tianyuan

Jul 20, 2011, 3:33:41 AM
to redi...@googlegroups.com
Sorry, I mean pub/sub in Redis.

emre yılmaz

Jul 20, 2011, 8:38:47 AM
to redi...@googlegroups.com


2011/7/20 tianyuan <iamti...@gmail.com>

I think it is better for you to use a simple FIFO queue implementation, but not with Redis, since it's an in-memory database. I would use RabbitMQ for queue management and Cassandra for storing the logs on disk. (Your workers simply get the messages from the RabbitMQ queue and send them to Cassandra.)

--
web developer
http://www.emreyilmaz.me

Josiah Carlson

Jul 20, 2011, 12:09:10 PM
to redi...@googlegroups.com
I would wholeheartedly recommend against using RabbitMQ, as heavy
writes (sometimes as few as a few hundred/second) can cause it to
segfault. [1]

Also, putting logs into Cassandra doesn't magically make them scale.
To make any on-disk storage system really scale takes either 1) no
indexes (plain log files) or 2) more disks (Cassandra, MongoDB, etc.).
Sticking with plain log storage also makes the logs trivial to back up,
analyze, import into another system if he discovers a need for them
later, etc.

Regards,
- Josiah

[1] http://blog.reddit.com/2010/05/reddits-may-2010-state-of-servers.html


Josiah Carlson

Jul 20, 2011, 12:10:03 PM
to redi...@googlegroups.com
It is buffered on the Redis server. There were discussions about
whether to disconnect a client that isn't responding fast enough, but
I don't know whether that was implemented.

Regards,
- Josiah
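The buffering answer can be modelled in a few lines. In this toy (not Redis code; the class name, message sizes and limit are made up), the server appends each published message to a per-subscriber output buffer, and a reader that lets its buffer grow past a hard limit gets disconnected, losing everything still queued. Later Redis releases did add a `client-output-buffer-limit` setting with a separate `pubsub` client class that works along these lines, with both a hard limit and a soft limit plus a time window; the toy models only the hard limit.

```python
class ToySubscriberBuffer:
    """Per-subscriber output buffer held by the server (toy model)."""

    def __init__(self, hard_limit_bytes):
        self.hard_limit = hard_limit_bytes
        self.pending = []
        self.pending_bytes = 0
        self.connected = True

    def push(self, message: bytes):
        """Server side: queue a published message for this subscriber."""
        if not self.connected:
            return
        self.pending.append(message)
        self.pending_bytes += len(message)
        if self.pending_bytes > self.hard_limit:
            # Reader is too slow: drop the connection and everything queued.
            self.connected = False
            self.pending.clear()
            self.pending_bytes = 0

    def read(self, n=1):
        """Client side: consume up to n queued messages."""
        batch, self.pending = self.pending[:n], self.pending[n:]
        self.pending_bytes -= sum(len(m) for m in batch)
        return batch


slow = ToySubscriberBuffer(hard_limit_bytes=1000)
for i in range(50):
    slow.push(b"x" * 100)  # 100-byte log lines published faster than read
    if i % 10 == 0:
        slow.read()        # the busy processor reads only occasionally

print(slow.connected)  # prints: False
```

Either way the too-busy processor loses data: without a limit the server's memory grows, and with one the subscriber is cut off.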

emre yılmaz

Jul 20, 2011, 4:50:32 PM
to redi...@googlegroups.com


2011/7/20 Josiah Carlson <josiah....@gmail.com>

> I would wholeheartedly recommend against using RabbitMQ, as heavy
> writes (sometimes as few as a few hundred/second) can cause it to
> segfault. [1]

Well, in our setup (two replicated RabbitMQ instances) we are receiving/processing 1000+ messages per second, and it works like a charm :)

> Also, putting logs into Cassandra doesn't magically make them scale.
> To make any on-disk storage system really scale takes either 1) no
> indexes (plain log files) or 2) more disks (Cassandra, MongoDB, etc).
> Sticking with plain logs storage also makes them trivial to backup,
> analyze, import into another system if he discovers a need for them
> later, etc.

18 GB per day is a huge amount of data to store in memory.

Josiah Carlson

Jul 20, 2011, 6:37:33 PM
to redi...@googlegroups.com
On Wed, Jul 20, 2011 at 1:50 PM, emre yılmaz <ma...@emreyilmaz.me> wrote:
>
>
> 2011/7/20 Josiah Carlson <josiah....@gmail.com>
>>
>> I would wholeheartedly recommend against using RabbitMQ, as heavy
>> writes (sometimes as few as a few hundred/second) can cause it to
>> segfault. [1]
>>
>
> well, in our setup -with to replicated rabbitmq instance-
> receiving/processing 1000+ messages per second, and it's works like a charm
> :)

We ran into a segfaulting condition at 75/second; I've had friends locally
who have run into it at 20/second.

>> Also, putting logs into Cassandra doesn't magically make them scale.
>> To make any on-disk storage system really scale takes either 1) no
>> indexes (plain log files) or 2) more disks (Cassandra, MongoDB, etc).
>> Sticking with plain logs storage also makes them trivial to backup,
>> analyze, import into another system if he discovers a need for them
>> later, etc.
>>
>>
>
> 18 GB per day is a huge data to store in the memory.

Who said anything about storing them in memory? I'm talking about
logging them to plain files on disk (then putting them up in S3 or
similar for longer term storage/analysis).

- Josiah

emre yılmaz

Jul 20, 2011, 7:18:59 PM
to redi...@googlegroups.com


2011/7/21 Josiah Carlson <josiah....@gmail.com>

> We ran into a segfaulting condition at 75, I've had friends locally
> who have run into it at 20/second.

The numbers you mention are just too low to get into trouble with RabbitMQ.

> Who said anything about storing them in memory? I'm talking about
> logging them to plain files on disk (then putting them up in S3 or
> similar for longer term storage/analysis).

Okay, there is a misunderstanding. I replaced 'plain text files' with Cassandra. This is a good use case for it: lots of writes and few reads. Plus, searching and analyzing logs would be so much easier with Cassandra.

Josiah Carlson

Jul 20, 2011, 9:18:52 PM
to redi...@googlegroups.com
On Wed, Jul 20, 2011 at 4:18 PM, emre yılmaz <ma...@emreyilmaz.me> wrote:
>
>
> 2011/7/21 Josiah Carlson <josiah....@gmail.com>
>>
>>
>> We ran into a segfaulting condition at 75, I've had friends locally
>> who have run into it at 20/second.
>
>  the numbers you mention are just too low for getting trouble with rabbitmq.

That's what we thought after manually testing at 1k/second, yet we
and others have segfaulted at those lower rates (we were running on a 32-bit
box, and apparently suffered some memory fragmentation). Maybe we were
running a buggy version of RabbitMQ, maybe we were running an improper
version of Erlang; I don't know. All I remember from a year and a few
months ago is that it broke about a week after Reddit's broke, and we
had to spend a week replacing our Celery + RabbitMQ production
infrastructure with ActiveMQ.

>> Who said anything about storing them in memory? I'm talking about
>> logging them to plain files on disk (then putting them up in S3 or
>> similar for longer term storage/analysis).
>>
> okay, there is a misunderstood. i replaced 'plain text files' with
> cassandra. this is a good use case for it, lots of writes, and few reads.
> plus, searching things, analyzing logs would be so much easier with
> cassandra.

If you are running your setup in Amazon AWS, and you are storing your
data in Cassandra, all it is doing is costing you money; it's running
a cluster of Cassandra instances whose purpose is to be available to
query logs (which is rare, by definition). It's better to log to flat
files, rotate/store them every hour/day/week in S3, then run
mapreduces across the logfiles. The storage is cheaper, the mapreduce
is cheaper, and the 2nd cheapest box in AWS can easily handle 100 gigs
of logs/day. That's just not possible with one Cassandra install at
that price. Even worse, if you decide that your X Cassandra machines
aren't enough, and want to go to 2X, your write speeds drop like a
rock every time you add a new one. Again, see the Reddit "our site
totally went down" blog post from last year:
http://blog.reddit.com/2010/05/reddits-may-2010-state-of-servers.html

Regards,
- Josiah
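A minimal sketch of the "flat files, then mapreduce" workflow, using only the Python standard library. The line format (`timestamp channel message`, reusing the `process_name:level` channel naming mentioned earlier in the thread), the hourly file names, and the per-level count are illustrative assumptions. In practice the rotated files would be uploaded to S3 and processed by a real MapReduce job rather than a local `reduce`.

```python
import tempfile
from collections import Counter
from functools import reduce
from pathlib import Path


def map_file(path: Path) -> Counter:
    """Map step: count lines per log level in one rotated log file."""
    counts = Counter()
    with path.open() as f:
        for line in f:
            parts = line.split(maxsplit=2)  # timestamp, channel, message
            if len(parts) >= 2 and ":" in parts[1]:
                counts[parts[1].rsplit(":", 1)[1]] += 1
    return counts


def reduce_counts(a: Counter, b: Counter) -> Counter:
    """Reduce step: merge partial counts from different files."""
    return a + b


with tempfile.TemporaryDirectory() as tmp:
    hour1 = Path(tmp, "app.log.2011-07-20_10")
    hour2 = Path(tmp, "app.log.2011-07-20_11")
    hour1.write_text(
        "2011-07-20T10:00:00 api:info ok\n"
        "2011-07-20T10:00:01 api:error boom\n"
    )
    hour2.write_text("2011-07-20T11:00:00 worker:info done\n")
    totals = reduce(reduce_counts, map(map_file, [hour1, hour2]), Counter())

print(dict(totals))  # prints: {'info': 2, 'error': 1}
```

Because each file is mapped independently, the map step parallelizes across however many rotated files (and machines) are available, and no index has to be maintained at write time.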

tianyuan

Jul 22, 2011, 6:32:25 AM
to redi...@googlegroups.com
Is there any documentation about PUB/SUB?

On Thursday, July 21, 2011 at 12:10:03 AM UTC+8, Josiah Carlson wrote:

Hampus Wessman

Jul 22, 2011, 6:51:50 AM
to redi...@googlegroups.com

Nick Quaranto

Jul 22, 2011, 9:58:01 AM
to redi...@googlegroups.com
I tossed together a blog post about it:


Salvatore Sanfilippo

Jul 22, 2011, 10:12:03 AM
to redi...@googlegroups.com
On Mon, Jul 18, 2011 at 8:31 AM, tianyuan <iamti...@gmail.com> wrote:
> I got about 100 web severs, and I want all of these logs get together.
> so I am thinking about using publish/subscribe in Redis.
>
> Every web server publishs to to the same channel,
> one log processer subscribe this channel, and process every log it
> received.

Hello,

It sounds like a good idea. You are not going to store the logs in
Redis, right?
Instead, you are simply using Pub/Sub as a way to collect logs centrally.

An alternative is to push into a list with LPUSH instead, and the
"processor" of stats will use BRPOP or similar to get new results. This
way you can stop the collector for some time and logs will accumulate
in Redis memory.

> When the subscriber get the log, is the log still in memory or
> completely disappeared ?

Completely disappeared. This is why you may want to use lists instead,
but it depends on your use case.

However, Pub/Sub or queues are a good way to communicate between many
instances without inventing your own networking layer.

Cheers,
Salvatore


--
Salvatore 'antirez' Sanfilippo
open source developer - VMware

http://invece.org
"We are what we repeatedly do. Excellence, therefore, is not an act,
but a habit." -- Aristotele
