Should I use publish/subscribe?
From: tianyuan <iamtiany...@gmail.com>
Date: Sun, 17 Jul 2011 23:31:13 -0700 (PDT)
Local: Mon, Jul 18 2011 2:31 am
Subject: Should I use publish/subscribe?
I have about 100 web servers, and I want to collect all of their logs in one place,
so I am thinking about using publish/subscribe in Redis.

Every web server publishes to the same channel, and
one log processor subscribes to this channel and processes every log entry it
receives.

There are about 30 GB of logs every day.

Is this an appropriate approach?

When the subscriber gets a log entry, is it still in memory, or has it
completely disappeared?


 
From: Dvir Volk <dvir...@gmail.com>
Date: Mon, 18 Jul 2011 11:37:40 +0300
Local: Mon, Jul 18 2011 4:37 am
Subject: Re: Should I use publish/subscribe?

What will happen if the processor goes down for a second? You will lose
everything published in the meantime, because no one is subscribed to the channel.
IMHO pub/sub is a better fit when you have many readers, and not ideal when
you have many writers and one reader.
You can simply push to a list instead, and it will be safer.
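
To make the difference concrete, here is a minimal sketch of the two delivery semantics. It is a toy in-memory model, not Redis itself: `ToyBroker` and its method names are hypothetical, chosen only to mirror PUBLISH, LPUSH and RPOP.

```python
from collections import deque


class ToyBroker:
    """Hypothetical in-memory model of two delivery semantics (not real Redis)."""

    def __init__(self):
        self.subscribers = {}  # channel -> list of callbacks
        self.lists = {}        # key -> deque of queued messages

    def subscribe(self, channel, callback):
        self.subscribers.setdefault(channel, []).append(callback)

    def unsubscribe(self, channel, callback):
        self.subscribers[channel].remove(callback)

    def publish(self, channel, message):
        # Fire-and-forget: only clients subscribed right now get the message.
        receivers = self.subscribers.get(channel, [])
        for callback in receivers:
            callback(message)
        return len(receivers)  # number of clients that received it

    def lpush(self, key, message):
        # Queue semantics: the message waits until a consumer pops it.
        self.lists.setdefault(key, deque()).appendleft(message)

    def rpop(self, key):
        queue = self.lists.get(key)
        return queue.pop() if queue else None


broker = ToyBroker()
received = []
handler = received.append

broker.subscribe("logs", handler)
broker.publish("logs", "line 1")             # processor is up: delivered
broker.unsubscribe("logs", handler)          # processor goes down
dropped = broker.publish("logs", "line 2")   # nobody is listening: lost
broker.lpush("logs_list", "line 2")          # same outage, list variant
broker.subscribe("logs", handler)            # processor comes back
queued = broker.rpop("logs_list")            # the queued line is still there

print(received, dropped, queued)
```

In this model the published "line 2" never reaches the processor, while the pushed copy is still waiting when it returns.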

Slightly off topic: for log aggregation across my network I use an unrelated
open source tool, Facebook's Scribe.
It collects, filters, forwards (based on rules), and aggregates logs from
multiple servers (tens of thousands at Facebook, ~20 in my case at
www.doat.com).

The idea is that a local server runs on each machine, with rules on
how to forward messages based on their "category" (prefix), so logging is very
fast.
The local servers forward messages to a central server. If
you have thousands of servers, you can build a multi-tier tree of
collectors, or send logs to different collectors based on their categories.
https://github.com/facebook/scribe/wiki


 
From: tianyuan <iamtiany...@gmail.com>
Date: Mon, 18 Jul 2011 02:21:58 -0700 (PDT)
Local: Mon, Jul 18 2011 5:21 am
Subject: Re: Should I use publish/subscribe?

Thanks for your reply.

Messages will be lost if there are no subscribers, right?

I think Scribe is too heavy for simple logs from fewer than 100 servers.

How about
LPUSH x_log "loglogloglogloglog"

and then
RPOP x_log
every second until I get nil?
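
A rough sketch of that polling loop, for illustration only. `FakeListClient` is a hypothetical in-memory stand-in so the example runs without a server; the assumption is that a real client object with `lpush` and `rpop` methods (redis-py's `redis.Redis` has both) could be passed in its place, in which case values come back as bytes unless decoding is enabled.

```python
import time
from collections import deque


class FakeListClient:
    """Hypothetical in-memory stand-in for a Redis client (lists only)."""

    def __init__(self):
        self._lists = {}

    def lpush(self, key, *values):
        queue = self._lists.setdefault(key, deque())
        for value in values:
            queue.appendleft(value)   # LPUSH adds at the head
        return len(queue)

    def rpop(self, key):
        queue = self._lists.get(key)
        return queue.pop() if queue else None   # RPOP takes from the tail


def drain(client, key):
    """RPOP until the list is empty (nil / None) and return what was read."""
    lines = []
    while True:
        line = client.rpop(key)
        if line is None:
            return lines
        lines.append(line)


def poll(client, key, handle, rounds, interval=1.0):
    """Every `interval` seconds, drain the list and pass each line to `handle`."""
    for _ in range(rounds):
        for line in drain(client, key):
            handle(line)
        time.sleep(interval)


client = FakeListClient()
client.lpush("x_log", "GET /index 200")
client.lpush("x_log", "GET /missing 404")

seen = []
poll(client, "x_log", seen.append, rounds=1, interval=0)
print(seen)
```

Because LPUSH adds at the head and RPOP removes from the tail, the lines come out in the order they were logged.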


 
From: Dvir Volk <dvir...@gmail.com>
Date: Mon, 18 Jul 2011 12:36:01 +0300
Local: Mon, Jul 18 2011 5:36 am
Subject: Re: Should I use publish/subscribe?

a. I don't think Scribe is too heavy for your setup. It's been useful for me
since I had 4 servers. It's a bit of a pain to compile, but once you've done
that, setting it up is extremely straightforward. But it's up to you.

b. Use BLPOP/BRPOP in your client. That way your consumer blocks until
there's something to read, and returns immediately if an item is already waiting.
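
A minimal sketch of a blocking consumer, under the same caveat as any toy: `FakeBlockingClient` is a hypothetical in-memory stand-in whose `brpop` is modelled on redis-py's (it returns a `(key, value)` pair, or `None` on timeout, and `timeout=0` means wait forever). The `idle_timeout` parameter is an assumption added so the demo terminates.

```python
import threading
from collections import deque


class FakeBlockingClient:
    """Hypothetical in-memory stand-in with a blocking pop, for illustration."""

    def __init__(self):
        self._lists = {}
        self._cond = threading.Condition()

    def lpush(self, key, *values):
        with self._cond:
            queue = self._lists.setdefault(key, deque())
            for value in values:
                queue.appendleft(value)
            self._cond.notify_all()   # wake any consumer blocked in brpop
            return len(queue)

    def brpop(self, key, timeout=0):
        with self._cond:
            ready = self._cond.wait_for(
                lambda: bool(self._lists.get(key)),
                timeout=None if timeout == 0 else timeout,
            )
            if not ready:
                return None           # timed out with nothing to read
            return key, self._lists[key].pop()


def consume(client, key, handle, idle_timeout=1):
    """Handle entries as they arrive; stop after `idle_timeout` idle seconds."""
    while True:
        item = client.brpop(key, timeout=idle_timeout)
        if item is None:
            return
        _key, line = item
        handle(line)


client = FakeBlockingClient()
seen = []
worker = threading.Thread(target=consume, args=(client, "x_log", seen.append))
worker.start()                 # the consumer now waits without polling
client.lpush("x_log", "first")
client.lpush("x_log", "second")
worker.join()
print(seen)
```

The consumer spends no cycles polling: it sleeps inside `brpop` and wakes as soon as a producer pushes.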


 
From: Josiah Carlson <josiah.carl...@gmail.com>
Date: Mon, 18 Jul 2011 09:04:02 -0700
Local: Mon, Jul 18 2011 12:04 pm
Subject: Re: Should I use publish/subscribe?
I don't have any experience with Scribe, but I agree with Dvir that
you should use something that was designed with logging in mind.

We've used syslog-ng because it raises the log message size limit well
beyond what standard syslog allows (we sometimes log JSON
blobs); it allows transport over UDP, TCP, and SSL; it is a
drop-in replacement for syslog (so all of the logging tools that your
platform offers will still work); and it offers filtering and redirection
of different log messages (these can get a little ugly to configure,
but it's not bad).

While I am generally a fan of hacking Redis to do just about anything,
in the case of logging: pick one of the standard log packages
(syslog-ng, flume, scribe, etc.). They work great, automatically
include time stamps, origin information, etc., and won't blow up your
memory if your log collection process fails to run for one reason or
another.
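
To put a rough number on the memory concern, here is a back-of-the-envelope calculation using the 30 GB/day figure from the original question. It assumes decimal gigabytes and a uniform log rate, and it ignores any per-entry overhead in Redis, so treat the result as a lower bound.

```python
BYTES_PER_GB = 10**9               # assumption: decimal gigabytes
SECONDS_PER_DAY = 24 * 60 * 60

daily_bytes = 30 * BYTES_PER_GB    # "about 30G bytes log everyday"
bytes_per_second = daily_bytes / SECONDS_PER_DAY


def backlog_bytes(outage_seconds):
    """Bytes that pile up in a list while the collector is not running."""
    return bytes_per_second * outage_seconds


rate_kb = bytes_per_second / 1000
one_hour_gb = backlog_bytes(3600) / BYTES_PER_GB
print(f"average rate: about {rate_kb:.0f} KB/s")
print(f"backlog after a one-hour outage: about {one_hour_gb:.2f} GB")
```

At that rate a list-based queue grows by roughly 1.25 GB for every hour the collector stays down.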

Regards,
 - Josiah


 
From: Dvir Volk <dvir...@gmail.com>
Date: Mon, 18 Jul 2011 19:54:41 +0300
Local: Mon, Jul 18 2011 12:54 pm
Subject: Re: Should I use publish/subscribe?

Agreed,
btw, we are using rsyslog as a system log aggregator, and scribe for our
application logs that are analyzed and monitored much differently.

On Mon, Jul 18, 2011 at 7:04 PM, Josiah Carlson <josiah.carl...@gmail.com>wrote:


 
From: tianyuan <iamtiany...@gmail.com>
Date: Mon, 18 Jul 2011 18:20:47 -0700 (PDT)
Local: Mon, Jul 18 2011 9:20 pm
Subject: Re: Should I use publish/subscribe?

OK, I will take a look at Scribe. It seems to be more reliable than hacking
Redis.


 
From: Matt Ranney <m...@ranney.com>
Date: Tue, 19 Jul 2011 19:07:57 -0400
Local: Tues, Jul 19 2011 7:07 pm
Subject: Re: Should I use publish/subscribe?

I've been doing something like this in production for a while.  It has been
working really well for us.  We don't publish everything to the same channel
though, instead our channel names look like this:

process_name:level

One thing that's nice about doing publish/subscribe compared to rpush/blpop
for logging is that pub/sub allows multiple readers.  This is interesting
because you can attach different readers with different psubscribe patterns
to track down a problem and see the data scroll by in realtime.
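
A small sketch of how pattern subscriptions would select channels named `process_name:level`. The process names and levels below are made up, and Python's `fnmatchcase` is used only as an approximation of Redis glob-style patterns (it agrees on `*` and `?`; other syntax may differ).

```python
from fnmatch import fnmatchcase


def matching_channels(pattern, channels):
    """Return the channels that a glob-style pattern would select."""
    return [name for name in channels if fnmatchcase(name, pattern)]


# Hypothetical channel names following the process_name:level convention.
channels = ["web:error", "web:debug", "worker:error", "mailer:info"]

all_errors = matching_channels("*:error", channels)   # every process, errors only
web_only = matching_channels("web:*", channels)       # one process, every level

print(all_errors)
print(web_only)
```

One reader can follow errors across every process while another watches a single process at all levels, without the publishers changing anything.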


 
From: Matt Billenstein <mbille...@gmail.com>
Date: Tue, 19 Jul 2011 17:30:53 -0700 (PDT)
Local: Tues, Jul 19 2011 8:30 pm
Subject: Re: Should I use publish/subscribe?

Hmm, I like that idea for application debug logs and perhaps traceback
logging/aggregation, but it seems like this can also be handled with a
standard syslog handler going to a central sink, no?
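
For comparison, a minimal sketch of the syslog route using Python's standard `logging.handlers.SysLogHandler`, which sends over UDP by default when given a `(host, port)` address. The logger name and message are made up, and the "central sink" here is just a local UDP socket so the example runs on its own; a real setup would point at the syslog server instead.

```python
import logging
import logging.handlers
import socket


def make_remote_logger(name, host, port):
    """Return a logger that ships records to a syslog sink over UDP."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(
        logging.Formatter("%(name)s: %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


# Stand-in for the central sink: a UDP socket bound to a free local port.
sink = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sink.bind(("127.0.0.1", 0))
sink.settimeout(2)

logger = make_remote_logger("webapp", "127.0.0.1", sink.getsockname()[1])
logger.error("upstream timed out")

datagram = sink.recv(4096)   # what the central server would receive
sink.close()
print(datagram)
```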

I guess the cool thing is that anyone can write a simple client to get at
the data they want from a Redis instance on their private network, whereas
they may not have privileges to log in to the central log server in a larger
organization.

m


 
From: tianyuan <iamtiany...@gmail.com>
Date: Tue, 19 Jul 2011 23:47:59 -0700 (PDT)
Local: Wed, Jul 20 2011 2:47 am
Subject: Re: Should I use publish/subscribe?

What if some messages are published but the processors are too busy to
handle them?


 
From: Josiah Carlson <josiah.carl...@gmail.com>
Date: Wed, 20 Jul 2011 00:17:08 -0700
Local: Wed, Jul 20 2011 3:17 am
Subject: Re: Should I use publish/subscribe?
Are you referring to Redis pubsub, or are you referring to something else?

 - Josiah


 
From: tianyuan <iamtiany...@gmail.com>
Date: Wed, 20 Jul 2011 00:33:41 -0700 (PDT)
Local: Wed, Jul 20 2011 3:33 am
Subject: Re: Should I use publish/subscribe?

Sorry, I mean pub/sub in Redis.


 
From: emre yılmaz <m...@emreyilmaz.me>
Date: Wed, 20 Jul 2011 15:38:47 +0300
Local: Wed, Jul 20 2011 8:38 am
Subject: Re: Should I use publish/subscribe?

2011/7/20 tianyuan <iamtiany...@gmail.com>

I think it is better for you to use a simple FIFO queue implementation, but
not with Redis, since it's an in-memory database. I would use RabbitMQ for
queue management and Cassandra for storing logs on disk. (Your workers
simply get the messages from the RabbitMQ queue and send them to
Cassandra.)

--
web developer
http://www.emreyilmaz.me


 
From: Josiah Carlson <josiah.carl...@gmail.com>
Date: Wed, 20 Jul 2011 09:09:10 -0700
Local: Wed, Jul 20 2011 12:09 pm
Subject: Re: Should I use publish/subscribe?
I would wholeheartedly recommend against using RabbitMQ, as heavy
writes (sometimes as few as a few hundred/second) can cause it to
segfault. [1]

Also, putting logs into Cassandra doesn't magically make them scale.
To make any on-disk storage system really scale takes either 1) no
indexes (plain log files) or 2) more disks (Cassandra, MongoDB, etc).
Sticking with plain logs storage also makes them trivial to backup,
analyze, import into another system if he discovers a need for them
later, etc.

Regards,
 - Josiah

[1] http://blog.reddit.com/2010/05/reddits-may-2010-state-of-servers.html


 
From: Josiah Carlson <josiah.carl...@gmail.com>
Date: Wed, 20 Jul 2011 09:10:03 -0700
Local: Wed, Jul 20 2011 12:10 pm
Subject: Re: Should I use publish/subscribe?
Messages for a slow subscriber are buffered on the Redis server. There were
discussions about whether to disconnect a client that isn't responding fast
enough, but I don't know whether that was implemented.

Regards,
 - Josiah


 
From: emre yılmaz <m...@emreyilmaz.me>
Date: Wed, 20 Jul 2011 23:50:32 +0300
Local: Wed, Jul 20 2011 4:50 pm
Subject: Re: Should I use publish/subscribe?

2011/7/20 Josiah Carlson <josiah.carl...@gmail.com>

> I would wholeheartedly recommend against using RabbitMQ, as heavy
> writes (sometimes as few as a few hundred/second) can cause it to
> segfault. [1]

Well, our setup, with two replicated RabbitMQ instances, receives and
processes 1000+ messages per second, and it works like a charm
:)

> Also, putting logs into Cassandra doesn't magically make them scale.
> To make any on-disk storage system really scale takes either 1) no
> indexes (plain log files) or 2) more disks (Cassandra, MongoDB, etc).
> Sticking with plain logs storage also makes them trivial to backup,
> analyze, import into another system if he discovers a need for them
> later, etc.

18 GB per day is a huge amount of data to store in memory.

--
web developer
http://www.emreyilmaz.me


 
From: Josiah Carlson <josiah.carl...@gmail.com>
Date: Wed, 20 Jul 2011 15:37:33 -0700
Local: Wed, Jul 20 2011 6:37 pm
Subject: Re: Should I use publish/subscribe?

On Wed, Jul 20, 2011 at 1:50 PM, emre yılmaz <m...@emreyilmaz.me> wrote:

> 2011/7/20 Josiah Carlson <josiah.carl...@gmail.com>

>> I would wholeheartedly recommend against using RabbitMQ, as heavy
>> writes (sometimes as few as a few hundred/second) can cause it to
>> segfault. [1]

> Well, our setup, with two replicated RabbitMQ instances, receives and
> processes 1000+ messages per second, and it works like a charm
> :)

We ran into a segfaulting condition at 75/second, and I've had friends locally
who have run into it at 20/second.

>> Also, putting logs into Cassandra doesn't magically make them scale.
>> To make any on-disk storage system really scale takes either 1) no
>> indexes (plain log files) or 2) more disks (Cassandra, MongoDB, etc).
>> Sticking with plain logs storage also makes them trivial to backup,
>> analyze, import into another system if he discovers a need for them
>> later, etc.

> 18 GB per day is a huge amount of data to store in memory.

Who said anything about storing them in memory? I'm talking about
logging them to plain files on disk (then putting them up in S3 or
similar for longer term storage/analysis).

 - Josiah


 
From: emre yılmaz <m...@emreyilmaz.me>
Date: Thu, 21 Jul 2011 02:18:59 +0300
Local: Wed, Jul 20 2011 7:18 pm
Subject: Re: Should I use publish/subscribe?

2011/7/21 Josiah Carlson <josiah.carl...@gmail.com>

> We ran into a segfaulting condition at 75/second, and I've had friends locally
> who have run into it at 20/second.

The numbers you mention are just too low to cause trouble with RabbitMQ.

> Who said anything about storing them in memory? I'm talking about
> logging them to plain files on disk (then putting them up in S3 or
> similar for longer term storage/analysis).

Okay, there is a misunderstanding. I am suggesting Cassandra in place of
'plain text files'. This is a good use case for it: lots of writes and few reads.
Plus, searching and analyzing logs would be much easier with
Cassandra.

--
web developer
http://www.emreyilmaz.me


 
From: Josiah Carlson <josiah.carl...@gmail.com>
Date: Wed, 20 Jul 2011 18:18:52 -0700
Local: Wed, Jul 20 2011 9:18 pm
Subject: Re: Should I use publish/subscribe?

On Wed, Jul 20, 2011 at 4:18 PM, emre yılmaz <m...@emreyilmaz.me> wrote:

> 2011/7/21 Josiah Carlson <josiah.carl...@gmail.com>

>> We ran into a segfaulting condition at 75/second, and I've had friends locally
>> who have run into it at 20/second.

> The numbers you mention are just too low to cause trouble with RabbitMQ.

That's what we thought after manually testing at 1k/second, yet we
and others have segfaulted at those low rates (we were running on a 32-bit
box, and apparently suffered some memory fragmentation). Maybe we were
running a buggy version of RabbitMQ, maybe we were running an improper
version of Erlang, I don't know. All I remember from a year and a few
months ago is that it broke about a week after Reddit's did, and we
had to spend a week replacing our Celery + RabbitMQ production
infrastructure with ActiveMQ.

>> Who said anything about storing them in memory? I'm talking about
>> logging them to plain files on disk (then putting them up in S3 or
>> similar for longer term storage/analysis).

> Okay, there is a misunderstanding. I am suggesting Cassandra in place of
> 'plain text files'. This is a good use case for it: lots of writes and few reads.
> Plus, searching and analyzing logs would be much easier with
> Cassandra.

If you are running your setup in Amazon AWS, and you are storing your
data in Cassandra, all it is doing is costing you money; it's running
a cluster of Cassandra instances whose purpose is to be available to
query logs (which is rare, by definition). It's better to log to flat
files, rotate/store them every hour/day/week in S3, then run
mapreduces across the logfiles. The storage is cheaper, the mapreduce
is cheaper, and the 2nd cheapest box in AWS can easily handle 100 gigs
of logs/day. That's just not possible with one Cassandra install at
that price. Even worse, if you decide that your X Cassandra machines
aren't enough, and want to go to 2X, your write speeds drop like a
rock every time you add a new one. Again, see the Reddit "our site
totally went down" blog post from last year:
http://blog.reddit.com/2010/05/reddits-may-2010-state-of-servers.html
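
As an illustration of the flat-file approach, here is a minimal sketch using Python's standard `logging.handlers.TimedRotatingFileHandler` to rotate a log file every hour. The file name and log line are made up, a temporary directory stands in for the real log directory, and shipping the rotated files to S3 would be a separate step that is not shown.

```python
import logging
import logging.handlers
import os
import tempfile


def make_rotating_logger(name, path):
    """Return a logger that appends to `path` and rotates it hourly."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.TimedRotatingFileHandler(
        path,
        when="H",        # rotate on an hourly schedule
        interval=1,
        backupCount=0,   # 0 keeps every rotated file (archive them elsewhere)
        utc=True,
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger, handler


log_dir = tempfile.mkdtemp()   # stand-in for a real log directory
log_path = os.path.join(log_dir, "access.log")

logger, handler = make_rotating_logger("access", log_path)
logger.info("GET /index 200")
handler.close()

with open(log_path) as log_file:
    contents = log_file.read()
print(contents)
```

Rotated files get a timestamp suffix appended to the base name, which makes them easy to pick up and archive in batches.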

Regards,
 - Josiah


 
Discussion subject changed to "Reply: Re: Should I use publish/subscribe?" by tianyuan
From: tianyuan <iamtiany...@gmail.com>
Date: Fri, 22 Jul 2011 03:32:25 -0700 (PDT)
Local: Fri, Jul 22 2011 6:32 am
Subject: Reply: Re: Should I use publish/subscribe?

Is there any documentation about PUB/SUB?

On Thursday, July 21, 2011 at 12:10:03 AM UTC+8, Josiah Carlson wrote:


 
From: Hampus Wessman <hampus.wess...@gmail.com>
Date: Fri, 22 Jul 2011 12:51:50 +0200
Local: Fri, Jul 22 2011 6:51 am
Subject: Re: Reply: Re: Should I use publish/subscribe?

Have a look at http://redis.io/topics/pubsub.

On 2011-07-22 12:32, tianyuan wrote:


 
From: Nick Quaranto <n...@quaran.to>
Date: Fri, 22 Jul 2011 09:58:01 -0400
Local: Fri, Jul 22 2011 9:58 am
Subject: Re: Reply: Re: Should I use publish/subscribe?

I tossed together a blog post about it:

http://robots.thoughtbot.com/post/6325247416/redis-pub-sub-how-does-i...


 
Discussion subject changed to "Should I use publish/subscribe?" by Salvatore Sanfilippo
From: Salvatore Sanfilippo <anti...@gmail.com>
Date: Fri, 22 Jul 2011 16:12:03 +0200
Local: Fri, Jul 22 2011 10:12 am
Subject: Re: Should I use publish/subscribe?

On Mon, Jul 18, 2011 at 8:31 AM, tianyuan <iamtiany...@gmail.com> wrote:
> I have about 100 web servers, and I want to collect all of their logs in one place,
> so I am thinking about using publish/subscribe in Redis.

> Every web server publishes to the same channel, and
> one log processor subscribes to this channel and processes every log entry it
> receives.

Hello,

It sounds like a good idea. You are not going to store the logs in
Redis, right?
Instead you are simply using Pub/Sub as a way to collect logs centrally.

An alternative is to push into a list with LPUSH instead, and have the
"processor" of stats use BRPOP or similar to get new entries. This
way you can stop the collector for some time and logs will accumulate
in Redis memory.

> When the subscriber gets a log entry, is it still in memory, or has it
> completely disappeared?

Completely disappeared. This is why you may want to use lists instead.
But it depends on your use case.

However Pub/Sub or queues are a good way to communicate between many
instances without inventing your own networking layer.

Cheers,
Salvatore


--
Salvatore 'antirez' Sanfilippo
open source developer - VMware

http://invece.org
"We are what we repeatedly do. Excellence, therefore, is not an act,
but a habit." -- Aristotele


 