What happens if the processor goes down for a second? You will lose all the data published in the meantime, because no one is subscribed to the channel. IMHO pub/sub is better when you have many readers, but not ideal when you have many writers and one reader. For that case you can simply push to a list, and it will be safer.
Slightly off topic: I am using an unrelated open source solution for log aggregation over my network, Facebook's Scribe. It collects, filters, forwards based on rules, and aggregates logs from multiple servers (tens of thousands at Facebook, ~20 in my case (www.doat.com)).
The idea is that you have a local server running on each machine, with rules on how to forward messages based on their "category" (prefix), so this is very fast. Then you have a central server that the local servers forward messages to. If you have thousands of servers, you can create a multi-tier tree of collectors, or send logs to different collectors based on their categories. https://github.com/facebook/scribe/wiki
On Mon, Jul 18, 2011 at 9:31 AM, tianyuan <iamtiany...@gmail.com> wrote:
> I have about 100 web servers, and I want to collect all of their logs in one place,
> so I am thinking about using publish/subscribe in Redis.
> Every web server publishes to the same channel, and
> one log processor subscribes to this channel and processes every log it receives.
> There are about 30 GB of logs every day.
> Is this an appropriate way?
> When the subscriber gets the log, is the log still in memory or has it completely disappeared?
> -- > You received this message because you are subscribed to the Google Groups > "Redis DB" group. > To post to this group, send email to redis-db@googlegroups.com. > To unsubscribe from this group, send email to > redis-db+unsubscribe@googlegroups.com. > For more options, visit this group at > http://groups.google.com/group/redis-db?hl=en.
a. I don't think Scribe is too heavy for your setup. It's been useful for me since I had 4 servers. It's a bit of a pain to compile, but once you've done that, setting it up is extremely straightforward. But it's up to you.
b. Use BLPOP/BRPOP in your client. That way your consumer just waits until there's something to read, or returns immediately if there already is.
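A minimal sketch of the list-based approach, assuming the redis-py client (only its `lpush` and `brpop` methods are used) and the `x_log` key from tianyuan's LPUSH example. The helper names `push_log` and `consume` and the `max_idle` parameter are made up for illustration; they are not part of Redis or redis-py.

```python
QUEUE_KEY = "x_log"  # key name from the LPUSH example in this thread


def push_log(client, line, key=QUEUE_KEY):
    """Producer: each web server pushes a log line onto the head of the list."""
    return client.lpush(key, line)


def consume(client, handle, key=QUEUE_KEY, timeout=5, max_idle=None):
    """Consumer: block on BRPOP rather than polling RPOP every second.

    brpop returns a (key, value) pair, or None once `timeout` seconds pass
    with nothing to read. `max_idle` (hypothetical knob) stops the loop after
    that many consecutive timeouts; None means run forever.
    """
    idle = 0
    while max_idle is None or idle < max_idle:
        item = client.brpop(key, timeout=timeout)
        if item is None:
            idle += 1
            continue
        idle = 0
        _key, line = item
        handle(line)


# Wiring it up, assuming redis-py is installed and a server runs on localhost:
#   import redis
#   r = redis.Redis(host="localhost", port=6379)
#   push_log(r, "loglogloglogloglog")
#   consume(r, print)
```

Pushing on the head and popping from the tail keeps the lines in FIFO order, and anything pushed while the consumer is down simply waits in the list.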
I don't have any experience with Scribe, but I agree with Dvir that you should use something that was designed with logging in mind.
We've used syslog-ng for several reasons: it raises the log message size limit well above what standard syslog allows (we sometimes log JSON blobs); it supports transport over UDP, TCP, and SSL; it is a drop-in replacement for syslog (so all of the logging tools that your platform offers will still work); and it offers filtering and redirection of different log messages (these can get a little ugly to configure, but it's not bad).
While I am generally a fan of hacking Redis to do just about anything, in the case of logging: pick one of the standard log packages (syslog-ng, flume, scribe, etc.). They work great, automatically include time stamps, origin information, etc., and won't blow up your memory if your log collection process fails to run for one reason or another.
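For illustration only, here is roughly what the application side of a syslog-based setup could look like, using the `SysLogHandler` from Python's standard `logging.handlers` module. The `build_logger` helper and the default `("localhost", 514)` address are assumptions; in practice the address would be wherever your syslog-ng (or other syslog-compatible) daemon listens.

```python
import logging
import logging.handlers


def build_logger(name, address=("localhost", 514)):
    """Return a logger that ships records to a syslog daemon over UDP.

    `address` is an assumption: point it at your local or central
    syslog-compatible collector.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(address=address)
    # Tag each message with the logger name and level; timestamps and origin
    # host are typically filled in by the syslog daemon itself.
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


# Usage:
#   log = build_logger("web01")
#   log.info("GET /index.html 200")
```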
On Mon, Jul 18, 2011 at 2:36 AM, Dvir Volk <dvir...@gmail.com> wrote:
> On Mon, Jul 18, 2011 at 12:21 PM, tianyuan <iamtiany...@gmail.com> wrote:
>> Thanks for your reply.
>> Messages will be lost if there are no subscribers, right?
>> I think Scribe is too heavy for the simple logs of my fewer than 100 servers.
>> How about
>> LPUSH x_log "loglogloglogloglog"
>> and then
>> RPOP x_log
>> every second until I get nil?
I've been doing something like this in production for a while, and it has been working really well for us. We don't publish everything to the same channel, though. Instead, our channel names look like this:
process_name:level
One thing that's nice about publish/subscribe compared to RPUSH/BLPOP for logging is that pub/sub allows multiple readers. This is interesting because you can attach different readers with different PSUBSCRIBE patterns to track down a problem and watch the data scroll by in real time.
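A rough sketch of that channel scheme, assuming the redis-py client (`publish`, plus `psubscribe` and `listen` on the object returned by `pubsub()`). The helper names `channel_for`, `publish_log`, and `tail`, and the `limit` parameter, are invented for this example.

```python
def channel_for(process_name, level):
    """Build a channel name in the process_name:level form described above."""
    return f"{process_name}:{level}"


def publish_log(client, process_name, level, message):
    """Publisher side: PUBLISH to the per-process, per-level channel."""
    return client.publish(channel_for(process_name, level), message)


def tail(pubsub, pattern, handle, limit=None):
    """Reader side: PSUBSCRIBE to a glob pattern and pass each message on.

    `limit` (hypothetical knob) stops after that many messages; None means
    keep reading for as long as the subscription lasts.
    """
    pubsub.psubscribe(pattern)
    seen = 0
    for event in pubsub.listen():
        if event["type"] != "pmessage":
            continue  # e.g. the psubscribe confirmation
        handle(event["channel"], event["data"])
        seen += 1
        if limit is not None and seen >= limit:
            break


# Wiring it up, assuming redis-py is installed and a server runs on localhost:
#   import redis
#   r = redis.Redis(decode_responses=True)
#   tail(r.pubsub(), "*:error", lambda ch, msg: print(ch, msg))
```

A second reader with a different pattern (say `web:*`) can be attached at any time without touching the publishers, though it only sees messages published after it subscribes.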
On Mon, Jul 18, 2011 at 2:31 AM, tianyuan <iamtiany...@gmail.com> wrote:
> Every web server publishes to the same channel, and one log processor subscribes to this channel and processes every log it receives.
Hmm, I like that idea for application debug logs and perhaps traceback logging/aggregation, but it seems like this could also be handled with a standard syslog handler going to a central sink, no?
I guess the cool thing is that anyone can write a simple client to get at the data they want from a Redis instance on their private network, whereas in a larger organization they may not have privileges to log in to the central log server.
On Tue, Jul 19, 2011 at 11:47 PM, tianyuan <iamtiany...@gmail.com> wrote:
> What if some messages are published but the processors are too busy to handle them?
On Wednesday, July 20, 2011 3:17:08 PM UTC+8, Josiah Carlson wrote:
> Are you referring to Redis pubsub, or are you referring to something else?
> - Josiah
I think it is better for you to use a simple FIFO queue implementation, but not with Redis, since it's an in-memory database. I would use RabbitMQ for queue management and Cassandra for storing the logs on disk. (Your workers simply take the messages from the RabbitMQ queue and send them to Cassandra.)
I would wholeheartedly recommend against using RabbitMQ, as heavy writes (sometimes as few as a few hundred/second) can cause it to segfault. [1]
Also, putting logs into Cassandra doesn't magically make them scale. Making any on-disk storage system really scale takes either 1) no indexes (plain log files) or 2) more disks (Cassandra, MongoDB, etc.). Sticking with plain log storage also makes the logs trivial to back up, analyze, import into another system if he discovers a need for them later, etc.
It is buffered on the Redis server. There were discussions about whether to disconnect a client that isn't responding fast enough, but I don't know whether that was implemented.
On Wed, Jul 20, 2011 at 12:33 AM, tianyuan <iamtiany...@gmail.com> wrote:
> Sorry, I mean pub/sub in Redis.
> I would wholeheartedly recommend against using RabbitMQ, as heavy writes (sometimes as few as a few hundred/second) can cause it to segfault. [1]
Well, our setup, with two replicated RabbitMQ instances, receives and processes 1000+ messages per second, and it works like a charm :)
> Also, putting logs into Cassandra doesn't magically make them scale. To make any on-disk storage system really scale takes either 1) no indexes (plain log files) or 2) more disks (Cassandra, MongoDB, etc). Sticking with plain logs storage also makes them trivial to backup, analyze, import into another system if he discovers a need for them later, etc.
18 GB per day is a huge amount of data to store in memory.
> Well, our setup, with two replicated RabbitMQ instances, receives and processes 1000+ messages per second, and it works like a charm :)
We ran into a segfaulting condition at 75/second, and I've had friends locally who have run into it at 20/second.
> 18 GB per day is a huge amount of data to store in memory.
Who said anything about storing them in memory? I'm talking about logging them to plain files on disk (then putting them up in S3 or similar for longer term storage/analysis).
> We ran into a segfaulting condition at 75, I've had friends locally who have run into it at 20/second.
The numbers you mention are just too low to cause trouble with RabbitMQ.
> Who said anything about storing them in memory? I'm talking about logging them to plain files on disk (then putting them up in S3 or similar for longer term storage/analysis).
Okay, there is a misunderstanding. I meant Cassandra as a replacement for the 'plain text files'. This is a good use case for it: lots of writes and few reads. Plus, searching and analyzing logs would be so much easier with Cassandra.
> The numbers you mention are just too low to cause trouble with RabbitMQ.
That's what we thought after manually testing at 1k/second, yet we and others have segfaulted at that rate (we were running on a 32-bit box, and apparently suffered some memory fragmentation). Maybe we were running a buggy version of RabbitMQ, maybe we were running an improper version of Erlang, I don't know. All I remember from a year and a few months ago is: it broke about a week after Reddit had theirs break, and we had to spend a week replacing our Celery + RabbitMQ production infrastructure with ActiveMQ.
> Okay, there is a misunderstanding. I meant Cassandra as a replacement for the 'plain text files'. This is a good use case for it: lots of writes and few reads. Plus, searching and analyzing logs would be so much easier with Cassandra.
If you are running your setup in Amazon AWS, and you are storing your data in Cassandra, all it is doing is costing you money; it's running a cluster of Cassandra instances whose purpose is to be available to query logs (which is rare, by definition). It's better to log to flat files, rotate/store them every hour/day/week in S3, then run mapreduces across the logfiles. The storage is cheaper, the mapreduce is cheaper, and the 2nd cheapest box in AWS can easily handle 100 gigs of logs/day. That's just not possible with one Cassandra install at that price. Even worse, if you decide that your X Cassandra machines aren't enough, and want to go to 2X, your write speeds drop like a rock every time you add a new one. Again, see the Reddit "our site totally went down" blog post from last year: http://blog.reddit.com/2010/05/reddits-may-2010-state-of-servers.html
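As a sketch of the "flat files, rotated on a schedule" half of that advice, the standard library's `TimedRotatingFileHandler` can do the rotation. The `build_file_logger` helper, the file naming, and the hourly default are assumptions for illustration; uploading the rotated files to S3 and the mapreduce step are deliberately left out.

```python
import logging
import logging.handlers
import os


def build_file_logger(name, log_dir, when="H"):
    """Log to a plain file under `log_dir`, rotated on a timer.

    when="H" rotates hourly; "D" or "midnight" would rotate daily. Shipping
    the rotated files to S3 (or similar) is left to a separate job.
    """
    os.makedirs(log_dir, exist_ok=True)
    path = os.path.join(log_dir, name + ".log")
    handler = logging.handlers.TimedRotatingFileHandler(path, when=when, interval=1)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger, path


# Usage:
#   log, path = build_file_logger("web01", "/var/log/myapp")
#   log.info("GET /index.html 200")
```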
On Mon, Jul 18, 2011 at 8:31 AM, tianyuan <iamtiany...@gmail.com> wrote:
> I have about 100 web servers, and I want to collect all of their logs in one place, so I am thinking about using publish/subscribe in Redis.
> Every web server publishes to the same channel, and one log processor subscribes to this channel and processes every log it receives.
Hello,
It sounds like a good idea. You are not going to store the logs in Redis, right? Instead you are simply using Pub/Sub as a way to collect logs centrally.
An alternative is to push into a list with LPUSH instead, and have the "processor" of stats use BRPOP or similar to get new results. This way you can stop the collector for some time and the logs will accumulate in Redis memory.
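To get a feel for how much would accumulate, here is a back-of-the-envelope calculation using the roughly 30 GB/day figure from the original question. It assumes a uniform write rate and ignores Redis' own per-item overhead, so it is only a rough estimate; `backlog_bytes` is a made-up helper name.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def backlog_bytes(daily_bytes, outage_seconds):
    """Raw log bytes that pile up in the list while the consumer is stopped.

    Assumes a uniform write rate over the day and ignores Redis' own
    per-item overhead.
    """
    return daily_bytes * outage_seconds / SECONDS_PER_DAY


daily = 30 * 10**9  # the roughly 30 GB/day quoted in the original question
print(backlog_bytes(daily, 60 * 60) / 10**9)  # GB accumulated in one hour: 1.25
```

So an hour of collector downtime means on the order of 1.25 GB held in Redis memory, and a full day means the whole 30 GB.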
> When the subscriber gets the log, is the log still in memory or has it completely disappeared?
Completely disappeared. This is why you may want to use lists instead. But it depends on your use case.
However Pub/Sub or queues are a good way to communicate between many instances without inventing your own networking layer.
Cheers, Salvatore
-- Salvatore 'antirez' Sanfilippo open source developer - VMware
http://invece.org "We are what we repeatedly do. Excellence, therefore, is not an act, but a habit." -- Aristotele