redis in body_filter section


Alexey Nalbat

Oct 30, 2012, 3:07:35 AM
to openre...@googlegroups.com
Hello.

Our task is to collect real-time statistics of web requests. But we
also serve large video files, so I think we should log long-running
requests repeatedly during their execution, for example every second,
and frequently upload the logs to the statistics system. It is not
possible to log a request multiple times during its execution with
nginx itself, but I've found a solution with the lua module:

body_filter_by_lua '
    ngx.ctx.len = ngx.ctx.len + string.len(ngx.arg[1])
    ngx.update_time()
    ngx.ctx.now = ngx.now()
    if ngx.ctx.now > ngx.ctx.now_prev + 1 -- seconds
    then
        ngx.ctx.realtime_log:write(
            ...
        )
        ngx.ctx.realtime_log:flush()
        ngx.ctx.now_prev = ngx.ctx.now  -- presumably reset here so we log at most once per second
    end
';

Earlier, in the rewrite_by_lua section, I open ngx.ctx.realtime_log
for appending and initialize ngx.ctx.len and the other counters.
Later, in log_by_lua, we log the end of the request and close the log
file. I've attached the nginx config file; it works ok.
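
A minimal sketch of the surrounding rewrite_by_lua and log_by_lua
sections (the path and the exact log format here are only
placeholders; the real ones are in the attached config):

rewrite_by_lua '
    -- open the realtime log for appending and initialize the counters
    ngx.ctx.realtime_log = io.open("/var/log/nginx/realtime.log", "a")
    ngx.ctx.len = 0
    ngx.update_time()
    ngx.ctx.now_prev = ngx.now()
';

log_by_lua '
    -- log the end of the request and close the log file
    -- (\\n because backslashes are unescaped once by the nginx config parser)
    if ngx.ctx.realtime_log then
        ngx.ctx.realtime_log:write(ngx.var.uri, " ", ngx.ctx.len, " done\\n")
        ngx.ctx.realtime_log:close()
    end
';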

But I'd prefer not to write a log file, and instead send the logs
directly into redis. Unfortunately, redis operations cannot be used
within the body_filter_by_lua section. As I understand it, this
applies to all cosocket interfaces. Is this a fundamental restriction?
Are you going to implement it in the future? Or is there some
workaround?

Thanks in advance. And sorry for my bad English.

Alexey Nalbat.

Attachment: nginx.conf

agentzh

Oct 30, 2012, 2:02:29 PM
to openre...@googlegroups.com
Hello!

On Tue, Oct 30, 2012 at 12:07 AM, Alexey Nalbat wrote:
> Hello.
>
> Our task is to collect real-time statistics of web requests. But we
> also serve large video files, so I think we should log long-running
> requests repeatedly during their execution, for example every second,
> and frequently upload the logs to the statistics system. It is not
> possible to log a request multiple times during its execution with
> nginx itself, but I've found a solution with the lua module:
>

Well, you can write to the shared memory dictionaries in either
body_filter_by_lua or log_by_lua (the latter is preferred for optimal
performance):

http://wiki.nginx.org/HttpLuaModule#ngx.shared.DICT

And then you can push the data into redis from access_by_lua every
second, for example, or just pull it with an external cronjob script.
The latter approach is recommended: you just need to expose a web
service for the outside to pull the data from the shm store; that way,
you don't have to check whether you need to push the data on *every*
nginx request. We (CloudFlare) use the latter approach in production
to gather (almost) realtime request statistics.
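
A minimal sketch of this pattern (the "stats" dict name, the key names,
and the /stats location are assumptions for illustration only):

http {
    # shared memory zone for the counters
    lua_shared_dict stats 10m;

    server {
        listen 8080;

        location / {
            # serve the site as usual here

            log_by_lua '
                local dict = ngx.shared.stats
                local bytes = tonumber(ngx.var.bytes_sent) or 0
                -- incr() fails if the key does not exist yet, so fall back to set()
                if not dict:incr("requests", 1) then dict:set("requests", 1) end
                if not dict:incr("bytes", bytes) then dict:set("bytes", bytes) end
            ';
        }

        # endpoint for an external cronjob (or redis pusher) to pull the counters
        location = /stats {
            allow 127.0.0.1;
            deny all;
            content_by_lua '
                local dict = ngx.shared.stats
                ngx.say("requests ", dict:get("requests") or 0,
                        " bytes ", dict:get("bytes") or 0)
            ';
        }
    }
}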

Best regards,
-agentzh

agentzh

Oct 30, 2012, 2:05:58 PM
to openre...@googlegroups.com
Hello!

On Tue, Oct 30, 2012 at 12:07 AM, Alexey Nalbat wrote:
>
> But I'd prefer not to write a log file, and instead send the logs
> directly into redis. Unfortunately, redis operations cannot be used
> within the body_filter_by_lua section. As I understand it, this
> applies to all cosocket interfaces. Is this a fundamental restriction?
> Are you going to implement it in the future? Or is there some
> workaround?
>

No, cosockets and any other nonblocking I/O APIs provided by ngx_lua
cannot be used in the contexts of Nginx output filters or log-phase
handlers, due to limitations in how these things are implemented in
the Nginx core. No workaround exists. Just use the shared memory store
to gather the statistical data, as mentioned in my last mail :)
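
For the original per-second progress logging, a minimal sketch of doing
that with a shared dict instead of a file or redis (this assumes a
"lua_shared_dict stats 10m;" zone in the http block; the key naming is
invented):

body_filter_by_lua '
    local ctx = ngx.ctx
    ctx.len = (ctx.len or 0) + string.len(ngx.arg[1])
    ngx.update_time()
    local now = ngx.now()
    if now > (ctx.now_prev or 0) + 1 then  -- at most once per second
        ctx.now_prev = now
        -- record in-flight progress under a crude per-client+URI key;
        -- access_by_lua or an external puller can then ship it into redis
        local key = "inflight:" .. ngx.var.remote_addr .. ":"
                    .. ngx.var.remote_port .. " " .. ngx.var.uri
        ngx.shared.stats:set(key, ctx.len, 60)  -- expire after 60 seconds
    end
';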

Best regards,
-agentzh

Brian Akins

Oct 31, 2012, 7:41:13 AM
to openre...@googlegroups.com
On Tue, Oct 30, 2012 at 2:02 PM, agentzh <age...@gmail.com> wrote:
> latter approach is recommended: you just need to expose a web service
> for the outside to pull the data from the shm store; that way, you
> don't have to check whether you need to push the data on *every* nginx
> request. We (CloudFlare) use the latter approach in production to
> gather (almost) realtime request statistics.

We do the same. We expose them via JSON.

Of course, now this has me thinking that there may be a use case for
fetching all the keys and data from a shared dict all at once. Right
now we just loop over them, and I wonder if it would be more efficient
to just grab them all at once. Or, if we had the hooks, we could
update the data in place - that's how my old stats handler worked:
atomic increments in shared memory - I wonder if that would be a
useful Lua add-on...
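
The key loop amounts to something like this (a sketch only; "stats" is
a made-up dict name, and cjson ships with OpenResty):

location = /stats.json {
    content_by_lua '
        local cjson = require "cjson"
        local dict = ngx.shared.stats
        local out = {}
        -- get_keys(0) returns all keys; each get() then takes the shm lock again
        for _, key in ipairs(dict:get_keys(0)) do
            out[key] = dict:get(key)
        end
        ngx.header.content_type = "application/json"
        ngx.say(cjson.encode(out))
    ';
}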

agentzh

Oct 31, 2012, 1:37:35 PM
to openre...@googlegroups.com
Hello!

On Wed, Oct 31, 2012 at 4:41 AM, Brian Akins wrote:
> Of course, now this has me thinking that there may be a use case for
> fetching all the keys and data from a shared dict all at once. Right
> now we just loop over them, and I wonder if it would be more efficient
> to just grab them all at once. Or, if we had the hooks, we could
> update the data in place - that's how my old stats handler worked:
> atomic increments in shared memory - I wonder if that would be a
> useful Lua add-on...
>

The get_keys method for shdict is mostly a hack for making hard things
possible. In fact, we should not have done it in the first place, due
to the current locking mechanism ;)

The shdict thing was originally implemented as just a simple LRU cache
store, so I don't feel like adding more really expensive functionality
that does not fit its basic design :)

I'm thinking about implementing another lock-free shared memory store
as an Nginx C module that extends ngx_lua via ngx_lua's public C API :)

Best regards,
-agentzh

Brian Akins

Nov 2, 2012, 3:21:57 PM
to openre...@googlegroups.com
Is it possible to avoid using spin locks and instead use "callback
locks"? I.e., pass a function to be called when the lock is obtained?
Spin locks in async code don't make much sense to me.

Depending on the situation, I use redis for my shared cache. I still
use the shared dicts a good bit as well.

--bakins

agentzh

Nov 2, 2012, 5:43:21 PM
to openre...@googlegroups.com
Hello!

On Fri, Nov 2, 2012 at 12:21 PM, Brian Akins wrote:
> Is it possible to avoid using spin locks and instead use "callback
> locks"? IE, pass a function to be called when the lock is obtained?
> spin locks in async code doesn't make much sense to me.
>

I don't know if this can be implemented for shared memory zones with a
good latency guarantee.

Because the lock holder can be in another OS process, we would have to
introduce some kind of inter-process communication here to actively
notify the other processes waiting on the lock, which is potentially
quite expensive.

Best regards,
-agentzh

Brian Akins

Nov 3, 2012, 9:15:08 AM
to openre...@googlegroups.com

On Nov 2, 2012, at 5:43 PM, agentzh <age...@gmail.com> wrote:

> Because the lock holder can be in another OS process, we'll have to
> introduce some kind of Inter-Process Communications here to actively
> notify other processes waiting on the lock, which is potentially quite
> expensive.

I don't know how the internal nginx event loop works, so this may not work. If the lock is just an atomic integer in shared memory, could the event loop in the locking process just do a compare-and-swap on the lock and run the callback when it succeeds? In a different project, we did this using an event loop's "idle" handler (it was libev) and/or a timer.

So, on a lock call, the locking process would:
- do a compare_and_swap on the integer in shared memory
- if this succeeds, just run the callback (or schedule it to run immediately)
- if not, schedule a "lock checker" to be run either during the loop's "idle" handler, on a timer, or maybe even on every loop iteration
- the lock checker simply does a compare_and_swap and calls the callback if it succeeds

This works even if the same process is trying to lock multiple times (i.e., two or more different requests).

This approach uses a little more CPU depending on lock contention, but it is better than the current implementation, IMO. The current implementation "pauses" the locking process until it can get the lock.

Like I said, I don't know the internals of the nginx event loop, so this may not be feasible in nginx.

--Brian

agentzh

Nov 5, 2012, 2:43:41 PM
to openre...@googlegroups.com
Hello!

On Sat, Nov 3, 2012 at 6:15 AM, Brian Akins wrote:
>
>
> I don't know how the internal nginx event loop works, so this may not work. If the lock is just an atomic integer in shared memory, could the event loop in the locking process just do a compare-and-swap on the lock and run the callback when it succeeds? In a different project, we did this using an event loop's "idle" handler (it was libev) and/or a timer.
>
> So, on a lock call, the locking process would:
> - do a compare_and_swap on the integer in shared memory
> - if this succeeds, just run the callback (or schedule it to run immediately)
> - if not, schedule a "lock checker" to be run either during the loop's "idle" handler, on a timer, or maybe even on every loop iteration
> - the lock checker simply does a compare_and_swap and calls the callback if it succeeds
>

Thank you for sharing this method, but I'm afraid the main problem with
this approach is that, for all the requests accessing the shm
dictionary, the latency can be unnecessarily long (and mostly
unpredictable); its upper bound depends on the timer interval, or even
on the server load if the "idle" handler is used.

I think this is essentially time-slicing the spinlock cycles to serve
other requests that are not accessing the shm store. And that's also
why we decided not to implement automatic time-slicing for the "light
threads" (i.e., ngx.thread) in ngx_lua: the latency over event cycles
or timers is unmanageable. The drop in latency and throughput may be
okay for certain apps but not for others.

>
> This approach uses a little more CPU depending on lock contention, but it is better than the current implementation, IMO.

I won't say this is generally *better*. I would say it is a workaround
for long-time lock holders, which should not have existed in the first
place ;) The standard Memcached wire protocol, for example, does not
bother exposing a key/value iteration API that could possibly hold the
lock for long. These cache stores were not specifically designed for
such scenarios.

Still, I suggest creating a new shm-based store that uses a different
model for this (possibly a completely nonblocking one) instead of
going down the wrong track for too long :)

> Like I said, I don't know the internals of the nginx event loop, so this may not be feasible in nginx.
>

The event model of Nginx does not look very different from that of
other popular event-driven C libraries, though it is perhaps not as
cleanly encapsulated, for the sake of performance.

Best regards,
-agentzh

Brian Akins

Nov 5, 2012, 6:23:18 PM
to openre...@googlegroups.com

On Nov 5, 2012, at 2:43 PM, agentzh <age...@gmail.com> wrote:

> I think this is essentially time-slicing the spinlock cycles to serve
> other requests that are not accessing the shm store

But, if you run one worker per CPU, then you have a CPU doing nothing during a spinlock.

I have an actual real-world case, not using get_keys but just normal gets/sets on a shared dict, where the shm lock became the bottleneck. Or should I say, most of the processes spent a lot of time waiting on the same lock(s). I wound up "sharding" my shared dicts, and this helped some, but it isn't optimal. I really haven't had the time to dig into it further. However, maybe I am an edge case ;)
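
For what it's worth, the "sharding" amounts to something like this (the
zone names and the shard count are made up):

# in the http block
lua_shared_dict stats_0 10m;
lua_shared_dict stats_1 10m;
lua_shared_dict stats_2 10m;
lua_shared_dict stats_3 10m;

# wherever the counters are updated
log_by_lua '
    -- hash the key to pick a shard so that not every update
    -- contends on the same shm lock
    local key = ngx.var.uri
    local shard = ngx.crc32_short(key) % 4
    local dict = ngx.shared["stats_" .. shard]
    if not dict:incr(key, 1) then dict:set(key, 1) end
';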

--Brian

Brian Akins

Nov 5, 2012, 6:31:00 PM
to openre...@googlegroups.com

On Nov 5, 2012, at 6:23 PM, Brian Akins <br...@akins.org> wrote:

>
> On Nov 5, 2012, at 2:43 PM, agentzh <age...@gmail.com> wrote:
>
>> I think this is essentially time-slicing the spinlock cycles to serve
>> other requests that are not accessing the shm store
>
> But, if you run one worker per CPU, then you have a CPU doing nothing during a spinlock.

I mean in the current implementation, this happens. A CPU basically "sleeps" while waiting on the lock. I bumped up the number of workers, but then I ran into lock contention.

I've been experimenting with using redis or memcached, but there is overhead in using an external "data store." It is nice, however, because all of the network I/O is asynchronous, while the spin locks are synchronous. I've not been overly impressed with the performance, though. Having to shard redis to take advantage of multiple CPUs just leads to more complexity.

A general-purpose, lockless, in-process, shared-memory "hash" would be nice. Perhaps this should be done as an external module outside the ngx_lua core? Sounds like a good weekend project :)

--Brian

agentzh

Nov 5, 2012, 6:35:54 PM
to openre...@googlegroups.com
Hello!

On Mon, Nov 5, 2012 at 3:31 PM, Brian Akins wrote:
> I mean in the current implementation, this happens. A CPU basically "sleeps" while waiting on the lock. I bumped up the number of workers, but then I ran into lock contention.
>

Yes, this could happen under really heavy load with quite a few Nginx workers.

>
> A general-purpose, lockless, in-process, shared-memory "hash" would be nice. Perhaps this should be done as an external module outside the ngx_lua core? Sounds like a good weekend project :)
>

Yes, this is what I love to see :)

Best regards,
-agentzh