setting headers via ngx.header --> no gzip encoding


bjoe2k4

Aug 10, 2015, 8:31:04 PM
to openresty-en
Hi,

I have the following problem: my nginx server has the gzip module enabled, which gzips text/* files. Everything works fine, but when I set the Content-Type header via ngx.header in the lua module, some responses are sent non-gzipped. I have prepared the following examples to show the issue:

    location = test1 {
        content_by_lua '
            ngx.header["Content-Type"] = "text/html"
            ngx.say("test")
        ';
    }

    location = test2 {
        content_by_lua '
            ngx.header["Content-Type"] = "text/html; charset=utf-8"
            ngx.say("test")
        ';
    }

    location = test3.php {
        include inc.d/fastcgi;
        fastcgi_pass swtal;
    }

test3.php:

<?php
header('Content-Type: text/html; charset=utf-8');
echo "test";
?>

For test1, the response has "Content-Encoding: gzip".
For test2 there is no gzip encoding, although the Content-Type header is perfectly valid.
For test3, there is gzip encoding, although the header is exactly the same as in test2.

Any explanation for this? As background, I have set up some caching logic for a WordPress blog, which reads cached response headers and the response body from a redis hash. WordPress emits "Content-Type: text/html; charset=utf-8", which gets cached; on a cache hit, this exact content type is set via ngx.header, which triggers the above bug.

Yichun Zhang (agentzh)

Aug 11, 2015, 1:21:54 AM
to openresty-en
Hello!

On Tue, Aug 11, 2015 at 8:31 AM, bjoe2k4 wrote:
> location = test2 {
> content_by_lua '
> ngx.header["Content-Type"] = "text/html; charset=utf-8"
> ngx.say("test")
> ';
> }
>
> For test2 there is no gzip encoding, although the Content-Type header is
> perfectly valid.

Thanks for the bug report and test case! Fixed in git master of
lua-nginx-module:

https://github.com/openresty/lua-nginx-module/commit/539989

Best regards,
-agentzh

bjoe2k4

Aug 11, 2015, 8:02:32 AM
to openresty-en
Thanks! I have tested it and it is working well.

Nelson, Erik - 2

Aug 11, 2015, 10:54:00 AM
to openre...@googlegroups.com
I have a file that I need to process and send as the body of the response, something along the lines of this as a simple implementation:

    location /process_file {
        content_by_lua "
            for line in io.lines('file.txt') do
                ngx.say((line:gsub('|', ',')))
            end
        ";
    }

There are a few ways this seems not good:
1. File I/O is frowned upon because of blocking.
2. ngx.say is an expensive operation. http://wiki.nginx.org/HttpLuaModule#ngx.print suggests buffering the data to reduce calls.

Buffering is easy enough, but what's the best way to read the file off of the disk? Is it better to use FFI with syscalls? Or use the transparent non-blocking I/O via subrequests as in the HttpLuaModule synopsis? Is there a way to process the subrequest as it is read, so the entire file doesn't need to be buffered?

How would an OpenResty expert do this?

Thanks

Erik



Yichun Zhang (agentzh)

Aug 11, 2015, 11:55:11 PM
to openresty-en
Hello!

On Tue, Aug 11, 2015 at 10:53 PM, Nelson, Erik - 2 wrote:
> Buffering is easy enough, but what's the best way to read the file off of the disk? Is it better to use FFI with syscalls?

It depends. Most of the time Lua's standard io module should be sufficient.

> Or use the transparent non-blocking I/O via subrequests as in the HttpLuaModule synopsis?

There's no such thing as nonblocking file I/O. Normally nginx uses
plain blocking I/O if you don't configure AIO. But the subrequest API
in ngx_lua always buffers the whole file in memory, which makes
streaming processing impossible. It depends on your own use case, so
better do your own benchmark.

Maybe in the near future we can expose the thread pool API in the new
nginx core to do file I/O on the Lua land without blocking the nginx
main OS thread. Still this way has its own overhead due to OS threads.
Unlike network I/O, there is no C10K friendly way here.

> Is there a way to process the subrequest as it is read, so the entire file doesn't need to be buffered?
>

No.

> How would an OpenResty expert do this?
>

OpenResty experts avoid file I/O whenever they can :)

Regards,
-agentzh

Nelson, Erik - 2

Oct 15, 2015, 2:22:35 PM
to openre...@googlegroups.com
Yichun Zhang (agentzh) wrote on Tuesday, August 11, 2015 11:55 PM
> Subject: Re: [openresty-en] Best practice for reading and processing a file
>
> On Tue, Aug 11, 2015 at 10:53 PM, Nelson, Erik - 2 wrote:
> > Buffering is easy enough, but what's the best way to read the file
> off of the disk? Is it better to use FFI with syscalls?
>
> It depends. Most of the time Lua's standard io module should be
> sufficient.
>
>> How would an OpenResty expert do this?
>
> OpenResty experts avoid file I/O whenever they can :)
>

In a tight file-reading loop like

for line in io.lines('file.txt') do
    ngx.say((line:gsub('|', ',')))
end

will other light threads be starved? Is it a good idea to periodically explicitly yield?

Yichun Zhang (agentzh)

Oct 16, 2015, 12:15:05 AM
to openresty-en
Hello!

On Fri, Oct 16, 2015 at 2:22 AM, Nelson, Erik - 2 wrote:
> In a tight file-reading loop like
>
> for line in io.lines('file.txt') do
> ngx.say(line:gsub('|', ','))
> end
>
> will other light threads be starved?

Yes, because ngx.say() is an asynchronous call which returns
immediately without waiting for the data to actually get flushed into
the system socket send buffers (and it may also accumulate a lot of
buffered data in memory in the case of slow downstream connections). So be
very careful when the file has many, many lines.

In addition, ngx.say() invokes the nginx output filter chain, which is
kinda expensive. So if you have a lot of short lines (shorter than 4KB
or alike is considered "short"), then you'd better buffer the data
yourself (in a Lua table or something) before feeding it to ngx.say(),
which can save quite some CPU time.

> Is it a good idea to periodically explicitly yield?
>

Yes, but do not yield too often. The synchronous (but still doing
nonblocking I/O) "ngx.flush(true)" call is your friend (which can be
used right after the ngx.say call). See

https://github.com/openresty/lua-nginx-module#ngxflush
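A minimal sketch of the buffer-then-flush pattern described above (the batch size, demo file, and `captured` stub sink are my own illustrative choices, not from this thread; the `ngx` fallback stub just lets the sketch run under plain Lua):

```lua
captured = ""  -- output sink used by the standalone stub below

-- Outside OpenResty there is no `ngx` global, so fall back to a tiny
-- stub that records output; inside OpenResty the real API is used.
local ngx = ngx or {
  print = function(t) captured = captured .. table.concat(t) end,
  flush = function(wait) end,
}

-- Create a small demo input file so the sketch runs as-is.
local f = assert(io.open("demo.txt", "w"))
f:write("a|b\nc|d\n")
f:close()

local FLUSH_EVERY = 256            -- assumed batch size; tune for your data
local buf, n = {}, 0

local function emit()
  ngx.print(buf)                   -- ngx.print/ngx.say accept a table of pieces
  ngx.flush(true)                  -- synchronous flush: wait until data is sent
  for i = 1, n do buf[i] = nil end -- reuse the buffer table
  n = 0
end

for line in io.lines("demo.txt") do
  n = n + 1; buf[n] = (line:gsub("|", ","))  -- extra parens drop gsub's count
  n = n + 1; buf[n] = "\n"
  if n >= FLUSH_EVERY then emit() end
end
if n > 0 then emit() end
```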

Regards,
-agentzh

Nelson, Erik - 2

Oct 16, 2015, 10:40:02 AM
to openre...@googlegroups.com
Yichun Zhang (agentzh) wrote on Friday, October 16, 2015 12:15 AM
> On Fri, Oct 16, 2015 at 2:22 AM, Nelson, Erik - 2 wrote:
> > In a tight file-reading loop like
> >
> > for line in io.lines('file.txt') do
> > ngx.say(line:gsub('|', ','))
> > end
> >
> > will other light threads be starved?
>
> Yes, because ngx.say() is an asynchronous call which returns
> immediately without waiting for the data to actually get flushed into
> the system socket send buffers (which may also accumulate a lot of
> buffered data in memory in case of slow downstream connections). So be
> very careful when the file has many many lines.
>
> In addition, ngx.say() invokes the nginx output filter chain, which is
> kinda expensive. So if you have a lot of short lines (shorter than 4KB
> or alike is considered "short"), then you'd better buffer the data
> yourself (in a Lua table or something) before feeding it to ngx.say(),
> which can save quite some CPU time.

If I have lots of short lines, how would you recommend appending a newline to each line? ngx.say() only adds '\n' after the last element in the table.

I'm currently using

for line in io.lines('file.txt') do
    table.insert(buf, line)
    table.insert(buf, '\n')
end

which works but maybe there's a better way

>
> > Is it a good idea to periodically explicitly yield?
> >
>
> Yes, but do not yield too often. The synchronous (but still doing
> nonblocking I/O) "ngx.flush(true)" call is your friend (which can be
> used right after the ngx.say call). See
>
> https://github.com/openresty/lua-nginx-module#ngxflush
>
That works well, thanks.

Aapo Talvensaari

Oct 17, 2015, 4:01:44 AM
to openresty-en, erik.l...@bankofamerica.com
On Friday, 16 October 2015 17:40:02 UTC+3, Nelson, Erik - 2 wrote:
Yichun Zhang (agentzh) wrote on Friday, October 16, 2015 12:15 AM
If I have lots of short lines, how would you recommend appending a newline to each line?  Ngx.say() only adds '\n' after the last element in the table.

I'm currently using

for line in io.lines('file.txt') do
   table.insert(buf, line)
   table.insert(buf, '\n')
end

Basically this, but with small changes:

I usually append to table with this:
buf[#buf+1] = line

And if you already know the index, you don't even need #buf (the #-length operator is just convenient in some more complex cases).

And you don't need to append '\n' to the table, because you can use:
table.concat(table, "\n")

If you know the number of entries, you can preallocate the table with table.new(narray, nhash), and you can reuse the table with table.clear(tab). There is more about those here:
http://repo.or.cz/w/luajit-2.0.git/blob_plain/v2.1:/doc/extensions.html

There has also been talk about adding string buffer or builder support as well, but that hasn't materialized yet.
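A hedged sketch of using those LuaJIT 2.1 extensions with a plain-Lua fallback (the pcall guards and the sizes are my own additions, not from this thread):

```lua
-- table.new/table.clear are LuaJIT 2.1 extensions; guard the require
-- so the same code still runs on interpreters that lack them.
local ok_new, table_new = pcall(require, "table.new")
if not ok_new then
  table_new = function(narr, nrec) return {} end  -- fallback: plain table
end

local ok_clear, table_clear = pcall(require, "table.clear")
if not ok_clear then
  table_clear = function(t)                       -- fallback: clear by hand
    for k in pairs(t) do t[k] = nil end
  end
end

buf = table_new(100, 0)  -- preallocate ~100 array slots (global for inspection)
buf[1] = "hello"
table_clear(buf)         -- reuse the table without reallocating it
```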

Aapo Talvensaari

Oct 17, 2015, 5:49:28 AM
to openresty-en, erik.l...@bankofamerica.com
On Saturday, 17 October 2015 11:01:44 UTC+3, Aapo Talvensaari wrote:
On Friday, 16 October 2015 17:40:02 UTC+3, Nelson, Erik - 2 wrote:
If I have lots of short lines, how would you recommend appending a newline to each line?  Ngx.say() only adds '\n' after the last element in the table.

I'm currently using

for line in io.lines('file.txt') do
   table.insert(buf, line)
   table.insert(buf, '\n')
end

Basically this, but with small changes:

I usually append to table with this:
buf[#buf+1] = line

And in case you know the index you don't need to even use #buf (using this #-length operator is just convenient in some more complex cases).

Some microbenchmarks here:

I know people who still prefer table.insert, though. 

Yichun Zhang (agentzh)

Oct 17, 2015, 8:00:45 AM
to openresty-en
Hello!

On Sat, Oct 17, 2015 at 4:01 PM, Aapo Talvensaari wrote:
> Basically this, but with small changes:
>
> I usually append to table with this:
> buf[#buf+1] = line
>
> And in case you know the index you don't need to even use #buf (using this
> #-length operator is just convenient in some more complex cases).
>

Well, I'm against both for the sake of performance. Both
table.insert(tb, elem) and tb[#tb + 1] are O(n) operations, since
both of them need to calculate the size of the Lua (array) table,
which is not readily available in the current table
implementation of the standard Lua 5.1 interpreter or LuaJIT 2. So if
you use such things in a simple loop, it can easily become O(n^2)
overall. I used to spot such things in real-world production flame graphs
in the form of hot lj_tab_len() frames. LuaJIT 2 uses lj_tab_len to
calculate the size of a Lua array table, and it is called from within
both table.insert (without an explicit index argument) and #tb. (It
might be worth mentioning that #str does not have this issue, since Lua
strings store their lengths explicitly in memory, unlike
null-terminated C strings.)

The recommended way is to use a local Lua variable to track the
current size of the Lua (array) table in the loop yourself, like

local sz = 0
local tb = {}
for _, s in ipairs(...) do
    if mytest(s) then
        sz = sz + 1
        tb[sz] = s
    end
end
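A self-contained version of this pattern might look like the following (the sample input list and the `mytest` predicate are made up for illustration):

```lua
-- Track the array size in a local counter instead of using #tb or
-- table.insert(), both of which rescan the table's array part.
local input = { "a|b", "plain", "c|d" }          -- made-up sample data

-- Hypothetical predicate: keep only strings containing a '|'.
local function mytest(s)
  return s:find("|", 1, true) ~= nil
end

local sz = 0
tb = {}                                           -- global for easy inspection
for _, s in ipairs(input) do
  if mytest(s) then
    sz = sz + 1
    tb[sz] = s                                    -- O(1): no length rescan
  end
end
-- tb is now { "a|b", "c|d" }
```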

Or better, pre-allocate the Lua table if you can (roughly) predict the
size of the resulting table, to avoid table auto-growth (which is also
an O(n) operation with potentially more expensive dynamic memory
allocations). That is, replace the line

local tb = {}

with

local tb = tb_new(m, 0)

where tb_new is initialized in the top level scope like this:

local tb_new = require "table.new" -- this requires LuaJIT v2.1

> And you don't need to append '\n' to the table, because you can use:
> table.concat(table, "\n")
>

Well, I must say that it's (usually) more efficient to feed a Lua
(array) table directly into calls like ngx.say() since it can collect
the string pieces in the Lua table directly on the C land without
creating an intermediate (long) Lua string object that is almost
immediately discarded. So manually inserting the "\n" string to the
table in the loop and finally inserting that table directly to
ngx.say() can be more efficient (unless the result of this
table.concat call never changes, which is the case when loading a single
invariant request with tools like ab or weighttp). So be careful while
benchmarking this, since Lua's string intern'ing might be helping you
too much to be realistic. Well, just a warning.

> If you know the number of entries, you can preallocate the table with
> table.new(narray, nhash), and you can reuse the table with table.clear(tab).
> There is more about those here:
> http://repo.or.cz/w/luajit-2.0.git/blob_plain/v2.1:/doc/extensions.html
>

Oh, yeah, I've mentioned the table.new() trick above as well. But be
careful with table reuse via table.clear() across multiple requests,
since you may have race conditions with concurrent requests if you are
not careful enough. Just a caveat. I've been meaning to open-source a
lua-resty-table-pool library I wrote a while back which can manage the
table recycling for you.

> There has also been talks about adding string buffer or builder support as
> well, but that hasn't materialized yet.
>

Indeed. I've been waiting for this as well.

Regards,
-agentzh

Yichun Zhang (agentzh)

Oct 17, 2015, 8:16:11 AM
to openresty-en
Hello!

On Fri, Oct 16, 2015 at 10:39 PM, Nelson, Erik - 2 wrote:
> I'm currently using
>
> for line in io.lines('file.txt') do
> table.insert(buf, line)
> table.insert(buf, '\n')
> end
>
> which works but maybe there's a better way
>

I'd comment on some more performance issues in the code snippet above.

Despite the O(n) implication of table.insert() mentioned in my
previous email, there are other issues here.

1. I think you can get better performance by reading the file in
fixed-size chunks, like 4KB, 8KB, or even larger, instead of line by
line, and then replacing every occurrence of '|' with ',' in each
chunk. This way you can save a lot of short-lived small Lua strings,
reducing the impact on the Lua/LuaJIT GC. You can save your "buf"
table as well.

2. You'd better call ngx.say() or ngx.print() periodically in the loop
(ideally once per large enough chunk, say, 4KB; see 1) and call
ngx.flush(true) immediately afterwards. Otherwise you're effectively
buffering the full file content in the Lua space, which can be very
expensive for large files, again imposing a big impact
on the Lua/LuaJIT GC.
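A standalone sketch of that chunked approach (the chunk size, the demo file, and the `out` sink are illustrative assumptions; in OpenResty the emit step would be ngx.print followed by ngx.flush(true)):

```lua
-- Sketch only: stream the file in fixed-size chunks instead of lines.
local CHUNK_SIZE = 4096           -- assumed chunk size; tune for your data

out = ""                          -- global output sink so the effect is checkable
local function emit(s)            -- stand-in for ngx.print(s); ngx.flush(true)
  out = out .. s
end

-- Create a small demo input file so the sketch runs as-is.
local f = assert(io.open("chunk-demo.txt", "w"))
f:write("a|b\nc|d\n")
f:close()

local fh = assert(io.open("chunk-demo.txt", "rb"))
while true do
  local chunk = fh:read(CHUNK_SIZE)
  if not chunk then break end
  -- Extra parens drop gsub's second return value (the match count).
  -- Note: this is safe across chunk boundaries only because the pattern
  -- is a single byte; multi-byte patterns would need carry-over state.
  emit((chunk:gsub("|", ",")))
end
fh:close()
```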

Oh yeah, writing performant code is not trivial. But the difference
can be HUGE for really busy sites.

Well, don't hesitate to benchmark and profile your code. But be
careful when designing the test case and interpreting the results.
It's just too easy to do benchmarks (terribly) wrong.

Best regards,
-agentzh

Nelson, Erik - 2

Oct 20, 2015, 4:39:06 PM
to openre...@googlegroups.com
Yichun Zhang (agentzh) wrote on Saturday, October 17, 2015 8:01 AM

>On Sat, Oct 17, 2015 at 4:01 PM, Aapo Talvensaari wrote:
>> Basically this, but with small changes:
>>
>> I usually append to table with this:
>> buf[#buf+1] = line
>>
>> And in case you know the index you don't need to even use #buf (using
>> this #-length operator is just convenient in some more complex cases).

>Well, I'm against both for the sake of performance. Both table.insert(tb, elem) and tb[#tb + 1] are an O(n) operation since both of them need to calculate the size of the Lua (array) table, which is not readily available according to the current table implementation in the standard Lua 5.1 interpreter or LuaJIT 2. So if you use such things in a simple loop, it can easily become O(n^2) over all. I used to spot such things in real-world production flame graphs in the form of hot lj_tab_len() frames. LuaJIT 2 uses lj_tab_len to calculate the size of Lua array table, which is called from within both table.insert (without an explicit index argument) and #tb. (It might be worth mentioning that #str does not have this issue since Lua strings store their lengths explicitly in memory, unlike null-terminated C strings).
>
>The recommended way is to use a local Lua variable to track the current size of the Lua (array) table in the loop yourself, like
>
> local sz = 0
> local tb = {}
> for _, s in ipairs(...) do
>     if mytest(s) then
>         sz = sz + 1
>         tb[sz] = s
>     end
> end
>

Very helpful, thanks

>Or better, pre-allocate the lua table if you can (roughly) predict the size of the resulting table to avoid table auto growth (which is also an O(n) operation with potential more expensive memory dynamic allocations). That is, to replace the line
>
> local tb = {}
>
>with
>
> local tb = tb_new(m, 0)
>
>where tb_new is initialized in the top level scope like this:
>
> local tb_new = require "table.new" -- this requires LuaJIT v2.1
>

That's a great tip, thanks

>> And you don't need to append '\n' to the table, because you can use:
>> table.concat(table, "\n")
>

>Well, I must say that it's (usually) more efficient to feed a Lua (array) table directly into calls like ngx.say() since it can collect the string pieces in the Lua table directly on the C land without creating an intermediate (long) Lua string object that is almost immediately discarded. So manually inserting the "\n" string to the table in the loop and finally inserting that table directly to ngx.say() can be more efficient (unless the result of this table.concat call never changes, which is case when loading a single invariant request with tools like ab or weighttp). So be careful while benchmarking this, since Lua's string intern'ing might be helping you too much to be realistic. Well, just a warning.

I suppose if you're using a fixed-size array, you could assign the '\n' to the even positions just one time.
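For instance, a sketch with a made-up fixed line count (in a 1-based Lua array the data lands in the odd slots and the separators in the even slots):

```lua
-- Fixed-size, reusable buffer: write the "\n" separators once, then
-- only refresh the data slots on each pass. N is an assumed line count.
local N = 3
sep_buf = {}                             -- global for easy inspection
for i = 1, N do
  sep_buf[2 * i] = "\n"                  -- even slots: separators, set once
end

local lines = { "one", "two", "three" }  -- made-up per-request data
for i = 1, N do
  sep_buf[2 * i - 1] = lines[i]          -- odd slots: data, refreshed each time
end
-- sep_buf can now be fed directly to ngx.say()/ngx.print() in OpenResty;
-- here table.concat shows the resulting byte stream.
```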

>> If you know the number of entries, you can preallocate the table with
>> table.new(narray, nhash), and you can reuse the table with table.clear(tab).

>Oh, yeah, I've mentioned the table.new() trick above as well. But be careful with table reuse via table.clear() across multiple requests.
>Since you may have race conditions with concurrent requests if you are not careful enough. Just a caveat. I meant to opensource a lua-resty-table-pool library I wrote a while back which can manage the table recycling for you.

With this caveat are you essentially warning against sharing the table between light threads? Or is there some non-obvious problem with using a local table?

Nelson, Erik - 2

Oct 20, 2015, 4:52:48 PM
to openre...@googlegroups.com
Yichun Zhang (agentzh) wrote on Saturday, October 17, 2015 8:01 AM

I've used lua_bufflib for some other projects and it seemed okay. Is there some caveat to using it with openresty?

Yichun Zhang (agentzh)

Oct 20, 2015, 11:19:39 PM
to openresty-en
Hello!

On Wed, Oct 21, 2015 at 4:39 AM, Nelson, Erik - 2 wrote:
>>Oh, yeah, I've mentioned the table.new() trick above as well. But be careful with table reuse via table.clear() across multiple requests.
>>Since you may have race conditions with concurrent requests if you are not careful enough. Just a caveat. I meant to opensource a lua-resty-table-pool library I wrote a while back which can manage the table recycling for you.
>
> With this caveat are you essentially warning against sharing the table between light threads? Or is there some non-obvious problem with using a local table?
>

Well, you can safely share read-only Lua tables across light threads
but when you share tables that multiple light threads or requests can
modify at the same time, you really need to be careful. That's why I
created the lua-resty-table-pool library (to be opensourced soon) to
abstract potential issues away.

> I've used lua_bufflib for some other projects and it seemed okay. Is there some caveat to using it with openresty?

I have no experience with lua_bufflib myself. But it seems that it
relies on the standard Lua C API exclusively, which means that any of
your Lua code paths using this library cannot be JIT compiled at all
(unless it were to use FFI to interface with the C land).

Regards,
-agentzh