How to read complete response with nginx lua?


Bhargav

Oct 22, 2013, 8:50:53 AM
to openre...@googlegroups.com
Hi,

I want to make a few changes to the content before delivering it to the client browser, so I tried modifying it with body_filter_by_lua*. When I use a pattern in string.gsub() to match and replace text, it sometimes works and sometimes doesn't.

I believe the cause of the problem is that the data arrives in chunks. Is there a proper way to handle this case, so that we can match the pattern against the complete response body and make the needed changes before final delivery to the browser?

Thanks,
Bhargav

Yichun Zhang (agentzh)

Oct 23, 2013, 12:53:08 AM
to openresty-en
Hello!

On Tue, Oct 22, 2013 at 5:50 AM, Bhargav wrote:
> I want to make few changes in content before delivering it to client browser
> and for that I tried to modify content with body_filter_by_lua*. When I use
> regex in string.gsub() to match the pattern and replace it, sometimes it
> works and sometimes it doesn't work.
>

The Nginx output body filters always work on data chunks rather than
the complete response body. That is the whole point of streaming
processing, and it is crucial for keeping memory usage constant
regardless of the response body size. The same applies to
body_filter_by_lua*, as stated in its official documentation:

https://github.com/chaoslawful/lua-nginx-module#body_filter_by_lua

> I believe the cause of the problem is *chunk*ed data. So is there any proper
> way to handle this case so that we can match the pattern with complete
> response body and do needful changes before final delivery to browser?
>

Generally, you should not buffer all the response data, because that
has a great impact on the memory footprint when the response body is
huge. So the right way is to use streaming regex replacement here. The
ngx_replace_filter module is one such (preliminary) attempt:

https://github.com/agentzh/replace-filter-nginx-module#readme

though the sregex regex engine that this module uses is still very
young and lacks a lot of important optimizations :)

If you insist on fully buffered processing in body_filter_by_lua*,
then you can just buffer all the data chunks yourself there. Let's
consider the following self-contained example that does this:

location = /t {
    echo -n he;
    echo -n llo;
    echo -n ' ';
    echo -n 'worl';
    echo d;

    header_filter_by_lua '
        ngx.header.content_length = nil
    ';
    body_filter_by_lua '
        -- ngx.arg[1] = string.gsub(ngx.arg[1], "hello world", "HIT!")
        -- do return end

        local chunk, eof = ngx.arg[1], ngx.arg[2]
        local buffered = ngx.ctx.buffered
        if not buffered then
            buffered = {}  -- XXX we can use table.new here
            ngx.ctx.buffered = buffered
        end
        if chunk ~= "" then
            buffered[#buffered + 1] = chunk
            ngx.arg[1] = nil
        end
        if eof then
            local whole = table.concat(buffered)
            ngx.ctx.buffered = nil
            whole = string.gsub(whole, "hello world", "HIT!")
            ngx.arg[1] = whole
        end
    ';
}

where we use the "echo" directive from the ngx_echo module to emit
response body data chunks. The commented out Lua code should be what
you're doing and you can see that there is no match if you just do the
regex substitutions on each individual data chunk. Instead, we collect
all the data chunks into our own buffer (as a Lua table in the ngx.ctx
table) and do the regex substitution in a single run when we see the
last chunk in the response body stream. When accessing this /t
interface, we get

$ curl localhost:8080/t
HIT!
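
The difference between the two approaches can also be reproduced outside nginx. The following is a minimal stand-alone sketch in plain Lua: the chunk list mimics what the five echo directives would emit, and per_chunk and buffered are hypothetical helper names standing in for the commented-out code and the buffering code above, with a plain table playing the role of ngx.ctx.

```lua
-- Simulated response body chunks; "echo d;" also appends a newline.
local chunks = { "he", "llo", " ", "worl", "d\n" }

-- Naive approach: run the substitution on every chunk independently.
local function per_chunk(list)
    local out = {}
    for i, chunk in ipairs(list) do
        -- parentheses drop gsub's second return value (the match count)
        out[i] = (string.gsub(chunk, "hello world", "HIT!"))
    end
    return table.concat(out)
end

-- Buffered approach: hold every chunk back and substitute once at EOF.
local function buffered(list)
    local ctx = {}    -- stands in for ngx.ctx
    local sent = {}   -- what would actually reach the client
    for i, chunk in ipairs(list) do
        local eof = (i == #list)
        ctx.buffered = ctx.buffered or {}
        if chunk ~= "" then
            ctx.buffered[#ctx.buffered + 1] = chunk
        end
        if eof then
            local whole = table.concat(ctx.buffered)
            ctx.buffered = nil
            sent[#sent + 1] = (string.gsub(whole, "hello world", "HIT!"))
        end
    end
    return table.concat(sent)
end

io.write(per_chunk(chunks))  -- prints: hello world
io.write(buffered(chunks))   -- prints: HIT!
```

No single chunk contains the full "hello world" string, so the per-chunk version never matches, while the buffered version sees the whole body at once.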

Hope this helps :)

Best regards,
-agentzh

bha...@aum.bz

Oct 23, 2013, 11:39:41 PM
to openre...@googlegroups.com
Hi,

Thanks for your quick reply and solution for this.

I looked at the replace-filter-nginx-module README. According to it, when the Content-Encoding response header is not empty (e.g. gzip), the response body is always left intact. Isn't it possible to gunzip the upstream response before applying the replace filter?

If we could use this directive inside body_filter_by_lua* of ngx_lua, it would be possible to gunzip the response body first.

Best Regards,
Bhargav

Yichun Zhang (agentzh)

Oct 25, 2013, 6:57:07 PM
to openresty-en
Hello!

On Wed, Oct 23, 2013 at 8:39 PM, bhargav wrote:
>
> I looked "replace-filter-nginx-module" README, according to that when
> Content-Encoding response header is not empty (like gzip), the response body
> will always remain intact. Isn't it possible to gunzip the upstream response
> before using "replace filter"?
>

If the backend bandwidth is not a problem, then the simplest solution
to this is to disable the gzip compression on your backend server and
only compress the response in your nginx. To disable gzip compression
in your backend server, just put the following line in your location
with ngx_proxy configured:

proxy_set_header Accept-Encoding "";

> If we can use this directive inside the body_filter_by_lua* of nginx-lua, it
> can be possible to gunzip response body.
>

Well, you can just configure ngx_replace_filter's output filters to
run *after* ngx_lua's filters. To do this, you need to put
--add-module=/path/to/replace-filter-nginx-module *before*
--add-module=/path/to/lua-nginx-module when building your nginx with
the ./configure command line.

To uncompress the compressed response body with body_filter_by_lua,
you can check out the sample code below:

https://groups.google.com/d/msg/openresty-en/yVNvA6uhjyA/9QlHzPs2zE8J

Alternatively, you may consider hacking the standard ngx_gunzip module
to do the gzip inflation for you:

http://nginx.org/en/docs/http/ngx_http_gunzip_module.html

Regards,
-agentzh

bha...@aum.bz

Nov 29, 2013, 1:18:51 PM
to openre...@googlegroups.com
Hello,

As you suggested, I added replace-filter-nginx-module *before* lua-nginx-module in the build, and it now runs *after* lua-nginx-module.

Now I have a situation where I need to run replace-filter-nginx-module *before* lua-nginx-module for one location and *after* lua-nginx-module for other locations.

For example,

location = /xyz {
    content_by_lua '
        local res = ngx.location.capture("/other")
        ngx.print(res.body)
    ';
}

location /other {
    body_filter_by_lua_file 'file.lua';
    replace_filter 'PATTERN' '' g;
}

With the above configuration, a call to /xyz does not execute replace_filter for the subrequest, but if I call /other directly, the replace_filter rules are executed as intended.

Best regards,
Bhargav

Yichun Zhang (agentzh)

Nov 29, 2013, 4:34:29 PM
to openresty-en
Hello!

On Fri, Nov 29, 2013 at 10:18 AM, <bha...@aum.bz> wrote:
> location = /xyz {
> content_by_lua '
> res = ngx.location.capture("/other");
> ngx.print(res.body)';
> }
>
> location /other {
> body_filter_by_lua_file 'file.lua';
> replace_filter 'PATTERN' '' g;
> }
>

Don't use replace_filter in the subrequest: there is no real gain
here, because ngx.location.capture will buffer the whole response
body anyway.

You can do the regex substitutions upon the subrequest's response body
by using the ngx.re.gsub or string.gsub API functions in your Lua code
directly:

https://github.com/chaoslawful/lua-nginx-module#ngxregsub

http://www.lua.org/manual/5.1/manual.html#pdf-string.gsub
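
A minimal sketch of that suggestion, assuming the "PATTERN" placeholder from the quoted configuration and a hypothetical helper named rewrite_body. It uses string.gsub, which takes Lua patterns rather than PCRE regexes; the ngx.re.gsub variant is only indicated in a comment because it needs the ngx_lua runtime.

```lua
-- rewrite_body is a hypothetical helper name; "PATTERN" is the placeholder
-- used in the configuration quoted above, not a real pattern.
local function rewrite_body(body)
    -- string.gsub returns the new string plus the number of matches;
    -- the parentheses keep only the string.
    return (string.gsub(body, "PATTERN", ""))
end

-- Inside content_by_lua for /xyz this would be used roughly as:
--   local res = ngx.location.capture("/other")
--   if res.status == 200 then
--       ngx.print(rewrite_body(res.body))
--   end
-- For a PCRE regex, ngx.re.gsub(res.body, "PATTERN", "") could take the
-- place of string.gsub in the helper.

io.write(rewrite_body("aPATTERNb PATTERN"), "\n")  -- prints: ab
```

Because the subrequest response is already fully buffered in res.body, the substitution cannot miss a match that would otherwise straddle a chunk boundary.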

Best regards,
-agentzh

bha...@aum.bz

Nov 30, 2013, 2:55:06 AM
to openre...@googlegroups.com
Hello,

Thanks for quick reply.

I am using ngx.location.capture inside a for loop to concatenate the output of two URIs.

for k, v in pairs(val) do
    local res = ngx.location.capture(v)
    if res.status == 200 then
        ngx.print(res.body .. "\n")
    end
end

As you said, ngx.location.capture buffers the complete response, so I think it will consume more memory. Can you suggest a better option for this case? I would like to process chunks with replace_filter (without buffering the complete response).

Also, is it possible to run replace_filter wherever it is needed, either before or after lua-nginx-module? If I run replace_filter after the for loop in the above example, it is applied to the whole response body, whereas I need it to run only for one specific subrequest (location) when concatenating the output of two URIs.

In the example below, a call to /xyz calls both locations /other1 and /other2. In /other1, replace_filter runs *after* lua-nginx-module. But a call to /xyz requires replace_filter (in the /other1 location) to run *before* lua-nginx-module (in the /xyz location).

location = /xyz {
    content_by_lua '
        .....
        for k, v in pairs(val) do
            local res = ngx.location.capture(v)
            if res.status == 200 then
                ngx.print(res.body .. "\n")
            end
        end
        .....
    ';
}

location /other1 {
    body_filter_by_lua_file 'file.lua';
    replace_filter 'PATTERN' '' g;
}

location /other2 {
    body_filter_by_lua_file 'file.lua';
}

Regards,
Bhargav

Yichun Zhang (agentzh)

Nov 30, 2013, 1:02:41 PM
to openresty-en
Hello!

On Fri, Nov 29, 2013 at 11:55 PM, bhargav wrote:
> As you told ngx.location.capture buffers complete response then it would be
> more memory consuming I think. So can you suggest some better option for
> this case?

Just avoid using ngx.location.capture for big responses. What to use
instead depends on what nginx module you configure in the location
accessed by your subrequest. For example, when you use ngx_proxy
there, you can use ngx_lua's cosocket API to access the backend http
service directly, in a streaming fashion. And when you use ngx_static
module to serve local files, then you can read the data chunks from
files in Lua directly.
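
For the local-file case, a rough sketch of chunked reading in plain Lua follows. The helper name stream_file, the chunk size, and the emit callback are all assumptions made for illustration. Note also that the standard io library blocks the calling process while it reads, so under nginx this is only reasonable for fast local files.

```lua
-- stream_file is a hypothetical helper: it reads `path` in pieces of at
-- most `chunk_size` bytes and hands each piece to `emit`, so only one
-- piece is held in memory at a time.
local function stream_file(path, chunk_size, emit)
    local f, err = io.open(path, "rb")
    if not f then
        return nil, err
    end
    while true do
        local chunk = f:read(chunk_size)
        if not chunk then
            break  -- f:read returns nil at end of file
        end
        emit(chunk)
    end
    f:close()
    return true
end

-- Under ngx_lua the callback could forward each piece to the client:
--   stream_file(path, 4096, function(c) ngx.print(c); ngx.flush(true) end)
```

Any per-chunk processing placed in the callback still has to cope with matches that straddle two pieces, exactly as in an output body filter.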

> I would like to process chunks with replace_filter (without
> buffering complete response).
>

Because you're already using Lua to generate the final response, I
think a better option here is to embed the sregex C library directly
into Lua and wrap a Lua library around it that can do the task of
ngx_replace_filter, but directly in Lua land :)

> Also can it be possible to run replace_filter wherever it is needed either
> before or after lua-nginx-module ?

You cannot change the running order of an nginx output filter
on the fly. The order is fixed at compile time. This limitation is in
the Nginx core.

And I don't think it's a good idea to mix body_filter_by_lua and
replace_filter. As I've mentioned above, we'd better have a Lua-land
library that does what replace_filter does such that a single
body_filter_by_lua is sufficient (if you have to use
body_filter_by_lua anyway).

Regards,
-agentzh

bha...@aum.bz

Nov 30, 2013, 2:18:53 PM
to openre...@googlegroups.com
Hello,

I am new to the Lua and replace filter modules of Nginx, so I have simply added those modules to my Nginx build according to the README.

Can you guide me through the steps to embed the sregex C library directly into Lua and wrap a Lua library around it? And how would I use it inside body_filter_by_lua* and content_by_lua*?

Thank you very much for all your suggestions and help with my queries.

Regards,
Bhargav

Yichun Zhang (agentzh)

Nov 30, 2013, 3:35:34 PM
to openresty-en
Hello!

On Sat, Nov 30, 2013 at 11:18 AM, <bha...@aum.bz> wrote:
>
> Can you guide me with some steps how can I embed sregex C library directly
> into lua and wrap the lua library around it? And how to use it inside
> body_filter_by_lua*, content_by_lua* ?
>

This is not trivial. You need to write a new Lua library, by using
LuaJIT FFI to call the sregex C API directly from within Lua. You can
check out how to use sregex in ngx_replace_filter module's C source :)

This library has been on my TODO list. But as you can tell, I haven't
had the time to do it yet :) So you're welcome to contribute your own
implementation :)

Regards,
-agentzh

ish...@umbc.edu

Aug 31, 2014, 1:46:23 PM
to openre...@googlegroups.com
Sorry to bump an old thread.

I understand that body_filter_by_lua will read data in chunks. Let's say there's a very simple script that injects a <script> tag at the bottom of all <html> tags.

ngx.arg[1] = ngx.re.sub(ngx.arg[1], "</html>", "<script>...</script></html>")

Are there circumstances where chunking could split the ending </html> tag, resulting in this not running sometimes? For example, if one chunk ends with "</ht" and the next begins with "ml>".

I don't know a lot about the inner details of when and where chunking occurs, but I'm just assuming for a particularly large response body this could happen in some circumstances. I'm assuming that this would be rare, but is there any way of accounting for something like this or preventing it from happening? Preferably without buffering every response body or doing handmade buffering in Lua, since buffering until you see </html> would likely mean buffering the entire body.

I could also maybe check if "</html>" is in the chunk and do the replace if so, set a ngx.ctx variable, don't check again if the variable is set, and then if EOF is seen and the variable isn't set, just throw the <script> tag at the very end of the response body (which most browsers should have no issue parsing). But that seems a bit ugly to me. Other option is to just blindly throw the tag at the end of every document, but again that's ugly and results in technically broken markup.
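
A minimal sketch of that flag-based fallback, written as plain Lua so it can be run outside nginx. The function name inject, the injected field, and the TAG constant are hypothetical; a plain table stands in for ngx.ctx, and the two arguments correspond to ngx.arg[1] and ngx.arg[2].

```lua
local TAG = "<script>...</script>"

-- inject is a hypothetical per-chunk handler. ctx plays the role of ngx.ctx.
local function inject(ctx, chunk, eof)
    if not ctx.injected then
        -- plain (non-pattern) search for the closing tag in this chunk
        local s = string.find(chunk, "</html>", 1, true)
        if s then
            chunk = chunk:sub(1, s - 1) .. TAG .. chunk:sub(s)
            ctx.injected = true
        end
    end
    if eof and not ctx.injected then
        -- fallback: the tag was split across chunks or never appeared
        chunk = chunk .. TAG
        ctx.injected = true
    end
    return chunk
end

-- In body_filter_by_lua this would be roughly:
--   ngx.arg[1] = inject(ngx.ctx, ngx.arg[1], ngx.arg[2])
```

Since the body grows, the Content-Length response header would also have to be cleared in a header filter, as in the buffering example earlier in this thread.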

Yichun Zhang (agentzh)

Aug 31, 2014, 2:47:41 PM
to openresty-en
Hello!

On Sun, Aug 31, 2014 at 10:46 AM, isheff1 wrote:
> I understand that body_filter_by_lua will read data in chunks. Let's say
> there's a very simple script that injects a <script> tag at the bottom of
> all <html> tags.
>
> ngx.arg[1] = ngx.re.sub(ngx.arg[1], "</html>",
> "<script>...</script></html>")
>
> Are there circumstances where chunking could split the ending </html> tag,
> resulting in this not running sometimes?

Yes, sure. The chunk boundary may happen anywhere.

> For example, if one chunk ends with
> "</ht" and the next begins with "ml>".
>

This can surely happen in an output filter.
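
For a fixed literal string only (not a regex), one way to cope with such a split without buffering the whole body is to hold back the last few bytes of each chunk until the next one arrives. The sketch below is an illustration under that assumption, not how sregex or ngx_replace_filter work; make_replacer and its parameters are hypothetical names.

```lua
-- make_replacer returns a stateful function that replaces every occurrence
-- of the literal `needle`, even when it straddles a chunk boundary.
local function make_replacer(needle, replacement)
    assert(#needle > 0, "needle must not be empty")
    local tail = ""  -- bytes held back from earlier chunks
    return function(chunk, eof)
        local data = tail .. chunk
        local out, pos = {}, 1
        while true do
            local s, e = string.find(data, needle, pos, true)  -- plain search
            if not s then
                break
            end
            out[#out + 1] = data:sub(pos, s - 1)
            out[#out + 1] = replacement
            pos = e + 1
        end
        local rest = data:sub(pos)
        if eof then
            out[#out + 1] = rest
            tail = ""
        else
            -- a partial match at the end is at most #needle - 1 bytes long,
            -- so only that much has to wait for the next chunk
            local keep = math.min(#needle - 1, #rest)
            out[#out + 1] = rest:sub(1, #rest - keep)
            tail = rest:sub(#rest - keep + 1)
        end
        return table.concat(out)
    end
end

local replace = make_replacer("</html>", "<script>...</script></html>")
io.write(replace("<body>x</ht", false), replace("ml>", true), "\n")
-- prints: <body>x<script>...</script></html>
```

Under ngx_lua the replacer would be created once per request, kept in ngx.ctx, and called with ngx.arg[1] and ngx.arg[2]. Memory use stays bounded by the chunk size plus the length of the search string.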

Streaming parsing and substitution is hard. That's why I created the
sregex engine and the ngx_replace_filter module:

https://github.com/openresty/sregex

https://github.com/openresty/replace-filter-nginx-module

It would be interesting to expose the sregex engine's C API to Lua
land via FFI so that, for example, we can use it directly in the
body_filter_by_lua* directives.

Also, there're still a lot of optimizations that can be done in the
sregex engine itself to make it faster. Contributions are always
welcome :)

Other similar attempts are based on ragel:

http://www.complang.org/ragel/

Regards,
-agentzh

Ian Shefferman

Aug 31, 2014, 4:00:13 PM
to openre...@googlegroups.com

That makes sense, thanks. Unfortunately I don't know enough C to optimize something like a regex engine. The FFI module might be within my experience level, but I think I'll just stick to setting a variable from Lua and using that as the replace arg.
Is sregex and/or replace_filter currently optimized for replacing literal strings with other literal strings? That's my only use case at the moment; I don't need to fire up a regex engine at all. If there is not yet support for that, I could potentially look into adding support for that case.


Yichun Zhang (agentzh)

Aug 31, 2014, 7:45:24 PM
to openresty-en
Hello!

On Sun, Aug 31, 2014 at 1:00 PM, Ian Shefferman wrote:
> Is sregex and/or replace_filter currently optimized for replacing literal
> strings with other literal strings? That's my only use case at the moment; I
> don't need to fire up a regex engine at all. If there is not yet support for
> that, I could potentially look into adding support for that case.
>

See http://nginx.org/en/docs/http/ngx_http_sub_module.html

Regards,
-agentzh

focus zheng

May 18, 2020, 11:22:12 AM
to openresty-en
I could not replace the string in ngx.arg[1], as it does not appear to be a normal string. Is it a binary or encoded string?