go-cdn-booster - a dumb fast caching HTTP proxy written in Go


Aliaksandr Valialkin

Jun 14, 2013, 2:32:36 PM
to golan...@googlegroups.com
Hi there!

I wrote a simple yet powerful and fast caching HTTP proxy in Go: go-cdn-booster. It consists of 236 lines of code and is based on the YBC caching library. This proxy can be used instead of Nginx or Varnish in front of large and/or slow static file servers. It can also be easily set up as a geographically distributed poor man's CDN for small and medium-sized static files such as images, documents, CSS and JS.

While it has many limitations (the interested reader can find them in the README file), it has the following features missing from Nginx and/or Varnish:
  * There is no need to run 'cleaners', 'defragmenters', 'watchdogs' or other third-party tools.
  * There is no need to learn yet another configuration syntax and write complex configuration files.
  * Cached items survive a proxy restart.
  * It doesn't abuse the filesystem with gazillions of files.
  * Performance scales linearly on I/O-starved workloads (i.e. when the working set size exceeds available RAM) by sharding cache files across multiple physical storage devices.
  * Performance doesn't depend on the number of cached items.
  * The total size of the cache files never exceeds the limit passed via the -cacheSize command-line argument.
  * The code is easy to read, hack and extend (thanks, Go!).

Give it a try now! :)

Jacek Furmankiewicz

Jun 14, 2013, 4:52:45 PM
to golan...@googlegroups.com
Very cool.

It could really benefit from POST support. More and more, we have complex queries via POST that contain a JSON body with complex query parameters (often nested).

Ernest Micklei

Jun 14, 2013, 5:09:29 PM
to golan...@googlegroups.com
+1 for POST support, using some header value as the key (this we cannot tell Varnish to do at present).

Aliaksandr Valialkin

Jun 16, 2013, 4:31:13 AM
to Ernest Micklei, golan...@googlegroups.com
Unfortunately I have no near-term plans for POST support, because it would complicate the code and it is really hard to satisfy all possible use cases with POST caching. The main problem is determining what to use as the cache key. With GET this is really simple: use the Request-URI, i.e. the string passed in the first line of the HTTP GET request:
GET /RequestURI HTTP/1.x
Since go-cdn-booster is currently a single-host proxy, there is no need to add the Host header value to the cache key.

Using the RequestURI as a cache key for POST requests won't satisfy the majority of use cases, since different POST requests usually have an identical RequestURI and differ only in the request body. Using (RequestURI + POST body) as the cache key looks better, but it fails if certain HTTP headers must be included in the key. Moreover, the majority of HTTP proxy cache users neither need nor expect POST caching.

But you can easily hack go-cdn-booster for your particular use case thanks to Go's simplicity and expressiveness. I believe the result will be better from a clarity, performance and maintainability point of view than a solution built on the complex configuration files of Varnish and/or Nginx.
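
For illustration only, here is one hypothetical shape such a hack could take: keep the Request-URI as the cache key for GET requests and append a hash of the body for POST requests. The helper name and the choice of SHA-256 are my own assumptions, not go-cdn-booster code:

package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// makeCacheKey is a hypothetical helper, not part of go-cdn-booster.
// GET requests keep the Request-URI as the key; POST requests append a
// SHA-256 hash of the body, so requests with the same URI but different
// bodies get different cache entries. Note that hashing consumes r.Body,
// so a real proxy would have to buffer the body in order to forward it
// upstream afterwards.
func makeCacheKey(r *http.Request) (string, error) {
	uri := r.URL.RequestURI() // mirrors the Request-URI from the request line
	if r.Method != http.MethodPost || r.Body == nil {
		return uri, nil
	}
	h := sha256.New()
	if _, err := io.Copy(h, r.Body); err != nil {
		return "", err
	}
	return uri + "#" + hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	body := strings.NewReader(`{"query": {"tags": ["go", "cdn"]}}`)
	req, _ := http.NewRequest(http.MethodPost, "http://example.com/search?page=1", body)
	key, err := makeCacheKey(req)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(key)
}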

As a bonus, here are go-cdn-booster performance numbers (measured with the 'ab' and 'go-cdn-booster-bench' tools) for 100K 'cache hit' requests for the http://www.google.com/ page (11KB):

Keepalive on (ab vs go-cdn-booster-bench):

  workers    ab, qps    go-cdn-booster-bench, qps
  10         27469      34912
  100        24722      38118
  1000       22525      35945
  10000      18828      25055

Keepalive off (only ab numbers, since go-cdn-booster-bench doesn't support disabling keepalive):

  workers    ab, qps
  10         9461
  100        8507
  1000       7404
  10000      5235
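
For reference, the keepalive-on rows above correspond to ab invocations roughly like the one below; the listen address is just a placeholder, and -c should be adjusted to match each row:

ab -k -c 10 -n 100000 http://127.0.0.1:8080/

Here -k enables keepalive, -c sets the number of concurrent workers and -n the total number of requests; drop -k for the keepalive-off runs.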

These numbers lead to the following conclusions:
- go-cdn-booster easily handles 10K concurrent connections without significant performance degradation (thanks to Go's excellent net/http server implementation).
- Issuing multiple requests over a single keepalive connection is 3-5x faster than the 'new connection per request' strategy. This is good news, because all modern browsers actively exploit keepalive connections.
- go-cdn-booster-bench performs better than the ab tool, probably because it gathers far fewer stats and has dead simple, short code - currently 127 lines :)

Let's kick Nginx's ass! Test your Nginx, Varnish or any other caching HTTP proxy with the go-cdn-booster-bench tool and post a performance comparison with go-cdn-booster here :)

P.S. Don't forget to build go-cdn-booster and go-cdn-booster-bench with Go 1.1 - this gives a 30-40% performance boost compared to Go 1.

--
Best Regards,

Aliaksandr

Vasiliy Tolstov

Jun 16, 2013, 4:21:31 PM
to Aliaksandr Valialkin, golan...@googlegroups.com
Hello. Thanks for ybc. But I have one question: what is the main difference compared to eblob (http://www.ioremap.net/projects/eblob)?
And as I see it, cdn-booster may be DDoS-ed by being made to fetch a very big object from upstream, because it uses io.ReadAll on res.Body.



--
Vasiliy Tolstov,
e-mail: v.to...@selfip.ru
jabber: va...@selfip.ru

Aliaksandr Valialkin

Jun 17, 2013, 1:36:56 PM
to Vasiliy Tolstov, golan...@googlegroups.com
On Sun, Jun 16, 2013 at 11:21 PM, Vasiliy Tolstov <v.to...@selfip.ru> wrote:
Hello. Thanks for ybc. But I have one question: what is the main difference compared to eblob (http://www.ioremap.net/projects/eblob)?

I didn't know about eblob, though I read the ioremap blog three years ago, when Evgeniy Polyakov's POHMELFS appeared in the Linux world (primarily on lwn.net). From the description it looks like ybc and eblob have similar designs. I should look into its source code more closely.
 
And as I see it, cdn-booster may be DDoS-ed by being made to fetch a very big object from upstream, because it uses io.ReadAll on res.Body.

Yes, I know. This is for the sake of simplicity. Cdn-booster is intended for small and medium-sized objects :) Usually caching proxy servers have high-throughput channels to upstream servers, so the io.ReadAll() call should complete very quickly, and the byte slice returned from io.ReadAll() lives only for a few milliseconds while its content is copied to ybc storage. So a DDoS on cdn-booster is a non-trivial task :)

--
Best Regards,

Aliaksandr

Jeff Mitchell

Jun 17, 2013, 2:36:15 PM
to Aliaksandr Valialkin, Ernest Micklei, golan...@googlegroups.com
Aliaksandr Valialkin wrote:
> Unfortunately I have no near-term plans for POST support, because it
> would complicate the code and it is really hard to satisfy all possible
> use cases with POST caching. The main problem is determining what to use
> as the cache key. With GET this is really simple: use the Request-URI,
> i.e. the string passed in the first line of the HTTP GET request:
>
> GET /RequestURI HTTP/1.x

Although if different clients order the parameters differently -- for
instance, if this is an API that is public -- then this will fail.
Unless you consider that a corner case (I don't) you're better off
hashing the dict of keys/params -- and then you can use that for POST
requests too (if you include both query and form params in the hash).

Obviously this pays a performance penalty. Maybe it's something that could
be offered as an option, to suit different deployments.

FWIW, although I find this project interesting, as far as I can tell
most of the claimed benefits over nginx are either flat out wrong or of
arguable validity/merit, so I haven't yet figured out what would make it
worth switching.

--Jeff

Aliaksandr Valialkin

Jun 19, 2013, 4:57:57 PM
to Vasiliy Tolstov, golan...@googlegroups.com
On Mon, Jun 17, 2013 at 8:36 PM, Aliaksandr Valialkin <val...@gmail.com> wrote:
On Sun, Jun 16, 2013 at 11:21 PM, Vasiliy Tolstov <v.to...@selfip.ru> wrote:
Hello. Thanks for ybc. But I have one question: what is the main difference compared to eblob (http://www.ioremap.net/projects/eblob)?

It looks like the main difference between eblob and ybc is that eblob is a storage, while ybc is a cache. This means you would have to implement item eviction for eblob yourself in the event of storage overflow (akin to nginx's 'cache manager'). Ybc automatically evicts cached items, so the data file size never exceeds the given threshold.
 

I didn't know about eblob, though I read the ioremap blog three years ago, when Evgeniy Polyakov's POHMELFS appeared in the Linux world (primarily on lwn.net). From the description it looks like ybc and eblob have similar designs. I should look into its source code more closely.
 
And as I see it, cdn-booster may be DDoS-ed by being made to fetch a very big object from upstream, because it uses io.ReadAll on res.Body.

Yes, I know. This is for the sake of simplicity. Cdn-booster is intended for small and medium-sized objects :) Usually caching proxy servers have high-throughput channels to upstream servers, so the io.ReadAll() call should complete very quickly, and the byte slice returned from io.ReadAll() lives only for a few milliseconds while its content is copied to ybc storage. So a DDoS on cdn-booster is a non-trivial task :)


The following two quick band-aids could be added in order to avoid this type of DoS attack:
- Read the Content-Length header from the upstream response and stream the response data directly into the cache using ybc's SetTxn.ReadFrom() interface. This way we avoid creating a temporary byte buffer for each item obtained from upstream before it is stored in the cache.
- Use the dogpile-aware Cache.GetDe*() functions for obtaining cached items, so multiple concurrent requests for the same URL missing from the cache won't result in multiple concurrent requests to upstream.
A rough sketch of the first idea follows below.
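
Here is a minimal sketch of that streaming idea using only the standard library; the io.Writer destination stands in for a ybc cache transaction (the actual SetTxn.ReadFrom() wiring is left out), and maxItemSize is an assumed per-item limit rather than an actual go-cdn-booster option:

package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

// Assumed per-item size limit; this is part of the sketch, not a real
// go-cdn-booster setting.
const maxItemSize = 1 << 20

// fetchToCache streams the upstream body into dst instead of buffering
// it with io.ReadAll. dst stands in for a ybc SetTxn or any other cache
// writer.
func fetchToCache(url string, dst io.Writer) error {
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	// Reject obviously oversized items up front via Content-Length.
	if resp.ContentLength > maxItemSize {
		return fmt.Errorf("item too big: %d bytes", resp.ContentLength)
	}

	// Copy at most maxItemSize+1 bytes so an upstream that omits or
	// understates Content-Length still can't blow up memory or disk.
	n, err := io.Copy(dst, io.LimitReader(resp.Body, maxItemSize+1))
	if err != nil {
		return err
	}
	if n > maxItemSize {
		return fmt.Errorf("item exceeds %d bytes", maxItemSize)
	}
	return nil
}

func main() {
	var buf bytes.Buffer // stand-in for the real cache storage
	if err := fetchToCache("http://www.google.com/", &buf); err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	fmt.Println("cached", buf.Len(), "bytes")
}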


--
Best Regards,

Aliaksandr

Aliaksandr Valialkin

Jun 19, 2013, 5:05:33 PM
to Jeff Mitchell, Ernest Micklei, golan...@googlegroups.com
On Mon, Jun 17, 2013 at 9:36 PM, Jeff Mitchell <jeffrey....@gmail.com> wrote:
Aliaksandr Valialkin wrote:
Unfortunately I have no near-term plans for POST support, because it would complicate the code and it is really hard to satisfy all possible use cases with POST caching. The main problem is determining what to use as the cache key. With GET this is really simple: use the Request-URI, i.e. the string passed in the first line of the HTTP GET request:

    GET /RequestURI HTTP/1.x

Although if different clients order the parameters differently -- for instance, if this is an API that is public -- then this will fail. Unless you consider that a corner case (I don't) you're better off hashing the dict of keys/params -- and then you can use that for POST requests too (if you include both query and form params in the hash).


Yes, I consider this a corner case, since static resources usually don't have any query parameters :)
Again, you can easily implement handling for this corner case by converting the RequestURI into a canonical form (e.g. filtering and sorting the query parameters in your case) before creating a cache key from it; a rough sketch is below.
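
As a hypothetical illustration of that canonicalization step (the helper below is mine, not go-cdn-booster's): parse the query string, let url.Values.Encode() sort the parameters, and use the re-encoded URI as the cache key.

package main

import (
	"fmt"
	"net/url"
)

// canonicalKey is a hypothetical helper: url.Values.Encode() sorts
// parameters by key, so two URIs that differ only in parameter order
// map to the same cache key.
func canonicalKey(requestURI string) (string, error) {
	u, err := url.ParseRequestURI(requestURI)
	if err != nil {
		return "", err
	}
	q, err := url.ParseQuery(u.RawQuery)
	if err != nil {
		return "", err
	}
	u.RawQuery = q.Encode() // Encode sorts keys alphabetically
	return u.RequestURI(), nil
}

func main() {
	a, _ := canonicalKey("/img?h=100&w=200")
	b, _ := canonicalKey("/img?w=200&h=100")
	fmt.Println(a == b, a) // true /img?h=100&w=200
}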
 

Obviously this pays a performance penalty. Maybe it's something that could be offered as an option, to suit different deployments.

FWIW, although I find this project interesting, as far as I can tell most of the claimed benefits over nginx are either flat out wrong or of arguable validity/merit, so I haven't yet figured out what would make it worth switching.

Probably. YMMV. I consider the main advantage of cdn-booster over nginx in the caching-proxy area to be its easy-to-read-and-hack code. Not counting the numerous advantages inherited from ybc :)


--
Best Regards,

Aliaksandr