Serving static files


morten...@gmail.com

Apr 11, 2009, 15:28:29
to HAppS
Hi again

I have some thoughts about how to serve static files. I would like your
comments.

There are at least two conceptually different ways for Happstack to
serve static files. One is with fileServe, i.e. reading the file from
disk (or from the OS disk cache) every time a request asks for that
page. The other is to read all static files into memory at Happstack
start-up, and then serve requests straight from memory.

I made a simple test implementation of both, and tested them with
httperf.

I created a file, index.html, of size 1MB in the first test, and 1kB
in the second test.

The code for the disk based solution is:

import Happstack.Server

main = simpleHTTP nullConf {port=8001} $ fileServe [] "."


The code for the memory based solution is:

import Happstack.Server
import qualified Data.ByteString.Lazy.Char8 as B

main = do
  content <- B.readFile "index.html"
  simpleHTTP nullConf {port=8002} $
    return $ (toResponse "") {rsBody = content}


I test them with something like

httperf --hog --num-calls 100 --num-conns 100 --rate 100000

Varying the number of connections and calls.

For the timing I use the total time from httperf, and for memory I use
real memory from Mac OS X's Activity Monitor (or from top).

The memory at start up was 816 kB for the memory solution and 772 kB
for the fileServe solution.

With the 1MB index.html, and num-calls=100, num-conns=100, the results
were:

fileServe: 41 seconds, 85 MB memory at finish.
memory: 26 seconds, 8.8 MB memory at finish.

With the 1kB index.html file, and num-calls=10000, num-conns=100, the
results were:

fileServe: 452 seconds, 8.7 MB memory at finish.
memory: 213 seconds, 7.6 MB memory at finish.

So for large files, the fileServe solution is terrible in terms of
memory, and it is slower than the memory solution. For small files the
relative slowdown is even larger. I would expect that with many static
files, and requests spread across all of them, the memory solution
would come out even further ahead.

It seems like the memory consumption is set by the peak number of
connections, and by the amount of memory each connection thread keeps.
The threads are not garbage collected. However, they are reused later.
Issuing the same requests again leads to a slight increase in memory
but not much. So the memory consumption of Happstack might level off,
but I am not certain about that.

Do you know why the threads are not garbage collected after the
connections close?

I should say that I ran the tests with the newest code from darcs.
Without the latest patch, the memory consumption was much much worse.
But there is still space leakage. Great patch!

The memory solution is easily generalized to all the static files.

In pseudo code

main = do
  read all static files, gzip them, store them in a Data.Map called fileMap
  simpleHTTP conf (memoryServe fileMap) >>= the rest
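The first step of that pseudocode can be sketched in plain Haskell, independent of Happstack. `loadStaticFiles` is a hypothetical helper name; a real `memoryServe` would also walk subdirectories and gzip the contents, which this sketch omits.

```haskell
import qualified Data.Map as M
import qualified Data.ByteString.Lazy.Char8 as B
import System.Directory (getDirectoryContents, doesFileExist)
import System.FilePath ((</>))
import Control.Monad (filterM)

-- Load every regular file in a directory into a Data.Map keyed by
-- file name. Directories (including "." and "..") are filtered out.
loadStaticFiles :: FilePath -> IO (M.Map FilePath B.ByteString)
loadStaticFiles dir = do
  names <- getDirectoryContents dir
  files <- filterM (doesFileExist . (dir </>)) names
  contents <- mapM (B.readFile . (dir </>)) files
  return (M.fromList (zip files contents))
```

Lookups against the resulting Map are then pure and shared by all connection threads.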

There should be no compression filter on the static part. One should
not gzip the same file again and again.

The two big disadvantages of the memory solution are that the files
cannot be changed without restarting Happstack, and that it only works
if the files are not too large. I would claim that both conditions are
satisfied in most cases. A critical web site would have multi-master
redundancy in any case, which makes restarting easy. That leaves giant
files as the only application of fileServe.

By the way, giant files should be gzipped outside the application or
at startup. So static files should never be combined with the
compression filter.
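Compressing once at start-up could look like this; a minimal sketch assuming the zlib package (Codec.Compression.GZip works on lazy ByteStrings, matching the code above):

```haskell
import qualified Codec.Compression.GZip as GZip  -- from the zlib package
import qualified Data.ByteString.Lazy.Char8 as B

-- Gzip a static file once, at start-up, instead of running a
-- per-request compression filter. The compressed bytes would then be
-- stored in the fileMap and served as-is.
main :: IO ()
main = do
  raw <- B.readFile "index.html"
  let gz = GZip.compress raw
  B.writeFile "index.html.gz" gz
```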

At least, that is how I view it.

Comments?

Cheers,

Morten.

Simon Michael

Apr 11, 2009, 15:39:51
to ha...@googlegroups.com
> Comments?

Nothing substantial from me, just thanks for a very informative post.

So you'd suggest happstack provide a limited sort of memcached built in ? Seems like a good idea to me.

MightyByte

Apr 11, 2009, 15:53:46
to HA...@googlegroups.com
I haven't read the original post yet, but I've also been thinking it
would be a good idea to provide some automatic file caching.

Rick R

Apr 11, 2009, 16:00:23
to HA...@googlegroups.com
That's interesting. I would have thought the correct approach would be to load the file on its first request and then cache it in memory, but if fileServe's performance is that bad, it might be best to do as you suggest and load everything into a cache at start-up.

I assume, however, that the performance of fileServe can be improved so that it doesn't require more memory than the cache.

Also, with sendFile available on most Unixes now, that would probably be the optimal option.
--
We can't solve problems by using the same kind of thinking we used when we created them.
   - A. Einstein

Gregory Collins

Apr 11, 2009, 16:09:20
to HA...@googlegroups.com
MightyByte <might...@gmail.com> writes:

> I haven't read the original post yet, but I've also been thinking it
> would be a good idea to provide some automatic file caching.

I'm going to chime in to respectfully disagree.

Any file that is served this often will be sitting in the Linux buffer
cache. I'll bet dollars-to-donuts that the reason for the 2x performance
penalty is the kernel -> userland -> kernel data copy that fileServe
does, and the extra context switch that comes with it.
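The copy Gregory describes can be made concrete. This is an illustrative sketch, not a Happstack function: a userland file server pulls the bytes out of the kernel into the process and pushes them back into the kernel through the socket's Handle, which is exactly the round trip sendfile(2) avoids.

```haskell
import qualified Data.ByteString.Lazy as B
import System.IO

-- Per-request work of a userland file server: kernel -> userland
-- (readFile) followed by userland -> kernel (hPut on the socket
-- handle). sendfile(2) lets the kernel move the bytes directly.
copyViaUserland :: FilePath -> Handle -> IO ()
copyViaUserland path h = B.readFile path >>= B.hPut h
```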

If we could figure out how to teach happstack to use sendfile(), the
issue would be moot.

G
--
Gregory Collins <gr...@gregorycollins.net>

morten...@gmail.com

Apr 12, 2009, 03:42:33
to HAppS
Hi

Rick, I agree that the file should be loaded at the first request, and
not right at start-up. But that is what always happens anyway, due to
lazy evaluation; it is automatic. In my example, the starting memory
footprint was just 800 kB, so clearly the 1 MB file had not been read
into memory at that stage. At the first request, memory jumps to around
2.5 MB, and then increases only slightly for the second request. The
fileServe version, however, grows more even at the second request.

Rick and Greg, about sendFile: sure, it would probably solve the issue
for static files. However, there is a similar issue for template HTML.
I made a test where I inserted a timestamp in the middle of the
index.html file, again comparing "one file read into memory at the
beginning" with "file read at every request".

The simple example code looks like

import Happstack.Server
import qualified Data.ByteString.Lazy.Char8 as B
import System.CPUTime
import Control.Monad.Trans

main = do
  content <- B.readFile "index.html"
  simpleHTTP nullConf {port=8002} $ do
    time <- liftIO getCPUTime
    let (head', tail') = B.splitAt (B.length content `div` 2) content
        timeB = B.pack $ show time
    return $ (toResponse "") {rsBody = B.concat [head', timeB, tail']}

and

import Happstack.Server
import qualified Data.ByteString.Lazy.Char8 as B
import Control.Monad.Trans
import System.CPUTime

main = simpleHTTP nullConf {port=8001} $ do
  content <- liftIO $ B.readFile "index.html"
  time <- liftIO getCPUTime
  let (head', tail') = B.splitAt (B.length content `div` 2) content
      timeB = B.pack $ show time
  return $ (toResponse "") {rsBody = B.concat [head', timeB, tail']}

This should simulate a template engine.

The speed and memory was basically the same as above.

So people who use template HTML in Happstack might still consider
reading the templates into memory at start-up. All the connections can
then share memory, minimizing memory leakage, and also saving the read
from the OS cache or, in the worst case, the disk.

As a side note, the best way to populate template HTML with values is
probably in JavaScript in the client browser, meaning that all files
are actually static. The state is then communicated to the browser via
Ajax or something similar.

I tried running the same httperf test with the 1MB file on Apache. It
crashed! I didn't do any tuning of Apache, but still.

Morten.

Gregory Collins

Apr 12, 2009, 13:52:20
to HA...@googlegroups.com
"morten...@amberbio.com" <morten...@gmail.com> writes:

> Hi
>
> So, people who use template html in Happstack, might still consider
> reading the templates into memory at start up.

Hi,

That's exactly the way HStringTemplate works by default.

G.
--
Gregory Collins <gr...@gregorycollins.net>

Vagif Verdi

Apr 12, 2009, 14:40:04
to HAppS


On Apr 11, 11:28 am, "morten.kr...@amberbio.com"
<morten.kr...@gmail.com> wrote:
> The two big disadvantages with the memory solution is that the files
> cannot be changed without restarting Happstack

Actually this is easy to work around. Just create a URL command in your
web application to refresh the files: www.myapp.com/admin/refreshstaticfiles
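One way this suggestion could look, as a sketch: keep the file map in an IORef and give the admin handler the refresh action. The names `newCache` and `loadAll` are hypothetical, not part of Happstack.

```haskell
import Data.IORef
import qualified Data.Map as M
import qualified Data.ByteString.Lazy.Char8 as B

-- Build a mutable cache around a loader action. Returns the cache
-- reference (read by the static-file handler) and a refresh action
-- (called by a hypothetical /admin/refreshstaticfiles handler).
newCache :: IO (M.Map FilePath B.ByteString)
         -> IO (IORef (M.Map FilePath B.ByteString), IO ())
newCache loadAll = do
  ref <- newIORef M.empty
  let refresh = loadAll >>= writeIORef ref
  refresh                       -- initial load at start-up
  return (ref, refresh)
```

Readers only ever see a complete map, since `writeIORef` swaps in the newly loaded map atomically.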

morten...@gmail.com

Apr 12, 2009, 16:42:42
to HAppS
Nice. I hadn't looked into HStringTemplate at all.

About sendFile, which seems to be the optimal solution for static
files: it doesn't seem to me that there is a way to implement it in
fileServe. It is simpleHTTP that would need to be rewritten to accept
a sendFile system call instead of a bytestring as the response body.
That would change the whole type system of Happstack. Or am I wrong?

Vagif, I understand in principle what you mean, but I don't see how to
implement it here without getting the bad memory performance back.
Do you want to put the static files into State? That might avoid the
file reads, but it would be bad in several other respects.
Or do you want to update something outside State on the fly? Is that
even possible?

Anyway, I don't think Happstack restart is a big problem. It can be
done fast, and with a multimaster setup it will lead to no downtime.
It might
be necessary in any case unless space leakage is completely eliminated
from Happstack.

Morten.

On Apr 12, 7:52 pm, Gregory Collins <g...@gregorycollins.net> wrote:
> "morten.kr...@amberbio.com" <morten.kr...@gmail.com> writes:
> > Hi
>
> > So, people who use template html in Happstack, might still consider
> > reading the templates into memory at start up.
>
> Hi,
>
> That's exactly the way HStringTemplate works by default.
>
> G.
> --
> Gregory Collins <g...@gregorycollins.net>

Gregory Collins

Apr 12, 2009, 18:45:44
to HA...@googlegroups.com
"morten...@amberbio.com" <morten...@gmail.com> writes:

> Nice. I hadn't looked into HStringTemplate at all.
>
> About sendFile, which seems to be the optimal solution for static
> files. It doesn't seem to me that there is a way to implement it in
> fileServe. It is simpleHTTP that would need to be rewritten to accept
> a sendFile system call instead of a bytestring as the response body.
> That would change the whole type system of Happstack. Or am I wrong?

No, that's exactly right. I've been percolating on this for a while;
there are a couple of ways it could go:

* Change ServerPartT/WebT. The output type of WebT would change; instead
of "Maybe (Either Response a, FilterFun Response)" (where "a" will
eventually get a "ToMessage" class constraint), you'd change it to
something like this:

data WebTResponse a = DontHandle
                    | ShortCircuit (Response, FilterFun Response)
                    | Finished (a, FilterFun Response)
                    | HandledViaRawSocket Bool

(Obviously we probably wouldn't export these constructors.)
"HandledViaRawSocket" would be returned if the programmer decided to
use raw I/O; the burden of properly following the HTTP protocol would
be left to the programmer (although we'd provide useful helper
functions). The argument to "HandledViaRawSocket" would indicate
keepalive status. I imagine that you'd call a function like this:

withRawSocket :: MonadIO m => (Request -> Socket -> IO Bool)
              -> ServerPartT m ()

The downside to this is that all of the guts would have to change, but
if you were careful you might get away with not having to change any
client code.


* Create an alternative to simpleHTTP' that would accept ServerPartTs as
they exist now, and also another type of raw handler:

newtype RawServerPartT m a = RawServerPartT {
      unRawServerPartT :: ReaderT Request (RawWebT m) a
    }

newtype RawWebT m a = RawWebT {
      unRawWebT :: ErrorT (Socket -> m Bool) m a
    }

I'm not very satisfied with this, but maybe you get the idea; if you
call "withRawSocket" then subsequent monad processing ceases and it'll
run your IO action. (i.e. you don't get to grab the socket unless
you're going to commit to handling the request.) Otherwise, we ignore
the result and treat it as a refusal to handle the request. The driver
function would look something like this:

complicatedHTTP' :: (Monad m, ToMessage b) =>
                    Conf
                 -> (m Bool -> IO Bool)
                 -> RawServerPartT m c
                 -> (m (Maybe (Either Response a, FilterFun Response))
                     -> IO (Maybe (Either Response b, FilterFun Response)))
                 -> ServerPartT m a
                 -> IO ()

The other solution is a little more consistent but this one has the
advantage that it doesn't disturb any of the other code.

morten...@gmail.com

Apr 13, 2009, 11:09:50
to HAppS
My test above was performed on localhost. In a real setting, the
internet connection is likely the bottleneck, so the speed difference
is probably irrelevant. The memory consumption is more serious. But it
seems to me, from further testing, that the memory consumption levels
off and is simply set by the peak load. So the sendFile issue should
only be relevant for very large files. But there it will be important.

Greg, I prefer your first solution. However, I don't think that WebT
should contain low-level info like a raw socket. What about just
having a constructor like "FileSend FilePath" and letting simpleHTTP
perform the task? Actually, how do you want the WebT to connect to
the raw socket? It doesn't know anything about ports and connections.
Also, why do you want a boolean in the raw-socket part but not in the
other parts?

Morten.

Matthew Elder

Apr 13, 2009, 12:27:35
to HA...@googlegroups.com
This is mostly true: you can have it load lazily, at the beginning, or on demand. So yeah, HStringTemplate has these "batteries included" already.

Matthew Elder

Apr 13, 2009, 12:35:10
to HA...@googlegroups.com
While I am not fully convinced that the userland context switch is responsible for this speed decrease, I agree with Gregory's sentiment. While happstack is platform independent, anyone considering a scalable solution would pick Linux (or some Unix variant). And remember, these are static files we are talking about. I think the sendfile solution, though it is low-level and not pure Haskell, is much simpler and more straightforward than implementing our own caching system, which would duplicate effort since Linux already has really nice file caching.

If people want to serve static files on a non-sendfile platform and just have to have that extra performance, they can use something like nginx or lighttpd as a reverse-proxy frontend. Static file caching is NOP (not our problem) tm. I'd much rather focus on the application development platform than on low-level I/O bottlenecks.
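A reverse-proxy setup of the kind described might look like the following nginx fragment. This is a sketch; the paths and ports are made up, and `gzip_static` requires nginx to be built with the gzip_static module (it serves a pre-compressed `.gz` file when one exists, matching the gzip-at-startup idea above).

```nginx
server {
    listen 80;

    # nginx serves static files directly from disk.
    location /static/ {
        root /var/www;
        gzip_static on;   # serve foo.gz if present, no per-request gzip
    }

    # Everything else is proxied to the Happstack app.
    location / {
        proxy_pass http://127.0.0.1:8001;
    }
}
```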
 
-- Matt

MightyByte

Apr 13, 2009, 12:47:08
to HA...@googlegroups.com
My primary concern was that the caching be transparent, and a sendfile
solution certainly qualifies. My wording didn't indicate that because
I had only given the problem cursory thought and had not considered
the OS-level solution. I don't have a problem with leaving non-*nix
OS's out here, since that's not an obstacle for me.

Matthew Elder

Apr 13, 2009, 12:50:55
to HA...@googlegroups.com

> My primary concern was that the caching be transparent, and a sendfile
> solution certainly qualifies.  My wording didn't indicate that because
> I had only given the problem cursory thought and had not considered
> the OS-level solution.  I don't have a problem with leaving non-*nix
> OS's out here, since that's not an obstacle for me.
I did not intend my response as an attack :) I liked seeing several sides to the argument.
 
So, who will be the first to send a sendfile patch ? :)
 
-- Matt

MightyByte

Apr 13, 2009, 12:57:05
to HA...@googlegroups.com
Oh, sorry if my tone came across as defensive. I didn't mean it to be
(and didn't interpret your response as an attack). I was just doing
some retraction and clarification of my previous statement.

morten...@gmail.com

Apr 13, 2009, 16:24:41
to HAppS
Matthew, I agree with you that it is better to focus on the dynamic
part of the application. However, the solution I sketched in the
beginning, loading the static files at start-up, is very easy.
SendFile is another story, of course.
Large files might just be served by nginx instead.

stepcut

Apr 13, 2009, 23:00:02
to HAppS
On Apr 11, 2:28 pm, "morten.kr...@amberbio.com"
<morten.kr...@gmail.com> wrote:
> Hi again

> It seems like the memory consumption is set by the peak number of
> connections, and by the amount of memory each connection thread keeps.
> The threads are not garbage collected.

GHC does not ever return memory to the OS once it has allocated it, so
that might be why:

http://hackage.haskell.org/trac/ghc/ticket/698

In terms of file serving efficiency, we have pondered the idea of
using hyena instead of the current simpleHTTP.

It would be interesting to see how hyena compares, and if hyena sees
similar benefits when files are preloaded into RAM.

http://github.com/tibbe/hyena/tree/master

- jeremy