You can't avoid disk I/O if you're serving files - zero copy doesn't make disks faster, it just eliminates some inefficiency once you've *got* the data from a file. But you can mitigate it by choosing your hardware, OS and filesystem well.
Before you do anything I suggest here, see if you can find out *why* your disk I/O is slow. Many files being read at the same time? Failing hardware? Old slow hardware?
If possible, have enough memory that for most things, you're reading from the OS's disk cache in memory, rather than disk. Look at filesystems such as ZFS that let you have a secondary cache on SSD, so you get the capacity of spinning disks with the performance of flash, at least for things that are cached.
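On ZFS, that secondary SSD read cache (the L2ARC) is just a cache device attached to the pool — a config fragment, where "tank" and "/dev/sdb" are placeholders for your pool and SSD device:

```shell
# Attach an SSD as a second-level read cache (L2ARC) to the pool "tank".
zpool add tank cache /dev/sdb

# Verify it appears under the "cache" section of the pool layout.
zpool status tank
```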
On the other hand, if you're serving lots of different, truly massive files concurrently, the cache can become the enemy, since the machine is going to do a lot of work churning bytes into the cache and evicting them almost immediately.
Is the disk actually local? If it's a SAN, all bets are off - that multiplies the number of things that could be going wrong enormously, and if that's the case, the first thing to do is see if it's slow on local disk. If it isn't, you know where your problem is.
If you have the misfortune to be running on Windows, you may be stuck, but hopefully you're not doing that in production. When I was working on NetBeans (an IDE can bang on a lot of files in rapid succession), I even spent some time discussing these problems with Jeffrey Richter at Microsoft - a number of simple operations, like checking if a file exists, would block for several seconds. If I recall correctly, there was a queue somewhere deep in the OS that we were overflowing by queuing up too many I/O operations too fast, which caused problems that only Windows users saw.
Fundamentally what any OS does when you do synchronous file I/O is:
- Put the calling thread to sleep
- Queue up an I/O operation inside the OS
- Move the disk heads to the right place, waiting however long that takes once other pending I/O operations are done
- Copy data into the calling thread's buffer
- Wake the calling thread back up
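You can see that whole sequence from the calling side in a few lines - a minimal sketch (the temp file is just for illustration):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class BlockingRead {
    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".txt");
        Files.write(tmp, "hello".getBytes());

        // The calling thread sleeps right here until the OS has queued the
        // I/O, the data has arrived (from disk or the page cache), and the
        // bytes have been copied into our buffer.
        byte[] data = Files.readAllBytes(tmp);

        System.out.println(data.length + " bytes read");
        Files.delete(tmp);
    }
}
```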
<rant>
The point of that little circus is so we can all program as if we were running on a 286 that couldn't possibly be doing anything other than sitting around waiting for I/O to complete.
How this actually ought to work in a modern computer is much more like network I/O - hand the work to the OS and get a notification from the OS when something is ready (NodeJS fakes this with thread pools). That doesn't necessarily make anything faster, but it moves the programming model much closer to the reality of what you're asking the machine to do.
</rant>
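For what it's worth, Java does expose that completion-style model for files via `AsynchronousFileChannel` - a sketch (hand the read to the channel, get a callback when it finishes; whether it's truly asynchronous underneath depends on the platform):

```java
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CountDownLatch;

public class AsyncRead {
    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("demo", ".txt");
        Files.write(tmp, "hello async".getBytes());
        CountDownLatch done = new CountDownLatch(1);

        try (AsynchronousFileChannel ch =
                 AsynchronousFileChannel.open(tmp, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(64);
            // Hand the read off; this thread is free to do other work.
            ch.read(buf, 0, buf, new CompletionHandler<Integer, ByteBuffer>() {
                @Override
                public void completed(Integer bytesRead, ByteBuffer b) {
                    System.out.println("read " + bytesRead + " bytes");
                    done.countDown();
                }

                @Override
                public void failed(Throwable exc, ByteBuffer b) {
                    exc.printStackTrace();
                    done.countDown();
                }
            });
            done.await(); // only so the demo doesn't exit before the callback
        }
        Files.delete(tmp);
    }
}
```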
If you're blocking your server from accepting new connections, and that's part of "slow", then you may want to move the file I/O into a background thread pool (there's no problem in Netty with multiple threads writing to a channel - just use a ChannelFutureListener to make sure your previous write has flushed before you start the next one).
Okay, that was a bit long.
First, figure out for sure that it's disk I/O:
- Take thread dumps, use a profiler, or preferably both - you'd be surprised how much you can learn just from comparing a bunch of thread dumps
Second, if it really is disk I/O, figure out where your problem really is:
- Not enough memory for cache?
- Slow disks?
- OS that does this sort of thing badly?
- Thrashing cache?
- Machine busy doing something else?
You can test a lot of those less-than-rigorously by swapping out hardware, trying a different OS, adding memory, and so on.
HTH,
Tim