— Micheil Smith
> --
> You received this message because you are subscribed to the Google Groups "nodejs" group.
> To post to this group, send email to nod...@googlegroups.com.
> To unsubscribe from this group, send email to nodejs+un...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/nodejs?hl=en.
You might want to consider reading in the system mime types. Maybe
like this: http://github.com/rsms/oui/blob/master/oui/mimetypes.js
--
Rasmus Andersson
Creating HTTP server listener...
Listening for incoming requests...
Client 127.0.0.1 Connection established
Client 127.0.0.1 Connection established
buffer:73
    throw new Error('Unknown encoding');
    ^
Error: Unknown encoding
    at Buffer.write (buffer:73:13)
    at IncomingMessage.authorize (/home/bookmark/Desktop/nodejs/nitrode/index.js:164:43)
    at [object Object].handle (/home/bookmark/Desktop/nodejs/nitrode/index.js:179:28)
    at Server.<anonymous> (/home/bookmark/Desktop/nodejs/nitrode/index.js:117:73)
    at Server.emit (events:33:26)
    at HTTPParser.onIncoming (http:825:10)
    at HTTPParser.onHeadersComplete (http:87:31)
    at Stream.ondata (http:757:22)
    at IOWatcher.callback (net:517:29)
    at node.js:266:9
Nitrode is cool though, just had to defend my project :P
> Keep in mind, it would take a hundred different ones. And of course, that's a really uncommon scenario (if you're serving 2mb files you probably want S3 or similar)
>
Exactly my point, connect caches the results by unique request. You can serve a 650mb iso with connect with 100 concurrent users and it won't use much more than 650mb of ram, and it will only hit the disk once. It's when you have more unique data than ram that it becomes a problem, and only then if all those requests come in before the cache timeout. The default cache time in connect is 0, so it's a highly unlikely situation.
Besides, like I said, most people are used to serving large files with a system like s3 or nginx/apache on their server.
I would love to collaborate and share code; that's what connect is all about. It's meant to be a set of libraries and tools for framework makers. Express has made good use of it and drastically reduced its code size.
-Tim
> Caching doesn't require that you always load an entire query in to memory or that you hold it in memory before you begin to return the response.
>
> A chain of sys.pump calls is the best use of memory and you can just keep an extra data listener on the last stream to write to cache, then subsequent requests can pull out of the cache.
>
> These two ideas aren't mutually exclusive and should be complementary.
>
> -Mikeal
No offense, but in my experience sys.pump is still way too slow, and putting a cache on the end of that would be the worst of both worlds: you get large latency on the initial request, and all the memory savings from the expensive pump are lost in the cache.
I think a switch that streams large files and caches small ones is better.
The reason it's slow is that disk I/O must be done in the eio
thread pool and is therefore throttled: each time pump sends through
one chunk (say, 4 kB), that results in queueing a read in eio, waiting
for it to be dequeued, and then continuing.
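To put a rough number on that throttling (back-of-the-envelope arithmetic, not a benchmark), every chunk costs one round trip through the eio queue, so the count grows linearly with file size:

```javascript
// Back-of-the-envelope: one eio queue round trip per chunk.
const chunkSize = 4 * 1024;          // the 4 kB chunk size mentioned above
const fileSize = 650 * 1024 * 1024;  // the 650 MB iso from earlier in the thread
const roundTrips = Math.ceil(fileSize / chunkSize);
console.log(roundTrips); // 166400 queue/dequeue cycles for one download
```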
>
> The other advantage of using sys.pump is the reduced memory usage and
> the absence of any file-size limit. Projects like connect read the entire
> file, storing it in memory, before sending it through to the client in
> one big chunk. This works fine, and in fact has the best performance,
> but given that the file has to be stored in memory, with a lot
> of clients downloading files this will become a significant
> problem... You could of course cache these files in memory to make
> this process more efficient, but the logic required to ensure not too
> much memory is used, and to monitor file changes, would just be too much
> of a pain to add and would probably contribute to a lot more
> overhead.
>
> Linux has a function called sendfile
> (http://www.kernel.org/doc/man-pages/online/pages/man2/sendfile.2.html)
> which is what sys.pump will hopefully use in future, which would
> significantly increase performance by offloading the I/O to the
> system kernel.
It's not limited to Linux (other unixes have it too), but it can only
read from file descriptors representing data on disk. It's available
already in node through the fs module:
fs.sendfile(outFd, inFd, inOffset, length, callback)
--
Mikeal,

I guess that comment was aimed at Tim. All sys.pump does is copy data
from one stream to another, waiting for the written data to be flushed
before reading and then writing any more. This process is slow, as you
correctly said, because data is by default sent in 4 KB chunks, which
works OK for small files, but with large files you end up with 500k
chunks, which really is a problem.

The advantage of smaller chunks is that between the read stream and
the write stream, node.js holds only that chunk in memory. If you send
the file in one chunk the size of the file itself, then the whole file
is stored in memory. So the best idea would be to create some algorithm
that establishes the best chunk size based on the file size and the
memory available, remembering that the larger the chunk size, the more
efficient it is. This is easily done using fs.createReadStream, with
an option (I think it's bufferSize) that lets you define the size
of the chunk. When playing with this I noticed that using sys.pump on
a read stream with the buffer size the same as the file size had the
same performance as using fs.readFile and sending the data straight
(as connect does it). This is in no way surprising, as both methods
do (or at least should; I haven't checked) use the same construct.

So the problem isn't with sys.pump at all, it's with how you've set up
the read stream you're dealing with. I will continue to use
sys.pump wherever possible and just make sure I optimise the read
stream to have the most appropriate chunk size. Having said all that,
the ideal solution would be to use the sendfile and sendfile64
calls, which are faster than manually using read/write commands since
they're handled internally by the kernel. However, this only works with
file read streams and not TCP sockets, and so complicates the use of
automatic GZip compression streams.
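The "best chunk size" algorithm suggested above could be sketched like so. The formula and the constants are illustrative guesses for the sake of the sketch, not anything node itself ships:

```javascript
// Illustrative heuristic: bigger chunks for bigger files, clamped so a
// single in-flight chunk never exceeds a fixed cap or the memory budget.
// All constants here are made up for the sketch.
function pickBufferSize(fileSize, memoryBudget) {
  const MIN = 4 * 1024;      // never go below the classic 4 kB
  const MAX = 1024 * 1024;   // never hold more than 1 MB per chunk
  // Aim for roughly 64 chunks per file, then clamp.
  const target = Math.ceil(fileSize / 64);
  return Math.max(MIN, Math.min(MAX, memoryBudget, target));
}

// The result would feed fs.createReadStream's chunk-size option
// (bufferSize in early node; highWaterMark in later releases).
console.log(pickBufferSize(16 * 1024, Infinity));         // 4096: small file stays at the minimum
console.log(pickBufferSize(650 * 1024 * 1024, Infinity)); // 1048576: big file capped at 1 MB
```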
IMO, this is why sys.pump should not do sendfile, ever. Rather,
sendfile should be something that Nitrode or Connect does when it
knows that it can serve the response directly from an on-disk cache,
bypassing the whole "middleware" concept at that point.
Even if the input stream is a file descriptor, and the output stream
is a socket, the actual "Stream" object might have been mutated such
that it changes the incoming data before sending it out. sys.pump
needs to be completely agnostic and accepting of *anything* that
matches the Stream API, or else it's too magical to be reliable.
It's not that we haven't figured out how to do it, I don't think. I
think we've figured out that we can't, and that sendfile is just a
different thing than sys.pump.
--i
--
> Hi Olly,
>
> I didn't expect my question on performance would trigger this
> (interesting) discussion. In any case, all this is way over my head ;)
>
> My use case is as follows:
> I am developing a web app. This implies serving a couple of small
> static files to the client once (using application cache, only the
> meta file is then checked for updates from time to time and triggers
> file re-downloads when available...). The rest of the dynamic data
> (coming from MongoDB) is transmitted via websocket (or downgraded to a
> comet like system if the browser does not support it).
>
> So ideally (in this use case...), static files would be loaded once,
> compressed if necessary and served from memory.
>
> I believe that this scheme is exactly what NodeJS can excel at. I hope
> Nitrode will enable this.
>
> Cheers,
> Peter
>
That is exactly connect's default behavior with a simple stack.
Connect.createServer(
  Connect.cache(1000),
  Connect.gzip(),
  Connect.staticProvider()
);
will serve the small files from disk, compress the compressible ones, and cache them in RAM as buffers for super fast serving.
Socket.io or other websockets work fine with connect. They just hijack the request handler outside of the connect stack.