Is it a bad idea to use synchronous filesystem methods in a Dart web server?


Danny Tuppeny

Sep 20, 2014, 5:07:33 AM
to mi...@dartlang.org

I'm playing around with HttpServer, and was adding support for serving static files (I'm aware of Shelf; I'm doing this as a learning exercise). I have a list of handlers that are each given the opportunity to handle the request in sequence (stopping at the first that handles it):

import 'dart:io';

const handlers = const [
  handleStaticRequest
];

handleRequest(HttpRequest request) {
  // Run through all handlers; and if none handle the request, 404
  if (!handlers.any((h) => h(request))) {
    request.response.statusCode = HttpStatus.NOT_FOUND;
    request.response.headers.contentType = new ContentType("text", "html");
    request.response.write('<h1>404 File Not Found</h1>');
    request.response.close();
  }
}

However, as I implemented the static file handler, I realised that I couldn't return true/false directly (which the handleRequest code above requires, to signal whether the request was handled) unless I used file.existsSync().

In something like ASP.NET, I wouldn't think twice about a blocking call in a request because it's threaded; however in Dart, it seems like it would be a bottleneck if every request is blocking every other request for the duration of IO hits like this.

So, I decided to have a look in Shelf, to see how that handled this; but disappointingly, that appears to do the same (in fact, it does several synchronous filesystem hits).

Am I overestimating the impact of this, or is this a bad idea for a Dart web service? I'm not writing Facebook, but I'd still like to learn to write things in the most efficient way.

If this is considered bad, is there a built-in way of doing "execute these futures sequentially until the first one returns a match for this condition"? I can see Future.forEach, but that doesn't have the ability to bail out early. I guess "Future.any" is probably what it'd be called if it existed (but that doesn't)?
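For what it's worth, a minimal sketch of that "sequential futures, stop at the first match" idea (using modern async/await syntax; the helper name `firstHandled` is made up, not a dart:async API): await each handler in order and bail at the first that returns true, mirroring the synchronous `handlers.any(...)` loop.

```dart
// Hypothetical helper: run async handlers in order, stop at first match.
Future<bool> firstHandled<T>(
    List<Future<bool> Function(T)> handlers, T request) async {
  for (final handler in handlers) {
    if (await handler(request)) return true; // bail at the first match
  }
  return false; // no handler claimed the request; caller can 404
}

Future<void> main() async {
  final calls = <String>[];
  final handled = await firstHandled<String>([
    (r) async { calls.add('a'); return false; },
    (r) async { calls.add('b'); return true; },
    (r) async { calls.add('c'); return false; }, // never reached
  ], 'request');
  print(handled);         // true
  print(calls.join(',')); // a,b
}
```

Note the third handler never runs, which is exactly the fall-through behaviour the synchronous `any` gives.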

Adam Stark

Sep 21, 2014, 10:16:13 AM
to mi...@dartlang.org
What you're describing is potentially a bottleneck, but how much of one would depend on your system.

I'd say the existsSync check might be slow on machines that have to get a lot of stuff off a remote server. At least you're not using readAsStringSync. Test it both ways, see what works.

Even if you were to have handlers be a List<Future<bool>>, your second or third handler would have to wait for all the previous handlers to fail before examining the request.

--
For other discussions, see https://groups.google.com/a/dartlang.org/
 
For HOWTO questions, visit http://stackoverflow.com/tags/dart
 
To file a bug report or feature request, go to http://www.dartbug.com/new

To unsubscribe from this group and stop receiving emails from it, send an email to misc+uns...@dartlang.org.



--
Adam Stark

Danny Tuppeny

Sep 21, 2014, 10:30:41 AM
to mi...@dartlang.org
On 21 September 2014 15:16, Adam Stark <llama...@gmail.com> wrote:
What you're describing is potentially a bottleneck, but how much of one would depend on your system.

I'd say the existsSync check might be slow on machines that have to get a lot of stuff off a remote server. At least you're not using readAsStringSync. Test it both ways, see what works.

I did a little benchmarking earlier (with just two sync methods; the exists and read) and the difference was measurable, but not significant. I was hoping to make a blog post out of it; but I don't think it'd be very interesting without more sync operations; and building a "real world" example with more sync operations is more effort than I want to spend ;(

 
Even if you were to have handlers be a List<Future<bool>>, then your second or third handler would have to wait for all the previous handlers to fail before examining the request.

Yeah; when I started this I thought it made sense to serve static files over dynamic content, and it's the only way to fall-through this way. However, I now think it'd make more sense to go the other way; with some better way of mapping routes into handlers and the static handler last.

Needs more thought! :)

Arron Washington

Sep 21, 2014, 4:15:46 PM
to mi...@dartlang.org
In the Ruby on Rails world the convention is generally to let a multi-threaded, optimized web server (Apache, nginx) serve all your static content and let your app server handle the rest via a reverse proxy. Then you add a CDN like CloudFront on top of that as your asset host, letting your Apache/nginx setup act as the "origin server" for CloudFront.

Presto chango, scalability and magic beans!

I don't think I would ever synchronously serve up files in a single-threaded environment -- outside of development machines, of course. It seems too hard to scale up with traffic volume.

Greg Lowe

Sep 21, 2014, 5:14:12 PM
to mi...@dartlang.org
If you're running behind nginx or apache, have a look at x-accel, and sendfile. This allows you to run some logic in your Dart code, and then handover the serving of the file to the highly optimised web server. This also frees up your Dart process to handle more jobs. http://wiki.nginx.org/X-accel
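The handover Greg describes can be sketched in a few lines (the `/protected/` prefix and file path here are hypothetical, and the matching nginx `internal` location block is assumed to exist): the Dart handler runs its logic, then answers with an `X-Accel-Redirect` header instead of streaming the bytes itself.

```dart
import 'dart:io';

// Sketch: hand the actual file transfer off to nginx via X-Accel-Redirect.
// The '/protected/' prefix must match an `internal` location in nginx.conf.
void serveViaNginx(HttpRequest request, String relativePath) {
  request.response
    ..headers.set('X-Accel-Redirect', '/protected/$relativePath')
    ..close();
}

Future<void> main() async {
  // Exercise the handler against a loopback server to show the header.
  final server = await HttpServer.bind(InternetAddress.loopbackIPv4, 0);
  server.listen((req) => serveViaNginx(req, 'images/logo.png'));

  final client = HttpClient();
  final req =
      await client.getUrl(Uri.parse('http://127.0.0.1:${server.port}/'));
  final res = await req.close();
  print(res.headers.value('x-accel-redirect')); // /protected/images/logo.png
  await res.drain<void>();
  client.close();
  await server.close(force: true);
}
```

Without nginx in front, the header is just passed through to the client; the win only materialises behind a proxy configured for it.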

I use the sync APIs when I'm writing simple command line scripts. I'd recommend using the async APIs for a web app, unless high throughput isn't a concern.

Danny Tuppeny

Sep 22, 2014, 2:06:45 AM
to mi...@dartlang.org


On 21 Sep 2014 21:15, "Arron Washington" <l33...@gmail.com> wrote:
>
> In the Ruby on Rails world the convention is generally to let a multi-threaded, optimized web server (Apache, nginx) serve all your static content and let your app server handle the rest via a reverse proxy. Then you add a CDN like CloudFront on top of that as your asset host, letting your Apache/nginx setup act as the "origin server" for CloudFront.

Sometimes a CDN is an unnecessary complication (this is only a toy app to learn some Dart, and CDNs can make automated deployments more involved), but also, static content isn't the only thing you might hit disk for. If you have server-side view templates, they're going to come from local storage. The question was more about the impact of stalling a single-threaded web server for IO (and since Shelf does it, I wanted to know if I'd overestimated the impact).

Anders Johnsen

Sep 22, 2014, 2:41:48 AM
to General Dart Discussion
The short answer is 'it depends'.

Reading files synchronously will block the main Dart thread, something you'd normally prefer to avoid. However, when you use non-blocking IO calls, they are offloaded to other threads, and that introduces a minor overhead. That overhead has to be taken into consideration, as well as the estimated time the call will block. Two examples:

Few, small files, on a local disk. In this case, the kernel will most likely have cached the files in memory, making the actual IO call very fast. Here it will most likely be faster to do a blocking call than to off-load the call through an async call.

Many, large files, on a remote disk. In this case, the kernel will have a harder time caching the content of the files, and there may be actual reads going on. A read can easily take 5ms+ (large files/remote disks), so here it is definitely worthwhile to use async file reads.
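Anders' first case (a small file the kernel has cached) is easy to probe with a rough micro-benchmark sketch like the one below. The file name and iteration count are arbitrary, and as the thread notes, the numbers are machine-dependent and not representative of production load.

```dart
import 'dart:io';

// Rough sketch: time N sync reads vs N async reads of one small file.
// After the first read the kernel will almost certainly serve it from
// the page cache, so this mostly measures call overhead, not disk IO.
Future<void> main() async {
  final file = File('bench.tmp')
    ..writeAsBytesSync(List<int>.filled(4096, 0)); // small, will be cached

  final sw = Stopwatch()..start();
  for (var i = 0; i < 1000; i++) {
    file.readAsBytesSync(); // blocks the main thread briefly each time
  }
  print('sync:  ${sw.elapsedMicroseconds} us');

  sw.reset();
  for (var i = 0; i < 1000; i++) {
    await file.readAsBytes(); // off-loaded; pays the scheduling overhead
  }
  print('async: ${sw.elapsedMicroseconds} us');

  file.deleteSync();
}
```

For the second case (large files on slow storage) you'd expect the ordering to flip, since the blocked time dwarfs the async bookkeeping.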

I hope this can help you to decide on what to do. 

Cheers,

- Anders


Arron Washington

Sep 22, 2014, 3:13:17 AM
to mi...@dartlang.org
Well, loading a template is generally a few file system calls, versus loading many, many assets like stylesheets, vendored JavaScript files, images and sprites, etc. I guess it's less of an issue if you can compile everything down into one large spritesheet, one large JavaScript file, and one large stylesheet.

I see what you're saying though.

By the way, you should check out Cloudfront's origin server feature if you consider CDN deployments hard. They basically proxy a single request to your server to fetch an image when a client requests it from them, then serve it up from their edge locations:

http://myapp.cloudfront.com/homepage_image.png --> http://myapp.com/homepage_image.png (and then never again... until 12 hours pass, anyway).

If you hash your assets on deploy you never have to manually invalidate the CDN yourself, since the filename will always change.


--
- AW

Danny Tuppeny

Sep 22, 2014, 4:19:26 AM
to mi...@dartlang.org
On 22 September 2014 08:13, Arron Washington <l33...@gmail.com> wrote:
Well, loading a template is generally a few file system calls, versus loading many, many assets like stylesheets, vendored JavaScript files, images and sprites, etc. I guess it's less of an issue if you can compile everything down into one large spritesheet, one large JavaScript file, and one large stylesheet.

I see what you're saying though.

By the way, you should check out Cloudfront's origin server feature if you consider CDN deployments hard. They basically proxy a single request to your server to fetch an image when a client requests it from them, then serve it up from their edge locations:

http://myapp.cloudfront.com/homepage_image.png --> http://myapp.com/homepage_image.png (and then never again... until 12 hours pass, anyway).

If you hash your assets on deploy you never have to manually invalidate the CDN yourself, since the filename will always change.

Now you mention this; a colleague had told me that's how we use CloudFront at work; I must've been having a blonde moment! :O)

(though it's not all that useful for now; I'm just messing with Dart, I'm not building the next Facebook ... yet ;-D)

Danny Tuppeny

Sep 22, 2014, 4:28:38 AM
to mi...@dartlang.org
On 22 September 2014 07:41, 'Anders Johnsen' via Dart Misc <mi...@dartlang.org> wrote:
The short answer is 'it depends'.

Reading files synchronously will block the main Dart thread, something you'd normally prefer to avoid. However, when you use non-blocking IO calls, they are offloaded to other threads, and that introduces a minor overhead. That overhead has to be taken into consideration, as well as the estimated time the call will block.

I did some testing over the weekend with a small script that just did an existsSync() and readAsBytesSync() vs an async version. I tested it on xs/s/m Azure VMs (shared core, single core, dual core) with 200, 600, 2000 requests/sec for 60 seconds (using loader.io).

I was hoping to make a blog post from it, but the results were so predictable and consistent for all combinations, I decided it wasn't worth it! In short:
  • The sync version had a higher response time by a few %
  • The sync version had a faster minimum response time; it returned faster for the first few requests during ramp-up (less contention, so requests weren't affected by many IO stalls)
The disk cache did not seem to offset the async overhead; it was still slower on average over the tens of thousands of requests.

I think this makes the decision easy if you expect to have your boxes loaded, but still leaves the decision a bit fuzzy if you expect low load! Personally, I'm going to stick with avoiding the sync APIs, but I think the decision is less clear-cut than I expected (I guess the overhead of using the async APIs is higher than I expected; I assumed it'd always be faster than any sort of disk IO).

Anders Holmgren

Sep 22, 2014, 5:26:49 PM
to mi...@dartlang.org
I'm surprised to learn that shelf_static is using sync calls. I've been obsessive about doing async in all my shelf packages, as I've been going on the assumption that async is best for throughput/scalability, and throughput/scalability is king for most/all web server apps.

Kevin Moore, do you have any thoughts on this? Did you code shelf_static that way mainly for convenience or was it some other reasoning?

A

Kevin Moore

Sep 29, 2014, 10:45:21 PM
to mi...@dartlang.org
I did a set of benchmarks with large and small files. It turns out that using the sync functions is faster across the board, although as the size of the file gets bigger the "win" for using sync functions starts to drop off.

The key bit: sending the actual file contents is done async. The only bits that are sync are checking for a file existing, etc.

My tests did not try to use many different files. There's a chance my benchmark is benefitting from caching of FS metadata.

Certainly worth a deeper look.
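The split Kevin describes (sync for metadata, async for the body) can be sketched roughly as follows; the handler name and webRoot layout are my own invention, not shelf_static's actual code. The cheap existence check stays synchronous, while the potentially large body is streamed asynchronously so the event loop isn't blocked for the duration of the transfer.

```dart
import 'dart:convert';
import 'dart:io';

// Hypothetical static handler: sync metadata check, async body streaming.
// Returns true if this handler took the request.
bool handleStaticRequest(HttpRequest request, String webRoot) {
  final file = File('$webRoot${request.uri.path}');
  if (!file.existsSync()) return false; // sync: usually a fast metadata hit
  // Async: stream the contents; pipe() closes the response when done.
  file.openRead().pipe(request.response);
  return true;
}

Future<void> main() async {
  // Serve one temp file over loopback to show the handler in action.
  final dir = Directory.systemTemp.createTempSync();
  File('${dir.path}/hello.txt').writeAsStringSync('hi');

  final server = await HttpServer.bind(InternetAddress.loopbackIPv4, 0);
  server.listen((req) {
    if (!handleStaticRequest(req, dir.path)) {
      req.response
        ..statusCode = HttpStatus.notFound
        ..close();
    }
  });

  final client = HttpClient();
  final res = await (await client
          .getUrl(Uri.parse('http://127.0.0.1:${server.port}/hello.txt')))
      .close();
  print(await res.transform(utf8.decoder).join()); // hi
  client.close();
  await server.close(force: true);
  dir.deleteSync(recursive: true);
}
```

A real handler would also need path sanitisation (to block `..` traversal) and a proper Content-Type; this sketch only shows the sync/async split.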

Anders Holmgren

Sep 29, 2014, 11:33:25 PM
to mi...@dartlang.org
Did you measure throughput? I can imagine it might be faster for a single request, but if it blocks the event loop longer it might result in being able to serve fewer requests concurrently.

Danny Tuppeny

Sep 30, 2014, 1:55:46 AM
to mi...@dartlang.org

In my benchmarks, that's what I saw. At low throughput the requests were faster, but at high throughput they were slower (presumably because they got queued up while all the requests in front of them stalled for IO).

Didn't make it any easier to decide which way to do it though :(

On 30 Sep 2014 04:33, "Anders Holmgren" <andersm...@gmail.com> wrote:
Did you measure throughput? I can imagine it might be faster for a single request, but if it blocks the event loop longer it might result in being able to serve fewer requests concurrently.

Martin Kustermann

Sep 30, 2014, 3:58:00 AM
to mi...@dartlang.org
Anders already said it: 'it depends'.

For normal disk access, the Linux kernel's VFS/page cache will already have metadata+data, which makes it obviously fast.

But whatever benchmark you ran, the numbers are generally not representative. It depends on many factors: SSD vs. HDD, which local filesystem, local FS or network FS, cloud provider, how much RAM you use and how much the kernel can use, how many files you're dealing with ...

One example: Google Compute Engine restricts IOs/sec and MB/sec to limits that depend on the disk type and size. This means that if you do more calls than that limit, they'll just pile up and block, which means your sync operations can block the Dart thread for a very long time!

So be careful :)



W. Brian Gourlie

Oct 6, 2014, 3:43:56 PM
to mi...@dartlang.org
In something like ASP.NET, I wouldn't think twice about a blocking call in a request because it's threaded; however in Dart, it seems like it would be a bottleneck if every request is blocking every other request for the duration of IO hits like this.

Don't want to go too off-topic, but even in ASP.NET you can exhaust the thread pool if you're doing blocking IO (assuming moderate volume). Of course, you can leverage async/await and all the non-blocking methods that .NET supports to get the best of both worlds.

Danny Tuppeny

Oct 7, 2014, 3:50:14 AM
to mi...@dartlang.org
On Monday, 6 October 2014 20:43:56 UTC+1, W. Brian Gourlie wrote:
In something like ASP.NET, I wouldn't think twice about a blocking call in a request because it's threaded; however in Dart, it seems like it would be a bottleneck if every request is blocking every other request for the duration of IO hits like this.

Don't want to go too off-topic, but even in ASP.NET you can exhaust the thread pool if you're doing blocking IO (assuming moderate volume). Of course, you can leverage async/await and all the non-blocking methods that .NET supports to get the best of both worlds.

This is true, but in ASP.NET it only becomes a problem when you run out of threads (which, for most devs, is likely rare). In Dart/Node, it's a problem right away, and more important to understand. You can't afford to be sloppy with blocking when you're only using a single thread! :)