Ryan - Those look like nice tools, and I'm definitely a fan of projects that are trying to standardize on/move to the HAR format. For single waterfall charts I think your second project is going to be the way things go in the future - convert to HAR and import into your favorite tool (HttpWatch, Firebug, etc.). Too bad about the lack of aggregation, but oh well.
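To illustrate what the missing aggregation step could look like, here's a minimal sketch that rolls up per-entry timings from a HAR 1.2-style log. The `log.entries[].time` field (total request time in milliseconds) comes from the HAR spec; the summary fields returned are just illustrative choices, not part of any of the tools mentioned.

```javascript
// Sketch: aggregate per-request timings from a HAR 1.2-style object.
// har.log.entries[].time is the total elapsed time per request (ms).
function aggregateHar(har) {
  var times = har.log.entries.map(function (e) { return e.time; });
  var total = times.reduce(function (a, b) { return a + b; }, 0);
  return {
    count: times.length,
    totalMs: total,
    meanMs: times.length ? total / times.length : 0,
    maxMs: times.length ? Math.max.apply(null, times) : 0
  };
}
```

You could feed this the JSON produced by any HAR exporter (HttpWatch, Firebug, etc.) to get rough summary statistics across many captures.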
Andy - Thanks for the info, "fabulous and frustrating" is probably the same way I would describe Coradiant. All in all it's a pretty great tool and it gives a high level of visibility into what's happening on the server side. You can look at percentiles, filter to your heart's desire, set SLAs, monitor very specific sections of traffic, etc. The UI can be a bit clunky, but once you get used to it there are a lot of options. They also have a "render time" measurement that we are beta testing, which is essentially supposed to capture the time between the delivery of the HTML and the onload event. In our experience this hasn't been too accurate though, and it depends fairly heavily on server-side performance (for example, if a server-side event bumps host time up by 1 second for a small window, we see the same bump in render time).
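For reference, the "render time" idea described above (HTML delivered to onload fired) can be sketched client-side with the Navigation Timing API. The `responseEnd` and `loadEventStart` field names are from that API; the function itself is just an illustration of the concept, not Coradiant's implementation.

```javascript
// Sketch of a client-side "render time": the gap between the HTML
// finishing delivery (responseEnd) and the onload event (loadEventStart).
function renderTimeMs(timing) {
  return timing.loadEventStart - timing.responseEnd;
}

// In a browser you would read it after onload has completed, e.g.:
// window.addEventListener('load', function () {
//   setTimeout(function () {
//     console.log(renderTimeMs(performance.timing));
//   }, 0);
// });
```

Measuring it in the browser rather than from network traffic sidesteps the server-side coupling described above, since both timestamps are taken on the client.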
Our main complaints with Coradiant are as follows:
- It can take a really long time to drill down into requests. When you stop looking at high-level aggregated statistics and want to look at a group of individual requests (say, all image requests taking over X seconds), the report takes multiple minutes to run. That isn't bad if you do it once in a while, but it definitely gets frustrating if you want to look at a lot of reports quickly.
- Coradiant is great at letting you know when things slow down, but not great at letting you know why. This is not really a fault of the tool; it has no way of knowing why a given packet (or set of packets) took longer to arrive, and that visibility is exactly what a product like DynaTrace provides. When you are in the middle of an incident, Coradiant gives you very limited information about what is causing the problem (depending on the nature of the incident). It can report on client IP, resource size, user agent, etc., but once you start drilling into that kind of data things slow down quite a bit, which makes it tough to diagnose an ongoing issue.
- Coradiant is fairly expensive, and as we move to multiple data centers and our traffic keeps increasing, it is only going to get pricier.
I'm starting to feel like a lot of the larger companies probably just build something in house to monitor their performance. For client-side performance I really like the idea of Boomerang, but I haven't gotten the chance to actually implement it on our sites yet. The server side is really where we want a good, scalable tool. Like I said above, Coradiant has been really great for us so far; we are just getting to the size where it makes sense to look at what other people are doing and figure out whether we should build something in house or whether there are other viable options out there.
Thanks for all the input,