Thanks for posting the question. Our website needs an update, and we have been very busy recently, but we have hired new staff so we should be able to update it soon.
To try to answer your question: there are two types of remote testing, moderated and unmoderated. Which approach you take depends on your research question.
Loop11 and Webnographer fall into the unmoderated category. WhatUsersDo sits roughly halfway between the two. With the moderated and WhatUsersDo approaches, you have to go through each response to see where the issues are appearing.
The other approach is to use metrics. Loop11 collects metrics such as time on task and pages visited. Webnographer differs in that it also collects these metrics and then uses statistical methods to uncover where the issues lie. We are more expensive than Loop11, but you get a lot of hand-holding, and metrics that we believe you can act on.
Where we pride ourselves is in helping you identify the issues that people are experiencing, or will experience, and their root cause. At the moment we have experimental playback, but you would not want to sit through 50 to 1,000 user sessions. The purpose of the Webnographer playback function is to show a client where somebody is falling over, not to discover where the issues lie.
If by secure you mean that the site being tested needs to be kept off the Internet (alpha release/intranet), or that the user has to log in, we often run those tests, but they require more set-up time on our side.
In answer to David Jarvis's interesting points, I will answer them in relation to Webnographer.
With Webnographer we are trying to model people's real behaviour on the web.
What we have found by triangulating between web stats, Webnographer, and lab testing is that participants spend about two to four times longer in the lab than they do in either Webnographer or on the real site. Depending on the number of participants, Webnographer has a margin of error from +/-4% to +/-15%. The more time people spend on a task, the higher the task completion rate; reduce task times, and the number of errors increases. We can counter the limited time any one participant spends by having more participants. We normally aim for between 50 and 1,000 participants.
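For readers wondering where the +/-4% to +/-15% range comes from: it is consistent with a standard binomial confidence interval on a task completion rate. The sketch below is my own illustration using the normal-approximation formula at 95% confidence, not a description of Webnographer's actual internal method; the worst-case assumption p = 0.5 gives the widest interval.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% normal-approximation margin of error for a completion rate.

    n: number of participants; p: observed completion rate
    (p = 0.5 is the worst case, yielding the widest interval).
    """
    return z * math.sqrt(p * (1 - p) / n)

for n in (50, 250, 1000):
    print(f"n={n}: +/-{margin_of_error(n):.1%}")
# n=50: +/-13.9%
# n=250: +/-6.2%
# n=1000: +/-3.1%
```

So roughly 50 participants lands near the +/-15% end of the quoted range, and a sample approaching 1,000 brings it down toward +/-4% or better.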