With performance tests, measurements should typically be done specifically for the action(s) that need to be measured. The problem with starting and stopping a stopwatch in the hooks is that it may also include the measurement of any setup and teardown the code may be doing, or the overhead of SpecFlow itself (or any other extraneous activity happening within the context of the test that may skew the numbers). In addition, it's very important for performance tests to measure the scenario repeatedly for a set number of iterations, with the final measure reported as some sort of aggregate of the sample (e.g. taking an average of all the samples and throwing out outliers, reporting the standard deviation, etc.). Finally, you may want to consider having a 'warmup' phase, where the action under test is run a small number of times first so the system is warmed up and things are properly cached. I've had some experience trying to reuse existing integration tests for performance, and it always resulted in poor measurements. Integration tests often do a lot more than what you really want to measure. Perf tests should be targeted specifically at what you want to measure, and should reduce extraneous overhead as much as possible.
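As a rough sketch of the kind of aggregation I mean (this is just illustrative, not part of any framework; the method and parameter names are made up), assuming a list of millisecond samples:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PerfMath
{
    // Skips the first warmupCount samples (treated as warmup),
    // then reports the average and standard deviation of the rest.
    public static (double Average, double StdDev) Summarize(IList<long> samplesMs, int warmupCount)
    {
        var measured = samplesMs.Skip(warmupCount).ToList();
        double avg = measured.Average();
        double stdDev = Math.Sqrt(measured.Average(s => (s - avg) * (s - avg)));
        return (avg, stdDev);
    }
}
```

Reporting the standard deviation alongside the average is a cheap way to spot noisy runs: if the stddev is a large fraction of the average, the single aggregate number isn't trustworthy.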
That being said, I did do some small benchmark tests using SpecFlow for some APIs I'm testing. This worked out OK as a quick-and-dirty micro-benchmark and suited my purpose. I called my API in my When, and structured it something like "When I call MyTestAPIThingy (.*) times", where the parameter allowed me to change the number of iterations (or 'samplesToTake' in the code below). In the When's step definition, I created a stopwatch and wrapped the start/stop around the API call. For example:
// requires: using System.Collections.Generic; using System.Diagnostics;
List<long> perfSamples = new List<long>();
for (int i = 0; i < samplesToTake; i++)
{
    Stopwatch watch = new Stopwatch();
    watch.Start();
    var responses = MyTestAPIThingy();   // the action under test
    watch.Stop();
    Trace.WriteLine("  Elapsed time: " + watch.ElapsedMilliseconds);
    perfSamples.Add(watch.ElapsedMilliseconds);
}
You can then take the perfSamples list, put it in the ScenarioContext.Current dictionary, and retrieve it in the scenario's "Then" step to do your math and assert that the result is less than whatever your goal is for this scenario.
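Sketched out, that hand-off might look like this (the "PerfSamples" key, the Assert flavor, and the 5000 ms goal are all just placeholders for whatever your project uses):

```csharp
// In the When step definition, after the measurement loop:
ScenarioContext.Current["PerfSamples"] = perfSamples;

// In the Then step definition:
var samples = (List<long>)ScenarioContext.Current["PerfSamples"];
Assert.IsTrue(samples.Average() < 5000,
    "Average elapsed time was " + samples.Average() + " ms");
```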
For something that is not an API-level test, you could do something similar, but try to wrap the stopwatch around the specific action. In your example above, consider changing your When to something like "When I copy and paste Element1 20 times". In the step definition, wrap the actual copy-and-paste action with the stopwatch to take the measurement x number of times. Your Then might then be something like "Then New Element2 should be created on average in less than 5 seconds". In that step definition, retrieve the perfSamples from ScenarioContext.Current and do the necessary math for the assertion. Remember, if you decide not to do a warmup, you may need to do some extra math on your perf samples to throw out outliers, or exclude the first x samples when calculating the average.
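Sketched as SpecFlow bindings, that pair of steps might look something like the following (CopyAndPaste is a hypothetical helper standing in for your actual action, and the Assert flavor depends on your test framework):

```csharp
[When(@"I copy and paste Element1 (.*) times")]
public void WhenICopyAndPasteElement1Times(int samplesToTake)
{
    var perfSamples = new List<long>();
    var watch = new Stopwatch();
    for (int i = 0; i < samplesToTake; i++)
    {
        watch.Restart();                 // reset and start, so each sample is independent
        CopyAndPaste("Element1");        // hypothetical helper performing the action under test
        watch.Stop();
        perfSamples.Add(watch.ElapsedMilliseconds);
    }
    ScenarioContext.Current["PerfSamples"] = perfSamples;
}

[Then(@"New Element2 should be created on average in less than (.*) seconds")]
public void ThenCreatedOnAverageInLessThanSeconds(int seconds)
{
    var samples = (List<long>)ScenarioContext.Current["PerfSamples"];
    Assert.IsTrue(samples.Average() < seconds * 1000,
        "Average was " + samples.Average() + " ms");
}
```

Putting the iteration count and the time goal in the step text keeps the numbers visible in the feature file, so you can tune them per scenario without touching the bindings.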
Thanks,
Nithin