I assume you're asking about the complete latency on the spout.
1. The timer is started when the tuple is emitted from the spout and it is stopped when the tuple is acked. So it measures the time for the entire tuple tree to be completed.
2. There's no "ideal" time. Every topology has different characteristics. For example, in our stream processing topologies we care more about throughput than latency, so we use batching techniques that increase throughput at the cost of higher latency. In our distributed RPC topologies, however, we optimize for latency.
3. The complete latency will be affected by: a) the processing latency of the bolts, which you can improve by optimizing the bolt code, and b) the amount of congestion in the topology, which you can reduce by increasing parallelism. So if you add up the processing latencies of the bolts and the sum doesn't come close to the complete latency, the remainder is most likely time tuples spend waiting rather than being processed, and you should try adding more parallelism.
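
To make the check in point 3 concrete, here is a rough sketch of the arithmetic. It is not part of Storm: the function names, the 0.5 threshold, and the sample numbers are all made up for illustration, and summing the bolt latencies assumes the bolts form a simple chain (with branching tuple trees the sum is only an approximation).

```python
def unexplained_latency_ms(complete_latency_ms, bolt_latencies_ms):
    """Part of the complete latency not accounted for by bolt processing.

    Hypothetical helper: the inputs are numbers you read off your own
    metrics, e.g. the spout's complete latency and each bolt's
    processing latency.
    """
    return complete_latency_ms - sum(bolt_latencies_ms)


def suggest_action(complete_latency_ms, bolt_latencies_ms, threshold=0.5):
    """Very rough heuristic; the threshold is an arbitrary assumption."""
    if complete_latency_ms <= 0:
        raise ValueError("complete latency must be positive")
    gap = unexplained_latency_ms(complete_latency_ms, bolt_latencies_ms)
    # If most of the complete latency is not spent inside bolts,
    # congestion is the likelier cause, so parallelism is the first knob.
    if gap / complete_latency_ms > threshold:
        return "add parallelism"
    return "optimize bolts"


if __name__ == "__main__":
    # Invented sample values, in milliseconds.
    bolts = [5.0, 12.0, 8.0]
    print(suggest_action(120.0, bolts))  # large gap between sum and total
    print(suggest_action(30.0, bolts))   # bolts explain most of the total
```

With the invented numbers above, the bolts account for 25 ms. Against a 120 ms complete latency that leaves 95 ms unexplained, which points at congestion; against 30 ms it leaves only 5 ms, so the bolts themselves are where the time goes.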