If you're making i2c calls from user space (as it seems you are),
there's the syscall overhead (not 0), any buffering the kernel does
(not 0), scheduling the i2c operation (not 0), doing the i2c operation
(your scope plot), and returning that data. Plus, reading the start
and end times has non-zero execution time, too.
There might also be measurement error in your start and stop times,
depending on the smallest unit of time your system's clock can
resolve.
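
To get a rough feel for those two timing effects (the cost of reading the clock and the clock's resolution), something like the following could be run on the target. It's only a sketch: the function names are made up for illustration, and since it's Python the interpreter adds its own overhead on top of the raw clock read, so treat the numbers as an upper bound rather than what a C program would see.

```python
import time


def timer_overhead_ns(samples=10000):
    """Return (min, median) gap in ns between two back-to-back clock reads.

    Hypothetical helper: the gap approximates the cost of taking one
    timestamp, including Python call overhead.
    """
    gaps = []
    for _ in range(samples):
        t0 = time.perf_counter_ns()
        t1 = time.perf_counter_ns()
        gaps.append(t1 - t0)
    gaps.sort()
    return gaps[0], gaps[len(gaps) // 2]


def clock_resolution_s(name="perf_counter"):
    """Return the resolution (in seconds) that Python reports for a clock."""
    return time.get_clock_info(name).resolution


if __name__ == "__main__":
    lo, med = timer_overhead_ns()
    print("clock read overhead: min", lo, "ns, median", med, "ns")
    print("reported resolution:", clock_resolution_s(), "s")
```

If the median gap is a noticeable fraction of the i2c transfer time you see on the scope, the timestamps themselves account for part of the discrepancy.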
If you work at the kernel level, there will still be some delays, but
they should be significantly smaller than in user space. To remove as
much delay as possible, either write some bare metal code (i.e. don't
use Linux) or dig into the i2c subsystem in the kernel and manually
tweak it to do what you want. Also be aware that the scheduler might
still introduce some delays.
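
One way to see the scheduler's contribution from user space is to ask for a short sleep and measure how far the wakeup differs from the request. This is a minimal sketch, not a proper latency benchmark: the helper name and the 1 ms request are arbitrary choices for illustration, and it says nothing about the i2c driver itself.

```python
import time


def sleep_overshoot_us(requested_s=0.001, samples=50):
    """Return sorted differences (in microseconds) between the measured
    and the requested sleep time.

    Hypothetical helper: each value is how late the process woke up
    relative to the request. It includes timer granularity and
    interpreter overhead as well as scheduling delay.
    """
    diffs = []
    for _ in range(samples):
        t0 = time.perf_counter()
        time.sleep(requested_s)
        elapsed = time.perf_counter() - t0
        diffs.append((elapsed - requested_s) * 1e6)
    diffs.sort()
    return diffs


if __name__ == "__main__":
    d = sleep_overshoot_us()
    print("overshoot (us): min %.1f, median %.1f, max %.1f"
          % (d[0], d[len(d) // 2], d[-1]))
```

On a loaded general purpose system the max is typically much larger than the median, which is the kind of jitter that shows up around each i2c call as well.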
If you're looking for bare metal latency (like you'd get on a
microcontroller), that's rather difficult to get in Linux or any other
general purpose OS.
-Andrew