I've found the source of some confusion. The first bullet in this
message implies that processors have a stride size. That is
incorrect: the stride is a property of the access pattern in the test
code, not of the processor. I probably could have inserted the word
"optimal" before the word "processor" and it would have read correctly.
The performance measured at the different stride sizes should allow
you to deduce the cache's line size. That being said, I have yet to
see a graph that clearly shows me the line size, so any such
deduction is probably still just an estimate.
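
As a rough illustration of why the stride results point at the line
size, here is a minimal Python sketch of an idealized model. It
assumes the array is much larger than the cache and ignores hardware
prefetching, and the function names (`misses_per_read`,
`estimate_line_size`) are hypothetical, not part of the lab 4 code.
Under this model the fraction of reads that miss grows with the
stride until one step spans a whole line, then levels off; the step
in bytes where it levels off is the line size.

```python
def misses_per_read(stride, elem_bytes, line_bytes):
    """Idealized average fraction of reads that miss on one sweep.

    Assumes (hypothetically) that the array is far larger than the
    cache, so no line is reused once the sweep moves past it, and
    that there is no hardware prefetching.
    """
    step_bytes = stride * elem_bytes
    # Below the line size, about line_bytes / step_bytes reads share a
    # line, so only one of them misses; at or above it, every read
    # lands on a different line and misses.
    return min(1.0, step_bytes / line_bytes)


def estimate_line_size(strides, costs, elem_bytes, tolerance=0.05):
    """Return the step in bytes where the cost curve first levels off.

    `strides` must be in increasing order and `costs` non-empty; a
    cost within `tolerance` of the maximum counts as the plateau.
    """
    plateau = max(costs)
    for stride, cost in zip(strides, costs):
        if cost >= plateau * (1 - tolerance):
            return stride * elem_bytes
```

For example, with 4-byte elements and a 64-byte line, strides 1, 2,
4, 8, 16 and 32 give modeled miss fractions of 0.0625, 0.125, 0.25,
0.5, 1.0 and 1.0, and `estimate_line_size` returns 64. Measured
timing curves are noisier than this model, and prefetching tends to
blur the knee, which is one reason a real graph may only give an
estimate.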

If you're still confused about what the stride size is, you probably
need to study the source code for lab 4 more. A stride of 1 means the
code reads every element. A stride of 2 means it skips every other
element. A stride of 3 means it skips 2 elements for every one
element it reads.
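
Put another way, a stride of s reads indices 0, s, 2s, and so on. A
minimal Python sketch of that loop, assuming the lab's loop starts at
index 0 and steps forward by the stride (the function names are
hypothetical, not taken from the lab 4 source):

```python
def strided_indices(n, stride):
    """Return the indices a single pass with this stride reads."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    return list(range(0, n, stride))


def strided_sum(data, stride):
    """Sum only the elements a stride-`stride` pass would read."""
    total = 0
    for i in range(0, len(data), stride):
        total += data[i]  # elements between reads are skipped
    return total
```

For instance, `strided_indices(10, 3)` gives `[0, 3, 6, 9]`: two
elements are skipped between each pair of reads, matching the
stride-3 description above.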