I haven't run into this myself, but I might be able to help diagnose it with a bit more info. Which distribution, driver version, and CUDA version are you running?
And to clarify: you have tried stopping or terminating the instance and then launching again, and the failure only shows up some of the time? Have you tried launching in a different availability zone?
If your configuration works on 3 of the 4 GPUs, that points towards a hardware issue, but it might still be drivers, as Dan said. Since the g2.8xlarge instances use 4 separate K520s, I suspect that they're the same K520s that g2.2xlarge instances access individually. You might have landed on one that previously had a different driver installed via a g2.2xlarge and was not cleaned up properly.
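If it helps narrow things down, here is a rough sketch of how you could record which GPUs the driver actually sees on each launch, so you can compare a good boot against a bad one. It assumes `nvidia-smi` is on the PATH and that a healthy g2.8xlarge should report four GPUs; the helper names (`parse_gpu_rows`, `gpus_missing`) are just ones I made up for illustration.

```python
import subprocess

# Assumption: a healthy g2.8xlarge should expose four GPUs.
EXPECTED_GPUS = 4

# Standard nvidia-smi query; output is one CSV row per visible GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,pci.bus_id,driver_version",
    "--format=csv,noheader",
]


def parse_gpu_rows(text):
    """Turn nvidia-smi CSV output into a list of dicts, skipping odd lines."""
    rows = []
    for line in text.strip().splitlines():
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 4:
            continue  # not a GPU row (e.g. an error message)
        index, name, bus_id, driver = fields
        rows.append(
            {"index": index, "name": name, "bus_id": bus_id, "driver": driver}
        )
    return rows


def gpus_missing(rows, expected=EXPECTED_GPUS):
    """Return how many GPUs are missing relative to the expected count."""
    return max(expected - len(rows), 0)


if __name__ == "__main__":
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True)
    except FileNotFoundError:
        raise SystemExit("nvidia-smi not found; is the driver installed?")
    if result.returncode != 0:
        print("nvidia-smi failed:", result.stderr.strip())
    rows = parse_gpu_rows(result.stdout)
    for row in rows:
        print(row["index"], row["name"], row["bus_id"], row["driver"])
    print("missing GPUs:", gpus_missing(rows))
```

Saving the output from each launch (along with the availability zone) should show whether the failing launches are the ones where only three GPUs appear. Note that the indices may be renumbered when a GPU is missing, so the PCI bus IDs are the more useful thing to compare.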