[Note: I haven't been able to build the firmware yet, so I haven't been able to try my idea. Otherwise, I'd be sharing results. For now, this is an idea offered to the community.]
One of my frustrations with Pixy is that the bounding box for a recognized object is very "shaky". As a result, the camera pan/tilt motors are often hyperactive even if an object is not moving.
The object detection algorithm is very noisy, which is actually okay, but the bounding box should not be so sensitive to a few stray border pixels. The problem depends heavily on lighting conditions and the tuning of signatures, which is something we've all struggled with.
My motivation comes from my work on the Xbox Kinect. The output of a Kinect is the skeletal joints of a person, and is similarly noisy. However, we know that things like the length of a forearm bone don't change 50 times per second. Thus, we use a lot of weighted averages, center of mass calculations, and so on.
For Pixy pan/tilt, we're ultimately looking at a single point, so all we care about from the object's bounding box is its center point, i.e. the center of mass.
Imagine a single frame of a detected object such as the blob shown here:

The equal weighting of stray border pixels causes a bounding box that isn't aligned with the center of mass (shown as a black circle). As these stray pixels are noisy, and fluctuate 50 times per second, the bounding box has a serious case of the jitters.
My idea is twofold:
(1) Average the bounding box with results from the previous n (1 or more) frames.
(2) Weight the pixels towards the center of mass.
The first one is easy, and I always start with the simplest math possible. It can be improved by imposing a limit on how much a bounding box can change in a single frame. That is, put a low-pass filter on it.
The second one is easy, too, code-wise. The challenge with the Pixy is that we have a very limited processor, and that's where I need to test.
What I'd actually do from there is change step 1 from averaging the bounding box to averaging the center of mass.
BTW, in case you're wondering, YES, this would introduce lag. I can promise you from 16 years making Xbox games, I'm well-versed in lag. Especially with a system like the Kinect, which has 200-300 ms of built-in latency. At 50 frames per second (20 ms per frame) on the Pixy, we're much faster, though, than those pan/tilt servos can even move. In other words, lag is not an issue here. Stability and robustness are.
Cheers,
Mikey Wetzel