Object detection is still a challenging problem. There are two basic approaches:
1) Process images on the device, probably with some pre-filter (i.e. looking for cat-sized differences between the current image and the long-term average), followed by machine learning based scoring.
2) Send images to an image recognition API for processing (this will still require filtering on the device or on your own server, as you can't just stream video to an API).
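
As a rough illustration of the pre-filter idea in approach 1, here is a minimal sketch in plain Python. It keeps an exponentially weighted running average of each pixel as the "long-term average" and flags a frame when the number of changed pixels falls in a plausible range for the object. The class name, the parameters (alpha, pixel_threshold, min_pixels, max_pixels) and their default values are all assumptions made up for illustration, and a real version would operate on camera frames held in arrays rather than nested lists.

```python
class RunningAverageFilter:
    """Cheap pre-filter: flag frames that differ from the long-term
    average by a plausible, roughly object-sized number of pixels.

    All parameter names and defaults are illustrative assumptions.
    """

    def __init__(self, alpha=0.05, pixel_threshold=25,
                 min_pixels=4, max_pixels=50):
        self.alpha = alpha                      # how fast the average adapts
        self.pixel_threshold = pixel_threshold  # per-pixel change that counts
        self.min_pixels = min_pixels            # smaller changes: noise
        self.max_pixels = max_pixels            # larger changes: lighting etc.
        self.background = None                  # running average, set on first frame

    def update(self, frame):
        """frame: list of rows of grayscale values (0-255).

        Returns True if the frame is worth passing to the slower scorer.
        """
        if self.background is None:
            # First frame only initialises the average.
            self.background = [[float(p) for p in row] for row in frame]
            return False

        changed = 0
        for bg_row, row in zip(self.background, frame):
            for x, pixel in enumerate(row):
                if abs(pixel - bg_row[x]) > self.pixel_threshold:
                    changed += 1
                # Blend the new pixel into the long-term average.
                bg_row[x] = (1 - self.alpha) * bg_row[x] + self.alpha * pixel

        return self.min_pixels <= changed <= self.max_pixels
```

The upper bound is there so that a whole-frame change (someone switching a light on, say) is not mistaken for a cat; the thresholds would need tuning against real footage.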
The Raspberry Pi 3 can run basic filtering algorithms at reasonable frame rates (i.e. ~10 frames per second). Most machine learning algorithms will not run in real time on the Pi, hence the need for a pre-filter. You do not have to train the machine learning models on the Pi; it is better to use a pre-trained network. There are various ones available online, or you can train your model on a more powerful computer/AWS and then copy the trained model onto the Pi. However, getting something to even roughly work will probably take months of mucking around, and even then it will not be 100% accurate. Although there has been huge progress in ML for object detection, there's still a pretty steep learning curve to get these models working well (unless your application can be solved by just calling one of the existing APIs).
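
To show why the pre-filter matters for frame rate, here is a small sketch of gating a slow model behind a cheap check, so the model only ever sees candidate frames. The function name detect, the prefilter and scorer callables, and the min_score cut-off are hypothetical placeholders; the scorer stands in for whatever pre-trained network or API call you end up using.

```python
def detect(frames, prefilter, scorer, min_score=0.5):
    """Yield (frame_index, score) for frames that pass both stages.

    prefilter: cheap callable, frame -> bool (runs on every frame)
    scorer:    expensive callable, frame -> float (runs only on candidates)
    Both are placeholders supplied by the caller.
    """
    for index, frame in enumerate(frames):
        if not prefilter(frame):
            continue  # cheap rejection; the slow model never sees this frame
        score = scorer(frame)
        if score >= min_score:
            yield index, score
```

With this structure the per-frame cost is dominated by the pre-filter, and the expensive scoring cost is only paid on the (hopefully rare) frames where something changed.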
I'd be up for running the OpenCV workshop again if there is enough interest, though probably sometime after Christmas. I'm currently in the midst of submitting my thesis.
cheers,
Finn