In this Instructable, you'll learn how to intercept the video, microphone, and controls of the $30 Kaicong SIP1602 wireless pan-tilt camera on Windows, Linux, or OS X! Everything is rolled neatly into Python scripts; you can use the output data for things like voice transcription, computer vision, and automated directional control. If you're feeling truly adventurous, keep on reading and you'll learn the methods we used to discover and reverse engineer wireless cameras!
Installing and running the camera code doesn't take much background, but for anything beyond that, intermediate-level experience in Python and OpenCV will be very useful. Let's get to it!
On the box that contains the camera is Kaicong's motto: "Nothing Important Than Safty". And it shows - they really made the manual secure, because anyone that can't read Mandarin is going to have a pretty hard time understanding it! That said, installation is surprisingly simple.
By default, these cameras are viewable by anyone on the internet who guesses your .kaicong.info address - which can be awesome for projects, but not so awesome for security and privacy. To solve this, you can either change your DDNS username and password, or simply set both of them to blank (thereby making it impossible to access your camera from outside your local network).
Now that we've got the dependencies out of the way, head over to the git repository where this project is hosted, download it, and extract the files. Open up a command window or terminal in the directory with the extracted files, and run each script with the following commands, replacing 192.168.1.19 with the IP address of your camera:
We saved the webpage to disk and looked at monitor.htm. It was there that we found some interesting-looking variables, such as PTZ_UP and PTZ_STOP, which appeared to be motion control constants. Keeping that in mind, we opened up the web inspection console (Ctrl+Shift+C in Chrome) and inspected the network traffic while clicking the camera motion buttons. We found several calls to a decoder_control.cgi page with a "command=" argument matching the constants we found earlier in the HTML - one whenever a click begins, and another whenever a click ends. So the controls are simple start/stop commands sent as HTTP GET requests? Let's find out!
into the browser and loaded the page, and sure enough the camera began moving! From there it was a matter of throwing the constants and a formattable URL string into Python to complete the controller. Done.
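Here's a minimal sketch of what a controller along those lines could look like. Only decoder_control.cgi and its "command=" argument come from what we saw in the browser; the numeric values for PTZ_UP and PTZ_STOP, the user/pwd query parameters, and the function names are placeholders for illustration, so swap in the constants from your own monitor.htm and whatever credentials your camera expects.

```python
import time
import urllib.parse
import urllib.request

# Placeholder values (assumption): use the constants found in your monitor.htm.
PTZ_UP = 0
PTZ_STOP = 1


def control_url(host, command, user="admin", pwd=""):
    """Build the GET URL for one motion command.

    The user/pwd parameters are an assumption about how the camera
    authenticates; adjust or remove them to match your camera.
    """
    query = urllib.parse.urlencode({"command": command, "user": user, "pwd": pwd})
    return f"http://{host}/decoder_control.cgi?{query}"


def send_command(host, command, **auth):
    """Send a single command to the camera and return the HTTP status code."""
    with urllib.request.urlopen(control_url(host, command, **auth), timeout=5) as resp:
        return resp.status


def nudge(host, start_command, stop_command=PTZ_STOP, seconds=0.5, **auth):
    """Mimic a button click: one request when the click begins, one when it ends."""
    send_command(host, start_command, **auth)
    time.sleep(seconds)
    send_command(host, stop_command, **auth)
```

With a camera at 192.168.1.19, calling nudge("192.168.1.19", PTZ_UP) should tilt it for about half a second and then stop, the same way a click-and-release on the web page does.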
But we wanted something a bit more efficient: the streamed video that the ActiveX object seemed to receive. The ActiveX object itself didn't seem too useful to disassemble (reversing assembly code is way overrated), so instead we opened up Wireshark. We filtered the capture down to the IP of our camera (Capture->Options->Capture Filter) and started the capture, before reloading the ActiveX control page in our browser. What we found were two GET requests for audiostream.cgi and livestream.cgi, presumably for the audio and video.
Putting aside the audio URL for now, we turned to Google to see if anyone had decoded an IP camera video stream before. Under a search for "IP camera HTTP stream" we found a handy little Python script to get everything running in OpenCV. All it took was replacing the script's URL with ours, and we were in business!
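If you're curious what a script like that does under the hood, here's a rough sketch of one common approach. It assumes the stream is a sequence of JPEG images sent back to back over HTTP (an assumption on our part, not something the camera documents), and it simply scans the incoming bytes for the JPEG start and end markers. The function names are ours, and this is not the script we found.

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def extract_jpeg_frames(buffer):
    """Pull every complete JPEG out of buffer.

    Returns (frames, remainder), where remainder holds the bytes left over
    after the last complete frame so they can be prepended to the next read.
    """
    frames = []
    while True:
        start = buffer.find(SOI)
        if start == -1:
            break
        end = buffer.find(EOI, start + 2)
        if end == -1:
            break  # frame is still incomplete; wait for more data
        frames.append(buffer[start:end + 2])
        buffer = buffer[end + 2:]
    return frames, buffer


def read_frames(stream, chunk_size=4096):
    """Yield JPEG frames from any object with a read(n) method."""
    buffer = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        buffer += chunk
        frames, buffer = extract_jpeg_frames(buffer)
        yield from frames
```

The stream can be the response object from urllib.request.urlopen pointed at the camera's livestream.cgi URL, and each frame can be turned into an OpenCV image with cv2.imdecode(numpy.frombuffer(frame, numpy.uint8), cv2.IMREAD_COLOR). Marker scanning is a shortcut: a JPEG with an embedded thumbnail contains extra markers and would confuse it.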
Getting video wasn't too hard. Hopefully audio would be just as easy, right? After a few hours of Google searching, it looked like no one else had ever managed to successfully pull out and decode the audio stream of an IP camera. We were on our own.
So we read up on how ADPCM works - apparently it encodes audio as the difference between successive samples, and the decoder keeps the previous audio state so that it can add the difference to it and produce the next sample. After a few more Python scripts, we managed to capture the packets directly and reset this state at the start of each packet. Clicks were completely removed, and nothing but camera audio remained. Success!
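To make the idea concrete, here's a sketch of a decoder for the common IMA flavor of ADPCM, with the state reset at the start of every packet. Treat the details as assumptions: we're assuming 4-bit IMA ADPCM with the low nibble of each byte first and a state that starts at zero, and the function names are made up for this example. Your camera's packet layout (headers, nibble order, initial state) may differ.

```python
# Standard IMA ADPCM tables.
INDEX_TABLE = [-1, -1, -1, -1, 2, 4, 6, 8, -1, -1, -1, -1, 2, 4, 6, 8]
STEP_TABLE = [
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 23, 25, 28, 31,
    34, 37, 41, 45, 50, 55, 60, 66, 73, 80, 88, 97, 107, 118, 130, 143,
    157, 173, 190, 209, 230, 253, 279, 307, 337, 371, 408, 449, 494, 544,
    598, 658, 724, 796, 876, 963, 1060, 1166, 1282, 1411, 1552, 1707,
    1878, 2066, 2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871,
    5358, 5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767,
]


def decode_block(data, predictor=0, index=0):
    """Decode 4-bit ADPCM bytes into a list of signed 16-bit samples.

    predictor is the previous sample and index is the position in
    STEP_TABLE; together they are the cached state described above.
    """
    samples = []
    for byte in data:
        # Assumption: low nibble first. Some encoders use the opposite order.
        for nibble in (byte & 0x0F, byte >> 4):
            step = STEP_TABLE[index]
            diff = step >> 3
            if nibble & 1:
                diff += step >> 2
            if nibble & 2:
                diff += step >> 1
            if nibble & 4:
                diff += step
            # Bit 3 is the sign: add or subtract the difference.
            predictor += -diff if nibble & 8 else diff
            predictor = max(-32768, min(32767, predictor))
            index = max(0, min(88, index + INDEX_TABLE[nibble]))
            samples.append(predictor)
    return samples


def decode_packets(packets):
    """Decode each packet with a fresh state, then join the samples."""
    samples = []
    for packet in packets:
        samples.extend(decode_block(packet, predictor=0, index=0))
    return samples
```

The reset in decode_packets is the part that mattered for us: if the state is carried over from one packet to the next and drifts away from what the encoder used, each packet boundary shows up as a click.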
It's awesome to have such a complex device completely controllable via Python. We plan on using our camera for person detection and room occupancy tracking as well as spoken voice commands, but we can think of a few other uses for a camera like this one, such as:
This link was sent to me because I asked them about sending audio to the cam... because it is possible, and the Foscam Android app CAN do it (it was the first time that I sent audio to my camera and heard it play).
I don't suppose you can get the IR lights to shut down? The standard commands 94 & 95 should theoretically turn them on/off, but this doesn't seem to work on the 1602. Any different result when hacked? Also, has anyone tried the "Sip1601, 1603,1605,1606 multilingual network camera firmware"? =viewthread&tid=40058 Again, with the goal of wresting control of the IR programmatically.
Yes, but usually if stock is present... you can watch the price rise! One more question before I try to pull the trigger again. When you say 'should work'... MANY MANY of these cameras are identical on the outside regardless of who sells or manufactures them, and OFTEN the electronics are ALSO identical. See where I'm going with this...?
Does anyone know how this camera compares to the Foscam equivalent? I ask because it looks almost identical, and the UI looks extremely similar. I'm wondering if this camera and the Foscam are really the same under the hood.
Can't say we've tried it ourselves, though there's a good chance it'd work - at most requiring only small changes to the code. Other people are also saying there's a similarly priced Tenvis camera on the market that'll work!
I have two ffmpeg cameras out of five. When I select either one of the ffmpeg cameras for a full-screen view, they only show a small icon. All my other cameras that are not ffmpeg show full screen perfectly.
Anyone else have this problem?