I want to build a platform that uses an iOS app as a "second screen" for a particular television show. The app displays different information at different points during the show. The experience must be passive: the user just taps "Start", the app syncs up automatically, and it stays in sync on its own, so the user never has to pause the app when the video is paused.
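Once the app knows the current position in the episode, choosing what to display reduces to a lookup in a sorted timeline of cues. A minimal sketch, assuming a hypothetical `Cue` record with a start time and a payload, and cues sorted by start time:

```python
from __future__ import annotations

import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Cue:
    start_s: float  # episode time (seconds) at which this content becomes active
    content: str    # hypothetical payload shown on the second screen


def active_cue(cues: list[Cue], position_s: float) -> Cue | None:
    """Return the latest cue starting at or before position_s (cues sorted by start_s)."""
    starts = [c.start_s for c in cues]
    # bisect_right counts cues with start_s <= position_s; the last of those is active.
    i = bisect.bisect_right(starts, position_s) - 1
    return cues[i] if i >= 0 else None


timeline = [
    Cue(0.0, "Opening credits"),
    Cue(95.0, "Cast bios"),
    Cue(610.5, "Poll: who did it?"),
]
print(active_cue(timeline, 120.0).content)  # Cast bios
```

For a long timeline the `starts` list could be built once rather than on every call.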
Perhaps I'm crazy, but it looks like an Echo Nest server might work. What I'm specifically curious about is how to fingerprint a television episode. My goal is to be able to launch the app at any point in any episode and have it figure out which episode you are watching (nice to have) and precisely where in that episode you are, to within about one second (required).
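Hitting the one-second target does not require fingerprinting continuously. One approach is to match a short clip now and then and let a local clock fill in between matches. A minimal sketch, assuming some matcher (not shown) reports the episode time of a captured clip; `EpisodeClock`, `on_match`, and the 1 s tolerance are hypothetical choices, not part of any Echo Nest API:

```python
from __future__ import annotations

import time
from typing import Callable


class EpisodeClock:
    """Hypothetical estimator of the current episode position, driven by fingerprint matches.

    Between matches it assumes the show plays at normal speed. A new match re-anchors
    the clock only if it disagrees with the running estimate by more than tolerance_s.
    """

    def __init__(self, tolerance_s: float = 1.0,
                 now: Callable[[], float] = time.monotonic) -> None:
        self.tolerance_s = tolerance_s
        self._now = now
        self._anchor_pos: float | None = None  # episode position (s) at the last anchor
        self._anchor_time: float = 0.0         # local clock reading at the last anchor

    def position(self) -> float | None:
        """Estimated episode position in seconds, or None before the first match."""
        if self._anchor_pos is None:
            return None
        return self._anchor_pos + (self._now() - self._anchor_time)

    def on_match(self, matched_pos_s: float, captured_at: float) -> bool:
        """Report that audio captured at local time captured_at matched episode time
        matched_pos_s. Returns True if the clock was re-anchored."""
        t = self._now()
        # Project the match forward so recognition latency is not mistaken for drift.
        observed = matched_pos_s + (t - captured_at)
        if self._anchor_pos is not None:
            estimate = self._anchor_pos + (t - self._anchor_time)
            if abs(observed - estimate) <= self.tolerance_s:
                return False  # still in sync; keep the current anchor
        self._anchor_pos, self._anchor_time = observed, t
        return True


# Usage with a fake clock so the numbers are reproducible.
fake_now = [100.0]
clock = EpisodeClock(now=lambda: fake_now[0])
clock.on_match(300.0, captured_at=99.0)  # clip recorded 1 s ago matched at 300 s
fake_now[0] = 110.0
print(clock.position())                  # 311.0
```

This is what makes the experience passive: if the video is paused, the estimate keeps running, and the first match after playback resumes disagrees by more than the tolerance and snaps the clock back, with no user action needed.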
I'm new to audio fingerprinting, so:
1) Should my Echo Nest server fingerprint the entire episode as one "song"? Is there any reason to split it up?
2) When my Echo Nest client does a search and finds a match, am I told where it matched (e.g., how far into the "song"/episode)? If not, is there some other way to figure this out?
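It is not certain that the Echoprint server reports where inside a track a match occurred. The sketch below is therefore not Echoprint's API. It shows the generic time-offset voting used by landmark-style fingerprinters such as Shazam: every matching hash votes for an (episode, episode time minus clip time) pair, and a true match piles its votes into one offset bin. The index layout, the toy integer hashes, and the 0.5 s bin width are assumptions for illustration. With this scheme, indexing each whole episode as one entry is fine. If a matcher only returns a track ID, a common workaround is to split the episode into short overlapping segments and store each segment's start time alongside it.

```python
from __future__ import annotations

from collections import Counter, defaultdict

# Hypothetical index layout: fingerprint hash -> [(episode_id, time_in_episode_s), ...]
Index = dict[int, list[tuple[str, float]]]


def build_index(episodes: dict[str, list[tuple[int, float]]]) -> Index:
    """Index every (hash, time) pair of every episode; each episode is one long 'song'."""
    index: Index = defaultdict(list)
    for episode_id, hashes in episodes.items():
        for h, t in hashes:
            index[h].append((episode_id, t))
    return index


def locate(index: Index, query: list[tuple[int, float]], bin_s: float = 0.5):
    """Return (episode_id, offset_s, votes) for the best match, or None.

    query holds (hash, time_in_clip_s) pairs; offset_s is the episode time at which
    the clip starts, quantized to bin_s.
    """
    votes: Counter = Counter()
    for h, clip_t in query:
        for episode_id, ep_t in index.get(h, ()):
            # Hashes from a true match share a constant (episode time - clip time).
            votes[(episode_id, round((ep_t - clip_t) / bin_s))] += 1
    if not votes:
        return None
    (episode_id, bin_idx), count = votes.most_common(1)[0]
    return episode_id, bin_idx * bin_s, count


# Toy hashes; a real fingerprinter computes them from the audio.
index = build_index({
    "s01e01": [(11, 10.0), (22, 10.5), (33, 11.0), (44, 12.0)],
    "s01e02": [(22, 3.0), (55, 4.0)],
})
print(locate(index, [(11, 0.0), (22, 0.5), (33, 1.0)]))  # ('s01e01', 10.0, 3)
```

The returned offset is where the clip starts, so the current position is that offset plus the time elapsed since the clip was recorded. The bin width sets the resolution, and 0.5 s leaves headroom under the one-second requirement.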
Any help would be appreciated! Thank you,
Richard