Video identification with AcoustID


B. Küstner

Jan 12, 2021, 1:02:07 PM
to acou...@googlegroups.com
Hi all,

this is a longer post, but the gist is short:
  • There is a lack of software for content-based video identification.
  • I think that AcoustID + MusicBrainz could fill this vacuum.
  • Spoiler: I think the audio tracks for videos can and should be used.
That said, I would appreciate it if you read on. The rest of this post lays out my more detailed thoughts so far.

And I very much hope that the idea is found worth discussing, and that this post sparks enough interest among knowledgeable folks to look into it and evaluate the possibilities and obstacles.

Best regards
Björn


The situation
Several video management systems (Plex, Emby, Kodi, Jellyfin, …) exist. They all depend on matching video files with a unique ID (in IMDB, TMDB, …). 
This matching process is – gasp – based on filenames and folder structures.
That’s how Plex does it, and that seems to be a de facto standard that most applications at least understand. 
There is nothing better out there. At least nothing widespread.

The result …
There are countless threads advising users to rename their collections of movies and series, also to reorganize them.
Users spend endless hours doing that, also taking the time of forum contributors. 
Users are restricted from keeping movies and series together (e. g. Star Trek). 
There is „helpful“ software out there (e. g. Filebot), but that, too, only uses names to then change filenames.
The matching rate is still not worthy of 21st-century software. 
When one collection is up to speed, exactly 0% of that effort feeds back to the community. The next user is again on their own.
… is painful.

How it should work in general:
The content of video files is fingerprinted, then matched against a central db which maps fingerprints to video-IDs.
This video-ID can be used to pull from IMDB, TMDB and the likes. 
These metadata can then fill the tags in the file or rename files … content-based and with high precision.

Fingerprint video by their audio tracks
I see benefits in using the audio tracks rather than the video track, and little to no downside to this approach:
- Audio tracks are just as unique
- Easier on resources
- Proven algorithms and software (AcoustID, MusicBrainz)
- Even silent movies have audio tracks
- Language identification
- Videos with multiple tracks help enrich the db: track 1 (English) is known but not track 2 (non-English). Since they belong to the same video file, track 2 can be mapped in the central db to the same video file. From then on a video file that has only the non-English track is also identified.
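
The multi-track idea in the last bullet can be sketched roughly as follows. This is only a toy illustration: `TrackIndex` and its methods are hypothetical names, the "database" is an in-memory dict, and exact string equality stands in for the fuzzy fingerprint matching a real service would need.

```python
from typing import Dict, Iterable, Optional


class TrackIndex:
    """Toy in-memory stand-in for the central database (hypothetical)."""

    def __init__(self) -> None:
        self._video_by_fingerprint: Dict[str, str] = {}

    def add(self, fingerprint: str, video_id: str) -> None:
        """Register a known audio-track fingerprint for a video ID."""
        self._video_by_fingerprint[fingerprint] = video_id

    def lookup(self, fingerprint: str) -> Optional[str]:
        """Return the video ID for a fingerprint, or None if it is unknown."""
        return self._video_by_fingerprint.get(fingerprint)

    def identify_file(self, track_fingerprints: Iterable[str]) -> Optional[str]:
        """Identify a file from any known track, then learn its unknown tracks."""
        tracks = list(track_fingerprints)
        known = {self.lookup(fp) for fp in tracks} - {None}
        if len(known) != 1:
            return None  # nothing known, or the tracks disagree: do not guess
        video_id = known.pop()
        for fp in tracks:
            # Tracks from the same file are mapped to the same video ID.
            self._video_by_fingerprint.setdefault(fp, video_id)
        return video_id
```

After a file with a known English track and an unknown second track has been seen once, a file carrying only the second track resolves to the same video ID.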

Some things are already there
Basic structures are already in place with AcoustID plus MusicBrainz: 
- acoustic fingerprinting 
- reading multimedia file containers
- a central database
- looking up metadata from sources
- writing metadata to multimedia file containers
- more that I am not seeing
- a community for the above!

Still a good bit of work
- use what exists with AcoustID and MusicBrainz, but set up separate handling for video (phase 1)
- adapt AcoustID and MusicBrainz to handle movies, e. g. movies are much longer than music (phase 1)
- video files have different containers like mkv (phase 2)
- video files have different audio formats like DTS and AC3 (phase 2)
- metadata must be pulled from new sources like IMDB, TMDB (phase 1)
- more that I am not seeing

The chicken-egg-problem
… can be solved.
There are lots of video collections (both movies and series) out there that are lovingly maintained. 
Video management systems like Plex, Emby, Kodi, Jellyfin, … have engaged communities. 
I would expect that enough video fans can be found that would allow their videos to be fingerprinted (by audio) and sent to MusicBrainz (VideoBrainz?) for the initial fill of the central database. 
As noted, there are synergies in videos with multiple audio tracks. 
Maybe subtitles can also be used handily.

Why AcoustID and MusicBrainz
For the same reasons that AcoustID and MusicBrainz have come to life in the first place and flourished since. 
Additionally, in this case there seems to be no (or no well-established) alternative video identification service for users or apps. 
Synergies, both technically and in terms of community. 
This could significantly grow the visibility and community around AcoustID and MusicBrainz.

Gfy

Jan 14, 2021, 11:49:05 AM
to acou...@googlegroups.com
Hi Björn,

I'm not familiar with those video management systems but I can present
an easier alternative that'll work well for most.

There already exists an ISDb hash for subtitle lookups. "The ISDb hash
is a quick hash, based on the first and last 64 KiB of a file, used by
opensubtitles.org and tools to quickly find a matching subtitle for a
video file." See http://rescene.wikidot.com/faq#isdb
There is source code available in many languages to calculate that
hash: https://trac.opensubtitles.org/projects/opensubtitles/wiki/HashSourceCodes

So far I know of 2 sites that use them and link to an IMDb identifier.
https://www.srrdb.com/browse/isdbhash:ada3402eca52aed3/1
https://www.opensubtitles.org/en/search/sublanguageid-all/moviehash-ada3402eca52aed3

SrrDB archives meta data of movies by the warez scene. The movie
connection is made by the imdb link in the NFO file. This tends to be
wrong sometimes and is often missing. (At the bottom of the release
page there is a link to opensubtitles.)
OpenSubtitles uses the hash to show matching subtitles. There are
false positives as you can see in the above link. This is why isdb
hash + file size is always used by tools to query them.
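
For reference, a minimal Python sketch of that hash, written from my understanding of the published reference implementations: the file size plus the sum of all little-endian 64-bit words in the first and last 64 KiB, truncated to 64 bits. The function name `isdb_hash` is my own, and rejecting files smaller than 128 KiB mirrors what the reference code appears to do.

```python
import os
import struct

CHUNK_SIZE = 64 * 1024  # 64 KiB read from each end of the file
MASK_64 = 0xFFFFFFFFFFFFFFFF  # keep the running sum within 64 bits


def isdb_hash(path: str) -> str:
    """Return the 16-hex-digit hash: size + 64-bit word sums of head and tail."""
    size = os.path.getsize(path)
    if size < 2 * CHUNK_SIZE:
        raise ValueError("file is too small for this hash")
    total = size
    with open(path, "rb") as f:
        for offset in (0, size - CHUNK_SIZE):
            f.seek(offset)
            chunk = f.read(CHUNK_SIZE)
            # Interpret the chunk as little-endian unsigned 64-bit integers.
            words = struct.unpack("<%dQ" % (CHUNK_SIZE // 8), chunk)
            total = (total + sum(words)) & MASK_64
    return "%016x" % total
```

Because only the two ends of the file and its size go into the sum, the hash is cheap to compute, but bytes in the middle of the file do not influence it at all.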

Both sites can contribute to and benefit from a 3rd party service that
keeps clean links to movie ids. Season and episode ids could be nice.
For srrDB (where I'm involved) it would help fix bad movie info (so
store bad known links too) and missing links. On the other hand they
can provide lots of hashes with folder and file names.
For opensubtitles it would provide users with better matches when no
subtitle is uploaded with the hash yet. (not sure whether they siphon
in srrdb data now)

On https://www.srrdb.com/open there are data dumps to get started.
Except for some manual fixes and adding of imdb IDs, most is
automated.

Why I think it'll work well:
- most people have the same files as one of the dozen rips out there.
Even if someone owns the bluray, it's often easier to get a rip from
the web to put on their NAS.
- very fast hash to calculate. A web service and db don't have to be
very optimized at the beginning to be usable. AcoustID, for example,
uses a custom PostgreSQL module.
- can start independently with motivated people; no waiting on
experts/external projects (do one thing and do it well, then go from
there)

People making their own rips will still have to be sure to put
everything in the correct folders. This gives potentially many hashes
that will never be queried again. Maybe some others can chime in on
the feasibility of AcoustID for this.

So this is how I would tackle the problem :)

Cheers
Gfy

N N

Jan 14, 2021, 12:05:22 PM
to AcoustID
I started testing on what might already work.
I checked with two mp4 videos from different sources. 

The first was an actual movie, Ray (2004), weighing in at 2h:19m.
I expected Picard to choke on the size. But Picard opened the file just fine. 
It showed meta data in the file and the technical details for the audio track. 
Picard let me analyze the file. No match (of course), but also no error message.
That is the goal, that in the future the movie would be recognized (by the audio track). 

The second was a music video from YouTube: Billie Eilish - Therefore I Am 
Because it was shorter, I expected that this would be easier, and that it might give me a title match. 
But Picard would not open this file, not from the dialog box and not via drag and drop. 
As a non-expert the only difference I could see was by opening both videos in an editor. 
The first started with a header typmp42 mp42mp41 mƒ‘moov flmvhd …
The second started with a header ftypisom isomiso2avc1mp41 free 4 …
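
The readable part of those headers is the MP4 `ftyp` box, which names a major brand and a list of compatible brands. A small sketch for reading it without a text editor follows; `read_ftyp` is a hypothetical helper, it only handles the common case of a 32-bit box size at the very start of the file, and whether the brand difference explains Picard's behaviour is not established here.

```python
import struct
from typing import List, Tuple


def read_ftyp(data: bytes) -> Tuple[str, List[str]]:
    """Return (major_brand, compatible_brands) from the start of an MP4 file."""
    if len(data) < 16:
        raise ValueError("need at least 16 bytes")
    # Box header: 4-byte big-endian size, then the 4-byte box type.
    size, box_type = struct.unpack(">I4s", data[:8])
    if box_type != b"ftyp":
        raise ValueError("data does not start with an ftyp box")
    if size < 16 or size > len(data):
        raise ValueError("unsupported or truncated ftyp box")
    major = data[8:12].decode("ascii", "replace")
    # Bytes 12..15 hold the minor version; the rest are 4-byte brand codes.
    brands = [
        data[i:i + 4].decode("ascii", "replace")
        for i in range(16, size - 3, 4)
    ]
    return major, brands
```

Usage would be something like `read_ftyp(open(path, "rb").read(64))`. For the second file above I would expect the major brand `isom` with compatible brands `isom`, `iso2`, `avc1`, `mp41`.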

Summary: Like a proof of concept, the first test shows that AcoustID + MusicBrainz + Picard are already in a good starting position to identify not just music but also video (by the audio track).

N N

Jan 14, 2021, 1:30:22 PM
to AcoustID

Hi Gfy

thanks for looking into this.

> There already exists an ISDb hash for subtitle lookups. "The ISDb hash 
> is a quick hash, based on the first and last 64 KiB of a file

I was not aware of that, probably because I am not much into ripping. My movie collection is mostly digital recordings of broadcast content. 

I do not think a hash is a good solution for the task: A hash is in some way the opposite of a fingerprint. 
A hash should be sensitive and change if only one bit of the source changes.
Whereas a fingerprint should be robust and not change for the same content, even if the bits differ a lot.

That said, the bits of the same movie can differ for many reasons. 
Off the top of my head due to different leads and trails, localized credits, encoding settings, edits, different masters, applied filters, commercial cuts, station watermarks.
Sure, creating many hashes for all these combinations would be possible. But many of them will be unique and not re-used. And others will be missing.
It is different with a fingerprint: one fingerprint should be enough for identification. 
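
To make the contrast concrete, here is a small sketch. It assumes fingerprints are lists of 32-bit integers (as Chromaprint's raw output is, to my knowledge) and compares them by the fraction of differing bits. The sample values are made up, and a real matcher would also have to align the two fingerprints in time before comparing.

```python
import hashlib
from typing import Sequence


def bit_error_rate(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of differing bits between two equal-length lists of 32-bit ints."""
    if len(a) != len(b) or not a:
        raise ValueError("fingerprints must be non-empty and of equal length")
    differing = sum(bin((x ^ y) & 0xFFFFFFFF).count("1") for x, y in zip(a, b))
    return differing / (32 * len(a))


# A hash: flipping a single bit of the input yields an unrelated digest.
original = bytes(range(256)) * 4
altered = bytearray(original)
altered[100] ^= 0x01
hashes_equal = hashlib.sha256(original).digest() == hashlib.sha256(bytes(altered)).digest()

# A fingerprint: small differences stay small and can be thresholded.
fp_a = [0x0F0F0F0F, 0x12345678, 0xDEADBEEF, 0x00FF00FF]
fp_b = [0x0F0F0F0E, 0x12345678, 0xDEADBEEF, 0x00FF00FB]  # two bits flipped
similarity_gap = bit_error_rate(fp_a, fp_b)  # 2 of 128 bits differ
```

With a hash the only possible answers are "identical" or "different". With a fingerprint the answer is a distance, so a tuned threshold can decide whether two encodings carry the same content.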

> This tends to be wrong sometimes and is often missing. 
> […] There are false positives 
> […] be sure to put everything in the correct folders. 
> This gives potentially many hashes that will never be queried again.

Case in point. 😉

But more succinctly: There's a reason why AcoustID + MusicBrainz uses fingerprints and not hashes. 

> how I would tackle the problem :) 

Just to be clear: My post is not about my specific problem. 
It is about closing a software gap with open-source software as a community effort.

I think it is time for this kind of software. I was actually surprised that it does not already exist. 
I see AcoustID + MusicBrainz in a good position, technically and community-wise.

> do one thing and do it well

This! 

It would be cool if AcoustID + MusicBrainz + Picard did not care whether it is dealing with an audio file, a music video or any kind of video. It just grabs the audio, does the fingerprinting and matching, grabs data from appropriate sources to fill the tags and / or rename files.

Technically the audio track in a video file is not much different from a music file, and sometimes not different at all.
As demonstrated with a little test I ran this afternoon and posted in this thread.
The good thing is that Picard did not choke on the over 2h video. It read the audio track just fine.

The difference between audio files and audio tracks in (music) videos is more in the semantics. 
For example, videos care more about the "year" tag and less about the "album artist" tag.
But technically that is not much of a difference. 
Different behaviour to cover different semantics can be handled by settings, that match common usage scenarios.
Like the difference between ripping a CD or dealing with a set of individual music titles.

I hope that was clearer.

Cheers
Björn

N N

Jan 27, 2021, 3:40:52 AM
to AcoustID
Hi all,

anybody in this group familiar enough with Chromaprint and fpcalc to weigh in on technical issues?

Like, are there technical issues where chromaprint or acoustid need work to handle movie-length audio tracks?
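
For anyone who wants to try this, a sketch of how fpcalc could be driven from Python for a long file. Assumptions: `fpcalc` is on the PATH and was built with an FFmpeg that can read the container; the helper names are hypothetical; `-length` is raised because, as far as I know, fpcalc only processes the first 120 seconds by default; and the JSON shape (`duration` plus a `fingerprint` array when `-raw` is given) is my understanding of its output.

```python
import json
import subprocess
from typing import List, Tuple


def build_fpcalc_command(path: str, max_seconds: int) -> List[str]:
    """Build an fpcalc call that emits a raw fingerprint as JSON."""
    return ["fpcalc", "-json", "-raw", "-length", str(max_seconds), path]


def parse_fpcalc_json(output: str) -> Tuple[float, List[int]]:
    """Extract (duration in seconds, fingerprint items) from fpcalc's JSON."""
    data = json.loads(output)
    return float(data["duration"]), [int(x) for x in data["fingerprint"]]


def fingerprint_file(path: str, max_seconds: int = 3 * 60 * 60) -> Tuple[float, List[int]]:
    """Run fpcalc on a file and return its duration and raw fingerprint."""
    result = subprocess.run(
        build_fpcalc_command(path, max_seconds),
        capture_output=True,
        text=True,
        check=True,  # raise if fpcalc exits with an error
    )
    return parse_fpcalc_json(result.stdout)
```

Comparing the number of returned items against the reported duration would show whether the whole track was actually processed.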

My tests were encouraging, but still a mixed bag that is difficult to interpret. 

Any feedback welcome. Thanks in advance. 

Ben Franske

Jan 28, 2021, 12:20:59 AM
to acou...@googlegroups.com

It's been a long time; but, if I recall correctly, AcoustID only uses the first 25-40 seconds of audio for matching. I think that would be the major issue in trying to match movie audio tracks. I think you'd get a lot of collisions in the first 25-40 seconds of audio (which is mostly studio/production company logos, etc., and is therefore not particularly unique).

Someone correct me if I'm wrong.

-Ben


Lukáš Lalinský

Jan 28, 2021, 12:47:24 AM
to Acoustid
So, the technology behind AcoustID is fully capable of identifying audio tracks from movies.

However, the public side of AcoustID and how MusicBrainz uses AcoustID is not compatible with such functionality. How AcoustID + MusicBrainz works for music:

1. Fingerprints in the AcoustID database are limited to the first two minutes of the music track
2. Only audio around 0:15 and 0:30 is used to search for the track in the database
3. Once candidates are found using the audio from 0:15-0:30, they are compared to the 2 minute fingerprint in the database to verify correct match

This approach was chosen, because it mostly works for music and it was affordable to host a service with restrictions like this. Full audio fingerprinting solution would certainly be out of reach, financially, back when I started working on AcoustID (I had to pay all the hosting costs myself for the first few years).
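
The three steps for music listed above can be sketched roughly like this. It is only an illustration, not AcoustID's actual code: the function names are made up, `ITEMS_PER_SECOND` is an approximation of how many fingerprint items cover one second of audio, and `MAX_BIT_ERROR` is a hypothetical verification threshold.

```python
from typing import List, Sequence

ITEMS_PER_SECOND = 8  # assumption: approximate fingerprint items per second
MAX_BIT_ERROR = 0.2   # hypothetical threshold for accepting a candidate


def stored_part(items: Sequence[int], max_seconds: int = 120) -> List[int]:
    """Step 1: keep only the first two minutes of the fingerprint."""
    return list(items[: max_seconds * ITEMS_PER_SECOND])


def query_part(items: Sequence[int], start_s: int = 15, end_s: int = 30) -> List[int]:
    """Step 2: use only the 0:15-0:30 window to search for candidates."""
    return list(items[start_s * ITEMS_PER_SECOND : end_s * ITEMS_PER_SECOND])


def verify(candidate: Sequence[int], submitted: Sequence[int]) -> bool:
    """Step 3: compare a candidate's stored fingerprint with the submitted one."""
    n = min(len(candidate), len(submitted))
    if n == 0:
        return False
    wrong = sum(
        bin((a ^ b) & 0xFFFFFFFF).count("1")
        for a, b in zip(candidate[:n], submitted[:n])
    )
    return wrong / (32 * n) <= MAX_BIT_ERROR
```

The point of the split is cost: the index only has to cover a short window per track, and the more expensive full comparison runs on a handful of candidates.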

For movie audio fingerprints, you need a different approach. It's easily possible that the first few minutes are intros, which will be identical to many other videos. If I was designing a system for identifying movies, it would work like this:

1. Index the full audio track from the movie
2. When searching, select 30 second samples from three random positions in the movie, avoiding the first and last 10 minutes, if possible.
3. Based on the 3 samples, search the database for fingerprint matches. If all three agree on a movie, consider it the matching movie.
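
A rough sketch of steps 2 and 3 of that design. The constants come from the list above; everything else is an assumption for illustration: the function names are invented, and `search` is a hypothetical callback that maps a sample start time to a movie ID (or None) by querying the full-track index.

```python
import random
from typing import Callable, List, Optional, Sequence

SAMPLE_SECONDS = 30       # length of each query sample
MARGIN_SECONDS = 10 * 60  # skip the first and last 10 minutes if possible
SAMPLE_COUNT = 3


def pick_sample_starts(duration_s: int, rng: random.Random) -> List[int]:
    """Pick three sample start times, avoiding intro and credits if possible."""
    lo = MARGIN_SECONDS
    hi = duration_s - MARGIN_SECONDS - SAMPLE_SECONDS
    if hi < lo:
        # Short video: fall back to sampling from the whole track.
        lo, hi = 0, max(0, duration_s - SAMPLE_SECONDS)
    return sorted(rng.randint(lo, hi) for _ in range(SAMPLE_COUNT))


def identify(
    starts: Sequence[int],
    search: Callable[[int], Optional[str]],
) -> Optional[str]:
    """Return a movie ID only if every sample resolves to the same movie."""
    results = [search(start) for start in starts]
    if None in results or len(set(results)) != 1:
        return None
    return results[0]
```

Requiring all three samples to agree trades some recall for precision. Nothing in this sketch prevents two samples from overlapping; a real implementation might want to space them out.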

I have an open source system that can be used for building a system like this:


But I provide no support for this, it was really just an experiment.

I'd like to develop a full audio fingerprinting solution and make it easily accessible to developers, but that's been my goal for years now and I was never motivated enough to finish it, so I'm not sure when that will happen. :(

Lukas




"B. Küstner"

Jan 29, 2021, 7:12:48 AM
to acou...@googlegroups.com
Thanks Ben, Lukáš, for the helpful responses. 

> the technology behind AcoustID is fully capable of identifying audio tracks from movies.

Good to have this confirmed. 

> However, the public side of AcoustID and how MusicBrainz uses AcoustID is not compatible with such functionality. […] This approach was chosen, because it mostly works for music 

Just to be clear, because „approach“ can mean a very fundamental, architectural thing.

But here it is really „just“ about choosing the right parameters for music or for audio tracks in video respectively. As you suggest:


> If I was designing a system for identifying movies, it would work like this:
>
> 1. Index the full audio track from the movie
> 2. When searching, select 30 second samples from three random positions in the movie, avoiding the first and last 10 minutes, if possible.
> 3. Based on the 3 samples, search the database for fingerprint matches. If all three agree on a movie, consider it the matching movie.

This seems like a relatively easy thing. 

„relatively easy“ when compared to redesigning something from the ground up.
„relatively easy“ ≠ I could do it myself. 😅

> I'd like to develop a full audio fingerprinting solution and make it easily accessible to developers, but that's been my goal for years now and I was never motivated enough to finish it, so I'm not sure when that will happen. :(

What do you mean by „full audio fingerprinting solution“? 
Do you mean that like „usable for audio fingerprinting of video“ as discussed here?

As noted in the initial post, I think there is a vacuum for such software. 
And I see several reasons to make it an open source community project. 

I was thinking about what is needed to make it happen. 
Thanks to chromaprint, acoustid, MusicBrainz, there is already a lot available.

Suggestion for a very basic plan to start with: 
1) Build a proof of concept, then 2) a minimum viable product. 

Proof of concept
  • Fingerprints can be uploaded, stored, matched for individual movies (as opposed to music tracks or batch uploads)
  • Private or local server
  • Playground to tune parameters for good recognition
  • No consideration for user experience
Minimum viable product
  • Works with AAC in MP4 (common video format and also used for music)
  • Public server
  • User app, that can <add user stories here>
What’s needed
  • Developer(s), tester
  • Tools: code => github, extend existing code base. communication => github or this group or …?
  • Funding for <add stuff that costs money, like public server, domain, pizza, …>
Unfortunately I cannot fill a developer role. (Really, I should not … and you would immediately agree if you had seen the code I wrote a looong time ago. 😬)

But I can contribute 
as tester with everything that goes along with that, like trying over and over again,
with funding when the project gets to that point,
with communication, like reaching out to other relevant communities.

@Lukáš, what might help to kindle the motivation, that you were missing before? 🙂
Also, would it be worthwhile to reach out via the Github project to maybe / hopefully find more support?

Björn
