This patch fixes issue 9 (Exclude tracks that are too similar to the
seed tracks) and issue 22 (A track removed manually should not be
re-added by mirage), along with various smaller things.
Thanks to Dominik's recent Scms changes (well, recent since I wrote a
patch for this back in 2007), using distance calculations for duplicate
detection has a very low risk of false positives.
I'm looking for feedback on this, so I'll explain my methodology.
Apologies if this is disjointed; I built it up as a draft email over
several days.
I've made a small change to Mirage/Mir.cs to allow SimilarTracks() to
accept a "distance ceiling" float that it can filter against. It keeps
the ignored tracks (with a distance under this ceiling) in a
Dictionary<int, float>.
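To make the ceiling idea concrete, here is a rough sketch in Python
(not the C# in the patch). The function name filter_similar, the
ceiling of 0.5, and the distances for tracks 4021 and 1500 are all
made up for illustration; only the two small distances come from my
debug output.

```python
from typing import Dict, List, Tuple


def filter_similar(
    distances: Dict[int, float],
    distance_ceiling: float,
) -> Tuple[List[int], Dict[int, float]]:
    """Split candidate tracks into a playlist and an ignore list.

    distances maps track id -> distance from the seed track.
    Tracks closer than distance_ceiling are treated as duplicates.
    """
    ignored: Dict[int, float] = {}
    kept: List[Tuple[float, int]] = []
    for track_id, distance in distances.items():
        if distance < distance_ceiling:
            # Too close to the seed: remember it, but keep it out of the playlist.
            ignored[track_id] = distance
        else:
            kept.append((distance, track_id))
    kept.sort()  # nearest remaining tracks first
    return [track_id for _, track_id in kept], ignored


# Distances from a seed track; 4021 and 1500 are hypothetical entries.
distances = {7886: 0.05957413, 8010: 0.1150455, 4021: 3.2, 1500: 1.7}
playlist, ignored = filter_similar(distances, distance_ceiling=0.5)
```

The point of returning the ignored tracks alongside the playlist is
that the caller can log them and carry them into its own skip list.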
The Banshee extension now passes this ceiling. When the first seed
track has duplicates and you are running with --debug, you will see
output like this:
[Debug 00:57:50.064] Mirage - Considering [7886] "always" on "Wish You
The Best" a duplicate of [7918] "always" on "Perfect Crime" (distance:
0.05957413)
[Debug 00:57:50.065] Mirage - Considering [8010] "always" on "always" a
duplicate of [7918] "always" on "Perfect Crime" (distance: 0.1150455)
The extension gets this information by checking Mir's IgnoreList
Dictionary; by that point the duplicates are already gone from the
automatic playlist. The tracks are also added to the skipped Dictionary
and will be excluded in future iterations.
Unfortunately, you soon realize that the similar songs can have
duplicates of their own. Rather than have Mir do all of this analysis
up front (and take 10+ seconds to return the first playlist), my
solution is to go ahead and use the initial playlist and spawn a
DuplicateFilter helper thread. The thread seeds the next tracks into
Mir one by one and checks whether Mir finds any duplicates (or, more
specifically, tracks with distances under the specified ceiling). This
happens in the background; any new duplicates print the message above
and are added to the exclude list for future iterations.
Unfortunately, I can't figure out a proper way to remove them from the
active playlist, as that appears to require enhancements to the API in
Banshee itself. However, the duplicates will never actually get played:
60% of the way into the first song, the next playlist iteration
excludes them and they disappear. It is an annoying visual bug, though.
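The background pass can be sketched roughly as follows, again in Python
rather than the patch's C#. Everything here is hypothetical:
start_duplicate_filter, find_ignored (standing in for a Mir query with
a distance ceiling), and the DUPLICATES table are invented for the
example and are not the names used in the patch.

```python
import threading
from typing import Callable, Dict, Iterable, Set


def start_duplicate_filter(
    playlist: Iterable[int],
    find_ignored: Callable[[int], Dict[int, float]],
    skipped: Set[int],
    lock: threading.Lock,
) -> threading.Thread:
    """Seed each playlist track in the background and record its duplicates."""
    seeds = list(playlist)  # snapshot, so later playlist changes don't matter

    def worker() -> None:
        for seed in seeds:
            with lock:
                if seed in skipped:
                    # Already known to duplicate an earlier track; don't seed it.
                    continue
            for track_id, distance in find_ignored(seed).items():
                with lock:
                    if track_id not in skipped:
                        skipped.add(track_id)
                        print(f"Considering [{track_id}] a duplicate of "
                              f"[{seed}] (distance: {distance})")

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


# Hypothetical stand-in for Mir: seed id -> {duplicate id: distance}.
DUPLICATES = {1: {2: 0.05}, 2: {1: 0.05}, 3: {9: 0.1}}

skipped: Set[int] = set()
lock = threading.Lock()
thread = start_duplicate_filter(
    [1, 2, 3, 4], lambda seed: DUPLICATES.get(seed, {}), skipped, lock)
thread.join()  # a real player would keep playing instead of waiting
```

Skipping seeds that are already in the skipped set matters: without
that check, track 2 would be seeded and would mark track 1 as a
duplicate too, dropping both copies from future playlists.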
This isn't complete just yet, but if Bertrand or Dominik approve of
this route I'd like to tweak it some more and add GUI preferences for
it. I've tried to keep the separation between Banshee.Mirage and
Mirage, and I'd like to build on it and offer a "duplicate finder"
source view that clusters tracks by distance. I don't really have a use
for that myself, but it seems to be a frequent request and I could use
the experience.
http://ezri.org/dupefilter.patch
-Wade "kurros" Menard