Automatic Collection Scanning / Resuming Scanning

RustyNail

Aug 28, 2008, 2:52:40 PM

to Mirage - Automatic Playlist Generation

When Mirage is scanning your collection and gets interrupted (by
you, by a Banshee crash, by closing Banshee, etc.), you have to click
"Rescan the music collection", which SEEMS to start the whole process
from the beginning (it starts at 1% and everything... :-/ ). But
you can still use the already scanned files, and they don't seem to be
rescanned...

So my question: Does Mirage start scanning your collection from the
beginning when you click 'Rescan the music collection'?
And would it be possible to have Mirage resume scanning after being
stopped (hopefully automatically, unless manually stopped...)?

Bertrand Lorentz

Aug 28, 2008, 3:36:36 PM

to mirag...@googlegroups.com

Mirage keeps track of which tracks were scanned, and stores this info in
the banshee DB. When you click 'Rescan the music collection', it only
scans the tracks that are not already scanned. The percentage shown is
relative to the number of tracks to be scanned, not to all tracks.
It might be a good idea to show the progress as the global percentage of
tracks scanned in the library.
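
To make the difference between the two percentages concrete, here is a minimal sketch. It is not Mirage's actual code; the function and parameter names are made up for illustration.

```python
def scan_progress(total_tracks, already_scanned, scanned_this_run):
    """Return (run_percent, library_percent).

    run_percent     -- progress relative to the tracks still to be scanned
                       when this run started (what the thread says is shown)
    library_percent -- progress relative to the whole library (the proposal)
    All names here are hypothetical, chosen only for this illustration.
    """
    remaining_at_start = total_tracks - already_scanned
    if remaining_at_start == 0:
        run_percent = 100.0
    else:
        run_percent = 100.0 * scanned_this_run / remaining_at_start
    if total_tracks == 0:
        library_percent = 100.0
    else:
        library_percent = 100.0 * (already_scanned + scanned_this_run) / total_tracks
    return run_percent, library_percent
```

With a 2500-track library of which 2000 tracks were analyzed earlier, the run-relative figure restarts near zero on every rescan while the library-wide figure picks up where it left off, which is why the first one looks like "starting over".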

The "Reset..." menu item makes Mirage forget which tracks were scanned.

Resume scanning would be possible, but we need to be careful : if
scanning a track crashes banshee (shouldn't happen, but you never
know..), we don't want to be stuck in a "crash-restart-crash" loop,
making banshee unusable.
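
One possible guard against such a loop, sketched below, is to record a track as "started" before analyzing it and to skip tracks left in that state on the next run. This is an assumption about how it could be done, not a description of Mirage; the table, column, and function names are hypothetical, and an in-memory SQLite database stands in for the real one.

```python
import sqlite3


def open_state(path=":memory:"):
    """Open (or create) a hypothetical per-track scan-state table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS scan_state ("
        "track_id INTEGER PRIMARY KEY, status TEXT NOT NULL)"
    )
    return db


def tracks_to_analyze(db, library_track_ids):
    """Tracks never attempted before, whether they finished or not."""
    seen = {row[0] for row in db.execute("SELECT track_id FROM scan_state")}
    return [t for t in library_track_ids if t not in seen]


def analyze_all(db, library_track_ids, analyze):
    for track_id in tracks_to_analyze(db, library_track_ids):
        # Commit the 'started' marker BEFORE analysis, so it survives a crash.
        db.execute("INSERT INTO scan_state VALUES (?, 'started')", (track_id,))
        db.commit()
        analyze(track_id)
        db.execute(
            "UPDATE scan_state SET status = 'done' WHERE track_id = ?",
            (track_id,),
        )
        db.commit()


def suspect_tracks(db):
    """Tracks whose analysis started but never finished (possible crashers)."""
    rows = db.execute(
        "SELECT track_id FROM scan_state WHERE status = 'started' "
        "ORDER BY track_id"
    )
    return [row[0] for row in rows]
```

A track that took the player down is then attempted only once; the list from `suspect_tracks` could be shown to the user or retried on explicit request.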

Thank you for your input !

--
Bertrand Lorentz <bertrand...@gmail.com>
> http://flickr.com/photos/bl8/ <

RustyNail

Aug 29, 2008, 3:24:55 AM

to Mirage - Automatic Playlist Generation

On Aug 28, 11:36 pm, Bertrand Lorentz <bertrand.lore...@gmail.com>
wrote:

> Mirage keeps track of which tracks were scanned, and stores this info in
> the banshee DB. When you click 'Rescan the music collection', it only
> scans the tracks that are not already scanned. The percentage shown is
> relative to the number of tracks to be scanned, not to all tracks.
> It might be a good idea to show the progress as the global percentage of
> tracks scanned in the library
>
> The "Reset..." menu item makes Mirage forget which tracks were scanned.
>
> Resume scanning would be possible, but we need to be careful : if
> scanning a track crashes banshee (shouldn't happen, but you never
> know..), we don't want to be stuck in a "crash-restart-crash" loop,
> making banshee unusable.
>
> Thank you for your input !

AH. Thanks for clearing that up.

Yeah, showing the percentage in terms of the whole library would be
more intuitive.
But I think that 'Rescan the music collection' kind of implies that
it's starting over.
You might think about changing it to something like "Continue scanning
the collection" or "Resume...".
But then again, as I was thinking about it, it seems hard to
think of a proper name for it :-/

As for the auto-resume... maybe, when exiting cleanly, mirage could
flip some "exited cleanly" variable,
and then when starting could check against it....
By the way, if you make an auto resume feature, you could also think
about making a low-priority scan setting, or is that impossible?
It's just that while scanning my 2500+ song collection, my computer
becomes very unresponsive, especially on
long songs (X JAPAN), or Led Zeppelin. On those 2, it completely
freezes for about 5-10 minutes, and then continues on with the next
song.
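
The "exited cleanly" idea above could be sketched roughly like this. It is only an illustration of the suggestion, not Mirage code: a marker file exists while an analysis run is in progress and is removed on clean exit, so finding it at startup means the previous run did not end cleanly. The class, function, and file names are hypothetical.

```python
import os


class RunMarker:
    """Marker file that exists only while an analysis run is in progress."""

    def __init__(self, path):
        self.path = path

    def crashed_last_time(self):
        # The marker is removed on clean exit, so finding it at startup
        # means the previous run was interrupted.
        return os.path.exists(self.path)

    def start(self):
        with open(self.path, "w") as handle:
            handle.write("analysis running\n")

    def finish_cleanly(self):
        if os.path.exists(self.path):
            os.remove(self.path)


def should_auto_resume(marker, stopped_by_user, tracks_remaining):
    """Resume automatically only after a clean exit that the user did not
    trigger by stopping the scan, and only if there is work left."""
    if marker.crashed_last_time():
        return False
    if stopped_by_user:
        return False
    return tracks_remaining > 0
```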

But once you get past the scanning, the playlist generation is very
nice, though, as was said in the discussion below, a bit confusing at
first.
BTW, would there be any way to check for duplicate songs? (Same songs
from different albums)

eric casteleijn

Aug 29, 2008, 4:28:25 AM

to mirag...@googlegroups.com

> But once you get past the scanning, the playlist generation is very
> nice, though, as was said in the discussion bellow, a bit confusing at
> first.
> BTW, would there be any way to check for duplicate songs? (Same songs
> from different albums)

To butt in here, possibly inappropriately: I have written a python
plugin (originally for Quod Libet, now cross player) that does
similarity lookups on last.fm, and can use tags to look up similar
songs. I would love to add acoustic similarity lookups as well, and I've
been looking at mirage for this.

Where this touches on your question: the plugin currently has the smarts
to not play the same song or tracks by the same artists for a
configurable duration, and subject to further restrictions or
relaxations, depending on what kind of advanced queries the player supports.

If you guys think it would be interesting, I would be happy to port my
plugin to banshee (Is there a python API? I might need some help
otherwise ;) and I would very much like to add the option of acoustic
similarity, so I'm wondering: how deeply is mirage tied to Banshee? I
understand that the similarity scores are stored in the banshee
database, but I would be happy to invest some time to make that
configurable, so that other players (and my plugin ;) could take
advantage of this great feature. I don't know if that is feasible, but I
see no reason in theory that it wouldn't be.

Anyhoo, the plugin's located here:

http://code.google.com/p/autoqueue/

Currently supported are Quod Libet (fully featured, fully configurable)
Rhythmbox (some features missing, default configuration, but fully
working) and there are plans for supporting Pytone, Itunes, mpd and xmms.

Sorry if this is a bit OT but I'm looking to exchange ideas, and
hopefully even make our stuff work together without reinventing too many
wheels.

--
- eric casteleijn
http://thisfred.blogspot.com

Bertrand Lorentz

Aug 29, 2008, 7:08:48 AM

to mirag...@googlegroups.com

On Fri, 2008-08-29 at 00:24 -0700, RustyNail wrote:
> On Aug 28, 11:36 pm, Bertrand Lorentz <bertrand.lore...@gmail.com>
> wrote:
> > Mirage keeps track of which tracks were scanned, and stores this info in
> > the banshee DB. When you click 'Rescan the music collection', it only
> > scans the tracks that are not already scanned. The percentage shown is
> > relative to the number of tracks to be scanned, not to all tracks.
> > It might be a good idea to show the progress as the global percentage of
> > tracks scanned in the library
> >
> > The "Reset..." menu item makes Mirage forget which tracks were scanned.
> >
> > Resume scanning would be possible, but we need to be careful : if
> > scanning a track crashes banshee (shouldn't happen, but you never
> > know..), we don't want to be stuck in a "crash-restart-crash" loop,
> > making banshee unusable.
> >
> > Thank you for your input !
>
> AH. Thanks for clearing that up.
>
> Yeah, showing the percentage in terms of the whole library would be
> more intuitive.
> But I think that 'Rescan the music collection' kind of implies that
> it's starting over,
> you might think of changing it to something like "Continue scanning
> the collection" or "Resume..."
> But then again, As I was thinking about it, it seems to be hard to
> think of a proper name for it :-/

Yes, I think we have a terminology confusion here. I think "scan" should
be used for "find which tracks should be analyzed", and "analyze" should
be used for "do the math that determines music similarity".
I'm also guilty of this confusion.

> As for the auto-resume... maybe, when exiting cleanly, mirage could
> flip some "exited cleanly" variable,
> and then when starting could check against it....
> By the way, if you make an auto resume feature, you could also think
> about making a low-priority scan setting, or is that impossible?

The thread that does the analysis is already set to the lowest priority
(ThreadPriority.Lowest)

> It's just that while scanning my 2500+ song collection, my computer
> becomes verry unresponsive, especially on
> long songs (X JAPAN), or Led Zeppelin. On those 2, it completely
> freezes for about 5-10 minutes, and then continues on with the next
> song.

I hope you meant 5 or 10 seconds, because 5 or 10 minutes is way too
long to analyze a track. It usually takes less than 10 seconds for a
"normal" track.
Could you check your CPU usage while mirage is analyzing those tracks ?

> But once you get past the scanning, the playlist generation is very
> nice, though, as was said in the discussion bellow, a bit confusing at
> first.
> BTW, would there be any way to check for duplicate songs? (Same songs
> from different albums)

This has already been suggested. I think it's a good idea, I'll try to
look into it.

By the way, for those who don't watch the website, I'd like to mention
that you can report bugs or suggest features on the Issue tracker :
http://code.google.com/p/banshee-unofficial-plugins/issues/list

Bertrand Lorentz

Aug 29, 2008, 8:11:36 AM

to mirag...@googlegroups.com

On Fri, 2008-08-29 at 10:28 +0200, eric casteleijn wrote:
> > But once you get past the scanning, the playlist generation is very
> > nice, though, as was said in the discussion bellow, a bit confusing at
> > first.
> > BTW, would there be any way to check for duplicate songs? (Same songs
> > from different albums)
>
> To butt in here, possibly inappropriately: I have written a python
> plugin (originally for Quod Libet, now cross player) that does
> similarity lookups on last.fm, and can use tags to look up similar
> songs. I would love to add acoustic similarity lookups as well, and I've
> been looking at mirage for this.
>
> Where this touches on your question: the plugin currently has the smarts
> to not play the same song or tracks by the same artists for a
> configurable duration, and subject to further restrictions or
> relaxations, depending on what kind of advanced queries the player supports.

Seems interesting. Sadly, python syntax confuses me ;)

> If you guys think it would be interesting, I would be happy to port my
> plugin to banshee (Is there a python API? I might need some help
> otherwise ;) and I would very much like to add the option of acoustic
> similarity, so I'm wondering: how deeply is mirage tied to Banshee? I
> understand that the similarity scores are stored in the banshee
> database, but I would be happy to invest some time to make that
> configurable, so that other players (and my plugin ;) could take
> advantage of this great feature. I don't know if that is feasible, but I
> see no reason in theory that it wouldn't be.

Banshee doesn't have a python API. Banshee extensions can be implemented
in .NET/Mono. Banshee also provides a dbus interface, and I think there
are plans to offer access to the banshee DB through dbus. But this would
be better discussed on the banshee mailing-list or on #banshee.

Only a part of Mirage is tied to banshee, see this thread for more
info :
http://groups.google.com/group/mirage-list/browse_thread/thread/43405ec6b315ef15/2c70d893f5c5f20d

The similarity data is stored in a separate sqlite database, along with
the TrackId of the track, to be able to associate it to the track in the
banshee DB.
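
As a rough illustration of how the TrackId ties the two databases together, here is a sketch with two in-memory SQLite databases. The table and column names are invented for the example and are not the real Banshee or Mirage schemas.

```python
import sqlite3


def build_demo_databases():
    """Create stand-ins for the player DB and the similarity DB.

    Table and column names are hypothetical; only the idea of a shared
    TrackId comes from the thread.
    """
    player_db = sqlite3.connect(":memory:")
    player_db.execute(
        "CREATE TABLE tracks (TrackID INTEGER PRIMARY KEY, Artist TEXT, Title TEXT)"
    )
    player_db.executemany(
        "INSERT INTO tracks VALUES (?, ?, ?)",
        [
            (1, "felix", "don't you want me baby"),
            (2, "joni mitchell", "cactus tree"),
            (3, "james taylor", "hey mister that's me upon the juke box"),
        ],
    )
    similarity_db = sqlite3.connect(":memory:")
    similarity_db.execute(
        "CREATE TABLE models (trackid INTEGER PRIMARY KEY, model BLOB)"
    )
    similarity_db.executemany(
        "INSERT INTO models VALUES (?, ?)",
        [(2, b"placeholder"), (3, b"placeholder")],
    )
    return player_db, similarity_db


def analyzed_tracks(similarity_db, player_db):
    """Look up artist and title for every track that has similarity data."""
    result = []
    ids = similarity_db.execute("SELECT trackid FROM models ORDER BY trackid")
    for (track_id,) in ids.fetchall():
        row = player_db.execute(
            "SELECT Artist, Title FROM tracks WHERE TrackID = ?", (track_id,)
        ).fetchone()
        if row is not None:  # the track may have been removed from the library
            result.append((track_id, row[0], row[1]))
    return result
```

Because the similarity database only needs a stable track identifier, a different player could in principle supply its own identifiers in place of Banshee's.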

So I don't see anything that could prevent you from doing what you want.

Have fun,

eric casteleijn

Aug 29, 2008, 8:27:50 AM

to mirag...@googlegroups.com

> Only a part of Mirage is tied to banshee, see this thread for more
> info :
> http://groups.google.com/group/mirage-list/browse_thread/thread/43405ec6b315ef15/2c70d893f5c5f20d
>
> The similarity data is stored in a separate sqlite database, along with
> the TrackId of the track, to be able to associate it to the track in the
> banshee DB.
>
> So I don't see anything that could prevent you from doing what you want.

Cool, that is very exciting news. I will notify this list as far as it
is relevant to mirage, and might ask some stupid questions here in the
process. ;)

RustyNail

Aug 29, 2008, 1:07:44 PM

to Mirage - Automatic Playlist Generation

On Aug 29, 3:08 pm, Bertrand Lorentz <bertrand.lore...@gmail.com>
wrote:

> > It's just that while scanning my 2500+ song collection, my computer
> > becomes verry unresponsive, especially on
> > long songs (X JAPAN), or Led Zeppelin. On those 2, it completely
> > freezes for about 5-10 minutes, and then continues on with the next
> > song.
>
> I hope you meant 5 or 10 seconds, because 5 or 10 minutes is way too
> long to analyze a track. It usually takes less than 10 seconds for a
> "normal" track.
> Could you chack your CPU usage while mirage is analyzing those tracks ?

It's 5-10 Minutes... Hmm. But when I stopped it, and restarted it
(after running through the night, and completely freezing the
computer by morning) I am noticing lag, CPU usage is stuck at 100% (by
banshee-1 ~70%, pulseaudio ~15%, firefox ~10%, and multiload-applet
~3%, 2% for everything else), but the computer's usable, with me using
Banshee to play music (though songs lag a bit in the beginning), and
writing this in firefox...
Is there any chance that memory leaks add up during prolonged
use? 'Cause my RAM is constantly filling up bit by bit.
So maybe it's not a CPU problem, but a RAM problem?

Now, I don't know how banshee stores its track data, but if you know
the trackid, shouldn't you be able to find out the song name?
And if you could, you could make Mirage let you set how often you want
Tracks, Albums, Artists to be repeated... How diverse should the
generated playlist be... and so forth.
It would be nice to be able to configure some of Mirage's playlist-
generating. =)

Bertrand Lorentz

Sep 2, 2008, 3:05:23 PM

to mirag...@googlegroups.com

On Fri, 2008-08-29 at 10:07 -0700, RustyNail wrote:
> On Aug 29, 3:08 pm, Bertrand Lorentz <bertrand.lore...@gmail.com>
> wrote:
> > > It's just that while scanning my 2500+ song collection, my computer
> > > becomes verry unresponsive, especially on
> > > long songs (X JAPAN), or Led Zeppelin. On those 2, it completely
> > > freezes for about 5-10 minutes, and then continues on with the next
> > > song.
> >
> > I hope you meant 5 or 10 seconds, because 5 or 10 minutes is way too
> > long to analyze a track. It usually takes less than 10 seconds for a
> > "normal" track.
> > Could you chack your CPU usage while mirage is analyzing those tracks ?
>
> It's 5-10 Minutes... Hmm. But when I stopped it, and restarted it
> (after running through the night, and comepletely freezing the
> computer by morning) I am noticing lag, CPU usage is stuck at 100% (by
> banshee-1 ~70%, pulseaudio ~15%, firefox ~10%, and multiload-applet
> ~3%, 2% for everything else), but the computer's usable, with me using
> Banshee to play music (though songs lag a bit in the beginning), and
> writing this in firefox...
> Is there any chance of the adding up of memory leaks during prolonged
> use? 'Cause my RAM is constantly filling up bit by bit.
> So maybe it's not a CPU problem, but a RAM problem?

We already had reports of high memory usage during the analysis and also
during playlist generation, so that might indeed be the problem.
I haven't tracked it down yet, so any help with that is most welcome !

> Now, I don't know how banshee stores its track data, but if you know
> the trackid, shouldn't you be able to find out the song name?
> And if you could, you could make Mirage let you set how often you want
> Tracks, Albums, Artists to be repeated... How diverse should the
> generated playlist be... and so forth.
> It would be nice to be able to configure some of Mirage's playlist-
> generating. =)

With the TrackId, you can get all the info banshee has about the track.
So what you're suggesting is definitely possible. But I wouldn't want
mirage to have too many configuration options.
"Patches Welcome !" ;)

Cheers,

RustyNail

Sep 3, 2008, 3:48:38 PM

to Mirage - Automatic Playlist Generation

> With the TrackId, you can get all the info banshee has about the track.
> So what you're suggesting is definitely possible. But i wouldn't want
> mirage to have too many configuration options.
> "Patches Welcome !" ;)

Grrr. It seems that if you want something done, you have to do it
yourself nowadays >D
I might try to do some digging with the source code... :-/
I would only need to screw up the Banshee part, right?

Oh, and another question that appeared recently:
A few days ago, I managed to kill my Ubuntu install (I'm getting good
at it :D ),
and had to re-install. After doing that, I happily added all my
favorite repos,
and had about 1Gig of packages download and install. Banshee & Mirage
were among them.
But when I opened Banshee, it DIDN'T see the Mirage extension... And
no matter how I fiddled with it,
it just refused to see it. I currently have no idea on what happened.
If it helps (don't know how), Banshee's list of Radio stations was
also empty, with some gibberish styles listed... Everything else works
perfectly. Any ideas?

Bertrand Lorentz

Sep 3, 2008, 4:51:15 PM

to mirag...@googlegroups.com

On Wed, 2008-09-03 at 12:48 -0700, RustyNail wrote:
> > With the TrackId, you can get all the info banshee has about the track.
> > So what you're suggesting is definitely possible. But i wouldn't want
> > mirage to have too many configuration options.
> > "Patches Welcome !" ;)
>
> Grrr. It seems that if you want something done, you have to do it
> yourself now-a days >D
> I might try to do some digging with the source code... :-/
> I would only need to screw up the Banshee part, right?

Yes, the other parts are for the audio analysis and similarity
calculation.

> Oh, and another question that appeared recently:
> A few days ago, I managed to kill my Ubuntu install (i'm getting good
> at it :D ),
> and had to re-install. After doing that, I happily added all my
> favorite repos,
> and had about 1Gig of packages download and install. Banshee & Mirage
> were among them.
> But when I opened Banshee, it DIDN'T see the Mirage extension... And
> no matter how I fiddled with it,
> it just refused to see it. I currently have no idea on what happened.
> If it helps (don't know how), Banshee's list of Radio stations was
> also empty, with some giberish styles listed... Everything else works
> perfectly. Any Ideas?

Did you do a clean install, or did you keep the content of your home
directory ? ~/.config/banshee-1 in particular.
It might be a packaging issue. I'll try to get the packager to look into
it.

RustyNail

Sep 6, 2008, 5:45:43 AM

to Mirage - Automatic Playlist Generation

> Did you do a clean install, or did you keep the content of you home
> directory ? ~/.config/banshee-1 in particular.
> It might be a packaging issue. I'll try to get the packager to look into
> it.
>

I did a clean install.
... Though I'll try purging banshee-1 and trying again...
It's just that the lack of the standard radio stations & its crash rate
are giving me ideas... I might have a screwed up banshee-1 install.

eric casteleijn

Sep 8, 2008, 10:25:41 AM

to mirag...@googlegroups.com

Bertrand Lorentz wrote:
> On Fri, 2008-08-29 at 10:28 +0200, eric casteleijn wrote:
>>> But once you get past the scanning, the playlist generation is very
>>> nice, though, as was said in the discussion bellow, a bit confusing at
>>> first.
>>> BTW, would there be any way to check for duplicate songs? (Same songs
>>> from different albums)
>> To butt in here, possibly inappropriately: I have written a python
>> plugin (originally for Quod Libet, now cross player) that does
>> similarity lookups on last.fm, and can use tags to look up similar
>> songs. I would love to add acoustic similarity lookups as well, and I've
>> been looking at mirage for this.
>>
>> Where this touches on your question: the plugin currently has the smarts
>> to not play the same song or tracks by the same artists for a
>> configurable duration, and subject to further restrictions or
>> relaxations, depending on what kind of advanced queries the player supports.
>
> Seems interesting. Sadly, python syntax confuses me ;)

Really? Must be because it's so close to C#, I guess. I found I could
read C# quite easily, and had to look up very little syntax when going
through mirage. In fact, I got quite a ways toward reimplementing the C#
parts in python over the weekend, mainly because I didn't see a good way
to interface python code with C# without going to something like
IronPython, but also just to get a better idea of what the code does.

Using ctypes for interfacing with libmirageaudio, and numpy for the
matrix calculations, I expect the execution speed not to be that much
worse, but that's for later.

Should I get a working version, would you be at all interested in having
it in mirage itself? I understand, of course, if it's not your first
concern, but it would make it easier to integrate mirage into players
that have a python plugin API.

Bertrand Lorentz

Sep 10, 2008, 4:45:25 PM

to mirag...@googlegroups.com

If you get a working version, I think we'll have to spin off libmirageaudio
as a separate package. Both your python stuff and the mirage C# stuff
would then depend on it.
I don't know if libmirageaudio is ready for other uses than the one we
currently have, but feel free to try !

And keep us posted on your progress. Good luck !

RustyNail

Sep 12, 2008, 10:58:17 AM

to Mirage - Automatic Playlist Generation

Yay. It works now.
Now to look at the source :)

eric casteleijn

Sep 22, 2008, 5:34:09 PM

to mirag...@googlegroups.com

Hey all,

After banging my head against it some more, I have an initial working
version. It takes about 8 seconds per song for the analysis, which on my
machine seems at least in the same neighborhood as the banshee plugin
operates.

The code attached is not polished at all, and the database layer and
playlist generation haven't been implemented or tested.

What works is everything up to the distance calculation, which I think
was the hardest part. Herein too lies the rub: I'm not really sure that
I got all the algorithms correct, because I had to get rid of some
nested loops that made things way too slow in python (8 minutes per
track instead of 8 seconds). Luckily the numpy/scipy extension does
almost everything you could want to do with a matrix/array, and it does
it at C speed.
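
As an illustration of the kind of rewrite described above, the sketch below computes a sample covariance matrix twice, once with nested Python loops and once with numpy array operations. Which loops were actually replaced in mirage.py is not stated in this thread, and the n - 1 normalisation is an assumption; the function names are made up.

```python
import numpy as np


def covariance_loops(frames):
    """Sample covariance with explicit Python loops (slow on large inputs).

    frames is a list of equal-length feature vectors, one per audio frame.
    """
    n = len(frames)
    d = len(frames[0])
    mean = [sum(frame[j] for frame in frames) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            acc = 0.0
            for frame in frames:
                acc += (frame[i] - mean[i]) * (frame[j] - mean[j])
            cov[i][j] = acc / (n - 1)
    return cov


def covariance_numpy(frames):
    """Same result, with the three loops replaced by array operations."""
    x = np.asarray(frames, dtype=float)
    centered = x - x.mean(axis=0)  # subtract the per-column mean
    return centered.T @ centered / (x.shape[0] - 1)
```

Keeping the slow version around is useful: comparing both on small inputs is a cheap way to catch mistakes introduced while vectorising.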

I think that everything is not as it should be though: the distance
between the scms of a song and the scms of the same song is never 0,
which I would expect, and can even be negative. Other distances have
also shown negative values.
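
Properties like these can be checked mechanically. The sketch below assumes the distance is a symmetrised Kullback-Leibler divergence between Gaussian models, which this thread does not state, and it restricts itself to diagonal covariances so that it needs no matrix library. It is not the code from mirage.py, and all names are hypothetical.

```python
def symmetric_kl(model_a, model_b):
    """Symmetrised KL divergence between two diagonal Gaussians.

    Each model is a (means, variances) pair of equal-length sequences.
    The log-determinant terms of the two KL directions cancel, which
    leaves a sum that is zero for identical models and never negative.
    """
    means_a, vars_a = model_a
    means_b, vars_b = model_b
    total = 0.0
    for ma, va, mb, vb in zip(means_a, vars_a, means_b, vars_b):
        diff = ma - mb
        total += va / vb + vb / va - 2.0
        total += diff * diff * (1.0 / va + 1.0 / vb)
    return 0.5 * total


def distance_problems(models, distance, tol=1e-9):
    """Return a list of violated properties; empty means all checks pass."""
    problems = []
    for i, a in enumerate(models):
        if abs(distance(a, a)) > tol:
            problems.append("self-distance of model %d is not 0" % i)
        for j in range(i + 1, len(models)):
            b = models[j]
            d_ab, d_ba = distance(a, b), distance(b, a)
            if d_ab < -tol or d_ba < -tol:
                problems.append("negative distance between %d and %d" % (i, j))
            if abs(d_ab - d_ba) > tol:
                problems.append("asymmetric distance between %d and %d" % (i, j))
    return problems
```

Under that assumption, a non-zero self-distance or a negative value points at a bug in the implementation rather than at an odd pair of songs.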

On an insignificantly small test set, the values do seem to make some
sense though[*], but I want to make sure that that's not coincidence.

My question is: do any of the original developers have test data that I
can use to build some unit tests? Perhaps a little script with some very
small matrices that exercises the code, or a test mp3 and some of the
resulting data from that (if possible not just the end result, but also
the intermediate matrices generated in Matrix.multiply,
CovarianceMatrix, Vector, Mfcc etc.). I will gladly (try to) contribute
back unit tests in C# if you can help me out with some test data, so
that I know my script is doing the right thing. I am not very well
versed in matrix and vector mathematics, so I'm sure some of the things
I changed are just plain wrong, and also, with unit tests, I can try
some more optimization tricks without worrying about breaking 'correctness'.

[*]: I took 5 mp3s and oggs:

1. felix - don't you want me baby
2. joni mitchell - cactus tree
3. james taylor - hey mister that's me upon the juke box
4. john larner & slater_hogan - gettin' ready
5. ricardo rae - lead the way

where 2 and 3 are 70s folk, 1 is 90s house, and 4 and 5 are consecutive
tracks of a fabric mix cd. As expected, 4 and 5 had the smallest
distance (they are mixed together, so acoustically *very* close)
followed by 2 and 3. 1 is not really like any of the others, which was
reflected.

With the filter files from svn and the windowsize and sampling rate
doubled, things got a tiny bit slower, but the same relations more or
less showed up.

mirage.py

eric casteleijn

Oct 27, 2008, 3:37:19 AM

to mirag...@googlegroups.com

Replying to my own message, which may have been held because it
contained an attachment:

On 09/22/2008 11:34 PM eric casteleijn wrote:
> After banging my head against it some more, I have an initial working
> version. It takes about 8 seconds per song for the analysis, which on my
> machine seems at least in the same neighborhood as the banshee plugin
> operates.
>
> The code attached is not polished at all, and the database layer,
> playlist generation hasn't been tested/implemented.
>
> What works is everything up to the distance calculation, which I think
> was the hardest part. Herein too lies the rub: I'm not really sure that
> I got all the algorithms correct, because I had to get rid of some
> nested loops that made things way too slow in python (8 minutes per
> track instead of 8 seconds). Luckily the numpy/scipy extension does
> almost everything you could want to do with a matrix/array, and it does
> it at c speed.
>
> I think that everything is not as it should be though: the distance
> between the scms of a song and the scms of the same song is never 0,
> which I would expect, and can even be negative. Other distances have
> also shown negative values.
>
> On an insignificantly small test set, the values do seem to make some
> sense though[*], but I want to make sure that that's not coincidence.

Finally got some time to work on this again. Turns out I *was* doing it
wrong: there was a stupid off-by-one error in my code somewhere. I've
fixed that, and now I'm getting the same values as the C# version does.
Yay! On to a database layer! (I think I will write mine to work directly
with the sqlite database that my plugin already has, but I will try to
keep it general enough so that people could use the code without my plugin.)

Anyone that is interested, the code lives here:

http://code.google.com/p/autoqueue/source/browse/trunk

It's in mirage.py and test_mirage.py. That last file contains some
budding unittests. I've tried to look into porting those to C# too, but
that ended up looking like more work than I was willing to spend right
now, without knowing if anyone is even interested. (Mainly because I
couldn't get the C# unit testing framework to run on my machine.) The
python unit tests could be adapted, and I would convert them if someone
else is interested, and has some time to test them.

I have copied the /res directory verbatim, and added Dominik and
Bertrand to the copyright files. (I think I'll leave out your email
addresses to prevent questions about my software ending up on your desk,
but let me know if you wish to have them in.) My project is GPL as well,
so I think that should be sufficient, but please let me know if you have
additional wishes/requirements.
