'media use statistics' format discussion


Oliver Horn

7 Apr 2010, 06:02:43
to Desktop CouchDB
Hi there,

I wrote a Python plugin for Rhythmbox that syncs statistics such as
play_count, last_played and rating to desktopcouch and I'm working on
another one for Exaile which can already do this, too. Both programs
can now sync their statistics and with Ubuntu One that also should
work across different computers. While at the moment the plugins only
sync audio files I plan to extend them to sync podcast and stream
entries. Thinking of players with video library support such as
Banshee, one should already think about syncing video file statistics,
too.

Before I start publishing those plugins I thought it would be nice to
agree on a standard format underneath them. The format I use now is
very similar to the formats documented for notes, contacts and
bookmarks:

CouchDB name: "media"
{
    "_id": string - uri of this document, # NOT internal to CouchDB
    "_rev": string - revision_for_this_document, # internal to CouchDB
    "record_type": "http://does/not/exists/yet",
    "record_type_version": "0.1",
    "type": string - "audio", # in the future maybe "podcast"/"stream"/"video"
    "last_changed": int - posix date of last modification,
    "last_played": int - posix date of last play,
    "play_count": int - how often this has been played,
    "rating": int - from 0 to 5 (e.g. stars),
    "application_annotations": {
        "Rhythmbox": {
            "last_changed": int - posix date of last modification with Rhythmbox,
            ...maybe more...
        },
        "Exaile": {
            "last_changed": int - posix date of last modification with Exaile,
            ...maybe more...
        }
    }
}
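
To make the format above concrete, here is a minimal Python sketch that builds one such record as a plain dict. The helper name make_media_record and the example URI are hypothetical, the record_type value is just the placeholder from the format above, and nothing here talks to desktopcouch; it only shows the shape of the data.

```python
import time

# Placeholder taken from the proposed format; not a real, published URL.
RECORD_TYPE = "http://does/not/exists/yet"


def make_media_record(uri, app_name, play_count=0, rating=0,
                      last_played=0, now=None):
    """Build a dict shaped like the proposed 'media' record (hypothetical helper)."""
    now = int(time.time()) if now is None else int(now)
    return {
        "_id": uri,  # the URI is used as the document id in this draft
        "record_type": RECORD_TYPE,
        "record_type_version": "0.1",
        "type": "audio",
        "last_changed": now,
        "last_played": int(last_played),
        "play_count": int(play_count),
        "rating": int(rating),  # 0 to 5 stars in this draft
        "application_annotations": {
            app_name: {"last_changed": now},
        },
    }


record = make_media_record(
    "file:///home/oliver/music/song.ogg",  # example path, an assumption
    "Rhythmbox",
    play_count=3,
    rating=4,
    last_played=1270620000,
    now=1270620163,
)
```

The "_rev" field is left out on purpose, since CouchDB assigns it when the document is stored.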

Some thoughts about that:
I use the URI (file address/url/...) as id. This has one disadvantage:
having different file paths on multiple computers, those statistics
would not get synced. But I don't see another way since there is no
single property that is unique to a file and can be handled by the
mediaplayer. If you have ideas, please let me know.

Instead of making one database for media files one could also make one
for every type, e.g. one for audio, one for podcasts etc.

There may be more entry properties that can be synced. While
"first_added" could make sense, I don't see syncing "title", "artist",
"album" because those can be saved directly in the audio file. This
may not hold for podcasts and streams and the question is if one
should make a difference here or not.

The rating seems to be different on multiple programs. While Rhythmbox
has an int 0 to 5 star rating, Exaile saves an int between 0 and 100
or alternatively a float between 0 and 5.0. I know that Banshee has
both ratings, one rating with up to 5 stars and one with 0 to 100.

I look forward to your thoughts.
Kind regards,
Oliver

Oliver Horn

7 Apr 2010, 08:48:06
to Desktop CouchDB
I forgot to mention "jump_count" and that it might be possible to sync
playlists, but I'm quite sure this is kind of tricky.

Duane Hinnen

7 Apr 2010, 09:40:09
to desktop...@googlegroups.com
Oliver,
I am excited by all the new programs that are popping up using CouchDB, desktopcouch, and/or Ubuntu One. I have started a page on the Ubuntu One wiki to serve as a place where people writing such programs can find each other. I think having this central repository of code and ideas will be an extremely useful resource. My hope is that the page will inspire ideas and spur collaboration. I would love it if you would add your project to the page. You can edit the wiki yourself or send me the info and I will add it for you.

This invitation is of course open to all people developing projects that use CouchDB, Desktopcouch, and/or Ubuntu One.

https://wiki.ubuntu.com/UbuntuOne/ThirdPartyProjects

--
Duane Hinnen
duane...@ubuntu.com
Ubuntu Community Forums
Oklahoma Ubuntu LoCo
Ubuntu Beginners Team
sip:duane...@ekiga.net

Stuart Langridge

8 Apr 2010, 08:46:53
to desktop...@googlegroups.com

I think it would be worth storing [artist name, album name, song name]
in the record (and using internal IDs). It's very unlikely that you'll
get collisions on that (you might have the same song by the same person
on two different albums, but I doubt very much that you'll have two
different mp3s of the same song on the same album by the same person,
and if you do you'd probably rate them the same anyway :))

> Instead of making one database for media files one could also make one
> for every type, e.g. one for audio, one for podcasts etc.

I'd suggest having different record types for that, rather than
completely different databases.

> There may be more entry properties that can be synced, while
> "first_added" could make sense, I don't see syncing "title", "artist",
> "album" because those can be saved directly in the audio file. This
> may not hold for podcasts and streams and the question is if one
> should make a difference here or not.

See above for that :) I wouldn't add first_added, though, because that
means "first added on this computer", not "first added on any computer".

> The rating seems to be different on multiple programs. While Rhythmbox
> has an int 0 to 5 star rating, exaile saves an int between 0 and 100
> or alternatively a float between 0 and 5.0. I know that Banshee has
> both ratings, one rating with up to 5 stars and one with 0 to 100.

I'd suggest storing ratings as 0-100 in the record, and then scaling it
to that or from that when working with a particular media player?
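
A small sketch of that scaling in Python, assuming a player with whole-star ratings (0 to 5) on one side and the 0-100 value in the record on the other. The function names are made up for illustration, and rounding half up is my assumption; nothing in this thread fixes a rounding rule.

```python
def stars_to_percent(stars, max_stars=5):
    """Scale a player rating (0..max_stars, int or float) to 0..100."""
    if not 0 <= stars <= max_stars:
        raise ValueError("rating out of range: %r" % (stars,))
    # int(x + 0.5) rounds half up for non-negative x.
    return int(stars * 100.0 / max_stars + 0.5)


def percent_to_stars(percent, max_stars=5):
    """Scale a stored 0..100 rating back to whole stars."""
    if not 0 <= percent <= 100:
        raise ValueError("rating out of range: %r" % (percent,))
    return int(percent * max_stars / 100.0 + 0.5)


# Whole-star ratings survive a round trip through the 0-100 scale.
round_trip = [percent_to_stars(stars_to_percent(s)) for s in range(6)]
```

Going the other way is lossy: a stored 73 becomes 4 stars, and writing that back would store 80, so a plugin should probably only overwrite the stored value when the user actually changes the rating.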

This could make it a lot easier to migrate from Banshee to Rhythmbox or
vice versa!

sil

Thomas Ibbotson

8 Apr 2010, 08:54:08
to desktop...@googlegroups.com
On 8 April 2010 13:46, Stuart Langridge <stuart.l...@canonical.com> wrote:
> I think it would be worth storing [artist name, album name, song name]
> in the record (and using internal IDs). It's very unlikely that you'll
> get collisions on that (you might have the same song by the same person
> on two different albums, but I doubt very much that you'll have two
> different mp3s of the same song on the same album by the same person,
> and if you do you'd probably rate them the same anyway :))
>
What about live versions on the same album as the studio version? I
guess they're normally suffixed by "[Live]", but depending on the
artist you might give them wildly different ratings ;)

Tom

Eric Castekeijn

8 Apr 2010, 09:54:44
to desktop...@googlegroups.com
On 04/08/2010 08:46 AM, Stuart Langridge wrote:
> On 04/07/2010 11:02 AM, Oliver Horn wrote:
>> Hi there,
>>
>> I wrote a Python plugin for Rhythmbox that syncs statistics such as
>> play_count, last_played and rating to desktopcouch and I'm working on
>> another one for Exaile which can already do this, too. Both programs
>> can now sync their statistics and with Ubuntu One that also should
>> work across different computers. While at the moment the plugins only
>> sync audio files I plan to extend them to sync podcast and stream
>> entries. Thinking of players with video library support such as
>> Banshee, one should already think about syncing video file statistics,
>> too.

It's awesome that you're working on this. It's a subject near and dear
to me, so expect some feedback on the format, but on first glance it
looks very well thought out to me.

>> Some thoughts about that:
>> I use the URI (file address/url/...) as id. This has one disadvantage:
>> having different file paths on multiple computers, those statistics
>> would not get synced. But I don't see another way since there is no
>> single property that is unique to a file and can be handled by the
>> mediaplayer. I you have ideas please let me know.
>
> I think it would be worth storing [artist name, album name, song name]
> in the record (and using internal IDs). It's very unlikely that you'll
> get collisions on that (you might have the same song by the same person
> on two different albums, but I doubt very much that you'll have two
> different mp3s of the same song on the same album by the same person,
> and if you do you'd probably rate them the same anyway :))

Actually I have a number of albums that have the same track title for
all tracks, and can only be disambiguated by including the tracknumber.
I think having those 4 would be enough in all but *very* pathological
cases, but it all depends on how much you care about edge cases: I've
used artist and track title to identify songs when I actually *wanted*
to group different versions of the same song together, and file path
when I didn't. In a distributed system that becomes harder, as you describe.

One interesting option would be to see if you can use musicbrainz[1] ids
where they are available, or even look them up using the web API. I've
tagged most of my library using musicbrainz Picard[2], which means those
tracks have a guaranteed unique musicbrainz id in the id3 tags.

This has the added advantage that you have a way to query for fixes to
the metadata on musicbrainz, and it seems to be what a lot of
applications are converging on as the canonical identifier for music.

[1] http://musicbrainz.org/
[2] http://musicbrainz.org/doc/MusicBrainz_Picard

--
eric casteleijn
https://code.launchpad.net/~thisfred
Canonical Ltd.

Eric Castekeijn

8 Apr 2010, 10:01:57
to desktop...@googlegroups.com

There is a 'version' id3 tag, which I prefer to use to disambiguate
different versions. (Yeah, I'm *very* anal about music metadata) Still,
using artist, album, tracknumber, title should be unique enough. In the
end, use something that's good enough. Perfect is unattainable. ;)

Stuart Langridge

8 Apr 2010, 10:02:18
to desktop...@googlegroups.com
On 04/08/2010 02:54 PM, Eric Castekeijn wrote:
> Actually I have a number of albums that have the same track title for
> all tracks, and can only be disambiguated by including the tracknumber.
> I think having those 4 would be enough in all but *very* pathological
> cases, but it all depends on how much you care about edge cases: I've
> used artist and track title to identify songs when I actually *wanted*
> to group different versions of the same song together, and file path
> when I didn't. In a distributed system that becomes harder, as you
> describe.
>
> One interesting option would be to see if you can use musicbrainz[1] ids
> where they are available, or even look them up using the web API. I've
> tagged most of my library using musicbrainz Picard[2], which means those
> tracks have a guaranteed unique musicbrainz id in the id3 tags.

Ah, you can't use musicbrainz IDs as *the* identifier, because you might
not have access to the internet to look them up :) Also, not all media
players will necessarily have a way to query their music database by
arbitrary ID3 tag, so when you first sync your desktopcouch ratings DB
to a new computer, you have to say "for each record in desktopcouch,
look through every file on disk and get its ID3 tags until you find the
musicbrainz tag that matches this record", which takes forever. Also, if
your mp3 files aren't musicbrainz tagged then even this won't work until
you've musicbrainz-tagged your entire library, which is death. So,
between "files are not tagged" and "you have no internet access", you
need some non-musicbrainz non-ambiguous fallback, and then you might as
well use the fallback for everything. I like the idea of [track number,
album, artist, song], though.
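
For illustration, a sketch of how a plugin might turn a file's tags into that [track number, album, artist, song] key. The function name track_key and the tag dictionary keys are hypothetical, and the normalisation (lower-casing, collapsing whitespace, accepting "3/12" style track numbers) is an assumption that goes beyond what was proposed here.

```python
def track_key(tags):
    """Return (track number, album, artist, title) as a comparable tuple."""

    def norm(value):
        # Collapse runs of whitespace and ignore case (an assumption).
        return " ".join(str(value or "").split()).lower()

    # Tags often hold "3/12" (track/total); keep the part before the slash.
    number_text = str(tags.get("tracknumber") or "").split("/")[0].strip()
    track_number = int(number_text) if number_text.isdigit() else 0

    return (
        track_number,
        norm(tags.get("album")),
        norm(tags.get("artist")),
        norm(tags.get("title")),
    )


# Two tracks with the same title on the same album still get distinct keys.
first = track_key({"tracknumber": "1/2", "album": "Some Album",
                   "artist": "Some Artist", "title": "Untitled"})
second = track_key({"tracknumber": "2/2", "album": "Some Album",
                    "artist": "Some Artist", "title": "Untitled"})
```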

(Musicbrainz IDs potentially help for *wrongly tagged* music, and for
the same song on two different albums, but I think the requirement to
tag your whole library to make them useful *for this use case of syncing
ratings* makes it not worth it.)

sil

Eric Castekeijn

8 Apr 2010, 10:08:42
to desktop...@googlegroups.com

Yeah, fair enough, although you could use the audio fingerprints
musicbrainz generates, though they *can* differ for different files that
encode the same version of the same track. They should uniquely identify
copies of identical files across machines though. Again, you'll have to
have the plugins analyze the file before adding the record to couchdb,
so it's probably not worth the trouble. Good enough is good enough.

Oliver Horn

8 Apr 2010, 10:55:50
to Desktop CouchDB
Thanks for your input.

In the meantime I have come to agree: I should not use file paths as
ids. It's a huge problem if you have music in different folders on
different computers. Instead I had the idea to store (only for files)
something like:

"locations": {
    "host1name": string - file path on host1,
    "host2name": string - file path on host2
}

Let's think this through:
The first sync from any program's database to desktopcouch is easy:
couch-internal ids are used, and the location is stored as described
(host1:filepath). A second sync from a program on the same computer is
easy again: it looks up host1:filepath in the couch and, if there is
no entry, puts a new record.

Now comes a third sync from another PC where the paths may be
different. First I have to check whether the file on this second PC is
already in desktopcouch, by searching for host2:filepath. If there is
no such entry, I look for an entry where host1:filepath ==
host2:filepath (this is the case if both machines have the same file
structure, for instance /home/oliver/music/). If there is one, I store
host2:filepath anyway, so that next time I need one search less. If no
file path matches, I search for matching tags; as you already
mentioned, title, artist, album and tracknumber should work in most
cases. If that doesn't work either, I just put a new record.

Now we have a good database that is synced across different PCs.
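
The lookup order above could be sketched roughly like this in Python. It works on a plain list of dicts instead of a real desktopcouch database, and the names find_record and TAG_FIELDS are hypothetical. Storing the new host's path after a tag match is my assumption; above I only said so for the path match.

```python
TAG_FIELDS = ("tracknumber", "album", "artist", "title")


def find_record(records, hostname, filepath, tags):
    """Return (record, how): how is 'host', 'path', 'tags' or 'new'."""
    # 1. This host has already registered this path.
    for record in records:
        if record.get("locations", {}).get(hostname) == filepath:
            return record, "host"

    # 2. Another host stored the same path (identical folder layout).
    for record in records:
        if filepath in record.get("locations", {}).values():
            # Remember the path for this host to save a search next time.
            record["locations"][hostname] = filepath
            return record, "path"

    # 3. Fall back to comparing tags, but only if all of them are present.
    key = tuple(tags.get(name) for name in TAG_FIELDS)
    if all(value is not None for value in key):
        for record in records:
            if tuple(record.get(name) for name in TAG_FIELDS) == key:
                record.setdefault("locations", {})[hostname] = filepath
                return record, "tags"

    # 4. Nothing matched: the caller should put a new record.
    return None, "new"


records = [{
    "locations": {"host1": "/home/oliver/music/a.ogg"},
    "tracknumber": 1, "album": "B", "artist": "A", "title": "C",
}]
same_path = find_record(records, "host2", "/home/oliver/music/a.ogg", {})
```

After the call above, the record also carries a "host2" entry in its locations, so the next sync from host2 already matches in step 1.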

I think this is a good idea, because if you think it through it also
works with mediaplayers that change the filepath if the user changes
the tags (e.g. Banshee does this).

btw: I had another idea at first. If all mediaplayers save an internal
id to an entry like Rhythmbox does, one could try to sync that
internal id to the "application_annotations" and fetch entries by that
id. Unfortunately it didn't work, because it seems that one can not
manipulate those ids. But this is sufficient if you already have
different RhythmboxDBs on two pcs. Anyway, not every mediaplayer may
have such ids. That's why I discard this idea.

If I now store album, artist, title and tracknumber, what speaks
against storing the complete tag and syncing it? At the moment I'm not
sure how different the tags of different audio formats are. At least
the basic ones (disc number, disc count, year, genre, track count)
should be fine.

Thankfully, streams, podcast feeds and podcast posts have only one
location across multiple platforms.

Oliver Horn

8 Apr 2010, 11:02:56
to Desktop CouchDB
btw:

On 8 Apr., 14:46, Stuart Langridge <stuart.langri...@canonical.com>
wrote:


> This could make it a lot easier to migrate from Banshee to Rhythmbox or
> vice versa!

Is there any way yet to migrate from Banshee to Rhythmbox?

Oliver Horn

11 Apr 2010, 04:02:21
to Desktop CouchDB
A little update:

I split my 2 plugin files into 3 parts.

The Exaile and Rhythmbox-plugins now have only 3 things to do:
* listen to the correct signals for changes on tracks
* on change: gather all interesting data about the track in a
dictionary and send it to a backend
* wait for the backend to return an update and write this update to
the mediaplayers own database

The backend:
* takes the input data from the plugin and searches the couch for
an existing record with that data
* compares and syncs that input data with that couch record (if there
was none, it adds a new one)
* returns an update back to the plugin, or returns none if an
update is unnecessary
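
A rough sketch of the backend's compare-and-sync step. This is not the actual plugin code: the couch is replaced by a plain dict, the names sync_track and SYNCED_FIELDS are made up, and the rule "the side with the newer last_changed wins" is only an assumption for the example.

```python
SYNCED_FIELDS = ("play_count", "last_played", "rating")


def sync_track(couch_records, key, incoming):
    """Sync one track; return an update for the player, or None."""
    stored = couch_records.get(key)
    if stored is None:
        # No existing record: add a new one, nothing to send back.
        couch_records[key] = dict(incoming)
        return None

    if incoming["last_changed"] > stored["last_changed"]:
        # The player has newer data: update the stored record.
        stored.update(incoming)
        return None

    if incoming["last_changed"] < stored["last_changed"]:
        # The couch has newer data: return only the fields that differ.
        update = {
            field: stored[field]
            for field in SYNCED_FIELDS
            if field in stored and stored[field] != incoming.get(field)
        }
        return update or None

    # Same timestamp on both sides: nothing to do.
    return None


couch = {}
first = sync_track(couch, "track-1", {
    "last_changed": 10, "play_count": 1, "last_played": 10, "rating": 3,
})
stale = sync_track(couch, "track-1", {
    "last_changed": 5, "play_count": 0, "last_played": 5, "rating": 3,
})
```

Here the second call comes from a player whose data is older than the couch record, so the plugin gets back only the fields it has to write to the player's own database.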

Both plugins can use the backend and don't have to implement any
desktopcouch stuff. Also, I now use couch-internal ids and store
the file location for every host in a dictionary, like I mentioned
above.

But now I need a little help: I am back at my second PC now and wanted
to try out synchronisation across multiple PCs. I thought this would
happen automatically through Ubuntu One, but actually it doesn't.
What did I miss? Where do I have to activate this? I have Lucid and
Ubuntu One on both clients.

If you help me to get this working, I think I can release an alpha of
the Rhythmbox plugin :-)

Regards
