Freetext search


Daniel Önnerby

Apr 26, 2008, 5:33:33 PM
to musikC...@googlegroups.com
Just thought I'd get this mailing list started :)

I've been thinking a bit about the freetext search field and have
some ideas.
There is a lot of metadata to search through, so for the sake of speed,
maybe we shouldn't search through everything, although I think this
should be a setting of some kind.
Maybe we should have some sort of ranking in the TracklistView and the
MetadataFilterViews. The easiest would be a ranking that adds +1 for each
metadata value hit in the db and sorts the hits by that ranking. This
method would have to go through everything before returning any results
(could be slow).
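
As a rough illustration of the +1 ranking, here is a minimal Python sketch using the standard sqlite3 module. The `track_meta` table, its columns, and the demo rows are made up for the example; they are an assumption and not the actual musikCube schema.

```python
import sqlite3

# Hypothetical demo data: (track_id, meta_key, value)
DEMO_ROWS = [
    (1, "title", "Blue Monday"), (1, "artist", "New Order"), (1, "album", "Substance"),
    (2, "title", "Monday Morning"), (2, "artist", "Blue Notes"), (2, "album", "Blue Sessions"),
    (3, "title", "Red Rain"), (3, "artist", "Peter Gabriel"), (3, "album", "So"),
]


def build_demo_db():
    # Assumed schema: one row per metadata value of a track.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE track_meta (track_id INTEGER, meta_key TEXT, value TEXT)")
    db.executemany("INSERT INTO track_meta VALUES (?, ?, ?)", DEMO_ROWS)
    return db


def search_by_hit_count(db, text):
    # +1 for every metadata value that contains the text, then sort by the total.
    # Note: '%' and '_' in the user's text act as LIKE wildcards in this sketch.
    pattern = "%" + text + "%"
    return db.execute(
        "SELECT track_id, COUNT(*) AS hits FROM track_meta "
        "WHERE value LIKE ? GROUP BY track_id "
        "ORDER BY hits DESC, track_id",
        (pattern,),
    ).fetchall()
```

On the demo rows, searching for "blue" lists track 2 first (two hits, in artist and album) and track 1 second (one hit, in the title). The GROUP BY and ORDER BY need every match before the first row can be returned, which is the slowness concern described above.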

Then again, maybe speed matters most here, and we could do it like this:
- First just list all tracks with the freetext in the title
- Then append everything with it in the artist, then the album,
  foldername, genre, etc.
This would make the search return results much quicker and still give
some kind of ranking, but the ranking would be based on a weighting of
the metakeys rather than on the number of hits.
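
The staged approach could be sketched like this in Python. Again, the `track_meta` table and the demo rows are assumptions for illustration only; the key order follows the list above.

```python
import sqlite3

# Hypothetical demo data: (track_id, meta_key, value)
DEMO_ROWS = [
    (1, "title", "Blue Monday"), (1, "artist", "New Order"), (1, "album", "Substance"),
    (2, "title", "Monday Morning"), (2, "artist", "Blue Notes"), (2, "album", "Blue Sessions"),
    (3, "title", "Red Rain"), (3, "artist", "Peter Gabriel"), (3, "album", "So"),
]

# Metakeys in the order they are searched (highest weight first).
KEY_ORDER = ("title", "artist", "album", "foldername", "genre")


def build_demo_db():
    # Assumed schema: one row per metadata value of a track.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE track_meta (track_id INTEGER, meta_key TEXT, value TEXT)")
    db.executemany("INSERT INTO track_meta VALUES (?, ?, ?)", DEMO_ROWS)
    return db


def search_by_key_priority(db, text, key_order=KEY_ORDER):
    # Generator: yields (track_id, matching_key) one metakey at a time,
    # so the caller can show title matches before the later keys are searched.
    pattern = "%" + text + "%"
    seen = set()
    for key in key_order:
        cursor = db.execute(
            "SELECT DISTINCT track_id FROM track_meta "
            "WHERE meta_key = ? AND value LIKE ? ORDER BY track_id",
            (key, pattern),
        )
        for (track_id,) in cursor:
            if track_id not in seen:  # skip tracks already listed by an earlier key
                seen.add(track_id)
                yield track_id, key
```

Because this is a generator, a view could append rows as they arrive instead of waiting for the full result. The trade-off is the one noted above: a track that matches in three lower-weighted keys still sorts after a track with a single title match.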

I haven't seen much on the forum about this, nor have I
researched how other apps do it.

Best regards
doep


Björn Olievier

Apr 27, 2008, 7:33:54 AM
to musikC...@googlegroups.com
Hi,

I'll reply inline.

On Sat, Apr 26, 2008 at 11:33 PM, Daniel Önnerby <onn...@gmail.com> wrote:

> Just thought I'd get this mailing list started :)
>
> I've been thinking a bit about the freetext search field and have
> some ideas.
> There is a lot of metadata to search through, so for the sake of speed,
> maybe we shouldn't search through everything, although I think this
> should be a setting of some kind.

Why would it be slow? Do you have any data on that? I would prefer to test this before limiting the way freetext search works.

Some kind of limitation could be interesting to avoid irrelevant results, such as when the search text appears in tags that the user doesn't actively maintain. Some tools use the comment field to store things like "Encoded by...". I'm not sure how to decide which tags are important and which aren't.


> Maybe we should have some sort of ranking in the TracklistView and the
> MetadataFilterViews. The easiest would be a ranking that adds +1 for each
> metadata value hit in the db and sorts the hits by that ranking. This
> method would have to go through everything before returning any results
> (could be slow).
> Then again, maybe speed matters most here, and we could do it like this:
> - First just list all tracks with the freetext in the title
> - Then append everything with it in the artist, then the album,
>   foldername, genre, etc.
> This would make the search return results much quicker and still give
> some kind of ranking, but the ranking would be based on a weighting of
> the metakeys rather than on the number of hits.

I like the idea of updating the results as the query progresses.

> I haven't seen much on the forum about this, nor have I
> researched how other apps do it.

I think we should create some kind of benchmark to get an idea of performance. The current DB schema has a table with all metadata values. I think that's a good start. If necessary we could denormalize it a bit or add some kind of word index to speed things up.
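
A benchmark along these lines could start as small as the following Python sketch. The `track_meta` table, the synthetic values, and the two queries compared are assumptions for illustration; a real benchmark would run against an actual library database.

```python
import sqlite3
import time


def make_db(n_tracks):
    # Assumed schema with synthetic data: four metadata values per track.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE track_meta (track_id INTEGER, meta_key TEXT, value TEXT)")
    keys = ("title", "artist", "album", "comment")
    db.executemany(
        "INSERT INTO track_meta VALUES (?, ?, ?)",
        ((i, key, f"{key} number {i}") for i in range(n_tracks) for key in keys),
    )
    return db


def time_query(db, sql, params, repeats=5):
    # Returns (best wall-clock time in seconds, number of rows returned).
    best = float("inf")
    rows = []
    for _ in range(repeats):
        start = time.perf_counter()
        rows = db.execute(sql, params).fetchall()
        best = min(best, time.perf_counter() - start)
    return best, len(rows)


def run_benchmark(n_tracks=20000, text="number 1999"):
    db = make_db(n_tracks)
    pattern = f"%{text}%"
    return {
        # Search every metadata value.
        "all_keys": time_query(
            db, "SELECT DISTINCT track_id FROM track_meta WHERE value LIKE ?", (pattern,)
        ),
        # Search titles only.
        "title_only": time_query(
            db,
            "SELECT DISTINCT track_id FROM track_meta "
            "WHERE meta_key = 'title' AND value LIKE ?",
            (pattern,),
        ),
    }


if __name__ == "__main__":
    for name, (seconds, count) in run_benchmark().items():
        print(f"{name}: {seconds * 1000:.2f} ms, {count} tracks")
```

The numbers from an in-memory database with synthetic values only give a rough feel; they say nothing definite about a real library on disk.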

Björn

Daniel Önnerby

Apr 27, 2008, 3:06:22 PM
to musikC...@googlegroups.com
Björn Olievier wrote:
> Hi,
>
> I'll reply inline.
>
> On Sat, Apr 26, 2008 at 11:33 PM, Daniel Önnerby <onn...@gmail.com> wrote:
>
>> Just thought I'd get this mailing list started :)
>>
>> I've been thinking a bit about the freetext search field and have
>> some ideas.
>> There is a lot of metadata to search through, so for the sake of speed,
>> maybe we shouldn't search through everything, although I think this
>> should be a setting of some kind.
>
> Why would it be slow? Do you have any data on that? I would prefer to test this before limiting the way freetext search works.
What I mean is that the "ranking" kind of listing will be a bit slow. Let's say I do a freetext search. The query will have to search through all metadata, add +1 to the ranking of each related track, and finally sort descending by that ranking. This means the query cannot start returning results until the whole search is done.
The other option, ranking the metakeys in a specified order, will return results almost right away. In that case the query starts by searching the title and returns all tracks with the text in the title (very quick), then appends all tracks with the text in the album or artist, etc. (or whatever order we rank the metakeys in).

But you are right, we should benchmark this before doing anything :)



> Some kind of limitation could be interesting to avoid irrelevant results, such as when the search text appears in tags that the user doesn't actively maintain. Some tools use the comment field to store things like "Encoded by...". I'm not sure how to decide which tags are important and which aren't.
>
>> Maybe we should have some sort of ranking in the TracklistView and the
>> MetadataFilterViews. The easiest would be a ranking that adds +1 for each
>> metadata value hit in the db and sorts the hits by that ranking. This
>> method would have to go through everything before returning any results
>> (could be slow).
>> Then again, maybe speed matters most here, and we could do it like this:
>> - First just list all tracks with the freetext in the title
>> - Then append everything with it in the artist, then the album,
>>   foldername, genre, etc.
>> This would make the search return results much quicker and still give
>> some kind of ranking, but the ranking would be based on a weighting of
>> the metakeys rather than on the number of hits.
>
> I like the idea of updating the results as the query progresses.
>
>> I haven't seen much on the forum about this, nor have I
>> researched how other apps do it.
>
> I think we should create some kind of benchmark to get an idea of performance. The current DB schema has a table with all metadata values. I think that's a good start. If necessary we could denormalize it a bit or add some kind of word index to speed things up.
Denormalizing the database will only make things slower, since we would have to freetext search through duplicates. A word index might be a good idea. We could use SQLite FTS3 (the full-text search extension).
Although, as you say, let's benchmark first :)
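
To give an idea of what the FTS3 route might look like, here is a minimal Python sketch. It assumes the SQLite build in use has FTS3 compiled in; the `meta_fts` table, its columns, and the demo rows are made up for the example and are not the musikCube schema.

```python
import sqlite3

# Hypothetical demo data: (track_id, meta_key, value)
DEMO_ROWS = [
    (1, "title", "Blue Monday"), (1, "artist", "New Order"), (1, "album", "Substance"),
    (2, "title", "Monday Morning"), (2, "artist", "Blue Notes"), (2, "album", "Blue Sessions"),
    (3, "title", "Red Rain"), (3, "artist", "Peter Gabriel"), (3, "album", "So"),
]


def build_fts_index(rows):
    # Raises sqlite3.OperationalError if this SQLite build lacks FTS3.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE VIRTUAL TABLE meta_fts USING fts3(track_id, meta_key, value)")
    db.executemany(
        "INSERT INTO meta_fts (track_id, meta_key, value) VALUES (?, ?, ?)", rows
    )
    return db


def fts_search(db, word):
    # MATCH uses the word index instead of scanning every value.
    cursor = db.execute("SELECT track_id FROM meta_fts WHERE value MATCH ?", (word,))
    # De-duplicate and sort in Python; int() guards against ids coming back as text.
    return sorted({int(row[0]) for row in cursor})
```

One behavioural difference to keep in mind: MATCH works on whole words (or prefixes, written as `blu*`), so unlike a LIKE '%text%' search it will not find text in the middle of a word.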
> Björn


