Here is some discussions on the Search project:
spikeflaptor
===========
“precision” is an incredible imprecise word. Moreover, “relevance”
means different things for different persons. If you’re serious about
improving relevance, the only way to go is to assemble a test team
large enough to be representative of most different search interests
of potential users. In our experience, this in no trivial task, as
you’ll soon realize that all your friends, coworkers, family,
acquaintances, etc, lie in a very particular ecological niche
regarding search habits.
The current search implementation at wordpress does weight different
parts of the page, as title and body. I don’t recall if we’re indexing
comments… our fear was leaving the system very vulnerable to spamming.
Spam control is a neverending story. It a war that forces continuous
adaptations on both bands, while at the same time, it’s not an end on
itself and we’re trying not to waste resources on it. This is why I
think that advanced spam control techniques are better left alone
until needed: show off all your arsenal right from the beginning, and
you’ll be attacked only once.
You’re absolutely right about phrase queries, I think we should enable
them by default.
Hailin Wu wrote
==========
Thanks for your feedback.
Let me give two simple examples to illustrate what I mean by
relevance:
http://en.search.wordpress.com/?q=hailin+wu
Gives no results, although I have a few WP blogs,
and google search gives
hailin.wordpress.com, my main blog.
http://en.search.wordpress.com/?q=matt
gives random results with matt on the post title.
Google search gives
ma.tt
I guess it’s inherently difficult to solve this issue
when the search subjects set is small, such as when it’s within
wordpress.com blogs. Page rank, which is the secret to google’s
precision, can not be adequately applied when the space is small.
However, it seems Google has a way to deal with this. Google sells
site search appliance
http://www.google.com/enterprise/
I am just wondering what algorithms they used to produce more relevant
results.