How EXACTLY does Indexing Service determine rank

Unknown

Jun 10, 2004, 11:03:14 AM

So, Googling for the past 3 hours has gotten me ZERO information on how Microsoft Indexing Service determines the numeric value it assigns to Rank. The problem we are having is that we've got an ASP search page that queries the indexing catalog for a particular directory on one of our web servers. It's sorted by rank[d], and there are many instances where higher-ranked results contain fewer instances of the search terms in the file's title, metadata, and content than much lower-ranked results.

What we are trying to determine is the algorithm that Indexing Service uses to rank certain files higher than others. For example, if the word "fish" appears once in the title of one document, does that get a higher ranking than a file that contains "fish" twice in the content, but not at all in the title? Or, would one appearance of the word near the top of a document's contents allow it to rank higher than if it appeared twice near the very end of a document?

Where can I get this kind of information? Books? Online tutorials? Because msdn.microsoft.com certainly has absolutely no information of the sort anywhere in the Knowledge Base or elsewhere.

This one is killing me. Any help will be most highly and eternally appreciated.


George Cheng [MSFT]

Jun 10, 2004, 1:22:23 PM

1. The number of times a word appears in a document divided by the total number of words in the document. This is further weighted so that hits in areas like headers or titles count for more than hits in the body of the document.

2. The closer the searched-for words are to each other, the higher the rank, up to the point where they are adjacent, forming a phrase and raising the rank even higher.

3. The ranking mechanism is weighted so that the more highly inflected the word is from the version asked for originally, the lower its rank in the result set. For example, "swim" would be closer to "swims" and further from "swimmer" because "swim" and "swimmer" are less related grammatically. In other words, the plural noun form is more related grammatically than the past-tense verb form of the same word. When resolving queries, the linguistic engine and ranking algorithm take these linguistic features into account.
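
To make the proximity idea in point 2 concrete, here is a minimal Python sketch. It is not the Indexing Service algorithm, which is not published; the function names, the 1/distance bonus, and the `phrase_bonus` multiplier are all hypothetical choices made only for illustration.

```python
def min_term_distance(words, term_a, term_b):
    """Smallest gap, in word positions, between any occurrence of the two terms.

    Returns None when either term is absent from the document.
    """
    pos_a = [i for i, w in enumerate(words) if w == term_a]
    pos_b = [i for i, w in enumerate(words) if w == term_b]
    if not pos_a or not pos_b:
        return None
    return min(abs(a - b) for a in pos_a for b in pos_b)


def proximity_bonus(words, term_a, term_b, phrase_bonus=2.0):
    """Hypothetical bonus: grows as the terms get closer, with an extra
    boost (phrase_bonus, an invented value) when they form an exact phrase."""
    if term_a == term_b:
        raise ValueError("expected two distinct search terms")
    distance = min_term_distance(words, term_a, term_b)
    if distance is None:
        return 0.0
    bonus = 1.0 / distance
    # The terms form a phrase when term_b directly follows term_a.
    is_phrase = any(
        words[i] == term_a and words[i + 1] == term_b
        for i in range(len(words) - 1)
    )
    return bonus * phrase_bonus if is_phrase else bonus


near = "fresh fish market opens today".split()
far = "fish are sold at the market".split()

print(proximity_bonus(near, "fish", "market"))  # adjacent, treated as a phrase
print(proximity_bonus(far, "fish", "market"))   # five words apart, smaller bonus
```

Under these assumptions the document containing "fish market" as a phrase gets a larger bonus than the one where the two words are five positions apart, which matches the behaviour described in point 2.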

Index Server doesn't base ranking on "x words per document" but rather on word density. Take a document with 200 words and a document with 20,000 words, each containing one instance of the word searched for. The one with 200 words has a density of 1/200, which is higher than the other's 1/20,000. So a small document with one hit can outweigh a larger document with more hits. Your result set may contain all of the same results, but the ranking values may never be consistent because of the "arbitrary algorithm" used to calculate them.
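
The density arithmetic above can be sketched in a few lines of Python. This only illustrates the idea and is not the actual Indexing Service formula; `weighted_density` and its `title_weight` of 3.0 are hypothetical, since the real weights are not documented.

```python
def density(hits, total_words):
    """Occurrences of the search term divided by the document's word count."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return hits / total_words


def weighted_density(title_hits, body_hits, total_words, title_weight=3.0):
    """Density where a title hit counts title_weight times a body hit.

    title_weight is an invented value used only to show the effect.
    """
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return (title_weight * title_hits + body_hits) / total_words


small_one_hit = density(1, 200)        # 200-word document, one hit
large_one_hit = density(1, 20_000)     # 20,000-word document, one hit
large_many_hits = density(50, 20_000)  # 20,000-word document, fifty hits

# The short document outranks the long one even when the long one has more hits.
print(small_one_hit > large_one_hit)
print(small_one_hit > large_many_hits)

# With the assumed title weight, one title hit beats two body hits
# in documents of the same length.
print(weighted_density(1, 0, 200) > weighted_density(0, 2, 200))
```

With these numbers, the 200-word document with a single hit scores higher than the 20,000-word document with fifty hits, which is why raw hit counts alone do not predict the order of the results.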

There is no way to change the ranking mechanism. There are books on the common algorithms used in the field of indexing, but there are no white papers. The Indexing Service is based on ranking formulas that are used everywhere from statistics to molecular biology. These are not listed in any articles or white papers because they are subject to change in future versions based on user feedback and performance tweaking.

Thank You

George Cheng

Microsoft Application Center & Index Server Support

Note: This article has no warranties implicit or explicit.
All the content is given on the "as is" basis and the user
takes full responsibility for its use and assumption.
Microsoft Corporation Copyright 2004
All Rights Reserved

--------------------
| From: Ron Forte (rfo...@bloomberg.net)
| Subject: How EXACTLY does Indexing Service determine rank
| Message-ID: <e4wsOwvT...@tk2msftngp13.phx.gbl>
| Newsgroups: microsoft.public.inetserver.indexserver
| Date: Thu, 10 Jun 2004 08:03:14 -0700
| NNTP-Posting-Host: shared2.orcsweb.com 66.129.69.1
| Lines: 1
| Path: cpmsftngxa10.phx.gbl!TK2MSFTFEED01.phx.gbl!TK2MSFTNGP08.phx.gbl!tk2msftngp13.phx.gbl
| Xref: cpmsftngxa10.phx.gbl microsoft.public.inetserver.indexserver:29097
| X-Tomcat-NG: microsoft.public.inetserver.indexserver

Unknown

Jun 10, 2004, 5:06:32 PM

In my experience the <title> tag is not indexed at all. It's only used to display results. Try it!

Jeff Cochran

Jun 10, 2004, 5:48:58 PM

On Thu, 10 Jun 2004 14:06:32 -0700, Marc (caw...@yahoo.com) wrote:

>In my experience the <title> tag is not indexed at all. It's only used to display results. Try it!

In terms of Indexing Service, a Title is the title property of the
document, not text between markup codes. Indexing Service will index
far more than HTML files.

Jeff