What we are trying to determine is the algorithm that Indexing Service uses to rank certain files higher than others. For example, if the word "fish" appears once in the title of one document, does that document rank higher than a file that contains "fish" twice in the content but not at all in the title? Or would one appearance of the word near the top of a document's contents let it rank higher than two appearances near the very end of a document?
Where can I get this kind of information? Books? Online tutorials? Because msdn.microsoft.com certainly has no information of the sort whatsoever, in the Knowledge Base or anywhere else.
This one is killing me. Any help will be most highly and eternally appreciated.
2. The closer the searched-for words are to each other, the higher the rank, up to the point where they are adjacent and form a phrase, which raises the rank even higher.
3. The ranking mechanism is weighted so that the more highly inflected the word is from the version originally asked for, the lower its rank in the result set. For example, "swim" would be closer to "swims" and further from "swimmer" because "swim" and "swimmer" are less related grammatically. In other words, the plural noun form is more related grammatically than the past-tense verb form of the same word. When resolving queries, the linguistic engine and ranking algorithm take these linguistic features into account.
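To make the proximity rule in point 2 concrete, here is a minimal sketch in Python. It is not the Indexing Service formula, which is unpublished; the function names and the 1/distance scoring are assumptions chosen only to show that a score can rise as two terms get closer and peak when they are adjacent. It splits on whitespace and ignores punctuation.

```python
from typing import Optional


def min_distance(text: str, term_a: str, term_b: str) -> Optional[int]:
    """Smallest gap, in words, between any occurrence of the two terms."""
    words = text.lower().split()
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    if not pos_a or not pos_b:
        return None  # at least one term is missing from the text
    return min(abs(a - b) for a in pos_a for b in pos_b)


def proximity_score(text: str, term_a: str, term_b: str) -> float:
    """Hypothetical score: 1.0 for adjacent terms, lower as they move apart."""
    distance = min_distance(text, term_a, term_b)
    if distance is None:
        return 0.0
    # max() guards against a zero distance when both terms are the same word.
    return 1.0 / max(distance, 1)


adjacent = proximity_score("fresh fish for sale", "fresh", "fish")
separated = proximity_score("fresh water fish", "fresh", "fish")
print(adjacent, separated)
```

Under this assumed scoring, the adjacent pair scores higher than the pair separated by one word, which mirrors the behaviour described above without claiming to reproduce the real weights.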
Index Server does not rank by "x words per document" but by word density. Consider a document with 200 words and a document with 20,000 words, each containing one instance of the word searched for. The one with 200 words has a density of 1/200, which is higher than the 1/20,000 of the larger one. So a small document with one hit can outweigh a larger document with more hits. Your result set may contain all of the same results, but the ranking values may never be consistent because of the "arbitrary algorithm" used to calculate them.
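The density arithmetic above can be sketched in a few lines of Python. This only illustrates the hits-per-word ratio described in the paragraph; the actual ranking formula is not published, and the function name, the whitespace tokenization, and the sample documents are assumptions made for the example.

```python
def term_density(text: str, term: str) -> float:
    """Fraction of the words in `text` that equal `term` (case-insensitive)."""
    words = text.lower().split()
    if not words:
        return 0.0
    return words.count(term.lower()) / len(words)


# A 200-word document with one hit and a 20,000-word document with three hits.
short_doc = " ".join(["fish"] + ["filler"] * 199)
long_doc = " ".join(["fish"] * 3 + ["filler"] * 19997)

short_density = term_density(short_doc, "fish")  # 1/200
long_density = term_density(long_doc, "fish")    # 3/20000

# The short document has the higher density despite having fewer hits.
print(short_density > long_density)
```

With these sample documents the short one comes out ahead, which is the "small document with one hit can outweigh a larger document with more hits" case.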
There is no way to change the ranking mechanism. There are books on the common algorithms used in the field of indexing, but there are no white papers. The Indexing Service is based on ranking formulas that are used everywhere from statistics to molecular biology. These are not listed in any articles or white papers because they are subject to change in future versions based on user feedback and performance tweaking.
Thank You
George Cheng
Microsoft Application Center & Index Server Support
Note: This article has no warranties implicit or explicit.
All the content is given on the "as is" basis and the user
takes full responsibility for its use and assumption.
Microsoft Corporation Copyright 2004
All Rights Reserved
--------------------
| From: Ron Forte (rfo...@bloomberg.net)
| Subject: How EXACTLY does Indexing Service determine rank
| Message-ID: <e4wsOwvT...@tk2msftngp13.phx.gbl>
| Newsgroups: microsoft.public.inetserver.indexserver
| Date: Thu, 10 Jun 2004 08:03:14 -0700
| NNTP-Posting-Host: shared2.orcsweb.com 66.129.69.1
| Lines: 1
| Path: cpmsftngxa10.phx.gbl!TK2MSFTFEED01.phx.gbl!TK2MSFTNGP08.phx.gbl!tk2msftngp13.phx.gbl
| Xref: cpmsftngxa10.phx.gbl microsoft.public.inetserver.indexserver:29097
| X-Tomcat-NG: microsoft.public.inetserver.indexserver
>In my experience the <title> tag is not indexed at all. It's only used to display results. Try it!
In terms of Indexing Service, a Title is the title property of the document, not text between markup codes. Indexing Service will index far more than HTML files.
Jeff