Index Server

stepbystep

Nov 8, 2001, 6:27:43 AM

Hello MS Index Server users:

I have some questions about MS Index Server, which I would like to use.

1. As I understand it, it can index the contents of files, say *.htm or *.pdf,
or all files. So how big is the index compared to the size of the files that
you indexed?

2. Does it really index everything in the files, or is it selective? I tried
some random substrings and they didn't match every time.

3. To those knowledgeable in its theory and implementation: how exactly is
it implemented? Does it use the concept of suffix trees, or does it work
like the Unix locate database?

4. Is there a similar utility that I can use on my Linux box?
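Question 3 contrasts suffix trees with the Unix locate database. As a point of reference only (the thread never establishes which approach Index Server uses), here is a minimal Python sketch of arbitrary-substring search over file contents using a suffix array, a simpler relative of the suffix tree. locate, by contrast, matches against a stored list of file names rather than file contents. The function names are illustrative, and the naive construction is only suitable for small inputs.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    # Naive construction: sort every suffix start position by the suffix
    # itself. Fine for a demo, far too slow and memory-hungry for gigabytes.
    return sorted(range(len(text)), key=lambda i: text[i:])


def _bound(text: str, sa: List[int], pattern: str, upper: bool) -> int:
    # Binary search over the sorted suffixes, comparing only the first
    # len(pattern) characters. upper=False finds the first suffix whose
    # prefix is >= pattern; upper=True finds the first one that is > pattern.
    lo, hi = 0, len(sa)
    m = len(pattern)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = text[sa[mid]:sa[mid] + m]
        if prefix < pattern or (upper and prefix == pattern):
            lo = mid + 1
        else:
            hi = mid
    return lo


def find_substring(text: str, sa: List[int], pattern: str) -> List[int]:
    # Every suffix that starts with `pattern` sits in one contiguous run of
    # the suffix array, so two binary searches delimit all occurrences.
    start = _bound(text, sa, pattern, upper=False)
    end = _bound(text, sa, pattern, upper=True)
    return sorted(sa[start:end])


text = "banana bandana"
sa = build_suffix_array(text)
print(find_substring(text, sa, "ana"))  # [1, 3, 11]
```

Unlike a word index, this finds any substring, including fragments inside words, at the cost of storing one entry per character position of the text.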

Thanks.

PS: I could not post to comp.database.theory. Where am I likely to get more
info on this topic?

Highdiver

Nov 8, 2001, 7:56:33 AM

stepbystep wrote:

> Hello MS Index Server users:
>
> I have some questions about MS Index Server, which I would like to use.
>
> 1. As I understand it, it can index the contents of files, say *.htm or *.pdf,
> or all files. So how big is the index compared to the size of the files that
> you indexed?
>

As for index size: I got about 10 MB of index for 2 GB of data, mainly .doc and .xls files.
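The figure above can be turned into a ratio. A quick calculation, assuming binary units (the reply does not say whether it means 1,000 or 1,024 MB per GB), puts the index at roughly half a percent of the raw file size. Note that this is relative to the .doc/.xls files on disk, not to the text extracted from them.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def index_to_corpus_ratio(index_mb: float, corpus_gb: float) -> float:
    # Ratio of index size to raw file size, using binary units (1 GB = 1024 MB).
    return (index_mb * MB) / (corpus_gb * GB)


ratio = index_to_corpus_ratio(10, 2)  # the 10 MB / 2 GB figure from the reply
print(f"{ratio:.4%}")  # 0.4883%
```

With decimal units (10 MB out of 2,000 MB) the result is 0.5%, so the unit convention barely changes the picture.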

>
> 2. Does it really index everything in the files, or is it selective? I tried
> some random substrings and they didn't match every time.
>

It indexes all Office files, text, RTF, etc. Note that some Office 2000 files
may not index properly; you need Windows 2000 for that. For PDF files, you need
to install a filter from Adobe.
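As the reply describes it, a file's contents only get indexed when a filter for that file type is installed, which is why PDF support needs Adobe's add-on. The dispatch idea can be sketched in Python as below. The extractor functions and the FILTERS table are hypothetical stand-ins, not Index Server's actual filter interface, and the HTML "parser" is deliberately crude.

```python
import os
import re
from typing import Callable, Dict, Optional


# Hypothetical extractors; real filters parse binary formats such as .doc or .pdf.
def extract_plain(data: bytes) -> str:
    return data.decode("utf-8", errors="replace")


def extract_html(data: bytes) -> str:
    # Crude tag stripping, good enough for the illustration only.
    return re.sub(r"<[^>]+>", " ", data.decode("utf-8", errors="replace"))


# Hypothetical registry: file extension -> text extractor.
FILTERS: Dict[str, Callable[[bytes], str]] = {
    ".txt": extract_plain,
    ".htm": extract_html,
    ".html": extract_html,
}


def extract_text(path: str, data: bytes) -> Optional[str]:
    # Pick the extractor by extension; with no registered filter the
    # file's contents are simply not indexed (None is returned).
    ext = os.path.splitext(path)[1].lower()
    extractor = FILTERS.get(ext)
    if extractor is None:
        return None
    return extractor(data)
```

Registering a new extractor in FILTERS is the analogue of installing a filter such as Adobe's for .pdf.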

>
> 3. To those knowledgeable in its theory and implementation: how exactly is
> it implemented? Does it use the concept of suffix trees, or does it work
> like the Unix locate database?
>

No idea, but there is a book on MS Index Server.

>
> 4. Is there a similar utility that I can use on my Linux box?
>

No. There are some Linux search engines, though: ht://Dig and swish-e.
Note that Index Server is integrated with NTFS. If you search for all .doc
files, it will only show you the files you can read; files you don't have read
permission for will not show up in the results page. I haven't found another
search engine that does this (yet).
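The permission-aware behaviour described above (often called security trimming) amounts to filtering each result list against the searching user's read rights. Below is a minimal, POSIX-oriented Python sketch, assuming the search runs under the user's own account. trim_results is a hypothetical name; the reply attributes Index Server's version to its NTFS integration, which this code does not reproduce.

```python
import os
from typing import Iterable, List


def trim_results(paths: Iterable[str]) -> List[str]:
    # Drop every hit the current user cannot open for reading, so files
    # without read permission never appear in the result list.
    # os.access follows POSIX permission bits; on Windows it does not
    # evaluate NTFS ACLs, so there it is only a rough approximation.
    return [p for p in paths if os.access(p, os.R_OK)]
```

Filtering at query time like this keeps a single shared index while still showing each user only what they may read.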

>
> Thanks.
>
> PS: I could not post to comp.database.theory. Where am I likely to get more
> info on this topic?

A fairly obvious answer, ain't it? You should post this to the Microsoft Index
Server newsgroup.

--
regards
AL
Tech reference and hashing.
http://www.alfredivy.per.sg/users/alloo

stepbystep

Nov 8, 2001, 10:41:01 PM

Highdiver <alfre...@bigfoot.com> wrote in message news:<3BEA8101...@bigfoot.com>...
> stepbystep wrote:

>
> As for index size: I got about 10 MB of index for 2 GB of data, mainly .doc and .xls files.
>

An empty .doc file is 19 KB and an empty .xls file is 13 KB, so that still
doesn't answer my question. If those 2 GB were converted to plain text, would
the text still be much larger than 10 MB?

> >
> > 3. To those knowledgeable in its theory and implementation: how exactly is
> > it implemented? Does it use the concept of suffix trees, or does it work
> > like the Unix locate database?
> >
>

Now I understand that, since these index servers use search engine
technology, they aren't really substring indexing servers. I found out that
even Google doesn't match just any substring, nor does it support wildcard
characters (patter* for pattern).
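The observation above matches how a word-level inverted index behaves. The sketch below is a generic Python illustration, not a description of Index Server's or Google's internals: text is split into whole words and each word maps to the documents containing it, so a query for a fragment such as "atter" finds nothing. A trailing wildcard like patter* can in principle still be supported by scanning the sorted list of indexed words for that prefix. The tokenizer and function names are assumptions made for the example.

```python
import re
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    # Assumed tokenizer: lowercase runs of letters and digits count as words.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    # Inverted index: word -> set of document ids containing that word.
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return dict(index)


def lookup(index: Dict[str, Set[str]], word: str) -> Set[str]:
    # Only whole indexed words are keys, so arbitrary fragments miss.
    return set(index.get(word.lower(), set()))


def prefix_lookup(index: Dict[str, Set[str]], prefix: str) -> Set[str]:
    # Trailing-wildcard search: words sharing a prefix are adjacent in
    # sorted order, so binary-search to the first one and scan forward.
    terms = sorted(index)
    prefix = prefix.lower()
    hits: Set[str] = set()
    i = bisect_left(terms, prefix)
    while i < len(terms) and terms[i].startswith(prefix):
        hits |= index[terms[i]]
        i += 1
    return hits


docs = {"a.txt": "Index Server pattern matching", "b.txt": "Unix locate database"}
index = build_index(docs)
print(lookup(index, "pattern"), lookup(index, "atter"), prefix_lookup(index, "patter"))
```

The index only needs one entry per distinct word, which is one reason word indexes can be far smaller than the data they cover, unlike the per-character suffix structures asked about earlier in the thread.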

From the PageRank technique, Google might identify which page to search
first, but can somebody tell me about the indexing technique it uses to
locate the string inside the page?
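The thread does not say how Google locates a term inside a page, and nothing here describes its actual system. A common textbook answer is a positional inverted index: alongside each document id it stores the word offsets, which lets an engine report where a word occurs and answer phrase queries. A small Python sketch with illustrative function names:

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

Posting = Tuple[str, int]  # (document id, word offset within that document)


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def build_positional_index(docs: Dict[str, str]) -> Dict[str, List[Posting]]:
    # Record every occurrence of every word together with its offset,
    # so a hit can be located inside the document, not just in it.
    index: Dict[str, List[Posting]] = defaultdict(list)
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word].append((doc_id, pos))
    return dict(index)


def phrase_search(index: Dict[str, List[Posting]], phrase: str) -> List[Posting]:
    # A phrase matches where word k+1 occurs exactly one position after word k.
    words = tokenize(phrase)
    if not words:
        return []
    later = [set(index.get(w, [])) for w in words[1:]]
    return [
        (doc_id, pos)
        for doc_id, pos in index.get(words[0], [])
        if all((doc_id, pos + k + 1) in postings for k, postings in enumerate(later))
    ]


docs = {"p1": "Index Server uses search engine technology", "p2": "search the index"}
index = build_positional_index(docs)
print(phrase_search(index, "search engine"))  # [('p1', 3)]
```

Storing offsets makes the index larger than a plain word-to-document map, the usual trade-off for being able to pinpoint matches within a page.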

> > PS: I could not post to comp.database.theory. Where am I likely to get more
> > info on this topic?
>
> A fairly obvious answer, ain't it? You should post this to the Microsoft Index
> Server newsgroup.

This newsgroup is not there on Google.

Highdiver

Nov 10, 2001, 10:39:00 PM

msnews.microsoft.com

stepbystep wrote:

--