
Full Text File Search with Indexing Service on Windows (cont.)


Chung Leong

Aug 22, 2006, 1:24:37 AM
Here's the rest of the tutorial I started earlier:

Aside from the text within a document, Indexing Service lets you search on
metadata stored in the files. For example, MusicArtist and
MusicAlbum let you find MP3 and other music files based on the singer
and album name; DocAuthor lets you find Office documents created by a
certain user; DocAppName lets you find files created by a particular
program, and so on.
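Besides filtering on them, these properties can also be returned as columns next to the file name. The helper below, `metadata_query()`, is a hypothetical sketch (not part of any extension) that builds such a statement. It assumes property names are plain identifiers like DocAuthor and DocAppName, and it assumes `$keyword` is already trusted; the keyword 'budget' is just an illustration.

```php
<?php
// Hypothetical helper: SELECT metadata properties as extra columns.
// Property names are assumed to be plain identifiers; anything else is
// rejected rather than escaped. $keyword is assumed to be trusted.
function metadata_query(string $dir, array $props, string $keyword): string
{
    foreach ($props as $p) {
        if (!preg_match('/^[A-Za-z]+$/', $p)) {
            throw new InvalidArgumentException("Bad property name: $p");
        }
    }
    $columns = implode(', ', array_merge(['filename', 'path'], $props));
    return "SELECT $columns FROM SCOPE('DEEP TRAVERSAL OF \"$dir\"') "
         . "WHERE CONTAINS(Contents, '$keyword')";
}

// The resulting $sql would be passed to oledb_query() as in the tutorial.
$sql = metadata_query('C:\\htdocs', ['DocAuthor', 'DocAppName'], 'budget');
```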

Indexing Service uses plug-ins known as iFilters to extract information
from files it indexes. A default installation of Windows has iFilters
for many common file formats like HTML, Word, PowerPoint, and Excel.
You can extend Indexing Service's capabilities by installing additional
iFilters. Many are listed at http://www.ifilter.org/, with support
available for PDF, Photoshop, ZIP, Visio, OpenOffice, and others.

In the previous example, we used CONTAINS(Contents, '$keyword') to
search for a particular keyword. Only files containing that exact word
would be returned: if $keyword is 'date', Indexing Service would find
files with the word "date" but not those containing only "dates".
To relax the criteria somewhat, we can use the FORMSOF (INFLECTIONAL,
<word>) construct. Example:

$dir = 'C:\\htdocs';
$keyword = 'FORMSOF (INFLECTIONAL, date)';
$sql = "SELECT filename, size, path
FROM SCOPE('DEEP TRAVERSAL OF \"$dir\"')
WHERE CONTAINS(Contents, '$keyword')";
$res = oledb_query($sql, $link);

Now Indexing Service will look for all the inflected forms of the word:
date, dates, dating, dated, etc. If the word specified is "good," then
it'd look for good, better, best, and well.
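If the word comes from a search form, it is worth checking it before splicing it into the statement. The helper below, `inflectional_query()`, is a hypothetical sketch (not part of any extension) that builds the same statement as the example above. It assumes the input is a single word of letters only, which keeps quotes and query-language operators out of the SQL.

```php
<?php
// Hypothetical helper: wrap one word in FORMSOF (INFLECTIONAL, ...) and
// build the full statement. Letters-only validation is an assumption
// that prevents the word from breaking out of the CONTAINS() string.
function inflectional_query(string $dir, string $word): string
{
    if (!preg_match('/^[A-Za-z]+$/', $word)) {
        throw new InvalidArgumentException("Expected a single word: $word");
    }
    $keyword = "FORMSOF (INFLECTIONAL, $word)";
    return "SELECT filename, size, path "
         . "FROM SCOPE('DEEP TRAVERSAL OF \"$dir\"') "
         . "WHERE CONTAINS(Contents, '$keyword')";
}

$sql = inflectional_query('C:\\htdocs', 'date');
```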

To search on a partial word, we use the * sign:

$keyword = ' "kn*" ';

The double quotation marks indicate a wildcard search. The above
pattern means any word starting with "kn" is considered a match.
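When the prefix is typed in by a user, a small helper can build the term. `prefix_term()` below is a hypothetical sketch that wraps the prefix in double quotes exactly as in the example above; restricting the prefix to letters and digits is an assumption that keeps the result from escaping the CONTAINS() string.

```php
<?php
// Hypothetical helper: "kn" -> "kn*" (including the double quotes).
// Only letters and digits are accepted.
function prefix_term(string $prefix): string
{
    if (!preg_match('/^[A-Za-z0-9]+$/', $prefix)) {
        throw new InvalidArgumentException("Bad prefix: $prefix");
    }
    return "\"$prefix*\"";
}

$keyword = prefix_term('kn'); // "kn*"
```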

Indexing Service also supports the SQL expressions <field> LIKE
'%pattern%' and <field> = 'value'. They are best avoided, however,
as they can be incredibly slow: matching against the value of a field
often means reading from the files.

To sort the results, we add an ORDER BY clause:

$dir = 'C:\\htdocs';
$keyword = 'FORMSOF (INFLECTIONAL, good)';
$sql = "SELECT filename, size, path
FROM SCOPE('DEEP TRAVERSAL OF \"$dir\"')
WHERE CONTAINS(Contents, '$keyword')
ORDER BY size DESC";
$res = oledb_query($sql, $link);

The above example lists the files found from the biggest to the
smallest. "ORDER BY write DESC" would list the most recently modified
files first, while "ORDER BY create DESC" lists the most recently
created ones first. You can, of course, also use these file attributes
as search criteria.
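If the sort order comes from the request (say, a hypothetical ?sort=write parameter), the column name should be checked against a fixed list rather than pasted into the SQL. `sorted_query()` below is a hypothetical sketch that allows only the three attributes mentioned above (size, write, create) and assumes `$keyword` is already trusted.

```php
<?php
// Hypothetical helper: build the sorted query with a whitelisted column.
// Only size, write and create are allowed; $keyword is assumed trusted.
function sorted_query(string $dir, string $keyword, string $column, bool $desc = true): string
{
    $allowed = ['size', 'write', 'create'];
    if (!in_array($column, $allowed, true)) {
        throw new InvalidArgumentException("Cannot sort on: $column");
    }
    $direction = $desc ? 'DESC' : 'ASC';
    return "SELECT filename, size, path "
         . "FROM SCOPE('DEEP TRAVERSAL OF \"$dir\"') "
         . "WHERE CONTAINS(Contents, '$keyword') "
         . "ORDER BY $column $direction";
}

$sql = sorted_query('C:\\htdocs', 'FORMSOF (INFLECTIONAL, good)', 'size');
```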

Thus far we have been searching the computer's default catalog. If
searching will be done only in a particular folder, it's worthwhile to
create a separate catalog. You can do this in the Computer Management
console. To search a different catalog through OLE DB, specify the
catalog name as the data source in the connection string:

$link = oledb_open("Provider=MSIDXS; Data Source=web_cat");
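If the catalog name is chosen at run time, a helper can build the connection string. `catalog_dsn()` below is a hypothetical sketch; limiting names to letters, digits and underscores is an assumption (it keeps a stray ';' from adding extra connection-string settings), and real catalog names may allow more.

```php
<?php
// Hypothetical helper: connection string for a named catalog.
// \w here means letters, digits and underscore (an assumed restriction).
function catalog_dsn(string $catalog): string
{
    if (!preg_match('/^\w+$/', $catalog)) {
        throw new InvalidArgumentException("Bad catalog name: $catalog");
    }
    return "Provider=MSIDXS; Data Source=$catalog";
}

// $dsn would then be passed to oledb_open().
$dsn = catalog_dsn('web_cat');
```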

Finally, what if you want to search files residing on a network server?
While it's possible to index a network drive, it's not terribly
efficient. Instead, you'd want to enable Indexing Service on that
computer and perform the search there.

To search a remote catalog, we prepend the SCOPE() statement with the
computer name and the catalog name:

$dir = '\\\\fileserver\\projects'; // '\\\\' is two backslashes in PHP
$keyword = 'FORMSOF (INFLECTIONAL, bad)';
$sql = "SELECT filename, size, path
FROM fileserver.System..SCOPE('DEEP TRAVERSAL OF \"$dir\"')
WHERE CONTAINS(Contents, '$keyword')";
$res = oledb_query($sql, $link);

Note that the double period is not a typo. Windows Authentication is
used to determine which files are visible, so for the code above to
work, the web server has to run under a network (domain) user account.
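One easy mistake here is the UNC path: in a single-quoted PHP string, '\\' produces one backslash, so '\\fileserver\projects' would lose one of the two leading backslashes. `remote_query()` below is a hypothetical sketch that builds the server.catalog..SCOPE() form shown above; it assumes server and catalog names are plain identifiers and that `$keyword` is trusted.

```php
<?php
// Hypothetical helper: build a statement against a remote catalog using
// the server.catalog..SCOPE() form. Names are assumed to be identifiers.
function remote_query(string $server, string $catalog, string $dir, string $keyword): string
{
    foreach ([$server, $catalog] as $name) {
        if (!preg_match('/^\w+$/', $name)) {
            throw new InvalidArgumentException("Bad name: $name");
        }
    }
    return "SELECT filename, size, path "
         . "FROM $server.$catalog..SCOPE('DEEP TRAVERSAL OF \"$dir\"') "
         . "WHERE CONTAINS(Contents, '$keyword')";
}

// Four backslashes in the source yield the two a UNC path needs.
$dir = '\\\\fileserver\\projects';
$sql = remote_query('fileserver', 'System', $dir, 'FORMSOF (INFLECTIONAL, bad)');
```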

sutt...@yahoo.com

Aug 23, 2006, 6:04:32 PM
I just came across this and it is spectacular. It works great and
makes using the indexing service to handle the heavy lifting of
searching a breeze. Thank you.

Is there anywhere I can find more advanced examples, like boolean
searches, use of wildcard characters, or searching across multiple
file attributes?

Mike

Chung Leong wrote:
> Here's the rest of the tutorial I started earlier:

...

Chung Leong

Aug 23, 2006, 10:24:26 PM

sutt...@yahoo.com wrote:
> I just came across this and it is spectacular. It works great and
> makes using the indexing service to handle the heavy lifting of
> searching a breeze. Thank you.
>
> Is there anywhere to find more advanced examples like boolean searches,
> use of wildcard characters, or searching across multiple file
> attributes.

I'm not really an expert in Indexing Service. Here's something I just
came across:

http://www.unis.no/Search/ixqLang_UNIS.Htm

The query string described in the document goes into the CONTAINS()
predicate. I realize now that what I said about double-quoted
strings was wrong. They're used for searching for multiple words in
sequence (i.e. a phrase). You can use the prefix* syntax without the
double quotes.
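Following that correction, a phrase term is just the words joined by spaces inside double quotes. `phrase_term()` below is a hypothetical sketch; it assumes each word is letters or digits only, so nothing in the input can close the quotes early.

```php
<?php
// Hypothetical helper: build a double-quoted phrase term for CONTAINS().
// Each word is assumed to be letters/digits only.
function phrase_term(array $words): string
{
    foreach ($words as $w) {
        if (!preg_match('/^[A-Za-z0-9]+$/', $w)) {
            throw new InvalidArgumentException("Bad word: $w");
        }
    }
    return '"' . implode(' ', $words) . '"';
}

$keyword = phrase_term(['full', 'text', 'search']); // "full text search"
```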

To specify multiple criteria, you just join them together in the
WHERE clause as you would when querying a database.

Example:

SELECT path, filename, size, write
FROM SCOPE()
WHERE CONTAINS(contents, 'love AND NOT sex')
AND size > 10240
AND write > '01-01-2006'
ORDER BY size

The statement above looks for files containing 'love' but not 'sex'
that are larger than 10K and were modified some time this year, and
lists them from smallest to largest.
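When the criteria vary from request to request, the WHERE clause can be assembled by joining the conditions with AND, as in the statement above. `combined_query()` below is a hypothetical sketch; it assumes each condition is an already-formed, trusted fragment, so any user input must be validated before it reaches this function.

```php
<?php
// Hypothetical helper: join trusted WHERE conditions with AND and add an
// optional ORDER BY. No escaping is done here.
function combined_query(array $conditions, string $orderBy = ''): string
{
    $sql = "SELECT path, filename, size, write FROM SCOPE()";
    if ($conditions) {
        $sql .= " WHERE " . implode(' AND ', $conditions);
    }
    if ($orderBy !== '') {
        $sql .= " ORDER BY $orderBy";
    }
    return $sql;
}

$sql = combined_query([
    "CONTAINS(contents, 'love AND NOT sex')",
    "size > 10240",
    "write > '01-01-2006'",
], 'size');
```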

To do a wildcard match against the filename, you use the LIKE
'%pattern%' syntax.

Example:

SELECT path, filename, size, write
FROM SCOPE()
WHERE filename LIKE '%.mp3'

This statement looks for files with the .mp3 extension.
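To search for a different extension chosen by the user, the pattern can be built from a validated value. `extension_query()` below is a hypothetical sketch; accepting only letters and digits is an assumption that also keeps the LIKE wildcards % and _ out of user input.

```php
<?php
// Hypothetical helper: filename LIKE '%.ext' for a validated extension.
function extension_query(string $ext): string
{
    if (!preg_match('/^[A-Za-z0-9]+$/', $ext)) {
        throw new InvalidArgumentException("Bad extension: $ext");
    }
    return "SELECT path, filename, size, write FROM SCOPE() "
         . "WHERE filename LIKE '%.$ext'";
}

$sql = extension_query('mp3');
```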

sutt...@yahoo.com

Aug 24, 2006, 8:48:05 PM
Thanks for the reply and leads. After I posted I was thinking about
the queries and realized about the WHERE ... AND ... thing. Also
looking at how MS implements it in their search dialog helped me
understand what was going on.

Thanks again.

Mike
