I have a problem with a SQL server full-text search on a
ntext field containing XML data. The XML Data is something
like this:
<COURSES>
<COURSE>
<NAME>Functiepunt Analyse Workshop</NAME>
<INSTITUTE>B-Company</INSTITUTE>
<FINISHED>2003</FINISHED>
</COURSE>
<COURSE>
<NAME>Microsoft TechEd Europe 2002</NAME>
<INSTITUTE>Microsoft</INSTITUTE>
<FINISHED>2002</FINISHED>
</COURSE>
</COURSES>
When i search for "TechEd" this record is selected.
When i search for "Workshop" this record is not selected.
When i search for "Workshop*" this record is selected.
I know SQL Server Full-text search uses a word breaker and
i think it uses a space as a delimiter for individual
words. The text "Workshop" is concatenated in the XML with
the Tag: "</NAME>". Might this be the reason why it is not
selected when i search for "Workshop"? If so does anybody
know a work-a-round for this problem?
Thanks in advance!
Peter
Also, if you are using SQL Server 2000 are you storing the XML data in an
IMAGE (BLOB) datatype or in a char/varchar/TEXT datatype? SQL FTS does not
use "space as a delimiter", but depending upon your version info and how you
are storing the XML data, "space as a delimiter" can be a factor.
Note, you can also post FTS related questions to the newsgroup:
microsoft.public.sqlserver.fulltext.
Regards,
John
"Pjotr" <peter....@atosorigin.com> wrote in message
news:073501c34d08$7deb5e00$a501...@phx.gbl...
Sorry about not providing the version info. I can imagine
it's important for this issue.
The version is:
Microsoft SQL Server 2000 - 8.00.760 (Intel X86) Dec 17
2002 14:22:05 Copyright (c) 1988-2003 Microsoft
Corporation Enterprise Edition on Windows NT 5.0 (Build
2195: Service Pack 3)
I'm putting the data in a NTEXT datatype field. I don't
know how to retrieve the information about the
wordbreaker.dll.
Regards,
Peter
>.
>
I use the following table to reproduce the problem on Windows 2000 + SQL
Server 2000 SP3.
create table testftsxml (c1 int identity, descrip ntext)
insert testftsxml values(' <COURSES>
<COURSE>
<NAME>Functiepunt Analyse Workshop</NAME>
<INSTITUTE>B-Company</INSTITUTE>
<FINISHED>2003</FINISHED>
</COURSE>
<COURSE>
<NAME>Microsoft TechEd Europe 2002</NAME>
<INSTITUTE>Microsoft</INSTITUTE>
<FINISHED>2002</FINISHED>
</COURSE>
</COURSES>
')
insert testftsxml values('2 values')
After I fulltext the table, I can use the following query to select the
first record.
select * from testftsxml where contains(descrip, 'Workshop')
Could you perform the same steps on your side and check the result, so that
we can work on the same data?
Bill Cheng
Microsoft Online Partner Support
Get Secure! - www.microsoft.com/security
This posting is provided "as is" with no warranties and confers no rights.
--------------------
| Content-Class: urn:content-classes:message
| From: "Pjotr" <peter....@atosorigin.com>
| Sender: "Pjotr" <peter....@atosorigin.com>
| References: <073501c34d08$7deb5e00$a501...@phx.gbl>
<#zVD2BtT...@TK2MSFTNGP12.phx.gbl>
| Subject: Re: SQL Server Full-Text search problem in XML
| Date: Mon, 21 Jul 2003 00:01:03 -0700
| Lines: 92
| Message-ID: <043201c34f55$d94e55f0$a501...@phx.gbl>
| MIME-Version: 1.0
| Content-Type: text/plain;
| charset="iso-8859-1"
| Content-Transfer-Encoding: 7bit
| X-Newsreader: Microsoft CDO for Windows 2000
| X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4910.0300
| Thread-Index: AcNPVdlOMefK63Y+TXKupFFbaK20sg==
| Newsgroups: microsoft.public.sqlserver.xml
| Path: cpmsftngxa06.phx.gbl
| Xref: cpmsftngxa06.phx.gbl microsoft.public.sqlserver.xml:19289
| NNTP-Posting-Host: TK2MSFTNGXA13 10.40.1.165
| X-Tomcat-NG: microsoft.public.sqlserver.xml
John (jkane)
"Bill Cheng [MSFT]" <bill...@online.microsoft.com> wrote in message
news:q85Uxl2T...@cpmsftngxa06.phx.gbl...
I am not sure how you reproduced the problem on your side. Could you try my
steps and let me know the results?
Bill Cheng
Microsoft Online Partner Support
Get Secure! - www.microsoft.com/security
This posting is provided "as is" with no warranties and confers no rights.
--------------------
| From: "John Kane" <jt-...@comcast.net>
| References: <073501c34d08$7deb5e00$a501...@phx.gbl>
<#zVD2BtT...@TK2MSFTNGP12.phx.gbl>
<043201c34f55$d94e55f0$a501...@phx.gbl>
<q85Uxl2T...@cpmsftngxa06.phx.gbl>
| Subject: Re: SQL Server Full-Text search problem in XML
| Date: Mon, 21 Jul 2003 15:57:02 -0700
| Lines: 167
| X-Priority: 3
| X-MSMail-Priority: Normal
| X-Newsreader: Microsoft Outlook Express 6.00.3790.0
| X-MimeOLE: Produced By Microsoft MimeOLE V6.00.3790.0
| Message-ID: <OJJamt9T...@TK2MSFTNGP11.phx.gbl>
| Newsgroups: microsoft.public.sqlserver.xml
| NNTP-Posting-Host: 12-228-206-224.client.attbi.com 12.228.206.224
| Path: cpmsftngxa06.phx.gbl!TK2MSFTNGP08.phx.gbl!TK2MSFTNGP11.phx.gbl
| Xref: cpmsftngxa06.phx.gbl microsoft.public.sqlserver.xml:19307
| X-Tomcat-NG: microsoft.public.sqlserver.xml
I've tried it out and followed the steps on 4 different
machines. All have the same configuration according to the
results of the: "SELECT @@Version" statement.
On two of them i find no result on 2 of them i find
results. I have used this XML
string: '<COURSES><COURSE><NAME>Functiepunt Analyse
Workshop</NAME><INSTITUTE>B-
Company</INSTITUTE><FINISHED>2003</FINISHED></COURSE><COURS
E><NAME>TechEd Europe
2002</NAME><INSTITUTE>Microsoft</INSTITUTE><FINISHED>2002</
FINISHED></COURSE></COURSES>'
Weird isn't it?
On the two machines that do not work i only find results
when i search for 'Analyse'.
Regards,
Peter
>.
>
It is really strange. It seems that the problem does not happen on all
computers? Am I right? Could you post the difference between
normally-working and abnormally-working system, e.g. OS?
What if you install a new SQL Server 2000 system to test?
Bill Cheng
Microsoft Online Partner Support
Get Secure! - www.microsoft.com/security
This posting is provided "as is" with no warranties and confers no rights.
--------------------
| Content-Class: urn:content-classes:message
| From: "Pjotr" <peter....@atosorigin.com>
| Sender: "Pjotr" <peter....@atosorigin.com>
| References: <073501c34d08$7deb5e00$a501...@phx.gbl>
<#zVD2BtT...@TK2MSFTNGP12.phx.gbl>
<043201c34f55$d94e55f0$a501...@phx.gbl>
<q85Uxl2T...@cpmsftngxa06.phx.gbl>
<OJJamt9T...@TK2MSFTNGP11.phx.gbl>
<9MJO#Z$TDHA...@cpmsftngxa06.phx.gbl>
| Subject: Re: SQL Server Full-Text search problem in XML
| Date: Tue, 22 Jul 2003 03:01:52 -0700
| Lines: 269
| Message-ID: <080101c35038$4634ec50$a501...@phx.gbl>
| MIME-Version: 1.0
| Content-Type: text/plain;
| charset="iso-8859-1"
| Content-Transfer-Encoding: 7bit
| X-Newsreader: Microsoft CDO for Windows 2000
| X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4910.0300
| Thread-Index: AcNQOEY0SbbxAOYjRmuFqfPGUK07/Q==
| Newsgroups: microsoft.public.sqlserver.xml
| Path: cpmsftngxa06.phx.gbl
| Xref: cpmsftngxa06.phx.gbl microsoft.public.sqlserver.xml:19311
Below is Hilary Cotter's post in the fulltext newsgroup on this subject:
===============================================
The short answer is that the word breakers break on white
space at indexing time. In some circumstances they will
apply language specific rules on the words they index and
when on your search phrase.
The word breakers were designed for pure textual data.
The iFilters (which take the blob of data handed back to
them from MSDmn or CiDaemon) send only text to the word
breakers, they strip out tags like <>.
If you chose to circumvent the design intentions of SQL
FTS or Indexing Service or any of the other Microsoft
Search Products you will run into problems like this.
Store your xml docs in columns using the image data type,
and use a document type column of char(3) or char(4) to
contain the document extension - in your case xml or .xml.
Then install the Microsoft XML iFilter, keeping in mind
that to index attributes/elements which have values
greater than 32 characters you need to install OfficeXP
locally on your IS machine.
The XML iFilter will index
* values of sub-elements of the root element when
the sub elements have no child elements.
* attributes of the root element and attributes of
sub-elements of the root element
Consider a sample XML document:
<?xml version="1.0" ?>
<book title="MyBook">
This a sample book
<author>
First Last
<AGE>
20
</AGE>
</author>
<ISBN>
11111111111
</ISBN>
</book>
The attribute Title of the element Book, and ISBN would be
the only values that would be indexed and query able in
this case.
You might want to consider alternatives to MS's XML iFiltr
QuiLogic http://www.quilogic.cc/ifilter.htm XML IFilter
enables crawling of documents containing xml based data.
Our filter technology enables users to search and index
arbitrary structured xml files based on content. Likewise
VISIO or Excel files can be saved as xml files and indexed
with the help of our XML IFilter component.
Microsoft
3 Tier Technology http://www.3tt.com.au. Contact Robert
Stoneman (rsto...@3tt.com.au). 3TT’s XML iFilter has a
configuration file that allows you to fine tune the
properties you wish to index.
-- My reply to the above:
=================
Hilary, are you pre-assuming that Peter is using the Neutral wordbreaker or
just providing a very simple explanation?
Peter, when you defined the FT Index and FT Catalog on your XML data in a
NTEXT column, what "Language for Word Breaker" did you use? If you don't
know, you can use the system stored procedure sp_help_fulltext_columns to
get this info. Could you post it?
Best Regards,
John
"Bill Cheng [MSFT]" <bill...@online.microsoft.com> wrote in message
news:wgTjfgEU...@cpmsftngxa06.phx.gbl...
On the four machines i've checked the infosoft.dll. They
all are version 5.0.2195.4091.
I also checked the "Language for Word Breaker". On all
machines it is 1033 (English US).
Regards,
Peter
I have switched the "Language for Word Breaker" to
neutral. This solved my problem but not the mystery about
the different behaviour on the four machines.
I found some guidance in an excelent whitepaper about SQL
FTS
(http://download.microsoft.com/download/sqlsvr2000/utility/
2.7/win98mexp/en-us/Full-Text%20Search_Publish.exe):
START QUOTE:
If the data stored in a text column contains special
formatting (such as HTML or XML tags), Microsoft
recommends that you set the language of that column to
neutral.
END QUOTE
To all, thanks a lot for your help. Special thanks to John
Kane. I believe he contributed to this Microsoft
whitepaper.
Regards,
Peter
>.
>