"If the text and asterisk are not delimited by double quotation marks, as in
CONTAINS (column, 'text*'), full-text search considers the asterisk as a
character and will search for exact matches to text*."
Can anybody confirm the above to be true ?
It is my experience that MS documentation in general - and in particular
their examples - contains a lot of errors, but here they specifically
mention that is should be possible to full-text search for 'text*' not
counting the asterisk as a wildcard. When I try to full-text search for
'text*' using CONTAINS the result includes records containing 'text' without
the asterisk.
Best regards
Johnny Nielsen
With double quotation marks wrapping around your search phrase it will
wildcard, so results will match the prefix or beginnings (what comes before
the *), but will have different suffixes or endings, and it will match with
words or tokens in your table that has an * at the end. The * will be
applied to all search arguments.
so a search on "al anon*" will match with al anon, alcoholics anonymous,
allan anont, but not and mike anon*.
If you search without the double quotation marks and you have multiple
search arguments
ie
select *from authors where contains(*,'al anon*')
your query will bomb with
Server: Msg 7631, Level 15, State 1, Line 1
Syntax error occurred near 'anon*'. Expected ''''' in search condition 'al
anon*'.
If you query on a single word or token with no double quotation marks you
will get not wildcarding, but will get an exact match to the anon, and
anon*.
If you query on a single word or token and wrap it in double quotation marks
you will get wildcarding and it will match with anon and anon*.
So for a single search word or token in your search phrase you need to wrap
your search phrase in double quotes to get wildcarding.
For multiple search arguments, words or tokens in your search phrase you
need to wrap your search phrase in double quotes to get wildcarding, but all
search arguments will be wildcarded and you will get matches to tokens or
words in your table ending with the * and matching the search argument
exactly.
In FreeText the * will be ignored, but will be return hits to matches to the
search arguments which are stemmed or unstemmed, and end in an * and don't
end with the *.
--
Hilary Cotter
Looking for a book on SQL Server replication?
http://www.nwsu.com/0974973602.html
"Johnny Nielsen" <DONT.WRITE....@megabit.dk> wrote in message
news:unzSUC6d...@tk2msftngp13.phx.gbl...
So, just to be sure I understand you correctly - if you perform a SELECT *
FROM SOMETABLE WHERE CONTAINS(*,'test*') you will only match words
containing the five characters 'test*' and not the four-character word
'test' ?
On my system CONTAINS(*,'test*') and CONTAINS(*,' "test*" ') seems to return
the same results (it treats the asterisk in 'test*' as a wildcard which is
not what I want).
> The behavior changes if you have multiple search arguements in your search
phrase.
Can the behavior of the single-quoted simple term 'test*' change by joining
it to other arguments ?
I know the meaning of double-quotes in prefix terms, but I'm only interested
in the single-quoted non-prefix cases (I'm writing a search-condition parser
which is why the specific workings are important to me).
Best regards
Johnny Nielsen
With Select * from tablename where contains(*,'"test*"') I will get hits to
test, test*, and tested as it is now wildcarding
You can't really change the behavior of the wildcarding by joining it to
other clauses, unless perhaps you do a join to a like that looks like this
like '% test %'
--
Hilary Cotter
Looking for a book on SQL Server replication?
http://www.nwsu.com/0974973602.html
"Johnny Nielsen" <DONT.WRITE....@megabit.dk> wrote in message
news:OXHRNu$dEHA...@TK2MSFTNGP10.phx.gbl...
I'm sorry Hillary, but now I'm really confused. According to the MS
documentation that you seem to agree upon the second asterisk in
CONTAINS(*,'test*') isn't acting as a wildcard but is a part of the word
searched for.
Then how come it will match the word test without an appended asterisk ?
That would be similar to CONTAINS(*,'server') matching the word serve
without the appended 'r'.
> With Select * from tablename where contains(*,'"test*"') I will get hits
to
> test, test*, and tested as it is now wildcarding
I agree and I have now problem understanding the functionality of the
double-quotes - they seem to work as expected :-)
It is only the functionality of the single-quoted cases that troubles me.
> You can't really change the behavior of the wildcarding by joining it to
> other clauses, unless perhaps you do a join to a like that looks like this
>
> like '% test %'
OK, I just tried to find out what you meant by "the behavior changes if you
have multiple search arguements in your search phrase" .. you didn't mean
that the behavior of the single-quoted term then.
PS: This has nothing to do with the current subject, but do you happen to
know the valid range of full-text searchable unicode characters ? I can't
seem to find a clear definition in the MS documentation .. not even a
definition of 'punctuation' symbols.
Best regards
Johnny Nielsen
So test and test* are both indexed as test. So a search on test will return
hits to test and test*.
All text (even ascii) is indexed as its unicode equivalent. Its punctuation
marks which are treated as white space and you get this problem with.
--
Hilary Cotter
Looking for a book on SQL Server replication?
http://www.nwsu.com/0974973602.html
"Johnny Nielsen" <DONT.WRITE....@megabit.dk> wrote in message
news:u5Jc8EBe...@TK2MSFTNGP09.phx.gbl...
..but as I quoted from the MS online books the asterisk is *not* treated as
white space when used inside single-quotes - that's my point. To repeat the
MS statement from the online books, that I originally questioned: [If the
text and asterisk are not delimited by double quotation marks, as in
CONTAINS (column, 'text*'), full-text search considers the asterisk as a
character and will search for exact matches to text*.]
> So test and test* are both indexed as test. So a search on test will
return
> hits to test and test*.
Well, according to MS the asterisk in single-quotes is treated as a
character and therefore test and test* should be considered two different
words. That is, contains(*,'test*') should never match the word test.
> All text (even ascii) is indexed as its unicode equivalent. Its
punctuation
> marks which are treated as white space and you get this problem with.
Sadly this is rather an additional problem. Building a parser is hard job
when the precise definition of what is considered punctuation symbols and
indexable characters isn't available to the public :-(
In other words a search on test will return the same hits on a search on
test*.
If you wrap the phrase with double quotes you will get wildcarding, but it
will still match with test*.
Think of it this way. While indexing when MSSearch sees non white space, non
alphanumeric characters it strips them out and indexes the alphanumeric
characters.
So test* is indexed as test and is indistinguishable from test.
When you search on test* the * is stripped off and the search is performed
on test.
Confusing I know.
HTH
--
Hilary Cotter
Looking for a book on SQL Server replication?
http://www.nwsu.com/0974973602.html
"Johnny Nielsen" <DONT.WRITE....@megabit.dk> wrote in message
news:%236Y8P9K...@tk2msftngp13.phx.gbl...
Having my observations confirmed I now know how not to proceed :-)
I really should stop trusting those MS examples...
Best regards
Johnny*Nielsen
"Hilary Cotter" <hil...@att.net> wrote in message
news:O$0froMeE...@TK2MSFTNGP12.phx.gbl...
Here is another stupid SQL FTS trick
try this
select * from tablename where contains(*,'#test*$')
or
select * from tablename where contains(*,'$test*$')
or
select * from tablename where contains(*,'"test*"')
Whatever delimiter you use it still works, even if you mix the delimiter
types.
--
Hilary Cotter
Looking for a book on SQL Server replication?
http://www.nwsu.com/0974973602.html
"Johnny Nielsen" <DONT.WRITE....@megabit.dk> wrote in message
news:uqCdXGNe...@TK2MSFTNGP10.phx.gbl...
Johnny, what is the OS platform that your SQL Server (select @@version) is
installed on?
Regards,
John
"Hilary Cotter" <hil...@att.net> wrote in message
news:O$0froMeE...@TK2MSFTNGP12.phx.gbl...
In the absense of proper ducumentation I have decided to use the .NET
frameworks char.IsLetterOrDigit to determine if a character is searchable.
But having found a '$' in the noise file I'm not sure if this is the right
approach...
Best regards
Johnny Nielsen
"John Kane" <jt-...@comcast.net> wrote in message
news:OHtNUSQe...@TK2MSFTNGP09.phx.gbl...
I have just discovered that '|' and '&' in a CONTAINS clause is treated as
OR and AND operators (undocumented of cause). And if you think that was to
be expected, then guess what '~' does ... no, it isn't the NOT operator ;-)
The fact that the Books Online isn't to be taken too seriously raises some
thoughts as to how well protected all those SQL/MSSearch based applications
out there really is when it comes to SQL-injection attacks or the developers
imagination of how things are working...
Best regards
Johnny Nielsen
"Hilary Cotter" <hil...@att.net> wrote in message
news:%23RRH1VP...@TK2MSFTNGP10.phx.gbl...
The '!' operator in CONTAINS(*,'!something') will result in a syntax error
if the word 'something' exists. I'm not sure what '!' does excatly but it
probably has some use.
Best regards
Johnny Nielsen
"Johnny Nielsen" <DONT.WRITE....@megabit.dk> wrote in message
news:%23wo4Efa...@TK2MSFTNGP11.phx.gbl...
--
Hilary Cotter
Looking for a book on SQL Server replication?
http://www.nwsu.com/0974973602.html
"Johnny Nielsen" <DONT.WRITE....@megabit.dk> wrote in message
news:e7aqh5ae...@TK2MSFTNGP11.phx.gbl...
Regards,
John
"Johnny Nielsen" <DONT_WRITE_...@megabit.dk> wrote in message
news:#F121uTe...@TK2MSFTNGP10.phx.gbl...
What DOES the '!' do ? (or maybe - if the operators is that well known - you
could tell me, which operators the CONTAINS clause supports beside the few
mentioned in the Books Online ?)
Best regards
Johnny Nielsen
"Hilary Cotter" <hil...@att.net> wrote in message
news:%237dQRxc...@tk2msftngp13.phx.gbl...
You can use ! in conjunction with AND or & (not that NOT, OR, and AND are
not case sensitive), but not | or OR.
check out the following links for more info
http://members.microsoft.com/partner/help/searchtips.aspx
discussion of ~ as a symbol for NEAR
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/acdata/ac_8_qd_15_0kdw.asp
--
Hilary Cotter
Looking for a book on SQL Server replication?
http://www.nwsu.com/0974973602.html
"Johnny Nielsen" <DONT_WRITE_...@megabit.dk> wrote in message
news:u9zG69ee...@TK2MSFTNGP09.phx.gbl...
The char.IsLetterOrDigit seems to do a fair job, but it isn't bulletprof. An
example: the searchable unicode Currency Sign 'ยค' = U+00A4 (decimal 164)
evaluates to false by char.IsLetterOrDigit.
Best regards
Johnny Nielsen
"John Kane" <jt-...@comcast.net> wrote in message
news:u2%239Ywce...@TK2MSFTNGP10.phx.gbl...