Hi Owen.
The built-in search provided by BaseX combines the text of one line with the text of the next line and then searches.
So if a line ends with the word "end." and the next line starts with "less", it will match the search criterion "endless".
This is a false positive match. There is not much we can do about it, as replacing the built-in search with a custom search would be slow.
Naval
Subject: Re: Fwd: False Positives
Date: Mon, 6 Mar 2023 21:38:43 +0530
From: Sudarshana <sudar...@epicomm.net>
To: Naval Sarda <nsa...@epicomm.net>, jite...@epicomm.net
Owen,
This is a known issue that we informed you about.
In full-text search, a match that spans a whitespace
character (tab, space, or newline) still shows up in
the results.
In the file APQC.xml, Board of Governors of the
Federal Reserve System is one organization
and Bombardier Aerospace Inc. is the next,
adjacent organization.
So in "Board of Governors of the Federal Reserve System Bombardier Aerospace Inc.", the highlighted keyword is treated as "tembom" (the end of "System" joined to the start of "Bombardier").
That is why those files appear in the results.
-Sudarshana
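
To make the "tembom" case concrete, here is a minimal sketch in Python. It does not use BaseX and does not claim to reproduce its internals; it only imitates the effect described above, where the text of two adjacent elements ends up being searched as one string. The element names and the XML fragment are hypothetical, since the real structure of APQC.xml is not shown in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the element names are assumptions, only the two
# organization names come from the message above.
SAMPLE = (
    "<Organizations>"
    "<Name>Board of Governors of the Federal Reserve System</Name>"
    "<Name>Bombardier Aerospace Inc.</Name>"
    "</Organizations>"
)

def joined_text(xml, separator):
    """Join the text of all <Name> elements using the given separator."""
    root = ET.fromstring(xml)
    return separator.join(el.text for el in root.iter("Name"))

def contains(text, keyword):
    """Case-insensitive substring test, a stand-in for the keyword search."""
    return keyword.lower() in text.lower()

no_gap = joined_text(SAMPLE, "")     # "...Reserve SystemBombardier Aerospace..."
with_gap = joined_text(SAMPLE, " ")  # "...Reserve System Bombardier Aerospace..."

print(contains(no_gap, "tembom"))    # True: "Sys[tem][Bom]bardier"
print(contains(with_gap, "tembom"))  # False: the space keeps the names apart
```

If the boundary between the two names is lost, "tembom" matches even though neither organization name contains it on its own.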
From: Owen Ambur <owen....@verizon.net>
Sent: Monday, March 6, 2023 6:35 AM
To: Naval Sarda <nsa...@epicomm.net>
Cc: aboutthe...@googlegroups.com <aboutthe...@googlegroups.com>
Subject: False Positives
Hi Owen,
You may want to check the full-text configuration capabilities at https://docs.basex.org/wiki/Full-Text, such as positional filters and fuzzy querying. It may be a bug, but I would rule out configuration first.
I can see that you are making good progress, and I love that you
have taken the BaseX option. I think that you are on the right
path.
I love to see the progress.
Kind regards.
By default, unless the language codes ja, ar, ko, th, or zh are specified, a tokenizer for Western texts is used: whitespaces are interpreted as token delimiters.
Since the logical flow of the text is not interrupted by the child elements, you will typically want to search across elements, so that the above paragraph would match a search for “real text”. For more examples, see XQuery and XPath Full Text 1.0 Use Cases.
To enable this kind of search, it is recommended to keep whitespace stripping turned off when importing XML documents. This can be done by ensuring that STRIPWS is disabled. This can also be done in the GUI if a new database is created (Database → New… → Parsing → Strip Whitespaces).
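
As a rough illustration of why whitespace stripping matters for searching across elements, the Python sketch below imitates the idea outside BaseX. Everything in it is an assumption made for illustration: the sample paragraph is hypothetical (the paragraph the documentation refers to is not reproduced here), `string_value` only approximates stripping by dropping whitespace-only text nodes, and `tokens` is a simplified stand-in for a Western tokenizer, not BaseX's own.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical paragraph with inline child elements.
PARAGRAPH = "<p>This is <b>real</b> <i>text</i>.</p>"

def string_value(xml, strip_ws):
    """Concatenate all text nodes, optionally dropping whitespace-only ones."""
    root = ET.fromstring(xml)
    parts = [t for t in root.itertext() if not (strip_ws and t.strip() == "")]
    return "".join(parts)

def tokens(text):
    """Simplified tokenizer: lower-case, split on non-word characters."""
    return re.findall(r"\w+", text.lower())

def contains_phrase(text, phrase):
    """True if the phrase's tokens occur consecutively in the text's tokens."""
    haystack, needle = tokens(text), tokens(phrase)
    return any(
        haystack[i:i + len(needle)] == needle
        for i in range(len(haystack) - len(needle) + 1)
    )

kept = string_value(PARAGRAPH, strip_ws=False)     # "This is real text."
stripped = string_value(PARAGRAPH, strip_ws=True)  # "This is realtext."

print(contains_phrase(kept, "real text"))      # True
print(contains_phrase(stripped, "real text"))  # False
```

With the whitespace kept, "real" and "text" remain separate tokens and the phrase is found across the two child elements. Once the whitespace-only node between them is dropped, the two words fuse into a single token, which is the same kind of boundary loss behind the false positives discussed in this thread.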