Text search using regular expressions in Silverlight document viewer


Support

Sep 20, 2012, 6:12:36 PM
to silv...@googlegroups.com
Q:
 
I need to do a regular expression search in your Silverlight document viewer. Your Javadoc indicates that you support regular expressions:

 

http://www.pdftron.com/pdfnet/mobile/Javadoc/index.html?pdftron/PDF/TextSearch.html

 

But when I look at the .NET documentation for TextSearch, there’s no RegularExpression search mode:

 

http://www.pdftron.com/silverdox/documentation/html/921e4239-28bf-0352-f131-fcc39413f520.htm

 

Does the Silverlight client not support regular expressions? If it does, is there any documentation on the supported syntax? I’ve tried some simple searches, e.g. (foo|bar), and they don’t seem to work.
 
------------
A:
 

You are correct that SilverDox does not directly support regular expressions; however, you can implement this functionality on your end. You can use the TextSelector class to programmatically select an entire page at a time (this selection will not be shown on screen), which returns the page text and its bounding boxes. You could then search that text with a regular expression library and combine its matches with the TextSelector results to select the appropriate characters.
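
To make the idea concrete, here is a minimal sketch in Python of lining regex matches up with bounding boxes. It is not SilverDox code: the `runs` shape (a list of text/box pairs in reading order) and the name `boxes_for_matches` are assumptions made for illustration, and the real TextSelector results may be structured differently.

```python
import re

def boxes_for_matches(runs, pattern):
    """Return (matched_text, start, end, boxes) for every regex match.

    `runs` is a HYPOTHETICAL list of (text, box) pairs in reading order,
    standing in for whatever a whole-page selection returns.
    """
    # Rebuild the page text and remember which character range each run covers.
    page_text = "".join(text for text, _ in runs)
    spans = []
    offset = 0
    for text, box in runs:
        spans.append((offset, offset + len(text), box))
        offset += len(text)

    hits = []
    for m in re.finditer(pattern, page_text):
        # Keep every run whose character range overlaps the match.
        boxes = [box for start, end, box in spans
                 if start < m.end() and end > m.start()]
        hits.append((m.group(0), m.start(), m.end(), boxes))
    return hits

runs = [("foo ", (0, 0, 30, 10)), ("bar", (30, 0, 60, 10)), (" baz", (60, 0, 90, 10))]
for hit in boxes_for_matches(runs, r"(foo|bar)"):
    print(hit)
```

Note that this only narrows a match down to the runs it touches. If a run covers several words, its box is wider than the match itself.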

 
 

James

Sep 20, 2012, 8:37:13 PM
to silv...@googlegroups.com
Q: There are SelectByStruct and SelectByRect. Which one do I use, and how do I get the start and end points for the entire page?

---------------------------------

A: Either would work, but SelectByRect might be slightly faster. The rectangle runs from (0,0), the top left of the page, to (width, height), the bottom right of the page. The page dimensions can be determined using the Document class' Pages property.

James

Sep 20, 2012, 8:39:11 PM
to silv...@googlegroups.com
Q: 

1)  When I select a whole page, does it return the text quads for all the words on a page?

2)  How would I highlight just a portion of a word?

3)  Also, when I find matching text in the string provided by the text selection, how do I find its associated quad in the Quads collection?

----------------------

A: 

1) Not necessarily one quad per word. Quads are merged automatically, so a single quad may cover part of a word, a whole word, or multiple words.

2) As explained in answer 3 below, you would search again in SilverDox using the text your regex search matched. When the match is only a portion of a word, SilverDox returns a quad that covers just that portion.

3) Suppose your regular expression search for "fo." returns "foo". Take note of where that substring sits in the page text, for example characters 257-259. Then search the same text again, this time with a plain substring search for the text the regex returned (i.e. "foo") rather than with the regex. If the substring search reports the same character positions as the regex match, it is a true match. If the positions differ, that occurrence is a false positive: the corresponding result from the SilverDox search should not be highlighted, and you search for the substring again to move on to the next occurrence. Once you have repeated this as many times as necessary, search for the collection of substrings found by the regex using the SilverDox TextSearch class, and skip highlighting any results that were detected as false positives.
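
The false-positive check described above can be sketched as follows. This is Python rather than Silverlight code and only illustrates the bookkeeping. The names `plain_occurrences` and `classify_hits` are made up for this example, and it assumes the plain search is case-sensitive and reports occurrences in the same order as a left-to-right scan of the page text, which may not hold for SilverDox.

```python
import re

def plain_occurrences(text, needle):
    """Start offsets of every occurrence of `needle`, scanning left to right."""
    starts = []
    pos = text.find(needle)
    while pos != -1:
        starts.append(pos)
        pos = text.find(needle, pos + 1)
    return starts

def classify_hits(text, pattern):
    """For each substring the regex matched, list (start, is_true_match)
    for every plain-search occurrence of that substring."""
    # Record where the regex actually matched each distinct substring.
    regex_starts = {}
    for m in re.finditer(pattern, text):
        if m.group(0):  # ignore empty matches
            regex_starts.setdefault(m.group(0), set()).add(m.start())

    # A plain occurrence is a true match only if the regex matched there too.
    return {
        needle: [(s, s in starts) for s in plain_occurrences(text, needle)]
        for needle, starts in regex_starts.items()
    }

print(classify_hits("food foo xfoo", r"\bfo."))
# {'foo': [(0, True), (5, True), (10, False)]}
```

In this example the third plain-search result for "foo" (inside "xfoo") is the false positive, so the third TextSearch result for "foo" would be the one to skip when highlighting.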