
Using Application.FileSearch to Search Full Text of Word Documents


David W. Fenton

May 22, 2009, 12:14:11 PM
The interface of the FileSearch dialog (to avoid opening it in the
Task Pane, in Access -- File | Open | Tools | Search) allows you to
search for text in files, but the programmatic interface doesn't
seem to have any properties for a search string that's looked for
inside the document instead of in the file name.

Is this functionality not exposed programmatically?

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/

Lars Brownies

May 22, 2009, 1:53:11 PM
This was recently posted by Arvin Meyer. You mean the .TextOrProperty
property.
Lars

Function SearchText()
    ' Search for text in a file
    ' Note: Must set a reference to Microsoft Office Object Library
    Dim q As Long
    With Application.FileSearch
        .NewSearch
        .LookIn = "D:\" 'Path
        .SearchSubFolders = True
        .FileName = "*.txt"
        .TextOrProperty = "Arvin"
        If .Execute() > 0 Then
            For q = 1 To .FoundFiles.Count
                Debug.Print .FoundFiles(q)
            Next
        End If
    End With
End Function

"David W. Fenton" <XXXu...@dfenton.com.invalid> wrote in message
news:Xns9C137C7A755C4f9...@74.209.136.97...

Hans Up

May 22, 2009, 2:37:44 PM
Lars Brownies wrote:
> This was recently posted by Arvin Meyer. You mean the .TextOrProperty
> property.
> Lars
>
> Function SearchText()
> ' Search for text in a file
> ' Note: Must set a reference to Microsoft Office Object Library

Why? My attempt works without a reference, and I can't see anything in
your version which requires it. Here's mine:

Public Sub fSearch()
    Dim i As Integer
    With Application.FileSearch
        .NewSearch
        .LookIn = "D:\Access\wip"
        .TextOrProperty = "strFolder"
        .MatchTextExactly = True
        .FileType = 1 'msoFileTypeAllFiles
        If .Execute > 0 Then
            Debug.Print "FoundFiles.Count: " & .FoundFiles.Count
            For i = 1 To .FoundFiles.Count
                Debug.Print .FoundFiles(i)
            Next i
        End If
    End With
    'expected output:
    'FoundFiles.Count: 1
    'D:\Access\wip\uiLib.bas
End Sub

David W. Fenton

May 22, 2009, 7:27:15 PM
"David W. Fenton" <XXXu...@dfenton.com.invalid> wrote in
news:Xns9C137C7A755C4f9...@74.209.136.97:

> The interface of the FileSearch dialog (to avoid opening it in the
> Task Pane, in Access -- File | Open | Tools | Search) allows you
> to search for text in files, but the programmatic interface
> doesn't seem to have any properties for a search string that's
> looked for inside the document instead of in the file name.
>
> Is this functionality not exposed programmatically?

OK, I misinterpreted the example code (by not reading it very
carefully). I've already mocked up my proof of concept and it was
very, very easy. I'll spend more time on UI than I will on
controlling the .FileSearch itself.

I should have tried it before posting!

David W. Fenton

May 22, 2009, 7:28:08 PM
"Lars Brownies" <La...@Brownies.com> wrote in
news:gv6oq5$26m6$1...@textnews.wanadoo.nl:

> ' Note: Must set a reference to Microsoft Office Object Library

Why? The FileSearch object is a member of the Access Application
object.

Lars Brownies

May 23, 2009, 2:00:54 AM
FYI: filesearch functionality has been removed in Access 2007.
Lars

"David W. Fenton" <XXXu...@dfenton.com.invalid> wrote in message

news:Xns9C13C5E684337f9...@74.209.136.98...

David W. Fenton

May 23, 2009, 11:59:17 AM
"Lars Brownies" <La...@Brownies.com> wrote in
news:gv83ej$2mc8$1...@textnews.wanadoo.nl:

> FYI: filesearch functionality has been removed in Access 2007.

!!!!

I always hated the UI they provided, but as programmatic
functionality, it works pretty well.

Is it provided through some other library distributed with Office?

Lars Brownies

May 23, 2009, 12:57:15 PM
> Is it provided through some other library distributed with Office?

I don't know. You might want to check out:
http://www.codematic.net/excel-tools/office-2007-filesearch.htm
which is a VBA-replacement for filesearch, also for Access (not free).

But a quick check shows that for instance the TextOrProperty property is not
supported.

Lars

"David W. Fenton" <XXXu...@dfenton.com.invalid> wrote in message

news:Xns9C1479F3E9681f9...@74.209.136.93...

David W. Fenton

May 23, 2009, 12:59:27 PM
"Lars Brownies" <La...@Brownies.com> wrote in
news:gv83ej$2mc8$1...@textnews.wanadoo.nl:

> FYI: filesearch functionality has been removed in Access 2007.

OK, based on trolling the Knowledge Base, I see that they say to use
the FileSystemObject of the Windows Scripting Host instead.

But I see no way to search the content of files with that -- all the
examples I see show searching for file names, and the Object Browser
does not seem to indicate that the FSO has any properties for
defining a search string that is used for anything but filenames.

It seems that for searching text, they want you to use Windows
Desktop Search. I've tried installing it on my WinXP laptop and it
never works, so I'm rather worried about depending on it for a
client.

Likewise, the client is a small office with three desktops and a
fourth desktop repurposed as a peer-to-peer server, so this would
mean indexing the relevant files on all three workstations, which I
foresee can lead to all sorts of terrible problems (what if the
index isn't updated on one workstation for some reason -- searches
would then give different results). I'm also concerned with the
issue of indexing across the network (the files to be searched are
stored on the server), and with what kind of UI the Desktop Search
presents to the workstation user -- if it can't be hidden, then
users might be tempted to use it, and I'd prefer to have it indexing
only the data used by their Access app (to ensure that it doesn't
bog down the machine, and doesn't waste time indexing files that
aren't crucial to their database app with the result that the
important files don't get into the index as fast as possible).

I guess it's theoretically possible to upgrade that peer-to-peer
server to Windows 2003 Server and then install the server version.

[time passes while I research]

Well, I can't get Windows Desktop Search to index anything on my
computer for some reason. It installs OK (I figured out that the way
to get rid of the MAPI32 error is to not have it index Outlook
Express or Outlook, neither of which I use for email -- this is
progress over the last time I tried installing it), but it says it's
done indexing and hasn't indexed any documents (even though I
pointed it at two folders that have bazillions of documents in them,
a mix of encrypted and unencrypted documents). I stopped and
restarted the indexing service and nothing happens.

Meanwhile, assuming I could eventually get it working, I was trying
to figure out how to script the thing. I found this early on, for
version 3 (the current version is 4):

http://www.microsoft.com/technet/scriptcenter/topics/desktop/wdsearch.mspx#ETIAC

But after reading the code, it became quite clear that this example
demonstrates only how to search file system metadata (e.g.,
date/time, extended file properties, etc.) and not full text.

I also got to the developer page for Desktop Search 4:

http://www.microsoft.com/windows/products/winfamily/desktopsearch/choose/windowssearch4/developers.mspx

but it's not clear to me whether this SDK:

http://www.microsoft.com/downloads/details.aspx?familyid=645300ae-5e7a-4ce7-95f0-49793f8f76e8&displaylang=en

will be sufficient for making it work. Browsing the code, they don't
include VB-family samples any more, but from just looking at it, it
becomes obvious that this is just the same thing as what I found in
the version 3 scripting examples -- i.e., it doesn't allow you to
search full text, only file metadata.

Am I hosed here for the future? Should I look for some non-MS
technology instead?

David W. Fenton

May 23, 2009, 1:10:43 PM
"Lars Brownies" <La...@Brownies.com> wrote in
news:gv83ej$2mc8$1...@textnews.wanadoo.nl:

> FYI: filesearch functionality has been removed in Access 2007.

Another question:

How is full-text searching provided for in Office 2007? Do you have
to install Windows Desktop Search?

David W. Fenton

May 23, 2009, 7:03:23 PM
"Lars Brownies" <La...@Brownies.com> wrote in
news:gv99t8$1t1$1...@textnews.wanadoo.nl:

> I don't know. You might want to check out:
> http://www.codematic.net/excel-tools/office-2007-filesearch.htm
> which is a VBA-replacement for filesearch, also for Access (not
> free).
>
> But a quick check shows that for instance the TextOrProperty
> property is not supported.

It certainly doesn't look like it searches the text of the files,
but then, I didn't think the FileSearch object did either when I
first looked at its object methods/properties.

If it doesn't, why would anyone buy VBA code to do what the FSO
already does? It's not like it's hard to use the FSO with late
binding! And it would be very annoying to buy the code and find out
it uses the FSO!

Lars Brownies

May 24, 2009, 8:51:24 AM
> Am I hosed here for the future? Should I look for some non-MS
> technology instead?

In your subject you specifically mention Word documents. If you only need to
search that type of file, then you might want to try Word automation or
FSO, though I guess it would be too slow for your needs. Another option is to
do a binary read, though I haven't gotten it to work yet.

For examples on both, see http://www.vbforums.com/showthread.php?t=162686 .

Have you considered Google Desktop? It has APIs, also for indexing.

Lars

"David W. Fenton" <XXXu...@dfenton.com.invalid> wrote in message

news:Xns9C148426F4A1f9...@74.209.136.82...

David W. Fenton

May 24, 2009, 1:20:08 PM
"Lars Brownies" <La...@Brownies.com> wrote in
news:gvbfs7$qj2$1...@textnews.wanadoo.nl:

>> Am I hosed here for the future? Should I look for some non-MS
>> technology instead?
>
> In your subject you specifically mention Word Documents. If you
> only need to search those type of files then you might want to try
> Word automation or FSO,

Do you mean, for instance, opening each document with Word
automation and using Find within Word? Ack. Sounds awful. I'm not
sure how one would use FSO for the same thing because it doesn't do
binary, right? It would be a matter of opening the files with
traditional VBA File I/O functions and looking for the string,
right?

> though I guess it would be too slow for your needs. Other option
> is to do a binary read, though I didn't get it to work yet.
>
> For examples on both, see
> http://www.vbforums.com/showthread.php?t=162686 .

That has instructions for Word automation, which I'd also assume is
going to be way too slow (we're looking at a search of 2K-3K small
documents, approximately 1 or 2 pages each). A binary search should
be faster, but I hesitate to code that up myself -- I've always
found the VBA File I/O functions remarkably opaque (they are
dinosaurs from the days before the "object-oriented" structure of
VB(A), seems to me), and just don't feel comfortable with them.

I've basically reached the conclusion that Windows Desktop Search is
not automatable from Access, based on this:

http://tinyurl.com/o44jxh
(which is this badly-wrapped URL:
http://social.msdn.microsoft.com/forums/en-US/windowsdesktopsearchdevelopment/thread/edbf24e9-f4a1-448f-b7ce-5202faeaae68)

where the last post says:

you cannot add a reference to wdsquery.dll in Access VB

where "WDSQuery.DLL" is the library that is the interface to
controlling the querying engine for WDS's full-text index.

So, unless there's some other interface that I can't find a
reference to, MS doesn't appear to have made any provision in
Office 2007 to replace FileSearch's full-text searching
capabilities. This seems to me to be a terrible oversight on their
part.

> Have you considered Google Desktop? It has API's, also for
> indexing.

I've got the API page open in my browser but haven't looked into it.
Google Desktop has the same problems as standard Windows Desktop
Search, because it has to run on each workstation (so far as I
know). At least with WDS there's the possibility of running the
server version so the users aren't maintaining 3 copies of the
index. And Google Desktop is awfully dangerous, in my opinion, and
has a bloody awful and intrusive UI.

David W. Fenton

May 24, 2009, 4:17:37 PM
Well, I seem to have licked the problem using Dir() and File I/O.
Performance issues on the number of documents I've tested it on are
very little different from using the FileSearch object. Searching
747 Word documents totalling just over 100 MB, these are some
results:

FileSearch (Brahms): 14 (00:19)
IO (Brahms): 14 (00:22)

That's searching for "Brahms", returning 14 matching documents, and
taking 19 and 22 seconds respectively. I've run the tests several
times and as long as I let Access have the whole CPU (and don't flip
over to a different program or start up a new web browser instance),
the timings remain pretty uniform. The FileSearch is always slightly
faster, but not enough, it seems to me, to make it problematic.

In reality, the number of searched documents is likely to be 4 times
what I tested with, but the files will on average be smaller. In
other words, the search of each file will take less time, but there
will be more file open/close operations. Even if it is strictly
linear compared to this, I don't feel that searching 3000 documents
is going to be a problem for the end users even if it takes almost
two minutes. I'll have to test the interactive file search on the
client's PCs (which is what they are doing now, searching manually)
to see how long it takes, but I suspect the ability to automate the
matching of the results with the data in their database will be
sufficient to overcome any resistance that might come from having to
wait two minutes.

I can easily do all sorts of things to make it less boring, such as
populating a listbox on a form with each match and letting DoEvents
show it to the end user.

Thanks to Chuck Grimsby for his text file class
(http://www.mvps.org/access/modules/mdl0057.htm), which helped me
pin down some of the issues I needed to deal with (though my results
bear no resemblance to his elaborate class module).

Sample code of my File I/O-based solution follows my signature (may
include some errors due to generalizing it for posting).

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/

Public Function SearchFileForText(strFileName As String, _
    strSearchText As String) As Boolean
On Error GoTo errHandler
    Dim intFile As Integer
    Dim strFileContent As String

    intFile = FreeFile
    Open strFileName For Binary As #intFile
    ' Size the buffer to the file length so Get reads the whole file
    strFileContent = String(LOF(intFile), " ")
    Get #intFile, , strFileContent
    ' InStr returns a position; compare with 0 to get a Boolean
    SearchFileForText = (InStr(strFileContent, strSearchText) > 0)

exitRoutine:
    Close #intFile
    Exit Function

errHandler:
    MsgBox Err.Number & ": " & Err.Description, vbExclamation, _
        "Error in SearchFileForText()"
    Resume exitRoutine
End Function

Public Function FileSearch(strSearchFolder As String, _
    strFileSpec As String, strSearchFor As String) As String
On Error GoTo errHandler
    Dim dteStart As Date
    Dim strFileName As String
    Dim intRowCount As Integer
    Dim strFileList As String

    dteStart = Now()
    If Len(strSearchFolder) = 0 Then Exit Function
    strFileName = Dir(strSearchFolder & "\" & strFileSpec)
    If Len(strFileName) = 0 Then
        MsgBox "No documents in the selected folder!"
        Exit Function
    End If
    ' Dir with no arguments returns the next match, or "" when done
    Do Until Len(strFileName) = 0
        If SearchFileForText(strSearchFolder & "\" & strFileName, _
            strSearchFor) Then
            intRowCount = intRowCount + 1
            strFileList = strFileList & ", " & strFileName
        End If
        'Debug.Print "Dir: " & strFileName
        strFileName = Dir
    Loop
    ' Strip the leading ", " from the list
    FileSearch = Mid(strFileList, 3)

exitRoutine:
    Debug.Print "IO (" & strSearchFor & "): " & intRowCount _
        & " (" & Format(Now() - dteStart, "nn:ss") & ")"
    Exit Function

errHandler:
    MsgBox Err.Number & ": " & Err.Description, vbExclamation, _
        "Error in FileSearch()"
    Resume exitRoutine
End Function

David W. Fenton

May 24, 2009, 5:02:57 PM
OK, I just realized it's pretty easy to convert what I did into a
class module. Here's the code I was using with the FileSearch
object:

With Application.FileSearch
    .NewSearch
    .LookIn = Me!txtSearchFolder
    .FileName = "*.doc"
    .TextOrProperty = Me!txtSearchString
    If .Execute() > 0 Then
        For i = 1 To .FoundFiles.Count
            strFileName = Dir(.FoundFiles(i), vbDirectory)
            If Right(strFileName, 3) = "doc" Then
                intRowCount = intRowCount + 1
                strRowSource = strRowSource & "; " & strFileName
                'Debug.Print "FileSearch: " & strFileName
            End If
        Next i
        Me!lstFilesFound.RowSource = Mid(strRowSource, 3)
        Me!lstFilesFound.Visible = True
    Else
        MsgBox "There were no matches."
    End If
End With

The class module posted after my signature can pretty easily replace
the above:

Dim clsFileSearch As New clFileSearch

With clsFileSearch
    .NewSearch
    .LookIn = Me!txtSearchFolder
    .FileName = "*.doc"
    .TextOrProperty = Me!txtSearchString
    If .Execute() > 0 Then
        For i = 1 To .FoundFiles.Count
            strFileName = Dir(.FoundFiles(i), vbDirectory)
            If Right(strFileName, 3) = "doc" Then
                intRowCount = intRowCount + 1
                strRowSource = strRowSource & "; " & strFileName
                'Debug.Print "FileSearch: " & strFileName
            End If
        Next i
        Me!lstFilesFound.RowSource = Mid(strRowSource, 3)
        Me!lstFilesFound.Visible = True
    Else
        MsgBox "There were no matches."
    End If
End With

Now, I didn't implement any features of the FileSearch object that
I'm not using, but that pretty much replaces it all simply by
replacing the WITH definition.

The class is after my signature. Any suggestions and improvements
gratefully received.

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/

' clFileSearch
' May 24, 2009
' Created by David W. Fenton
' David Fenton Associates
' http://dfenton.com/DFA/

Option Compare Database
Option Explicit

Dim strLookIn As String
Dim strFileName As String
Dim strTextOrProperty As String
Dim mcolFoundFiles As Collection

Public Sub NewSearch()
    strLookIn = vbNullString
    strFileName = vbNullString
    strTextOrProperty = vbNullString
    Set mcolFoundFiles = New Collection
End Sub

Public Property Let LookIn(pstrLookin As String)
    strLookIn = pstrLookin
End Property

Public Property Get LookIn() As String
    LookIn = strLookIn
End Property

Public Property Let FileName(pstrFileName As String)
    strFileName = pstrFileName
End Property

Public Property Get FileName() As String
    FileName = strFileName
End Property

Public Property Let TextOrProperty(pstrTextOrProperty As String)
    strTextOrProperty = pstrTextOrProperty
End Property

Public Property Get TextOrProperty() As String
    TextOrProperty = strTextOrProperty
End Property

Public Property Get FoundFiles() As Collection
    Set FoundFiles = mcolFoundFiles
End Property

Private Function SearchFileForText(strFileName As String, _
    strSearchText As String) As Boolean
On Error GoTo errHandler
    Dim intFile As Integer
    Dim strFileContent As String

    intFile = FreeFile
    Open strFileName For Binary As #intFile
    ' Size the buffer to the file length so Get reads the whole file
    strFileContent = String(LOF(intFile), " ")
    Get #intFile, , strFileContent
    ' InStr returns a position; compare with 0 to get a Boolean
    SearchFileForText = (InStr(strFileContent, strSearchText) > 0)

exitRoutine:
    Close #intFile
    Exit Function

errHandler:
    MsgBox Err.Number & ": " & Err.Description, vbExclamation, _
        "Error in clFileSearch.SearchFileForText()"
    Resume exitRoutine
End Function

Public Function Execute() As Integer
On Error GoTo errHandler
    Dim strFileMatch As String

    strFileMatch = Dir(strLookIn & "\" & strFileName)
    If Len(strFileMatch) = 0 Then
        Exit Function
    End If
    ' Loop on the current match (strFileMatch), not the file spec
    Do Until Len(strFileMatch) = 0
        If SearchFileForText(strLookIn & "\" & strFileMatch, _
            strTextOrProperty) Then
            ' Full path is used as both the item and its key
            mcolFoundFiles.Add strLookIn & "\" & strFileMatch, _
                strLookIn & "\" & strFileMatch
        End If
        strFileMatch = Dir
    Loop
    Execute = mcolFoundFiles.Count

exitRoutine:
    Exit Function

errHandler:
    MsgBox Err.Number & ": " & Err.Description, vbExclamation, _
        "Error in clFileSearch.Execute()"
    Resume exitRoutine
End Function

Private Sub Class_Initialize()
    Set mcolFoundFiles = New Collection
End Sub

Lars Brownies

May 24, 2009, 6:35:29 PM
I tested it and it works well.

I noticed that
Dim i As Integer
Dim strFileName As String
Dim intRowCount As Integer
Dim strRowSource As String
were missing from the code where you call clsFileSearch.

The
.FileName = "*.doc"
and
If Right(strFileName, 3) = "doc" Then
are not congruent. If you'd change .Filename to "*.xls", the code wouldn't
work. But I guess this is just an example.

Searching in underlying folders would be a nice option.
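
One way subfolder searching could be added on top of Dir() is sketched below. It is only an illustration, not the code from the class module: the name ListFilesRecursive is hypothetical, and it assumes the folder path is passed without a trailing backslash. Because Dir() keeps a single global search state, the sketch collects the subfolder names first and recurses only after that loop has finished.

```vba
' Hypothetical helper (not part of clFileSearch): adds the full path of
' every file under strFolder that matches strFileSpec to colFiles.
' Assumes strFolder has no trailing backslash.
Public Sub ListFilesRecursive(ByVal strFolder As String, _
    ByVal strFileSpec As String, colFiles As Collection)
    Dim strName As String
    Dim colSubFolders As New Collection
    Dim varFolder As Variant

    ' Pass 1: files in this folder that match the file spec
    strName = Dir(strFolder & "\" & strFileSpec)
    Do Until Len(strName) = 0
        colFiles.Add strFolder & "\" & strName
        strName = Dir
    Loop

    ' Pass 2: collect subfolder names. Dir() is not re-entrant, so the
    ' recursion has to wait until this loop is done.
    strName = Dir(strFolder & "\*", vbDirectory)
    Do Until Len(strName) = 0
        If strName <> "." And strName <> ".." Then
            If (GetAttr(strFolder & "\" & strName) And vbDirectory) _
                = vbDirectory Then
                colSubFolders.Add strFolder & "\" & strName
            End If
        End If
        strName = Dir
    Loop

    ' Pass 3: recurse into each subfolder
    For Each varFolder In colSubFolders
        ListFilesRecursive CStr(varFolder), strFileSpec, colFiles
    Next varFolder
End Sub
```

Each path collected this way could then be handed to a content check such as SearchFileForText. GetAttr can raise an error on files that are locked by the system, so real code would probably want an error handler around it.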

Also, it would be nice if the user could choose between searching for text
in the filename or for the actual text in the file. Since the filename is
always somewhere in the binary text, a user would get a hit, thinking it's
in the actual text but in fact it would be the text in the filename. But
you'd have to hassle with the binary string, which may not be very stable.

Lars

"David W. Fenton" <XXXu...@dfenton.com.invalid> wrote in message

news:Xns9C15AD6F4FF11f9...@74.209.136.91...

David W. Fenton

May 25, 2009, 2:33:14 PM
"Lars Brownies" <La...@Brownies.com> wrote in
news:gvci3i$16cg$1...@textnews.wanadoo.nl:

> I tested it and it works well.
>
> I noticed that
> Dim i As Integer
> Dim strFileName As String
> Dim intRowCount As Integer
> Dim strRowSource As String
> were missing from the code where you call clsFileSearch.

Sure -- I was only giving the minimal example without the whole
code. You'll note that the listbox I was populating was absent, too!

> The
> .FileName = "*.doc"
> and
> If Right(strFileName, 3) = "doc" Then
> are not congruent. If you'd change .Filename to "*.xls", the code
> wouldn't work. But I guess this is just an example.

You're right. It's a case of holdovers from the original code, which
with the FileSearch object was originally attempting to use the
FileType property. I'm currently rewriting the class module to
support more of the properties/methods of the FileSearch object --
it's my holiday weekend fun project!

> Searching in underlying folders would be a nice option.

Already

> Also, it would be nice if the user could choose between searching
> for text in the filename or for the actual text in the file. Since
> the filename is always somewhere in the binary text, a user would
> get a hit, thinking it's in the actual text but in fact it would
> be the text in the filename. But you'd have to hassle with the
> binary string, which may not be very stable.

I wouldn't even know where to start on that. Depending on the file
type, the file name is likely going to be in a different place.

Also, I'm not certain what happens with discarded text, such as in
the history of a Word document (i.e., as you edit, text is not
discarded, but marked as no longer needed), or in discarded data
pages in an MDB. I assume the FileSearch object is smart enough to
ignore all of that, but there's no way for me to do that so far as I
can see (unless I automate Office apps, which would seem to me to be
way too slow).
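
A related caveat with the binary read: Word's binary .doc format can store a run of text either as single-byte characters or as two-byte UTF-16, so a plain InStr on the raw content may miss text stored the second way. The sketch below is an assumption-laden illustration, not part of the posted class: ContainsAnsiOrUtf16 is a hypothetical name, and the comparison is case-sensitive (vbBinaryCompare), unlike the InStr calls in the posted code, which follow Option Compare Database.

```vba
' Hypothetical helper: True if strSearchText occurs in the raw file
' content either as single-byte text or as UTF-16LE text.
' Note: this comparison is case-sensitive.
Public Function ContainsAnsiOrUtf16(ByVal strFileContent As String, _
    ByVal strSearchText As String) As Boolean
    Dim strWide As String

    ' StrConv(..., vbUnicode) puts a Chr(0) after each character, which
    ' is how UTF-16LE text looks after being read with Get into a String
    strWide = StrConv(strSearchText, vbUnicode)

    ContainsAnsiOrUtf16 = _
        (InStr(1, strFileContent, strSearchText, vbBinaryCompare) > 0) _
        Or (InStr(1, strFileContent, strWide, vbBinaryCompare) > 0)
End Function
```

This only covers characters that fit in the ANSI code page, and it does nothing about discarded text or other non-visible content in the file.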

Lars Brownies

May 25, 2009, 4:18:09 PM
>> Also, it would be nice if the user could choose between searching
>> for text in the filename or for the actual text in the file. Since
>> the filename is always somewhere in the binary text, a user would
>> get a hit, thinking it's in the actual text but in fact it would
>> be the text in the filename. But you'd have to hassle with the
>> binary string, which may not be very stable.
>
> I wouldn't even know where to start on that. Depending on the file
> type, the file name is likely going to be in a different place.

I agree. I wouldn't know either. Thinking about it, it might be an idea to
offer the user 2 search options:
1. Search filenames only
2. Search filenames and text in files

Then, if the user chooses option 1 you don't have to open any file, and
search the filenames only, which will give a quicker result.

Lars

"David W. Fenton" <XXXu...@dfenton.com.invalid> wrote in message

news:Xns9C16940E252FEf9...@74.209.136.93...

David W. Fenton

May 26, 2009, 10:25:12 PM
I'm posting all of this here in the thread, but I think I'll repost the
whole thing in a new thread, in the hopes of attracting the attention
of those who've stopped paying attention to this particular thread.

"Lars Brownies" <La...@Brownies.com> wrote in

news:gveue0$21me$1...@textnews.wanadoo.nl:

>>> Also, it would be nice if the user could choose between
>>> searching for text in the filename or for the actual text in the
>>> file. Since the filename is always somewhere in the binary text,
>>> a user would get a hit, thinking it's in the actual text but in
>>> fact it would be the text in the filename. But you'd have to
>>> hassle with the binary string, which may not be very stable.
>>
>> I wouldn't even know where to start on that. Depending on the
>> file type, the file name is likely going to be in a different
>> place.
>
> I agree. I wouldn't know either. Thinking about it, it might be an
> idea to offer the user 2 search options:
> 1. Search filenames only
> 2. Search filenames and text in files
>
> Then, if the user chooses option 1 you don't have to open any
> file, and search the filenames only, which will give a quicker
> result.

I've already implemented it that way. It's very fast, of course, as
it's just the result of a Dir(). I haven't sorted the results,
though. I'd leave that up to someone using the class module (they'd
just have to take the .FoundFiles collection and output it to an
array and sort it).
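
The array-and-sort step could look something like the following. This is a minimal sketch, not code from the class module: CollectionToSortedArray is a hypothetical name, and it uses a simple insertion sort with a case-insensitive comparison, which should be adequate for a few thousand file names.

```vba
' Hypothetical helper: copies a Collection of strings into a 1-based
' Variant array and sorts it case-insensitively (insertion sort).
Public Function CollectionToSortedArray(colItems As Collection) As Variant
    Dim varItems() As Variant
    Dim varTemp As Variant
    Dim i As Long
    Dim j As Long

    If colItems.Count = 0 Then
        CollectionToSortedArray = Array()   ' empty array
        Exit Function
    End If

    ReDim varItems(1 To colItems.Count)
    For i = 1 To colItems.Count
        varItems(i) = colItems(i)
    Next i

    For i = 2 To UBound(varItems)
        varTemp = varItems(i)
        j = i - 1
        ' VBA does not short-circuit And, so the bounds check and the
        ' comparison are kept in separate statements
        Do While j >= 1
            If StrComp(varItems(j), varTemp, vbTextCompare) <= 0 Then Exit Do
            varItems(j + 1) = varItems(j)
            j = j - 1
        Loop
        varItems(j + 1) = varTemp
    Next i

    CollectionToSortedArray = varItems
End Function
```

A caller would pass the object's FoundFiles collection and loop over the returned array from LBound to UBound.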

Anyway, I've gone ahead and finished it up. It's intended as a
replacement for most of the functionality that is found in the
FileSearch object, but some of it I couldn't replicate, or didn't
feel had any value.

Here's the summary from the class module header:

' FileSearch Object Properties and Methods
' PROPERTIES
' Application : irrelevant
' Creator : implemented
' FileName : implemented
' FileType : implemented
' FileTypes : implemented
' FileTypeSpecify : added by DWF
' FoundFiles : implemented
' LastModified : implemented
' LastModifiedSpecify : added by DWF
' LastModifiedSpecifyEnd : added by DWF
' LastModifiedSpecifyStart : added by DWF
' LookIn : implemented
' MatchAllWordForms : not implemented
' MatchTextExactly : implemented
' PropertyTests : not implemented
' SearchFolders : not implemented
' SearchPropertiesOnly : added by DWF
' SearchScopes : not implemented
' SearchSubFolders : implemented
' SearchTextOnly : added by DWF
' TextOrProperty : implemented
'
' METHODS
' Execute : implemented -- extended by DWF
' NewSearch : implemented
' RefreshScopes : not implemented

I could not implement MatchAllWordForms, unfortunately, as that
depends on functionality that is not exposed anywhere that I know
of. I didn't feel that the SearchFolders and SearchScopes
collections were worth implementing, as they are aspects
of the FileSearch object as it exists that make it incredibly
complicated to work with. I also didn't implement the PropertyTests
collection as I couldn't see much utility in it. Because I didn't
implement the SearchScopes collection, there was no need to
implement the RefreshScopes method.

BTW, if you're wondering about the reference for the Office
FileSearch object, all you need to do is open the VBAOF11.CHM (or 10
or 12) help file and search for "filesearch." The reference is
pretty complete, and it's what I used to work things out.

I have more comments below, but if you're curious, you can download
it here:

http://dfenton.com/DFA/download/Access/FileSearch.zip

To work as designed, you also need to download the DSO OLE Document
Properties Reader:

http://support.microsoft.com/kb/224351 (version 2.1)

and register the DSOFile.dll. The class module should be able to be
used for plain text searches without needing to install and register
this DLL (I used late binding so that if it's not registered, it
doesn't cause fatal errors for non-property-based searches).

Comments:

I implemented two of the Office enumerations, for MsoLastModified
and for MsoFileType. Both raise significant issues.

MsoLastModified
===============
I don't know how to interpret the meaning of the constants, which
have these names:

msoLastModifiedAnyTime
msoLastModifiedLastMonth
msoLastModifiedLastWeek
msoLastModifiedThisMonth
msoLastModifiedThisWeek
msoLastModifiedToday
msoLastModifiedYesterday

(I also added msoLastModifiedSpecify since it was crucial to
implement additional functionality that I wanted to add)

AnyTime, Today and Yesterday are quite clear, of course. But it's
not clear to me how LastMonth, LastWeek, ThisMonth and ThisWeek
should be interpreted. I interpreted them respectively as:

msoLastModifiedLastMonth
: Between DateAdd("m", -2, Date) And DateAdd("m", -1, Date)
msoLastModifiedLastWeek
: Between DateAdd("ww", -2, Date) And DateAdd("ww", -1, Date)
msoLastModifiedThisMonth
: Between DateAdd("m", -1, Date) And Date()
msoLastModifiedThisWeek
: Between DateAdd("ww", -1, Date) And Date()

Another interpretation for the months would be that the current
month would be May, and the previous month would be April, and
likewise with week (current week beginning on Monday, May 25th, and
last week being the week beginning Monday, May 18th).
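
The calendar-based interpretation could be computed roughly as follows. This is a sketch of that alternative reading only, not the logic in the class module's LastModified Property Let: CalendarRange and its string period names are hypothetical, weeks are assumed to start on Monday, and the reference date is passed in so the routine can be checked against known dates.

```vba
' Hypothetical helper: returns the start and end dates (inclusive,
' date-only) for a calendar-based period relative to dteToday.
' strPeriod is one of "ThisMonth", "LastMonth", "ThisWeek", "LastWeek".
Public Sub CalendarRange(ByVal strPeriod As String, _
    ByVal dteToday As Date, ByRef dteStart As Date, ByRef dteEnd As Date)
    Dim dteWeekStart As Date

    ' Monday of the week containing dteToday
    dteWeekStart = dteToday - Weekday(dteToday, vbMonday) + 1

    Select Case strPeriod
        Case "ThisMonth"
            dteStart = DateSerial(Year(dteToday), Month(dteToday), 1)
            dteEnd = dteToday
        Case "LastMonth"
            ' DateSerial accepts month 0 and day 0 and rolls backwards
            dteStart = DateSerial(Year(dteToday), Month(dteToday) - 1, 1)
            dteEnd = DateSerial(Year(dteToday), Month(dteToday), 0)
        Case "ThisWeek"
            dteStart = dteWeekStart
            dteEnd = dteToday
        Case "LastWeek"
            dteStart = dteWeekStart - 7
            dteEnd = dteWeekStart - 1
    End Select
End Sub
```

With Tuesday, May 26, 2009 as the reference date, "LastWeek" runs from May 18 to May 24 and "LastMonth" from April 1 to April 30. File timestamps carry a time portion, so a comparison against them would normally use `< dteEnd + 1` rather than `<= dteEnd`.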

I don't know which makes more sense at all.

The logic behind it is in the class module's LastModified Property
Let and anyone could easily alter the definitions there to suit
their needs. MS hasn't implemented it consistently in any of the
places I looked, which included the WinXP search companion, Windows
Desktop Search and the Office FileSearch UI -- each one has
different choices and they aren't consistent with the choices
defined for the Office FileSearch object. I didn't test what the
results were, because I didn't have files with appropriate dates to
check on. I figured it's the kind of thing you'll either not use at
all (in which case it doesn't matter how I implement it), or it's
something you'll use but probably want to adjust the interpretation
for your own users. In other words, no matter what I implemented,
it's probably not going to get used in the exact form I did it if
it's used at all.

Anyway, in the sample database, I implemented a combo box of the
enumeration types, while also allowing the user to type in a
specific date. If you enter anything into the combo box other than
one of the predefined choices, the combo box's AfterUpdate event
assumes you intended to type a specific date, and checks to see
if it's valid. If not, it prompts you to enter a valid date. Once
the valid date is entered, the TO textbox is enabled so you can
enter a range.

MsoFileType
===========
This is an enumeration of all the MS Office file types. It is
documented here:

http://msdn.microsoft.com/es-es/library/microsoft.office.core.msofiletype(VS.80).aspx

(I don't know why I ended up with the Spanish version)

I used that as a guide, but didn't follow it exactly. I also added
the OpenXML document extensions (e.g., xlsx, docx, etc.). For anyone
who uses it, note that the .FileType property handles only the file
types defined in the enumeration, which are:

msoFileTypeAllFiles
msoFileTypeBinders
msoFileTypeCalendarItem
msoFileTypeContactItem
msoFileTypeDatabases
msoFileTypeDataConnectionFiles
msoFileTypeDesignerFiles
msoFileTypeDocumentImagingFiles
msoFileTypeExcelWorkbooks
msoFileTypeJournalItem
msoFileTypeMailItem
msoFileTypeNoteItem
msoFileTypeOfficeFiles
msoFileTypeOutlookItems
msoFileTypePhotoDrawFiles
msoFileTypePowerPointPresentations
msoFileTypeProjectFiles
msoFileTypePublisherFiles
msoFileTypeTaskItem
msoFileTypeTemplates
msoFileTypeVisioFiles
msoFileTypeWebPages
msoFileTypeWordDocuments

I added a custom type:

msoFileTypeUserSpecified

so that the class module had a mechanism for handling other file
types. The .FileTypeSpecify property takes a single extension or a
comma-delimited list of extensions. The extension can be passed with
or without the wildcard character (everything before the . is
ignored). So you could assign "doc" or "*.doc" to the property and
it would have the same result, and likewise, "doc, csv, txt" or
"*.doc, *.csv, *.txt".

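The normalization of the extension list described above could look something like this. It is a sketch under assumptions, not the class module's actual code, and ParseExtensions is a hypothetical helper name:

```vba
' Hypothetical helper: turns "doc", "*.doc" or "*.doc, csv, *.txt"
' into an array of bare extensions.
Public Function ParseExtensions(strSpec As String) As Variant
  Dim varParts As Variant
  Dim i As Long
  Dim lngDot As Long

  varParts = Split(strSpec, ",")
  For i = LBound(varParts) To UBound(varParts)
    varParts(i) = Trim$(varParts(i))
    ' Everything up to and including the last "." is ignored.
    lngDot = InStrRev(varParts(i), ".")
    If lngDot > 0 Then varParts(i) = Mid$(varParts(i), lngDot + 1)
  Next i
  ParseExtensions = varParts
End Function
```

With that, Join(ParseExtensions("*.doc, csv, *.txt"), "|") gives "doc|csv|txt", so the two ways of writing the list end up equivalent.
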
As I said, I interpreted the meaning of these constants more broadly
than the documentation said. For instance, there was no Excel
Spreadsheet choice that defined xls, xlt, etc. as the valid file
types, but only the ExcelWorkbooks constant. So I defined that as
being these extensions: xls, xlt, wbk, xlsx, xlsm, xltx, xltm, xlsb,
xlam (I don't know if there is an OpenXML wbk format -- I can't find
wbkx in Google, except as radio station call letters!).

If I wanted to provide a wide selection of file types beyond the
Office file types, it would be better to utilize only the
.FileTypeSpecify property since you could, for instance, provide a
multiselect listbox that would allow the selection of multiple file
types and then simply pass a comma-delimited list to the property. I
implemented the enumeration of Office documents in order to be
consistent with the Office FileSearch object but this somewhat
restricts the use of the .FileType property of the class module.
However, it does ensure that the interface is completely consistent
in that regard with the FileSearch object -- as a replacement for
it, it will work fine.

Search File Properties
======================
In order to provide the ability to search OLE document properties, I
utilized an unsupported Microsoft ActiveX DLL, the DSO OLE Document
Properties Reader:

http://support.microsoft.com/kb/224351 (version 2.1)

As I said above, I wrote the class module with late binding so that
if it's not registered, it shouldn't be fatal. Without it, the
property searching doesn't work.
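
A rough sketch of the late-binding idea, not the class module's actual code. OpenDSOFile is a hypothetical helper name, and the ProgID and the Open arguments are what DSOFile 2.x exposes as far as I know:

```vba
' Sketch only: returns Nothing when dsofile.dll is not registered
' or the file cannot be opened, so the caller can skip the
' property search instead of failing.
Private Function OpenDSOFile(strPath As String) As Object
  Dim objDoc As Object

  On Error Resume Next
  Set objDoc = CreateObject("DSOFile.OleDocumentProperties")
  If Err.Number = 0 Then
     objDoc.Open strPath, True   ' True = open read-only
     If Err.Number <> 0 Then Set objDoc = Nothing
  End If
  On Error GoTo 0
  Set OpenDSOFile = objDoc
End Function
```

Because the variable is declared As Object, the project compiles without a reference to the DLL, and a missing registration only shows up as a Nothing return value at run time.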

The implementation of this was very complicated, as I had to convert
VB6 code to figure out how it works. The object returns a collection
(if you browse it with the object viewer, it's the
DSOFile.OleDocumentProperties.SummaryProperties collection), and I
was never able to loop through that collection, as opposed to the
DSOFile.OleDocumentProperties.CustomProperties collection, which
worked just fine. So I had to hard code each property retrieval one
at a time. This resulted in some *very* ugly code. Here's the code
for checking date properties:

Private Function SearchPropertiesDate(dteSearch As Date, _
    Optional strPropertyName As String) As Boolean
  ' DSOFileDoc is declared elsewhere in the class module.
  Dim bolSingleProperty As Boolean
  Dim bolTemp As Boolean

  ' With a property name, jump straight to that one check.
  bolSingleProperty = (Len(strPropertyName) > 0)
  If bolSingleProperty Then
     Select Case strPropertyName
       Case "DateLastSaved"
         GoTo DateLastSaved
       Case "DateCreated"
         GoTo DateCreated
       Case "DateLastPrinted"
         GoTo DateLastPrinted
     End Select
  End If
  If dteSearch = 0 Then GoTo exitRoutine
  ' Each check exits on a match, or after the one requested property.
DateLastSaved:
  bolTemp = InStr(DSOFileDoc.SummaryProperties.DateLastSaved, _
      dteSearch)
  If bolTemp Or bolSingleProperty Then GoTo exitRoutine
DateCreated:
  bolTemp = InStr(DSOFileDoc.SummaryProperties.DateCreated, _
      dteSearch)
  If bolTemp Or bolSingleProperty Then GoTo exitRoutine
DateLastPrinted:
  bolTemp = InStr(DSOFileDoc.SummaryProperties.DateLastPrinted, _
      dteSearch)
  If bolTemp Or bolSingleProperty Then GoTo exitRoutine

exitRoutine:
  SearchPropertiesDate = bolTemp
End Function

Let me explain that code.

First, when searching all the properties, I wanted it to short
circuit -- that is, as soon as the first property matched the
criteria, I wanted to return True and exit. This is why each check
is followed by a jump to exitRoutine when bolTemp is True.

Second, I also wanted to be able to provide the capability of
searching for specific properties. The only way I could think to do
that was with GoTos, which is why it's so incredibly ugly!

But it's all caused by the lack of looping through the collection.
The code sample that came with the download of the DLL also didn't
walk through the SummaryProperties collection, so I doubt that it's
possible. If someone can figure it out, I'd gladly rework that code
to make it less stupid! For now, it does work, though.
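
One possible way around the GoTos, offered as an untested sketch rather than a known fix: instead of enumerating the collection, keep a list of property names and fetch each one with VBA's CallByName. AnyDateMatches is a hypothetical name, and objSummary is assumed to be the SummaryProperties object of an already opened DSOFile document:

```vba
' Untested sketch: looks up summary properties by name instead of
' enumerating the collection.
Private Function AnyDateMatches(objSummary As Object, _
    dteSearch As Date, Optional strPropertyName As String) As Boolean
  Dim varNames As Variant
  Dim varName As Variant
  Dim varValue As Variant

  If Len(strPropertyName) > 0 Then
     ' Check only the one property that was asked for.
     varNames = Array(strPropertyName)
  Else
     varNames = Array("DateLastSaved", "DateCreated", _
         "DateLastPrinted")
  End If

  For Each varName In varNames
    varValue = CallByName(objSummary, CStr(varName), VbGet)
    If InStr(varValue, dteSearch) > 0 Then
       AnyDateMatches = True
       Exit For   ' short circuit on the first match
    End If
  Next varName
End Function
```

This keeps the short circuit and the single-property option, but whether CallByName behaves well against the late-bound DSOFile object for every property (unset dates in particular) is an assumption that would need testing.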

Search Results May Not Be Identical to FileSearch Object
=========================================================
I've tried to be as flexible and commonsensical as possible in my
implementation of the searching, but there's a major interaction
between the logic of the class module's searching functionality and
UI. Let me explain.

The main guts of the search process is in a subroutine called
SearchFileForText. In it, first the files are searched for text,
then for last update (through the Access FileDateTime() function),
and then for the OLE properties. Whether or not SearchFileForText
returns true depends on two things:

1. the results of the three searches.

2. the appropriate combination of those three results.

Now, because there was no way for me to have a text search ignore
properties embedded in the header of the document, I made a decision
to allow a choice only between searching Properties Only or Both
Properties and Text. This is the way the FileSearch object actually
implements it, so I would tend to think it's not doing anything
special about reading only certain parts of the OLE stream.

I've tried to implement things in a way that makes sense and that
behaves sensibly, but you all know how that goes -- when you're
heads down programming on a project, you lose sight of the bigger
picture.

I hope that some of you will test it out and find bugs and maybe
figure out how to make it work better or more efficiently.
