
Do Windows files have Unique IDs? Can they be retrieved by code?


YisMan

Mar 23, 2007, 6:28:03 AM
Hi Everyone,

I keep an Access database of all my song files, along with their respective
attributes. I want to be able to make changes to their attributes/properties
in the DB and use code to update them in Windows. As the names and hence the
full paths of these files may fall out of sync with the DB, I would like to
know this:

Does Windows keep a unique ID for each file in the file system throughout
its lifetime which will never change? And if so, how can I retrieve it in
code so I can keep it in my DB in the file's record?

I checked the basic properties of the FileSystemObject as well as the
documentation of WMI, I haven't found anything yet.

Any ideas/suggestions, anyone?
Thankfully, YisMan

Dave O.

Mar 23, 2007, 6:42:53 AM

"YisMan" <yis...@att.net> wrote in message
news:5C5C9AEE-2A26-4968...@microsoft.com...
> Hi Everyone,

> I checked the basic properties of the FileSystemObject as well as the
> documentation of WMI, I haven't found anything yet.

That's because there is nothing to find. Think about it for a moment: there
are an infinite number of potential files, but a Long offers only a little
over 4 billion possible values, so it is not possible for every file to have
a unique number, nor is there any useful reason to have one. You can
generate a CRC for any file. This is a number which can be used to check the
file for corruption, because it changes if the file changes, but for a
quantity of large files such as media it would take too long to calculate and
would change if the file's tag was edited.

If you want to avoid the database losing sync with the files, put the
editing tools into the database front end and have it update the file & the
database together.

If you can read the headers of your media files and know how long the
tag/header is, you should be able to grab a K or 2 of the file which will be
the same regardless of the size of the tag or header. You can then take a
CRC of this excerpt and put that into your database, so that the program can
identify the file even if its name and location are changed and the tag
edited.
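
A minimal sketch of this idea, written in Python rather than VB for brevity. It assumes the only tag in front of the audio is an ID3v2 tag (whose 10-byte header stores the tag length as four 7-bit bytes); other tag formats would need their own handling. The names id3v2_size and audio_chunk_crc are made up for this illustration.

```python
import zlib


def id3v2_size(header: bytes) -> int:
    """Return the total size of a leading ID3v2 tag, or 0 if there is none."""
    if len(header) < 10 or header[:3] != b"ID3":
        return 0
    # The tag size is stored as four "synchsafe" bytes, 7 useful bits each.
    size = 0
    for byte in header[6:10]:
        size = (size << 7) | (byte & 0x7F)
    # Flag bit 0x10 signals an extra 10-byte footer (ID3v2.4).
    footer = 10 if header[5] & 0x10 else 0
    return 10 + size + footer


def audio_chunk_crc(path: str, chunk_size: int = 2048) -> int:
    """CRC-32 of the first chunk_size bytes that follow any ID3v2 tag."""
    with open(path, "rb") as f:
        start = id3v2_size(f.read(10))
        f.seek(start)
        return zlib.crc32(f.read(chunk_size))
```

Because the chunk is taken after the tag, editing or removing the tag should leave the stored value unchanged, while only a couple of kilobytes are read per file.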

Best Regards
Dave O.


YisMan

Mar 30, 2007, 3:20:02 AM
Hi Dave,

Thank you very much for your detailed explanation, I found it very edifying.
I do find it very strange that Windows doesn't have a unique identifier for
each file in its own internal database. How does Windows keep track of the
ever changing metadata of its files, such as name, path, date modified etc.
if it doesn't have some ID key to reference?

Now to my problem. I never heard of CRCs. I did some research on the web on
the subject, it's way beyond my algebra. Even if I find a ready snippet, I
think it'll take forever for my code to iterate through a few thousand files
checking each one's CRC to see if that's the one that needs to be updated.

What you write about keeping the editing tools in the DB front, great idea,
actually I've already done that, to some degree. What's odd here is that the
lyrics, which are stored in Windows, cannot be accessed through the
FileSystemObject, as far as I tried. They can only be accessed through the
WMP.GetItem.Info method. Shouldn't Windows expose the extended properties
which it stores? Am I missing something?

Thanks again for your clear and detailed reply. It's a pleasure having such
helpful people around.


--
Thankfully, YisMan BS"D

J French

Mar 30, 2007, 3:56:15 AM

Files are just chunks of data on disk

Their Name, dates, size and location are stored in a Directory Entry
that also points to the start of the file.

Certain programs and APIs know about the /expected/ internal format of
specific types of files and can go in to fetch data

www.wotsit.org has information about file formats

As for CRCs, I would take the CRC of the entire file and even if I
found two CRCs the same I would not assume that the contents are
identical.

There is a small, but very real possibility that two different chunks
of data will have the same CRC
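
To put a rough number on that possibility, here is a small sketch in Python using the usual birthday approximation. It assumes the checksums behave like uniformly random 32-bit values, which real files only approximate, and the function name is made up for this example.

```python
import math


def collision_probability(n_files: int, bits: int = 32) -> float:
    """Approximate chance that at least two of n_files share a checksum."""
    pairs = n_files * (n_files - 1) / 2
    # Birthday approximation: 1 - exp(-pairs / number_of_possible_values)
    return -math.expm1(-pairs / 2.0 ** bits)


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        print(n, f"{collision_probability(n):.4%}")
```

Under that assumption the chance of some pair colliding is around one percent for a library of ten thousand files, which is why a matching CRC is a hint worth confirming rather than proof.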


Dave O.

Apr 2, 2007, 5:14:09 AM

"YisMan" <yis...@att.net> wrote in message
news:7589F6A5-F956-44C8...@microsoft.com...

> Hi Dave,
>
> Thank you very much for your detailed explanation, I found it very edifying.
> I do find it very strange that Windows doesn't have a unique identifier for
> each file in its own internal database. How does Windows keep track of the
> ever changing metadata of its files, such as name, path, date modified etc.
> if it doesn't have some ID key to reference?
>
> Now to my problem. I never heard of CRCs. I did some research on the web on
> the subject, it's way beyond my algebra. Even if I find a ready snippet, I
> think it'll take forever for my code to iterate through a few thousand files
> checking each one's CRC to see if that's the one that needs to be updated.

Here is the code to provide a standard CRC - All in a module

Private CRCTable(255) As Long

' Fill lookup table for CRC calculation - This sub is called once on loading,
' before the first call to GetCRCStr
Public Sub InitCRC()
    Dim dwPolyN As Long
    Dim i As Integer
    Dim j As Integer
    Dim dwCRC As Long

    dwPolyN = &HEDB88320    ' standard CRC-32 polynomial (bit-reversed form)
    For i = 0 To 255
        dwCRC = i
        For j = 8 To 1 Step -1
            If (dwCRC And 1) Then
                ' Unsigned shift right by one bit, then apply the polynomial
                dwCRC = ((dwCRC And &HFFFFFFFE) \ 2&) And &H7FFFFFFF
                dwCRC = dwCRC Xor dwPolyN
            Else
                dwCRC = ((dwCRC And &HFFFFFFFE) \ 2&) And &H7FFFFFFF
            End If
        Next j
        CRCTable(i) = dwCRC
    Next i

End Sub

' Returns the CRC-32 of the data held in Fnt, as a signed Long
Public Function GetCRCStr(Fnt As String) As Long
    Dim b() As Byte
    Dim lp As Long
    Dim CRC32Result As Long
    Dim iLookup As Integer
    Dim ff As Long

    ff = Len(Fnt)
    CRC32Result = &HFFFFFFFF
    ' One byte per character; the assignment sizes the array itself
    b = StrConv(Fnt, vbFromUnicode)
    For lp = 0 To ff - 1
        iLookup = (CRC32Result And &HFF) Xor b(lp)
        ' nasty shr 8 with vb :/
        CRC32Result = ((CRC32Result And &HFFFFFF00) \ &H100) And 16777215
        CRC32Result = CRC32Result Xor CRCTable(iLookup)
    Next
    GetCRCStr = Not CRC32Result

End Function

''*** This was found on a site a long time ago - I can't remember where but
''*** whoever wrote it - many thanks.
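
For anyone wanting to check a port of this routine, the following Python sketch follows the same table-driven steps and compares the result with zlib.crc32 from the Python standard library. The helper as_vb_long is an illustrative name; it only shows how the unsigned value maps onto the signed Long that the VB function returns.

```python
import zlib

POLY = 0xEDB88320  # same polynomial constant as dwPolyN in the VB code


def make_table():
    """Build the 256-entry lookup table, one bit at a time per entry."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


CRC_TABLE = make_table()


def crc32(data: bytes) -> int:
    """Table-driven CRC-32, returned as an unsigned 32-bit integer."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = (crc >> 8) ^ CRC_TABLE[(crc ^ byte) & 0xFF]
    return crc ^ 0xFFFFFFFF


def as_vb_long(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit Long."""
    return value - 0x100000000 if value & 0x80000000 else value


if __name__ == "__main__":
    sample = b"123456789"
    print(hex(crc32(sample)), crc32(sample) == zlib.crc32(sample))
```

The string "123456789" is the conventional check input for CRC-32, with the well-known result CBF43926 in hex, so it makes a convenient first test for any implementation.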


Dave O.

Apr 2, 2007, 5:14:18 AM
> As for CRCs, I would take the CRC of the entire file and even if I
> found two CRCs the same I would not assume that the contents are
> identical.
>
> There is a small, but very real possibility that two different chunks
> of data will have the same CRC

There is a problem with media files where tagging information can be added,
removed or edited but the actual media part of the file is unchanged. In
these cases a CRC of the whole file would report the file as different when
you want it to be the same. If you take a CRC of just the media part you
eliminate that potential problem. Choosing a chunk of the media file which
you can always find regardless of any header or tag (or absence thereof)
means that the same media will return the same CRC.

There is no good reason to compute the CRC of possibly 20 meg for a long high
quality MP3 file when a few k would give you just as much of a unique
identification.

What you say about the possibility of different files sharing a CRC is very
true and eventually inevitable. Using a chunk offers a way to ameliorate
this problem by designating 2 chunks from very different positions, getting
the CRC for both and then if one matches use the second as extra validation.
The chance of both CRCs matching for two different files is
preposterously low and can be disregarded for anything that is not
life-critical.
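
A sketch of the two-chunk check, again in Python for brevity. It assumes the caller already knows where the media data starts and ends (the audio_start and audio_end parameters are hypothetical and would come from whatever tag parsing is in use), and it takes the second chunk from the middle of the media.

```python
import os
import zlib


def two_chunk_fingerprint(path, audio_start=0, audio_end=None, chunk_size=2048):
    """Return (crc_a, crc_b) for chunks at the start and middle of the audio."""
    if audio_end is None:
        audio_end = os.path.getsize(path)
    middle = audio_start + (audio_end - audio_start) // 2
    with open(path, "rb") as f:
        # First chunk: the beginning of the media data.
        f.seek(audio_start)
        crc_a = zlib.crc32(f.read(min(chunk_size, audio_end - audio_start)))
        # Second chunk: the middle of the media data, never past its end.
        f.seek(middle)
        crc_b = zlib.crc32(f.read(min(chunk_size, audio_end - middle)))
    return crc_a, crc_b
```

Storing both values lets a lookup use the first CRC to find candidates and the second to confirm the match.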

Regards

Dave O.


YisMan

Apr 4, 2007, 7:40:03 AM
I thank you all for the help, I'm afraid it's getting too complicated for my
hobby project. Meanwhile I'm managing to keep my paths up to date. If it
gets out of hand I guess I'll resort to CRCs.
Meanwhile G-d bless you all.
--
Thankfully, YisMan

Steve

Apr 19, 2007, 3:54:13 PM

> What you write about keeping the editing tools in the DB front, great idea,
> actually I've already done that, to some degree. What's odd here is that the
> lyrics, which are stored in Windows, cannot be accessed through the
> FileSystemObject, as far as I tried. They can only be accessed through the
> WMP.GetItem.Info method. Shouldn't Windows expose the extended properties
> which it stores? Am I missing something?

YisMan,

I know this thread is a bit old but if you or anyone else finds it and
is interested, the reason you cannot find the meta-data (i.e. lyrics)
for WMP managed files in the file itself is because WMP stores some
data only in its own database. Somewhere there is a .chm (help file)
for the WMP object model that specifies which data is stored in the
file's tag and the database or just in the database.

Hope this helps,
Steve

YisMan

Apr 19, 2007, 5:24:04 PM
Thank you very much, Steve.

Yes, I am still very interested in the subject. Can you tell me more about
this? What I'd like to know is:

A) Is there a way I could get/set the file's own properties without the WMP
object? Actually as per an answer I got from Alessandro Angeli, I downloaded
the Windows Media Format SDK. Unfortunately, for the time being I am still
struggling to convert the COM wrapper from C# to VB.NET (I'm illiterate in C.
Alessandro, if you see this, and have a VB translation I'd be grateful for
it). In any case, it is not the simplest object model. If you have a simpler,
more intuitive way of doing this, I'd love to hear about it.

B) Where is this database that WMP uses? Can I get hold of it? Can I
change/read the info therein? Back it up?

Thank you very much, YisMan

Steve

Apr 20, 2007, 8:57:57 AM

YisMan,

It sounds as if you are doing something very similar to what I did a
few years back...and running into the same difficulties.

I found the WMP library to be such a PIA to work with that I abandoned
it altogether and created my own DB and playlist management system.
The system I created uses ID3 ver. 2 tags to store the extended info
about my mp3 files. I found a class on Mike Sutton's web site
(http://EDais.mvps.org/) that handles both ID3 version 1 and version 2 tags.
I however am not trying to store the lyrics, just simple data such as
artist, recorded date etc.

As for keeping my DB in sync with the actual files I simply (as a
previous poster suggested) modify both the file and the DB at the same
time. This approach of course assumes that no other mechanism is used
to edit the file data. Since my app is just a jukebox/media library
app for my own personal use I can be sure that this won't happen.

Hope this helps,
Steve

Tony Proctor

Apr 20, 2007, 10:14:17 AM
I think what you're looking for YisMan is the "File Index". This can be
obtained from the BY_HANDLE_FILE_INFORMATION structure, via
GetFileInformationByHandle. The file index is bigger than a long but if you
represent it in textual form then you can still use it as a unique key. For
instance...

Const INVALID_HANDLE_VALUE = -1

Const OPEN_EXISTING = 3

Private Type FILETIME
    dwLowDateTime As Long
    dwHighDateTime As Long
End Type

Private Type BY_HANDLE_FILE_INFORMATION
    dwFileAttributes As Long
    ftCreationTime As FILETIME
    ftLastAccessTime As FILETIME
    ftLastWriteTime As FILETIME
    dwVolumeSerialNumber As Long
    nFileSizeHigh As Long
    nFileSizeLow As Long
    nNumberOfLinks As Long
    nFileIndexHigh As Long
    nFileIndexLow As Long
End Type

Private Declare Function CloseHandle Lib "kernel32" ( _
    ByVal hObject As Long) As Long

Private Declare Function CreateFile Lib "kernel32" Alias "CreateFileA" ( _
    ByVal lpFileName As String, _
    ByVal dwDesiredAccess As Long, ByVal dwShareMode As Long, _
    ByVal lpSecurityAttributes As Long, _
    ByVal dwCreationDisposition As Long, _
    ByVal dwFlagsAndAttributes As Long, ByVal hTemplateFile As Long) As Long

Private Declare Function GetFileInformationByHandle Lib "kernel32" ( _
    ByVal hFile As Long, _
    lpFileInformation As BY_HANDLE_FILE_INFORMATION) As Long

Private tInfo As BY_HANDLE_FILE_INFORMATION 'Current file information

Private Function sGetFileInfo(sFile As String) As String
    ' Fill out tInfo with file information. Returns an error string if this
    ' fails, e.g. if the file doesn't exist, and an empty string on success.
    Dim hFile As Long

    ' Open the file to get attributes (no I/O intended)
    hFile = CreateFile(sFile, 0, 0, 0, OPEN_EXISTING, 0, 0)
    If hFile = INVALID_HANDLE_VALUE Then
        sGetFileInfo = sGetErrMessage()
        Exit Function
    End If
    ' Read the unique file index, and the file's modification date/time
    If GetFileInformationByHandle(hFile, tInfo) = 0 Then
        sGetFileInfo = sGetErrMessage()
    End If
    CloseHandle hFile
End Function

Private Function sGetErrMessage() As String
    ' Stand-in for a helper that was not included in this post: it simply
    ' reports the Win32 error code left by the last API call
    sGetErrMessage = "Windows error " & Err.LastDllError
End Function

Private Function sFileIndex() As String
    ' Returns the unique file index for the current file as 16-digit hex
    ' string (i.e. 64 bits formatted as hex)

    sFileIndex = Right$("0000000" & Hex(tInfo.nFileIndexHigh), 8) & _
                 Right$("0000000" & Hex(tInfo.nFileIndexLow), 8)
End Function
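
For comparison outside VB, the same information is reachable from Python's standard library. As far as I understand the Python documentation, os.stat reports the file index in st_ino on Windows (the inode number on Unix) and a device or volume identifier in st_dev, so this sketch combines the two into one key. The function name and the key format are made up for this example.

```python
import os


def file_key(path: str) -> str:
    """Return a volume-qualified file identifier as a hex string."""
    info = os.stat(path)
    # st_dev identifies the volume/device, st_ino the file within it.
    return f"{info.st_dev:X}-{info.st_ino:016X}"


if __name__ == "__main__":
    print(file_key(__file__))
```

Renaming a file within the same volume should leave the key unchanged, while a copy is a new file and gets a key of its own.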

Tony Proctor

Steve

Apr 20, 2007, 10:48:06 AM

Tony,

When does this index value get changed?

Let's assume I have two similar (but not exact) versions of the same
mp3 file with the same name but located in two different folders. We
will call the first file "A" and the second file "B".
The mp3 tag data in my DB is correct and matches that of file "A" but
file "A" also has some data errors in the media portion so I want to
replace it with file "B".

To keep my DB in sync with the file I need the result to be that the
copied file ("B") assumes the index of the existing file ("A"). I
assume however that the file would maintain its original index. Or
will a completely new index be generated?

Either case is manageable, provided I know which way it works and
provided that my app is aware of the switch.

Thanks,
Steve

Tony Proctor

Apr 20, 2007, 11:22:33 AM
The file index uniquely identifies the file, but not the file content.
Hence, a file keeps the same file index, even after modifications to its
content.

If you, or the OP, are keeping a catalog of file information then the file
index can be used as a key to relate those catalog records to the relevant
files. You can also use it to spot new or deleted files and so keep the
catalog in step with directory changes.

If you wanted to know when the file contents had changed, though, then you
would have to check the last-modified time in this same file information
structure (see ftLastWriteTime field).
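
A rough sketch of such a catalog check, in Python rather than VB. It assumes the catalog is a dictionary mapping (device, file index) to (path, last-modified time), and the names scan and reconcile are invented for this illustration. os.stat is called on each path because, on Windows, the stat data cached by os.scandir entries leaves the file index at zero.

```python
import os


def scan(folder):
    """Map (device, file index) -> (path, last-modified ns) for each file."""
    found = {}
    for entry in os.scandir(folder):
        if entry.is_file():
            info = os.stat(entry.path)
            found[(info.st_dev, info.st_ino)] = (entry.path, info.st_mtime_ns)
    return found


def reconcile(catalog, folder):
    """Compare a stored catalog with the folder and report the differences."""
    current = scan(folder)
    report = {"new": [], "deleted": [], "renamed": [], "modified": []}
    for key, (path, mtime) in current.items():
        if key not in catalog:
            report["new"].append(path)
            continue
        old_path, old_mtime = catalog[key]
        if path != old_path:
            report["renamed"].append((old_path, path))
        if mtime != old_mtime:
            report["modified"].append(path)
    for key, (old_path, _) in catalog.items():
        if key not in current:
            report["deleted"].append(old_path)
    return report
```

A renamed file keeps its key, so it shows up as a rename instead of as one deleted file plus one new file. A file system may hand the index of a deleted file to a later new file, so this is best treated as a strong hint and not as proof.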

Tony Proctor

John

Apr 20, 2007, 10:05:47 PM
On Apr 20, 11:22 am, "Tony Proctor" <tony_proctor@aimtechnology_NoMoreSPAM_.com> wrote:
> The file index uniquely identifies the file, but not the file content.
> Hence, a file keeps the same file index, even after modifications to its
> content
>
> If you, or the OP, is keeping a catalog of file information then the file
> index can be used as a key to relate those catalog records to the relevant
> files that. You can also use it to spot new or deleted files and so keep the
> catalog in step with directory changes.
>
> If you wanted to know when the file contents had changed, though, then you
> would have to check the last-modified time in this same file information
> structure (see ftLastWriteTime field)
>
> Tony Proctor

I've found that an MD5 of the data is much faster than a CRC32. Your
approach of storing the filename/filepath/filetime/filesize is also a
good way of trying to verify that the file is unchanged. The only
problem is that any change to the embedded tags will change the file
information without changing the music content. I've never really
figured out a good/fast way to be *absolutely* positive that there
haven't been any changes. There are so many different possible tag
standards, and files are often tagged in a format that is invalid for
the particular filetype!

You might be interested in taking a look at MP3-Boss -- an Access
database (now 8 years in the making!) that tries to manage all this.
Originally, I figured it was a 6-month project!
http://www.mp3-boss.com

Tony Proctor

Apr 23, 2007, 6:07:11 AM
You don't really need the file path or size John. The last-modified time
will tell you whether any change was made to the file content, although it
cannot distinguish between changes to tag-content and to music-content.
Also, if the same data was written back (i.e. no net change) then it may
look like there was a change when there wasn't. This is where a CRC or MD5
might be better.

The use of the file-index is better than relying on the file name/path since
it references the file body. The name/path are merely entries in one-or-more
directory files, and the same file body can even have multiple directory
entries pointing to it. Hence, the file-index is unique. It's also
unaffected by a rename of the file - which is quite common with music files

Tony Proctor

John

Apr 25, 2007, 9:12:34 AM
On Apr 23, 6:07 am, "Tony Proctor" <tony_proctor@aimtechnology_NoMoreSPAM_.com> wrote:
> You don't really need the file path or size John. The last-modified time
> will tell you whether any change was made to the file content, although it
> cannot distinguish between changes to tag-content and to music-content.
> Also, if the same data was written back (i.e. no net change) then it may
> look like there was a change when there wasn't. This is where a CRC or MD5
> might be better.
>
> The use of the file-index is better than relying on the file name/path since
> it references the file body. The name/path are merely entries in one-or-more
> directory files, and the same file body can even have multiple directory
> entries pointing to it. Hence, the file-index is unique. It's also
> unaffected by a rename of the file - which is quite common with music files
>
> Tony Proctor

At one time I had tried using the FileIndex to uniquely identify files
-- I was particularly interested in identifying a CD based on the
FileIndex -- but if you can believe it, the FileIndex for the files
changed every time you'd write to the CD (even though I was only
adding files to the CD).

I vaguely recall that the FileIndex for a hard drive would change
every time you'd reboot the computer (or maybe it was if you renamed
the volume?). My main interest was to have a unique ID for a CD
though...so unfortunately I didn't write down my findings! Have you
verified that the FileIndex is unique even after rebooting the
computer? Also, I believe the FileIndex changes for network mapped
drives if the file is closed and then reopened, and also if you move
the file across volumes.

I don't find a lot of information out there regarding the limitations
of FileIndex, but I found this on a search -- this is probably what I
was seeing "The FileIndex is a 64-bit number that indicates the
position of the file in the Master File Table (MFT). It is stable
between successive starts of the system, provided the MFT does not
overflow and therefore has to be rebuilt."

So...the problem with FileIndex is that it can't quite be counted on
as a permanent unique identifier. Maybe in Vista?

John

Tony Proctor

Apr 25, 2007, 9:39:25 AM
The 'file index' is part of the internal disk and file-system organisation
John. For a hard-drive, it doesn't change if you reboot the system, or if
you rename the file. I haven't tried renaming the volume but I would expect
it to be unchanged there too since the disk structure remains intact. If you
find you have to rebuild your MFT then you will have had a very serious
computer failure, and your file IDs will be the least of your worries
(needless to say, it very rarely happens)

If you copy a file from one place to another then the copy will have a
different index - because it's a different file as far as the O/S is
concerned. Similarly, if you reburn a CD then you're changing the disk
structure, and writing entirely new files (even if the names and contents
were the same as before)

We use file indexes a lot here for cache entries, which is just another type
of file catalog. If we need to process a file, say, to compile a
memory-based version of its storage, we can reliably tell whether we already
have it loaded by simply looking up the file index in a Collection or
Dictionary object. We can then test the file-modified time to see if the
contents were modified since the time we loaded it. If so, the file can be
re-loaded at that point and our stored copy of the file-modified time
updated.
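
A minimal sketch of that caching pattern, in Python with a dictionary standing in for the Collection or Dictionary object. The class name FileCache and the loader callback are assumptions made for this example, not a description of the actual system.

```python
import os


class FileCache:
    """Cache parsed file contents, keyed by (device, file index)."""

    def __init__(self, loader):
        self._loader = loader   # function: path -> parsed object
        self._entries = {}      # key -> (mtime_ns, parsed object)

    def get(self, path):
        info = os.stat(path)
        key = (info.st_dev, info.st_ino)
        cached = self._entries.get(key)
        if cached is not None and cached[0] == info.st_mtime_ns:
            return cached[1]    # unchanged since it was loaded
        # Not seen before, or modified since: (re)load and remember the time.
        value = self._loader(path)
        self._entries[key] = (info.st_mtime_ns, value)
        return value
```

Because the key is the file index and not the path, a renamed file is still found in the cache, and only a change in the last-modified time triggers a reload.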

Tony Proctor

John

Apr 25, 2007, 10:09:14 AM
On Apr 23, 6:07 am, "Tony Proctor" <tony_proctor@aimtechnology_NoMoreSPAM_.com> wrote:
> You don't really need the file path or size John. The last-modified time
> will tell you whether any change was made to the file content, although it
> cannot distinguish between changes to tag-content and to music-content.
> Also, if the same data was written back (i.e. no net change) then it may
> look like there was a change when there wasn't. This is where a CRC or MD5
> might be better.
>
> The use of the file-index is better than relying on the file name/path since
> it references the file body. The name/path are merely entries in one-or-more
> directory files, and the same file body can even have multiple directory
> entries pointing to it. Hence, the file-index is unique. It's also
> unaffected by a rename of the file - which is quite common with music files
>
> Tony Proctor

Tony,

I had taken a look at FileIndex as a unique identifier, but decided
that it really wasn't a permanent unique identifier.

I was specifically interested in using the FileIndex to identify files
on a CD -- but found that every time I'd add files to the CD -- the
FileIndex would change.

I found this information:


The FileIndex is a 64-bit number that indicates the position of the
file in the Master File Table (MFT). It is stable between successive
starts of the system, provided the MFT does not overflow and therefore
has to be rebuilt. On WinNT systems (NT, 2K, XP) the FileIndex is also
returned for directories, on Win9x (95, 98, ME) it returns zero for
directories. It is not stable for files on network drives; successive
calls to GetFileInformationByHandle return different values.

The FileIndex also changes if you move the file across volumes.

During my testing (quite a long while ago!), I seem to recall that it
would also change if you renamed the volume.

Have you found FileIndex to be permanent in your testing?

John

Karl E. Peterson

Apr 25, 2007, 1:53:32 PM
Tony Proctor <tony_proctor@aimtechnology_NoMoreSPAM_.com> wrote:
> If you copy a file from one place to another then the copy will have a
> different index - because it's a different file as far as the O/S is
> concerned.

What about a defrag?
--
.NET: It's About Trust!
http://vfred.mvps.org


Tony Proctor

Apr 25, 2007, 2:19:01 PM
Good question Karl. I'm not entirely sure, as a defrag moves the individual
blocks in the file around the disk rather than copying the whole file
body around. Hence, I believe it still stays fixed but I haven't tested that.

Tony Proctor

Karl E. Peterson

Apr 25, 2007, 2:26:57 PM
Tony Proctor <tony_proctor@aimtechnology_NoMoreSPAM_.com> wrote:
> "Karl E. Peterson" <ka...@mvps.org> wrote...

>> Tony Proctor <tony_proctor@aimtechnology_NoMoreSPAM_.com> wrote:
>>> If you copy a file from one place to another then the copy will have a
>>> different index - because it's a different file as far as the O/S is
>>> concerned.
>>
>> What about a defrag?
>
> Good question Karl. I'm not entirely sure as a defrag moves the individual
> blocks in the file around the disk rather than the copying the whole file
> body around. Hence, I believe it still stays fixed but I haven't tested that

Makes sense that they'd stay intact, but when "makes sense" is the best one has to
go on... Well, we all "been there" right? <g> Otherwise, these sound like pretty
cool IDs to know about!

Tony Proctor

Apr 25, 2007, 2:47:34 PM
Aha, I think you may be getting confused by the MSDN documentation for the
nFileIndexHigh/nFileIndexLow fields John. This has confused other people
too (e.g.
http://groups.google.ie/group/microsoft.public.vb.winapi/browse_frm/thread/9f258cadf993c8b2/dd8cad3dd42df5e6?hl=en#dd8cad3dd42df5e6).
It suggests that the file index may change on a system boot, or after
closing and re-opening a file. Sigh....

Having worked on file system development I know how file indexes and inodes
are used. I think what that MSDN sentence is basically saying is that when
you open a file by name, you then have access to its unique file-index, and
that remains fixed until you close it. This is because you're pointing to the
file body at that point, and anything that happens to the associated
directory entry (e.g. deleted, re-created) is then irrelevant. However, if
you close the file and re-open it -- again, by name -- then you may have
opened a different instance of the file (where it's been physically
re-created) and so might see a different file-index. This is very misleading
though, since the file name (or path) never uniquely identifies a file, and
everyone appreciates that you may delete a file, and then create a whole new
file with the same name. That's exactly why file indexes are so useful. The
file indexes for the old and new files in this scenario would be different,
and so you would know it's a different file even though someone has given it
the same name.

Tony Proctor

Jim Mack

Apr 25, 2007, 3:02:09 PM

Since files below a certain size may (at the option of the FS) be kept entirely in the MFT, I suspect that a defrag is going to upset the index values, in at least some cases. And one case is enough. :-)

And of course FAT file systems, including FAT32, don't have an MFT and so have no way to persist any index.

--
Jim

Karl E. Peterson

Apr 25, 2007, 3:07:38 PM

Yeah, I guess it all goes back to the docs, huh? They recommend using these only as
a way to compare whether two existing file handles point to the same file. (Btw
Tony, wouldn't you want to combine the volume label with your key string?)

Tony Proctor

Apr 25, 2007, 4:05:17 PM
> Tony, wouldn't you want to combine the volume label with your key string?

Probably Karl. I assumed the files being checked were all on the same
volume, but otherwise 'yes'

Tony Proctor

Karl E. Peterson

Apr 25, 2007, 4:29:22 PM
Tony Proctor <tony_proctor@aimtechnology_NoMoreSPAM_.com> wrote:
>> Tony, wouldn't you want to combine the volume label with your key string?
>
> Probably Karl. I assumed the files being checked were all on the same
> volume, but otherwise 'yes'

Yeah, for a "unique" key, that's really the only way. I'm gonna remember this one.
:-)

Tony Proctor

Apr 27, 2007, 7:54:21 AM
I was reading around to get a clearer picture of the differences here Jim.
The consensus seems to be that NTFS provides "defrag-safe" file indexes
(even for small files held in the MFT), but that FAT ones are not
"defrag-safe". I haven't confirmed that myself though

Tony Proctor

Jim Mack

Apr 27, 2007, 9:49:49 AM
Tony Proctor wrote:
> I was reading around to get a clearer picture of the differences here
> Jim. The consensus seems to be that NTFS provides "defrag-safe" file
> indexes (even for small files held in the MFT), but that FAT ones are
> not "defrag-safe". I haven't confirmed that myself though

Trouble is, without an authoritative statement from MS, it's proof by example and so always subject to instant falsification. It's the 'white crow' problem: we can infer that all crows are black, until a white one comes along. We might be able to gain certainty if the internals were documented.

I didn't follow this thread from the beginning, so I'm not sure what the goal is. Is it to uniquely identify a file, or to detect changes, or to determine the order that files were added, or something else?

--
Jim

Tony Proctor

Apr 27, 2007, 11:50:43 AM
In general terms I believe the goal was to keep a separate "file catalog" in
step with the underlying files. In effect, to have the catalog reference a
unique file identifier to keep its data in synch with the files, to be able
to spot new/deleted files using the same ID, and then to use either the
last-modified time or possibly a CRC/MD5 to detect when the content of the
said files has been changed.

Tony Proctor

Jim Mack

Apr 27, 2007, 4:54:56 PM
Tony Proctor wrote:
> In general terms I believe was to keep a separate "file catalog" in
> step with the underlying files. In effect, to have the catalog
> reference a unique file identifier to keep its data in synch with the
> files, to be able to spot new/deleted files using the same ID, and
> then using either last-modified time or possibly CRC/MD5 to detect
> when the content of the said files has been changed
>

Ah. Well, I sure wouldn't rely on these indexes for anything critical, but maybe it's enough for this application.

It's easy enough to just keep an array of directory contents, with a hash of file size and CRC of the contents (or a portion).
