Windows File Access from D3

Peter McMurray

Jul 13, 2016, 2:26:39 AM
to Pick and MultiValue Databases
Hi
I use Windows files extensively to store PDF copies of invoices and statements, and I am delighted with the results over the last 4 years using PrintWiz.
Are there any recommendations regarding the optimum number of entries in a Windows folder?
I am used to this being a pointless question with MV files, since millions of items can be accessed instantly with a known key. However, when a Windows folder is opened as a DOS: file and searched for known keys, does anybody know how the Pick READ routine finds the item, and can the method be affected by the number and/or size of items in the Windows folder?

Ross Ferris

Jul 13, 2016, 10:13:18 AM
to Pick and MultiValue Databases
Peter,
Shame on you! This is a question you should have answered decades ago :-) I know I did when NTFS first hit the ground ... so long ago now, but IIRC I was getting sub-second response times with over a million "records" (aka files) in a directory, and the limit is over 4 billion per directory (and the directories are hashed - the 64K limit was FAT32, I think).

However, never, Ever, EVER try opening such a folder with Explorer unless you have a death wish, and a week to spare. Don't confuse that limitation of the Explorer program with a limitation of the file system itself (shame Win2012 didn't get the SQL-like upgrade that was rumoured, but that is a whole other story). I'd also recommend against a SORT, but a direct read works just fine.

That said, it shouldn't take too much t/h/o/u/g/h/t to come up with strategies that make the point moot (and if you look at other variables, like the type of file and the period it relates to, "smart", easy-to-navigate strategies become obvious).

Nathan Rector

Jul 13, 2016, 10:22:57 AM
to mvd...@googlegroups.com
Microsoft will tell you that there is no limit to the number of files in
an NTFS folder. The issue is that there is a maximum of 4,294,967,295
files per "Volume".

https://technet.microsoft.com/en-gb/library/bb457112.aspx

On the other hand, just because there is no limit doesn't mean there
aren't practical limits. If you are going to be using Windows Explorer
to browse the folder, then there is a practical limit of around 100,000
files (in my experience) before it becomes too hard to work with
Windows Explorer.

If you are working with "dir", I didn't seem to have any issues with
larger numbers of files.

When you have a large number of files, turn off DOS 8.3 file name
generation. This addresses a performance issue in NTFS: you don't
normally see it with smaller numbers of files, but it becomes a big
issue with large numbers of files.

I would also recommend turning off "Windows Indexing" on this folder.
Again, it is a performance issue, and the data is already indexed in
your MV system, so why have Windows index it too?
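
For example, something along these lines should do both (a rough sketch only - the C:\Invoices path is just an example, fsutil needs an elevated prompt, and it is worth checking the query output on your own system before changing anything):

# Check and disable DOS 8.3 short-name creation (affects newly created files only).
fsutil 8dot3name query C:
fsutil behavior set disable8dot3 1

# Mark an existing folder and its contents as "not content indexed"
# so the Windows Search indexer leaves them alone.
$folder = Get-Item "C:\Invoices"
$folder.Attributes = $folder.Attributes -bor [IO.FileAttributes]::NotContentIndexed
Get-ChildItem $folder -Recurse | ForEach-Object {
    $_.Attributes = $_.Attributes -bor [IO.FileAttributes]::NotContentIndexed
}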

-Nathan

--

--------------------------------------------
Nathan Rector
International Spectrum, Inc
http://www.intl-spectrum.com
Phone: 720-259-1356


Rex Gozar

Jul 13, 2016, 2:34:26 PM
to mvd...@googlegroups.com
I did some testing on this a dozen years ago with document imaging
files -- Windows folder access started to become noticeably slower
after 10,000 files. I decided on an arbitrary limit of 25,000 document
files per folder, which gave me a balance between acceptable
performance and not having to manage too many document folders.

rex

Peter McMurray

Jul 13, 2016, 4:29:34 PM
to Pick and MultiValue Databases


Hi
Thank you all. I have no problem at all with the speed, as I would not dream of using anything but a direct read and write from the NTFS. The fantastic speed that D3 achieves from MV files means I only use the NTFS for storage. A client fiddling around in an area that he does not understand questioned the storage, and I wanted a well-informed opinion in case I had missed something.
The fact that the NTFS directory is hashed explains what I had hoped to hear. I don't use Explorer for this, believe me; however, I do occasionally do a SORT, using the dict from an MV file, on an NTFS sub-folder where I know there are only a thousand or so records, and that works fine.

Peter McMurray

Jul 13, 2016, 5:19:43 PM
to Pick and MultiValue Databases

Hi Rex
I am interested in how you were accessing the files when you made the 25,000 decision. Specifically, were you accessing them from within MV, or using another program that relied on Microsoft? I see in the article that Nathan quoted that Microsoft recommends 300,000.
Also, I can see from that article that the D3 designer used the optimum NTFS cluster as the default group. I remember back in my youth being very impressed by Dick's use of the standard assembler table entry size of 10 bytes for a variable, a practice that continues to this day. His influence lingers on - do what the system was designed for.

Kevin Powick

Jul 13, 2016, 9:08:35 PM
to Pick and MultiValue Databases
One of our Windows-based customers must maintain storage for an XML version of each of their invoices.  Since the XML files must be accessible to both D3 and other non-MV processes, they are kept in a folder on an NTFS drive.

To aid with organization, one base folder was created (e.g. Invoices) with 100 subfolders within it, numbered 00 to 99.  Since each XML file is named the same as the invoice number, the last two digits of the invoice number determine the subfolder to which the XML is written, e.g. invoice # 53204 would be found as /Invoices/04/53204.xml.

While the above is not strictly necessary, it does make it faster/easier for people to access the XML from various Windows-based programs when the need arises.
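
The path calculation itself is trivial from any environment; a rough PowerShell sketch (the base folder and invoice number here are just examples):

# Derive the bucket subfolder from the last two digits of the invoice number.
$baseDir   = "C:\Invoices"        # example base folder
$invoiceNo = "53204"              # example invoice number
$bucket    = $invoiceNo.Substring($invoiceNo.Length - 2)     # "04"
$xmlPath   = Join-Path (Join-Path $baseDir $bucket) "$invoiceNo.xml"
# $xmlPath is now C:\Invoices\04\53204.xml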

--
Kevin Powick


MarioB

Jul 14, 2016, 12:17:27 AM
to Pick and MultiValue Databases
Hi Peter,

Like you, I use PrintWiz to create PDFs in a similar manner and am equally delighted with it.
Whilst I have used folders containing many entries, I have found that processing or searching through them is a PITA.
Like Kevin's customer, I have adopted a PowerShell script that periodically moves files into sub-directories (folders).
A preliminary copy of the script I created is below, should you find it useful.
Rgrds from marz

<#
    Check each folder in the nominated starting folder,
    moving any files found in it into a subfolder.
    The subfolder is created if it does not exist.
    The subfolder name format is <folder name>_<date>,
    where <date> is the file's last write date as YYYYMM.
#>

$UpdCnt   = 0
$StartDir = "C:\TMP"
Write-Output "Processing Start Folder $StartDir"

# Each subfolder of the starting folder is processed in turn.
$fldrs = Get-ChildItem $StartDir | Where-Object { $_.PSIsContainer }

foreach ($fldr in $fldrs)
{
    if ($fldr)
    {
        $SrcDir = $StartDir + "\" + $fldr
        Write-Output "checking folder:  $SrcDir"

        # Files only - nested subfolders are left where they are.
        $files = Get-ChildItem $SrcDir | Where-Object { ! $_.PSIsContainer }
        foreach ($file in $files)
        {
            if ($file)
            {
                # Destination is <folder>_<YYYYMM>, taken from the file's last write date.
                $DstnDir = $SrcDir + "\$fldr" + "_" + $file.LastWriteTime.Date.ToString('yyyyMM') + "\"
                if (!(Test-Path $DstnDir))
                {
                    New-Item -Path $DstnDir -ItemType directory | Out-Null
                }
                $UpdCnt += 1
                Move-Item $file.FullName $DstnDir
                Write-Output "Moved file: $file to $DstnDir"
            }
        }
    }
}
Write-Output "All Done... Updated:$UpdCnt"





Rex Gozar

Jul 14, 2016, 9:42:54 AM
to mvd...@googlegroups.com
Peter,

My system saved compressed document images. I used third-party DOS
command line utilities to compress/decompress the images. When I timed
the operations, I noticed the timings became noticeably longer after
10,000 files/folder (I did the same test on unix/linux and the
threshold was 25,000). I simply decided on 25,000 because the wait
time wasn't objectionable at that limit.

In my case, a user had to wait for the image to decompress and view
it, so performance was more of an issue. If it were strictly a batch
process (without an impatient user) then I probably would have
considered the overall processing time and adjusted my limit
accordingly.

(sidebar - my theory as to why NTFS gets slower after X number of
items: all filesystems have some sort of directory header table that
maps the file's name to an inode on disk. When the table exceeds a
single disk block (e.g. 4K or 8K) the filesystem has to get the next
disk block to find the specified file. When there are thousands of
files in a directory, the filesystem has to repeat this read
block/search block loop many times before it finds the inode -- this
adds disk read latency. There are other filesystem types (e.g.
ReiserFS on linux) that structure their directory header tables
specifically to reduce the number of reads it takes to find the
inode.)

rex

Peter McMurray

Jul 15, 2016, 2:43:54 AM
to Pick and MultiValue Databases


Thank you all.
As I expected, people using non-MV applications to read and manipulate the data needed smaller record counts, so I ran a test using Pick Basic.
My invoice storage averages 2KB per item, so I built a folder of 250,000 items in 25,000-item chunks.
The time to build each chunk started off at a second and, after 100,000, went out to 10 seconds.
I checked the retrieval rate by doing a count of an MV file with 328,000 entries in it (in an attempt to clear current memory), then doing 10 random reads from the new file.
This was under a millisecond below 100,000 items but gradually went out to 10 milliseconds, i.e. a millisecond a read.
However, when I closed everything down and then ran the 10 random reads again (several times), I could not time it because it was under a millisecond, which is the SYSTEM(12) limit.
At any step in the procedure Windows Explorer could scroll from top to bottom as fast as I could drag the cursor.
This is on the cheapest machine out of Officeworks: an HP with an AMD Athlon II X2 255 processor at 3.10 GHz, 4GB RAM, 64-bit Windows 10.
My conclusion: when it is just data-dump storage, go as big as you like; when you are doing significant work, keep the count down. My test folder had 250,000 entries and came to 976MB.
Obviously significant record size variations will affect the result.
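
For anyone who wants to repeat a similar test outside of D3, a rough PowerShell equivalent is below (the folder, item count and 2KB payload are just examples matching my test above; my own timing was done in Pick Basic with SYSTEM(12)):

# Build a folder of small dummy files, then time 10 random reads.
$dir   = "C:\TMP\ReadTest"          # example test folder
$count = 250000                     # example item count - start smaller if you are impatient
New-Item -Path $dir -ItemType Directory -Force | Out-Null

$payload = "X" * 2048               # roughly 2KB per item
1..$count | ForEach-Object { Set-Content -Path (Join-Path $dir "$_.txt") -Value $payload }

$rand  = New-Object System.Random
$timer = [System.Diagnostics.Stopwatch]::StartNew()
1..10 | ForEach-Object {
    $key = $rand.Next(1, $count + 1)
    Get-Content -Path (Join-Path $dir "$key.txt") | Out-Null
}
$timer.Stop()
Write-Output ("10 random reads took {0} ms" -f $timer.ElapsedMilliseconds)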

Ross Ferris

Jul 16, 2016, 10:47:40 AM
to Pick and MultiValue Databases
Peter, I believe you will find the results consistent regardless of individual file size, for all practical purposes.

Peter McMurray

Jul 16, 2016, 10:40:21 PM
to Pick and MultiValue Databases


Thanks Ross.
In fact, the D3 FSI files and the OSFI files must use pretty well the same access options.
Based on the tests I ran, Windows hesitated for a second or so every 5,000 x 2KB records as it added more space, whereas D3 allocates the main file chunk initially. However, if one has finger trouble and winds up with a D3 data file of 3 blocks instead of around 17,000, it still works extremely well and must pick up space at the Microsoft level.
Once the file size had settled down, a block of 10 random reads was still under a millisecond.
I queried RD about the optimum size of arrays and was told that a 10,000-element array was very small. I have a file that I intended to get fancy with and keep in small chunks of linked records; however, as the records now range from 1KB to 300KB (the most frequently used one) and nobody has noticed, I have left well alone. Unlike Bruce's motto, "if it works, fix it 'til it doesn't" :-)

Ross Ferris

Jul 16, 2016, 11:12:54 PM
to Pick and MultiValue Databases
D3 will be caching writes for D3 files; OSFI to the Windows file system will not (not strictly true, but close enough) ... if you had more RAM (or tweaked some parameters) I suspect your 5,000-record OS flush would change, but as you say, performance isn't too shabby.

Peter McMurray

Jul 17, 2016, 11:23:39 PM
to Pick and MultiValue Databases


The speed on a better machine, such as a client's with 8GB of RAM, is such that the operators did not believe the month end had run and tried to do it again. Several thousand PDFs were produced from the standard account and credit card Basic programs, with minor alterations to fit PrintWiz, in under 2 seconds. Then printing to the new HP, with statement paper in one tray and credit card paper in the other, cut a day of noisy agony down to an hour of silence with the occasional beep for more paper. On top of that, the fact that I had sorted the PDFs so that the statement and credit card for the same customer were merged and ready to mail (or, skipping that, just emailed) made the OSFI changes magic.