Temp file location for sorts


Thorr

May 24, 2012, 10:11:58 AM
to sem...@googlegroups.com
I am suddenly having problems doing sorts, and it only occurs when the list
is large enough to require the tsort.com module to make an external sort
file. It appears that it cannot create the file it needs. It's not a space
issue, so it has to be a permissions issue or an external library issue. If
anyone knows which directory the tsort.com program uses by default, I would
appreciate the information so I can track down this problem. Thanks in
advance, everyone!

S.E. Mitchell

May 26, 2012, 7:44:28 AM
to sem...@googlegroups.com
What OS are you running?

Which version of TSE? (you can get this from help, about)

What is the size, and date and time of tsort.com?

If you just run tsort.com from a command prompt, does it display
an error message?

The external sort uses whatever the TEMP or TMP environment
variable points to as the place to store temporary files.
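To check the point above on a misbehaving machine, the minimal Python sketch below resolves the directory that TEMP or TMP points to and tries to create a scratch file in it. It is only a diagnostic under stated assumptions: checking TEMP before TMP and falling back to the current directory are guesses, not documented tsort.com behavior, and resolve_sort_temp_dir and can_create_temp_file are hypothetical names.

```python
import os
import tempfile

def resolve_sort_temp_dir(env=None):
    """Return the directory named by TEMP, else TMP, else the current directory.

    Assumption: TEMP is checked before TMP and "." is the fallback; the thread
    only says the external sort uses whatever TEMP or TMP points to.
    """
    env = os.environ if env is None else env
    for name in ("TEMP", "TMP"):
        value = env.get(name)
        if value:
            return value
    return "."

def can_create_temp_file(directory):
    """Try to create (and automatically delete) a scratch file in directory."""
    try:
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:  # missing directory, access denied, etc.
        return False

if __name__ == "__main__":
    d = resolve_sort_temp_dir()
    print(d, "is writable" if can_create_temp_file(d) else "is NOT writable")
```

If the directory is reported as not writable, that points at permissions or a stale TEMP/TMP setting rather than at the sort itself.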

--
Sammy Mitchell

Campbell, Shaun

May 29, 2012, 8:03:18 AM
to sem...@googlegroups.com
Windows 7
V4.0b
The TSort.com I have is dated 8/3/2004. I had to update it years ago when WinXP SP2 required it to be updated.

My TMP and TEMP environment variables both point to C:\Windows\Temp, which has been unchanged for years, but Win 7 has a default override for both of them that sends them to the user's profile (username\AppData\Local\Temp). I have taken both of those directories and removed all security from them, but it still doesn't play nice. I have been a TSE guy for 15+ years and have never seen this happen before. I am wondering if a library may have gotten corrupted somehow. I have another program that is giving me trouble writing data to the local machine as well, which I cannot figure out. It only happens on this one machine and not on any others, so some error is present. I am just trying to track down possibilities.



Shaun Campbell
Implementation Programmer
Tyler Technologies, Inc.

P: 800.772.2260 ext: 4119
F: 207.781.4606
www.tylertech.com

-----Original Message-----
From: sem...@googlegroups.com [mailto:sem...@googlegroups.com] On Behalf Of S.E. Mitchell
Sent: Saturday, May 26, 2012 7:44 AM
To: sem...@googlegroups.com
Subject: Re: [TSE] Temp file location for sorts

S.E. Mitchell

May 29, 2012, 5:20:34 PM
to sem...@googlegroups.com
I've updated the sort macro to include some additional error checking.
I'll send that directly to you.

You'll need to compile it and then run it.

Please let me know if it catches and displays any errors.

Ross Boyd

May 29, 2012, 5:43:09 PM
to sem...@googlegroups.com
A couple of things to try:
Turn off anti-virus real-time protection temporarily (if that
works, then add an exclusion for TSort to your AV).
Try creating a new Win7 user profile (if it works in the new
profile, then perhaps user security permissions are broken).
Cheers, Ross

Rick C. Hodgin

May 29, 2012, 7:11:52 PM
to sem...@googlegroups.com
Sammi,

You're pretty awesome to be supporting TSE still after all these years
like this. Quite impressive.

Best regards,
Rick C. Hodgin
317-879-6374

S.E. Mitchell

May 29, 2012, 7:21:44 PM
to sem...@googlegroups.com
Everybody needs a little fun sometimes, right? :-)
--
Sammy

Rick C. Hodgin

May 29, 2012, 7:22:27 PM
to sem...@googlegroups.com
Sammy,

Sorry about the spelling. My wife's name is "Sammi" ... I sometimes
mistype the Sammy spelling due to muscle memory. :-)

Best regards,
Rick C. Hodgin

S.E. Mitchell

May 29, 2012, 7:26:24 PM
to sem...@googlegroups.com
No worries :-)

As has been said, you can call me just about anything, only don't call
me late for dinner.

--
Sammy

Larry

May 29, 2012, 7:41:30 PM
to sem...@googlegroups.com

Uh, maybe if TSE could sing, it would sing this to Sammy:
http://www.youtube.com/watch?v=-1CrtWvX_jk

If TSE could sound like Dusty Springfield singing, "What Are You Doing The
Rest Of Your Life", any of us would spend the rest of our lives taking care
of it. :)

Campbell, Shaun

May 30, 2012, 7:57:57 AM
to sem...@googlegroups.com
I compiled the new Sort macro and used it on the machine in question and it works without a hitch. I used it with other macros that call the Sort routine as well and it seems to work fine. (Tests were done with files large enough to require an external sort file.) So far, it works. It has been a few days since the errors were popping up for me and something may have changed since then, but so far so good.

I appreciate the follow up and will keep a close eye on it. If it acts up again, I'll be sure to report the error. Thank you everyone!

S.E. Mitchell

May 30, 2012, 5:46:03 PM
to sem...@googlegroups.com
In the new sort macro, I set the sort_thresh constant to 1; this
will cause the sort to always use the external sort and go to
disk.

Depending on the speed of your machine, you should set this
constant to 2000 to 4000 or so.

In the latest unreleased version of the editor, sort_thresh is
set at 6000.

I read about something called the "Tukey ninther", and added the
technique to the internal (and external) sorts, and it sped it up
a good bit.
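Tukey's ninther picks a quicksort pivot as the median of three medians, each taken over three sampled elements, which makes poor pivots on already sorted or reverse-sorted input much less likely. The Python sketch below shows the technique in a generic in-place quicksort. The thread does not say how TSE's internal or external sorts are structured, so the sample spacing (n // 8), the Lomuto partition, and all function names are assumptions for illustration only.

```python
def median_of_three(a, i, j, k):
    """Return whichever of the indices i, j, k holds the median of a[i], a[j], a[k]."""
    if a[i] < a[j]:
        if a[j] < a[k]:
            return j
        return k if a[i] < a[k] else i
    if a[i] < a[k]:          # here a[j] <= a[i] < a[k]
        return i
    return k if a[j] < a[k] else j

def ninther(a, lo, hi):
    """Tukey's ninther: median of three medians-of-three sampled from a[lo..hi]."""
    n = hi - lo + 1
    step = n // 8            # sample spacing (an assumption; varies by implementation)
    mid = lo + n // 2
    m1 = median_of_three(a, lo, lo + step, lo + 2 * step)
    m2 = median_of_three(a, mid - step, mid, mid + step)
    m3 = median_of_three(a, hi - 2 * step, hi - step, hi)
    return median_of_three(a, m1, m2, m3)

def quicksort(a, lo=0, hi=None):
    """In-place quicksort using the ninther as the pivot (Lomuto partition)."""
    if hi is None:
        hi = len(a) - 1
    while lo < hi:
        p = ninther(a, lo, hi)
        a[p], a[hi] = a[hi], a[p]          # park the pivot at the end
        pivot, store = a[hi], lo
        for i in range(lo, hi):
            if a[i] < pivot:
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[hi] = a[hi], a[store]  # pivot into its final slot
        # Recurse into the smaller side and loop on the larger one.
        if store - lo < hi - store:
            quicksort(a, lo, store - 1)
            lo = store + 1
        else:
            quicksort(a, store + 1, hi)
            hi = store - 1
```

Choosing the pivot this way costs at most 12 comparisons (four median-of-three calls at three comparisons each), which is small next to the partitioning pass it protects.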


On Wed, May 30, 2012 at 7:57 AM, Campbell, Shaun

Campbell, Shaun

May 31, 2012, 9:49:51 AM
to sem...@googlegroups.com
Is that threshold for the number of sorted items, or is it a memory size in bytes? My Win7 machines are the fastest we have, and they max out the specs for 32-bit machines (Intel i7, max memory). We cannot switch to 64-bit mode on them because some of our software won't install on a 64-bit machine. I use TSE for editing very large files (anywhere from 300M to 3.5G) and have had very few times where the file was just too large to sort without crashing. No other editor has kept up with me in that regard. TSE Forever!

knud van eeden

May 31, 2012, 10:39:48 AM
to sem...@googlegroups.com
Shaun,

Side question:

> I use TSE for editing very large files (anywhere from 300M to 3.5G)

So you are successfully editing 3.5-gigabyte files using TSE without any issues, and it is still really fast?

Note:
As far as I know, the limit is (or probably was) 2 gigabytes. The maximum I have tested is about 500 megabytes (0.5 gigabytes), and once 1 gigabyte.

Thanks,

with friendly greetings,
Knud van Eeden




From: "Campbell, Shaun" <shaun.c...@tylertech.com>
To: "sem...@googlegroups.com" <sem...@googlegroups.com>
Sent: Thursday, May 31, 2012 3:49 PM
Subject: RE: [TSE] Temp file location for sorts

Campbell, Shaun

May 31, 2012, 11:05:28 AM
to sem...@googlegroups.com

I edit 3.5G files quite frequently. In order to manage them without crashing, I must turn off the Undo capability. Some large files crash upon loading, of course, but I frequently go above the 2G file size. Newer versions of TSE use a better virtual memory system than the older ones did (according to the author, when I spoke with him on the phone several years ago), and I have had much better luck with large files ever since. 2G files were the maximum before I upgraded to v4.0b. After that upgrade, I can manage files as big as my system will allow. The files I edit are fixed-length data files. I use TSE to search for data irregularities and duplicate records before I attempt to load these files into a database. The search functions in TSE are still the fastest and best, even with these huge files.

knud van eeden

May 31, 2012, 11:27:45 AM
to sem...@googlegroups.com
> After that upgrade, I can manage files as big as my system will allow.

So what is the largest file size you have managed? About 3.5 gigabytes?
Have you ever tried even bigger files?

Thanks

with friendly greetings,
Knud van Eeden

Sent: Thursday, May 31, 2012 5:05 PM
Subject: RE: [TSE] Maximum file size [was: Temp file location for sorts]

Campbell, Shaun

May 31, 2012, 11:33:08 AM
to sem...@googlegroups.com

I think the largest I have ever managed is about 4G. At 4G, I am pushing the limits of both physical and virtual memory in a 32-bit system. I currently have a file that’s 3.6G and another one that is 4.4G. I’ll retrieve them from my archive and see if I can load either of them into TSE.

William W. Viergever

May 31, 2012, 4:43:03 PM
to sem...@googlegroups.com

I haven’t gone over 2G; however, I very often do similar things to Shaun, in particular verifying data layouts, but also searching for things.

As he said, the search is quite fast. The only time-eater is the initial load, as I sit and watch the record counter (in the status line) spin rapidly.

I run Win 7 64-bit with 16 GB of RAM and all SAS drives.

--------------------------------------------------------------
William W. Viergever
Viergever & Associates
Health Data Analysis / Systems Design & Development
2920 Arden Way Suite N
Sacramento, CA 95825
wil...@viergever.net
www.viergever.net
 (916) 483-8398
--------------------------------------------------------------

S.E. Mitchell

May 31, 2012, 5:21:50 PM
to sem...@googlegroups.com
It is the number of items to sort, e.g., the number of lines in
the block.

So, if you set it to 6000 and the block to be sorted has 6000 or
fewer lines, the internal sort will be used. If there are more
than 6000 lines to sort, the block will be saved to disk, the
external sort will be called to sort the file, and then the file on
disk will be loaded, replacing the block.

On my machine, around 6000 lines is the break-even point.
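The threshold rule described above amounts to a simple dispatch: small blocks are sorted in memory, larger ones are written to a temporary file, sorted externally, and read back. The sketch below is a hypothetical Python rendering of that flow, not the TSE macro: sort_block, naive_file_sort, and the external_sort callback are invented names, naive_file_sort merely stands in for tsort.com, and lines are assumed to contain no newline characters. tempfile.mkstemp places the file in Python's temp directory, which Python derives from the TMPDIR, TEMP, or TMP environment variables when they are set.

```python
import os
import tempfile

SORT_THRESH = 6000  # break-even point reported for one machine; tune locally

def naive_file_sort(path):
    """Hypothetical stand-in for tsort.com: sorts the lines of a text file in place."""
    with open(path) as f:
        lines = [line.rstrip("\n") for line in f]
    lines.sort()
    with open(path, "w") as f:
        f.writelines(line + "\n" for line in lines)

def sort_block(lines, external_sort, thresh=SORT_THRESH):
    """Sort a list of lines in memory, or via a temp file when it exceeds thresh."""
    if len(lines) <= thresh:
        return sorted(lines)                    # internal sort, no disk I/O
    fd, path = tempfile.mkstemp(suffix=".srt")  # temp file in Python's temp dir
    try:
        with os.fdopen(fd, "w") as f:           # save the block to disk
            f.writelines(line + "\n" for line in lines)
        external_sort(path)                     # sort the file on disk
        with open(path) as f:                   # load the sorted file back
            return [line.rstrip("\n") for line in f]
    finally:
        os.remove(path)

if __name__ == "__main__":
    block = ["pear", "apple", "fig"]
    print(sort_block(block, naive_file_sort, thresh=1))  # ['apple', 'fig', 'pear']
```

Passing thresh=1 mirrors the test setting mentioned earlier in the thread, where every sort is forced through the disk path.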

On Thu, May 31, 2012 at 9:49 AM, Campbell, Shaun

Rick C. Hodgin

May 31, 2012, 5:25:52 PM
to sem...@googlegroups.com
Sammy,

Why isn't the same sort algorithm used internally and externally? BTW,
following your previous post, I looked up the Tukey Ninther algorithm.
Interesting. :-)

Best regards,
Rick C. Hodgin

S.E. Mitchell

May 31, 2012, 5:42:47 PM
to sem...@googlegroups.com
The same general sorting algorithm is used both internally and
externally.

However, the swapping portion is somewhat different.

Externally, a memory load of data is read (by "memory load" I mean
roughly 1/5 or so of the 'apparently' available memory, i.e., memory
that is not paged out, which is hard to get right in a virtual
memory system). An array of pointers is built, and then the memory
load is sorted. The actual lines are not swapped; only the pointers
are. The file is then written back out in sorted pointer order, not
in array order.

This process is repeated until the entire file has been read.

Next, a merge sort is performed on the 'runs' created above. (1)
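The external pass described above (read a memory load, sort an array of pointers rather than the lines, write each sorted run out, then merge the runs) has the shape of a classic sort-merge. The Python sketch below illustrates that shape under simplifying assumptions, and it is not tsort's actual code: a "memory load" is a fixed number of lines rather than a fraction of available memory, list indices play the role of pointers, all runs are merged in a single pass with heapq.merge, and sort_merge_file and lines_per_run are hypothetical names.

```python
import heapq
import os
import tempfile

def sort_merge_file(in_path, out_path, lines_per_run=100_000):
    """Sort a text file line by line: build sorted runs, then k-way merge them."""
    run_paths = []
    try:
        with open(in_path) as src:
            while True:
                # Read one "memory load" (here simply a fixed number of lines).
                chunk = [line.rstrip("\n")
                         for _, line in zip(range(lines_per_run), src)]
                if not chunk:
                    break
                # Sort an array of indices ("pointers"); the strings stay put.
                order = sorted(range(len(chunk)), key=chunk.__getitem__)
                # Write the run out in pointer order.
                fd, run_path = tempfile.mkstemp(suffix=".run")
                run_paths.append(run_path)
                with os.fdopen(fd, "w") as run:
                    run.writelines(chunk[i] + "\n" for i in order)
        # Merge all runs in a single pass; each run is already sorted.
        runs = [open(p) for p in run_paths]
        try:
            with open(out_path, "w") as dst:
                dst.writelines(heapq.merge(*runs, key=lambda s: s.rstrip("\n")))
        finally:
            for run in runs:
                run.close()
    finally:
        for p in run_paths:
            os.remove(p)
```

In Python a list sort already moves only object references, so the index array is mostly illustrative here; the merge key strips the newline so that run order and merge order compare exactly the same strings.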

Internally, the editor's editing buffer structure does not lend
itself to easily swapping pointers (there are no pointers to
individual lines), and so each line is physically swapped as
needed.

This makes the internal sort very slow :-)

(1) I spent many hours testing this, and found that relying on
virtual memory (e.g., just loading the whole file and sorting it
via one of the fast internal sort methods) was much slower than a
traditional sort-merge; therefore tsort uses a traditional
sort-merge. I also tested using memory-mapped files and got the
same result: sort-merge was faster.

I would not be surprised if this changes in the not too distant
future - maybe even in a few years.

Rick C. Hodgin

May 31, 2012, 6:23:48 PM
to sem...@googlegroups.com, S.E. Mitchell
Sammy,

I guess that's still my question. Why not use the exact same sorting
algorithm source code, but run it from within TSE? If you have to do a
big sort, write it out to a file, then call your internal code within
TSE which does the sort exactly the way it does right now, and then
writes the file back out, and then returns to the calling function which
reads it back in (or however it does it).

Why spawn a separate process to do it? That's the part I'm not
understanding.

Best regards,
Rick C. Hodgin
