Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Why would SysStemSort not use 100% CPU?

64 views
Skip to first unread message

Swifty

unread,
May 25, 2012, 8:05:17 AM5/25/12
to
I'm sorting a stem which contains the details of every file in a
directory. There will be 100's of thousands of them.

I'm watching the CPU use in Process Explorer under WindowsXP.

The sysstemsort() is proceeding at a steady 44% CPU, which is 88% of
one of my two processors.

The system overall is running at around 70% CPU, never going above 75%
so I'm not capped by CPU.

What might the REXX process be doing that is NOT using the CPU? It's
not System overhead as I would see that on the graph - there is none
visible.

If I force the rexx process to run on just one of my processors, its
CPU usage goes down. So presumably it is switching processors.

So perhaps the missing 12% is consumed in switching processors, but
that seems a huge overhead.

This is with oorexx 4.1.1

--
Steve Swift
http://www.swiftys.org.uk/swifty.html
http://www.ringers.org.uk

Swifty

unread,
May 25, 2012, 9:38:58 AM5/25/12
to
On Fri, 25 May 2012 13:05:17 +0100, Swifty <steve....@gmail.com>
wrote:

>I'm sorting a stem which contains the details of every file in a
>directory. There will be 100's of thousands of them.

The sort took just under 90 minutes. There are 547,691 files, so
that's about half a second of CPU per line sorted.

I know that SysStemSort is not one of the fastest sorts around, but is
this reasonable?

Jeremy Nicoll - news posts

unread,
May 25, 2012, 10:29:25 AM5/25/12
to
Swifty <steve....@gmail.com> wrote:

> What might the REXX process be doing that is NOT using the CPU? It's not
> System overhead as I would see that on the graph - there is none visible.

Paging?

Does Task Manager show lots of page faults happening and/or lots of IO?

--
Jeremy C B Nicoll - my opinions are my own.

Email sent to my from-address will be deleted. Instead, please reply
to newsre...@wingsandbeaks.org.uk replacing "aaa" by "284".

Shmuel Metz

unread,
May 25, 2012, 9:23:23 AM5/25/12
to
In <kpsur75pllgepqchu...@4ax.com>, on 05/25/2012
at 01:05 PM, Swifty <steve....@gmail.com> said:

>What might the REXX process be doing that is NOT using the CPU?

Paging? How big is its working set and how much RAM do you have? Do
you have a way to monitor I/O to the "swap" file?

--
Shmuel (Seymour J.) Metz, SysProg and JOAT <http://patriot.net/~shmuel>

Unsolicited bulk E-mail subject to legal action. I reserve the
right to publicly post or ridicule any abusive E-mail. Reply to
domain Patriot dot net user shmuel+news to contact me. Do not
reply to spam...@library.lspace.org

LesK

unread,
May 25, 2012, 10:19:05 PM5/25/12
to
Swifty wrote:
> On Fri, 25 May 2012 13:05:17 +0100, Swifty <steve....@gmail.com>
> wrote:
>
>> I'm sorting a stem which contains the details of every file in a
>> directory. There will be 100's of thousands of them.
>
> The sort took just under 90 minutes. There are 547,691 files, so
> that's about half a second of CPU per line sorted.
>
> I know that SysStemSort is not one of the fastest sorts around, but is
> this reasonable?
>
Have you compared its performance against any other sorter? For instance, the
one I sent you back in Feb?

Les

--

Les (Change Arabic to Roman to email me)

Bob Babcock

unread,
May 26, 2012, 12:03:29 AM5/26/12
to
Swifty <steve....@gmail.com> wrote in
news:5t0vr7d53euhogp09...@4ax.com:

> The sort took just under 90 minutes. There are 547,691 files, so
> that's about half a second of CPU per line sorted.
>
> I know that SysStemSort is not one of the fastest sorts around, but is
> this reasonable?

It's horrible. I just timed a merge sort in C++ with 1.1 million files.
Sorting by unicode file name takes about 12 seconds. A natural number sort
(2 sorts before 10) takes 27 seconds. Sorting by size (where the
comparison function is simpler) takes about 1 second. My sort shuffles an
index array; perhaps SysStemSort is moving the complete record for every
swap.

It might be interesting to test your sort with different number of files to
see if the time is proportional to n^2, nlog(n),... n^2 would indicate
that SyStemSort is using a bad algorithm.

Swifty

unread,
May 26, 2012, 2:34:02 AM5/26/12
to
On Sat, 26 May 2012 04:03:29 +0000, Bob Babcock
<wss...@nospam.wssddc.com> wrote:

>It might be interesting to test your sort with different number of files to
>see if the time is proportional to n^2, nlog(n),... n^2 would indicate
>that SyStemSort is using a bad algorithm.

That should be fairly simple to implement in a benchmark. I may
attempt it.

Your comment about the index array is my way forwards.
Each line in the output from SysFileTree('*','F','FL') looks like
this:

1950-09-16 01:35:17 ...

... and I have 547,691 elements in the stem. I can easily create an
index stem containing date time entry# :

1950-09-16 01:35:17 1
2012-05-26 07:25:49 2

... and so on. I sort this index stem, then process it in order, using
word(,3) as the entry number in the original stem.

The actual program is a routine I wrote which handles directories
where files accumulate without limit - temp directories, log
directories, and so on.
The Directory with 547,691 files is the images from my recently
installed security camera setup.
I would never have ended up with so many files but for a minor error
recently introduced which was preventing the code from deleting
anything...

Based on: http://www.swiftys.org.uk/wiz?1451, I'd noticed that my C:
drive seemed to be filling up, and thought "That's funny". :-)

Swifty

unread,
May 26, 2012, 3:01:00 AM5/26/12
to
On Fri, 25 May 2012 09:23:23 -0400, Shmuel (Seymour J.) Metz
<spam...@library.lspace.org.invalid> wrote:

>Paging? How big is its working set and how much RAM do you have?

I didn't check the working set, but REXX used a maximum of 388Mb
private storage (according to Process Explorer) in a system with 3Gb
and not much else going on. The CPU use graph was as flat as Table Top
Mountain, and I'd have expected some serious deviations if paging had
been happening.

I'll try a more rigorous test if the conditions arise again. My
attention span is also like Table Top Mountain; negligible when I was
an infant, slight through my adult life, and back to negligible now
I'm a doddering old fool.

Sahananda

unread,
May 26, 2012, 4:52:17 AM5/26/12
to
I don't know if this would be faster with your data and setup, but it might be worth a try:

array = stem.~makeArray
do row over array~sort
...
end

hth

Jon

Glenn Knickerbocker

unread,
May 26, 2012, 2:49:07 PM5/26/12
to
On Sat, 26 May 2012 04:03:29 +0000, Bob Babcock wrote:
>It might be interesting to test your sort with different number of files to
>see if the time is proportional to n^2, nlog(n),... n^2 would indicate
>that SyStemSort is using a bad algorithm.

Whatever it is, it hates a stem that's already sorted but loves one
sorted in two halves:

>say f.0
>9059
> ........................................... rexxtry.rex on WindowsNT
>call time 'R';call sysstemsort 'F.';say time('E')
>0.016000
> ........................................... rexxtry.rex on WindowsNT
>call time 'R';call sysstemsort 'F.';say time('E')
>0.656000
> ........................................... rexxtry.rex on WindowsNT
>call sysstemcopy 'F.','F.',1,9060
> ........................................... rexxtry.rex on WindowsNT
>say f.0
>18118
> ........................................... rexxtry.rex on WindowsNT
>call time 'R';call sysstemsort 'F.';say time('E')
>0.031000
> ........................................... rexxtry.rex on WindowsNT
>call time 'R';call sysstemsort 'F.';say time('E')
>2.750000
> ........................................... rexxtry.rex on WindowsNT
>call time 'R';call sysstemsort 'F.',,,,,2;say time('E')
>0.609000
> ........................................... rexxtry.rex on WindowsNT
>call time 'R';call sysstemsort 'F.';say time('E')
>0.375000
> ........................................... rexxtry.rex on WindowsNT

Double it again and it takes almost 30 seconds. This would easily
explain Swifty's 90 minutes if his list started out already sorted or
nearly so. Steve, try sorting the same list on the times first and then
on the dates and I bet it'll take a few seconds.

ŹR http://users.bestweb.net/~notr/hassel.html
The worst thing that can happen is, tasty olives. --Stevven

Bob Babcock

unread,
May 26, 2012, 11:20:48 PM5/26/12
to
Glenn Knickerbocker <No...@bestweb.net> wrote in
news:lm72s7dmh3pv607i0...@4ax.com:

> Whatever it is, it hates a stem that's already sorted but loves one
> sorted in two halves:

Quicksort doesn't like already sorted input. If the implementation isn't
careful, it can degenerate to n-squared performance. Perhaps that's what's
happening with SysStemSort.

Swifty

unread,
May 28, 2012, 2:42:52 AM5/28/12
to
On Sat, 26 May 2012 14:49:07 -0400, Glenn Knickerbocker
<No...@bestweb.net> wrote:

>try sorting the same list on the times first and then
>on the dates and I bet it'll take a few seconds.

I'll give that a try, but I can see a possible problem here.

SysStemSort is (or maybe was) documented as being an "unstable" sort,
which means that rows with equal keys may not stay in the order
presented.

So, if I sort by time first I could get

2012-05-28 00:17:59
1950-09-16 01:30:29
1950-09-16 02:21:15

But then after sorting by date I could get:

1950-09-16 02:21:15
1950-09-16 01:30:29
2012-05-28 00:17:59

I'm sorting file by date/time ascending (oldest first) because one of
my "temporary file removal" algorithms is "Keep only N newest files in
directory".

In the example above, with N=2, I'd erase the file from 02:21:15
whilst keeping the file from 01:30:29 on the same day.

Swifty

unread,
May 28, 2012, 2:49:35 AM5/28/12
to
On Sat, 26 May 2012 01:52:17 -0700 (PDT), Sahananda
<saha...@gmail.com> wrote:

>I don't know if this would be faster with your data and setup, but it might be worth a try:
>
>array = stem.~makeArray
>do row over array~sort

I shall certainly try it, on the basis of "Never predict the result of
an experiment before you've tried it".

Glenn Knickerbocker

unread,
May 28, 2012, 12:14:43 PM5/28/12
to
On Mon, 28 May 2012 07:42:52 +0100, Swifty wrote:
><No...@bestweb.net> wrote:
>>try sorting the same list on the times first and then
>>on the dates and I bet it'll take a few seconds.
>SysStemSort is (or maybe was) documented as being an "unstable" sort,

So sort on the times first and then on the date and time together (or on
the full record). The point isn't to avoid sorting on the full key on
the second pass, it's just to avoid starting with an already sorted file.

http://users.bestweb.net/~notr Official President of Kevin S. Wilson
"Yeah, and there's a little bit of Elvis in Michael J. Fox, too, but
unfortunately it's the part that makes him shake uncontrollably." ŹR

Glenn Knickerbocker

unread,
May 28, 2012, 12:17:05 PM5/28/12
to
On Sat, 26 May 2012 01:52:17 -0700 (PDT), Sahananda wrote:
>array = stem.~makeArray

Uh, that makes an array of the indices, not the values . . .

http://users.bestweb.net/~notr "The notion of objecting to a fake Web
ŹR site on the grounds that it might possibly incite other people
to do bad things is so dangerous to our constitutionally protected
freedoms that it must never be mentioned, even in jest." --Matt McIrvin

Rick McGuire

unread,
May 28, 2012, 12:29:31 PM5/28/12
to
Use stem.~allItems to get the items.

Rick

Swifty

unread,
May 29, 2012, 4:51:12 AM5/29/12
to
On Mon, 28 May 2012 12:14:43 -0400, Glenn Knickerbocker
<No...@bestweb.net> wrote:

>So sort on the times first and then on the date and time together (or on
>the full record). The point isn't to avoid sorting on the full key on
>the second pass, it's just to avoid starting with an already sorted file.

It's logic like this that explains why I paid good money for a 24"
display, and do my editing in a 10-point font (a compact one, at that)

It's so I have room for about 200 characters in the comment explaining
why I'm doing something so apparently irrational as sorting by time
before sorting on the date + time.

:-)

Oh, for a SysStemStableSort.

Glenn Knickerbocker

unread,
May 29, 2012, 11:50:44 AM5/29/12
to
On 5/28/2012 12:29 PM, Rick McGuire wrote:
> Use stem.~allItems to get the items.

Man, where were my eyes last time I tried to find that? So then you can
*almost* put the items right back into the stem all in one step, except
for that pesky STEM.0:

stem.~remove(0)
stem.~putAll(stem.~allItems~sort)
stem.0 = stem.~items

This presumes, of course, that you're sure there aren't any stray or
missing items in your stem to worry about.

ŹR

LesK

unread,
May 29, 2012, 12:21:18 PM5/29/12
to
Swifty wrote:
> On Mon, 28 May 2012 12:14:43 -0400, Glenn Knickerbocker
> <No...@bestweb.net> wrote:
>
>> So sort on the times first and then on the date and time together (or on
>> the full record). The point isn't to avoid sorting on the full key on
>> the second pass, it's just to avoid starting with an already sorted file.
>
> It's logic like this that explains why I paid good money for a 24"
> display, and do my editing in a 10-point font (a compact one, at that)
>
> It's so I have room for about 200 characters in the comment explaining
> why I'm doing something so apparently irrational as sorting by time
> before sorting on the date + time.
>
> :-)
>
> Oh, for a SysStemStableSort.
>
Or you could use a multi-column sorter. Like the one I sent you before.

Swifty

unread,
May 29, 2012, 2:34:20 PM5/29/12
to
On Tue, 29 May 2012 11:50:44 -0400, Glenn Knickerbocker
<No...@bestweb.net> wrote:

>This presumes, of course, that you're sure there aren't any stray or
>missing items in your stem to worry about.

Well, in my case we know that won't happen, as the original stem is
generated by SysFileTree() and hasn't been manipulated.

Glenn Knickerbocker

unread,
May 29, 2012, 3:17:46 PM5/29/12
to
On 5/29/2012 2:34 PM, Swifty wrote:
> On Tue, 29 May 2012 11:50:44 -0400, Glenn Knickerbocker
> <No...@bestweb.net> wrote:
>>This presumes, of course, that you're sure there aren't any stray or
>>missing items in your stem to worry about.
> Well, in my case we know that won't happen, as the original stem is
> generated by SysFileTree() and hasn't been manipulated.

As long as you haven't accidentally reused the name!

ŹR

Sahananda

unread,
May 30, 2012, 1:33:46 AM5/30/12
to
I notice that one of the bugfixes addressed in 4.1.1 addresses a performance issue with sysStemSort working on sysFileTree results
http://sourceforge.net/tracker/?func=detail&aid=2846301&group_id=119701&atid=684730

hth Jon

Rick McGuire

unread,
May 30, 2012, 6:29:33 AM5/30/12
to
Hmmmm, including that in the list was a mistake. That fix is only in
the trunk version, not the 4.1.1 branch. Because this was a complete
rewrite, I was hesitant to include this in a bug fix release without a
good set of regression tests for the function. No volunteers have
stepped forward to help write this.

Rick
0 new messages