
how to match groups of files


Bill Ashton

Apr 12, 2012, 10:04:54 PM
Hi again!

I just ran into something unusual, and need some philosophical guidance.

I have two lists of files, and I need to see if there are duplicates among
the list of group1, and between group1 and group2.

For example, group1 might have:
My.Test.File1
My.Test.File2
My.Test.File3
My.Test.File2

Group 2 might have
My.Test.File4
My.Test.File5
My.Test.File6
My.Test.File3

In Group 1, I need to identify the duplicate File2 entries
I need to also identify that File3 in Group1 already exists in Group2.

Now these groups could have up to 30 or 40 files each.

I have them in two different stems: filelist1_dsn.1 = My.Test.File1,
etc., and filelist2_dsn.1 = My.Test.File4, etc.

I was doing a double For loop:
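Do a = 1 To filelist1_dsn.0
  Do b = 1 To filelist2_dsn.0
    /* equal names mean the file is in both groups: this is a problem */
    If filelist1_dsn.a == filelist2_dsn.b Then Say filelist1_dsn.a 'is in both groups'
  End
End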

Then I realized that for a list of 2 or 3 files this is no big deal, but if
both lists have 30 files, this is a big problem.

So the questions are:
1. How can I check Group 1 for duplicates in the most efficient manner?
2. How can I match Group 1 against Group 2 to identify duplicates (if any)
without this horrible loop in a loop?

Stems are not mandatory, but I want to keep it all in my rexx program if
possible. The order is important, as this is a small piece of a big
process, and I can't sort and rearrange the lists without being able to get
the original sequence back.

Thanks in advance for your continued wisdom - I really enjoy the
discussions here!
Billy

--
Thank you and best regards,
*Billy Ashton*

----------------------------------------------------------------------
For TSO-REXX subscribe / signoff / archive access instructions,
send email to LIST...@VM.MARIST.EDU with the message: INFO TSO-REXX

Paul Gilmartin

Apr 12, 2012, 10:42:07 PM
On Apr 12, 2012, at 20:04, Bill Ashton wrote:

> Hi again!
>
> I just ran into something unusual, and need some philosophical guidance.
>
> I have two lists of files, and I need to see if there are duplicates among
> the list of group1, and between group1 and group2.
>
> For example, group1 might have:
> My.Test.File1
> My.Test.File2
> My.Test.File3
> My.Test.File2
>
> Group 2 might have
> My.Test.File4
> My.Test.File5
> My.Test.File6
> My.Test.File3
>
> In Group 1, I need to identify the duplicate File2 entries
> I need to also identify that File3 in Group1 already exists in Group2.
>
Wouldn't associative arrays, well supported by Rexx, solve
this problem for you?

-- gil

Höglund Lars

Apr 13, 2012, 8:57:44 AM
Something like (not tested)

GRP1. = ''
Value = 'My.Test.File1'
If GRP1.value = '' then
  do
    GRP1.value = 'x'
  End
Else
  Say 'duplicate in GRP1' value
.
Value = 'My.Test.File30'


GRP2. = ''
GRP2.1 = value1fromgrp2
.
GRP2.0 = 30

Do i1 = 1 to GRP2.0
  Thevalue = GRP2.i1
  If GRP1.thevalue = 'x' then say 'exists'
  Else say 'missing'
End

//Lasse

-----Original Message-----
From: TSO REXX Discussion List [mailto:TSO-...@VM.MARIST.EDU] On Behalf Of Bill Ashton
Sent: 13 April 2012 04:06
To: TSO-...@VM.MARIST.EDU
Subject: [TSO-REXX] how to match groups of files

Bill Ashton

Apr 13, 2012, 9:00:36 AM
Thanks, Gil! I have not used these before, and it sounds like it will work
great.

To make sure I am on the right track, does this sound right?
1. Init all myfile. tails to 0
2. For group1, instead of setting
filelist1_dsn.# = dsname I believe I would use
myfile.dsname = 1
3. Then all I need to do with either the same group or the other group is
test the name:
If myfile.testdsn Then [error processing]... (or If myfile.testdsn = 1
Then ...)

Can it really be this simple, or did I miss something? I know this is only
a rough pseudocode, but it looks too easy...
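
For illustration, the three steps above might be sketched as follows. This is only a minimal sketch, assuming the names are already in the filelist1_dsn. stem from the original post; the sample names and the dups counter are made up for the example, and the original stem is left untouched, so the original order is preserved.

```rexx
/* Sketch only: sample data and the dups counter are illustrative */
filelist1_dsn.1 = 'My.Test.File1'
filelist1_dsn.2 = 'My.Test.File2'
filelist1_dsn.3 = 'My.Test.File2'
filelist1_dsn.0 = 3

myfile. = 0                       /* step 1: every tail defaults to 0 */
dups = 0
do i = 1 to filelist1_dsn.0
  dsname = filelist1_dsn.i
  if myfile.dsname then do        /* step 3: name was already recorded */
    say dsname 'is a duplicate (entry' i')'
    dups = dups + 1
  end
  else myfile.dsname = 1          /* step 2: record the name as seen */
end
```

The same test (if myfile.testdsn then ...) can then be run against each name in the second list without touching myfile. again.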
Billy
--
Thank you and best regards,
*Billy Ashton*

John McKown

Apr 13, 2012, 9:08:46 AM
I agree with gil. Use two associative arrays (group1. and group2.). If the
groups are in a file or files which you are reading, you'll also need
two normal variables, say dsn_group1 and dsn_group2, to keep track of the
dsns in each corresponding associative array. Set the default for each
array to 0. For each dsn in the group, use the dsn as the index value
and simply add 1 to its count. If you have the "tracking" variables, append
the dsn to the group's "tracking" variable, unless it is already there.

Once you have the associative arrays populated, report as duplicates all
dsns (taken from the group's "tracking" variable) which have a count > 1
in that group's array. For duplicates between the two groups, look for a
count > 0 in each group's array, but use the dsn values from the other
group's "tracking" variable.

That's the general algorithm.
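
A minimal sketch of that algorithm, under a few assumptions: the names count1./count2., names1/names2, within and across are illustrative rather than taken from the post, the "tracking" variables are blank-delimited word lists (which works because dsns contain no blanks), and the sample data is from the original question. Only group 1's tracking variable is walked here, which is enough to find names common to both groups.

```rexx
/* Sketch only: all variable names here are illustrative */
group1 = 'My.Test.File1 My.Test.File2 My.Test.File3 My.Test.File2'
group2 = 'My.Test.File4 My.Test.File5 My.Test.File6 My.Test.File3'

count1. = 0; names1 = ''          /* counts default to 0 */
do i = 1 to words(group1)
  dsn = word(group1, i)
  if count1.dsn = 0 then names1 = names1 dsn   /* track first sighting */
  count1.dsn = count1.dsn + 1
end

count2. = 0; names2 = ''
do i = 1 to words(group2)
  dsn = word(group2, i)
  if count2.dsn = 0 then names2 = names2 dsn
  count2.dsn = count2.dsn + 1
end

within = ''; across = ''
do i = 1 to words(names1)         /* walk group 1 in original order */
  dsn = word(names1, i)
  if count1.dsn > 1 then within = within dsn   /* duplicate inside group 1 */
  if count2.dsn > 0 then across = across dsn   /* also present in group 2 */
end
say 'Duplicated within group 1:' strip(within)
say 'Present in both groups:   ' strip(across)
```

Keeping counts rather than simple flags costs nothing extra and also tells you how many times each name occurred.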
--
John McKown
Maranatha! <><

Arthur T.

Apr 13, 2012, 9:14:50 AM
On 12 Apr 2012 19:04:54 -0700, in bit.listserv.tsorexx
(Message-ID:<CAPtSOKzFKhek1da2G5ZdNUk1yBK=rWOsZ5Abhz1...@mail.gmail.com>)
bill00...@GMAIL.COM (Bill Ashton) wrote:

>I have two lists of files, and I need to see if there are
>duplicates among
>the list of group1, and between group1 and group2.

Quick and dirty version of the solution I'd use:

<code>

/* In a real program, these assignments would be in
loops*/
filelist1_dsn.1 = 'My.Test.File1'
filelist1_dsn.2 = 'My.Test.File2'
filelist1_dsn.3 = 'My.Test.File3'
filelist1_dsn.4 = 'My.Test.File2'
filelist1_dsn.0 = 4

filelist2_dsn.1 = 'My.Test.File4'
filelist2_dsn.2 = 'My.Test.File5'
filelist2_dsn.3 = 'My.Test.File6'
filelist2_dsn.4 = 'My.Test.File3'
filelist2_dsn.0 = 4

true = (0=0)
false = \true

f1. = false

do i = 1 for filelist1_dsn.0
  a = filelist1_dsn.i
  if f1.a then say a 'is a duplicate within filelist 1'
  else f1.a = true
end

do i = 1 for filelist2_dsn.0
  a = filelist2_dsn.i
  if f1.a then say a 'is a duplicate from filelist 2'
end

</code>


--
I cannot receive mail at the address this was sent from.
To reply directly, send to ar23hur "at" pobox "dot" com

Bob Bridges

Apr 13, 2012, 9:25:21 AM
He didn't spell it out, Bill, but I think he's referring to what is also my
favorite solution to such a problem. I'll back out to the point at which
you're reading in those DSNs into the program in the first place, but you
can adapt this to your own situation:

'EXECIO * DISKR DSNLIST1 (FINIS'
list1.=0 /* it's not on list1 unless I say it is */
do queued()
parse pull dsn . /* assume no leading spaces */
list1.dsn=1; end

'EXECIO * DISKR DSNLIST2 (FINIS'
do queued()
parse pull dsn .
if list1.dsn then queue dsn; end

The first paragraph builds you a list of Boolean values; if list1.<any DSN>
is true, then the DSN is in the first list. The second paragraph checks the
first list; if there's a match, put the DSN back on the stack. By the end
of the second loop you have on the stack a list of DSNs that appear in both
lists.

Oh, wait, I just noticed; you also want to be sure there are no dups within
list 1. But maybe you already see how to work that - just add a line to the
first loop:

'EXECIO * DISKR DSNLIST1 (FINIS'
list1.=0 /* it's not on list1 unless I say it is */
do queued()
parse pull dsn . /* assume no leading spaces */
if list1.dsn then call DupListIn1 /* <--- */
else list1.dsn=1; end

---
Bob Bridges, rhb...@attglobal.net, cell 336 382-7313

/* A good programmer is someone who looks both ways before crossing a
one-way street. -Doug Linder */

-----Original Message-----
From: Paul Gilmartin
Sent: Thursday, April 12, 2012 22:41

Wouldn't associative arrays, well supported by Rexx, solve
this problem for you?


Paul Gilmartin

Apr 13, 2012, 9:45:21 AM
On Apr 13, 2012, at 06:26, Bill Ashton wrote:

> Thanks, GIl! I have not used these before, and it sounds like it will work
> great.
>
> Do make sure I am on the right track, does this sound right?
> 1. Init all myfile. tails to 0
> 2. For group1, instead of setting
> filelist1_dsn.# = dsname I believe I would use
> myfile.dsname = 1
> 3. Then all I need to do with either the same group or the other group is
> test the name:
> If myfile.testdsn Then [error processing]... (or If myfile.testdsn = 1
> Then ...)
>
> Can it really be this simple, or did I miss something? I know this is only
> a rough pseudocode, but it looks too easy...
> Billy
>
Looks very much the right track.

Happy coding,

Rob Zenuk

Apr 13, 2012, 10:05:23 AM
Is this as simple as using the built-in symbol() to test existence?

group1 = 'my.test.file1 my.test.file2 my.test.file3 my.test.file2'
do i=1 to words(group1)
  file = word(group1,i)
  if symbol('group1.file') = 'LIT' then
    group1.file = ''
  else
    say 'file' file 'already exists in GROUP1'
end

Rob



Ward Able, Grant

Apr 13, 2012, 10:22:46 AM
I was not sure of the term "associative arrays", as this was the first time I had come across it, so I googled it and found this, which may be useful:

http://www.toward.com/cfsrexx/os2-mag/9412.htm





Regards - Grant.


Rob Zenuk

Apr 13, 2012, 10:29:37 AM
I thought about it a little more... Still waiting for the coffee to finish
brewing...

group.1 = 'my.test.file1 my.test.file2 my.test.file3 my.test.file2'
group.2 = 'my.test.file4 my.test.file5 my.test.file6 my.test.file3'
group.3 = 'my.test.file1 my.test.file5 my.test.file7 my.test.file7'
group.0 = 3
do g=1 to group.0
  do f=1 to words(group.g)
    file = word(group.g,f)
    if symbol('group.file') = 'LIT' then
      group.file = g
    else
      say 'file' file 'already exists in GROUP' group.file
  end
end

On a side note, I just thought of another purpose or feature for RXVARS...
Simply return the number of variables found... For stems this would avoid
having to keep a stem.0 maintained... Granted this is probably only valid
for test programs since you would probably have a counter anyway in most
real life programs. In its current incarnation, using queued() can get the
count.

My code above would look like this:

group.1 = 'MY.TEST.FILE1 MY.TEST.FILE2 MY.TEST.FILE3 MY.TEST.FILE2'
group.2 = 'MY.TEST.FILE4 MY.TEST.FILE5 MY.TEST.FILE6 MY.TEST.FILE3'
group.3 = 'MY.TEST.FILE1 MY.TEST.FILE5 MY.TEST.FILE7 MY.TEST.FILE7'
"NEWSTACK"
call rxvars 'GROUP.'
group.0 = queued()
"DELSTACK"
do g=1 to group.0
  do f=1 to words(group.g)
    file = word(group.g,f)
    if symbol('group.file') = 'LIT' then
      group.file = g
    else
      say 'file' file 'already exists in GROUP' group.file
  end
end

Coffee is finally done... Hi-ho, hi-ho...

Rob



Bill Ashton

Apr 13, 2012, 5:07:00 PM
Gil, this was a great tip! I managed to finally get to this code this
afternoon, and it works great. Thanks for pointing me in the right
direction!
Billy
--
Thank you and best regards,
*Billy Ashton*

Andreas Fischer

Apr 16, 2012, 4:18:24 AM
hi,

you posted this question in the rexx discussion list, so you probably want
a rexx solution. but if you are asking for the most efficient way to do this,
then i recommend using ICETOOL, though i have never tried to run
ICETOOL from within a rexx program, only in regular batch jobs.

regards,
andi



Bob Bridges

Apr 16, 2012, 7:26:43 PM
You know the old saying, "to him who has only a hammer, everything looks
like a nail". I laugh and appreciate the saying, but I know it applies to
myself too; I have a number of favorite techniques, platforms, applications
etc, and probably use them out of habit even when another technique might
work as well or better.

But there's some defense for that. I've noticed that a number of decisions
can go a lot faster just by picking what I know and am used to. Put it this
way: I'm a keyboard kind of guy, and much prefer to hit <Ctl-Ins> to copy
some text and <Shift-Ins> to paste it, because I'd rather not move my hand
all the way over to the mouse to do the same task. But if I try to teach
someone else to do that, they can waste an awful lot of time, before they're
used to that method, stopping to think about what keystrokes I named. Could
be it'd be better for them just to stick to what they're used to, unless
they're really serious about shaving that all-important quarter-second from
the operation.

It's kind of the same thing here. I can write a REXX that does this in 60
or 90 seconds, and 80% of the time if I don't fat-finger the DSNs it'll work
right the first time. I'm not quarreling with your assertion that ICETOOL
will do it faster and better, because I don't know; I use it only
occasionally. But I'd need a good reason to make the experiment, if you
follow my thinking.

---
Bob Bridges, rhb...@attglobal.net, cell 336 382-7313

/* A Freudian slip is when you mean one thing and say your mother. */

-----Original Message-----
From: Adrian Stern
Sent: Monday, April 16, 2012 05:24

I've used it and it's great - makes this kind of problem disappear in a
twinkle, but these guys seem to like doing things the hard way, though I
must commend the final solution to this problem - really elegant


---
> From: Bill Ashton <bill00...@GMAIL.COM>
> Date: 13.04.2012 04:05
>
> I have two lists of files, and I need to see if there are duplicates
> among the list of group1, and between group1 and group2....In Group 1, I
> need to identify the duplicate File2 entries. I need to also identify that
> File3 in Group1 already exists in Group2.

Mickey

Apr 17, 2012, 12:41:56 AM
I tend to be the same way, but this is one of those occasions when I leave
Rexx in my pocket and break out syncsort. In syncsort, it's about 3 lines of
code; in Rexx it would be overly complex.
