
Finding duplicate files in a directory tree


Timo Salmi

Nov 9, 2012, 6:37:10 AM
Added a solution for
162} Finding duplicate files in a directory tree.
http://www.netikka.net/tsneti/info/tscmd162.htm

All the best, Timo

--
Prof. (emer.) Timo Salmi, Vaasa, Finland
http://www.netikka.net/tsneti/homepage.php
Useful CMD script tricks http://www.netikka.net/tsneti/info/tscmd.php

foxidrive

Nov 9, 2012, 7:09:12 AM
On 09/11/2012 22:37, Timo Salmi wrote:
> Added a solution for
> 162} Finding duplicate files in a directory tree.
> http://www.netikka.net/tsneti/info/tscmd162.htm
>

This finds duplicate filenames, not binary duplicate files, correct?

I didn't study your code closely - it's been a long day. :)


--
foxi

billious

Nov 9, 2012, 7:26:54 AM
On 9/11/2012 19:37, Timo Salmi wrote:
> Added a solution for
> 162} Finding duplicate files in a directory tree.
> http://www.netikka.net/tsneti/info/tscmd162.htm
>
> All the best, Timo
>
Well, looks good!

One itsy-bitsy problem that I see is that the presence of JUNCTIONS in
Vista+ means that the same physical file can appear to have not only a
sfn and lfn but also more than one path - the upshot of which is that if
you're tempted to delete what appears to be a duplicate, you could be
deleting your only copy...

Might be an idea to publicise the caution!
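
One way to act on this caution is to check, before deleting an apparent duplicate, whether both paths lead to the same physical file. The following is only a rough sketch in JavaScript for Node.js, not part of the script under discussion; samePhysicalFile is a hypothetical helper, and the assumption that realpathSync resolves junctions as well as symbolic links should be verified on the target system.

```javascript
// Sketch only: decide whether two paths name the same physical file.
// samePhysicalFile is a hypothetical helper, not taken from the posted script.
var fs = require("fs");

function samePhysicalFile(pathA, pathB) {
  // realpathSync resolves links met along each path (symbolic links;
  // on Windows this is assumed to cover directory junctions as well).
  if (fs.realpathSync(pathA) === fs.realpathSync(pathB)) {
    return true;
  }
  // Hard links keep different paths but share a device and a file id.
  // The bigint option avoids rounding large file ids.
  var a = fs.statSync(pathA, { bigint: true });
  var b = fs.statSync(pathB, { bigint: true });
  return a.dev === b.dev && a.ino === b.ino;
}
```

A cleanup script would skip deletion whenever samePhysicalFile returns true for a pair of "duplicates", since removing one path would then remove the only copy.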

frank.w...@gmail.com

Nov 9, 2012, 9:56:20 AM
From Timo Salmi :
>Added a solution for
> 162} Finding duplicate files in a directory tree.
> http://www.netikka.net/tsneti/info/tscmd162.htm

Very nice programming style, and you make good use of
sharable subroutines. In your retirement do you have
time to add a department to your CMD information which
would be a repository of plug and play subroutines?

I don't have a Windows machine so I couldn't see your
script function, but I did notice that you might have
lost the letter "s" in the line:

  rem The -d witch: only print duplicate lines

Frank

Timo Salmi

Nov 9, 2012, 11:42:16 AM
On 09.11.2012 16:56 frank.w...@gmail.com wrote:
> From Timo Salmi :
>> Added a solution for
>> 162} Finding duplicate files in a directory tree.
>> http://www.netikka.net/tsneti/info/tscmd162.htm

> Very nice programming style, and you make good use of sharable
> subroutines.

Appreciated.

> In your retirement do you have time to add a department to
> your CMD information which would be a repository of plug and play
> subroutines?

Actually, it is already there. A bit hidden, admittedly:
http://www.netikka.net/tsneti/info/tscmd018.htm#Subroutines

> I don't have a Windows machine so I couldn't see your script function,
> but I did notice that you might have lost the letter "s" in the line:
>   rem The -d witch: only print duplicate lines

Corrected. Also my thanks to billious and foxidrive. Your points have
now been taken into account in the item.

Timo Salmi

Nov 11, 2012, 5:49:26 AM
On Friday, November 9, 2012 1:37:11 PM UTC+2, Timo Salmi wrote:
> Added a solution for
> 162} Finding duplicate files in a directory tree.
> http://www.netikka.net/tsneti/info/tscmd162.htm

Added a pure batch option and made some fine tuning.

Dr J R Stockton

Nov 11, 2012, 4:35:02 PM
In alt.msdos.batch.nt message <509CEAE6...@uwasa.fi>, Fri, 9 Nov
2012 13:37:10, Timo Salmi <t...@uwasa.fi> posted:

> 162} Finding duplicate files in a directory tree.
> http://www.netikka.net/tsneti/info/tscmd162.htm
>

I suggest changing "files" to "filenames", if that would not be wrong.


One could modify the code (JScript, in WSH) described in my
<http://www.merlyn.demon.co.uk/programs/32-bit/seakfyle.htm>, which will
already scan the tree, to check the names. The code for that scan alone
could be extracted.

A good way to check the names is probably to start with A = {}, an empty
object; then for each file create element A[filename] as an empty array
[] (if not already created), and push into the array the path of the
file (e.g. c:\this\that\tother).

Then scan A looking for elements of length greater than 1, which contain
the required information.

One could index A with filename+separator+datestamp to select files of
identical names and datestamps, which are probably copies.

The above is not necessarily *exactly* correct.
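
The plan in this post could be sketched roughly as follows. This is a minimal illustration in plain JavaScript for a modern engine such as Node.js (not the WSH JScript mentioned above), and it assumes the tree scan has already produced a list of entries. The names groupEntries, duplicatesOnly, nameKey and nameAndStampKey, the path and stamp fields, and the sample paths are all hypothetical.

```javascript
// Sketch only: group scanned files by name, then report names seen more than once.
// Every identifier and sample path below is hypothetical.

// Return the part of a path after the last backslash or slash.
function baseName(path) {
  var cut = Math.max(path.lastIndexOf("\\"), path.lastIndexOf("/"));
  return path.substring(cut + 1);
}

// Build A: key -> array of paths. keyOf decides what counts as "the same".
// Object.create(null) stands in for the post's A = {} so that filenames such
// as "constructor" cannot collide with inherited properties.
function groupEntries(entries, keyOf) {
  var A = Object.create(null);
  for (var i = 0; i < entries.length; i++) {
    var key = keyOf(entries[i]);
    if (!(key in A)) {
      A[key] = [];              // create the element only once
    }
    A[key].push(entries[i].path);
  }
  return A;
}

// Keep only the elements whose array holds more than one path.
function duplicatesOnly(A) {
  var dups = Object.create(null);
  for (var key in A) {
    if (A[key].length > 1) {
      dups[key] = A[key];
    }
  }
  return dups;
}

// Windows filenames are case-insensitive, so fold case before comparing.
function nameKey(entry) {
  return baseName(entry.path).toLowerCase();
}

// "|" cannot occur in a Windows filename, so it is a safe separator.
function nameAndStampKey(entry) {
  return nameKey(entry) + "|" + entry.stamp;
}

// Made-up scan results; a real scan would supply these.
var sample = [
  { path: "c:\\this\\that\\tother\\notes.txt", stamp: "2012-11-09 13:37" },
  { path: "c:\\backup\\Notes.txt",             stamp: "2012-11-09 13:37" },
  { path: "c:\\old\\notes.txt",                stamp: "2011-01-01 00:00" },
  { path: "c:\\this\\unique.txt",              stamp: "2012-11-09 13:37" }
];

var byName = duplicatesOnly(groupEntries(sample, nameKey));
var byNameAndStamp = duplicatesOnly(groupEntries(sample, nameAndStampKey));
console.log(byName);
console.log(byNameAndStamp);
```

With the sample data, byName holds the single key notes.txt with three paths, while byNameAndStamp holds only the two paths that also share a datestamp. Neither result says anything about file contents; matching names and datestamps only suggest that files are copies.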

--
(c) John Stockton, nr London UK. ?@merlyn.demon.co.uk DOS 3.3, 6.20; WinXP.
Web <http://www.merlyn.demon.co.uk/> - FAQqish topics, acronyms and links.
PAS EXE TXT ZIP via <http://www.merlyn.demon.co.uk/programs/00index.htm>
My DOS <http://www.merlyn.demon.co.uk/batfiles.htm> - also batprogs.htm.

frank.w...@gmail.com

Nov 12, 2012, 6:57:29 AM
From Dr J R Stockton :
>The above is not necessarily *exactly* correct.

Agreed. Do you have a suggestion which is exactly
correct?

Frank

Dr J R Stockton

Nov 13, 2012, 11:18:36 AM
In alt.msdos.batch.nt message <13af47d611a$frank.w...@gmail.com>,
Mon, 12 Nov 2012 11:57:29, frank.w...@gmail.com posted:

>From Dr J R Stockton :
>>The above is not necessarily *exactly* correct.
>
>Agreed. Do you have a suggestion which is exactly correct?


Firstly you should say what you think is wrong, so that one can see
whether you are mistaken.

Secondly, that was intended to express the general method, to be thought
about by anyone considering implementing it. Any problem not detected
at that stage will be discovered at the testing stage.

--
(c) John Stockton, nr London UK Reply address via Home Page.
news:comp.lang.javascript FAQ <http://www.jibbering.com/faq/index.html>.
<http://www.merlyn.demon.co.uk/js-index.htm> jscr maths, dates, sources.
<http://www.merlyn.demon.co.uk/> TP/BP/Delphi/jscr/&c, FAQ items, links.

frank.w...@gmail.com

Nov 13, 2012, 7:04:55 PM
From Dr J R Stockton :
>In alt.msdos.batch.nt message <13af47d611a$frank.w...@gmail.com>,
>Mon, 12 Nov 2012 11:57:29, frank.w...@gmail.com posted:

>>From Dr J R Stockton :
>>>The above is not necessarily *exactly* correct.
>>
>>Agreed. Do you have a suggestion which is exactly correct?


>Firstly you should say what you think is wrong, so that one can see
>whether you are mistaken.

How is that logical? You said it isn't correct and I
agreed; you didn't say what part is not correct and I
simply agreed with your general statement, but if you
had specified which part is incorrect then it would have
been pointless for me to repeat that.

Frank

Dr J R Stockton

Nov 15, 2012, 3:48:17 PM
In alt.msdos.batch.nt message <13afc436e50$frank.w...@gmail.com>,
Wed, 14 Nov 2012 00:04:55, frank.w...@gmail.com posted:

>From Dr J R Stockton :
>>In alt.msdos.batch.nt message <13af47d611a$frank.w...@gmail.com>,
>>Mon, 12 Nov 2012 11:57:29, frank.w...@gmail.com posted:
>
>>>From Dr J R Stockton :
>>>>The above is not necessarily *exactly* correct.
>>>
>>>Agreed. Do you have a suggestion which is exactly correct?
>
>
>>Firstly you should say what you think is wrong, so that one can see
>>whether you are mistaken.
>
>How is that logical? You said it isn't correct and I agreed; you
>didn't say what part is not correct and I simply agreed with your
>general statement, but if you had specified which part is incorrect
>then it would have been pointless for me to repeat that.


I wrote that it "is not necessarily *exactly* correct", not that it "is
necessarily not *exactly* correct" or that it "is not *exactly*
correct".

It is not necessarily exactly correct because it is a plan for code
which might be written, rather than a description of working code.
Something may have been forgotten, or the code may not match the
requirement.

But it consists of known-good tested moves. The directory structure is
basically a tree structure, to be scanned for duplicate leaves; I've
written, and frequently use, a program which scans the DOM trees of Web
pages for anchors and IDs, and reports duplicates within pages - which
is reasonably similar.

--
(c) John Stockton, nr London, UK. Mail via homepage. Turnpike v6.05 MIME.
Web <http://www.merlyn.demon.co.uk/> - FAQqish topics, acronyms and links;
Astro stuff via astron-1.htm, gravity0.htm ; quotings.htm, pascal.htm, etc.
No Encoding. Quotes before replies. Snip well. Write clearly. Don't Mail News.