
finding duplicated files


Jean Pierre Daviau

Nov 20, 2009, 2:18:58 PM
Hi everybody,

How can I find duplicated files on a drive?

JPD

--
Jean Pierre Daviau

- - - -
Art: http://www.jeanpierredaviau.com

Pegasus [MVP]

Nov 20, 2009, 2:55:28 PM

"Jean Pierre Daviau" <on...@wasenough.ca> wrote in message
news:%232NTRZh...@TK2MSFTNGP05.phx.gbl...

Google is your friend. When you type

windows duplicated file finder

into a Google search box then you get lots and lots of hits.


Jean Pierre Daviau

Nov 20, 2009, 3:35:29 PM
thanks

Todd Vargo

Nov 20, 2009, 4:33:58 PM

"Pegasus [MVP]" <ne...@microsoft.com> wrote in message
news:%23CrQqth...@TK2MSFTNGP04.phx.gbl...

http://tinyurl.com/yfzaqlh

--
Todd Vargo
(Post questions to group only. Remove "z" to email personal messages)

foxidrive

Nov 21, 2009, 4:13:44 AM
On Fri, 20 Nov 2009 16:33:58 -0500, "Todd Vargo" <tlv...@sbcglobal.netz> wrote:

>
>"Pegasus [MVP]" <ne...@microsoft.com> wrote in message
>news:%23CrQqth...@TK2MSFTNGP04.phx.gbl...
>>
>> "Jean Pierre Daviau" <on...@wasenough.ca> wrote in message
>> news:%232NTRZh...@TK2MSFTNGP05.phx.gbl...
>> > Hi everybody,
>> >
>> > How can I find duplicated files on a drive?
>>

>> Google is your friend. When you type
>>
>> windows duplicated file finder
>>
>> into a Google search box then you get lots and lots of hits.
>
>http://tinyurl.com/yfzaqlh


It's a minefield to find something command line driven and free - I was
looking recently and I ended up writing a batch script that relies
on sed and Fsum. (See below)

OTOH this tool is free and does the job swiftly but is GUI driven:

http://www.EasyDuplicateFinder.com


Here's my script - it writes a temporary batch file that contains all the
duplicates and rems out the delete command for the first duplicate in each set.

It might be clumsy, and probably doesn't need to test files using 6 hashing
algorithms - and it's slow - but for a small number of files it works fine.

It only compares files with the same filesize so in that respect it is
efficient, and can handle subdirectories.

Change this line to remove excess hashing algorithms "-crc32 -rmd -md5 -sha1 -sha512 -tiger"

If you examine the set of temp files (%temp%.\delsametemp.txt0 through %temp%.\delsametemp.txt7) it might be clearer what it does.

@echo off

if "%~1"=="" (
echo Purpose: Deletes identical files using multiple checksums from FSUM
echo. Builds !delsame.bat for perusal...
echo. The first file in each set of identical files is marked REMed with a :
echo.
echo Syntax: %0 [filespec.ext] [/s]
echo.
echo. If /s is specified it will recurse through the subdirectory branch.
echo.
pause
goto :EOF
)

:: faster with delayed expansion

echo.Gathering file information...

set "file=%temp%.\delsametemp.txt"
:: goto :next

del "%file%*" 2>nul

chcp 1252
dir /a:-d %1 %2 |sed -e "s/!/|/g" -e "/^ .*/d" -e "/ Volume.*/d" -e "s/Directory of/Directory of : /" >"%file%0"

echo.Creating file list...

>>"%file%1" echo. ?dummy

setlocal enabledelayedexpansion
for /f "tokens=1,2,3,*" %%a in ('type "%file%0"') do (
if "%%a"=="Directory" (
set "folder=%%d"
) else (
set num=              %%c
set num=!num:~-14!
for /f "delims=" %%z in ("!folder!\%%d") do >>"%file%1" echo !num! ?%%z
)
)
endlocal
sed -e "s/|/!/g" -e "/System Volume Information/Id" -e "/recycler/Id" "%file%1"|sort>"%file%2"

echo.Finding duplicate filesizes and creating checksum list...
set num=1
set t1=
set t2=
set preva=
set prevb=
set same=

for /f "tokens=1,2 delims=?" %%a in ('type "%file%2"') do call :sub "%%a" "%%b"
if defined same call :sub2 "%preva%" "%prevb%"
goto :continue

:: start subroutine
:sub
set "t1=%~1"

:: if previous file is NOT the same as the current file,
:: but was the same as the one before (last one in a set) then write details to the file

if not "%t1%"=="%t2%" if defined same call :sub2 "%preva%" "%prevb%" & set same=

::

if "%t1%"=="%t2%" set same=1& call :sub2 "%preva%" "%prevb%"

set t2=%t1%
set "preva=%~1"
set "prevb=%~2"
goto :eof
:: end subroutine
:: start second routine
:sub2
pushd "%~dp2"
echo "%~2"
set fs=%~1
set fs=%fs:,=%
set fs=%fs: =%
>>"%file%4" set /p =%fs%<nul
for /f %%x in ('fsum -jnc -crc32 -rmd -md5 -sha1 -sha512 -tiger "%~2" 2^>nul') do >>"%file%4" set /p ="-%%x"<nul
popd
>>"%file%4" echo. %2
>>"%file%3" echo "%~1" "%~2"
goto :eof
:: end second routine
:continue

if not exist "%file%4" echo no duplicates found&pause&del "%file%?"&goto :EOF

echo.Parsing checksum list...
sort<"%file%4" >"%file%5"

set t1=
set t2=
set preva=
set prevb=
set same=

set num=1
for /f "tokens=1,* delims= " %%a in ('type "%file%5"') do call :sub3 "%%a" %%b
if defined same >>"%file%6" echo. %preva% "%prevb%"
goto :continue2

:: start subroutine3
:sub3
set "t1=%~1"

:: if previous file is NOT the same as the current file,
:: but was the same as the one before (last one in a set) then write details to the file

if not "%t1%"=="%t2%" if defined same >>"%file%6" echo. %preva% "%prevb%"& set same=

::

if "%t1%"=="%t2%" set same=1& >>"%file%6" echo. %preva% "%prevb%"

set t2=%t1%
set "preva=%~1"
set "prevb=%~2"
goto :eof
:: end subroutine
:continue2

echo.Writing duplicates to a batch file "!delsame.bat"
echo.@echo off>"%file%7"
:: echo.chcp 850 >>"%file%7"
echo.chcp 1252 >>"%file%7"
set prev=0
for /f "tokens=1,*" %%a in ('type "%file%6"') do call :sub4 %%a %%b
echo.>>"%file%7"
echo.echo Done!>>"%file%7"
echo.pause>>"%file%7"

move /y "%file%7" !delsame.bat
: del "%file%?"

echo Done!
pause
goto :EOF

:sub4
if %1 EQU %prev% (>>"%file%7" echo del %2) else (>>"%file%7" echo.&>>"%file%7" echo : del %2)
set prev=%1
goto :EOF


Jean Pierre Daviau

Nov 23, 2009, 8:54:23 AM
Where can I get the latest version of sed?

foxidrive

Nov 23, 2009, 9:07:22 AM
On Mon, 23 Nov 2009 08:54:23 -0500, "Jean Pierre Daviau"
<on...@wasenough.ca> wrote:

>where can I get the last version of sed?

GnuSed for Windows is what I use.

http://gnuwin32.sourceforge.net/packages/sed.htm

Jean Pierre Daviau

Nov 23, 2009, 5:34:48 PM
Something does not work.

duplicated.cmd is at the root.

I have changed this line:
set "file=M:\tmp\delsametemp.txt"
I must note that it was not working on the C drive either, with the line unchanged.


----------------------
Does not work
-----------!delsame.bat-----------
@echo off
chcp 1252

: del "\Grand Classiques D'Edgard Encore Plu1"
del "\Grand Classiques D'Edgard Encore Plu2"
del "\Grand Classiques D'Edgard Encore Plu3"
del "\Grand Classiques D'Edgard Encore Plu4"
del "\Grand Classiques D'Edgard Encore Plu5"
del "\Grand Classiques D'Edgard Encore Plu6"

: del 148
del 148

: del 866
del 866
.....
---------------------------------------------
Impossible de trouver M:\leloup2009
Impossible de trouver M:\Liebert
Cant find the file
Done!
===============

JP

foxidrive

Nov 24, 2009, 7:21:15 AM

Maybe it is a language issue. I only have the English XP.

Jean Pierre Daviau

Nov 24, 2009, 9:04:44 AM
I have the English Vista. ;-)

foxidrive

Nov 24, 2009, 9:12:15 AM
On Tue, 24 Nov 2009 09:04:44 -0500, "Jean Pierre Daviau"
<on...@wasenough.ca> wrote:

>I have the english Vista. ;-)

This is not an English error message.

"Impossible de trouver"

Jean Pierre Daviau

Nov 24, 2009, 9:10:08 AM
Let's talk about Chinese:

for /f %%x in ('fsum -jnc -crc32 -rmd -md5 -sha1 -sha512 -tiger "%~2" 2^>nul') do >>"%file%4" set /p ="-%%x"<nul

What is fsum? Another Linux program?

Do I have to download crc32, md5, sha1 and sha512?

foxidrive

Nov 24, 2009, 9:48:11 AM
On Tue, 24 Nov 2009 09:10:08 -0500, "Jean Pierre Daviau"
<on...@wasenough.ca> wrote:

>Lets talk about chineese language:
>
> for /f %%x in ('fsum -jnc -crc32 -rmd -md5 -sha1 -sha512 -tiger "%~2"
>2^>nul') do >>"%file%4" set /p ="-%%x"<nul
>
>what is fsum? Another Linux program?

http://www.slavasoft.com/fsum/


Jean Pierre Daviau

Nov 24, 2009, 10:17:00 AM
"foxidrive" <got...@woohoo.invalid> wrote in message
news:qgsng5pps02qch94m...@4ax.com...
OK


set fs=%fs:,=% replaces , with nothing
set fs=%fs: =% replaces space with nothing
>>"%file%4" set /p =%fs%<nul ????????????????????


for /f %%x in ('fsum -jnc -crc32 -rmd -md5 -sha1 -sha512 -tiger "%~2" 2^>nul') do >>"%file%4" set /p ="-%%x"<nul

Does this fsum all of these and echo the second variable to nul? "%~2" 2^>nul'

do >>"%file%4" set /p ="-%%x"<nul

set /p= - variable x ---------------- why is the nul thrown into x?

foxidrive

Nov 24, 2009, 10:54:33 AM
On Tue, 24 Nov 2009 10:17:00 -0500, "Jean Pierre Daviau"
<on...@wasenough.ca> wrote:

>>>what is fsum? Another Linux program?
>>
>> http://www.slavasoft.com/fsum/
>>
>

> set fs=%fs:,=% replace , with nothing
> set fs=%fs: =% replace space with nothing
> >>"%file%4" set /p =%fs%<nul ????????????????????

Try it and see what it does.

>>file.txt set /p =abc<nul

Check the filesize.


> for /f %%x in ('fsum -jnc -crc32 -rmd -md5 -sha1 -sha512 -tiger
>"%~2" 2^>nul') do >>"%file%4" set /p ="-%%x"<nul


-crc32 -rmd -md5 -sha1 -sha512 -tiger <--- those are all different hashing
algorithms. You probably don't need all of them and it'll speed up the
process if you remove some. Better get it working first though.

>fsum all these and echo the second variable to nul? "%~2" 2^>nul'

2>nul redirects the STDERR stream to nul, not the %2

> do >>"%file%4" set /p ="-%%x"<nul
>
>set /p= - variable x ----------------why is the nul trown in x?

Try it and see what happens.

>>file.txt set /p ="-123"<nul

Then execute this and see what happens.

>>file.txt set /p ="-456"<nul
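A minimal sketch of what these experiments show, assuming a scratch file called setp_demo.txt in %temp% (a made-up name, not one the script uses): set /p prints its prompt text without a trailing newline, and <nul answers the prompt at once, so successive appends land on the same line.

```batch
@echo off
rem Scratch file for the demonstration only (hypothetical name).
set "out=%temp%\setp_demo.txt"
del "%out%" 2>nul

rem Each write appends its text with no CR/LF after it.
>>"%out%" set /p "=abc"<nul
>>"%out%" set /p "=-123"<nul
>>"%out%" set /p "=-456"<nul

rem A plain echo finally terminates the line.
>>"%out%" echo.

type "%out%"
del "%out%"
```

This should print abc-123-456 on one line, which is how the script glues the file size and each checksum into a single sortable key before appending the file name.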


Jean Pierre Daviau

Nov 24, 2009, 11:14:45 AM
=============================

> Try it and see what it does.
>
>>>file.txt set /p =abc<nul
>
> Check the filesize.

408 bytes???
Should it not be 3 bytes?
=================================


> Try it and see what happens.
>
>>>file.txt set /p ="-123"<nul
>
> Then execute this and see what happens.
>
>>>file.txt set /p ="-456"<nul

abc-123-456
============================

Jean Pierre Daviau

Nov 24, 2009, 11:16:35 AM

> Try it and see what it does.
>
>>>file.txt set /p =abc<nul
>
> Check the filesize.


ok it is 3 bytes


Jean Pierre Daviau

Nov 24, 2009, 11:48:10 AM
Gathering file information...
Creating file list...

Finding duplicate filesizes and creating checksum list...
"\a.mp3"
"\Saukare.mp3"
"\zazou.mp3"
Parsing checksum list...

Writing duplicates to a batch file "!delsame.bat"
1 file(s) moved(s).
Done!

=========file 0==========

Répertoire de M:\MediaPlayer\tmp\a

2008-05-27 19:19 3 429 027 Saukare.mp3
2008-05-27 19:19 3 429 027 zazou.mp3
2008-05-27 19:19 3 429 027 a.mp3
------------------file 1=================

?dummy
dans ?\le lecteur M s'appelle MUSIQUE
de ?\série du volume est 4836-7181
iaPlayer\tmp\a ?\
3 429 027 ?\Saukare.mp3
3 429 027 ?\zazou.mp3
3 429 027 ?\a.mp3
---------------------------
Is it not picking up the blank space?
>>"%file%1" echo. ?dummy
Somewhere in these lines?

foxidrive

Nov 24, 2009, 10:33:22 PM
On Tue, 24 Nov 2009 11:48:10 -0500, "Jean Pierre Daviau"
<on...@wasenough.ca> wrote:

>Gathering file information...
>Creating file list...
>Finding duplicate filesizes and creating checksum list...
>"\a.mp3"
>"\Saukare.mp3"
>"\zazou.mp3"
>Parsing checksum list...
>Writing duplicates to a batch file "!delsame.bat"
> 1 file(s) moved(s).
>Done!
>
>=========file 0==========
>
> Répertoire de M:\MediaPlayer\tmp\a

Try it on your English Vista.

Jean Pierre Daviau

Nov 25, 2009, 6:18:12 AM

I changed these lines and it works:
del %2 /S & echo.>>out.txt (for a list of deleted files)
in
if %1 EQU %prev% (>>"%file%7" echo del %2 /S & echo.>>out.txt) else (>>"%file%7" echo.&>>"%file%7" echo : del %2)

and dir /-c
in
dir /-c /a:-d %1 %2 |sed -e "s/!/|/g" -e "/^ .*/d" -e "/ Volume.*/d" -e "s/Directory of/Directory of : /" >"%file%0"

The code pages are only cosmetic, because I write all my files in English on my French Vista. An old habit.

---------
set "t1=%~1"
Why not set "t1=%1"?

Was it my last question? ;-)

Thanks to you.


Jean Pierre Daviau

Nov 25, 2009, 7:51:27 AM
It breaks on junction points. A French Vista problem?

------- delsametemp3.txt ------
"d\Disque-E\Mes " "\Documents\Medical\MaMort"
"d\Disque-E\Mes " "\Documents\powerPoint\Atelier2006-7_FlashPoint\temp\Sound"
--------------------

foxidrive

Nov 25, 2009, 8:46:35 AM
On Wed, 25 Nov 2009 06:18:12 -0500, "Jean Pierre Daviau"
<on...@wasenough.ca> wrote:

>I changed theese lines and it works
>del %2 /S & echo.>>out.txt for a list of deleted files
>in
> if %1 EQU %prev% (>>"%file%7" echo del %2 /S & echo.>>out.txt) else
>(>>"%file%7" echo.&>>"%file%7" echo : del %2)

The paths should be fully qualified. I suspect they are not because the
routine that writes the fully qualified path names is written for English
DIR terms.

>set "t1=%~1"
>why not set "t1=%1" ?

It's a practice used in cases where %1 might have double quotes.
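A small sketch of the difference, using a hypothetical :show subroutine and a made-up path: %1 carries whatever quotes the caller typed, while %~1 strips them, so the script can add its own quotes exactly once.

```batch
@echo off
call :show "C:\My Files\a.txt"
goto :eof

:show
rem %1 keeps the caller's quotes; %~1 removes them.
echo raw:      %1
echo stripped: %~1
rem Quoting the whole assignment stores the bare path safely.
set "t1=%~1"
echo stored:   [%t1%]
goto :eof
```

With set "t1=%1" the quotes would become part of the value, and a later "%t1%" would end up double-quoted.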


>It sucks on junction points. A french Vista peoblem?

I think this will also be solved with fixing the routine for fully
qualified path names.

Jean Pierre Daviau

Nov 25, 2009, 1:10:07 PM

> The paths should be fully qualified and I suspect it is because the
> routine
> that writes the fully qualified path names is written for english DIR
> terms.
>
::dir /-c/a:-d %1 %2 | sed -e "s/!/|/g" -e "/^ .*/d" -e "/ Volume.*/d" -e "s/Directory of/Directory of : /" >"%file%0"

This line:
for /r %%I in (*.*) do echo %%~zI "%%I" >>zout.txt
gives me this output:

1791 "C:\Users\Jean\Desktop\-.lnk"
261 "C:\Users\Jean\Desktop\187_-_53543.url"
1088 "C:\Users\Jean\Desktop\AceFTP 3 Freeware.lnk"
273 "C:\Users\Jean\Desktop\Acrobat User Community Forums.url"

How could I get

zout.txt | sed -e "s/!/|/g" -e "/^ .*/d" -e "/ Volume.*/d" -e "s/Directory of/Directory of : /" >"%file%0"

to work?

Timo Salmi

Nov 25, 2009, 1:47:19 PM

Jean Pierre Daviau wrote:
> How can I find duplicated files on a drive?

http://www.netikka.net/tsneti/info/tscmd162.htm

All the best, Timo

--
Prof. Timo Salmi mailto:t...@uwasa.fi ftp & http://garbo.uwasa.fi/
Hpage: http://www.uwasa.fi/laskentatoimi/english/personnel/salmitimo/
Department of Accounting and Finance, University of Vaasa, Finland
Useful CMD script tricks http://www.netikka.net/tsneti/info/tscmd.htm

foxidrive

Nov 25, 2009, 11:20:06 PM
On Wed, 25 Nov 2009 13:10:07 -0500, "Jean Pierre Daviau"
<on...@wasenough.ca> wrote:

>this line:
> for /r %%I in (*.*) do echo %%~zI "%%I" >>zout.txt
>gives me that output:
>
>1791 "C:\Users\Jean\Desktop\-.lnk"
> 261 "C:\Users\Jean\Desktop\187_-_53543.url"
> 1088 "C:\Users\Jean\Desktop\AceFTP 3 Freeware.lnk"
> 273 "C:\Users\Jean\Desktop\Acrobat User Community Forums.url"

In the routine the filesizes are all padded with leading spaces so they
sort correctly - you can't use the above directly.
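One way to combine the two ideas, as a sketch only: take the size and full path from the for /r loop, so nothing depends on the language of DIR's output, and pad the size to 14 columns, the width the script keeps with !num:~-14!. The ? separator mirrors the script's list format; the variable name pad is an assumption.

```batch
@echo off
setlocal enabledelayedexpansion
rem Fourteen spaces, used to right-align the size.
set "pad=              "

rem Walk the current directory tree; %%~zI is the size, %%~fI the full path.
for /r %%I in (*) do (
    set "num=%pad%%%~zI"
    echo !num:~-14! ?%%~fI
)
endlocal
rem Caveat: with delayed expansion on, a ! in a file name is swallowed.
rem The original script works around that by swapping ! and the pipe character with sed.
```

Saved under a name such as padlist.cmd (hypothetical), it could be run as padlist.cmd | sort >list.txt so that equal sizes end up on adjacent lines.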

Jean Pierre Daviau

Nov 26, 2009, 8:05:36 AM
"foxidrive" <got...@woohoo.invalid> wrote in message
news:fc0sg5d69q4nnkato...@4ax.com...

a- If the file sizes are not padded, the numbers become a kind of string
(id) which is sorted like a string. The paths (as many as there are) beginning
with the same id can be compared, sent to a file, and then fsum can be
applied to them only. It is faster.

1791 "C:\Users\Jean\Desktop\-.lnk"
1791 "C:\Users\Jean\Desktop\187_-_53543.url"

b- How can I use a for loop to output the first part of the string and
then the second part of the string, using the space as a delimiter?
1791 " "

%1 = 1791
%2 = "C:\Users\Jean\Desktop\-.lnk"
%3 = 1791
%4 = "C:\Users\Jean\Desktop\187_-_53543.url"

if {%1} == {%3} write the two lines to the file.

foxidrive

Nov 26, 2009, 8:54:24 AM
On Thu, 26 Nov 2009 08:05:36 -0500, "Jean Pierre Daviau"
<on...@wasenough.ca> wrote:

>a- If the file size are not padded the numbers becomes a kind of string
>(id) wich is sorted like a string. The paths (as many as there is)beginning
>with the same id can be compare sent to a file and then the fsum can be
>apply to them only. It is faster.

Yes, it will work. The routine to output the filesize using %%~za is far
slower than parsing a list from a DIR command though. On the other hand
parsing the list does cause problems with filenames that start with spaces.

> 1791 "C:\Users\Jean\Desktop\-.lnk"
> 1791 "C:\Users\Jean\Desktop\187_-_53543.url"
>b- How can I use a for loop to output the first part of the string and
>after the second part of the string using the space as a delimiter.
>1791 " "
>
>%1 = 1791
> %2 = "C:\Users\Jean\Desktop\-.lnk"
> %3 = 1791
>%4 = "C:\Users\Jean\Desktop\187_-_53543.url"
>
>if {%1} == {%3} write the two lines to the file.


set var=1791 "C:\Users\Jean\Desktop\187_-_53543.url"
if {%1} == {%3} for /f "tokens=1,*" %%a in ("%var%") do echo %%b


Jean Pierre Daviau

Nov 26, 2009, 9:44:42 AM

> Yes, it will work. The routine to output the filesize using %%~za is far
> slower than parsing a list from a DIR command though.
>On the other hand parsing the list does cause problems with filenames that
>start with spaces.

> set var=1791 "C:\Users\Jean\Desktop\187_-_53543.url"


> if {%1} == {%3} for /f "tokens=1,*" %%a in ("%var%") do echo %%b
>

That's great for keeping only one of the duplicates.

I found
for /r . %J in (*.pdf) do @echo %~zJ %J 1>>out.txt
type out.txt | sort >out2.txt
for /f "tokens=1,2" %I in (out2.txt) do echo %I %J
which gives me the two separated variables I need.
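A hedged note on the token choice: with "tokens=1,2" the second variable stops at the first space inside the path, whereas "tokens=1,*" puts the whole remainder of the line into it. The sketch below uses a throwaway list file (sizes_demo.txt, a made-up name) filled with two of the sample lines from earlier in the thread.

```batch
@echo off
rem Build a tiny sample list: size, a space, then the quoted path.
set "list=%temp%\sizes_demo.txt"
>"%list%"  echo 1791 "C:\Users\Jean\Desktop\AceFTP 3 Freeware.lnk"
>>"%list%" echo 273 "C:\Users\Jean\Desktop\Acrobat User Community Forums.url"

rem %%a receives the size, %%b everything after the first run of spaces.
for /f "usebackq tokens=1,*" %%a in ("%list%") do (
    echo size=%%a path=%%b
)
del "%list%"
```

Each path should come back whole, spaces and quotes included, which is what the del lines written later need.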


Jean Pierre Daviau

Nov 26, 2009, 10:09:31 AM
C:\Users\Jean\Desktop>fsum -jnc -crc32
"C:\Users\Jean\Desktop\Wake_Gallery.pdf"

74105d30 ?CRC32*Wake_Gallery.pdf

How can I get rid of the "
SlavaSoft Optimizing Checksum Utility - fsum 2.52.00337
Implemented using SlavaSoft QuickHash Library <www.slavasoft.com>
Copyright (C) SlavaSoft Inc. 1999-2007. All rights reserved."
?

Jean Pierre Daviau

Nov 27, 2009, 8:59:34 AM
Sorry this one works but I still have a problem of blank space somewhere.

---
@echo off
set _un=
set _deux=
set _drapeau=


for /f "tokens=1,2" %%I in (out2.txt) do call :doublons %%I %%J

:doublons

if {%_un%}=={%1} (
if defined %_drapeau% ( @echo. del "%2" >>!doublons.txt
) else (
@echo. :del "%_deux%" >>!doublons.txt
@echo. del "%2" >>!doublons.txt
set _drapeau=1
)
) else (
set _un=%1
set _deux=%2
set _drapeau=
)

goto :EOF
--------

foxidrive

Nov 27, 2009, 9:31:00 AM
On Fri, 27 Nov 2009 08:59:34 -0500, "Jean Pierre Daviau"
<on...@wasenough.ca> wrote:
>Sorry this one works but I still have a problem of blank space somewhere.

It seems to have issues here.

out2.txt

123 "c:\abc\def\123.txt"
123 "c:\abc\def\123b.txt"
256 "c:\abc\def\256.txt"
123 "c:\abc\def\123c.txt"


Result:

:del ""
del ""c:\abc\def\123.txt""
:del ""c:\abc\def\123b.txt""
del ""c:\abc\def\256.txt""
:del ""c:\abc\def\256.txt""
del ""c:\abc\def\123c.txt""
:del ""c:\abc\def\123c.txt""
del ""

Jean Pierre Daviau

Nov 27, 2009, 11:20:35 AM

> Result:
These lines should not be there, as there is only one 256.txt

> del ""c:\abc\def\256.txt""

> :del ""c:\abc\def\256.txt""

Jean Pierre Daviau

Nov 27, 2009, 12:13:32 PM
try this

123 "c:\abc\def\123a.txt"
123 "c:\abc\def\123 b.txt"

Jean Pierre Daviau

Nov 27, 2009, 1:09:59 PM

Here it is.

---------------

@echo off
set _un=
set _deux=
set _drapeau=

for /f "tokens=1,*" %%I in (out2.txt) do call :doublons %%I %%J
:: stop here so the script does not fall through into the subroutine
goto :EOF

:doublons
:: %1 = file size, %2 = quoted file name
if {%_un%}=={%1} (
if {%_drapeau%}=={1} (
@echo. del "%~2" >>!doublons.txt
) else (
@echo. :del "%_deux%" >>!doublons.txt
@echo. del "%~2" >>!doublons.txt
set _drapeau=1
)
) else (
set _un=%~1
set "_deux=%~2"
set _drapeau=
)
goto :EOF

billious

Nov 28, 2009, 12:35:38 AM

"Jean Pierre Daviau" <on...@wasenough.ca> wrote in message
news:%23isaod4...@TK2MSFTNGP06.phx.gbl...

I've no intention of trying to follow this massive thread, and have simply
skimmed it.

What I gather is that a file is being produced consisting of
[optional spaces][file length][space][quoted full filename]
sorted by file length
with the aim of reporting on duplicate files.

Here's an approach to process this file (physical or not)


This solution developed using XP
It may work for NT4/2K

----- batch begins -------
[1]@echo off
[2]setlocal enabledelayedexpansion
[3]set oldlength=original
[4]set countmatches=0
[5]for /f "tokens=1*" %%i in (daviau.txt) do (
[6]if not %%i==!oldlength! (
[7]call :report
[8]set oldlength=%%i
[9])
[10]set /a countmatches+=1
[11]set match!countmatches!=%%j
[12])
[13]call :report
[14]goto :eof
[15]
[16]:report
[17]if %countmatches% gtr 1 for /l %%a in (1,1,%countmatches%) do for /l %%b
in (%%a,1,%countmatches%) do if not %%a==%%b ECHO testing %%a==%%b
!match%%a! vs !match%%b!
[18]:: for /l %%a in (1,1,%countmatches%) do for /l %%b in
(%%a,1,%countmatches%) do if not %%a==%%b if defined match%%a if defined
match%%b if exist !match%%a! if exist !match%%b! fc /b !match%%a! !match%%b!
>nul & if not errorlevel 1 ECHO\!match%%a! same as !match%%b!
[19]for /l %%m in (1,1,%countmatches%) do (set match%%m=)
[20]set countmatches=0
[21]goto :eof
------ batch ends --------

Lines start [number] - any lines not starting [number] have been wrapped and
should be rejoined. The [number] that starts the line should be removed

The label :eof is defined in NT+ to be end-of-file but MUST be expressed as
:eof

%varname% will be evaluated as the value of VARNAME at the time that the
line is PARSED. The ENABLEDELAYEDEXPANSION option to SETLOCAL causes
!varname! to be evaluated as the CURRENT value of VARNAME - that is, as
modified by the operation of the FOR
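
A short sketch of that difference, with a made-up counter variable: the %count% in the loop body is replaced once, when the whole parenthesised block is parsed, while !count! is looked up each time the line runs.

```batch
@echo off
setlocal enabledelayedexpansion
set count=0
rem The whole block below is parsed once, before the loop starts.
for %%n in (a b c) do (
    set /a count+=1
    echo %%n: percent=%count% bang=!count!
)
endlocal
```

It should print percent=0 on all three lines, with bang=1, bang=2 and bang=3 in turn.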

The idea here is that the names of the files are installed into the
environment as MATCH1..MATCHn and on each change of length the gathered
names are compared. Obviously, if the environment area is filled, this would
need to be replaced by a tempfile-processing routine.

The filename in [5] could theoretically be replaced by a single-quoted
process to operate on that process output.

[17] is simply a line to show the comparisons being made. It can be removed
[18] is the real action line and should be un-commented to invoke (I don't
have the filenames in this post's header) An improvement would be to append

(SET match%%b=)

to the end of the line so that match%%b is deleted from the environment if
the file !match%%b! is found to match !match%%a! - the result being that it
is only ever mentioned in the output on the FIRST time a match is found.

I'd suggest that the output be echoed to a report file - but if you decide
to arbitrarily delete the duplicate !match%%b! (DANGEROUS!!) then the IF
EXIST for the two files in [18] should speed matters.

Actually, perhaps the if exist !match%%a! would be better moved to before
the for /l %%b... IF that is, EITHER 'if exist' is strictly required.

Rest is a matter for the experimenter.

Idea only - tested in theory only.

