How can I find duplicated files on a drive?
JPD
--
Jean Pierre Daviau
- - - -
Art: http://www.jeanpierredaviau.com
Google is your friend. When you type
windows duplicated file finder
into a Google search box then you get lots and lots of hits.
--
Todd Vargo
(Post questions to group only. Remove "z" to email personal messages)
>
>"Pegasus [MVP]" <ne...@microsoft.com> wrote in message
>news:%23CrQqth...@TK2MSFTNGP04.phx.gbl...
>>
>> "Jean Pierre Daviau" <on...@wasenough.ca> wrote in message
>> news:%232NTRZh...@TK2MSFTNGP05.phx.gbl...
>> > Hi everybody,
>> >
>> > How can I find duplicated files on a drive?
>>
>> Google is your friend. When you type
>>
>> windows duplicated file finder
>>
>> into a Google search box then you get lots and lots of hits.
>
>http://tinyurl.com/yfzaqlh
It's a minefield trying to find something command-line driven and free. I was
looking recently and ended up writing a batch script that relies on sed and
FSUM (see below).
On the other hand, this tool is free and does the job swiftly, but it is GUI-driven:
http://www.EasyDuplicateFinder.com
Here's my script. It writes a temporary batch file that lists all the
duplicates, with the delete command REMed out for the first file in each set.
It might be clumsy, it probably doesn't need to test files with 6 hashing
algorithms, and it's slow, but for a small number of files it works fine.
It only compares files that have the same file size, so in that respect it is
efficient, and it can handle subdirectories.
To remove hashing algorithms you don't need, edit this part of the FSUM line: "-crc32 -rmd -md5 -sha1 -sha512 -tiger"
Examining the set of temp files, %temp%.\delsametemp.txt? (where ? is a single digit), may make it clearer what the script does.
@echo off
if "%~1"=="" (
echo Purpose: Deletes identical files using multiple checksums from FSUM
echo. Builds !delsame.bat for perusal...
echo. The first file in each set of identical files is marked REMed with a :
echo.
echo Syntax: %0 [filespec.ext] [/s]
echo.
echo. If /s is specified it will recurse through the subdirectory branch.
echo.
pause
goto :EOF
)
:: faster with delayed expansion
echo.Gathering file information...
set "file=%temp%.\delsametemp.txt"
:: goto :next
del "%file%*" 2>nul
chcp 1252
dir /a:-d %1 %2 |sed -e "s/!/|/g" -e "/^ .*/d" -e "/ Volume.*/d" -e "s/Directory of/Directory of : /" >"%file%0"
echo.Creating file list...
>>"%file%1" echo. ?dummy
setlocal enabledelayedexpansion
for /f "tokens=1,2,3,*" %%a in ('type "%file%0"') do (
if "%%a"=="Directory" (
set "folder=%%d"
) else (
set "num=              %%c"
set num=!num:~-14!
for /f "delims=" %%z in ("!folder!\%%d") do >>"%file%1" echo !num! ?%%z
)
)
endlocal
sed -e "s/|/!/g" -e "/System Volume Information/Id" -e "/recycler/Id" "%file%1"|sort>"%file%2"
echo.Finding duplicate filesizes and creating checksum list...
set num=1
set t1=
set t2=
set preva=
set prevb=
set same=
for /f "tokens=1,2 delims=?" %%a in ('type "%file%2"') do call :sub "%%a" "%%b"
if defined same call :sub2 "%preva%" "%prevb%"
goto :continue
:: start subroutine
:sub
set "t1=%~1"
:: if previous file is NOT the same as the current file,
:: but was the same as the one before (last one in a set) then write details to the file
if not "%t1%"=="%t2%" if defined same call :sub2 "%preva%" "%prevb%" & set same=
::
if "%t1%"=="%t2%" set same=1& call :sub2 "%preva%" "%prevb%"
set t2=%t1%
set "preva=%~1"
set "prevb=%~2"
goto :eof
:: end subroutine
:: start second routine
:sub2
pushd "%~dp2"
echo "%~2"
set fs=%~1
set fs=%fs:,=%
set fs=%fs: =%
>>"%file%4" set /p =%fs%<nul
for /f %%x in ('fsum -jnc -crc32 -rmd -md5 -sha1 -sha512 -tiger "%~2" 2^>nul') do >>"%file%4" set /p ="-%%x"<nul
popd
>>"%file%4" echo. %2
>>"%file%3" echo "%~1" "%~2"
goto :eof
:: end second routine
:continue
if not exist "%file%4" echo no duplicates found&pause&del "%file%?"&goto :EOF
echo.Parsing checksum list...
sort<"%file%4" >"%file%5"
set t1=
set t2=
set preva=
set prevb=
set same=
set num=1
for /f "tokens=1,* delims= " %%a in ('type "%file%5"') do call :sub3 "%%a" %%b
if defined same >>"%file%6" echo. %preva% "%prevb%"
goto :continue2
:: start subroutine3
:sub3
set "t1=%~1"
:: if previous file is NOT the same as the current file,
:: but was the same as the one before (last one in a set) then write details to the file
if not "%t1%"=="%t2%" if defined same >>"%file%6" echo. %preva% "%prevb%"& set same=
::
if "%t1%"=="%t2%" set same=1& >>"%file%6" echo. %preva% "%prevb%"
set t2=%t1%
set "preva=%~1"
set "prevb=%~2"
goto :eof
:: end subroutine
:continue2
echo.Writing duplicates to a batch file "!delsame.bat"
echo.@echo off>"%file%7"
:: echo.chcp 850 >>"%file%7"
echo.chcp 1252 >>"%file%7"
set prev=0
for /f "tokens=1,*" %%a in ('type "%file%6"') do call :sub4 %%a %%b
echo.>>"%file%7"
echo.echo Done!>>"%file%7"
echo.pause>>"%file%7"
move /y "%file%7" !delsame.bat
: del "%file%?"
echo Done!
pause
goto :EOF
:sub4
if %1 EQU %prev% (>>"%file%7" echo del %2) else (>>"%file%7" echo.&>>"%file%7" echo : del %2)
set prev=%1
goto :EOF
>where can I get the latest version of sed?
GNU sed for Windows is what I use.
duplicated.cmd is at the root.
I have changed this line:
set "file=M:\tmp\delsametemp.txt"
I must note that it was not working on the C drive either, with the line unchanged.
----------------------
Does not work
-----------!delsame.bat-----------
@echo off
chcp 1252
: del "\Grand Classiques D'Edgard Encore Plu1"
del "\Grand Classiques D'Edgard Encore Plu2"
del "\Grand Classiques D'Edgard Encore Plu3"
del "\Grand Classiques D'Edgard Encore Plu4"
del "\Grand Classiques D'Edgard Encore Plu5"
del "\Grand Classiques D'Edgard Encore Plu6"
: del 148
del 148
: del 866
del 866
.....
---------------------------------------------
Impossible de trouver M:\leloup2009
Impossible de trouver M:\Liebert
Can't find the file
Done!
===============
JP
>I have the English Vista. ;-)
This is not an English error message.
"Impossible de trouver"
for /f %%x in ('fsum -jnc -crc32 -rmd -md5 -sha1 -sha512 -tiger "%~2" 2^>nul') do >>"%file%4" set /p ="-%%x"<nul
what is fsum? Another Linux program?
I have to download crc32 -md5 -sha1 -sha512
>Let's talk about the Chinese language:
>
> for /f %%x in ('fsum -jnc -crc32 -rmd -md5 -sha1 -sha512 -tiger "%~2" 2^>nul') do >>"%file%4" set /p ="-%%x"<nul
>
>what is fsum? Another Linux program?
http://www.slavasoft.com/fsum/
set fs=%fs:,=% replace , with nothing
set fs=%fs: =% replace space with nothing
>>"%file%4" set /p =%fs%<nul ????????????????????
for /f %%x in ('fsum -jnc -crc32 -rmd -md5 -sha1 -sha512 -tiger "%~2" 2^>nul') do >>"%file%4" set /p ="-%%x"<nul
fsum all these and echo the second variable to nul? "%~2" 2^>nul') do >>"%file%4" set /p ="-%%x"<nul
set /p= - variable x ---------------- why is the nul thrown into x?
>>>what is fsum? Another Linux program?
>>
>> http://www.slavasoft.com/fsum/
>>
>
> set fs=%fs:,=% replace , with nothing
> set fs=%fs: =% replace space with nothing
> >>"%file%4" set /p =%fs%<nul ????????????????????
Try it and see what it does.
>>file.txt set /p =abc<nul
Check the filesize.
> for /f %%x in ('fsum -jnc -crc32 -rmd -md5 -sha1 -sha512 -tiger "%~2" 2^>nul') do >>"%file%4" set /p ="-%%x"<nul
-crc32 -rmd -md5 -sha1 -sha512 -tiger <--- those are all different hashing
algorithms. You probably don't need all of them and it'll speed up the
process if you remove some. Better get it working first though.
>fsum all these and echo the second variable to nul? "%~2" 2^>nul'
2>nul redirects the STDERR stream to nul, not the %2
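A minimal sketch of the same redirection, assuming only a stock cmd.exe (no_such_file.xyz is a hypothetical name picked so that DIR fails). Inside the single-quoted command of FOR /F, the > has to be escaped as ^> so that it reaches the inner command as 2>nul instead of being parsed as part of the FOR line:

```batch
@echo off
:: Inside FOR /F's single-quoted command, ^> is needed so that "2>nul" is
:: handed to the inner DIR command rather than parsed by the FOR line itself.
:: no_such_file.xyz is a hypothetical name chosen so that DIR fails.
for /f "delims=" %%L in ('dir /b no_such_file.xyz 2^>nul') do echo got: %%L
:: DIR's "File Not Found" went to STDERR (handle 2) and was discarded.
:: Nothing reached STDOUT, so the loop body never ran.
echo Loop finished with no output and no error message.
```

Only handle 2 is discarded; anything the command writes to STDOUT is still read by the loop, which is why the FSUM line gets its hashes in %%x.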
> do >>"%file%4" set /p ="-%%x"<nul
>
>set /p= - variable x ---------------- why is the nul thrown into x?
Try it and see what happens.
>>file.txt set /p ="-123"<nul
Then execute this and see what happens.
>>file.txt set /p ="-456"<nul
408 bytes???
Should it not be 3 bytes?
=================================
> Try it and see what happens.
>
>>>file.txt set /p ="-123"<nul
>
> Then execute this and see what happens.
>
>>>file.txt set /p ="-456"<nul
abc-123-456
============================
OK, it is 3 bytes.
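A self-contained version of that experiment, using a hypothetical scratch file demo.txt instead of %file%4:

```batch
@echo off
:: SET /P prints its prompt and then waits for input; with input taken from
:: NUL it returns at once, and the prompt is written WITHOUT a trailing CR/LF.
:: demo.txt is a hypothetical scratch file.
del demo.txt 2>nul
>>demo.txt set /p =abc<nul
>>demo.txt set /p ="-123"<nul
>>demo.txt set /p ="-456"<nul
:: %%~zF is the file size: 3 + 4 + 4 = 11 bytes, all on one line.
for %%F in (demo.txt) do echo demo.txt is %%~zF bytes
type demo.txt
```

This is how :sub2 builds each line of %file%4: the size with no newline, then one "-hash" per algorithm, and finally an echo that ends the line with the file name.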
=========file 0==========
Répertoire de M:\MediaPlayer\tmp\a
2008-05-27 19:19 3 429 027 Saukare.mp3
2008-05-27 19:19 3 429 027 zazou.mp3
2008-05-27 19:19 3 429 027 a.mp3
------------------file 1=================
?dummy
dans ?\le lecteur M s'appelle MUSIQUE
de ?\série du volume est 4836-7181
iaPlayer\tmp\a ?\
3 429 027 ?\Saukare.mp3
3 429 027 ?\zazou.mp3
3 429 027 ?\a.mp3
---------------------------
It does not take the blank space?
>>"%file%1" echo. ?dummy
Somewhere in these lines?
>Gathering file information...
>Creating file list...
>Finding duplicate filesizes and creating checksum list...
>"\a.mp3"
>"\Saukare.mp3"
>"\zazou.mp3"
>Parsing checksum list...
>Writing duplicates to a batch file "!delsame.bat"
> 1 file(s) moved.
>Done!
>
>=========file 0==========
>
> Répertoire de M:\MediaPlayer\tmp\a
Try it on your English Vista.
I changed these lines and it works. For a list of deleted files I put
del %2 /S & echo.>>out.txt
in
if %1 EQU %prev% (>>"%file%7" echo del %2 /S & echo.>>out.txt) else (>>"%file%7" echo.&>>"%file%7" echo : del %2)
and dir /-c
in
dir /-c /a:-d %1 %2 |sed -e "s/!/|/g" -e "/^ .*/d" -e "/ Volume.*/d" -e "s/Directory of/Directory of : /" >"%file%0"
The code page changes are only cosmetic, because I write all my files in English on my
French Vista. An old habit.
---------
set "t1=%~1"
why not set "t1=%1" ?
Was it my last question? ;-)
Thanks to you.
------- delsametemp3.txt ------
"d\Disque-E\Mes " "\Documents\Medical\MaMort"
"d\Disque-E\Mes " "\Documents\powerPoint\Atelier2006-7_FlashPoint\temp\Sound"
--------------------
>I changed these lines and it works
>del %2 /S & echo.>>out.txt for a list of deleted files
>in
> if %1 EQU %prev% (>>"%file%7" echo del %2 /S & echo.>>out.txt) else (>>"%file%7" echo.&>>"%file%7" echo : del %2)
The paths should be fully qualified. I suspect they aren't because the routine
that builds the fully qualified path names looks for the English DIR heading
"Directory of".
>set "t1=%~1"
>why not set "t1=%1" ?
It's a practice used in cases where %1 might already contain double quotes: %~1 removes them, so the quotes in set "t1=..." are not doubled.
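A small illustration of the difference; the path passed to :show is a made-up example:

```batch
@echo off
:: %1 keeps any quotes the caller supplied; %~1 removes one surrounding pair.
:: Quoting the whole assignment, set "t1=%~1", adds no quotes of its own
:: and keeps stray trailing spaces out of the value.
call :show "C:\some path\file.txt"
goto :eof

:show
echo with %%1:  %1
echo with %%~1: %~1
set "t1=%~1"
echo t1=[%t1%]
goto :eof
```

The first echo shows the path with its quotes, the second and third without, so t1 can later be wrapped in exactly one pair of quotes.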
>It sucks on junction points. A French Vista problem?
I think this will also be solved with fixing the routine for fully
qualified path names.
this line:
for /r %%I in (*.*) do echo %%~zI "%%I" >>zout.txt
gives me that output:
1791 "C:\Users\Jean\Desktop\-.lnk"
261 "C:\Users\Jean\Desktop\187_-_53543.url"
1088 "C:\Users\Jean\Desktop\AceFTP 3 Freeware.lnk"
273 "C:\Users\Jean\Desktop\Acrobat User Community Forums.url"
How could I get zout.txt through the same sed command, i.e.
type zout.txt | sed -e "s/!/|/g" -e "/^ .*/d" -e "/ Volume.*/d" -e "s/Directory of/Directory of : /" >"%file%0"
?
http://www.netikka.net/tsneti/info/tscmd162.htm
All the best, Timo
--
Prof. Timo Salmi mailto:t...@uwasa.fi ftp & http://garbo.uwasa.fi/
Hpage: http://www.uwasa.fi/laskentatoimi/english/personnel/salmitimo/
Department of Accounting and Finance, University of Vaasa, Finland
Useful CMD script tricks http://www.netikka.net/tsneti/info/tscmd.htm
>this line:
> for /r %%I in (*.*) do echo %%~zI "%%I" >>zout.txt
>gives me that output:
>
>1791 "C:\Users\Jean\Desktop\-.lnk"
> 261 "C:\Users\Jean\Desktop\187_-_53543.url"
> 1088 "C:\Users\Jean\Desktop\AceFTP 3 Freeware.lnk"
> 273 "C:\Users\Jean\Desktop\Acrobat User Community Forums.url"
In the routine the filesizes are all padded with leading spaces so they
sort correctly - you can't use the above directly.
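A minimal sketch of that padding step applied to the FOR /R line above, assuming a width of 14 characters as in the script's !num:~-14! (%temp%\sizes.txt is a hypothetical output file):

```batch
@echo off
setlocal enabledelayedexpansion
:: Pad every size with leading spaces to a fixed width (14 here) so that
:: SORT, which compares text, puts 261 before 1791.
:: Caveat: with delayed expansion on, any "!" in a file name is lost, which
:: is why the full script swaps "!" for "|" before this stage.
set "out=%temp%\sizes.txt"
del "%out%" 2>nul
for /r %%I in (*.*) do (
  set "num=              %%~zI"
  >>"%out%" echo !num:~-14! "%%I"
)
sort "%out%"
```

Right-aligned numbers of equal width sort the same way as text and as numbers, so equal sizes end up on adjacent lines in ascending order.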
a- If the file sizes are not padded, the numbers become a kind of string
(an ID) which is sorted like a string. The paths (however many there are) beginning
with the same ID can be compared and sent to a file, and then fsum can be
applied to them only. It is faster.
1791 "C:\Users\Jean\Desktop\-.lnk"
1791 "C:\Users\Jean\Desktop\187_-_53543.url"
b- How can I use a FOR loop to output the first part of the string and
then the second part, using the space as a delimiter?
1791 " "
%1 = 1791
%2 = "C:\Users\Jean\Desktop\-.lnk"
%3 = 1791
%4 = "C:\Users\Jean\Desktop\187_-_53543.url"
if {%1} == {%3} write the two lines to the file.
>a- If the file sizes are not padded, the numbers become a kind of string
>(an ID) which is sorted like a string. The paths (however many there are) beginning
>with the same ID can be compared and sent to a file, and then fsum can be
>applied to them only. It is faster.
Yes, it will work. The routine to output the filesize using %%~za is far
slower than parsing a list from a DIR command though. On the other hand
parsing the list does cause problems with filenames that start with spaces.
> 1791 "C:\Users\Jean\Desktop\-.lnk"
> 1791 "C:\Users\Jean\Desktop\187_-_53543.url"
>b- How can I use a FOR loop to output the first part of the string and
>then the second part, using the space as a delimiter?
>1791 " "
>
>%1 = 1791
> %2 = "C:\Users\Jean\Desktop\-.lnk"
> %3 = 1791
>%4 = "C:\Users\Jean\Desktop\187_-_53543.url"
>
>if {%1} == {%3} write the two lines to the file.
set var=1791 "C:\Users\Jean\Desktop\187_-_53543.url"
if {%1} == {%3} for /f "tokens=1,*" %%a in ("%var%") do echo %%b
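Taking that one step further, here is a sketch that reads a list already sorted by size and reports each pair of equal-size neighbours. The input name sorted.txt is hypothetical, and the same "!" caveat applies because delayed expansion is on:

```batch
@echo off
setlocal enabledelayedexpansion
:: tokens=1,* puts the size in %%a and the whole rest of the line, spaces
:: and quotes included, in %%b. Comparing each size with the previous one
:: reports neighbouring files of equal size in a list sorted by size.
:: sorted.txt is a hypothetical input of  size "full path"  lines.
set "prevsize="
set "prevpath="
for /f "usebackq tokens=1,*" %%a in ("sorted.txt") do (
  if "%%a"=="!prevsize!" echo same size %%a: !prevpath! and %%b
  set "prevsize=%%a"
  set "prevpath=%%b"
)
```

Only the paths reported here would then need to go through fsum, which is the saving described in (a).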
> set var=1791 "C:\Users\Jean\Desktop\187_-_53543.url"
> if {%1} == {%3} for /f "tokens=1,*" %%a in ("%var%") do echo %%b
>
That's great for keeping only one of the duplicates.
I found
for /r . %J in (*.pdf) do @echo %~zJ %J 1>>out.txt
type out.txt | sort >out2.txt
for /f "tokens=1,2" %I in (out2.txt) do echo %I %J
which gives me the two separate variables I need.
74105d30 ?CRC32*Wake_Gallery.pdf
How can I get rid of these banner lines?
SlavaSoft Optimizing Checksum Utility - fsum 2.52.00337
Implemented using SlavaSoft QuickHash Library <www.slavasoft.com>
Copyright (C) SlavaSoft Inc. 1999-2007. All rights reserved.
---
@echo off
set _un=
set _deux=
set _drapeau=
for /f "tokens=1,2" %%I in (out2.txt) do call :doublons %%I %%J
:doublons
if {%_un%}=={%1} (
if defined %_drapeau% ( @echo. del "%2" >>!doublons.txt
) else (
@echo. :del "%_deux%" >>!doublons.txt
@echo. del "%2" >>!doublons.txt
set _drapeau=1
)
) else (
set _un=%1
set _deux=%2
set _drapeau=
)
goto :EOF
--------
It seems to have issues here.
out2.txt
123 "c:\abc\def\123.txt"
123 "c:\abc\def\123b.txt"
256 "c:\abc\def\256.txt"
123 "c:\abc\def\123c.txt"
Result:
:del ""
del ""c:\abc\def\123.txt""
:del ""c:\abc\def\123b.txt""
del ""c:\abc\def\256.txt""
:del ""c:\abc\def\256.txt""
del ""c:\abc\def\123c.txt""
:del ""c:\abc\def\123c.txt""
del ""
> :del ""c:\abc\def\256.txt""
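For comparison, a sketch of the :doublons idea with three changes aimed at the problems visible above: a goto :eof after the FOR loop so execution does not fall into :doublons one extra time with empty arguments, "if defined _drapeau" instead of "if defined %_drapeau%", and %~2 so the quotes already in out2.txt are not doubled. It assumes out2.txt is sorted by size (the sample above is not: the third 123 line comes after 256), and it has been reasoned through rather than run against real files:

```batch
@echo off
:: out2.txt is assumed to hold  size "full path"  lines sorted by size.
:: The first file of each same-size run is written as ":del" (a label line,
:: so it never runs); the others are written as plain "del" lines.
set _un=
set _deux=
set _drapeau=
for /f "usebackq tokens=1,*" %%I in ("out2.txt") do call :doublons %%I %%J
:: Stop here; without this line the subroutine would run once more.
goto :eof

:doublons
if "%_un%"=="%~1" (
  if defined _drapeau (
    >>"!doublons.txt" echo del "%~2"
  ) else (
    >>"!doublons.txt" echo :del "%_deux%"
    >>"!doublons.txt" echo del "%~2"
    set _drapeau=1
  )
) else (
  set "_un=%~1"
  set "_deux=%~2"
  set _drapeau=
)
goto :eof
```

Equal sizes are only a hint; as in the original script, a checksum or FC /B comparison is still needed before anything is really deleted.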
---------------
@echo off
set _un=
set _deux=
set _drapeau=
for /f "tokens=1,*" %%I in (out2.txt) do call :doublons %%I %%J
:doublons
if {%_un%}=={%1} (
if {%_drapeau%}=={1} ( @echo. del %~2 >>!doublons.txt
) else (
@echo. :del %_deux% >>!doublons.txt
@echo. del %~2 >>!doublons.txt
set _drapeau=1
)
) else (
set _un=%~1
set _deux=%~2
I've no intention of trying to follow this massive thread, and have simply
skimmed it.
What I gather is that a file is being produced consisting of
[optional spaces][file length][space][quoted full filename]
sorted by file length
with the aim of reporting on duplicate files.
Here's an approach to process this file (physical or not)
This solution was developed using XP.
It may work for NT4/2K.
----- batch begins -------
[1]@echo off
[2]setlocal enabledelayedexpansion
[3]set oldlength=original
[4]set countmatches=0
[5]for /f "tokens=1*" %%i in (daviau.txt) do (
[6]if not %%i==!oldlength! (
[7]call :report
[8]set oldlength=%%i
[9])
[10]set /a countmatches+=1
[11]set match!countmatches!=%%j
[12])
[13]call :report
[14]goto :eof
[15]
[16]:report
[17]if %countmatches% gtr 1 for /l %%a in (1,1,%countmatches%) do for /l %%b in (%%a,1,%countmatches%) do if not %%a==%%b ECHO testing %%a==%%b !match%%a! vs !match%%b!
[18]:: for /l %%a in (1,1,%countmatches%) do for /l %%b in (%%a,1,%countmatches%) do if not %%a==%%b if defined match%%a if defined match%%b if exist !match%%a! if exist !match%%b! fc /b !match%%a! !match%%b! >nul & if not errorlevel 1 ECHO\!match%%b! same as !match%%a!
[19]for /l %%m in (1,1,%countmatches%) do (set match%%m=)
[20]set countmatches=0
[21]goto :eof
------ batch ends --------
Lines start with [number]; any lines not starting with [number] have been wrapped and
should be rejoined. The [number] that starts each line should be removed.
The label :eof is defined in NT+ to be end-of-file, but MUST be expressed as
:eof (with the colon).
%varname% will be evaluated as the value of VARNAME at the time that the
line is PARSED. The ENABLEDELAYEDEXPANSION option to SETLOCAL causes
!varname! to be evaluated as the CURRENT value of VARNAME - that is, as
modified by the operation of the FOR
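A tiny demonstration of the difference (nothing in it is specific to the duplicate finder):

```batch
@echo off
setlocal enabledelayedexpansion
:: %count% is expanded once, when the whole parenthesised FOR block is
:: parsed; !count! is expanded each time the line inside it executes.
set count=0
for %%x in (a b c) do (
  set /a count+=1
  echo %%x: percent=%count% bang=!count!
)
:: Prints:
:: a: percent=0 bang=1
:: b: percent=0 bang=2
:: c: percent=0 bang=3
```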
The idea here is that the names of the files are stored in the
environment as MATCH1..MATCHn, and on each change of length the gathered
names are compared. Obviously, if the environment area fills up, this would
need to be replaced by a tempfile-processing routine.
The filename in [5] could theoretically be replaced by a single-quoted
command, so that the FOR operates on that command's output.
[17] is simply a line to show the comparisons being made. It can be removed.
[18] is the real action line and should be un-commented to invoke (I don't
have the filenames in this post's header) An improvement would be to append
(SET match%%b=)
to the end of the line so that match%%b is deleted from the environment if
the file !match%%b! is found to match !match%%a! - the result being that it
is only ever mentioned in the output on the FIRST time a match is found.
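A sketch of that improvement, written as a standalone block rather than as the single line [18]. The three quoted file names are hypothetical placeholders and, like the original idea, it has been reasoned through rather than tested against real files:

```batch
@echo off
setlocal enabledelayedexpansion
:: When !match%%b! is byte-identical to !match%%a! (FC /B sets exit code 0),
:: match%%b is cleared, so that file is reported only the first time.
:: The three quoted names below are hypothetical placeholders.
set match1="a.mp3"
set match2="zazou.mp3"
set match3="Saukare.mp3"
set countmatches=3
for /l %%a in (1,1,%countmatches%) do for /l %%b in (%%a,1,%countmatches%) do (
  if not %%a==%%b if defined match%%a if defined match%%b (
    fc /b !match%%a! !match%%b! >nul 2>nul && (
      echo !match%%b! same as !match%%a!
      set match%%b=
    )
  )
)
```

Because "if defined" and !match%%b! are evaluated at run time, a name cleared in one pass is skipped in every later comparison.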
I'd suggest that the output be echoed to a report file - but if you decide
to arbitrarily delete the duplicate !match%%b! (DANGEROUS!!) then the IF
EXIST for the two files in [18] should speed matters.
Actually, perhaps the if exist !match%%a! would be better moved to before
the for /l %%b... IF that is, EITHER 'if exist' is strictly required.
Rest is a matter for the experimenter.
Idea only - tested in theory only.