
List of NOT duplicate files


Neil

Mar 12, 2023, 3:08:02 PM

I've been doing this manually but I'm asking if there is a better way.

I have two folders of text files, some of which have duplicate file names.
The _contents_ of those text files, even with the same name, are different.
But everything is dependent only on the file names (not on their contents).

Folder A is the master folder containing thousands of A\*.txt files.
Folder B is the folder whose new files (only!) are added to folder A.

That is, if a B\*.txt file name is already in folder A, it's not added.
Only if a B\*.txt file name is NOT already in folder A, is it added.
Folder A always grows but it never contains a duplicate file name from B.

What I need is a list of the files that were in folder B that were not
duplicates of what was already in folder A when they were copied/moved.

Is there an easy way to obtain that list which is better than how I'm
manually doing it today? Today I make a list of the file names that are
originally in A, then a list of the file names in B that are left after
I copy/move the files over without duplication, and then I subtract
one list from the other.

Is there an easier way to get that list of what file names are in
folder B that aren't yet in folder A when I move just those files over?
--
best regards,

Neil

Alan Browne

Mar 12, 2023, 3:23:37 PM

Being lazy, I just pasted everything above into ChatGPT. It immediately
output the following. Note that I'm not a Python programmer, so I can't
vouch for this. In particular I'm not sure whether it distinguishes files
with the same name but different content.

ChatGPT:
==================================
Yes, there are several ways to obtain the list of files in folder B that
are not duplicates of what's already in folder A. One way is to use a
scripting language like Python to compare the file names in folder A and
folder B and identify the files in folder B that are not already in
folder A. Here's an example Python script that does this:

import os

folderA = 'path/to/folderA'
folderB = 'path/to/folderB'

filesA = set(os.listdir(folderA))
filesB = set(os.listdir(folderB))

newFiles = filesB - filesA

print(newFiles)

This script uses the os module to get a list of files in folder A and
folder B, and then creates sets of the file names. It then subtracts the
set of files in folder A from the set of files in folder B to get a set
of new files. The newFiles set contains the names of the files in folder
B that are not already in folder A.

You can modify the folderA and folderB variables to point to the actual
paths of your folders. You can save the script as a .py file and run it
in a Python environment, or you can run it from the command line using
python script.py.

This is just one example of how to solve the problem, and there are many
other ways to approach it depending on your specific requirements and
preferences.
==================================
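
A slightly tightened variant of the script above, offered only as a sketch. It assumes Windows-style case-insensitive file names, restricts the comparison to *.txt files, and returns a sorted list instead of printing a set. The function name new_names is made up for illustration and is not from the thread.

```python
import os

def new_names(folder_a, folder_b, ext=".txt"):
    """Return names of ext files in folder_b whose name is not in folder_a."""
    # Compare lower-cased names, assuming Windows treats File.txt and
    # file.txt as the same name.
    in_a = {name.lower() for name in os.listdir(folder_a)}
    return sorted(
        name for name in os.listdir(folder_b)
        if name.lower().endswith(ext) and name.lower() not in in_a
    )
```

Calling new_names with the real paths of A and B gives the list of names that a no-overwrite copy from B to A would add. Nothing is copied or changed.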

--
“Donald Trump and his allies and supporters are a clear and present
danger to American democracy.”
- J Michael Luttig - 2022-06-16
- Former US appellate court judge (R) testifying to the January 6
committee

Alan Browne

Mar 12, 2023, 3:30:08 PM

On 2023-03-12 15:08, Neil wrote:
...
So, then I asked ChatGPT whether the script accounted for the case of the
same name but different content:

ChatGPT:
==================================
No, the script I provided only checks for new files based on their
filename, so it doesn't take into account the contents of the files.

If you want to check for new files based on both their filename and
contents, you can modify the script to read the contents of each file
and compare them before deciding whether to add the file to folder A or
not. Here's an example of how you can do this in Python:


import os
import filecmp
import shutil

folderA = 'path/to/folderA'
folderB = 'path/to/folderB'

# Get a list of file names in folder A
filesA = os.listdir(folderA)

# Iterate over each file in folder B
for fileB in os.listdir(folderB):
    # Check if the file already exists in folder A
    if fileB in filesA:
        # If the file already exists, compare the contents
        if not filecmp.cmp(os.path.join(folderA, fileB),
                           os.path.join(folderB, fileB), shallow=False):
            # If the contents are different, copy the file to folder A
            # with a unique name
            shutil.copy2(os.path.join(folderB, fileB),
                         os.path.join(folderA, fileB + '_new'))
    else:
        # If the file doesn't exist in folder A, copy it over
        shutil.copy2(os.path.join(folderB, fileB),
                     os.path.join(folderA, fileB))
==================================

Which may be better. Please give it a go!
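
Since the original question depends only on file names, a name-only variant may be closer to what was asked. The sketch below copies a file from B to A only when no file of that name exists in A, and returns the list of names it copied. The function name copy_new_files is hypothetical, and case-insensitive name matching is an assumption about Windows behaviour.

```python
import os
import shutil

def copy_new_files(folder_a, folder_b, ext=".txt"):
    """Copy ext files from folder_b to folder_a when the name is new.

    Returns the list of file names that were copied.
    """
    existing = {name.lower() for name in os.listdir(folder_a)}
    copied = []
    for name in sorted(os.listdir(folder_b)):
        src = os.path.join(folder_b, name)
        if not os.path.isfile(src) or not name.lower().endswith(ext):
            continue
        if name.lower() in existing:
            continue  # same name already in A: leave A's copy untouched
        shutil.copy2(src, os.path.join(folder_a, name))
        existing.add(name.lower())
        copied.append(name)
    return copied
```

The returned list is the "list of NOT duplicate files"; file contents are never compared.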

Herbert Kleebauer

Mar 12, 2023, 3:43:44 PM

When in directory b use this command (double the % if used in a batch program):

for %i in (*.txt) do if not exist ..\a\%i copy %i ..\a\

Maybe you want to use "move" instead of "copy"

But maybe the simplest way is to use file explorer and copy all
files of b/ to a/ and select don't replace existing files.



😉 Good Guy 😉

Mar 12, 2023, 6:46:07 PM

On 12/03/2023 19:08, Neil wrote:


> Is there an easier way to get that list of what file names are in
> folder B that aren't yet in folder A when I move just those files over?

Sure there is, but it depends on whether you have Excel installed on your machine or are prepared to use the online version of Excel. If you have either, read this article:

<https://support.microsoft.com/en-us/office/find-and-remove-duplicates-00e35bea-b46a-4d5d-b28e-66a552dc138d>

You can create a list of the files in a folder by running a command in that folder and redirecting the output to a text file.

Something like this: "dir /b > listA.txt"


Please scroll down and click on the Submit button after you have completed the required fields.

<https://contact.mainsite.tk/>

Philip Herlihy

Mar 13, 2023, 7:38:03 AM

In article <tul7ug$36gq8$1...@dont-email.me>, Neil wrote...
Robocopy (built-in to all recent versions of Windows) can do this. It takes
work to get to a basic competence with it, but it's so useful that it's well
worth it. There's a guide out there (Google Robocopy.doc) or type robocopy /?
into a command window. There are a LOT of command options!

Essentially robocopy <folder1> <folder2> will _selectively_ copy files from 1
to 2, only if they are newer or otherwise different. Exactly how it chooses
this is controlled by the options. Usefully, the /L option does all the
calculations but doesn't move anything. Dangerously, the /MIR option deletes
anything in the target folder that wasn't found in the source folder.

The way I often use it is to navigate in a command window to my desktop, so job
files (described below) and log files are readily seen there. I run:
robocopy <f1> <f2> <options, including /L> /SAVE:jobfile.rcj /QUIT
This creates a "job file" which can be re-run with this syntax:
robocopy /JOB:jobfile.rcj
You then edit the jobfile (double-colon comments out a line) to fool around
with the options, studying the output (you can usefully create a /LOG file and
/TEE the output to the command window). When you're confident you have it
right, you remove the /L and the files get moved next time.

In your situation, you should pay particular attention to the file "classes":
Newer, Different, Older, etc, etc. Robocopy will elegantly do exactly what you
want. It's a utility I only occasionally use, but when I do it's absolutely
invaluable.
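
For what it's worth, here is a sketch of how such a list-only run could be assembled from a script. The switch choice is an assumption to check against robocopy /? on your own system: /L lists without copying, /XC /XN /XO skip files whose name already exists in the target (Changed, Newer, Older), and /XX leaves out files that exist only in the target. The function name robocopy_list_command is made up for illustration.

```python
def robocopy_list_command(src, dst, log_path, pattern="*.txt"):
    """Build a robocopy command that lists files in src that are not in dst."""
    return [
        "robocopy", src, dst, pattern,
        "/L",                  # list only: nothing is copied or moved
        "/XC", "/XN", "/XO",   # skip Changed, Newer and Older files
        "/XX",                 # do not report files found only in dst
        "/NJH", "/NJS",        # no job header or summary in the output
        "/NDL",                # no directory names in the output
        f"/LOG:{log_path}",    # write the report to a log file
    ]

# On Windows the list can be passed to subprocess.run(); robocopy exit
# codes below 8 indicate success, so do not treat non-zero as failure.
```

Removing "/L" from the list turns the dry run into the real copy, as described above.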

--

Phil, London

Neil

Mar 13, 2023, 10:14:07 AM

On 3/13/2023 1:13 AM, Herbert Kleebauer wrote:
>> Is there an easier way to get that list of what file names are in
>> folder B that aren't yet in folder A when I move just those files over?
>
> When in directory b use this command (double the % if used in a batch program):
>
> for %i in (*.txt) do if not exist ..\a\%i copy %i ..\a\
>
> Maybe you want to use "move" instead of "copy"
>
> But maybe the simplest way is to use file explorer and copy all
> files of b/ to a/ and select don't replace existing files.

I should have been clear that I can easily copy (or move) the files
selectively because the native Windows command has an overwrite yes/no.

What I need is a separate _list_ of what was moved (or copied) over.
for %i in (*.txt) do if not exist ..\a\%i copy %i ..\a\
echo %i > list.txt

What I do now is compare time stamps of the resulting combined folder.
But sometimes the time stamps aren't what I would have thought they'd be.
--
best regards,

Neil

Mr. Man-wai Chang

Mar 13, 2023, 12:06:58 PM

On 3/13/2023 3:08 AM, Neil wrote:
>
> List of NOT duplicate files
>

"NOT duplicate" = Unique? :)

Herbert Kleebauer

Mar 13, 2023, 1:18:18 PM

You just answered your own question (but don't use the name
"list.txt" or it will also be listed as new text file):
While in directory b\ enter at the command prompt (but first
delete new.log if it already exists):

for %i in (*.txt) do if not exist ..\a\%i echo %i>>new.log

Or make a batch file which first changes the working
directory to b\, deletes new.log if it already exists
and then executes the above line (but double any %).
Then it doesn't matter where the batch is located and
you can start it with a double click with the mouse.
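
The same idea can be sketched in Python for anyone who prefers it to a batch file. The log is opened in write mode, which truncates any earlier log, so there is nothing to delete first. The function name write_new_log is hypothetical; new.log is the name used above.

```python
from pathlib import Path

def write_new_log(folder_a, folder_b, log_name="new.log"):
    """Write the *.txt names in folder_b that are absent from folder_a."""
    folder_a, folder_b = Path(folder_a), Path(folder_b)
    names = sorted(
        p.name for p in folder_b.glob("*.txt")
        if not (folder_a / p.name).exists()
    )
    # write_text replaces the previous contents of the log, if any.
    (folder_b / log_name).write_text("".join(n + "\n" for n in names))
    return names
```

Because the log ends in .log, it is not picked up by the *.txt pattern on later runs.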




Neil

Mar 13, 2023, 1:19:40 PM

It doesn't really matter what the contents of the files are.

I only brought up the contents being different because a dumb move wouldn't
work because that would overwrite files of the same name in A with the new
contents of another file of the same name from B.

A is the master folder.

B is just the new files (some of which may have the same name and hence
would be discarded).

I should also be clear the copy (or move) isn't the problem because Windows
already won't copy over the file in A if it has the same name as in B.

What I'm after is the _list_ of files that were copied (or moved) over from
B to A (those are all going to be file names which were not in A prior).

Kind of like this pseudish code snip
cd B
for %i in (*.txt) do if not exist ..\a\%i copy %i ..\A\
echo %i >> list_of_file_names_in_b_not_in_a.txt

I don't even need to do the copy/move because that's the easy part.
The hard part for me is getting a list of file names in B not already in A.
--
best regards,

Neil

Zaidy036

Mar 13, 2023, 4:15:10 PM

A list of file names in a folder:
DIR /B <full path folderA> > X:\FolderA.txt
where X: is a full path and not required if same folder as folderA

So make a batch:
1. Make FolderA.txt
2. Make FolderB.txt
3. Loop over the names listed in FolderB.txt:
FOR /F "usebackq delims=" %%f IN ("FolderB.txt") DO (
IF EXIST "<full path folderA>\%%f" ECHO %%f>> X:\FolderC.txt
)
FolderC will contain file names from FolderB that are also in FolderA.
Use IF NOT EXIST instead to list the names from FolderB that are not in FolderA.
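
If the two DIR /B listings already exist, they can also be compared directly without touching the folders again. This is only a sketch: the function name names_only_in_b is made up, names are compared case-insensitively, and the UTF-8 encoding is an assumption (listings redirected from DIR may use a different code page if names contain non-ASCII characters).

```python
def names_only_in_b(listing_a, listing_b, encoding="utf-8"):
    """Return the names in listing_b that do not appear in listing_a.

    Each listing is a text file with one file name per line.
    """
    with open(listing_a, encoding=encoding) as fa:
        in_a = {line.strip().lower() for line in fa if line.strip()}
    with open(listing_b, encoding=encoding) as fb:
        return [
            line.strip() for line in fb
            if line.strip() and line.strip().lower() not in in_a
        ]
```

With FolderA.txt and FolderB.txt as inputs, the result is the list of names in folder B that are not in folder A.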

Graham J

Mar 13, 2023, 4:46:37 PM

Zaidy036 wrote:


Robocopy will do what you want.

Use its logging feature, and it will report the files it copies.


--
Graham J

knuttle

Mar 13, 2023, 4:49:26 PM

There is another way that is not elegant.

Run the following Batch file in each directory

Dir *.* /s >text.txt

Open the first text file in column A of the spreadsheet and sort the column.

Open the second text file in column C of the spreadsheet and sort that
column. You may need to open the second text file in another spreadsheet
and copy it to column C of the first spreadsheet.

Enter A1=C1 as a formula in every cell of column B (with the row number
adjusted for each row).

Start at the top. Every time you see a 0 in column B, make whatever
adjustments are necessary in column A or C so that the cells are equal
again and A1=C1 is 1. Depending on how you make the adjustment, you may
need to copy the cell in column B down to the bottom again.

When column B is all 1s, you have identical data in the cells of columns A
and C.

I find I can compare the two data sets using this method faster than I
can create and debug any scripts or commands.
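
For comparison, the column-alignment idea can be sketched as a merge of two sorted lists. Each row pairs a name from the first list with the same name from the second, and None marks the side where the name is missing. The function name align_sorted is hypothetical.

```python
def align_sorted(names_a, names_b):
    """Pair up two name lists in sorted order; None marks a missing name."""
    a, b = sorted(set(names_a)), sorted(set(names_b))
    rows, i, j = [], 0, 0
    while i < len(a) or j < len(b):
        if j == len(b) or (i < len(a) and a[i] < b[j]):
            rows.append((a[i], None))   # name only in the first list
            i += 1
        elif i == len(a) or b[j] < a[i]:
            rows.append((None, b[j]))   # name only in the second list
            j += 1
        else:
            rows.append((a[i], b[j]))   # same name in both lists
            i += 1
            j += 1
    return rows
```

The names that are only in the second list are then [b for a, b in rows if a is None].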

Neil

Mar 13, 2023, 9:00:09 PM

Herbert Kleebauer wrote:

>> What I do now is compare time stamps of the resulting combined folder.
>> But sometimes the time stamps aren't what I would have thought they'd be.
>
> You just answered your own question (but don't use the name
> "list.txt" or it will also be listed as new text file):
> While in directory b\ enter at the command prompt (but first
> delete new.log if it already exists):
>
> for %i in (*.txt) do if not exist ..\a\%i echo %i>>new.log
>
> Or make a batch file which first changes the working
> directory to b\, deletes new.log if it already exists
> and then executes the above line (but double any %).
> Then it doesn't matter where the batch is located and
> you can start it with a double click with the mouse.

Thank you for a way to diff list file names that are in B but not A.
cd B
for %i in (*.txt) do if not exist C:\path\to\A\%i echo %i >> new.log

To which I added the copy I had been using with my manually created diff.
cd B
echo "No" | copy/-Y *.txt C:\path\to\A\

I like that your copy command also copied what's in B not already in A.
cd B
for %i in (*.txt) do if not exist C:\path\to\A\%i copy %i C:\path\to\A\

To explain what I was doing before you provided that nice diff listing
and diff batch copy commands, and using the names for that diff file that
are in this thread...

Before, what I was doing was _manually_ creating the diff file on my own.
And then I was editing that diff file to create a new.bat on my own.

vim new.log
:%s/.*/echo n | copy & C\:\\path\\to\\A\\&
:w! new.bat

The explanation of that is ":" starts a command inside of vim.
The "%" says to run the command on every line of the file.
The "s/.*" says to search for any run of characters (that is, the whole line).
The next "/" separates the search from what to replace it with.
Next is what to replace it with (each ampersand being what was found).
The extra backslashes are to escape characters which have other meanings.

Where the resulting new.bat file contained the respective copy commands:
echo n | copy file1.txt C:\path\to\A\file1.txt
echo n | copy file2.txt C:\path\to\A\file2.txt
echo n | copy file3.txt C:\path\to\A\file3.txt

But your method is more elegant than my manual editing of the diff file
to turn that diff file of what's in B not yet in A into a batch copy file.

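@echo off
rem /d also switches drives if B is on a different drive
cd /d C:\path\to\B
rem start with a fresh log so old entries are not appended to
if exist new.log del new.log
for %%i in (*.txt) do if not exist "C:\path\to\A\%%i" echo %%i>> new.log
for %%i in (*.txt) do if not exist "C:\path\to\A\%%i" copy "%%i" C:\path\to\A\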

Thank you!
--
best regards,

Neil

Big Al

Mar 14, 2023, 8:41:28 AM

You might use meld.
meld <folder A> <folder B>
It will highlight all the files that are not in A. It will also highlight
all the files that are in both but whose text differs.
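
A rough non-graphical analogue, sketched with Python's standard filecmp module: dircmp reports the names found only on one side and the common files that differ. The function name compare_folders is made up. Note that dircmp uses a shallow comparison (file type, size and modification time) before falling back to contents.

```python
import filecmp

def compare_folders(folder_a, folder_b):
    """Summarise which names are only in B and which common files differ."""
    cmp = filecmp.dircmp(folder_a, folder_b)
    return {
        # right_only holds names present in folder_b but not in folder_a
        # (subdirectories included).
        "only_in_b": sorted(cmp.right_only),
        # diff_files holds names present in both whose files differ.
        "same_name_different_content": sorted(cmp.diff_files),
    }
```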

--
Al

wasbit

Mar 15, 2023, 5:37:52 AM

Presume you mean
- http://meldmerge.org/

-
Regards
wasbit

Big Al

Mar 15, 2023, 8:20:07 AM

Yes.
--
Al
