DirDiffVim for folders, but files are missed?

K otgc

Jan 1, 2024, 10:58:36 AM
to vim_use
Hello,
I have Google Photos downloaded and I'm trying to sort out the mess of duplicates.
I usually use vimdiff to compare file contents, but this task needs the files in two directories compared.

Running vim -c "DirDiff dir1 dir2" and pressing Enter, DirDiff.vim correctly shows which folders are only in dir1 and which are only in dir2.
However, I need to move the files in dir1 that aren't duplicates of files in dir2 into dir2.
How can I manage this, please?

K otgc

Jan 2, 2024, 12:40:30 PM
to vim_use
Here's my plan for ProjectGooglePhotosCleanUpDownloads:
Process 1: copying and pasting 241 directories (TakeoutAlbumYears/Google Photos/Photos from YYYY) plus 12 directories (TakeoutAlbumsMade/Google Photos/VariousNamedDirectoriesIMade);
Process 2: sorting duplicate files in a Main directory;
Process 3: copying non-duplicate files to a new directory, TakeoutAlbumsMade/Google Photos/VariousNamedDirectoriesIMade/PhotosNotInAlbums.

Any suggestions on which software (Vim, vimdiff or vim-dirdiff) and which commands would probably be best to use, please?

meine

Jan 2, 2024, 2:30:06 PM
to vim...@googlegroups.com
Since you are looking for differences or resemblances between files, and
not their content per se, I suggest using a command-line tool like `diff`
or a graphical program like `meld`.

Both will show you differences and similarities in name, date, size,
etc.

Vim and vimdiff are for comparing text files, and you seem to need
something different.

KR,

//meine

K otgc

Jan 2, 2024, 6:27:28 PM
to vim_use
Thanks, I actually did use Meld; it's a great GUI.
However, the sorting process needs automation.
Manually moving thousands of files across several accounts is too slow, and I haven't figured out how Meld could do that.

meine

Jan 3, 2024, 11:44:29 AM
to vim...@googlegroups.com
On Tue, Jan 02, 2024 at 03:27:28PM -0800, K otgc wrote:
> Thanks, I actually did use Meld; it's a great GUI.
> However, the sorting process needs automation.
> Manually moving thousands of files across several accounts is too slow,
> and I haven't figured out how Meld could do that.

For automation, regular `diff` seems a more appropriate tool; you can
call it from a script, etc.
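
For instance, a minimal sketch (-r recurses into subdirectories; -q only
reports which files differ or exist on one side only):

$ diff -rq dir1/ dir2/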

//meine

rwmit...@gmail.com

Jan 3, 2024, 12:38:08 PM
to vim_use

I'd recommend using something like md5 to generate a hash value for every file, collected into a single file.
From that, extract just the md5 values and pipe them to `sort | uniq -c | sort -nr`.
This will generate a list of hash values, reverse-sorted by how often they occur.
Anything occurring more than once is duplicated (and will be near the top of the output).
Use those hash values to look up all the matching filenames in the original md5 file.

Of course, all of this will fail when anything minor (such as metadata) changes in a photo.
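
A rough sketch of that sequence (untested; photos/ is a placeholder directory name, and cut is just one way to do the extraction):

$ find photos/ -type f -exec md5sum {} + > sum.md5    # hash every file
$ cut -d' ' -f1 sum.md5 | sort | uniq -c | sort -nr   # count each hash
$ grep '^PASTE-HASH-HERE' sum.md5                     # filenames for one duplicated hash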

K otgc

Jan 4, 2024, 8:11:23 AM
to vim_use
Diff worked, similarly to Meld, giving a clear list of which files (photos) are or aren't in each directory.
I'm stuck on the command to extract just the md5 values.

In a directory with photo files I ran find -H directoryName/ ! -type d -exec md5sum {} + > sum.md5 and pressed Enter, then sort | uniq -c | sort -nr checksum’sFileName.md5 and pressed Enter, but I'm not sure how to extract just the md5 values.

ubuntu@ubuntu:~/Documents$ cat sum2.md5
3bc3be114fb6323adc5b0ad7422d193a  test1/test1.1/test1.1.1/test1.1.1file2.JPG
126a8a51b9d1bbd07fddc65819a542c3  test1/test1.1/test1.1.1/test1.1.1file1.JPG.json
3e7705498e8be60520841409ebc69bc1  test1/test1.1/test1.1.1/test1.1.1file1.JPG
ubuntu@ubuntu:~/Documents$ sort | uniq -c | sort -nr sum2.md5
126a8a51b9d1bbd07fddc65819a542c3  test1/test1.1/test1.1.1/test1.1.1file1.JPG.json
3e7705498e8be60520841409ebc69bc1  test1/test1.1/test1.1.1/test1.1.1file1.JPG
3bc3be114fb6323adc5b0ad7422d193a  test1/test1.1/test1.1.1/test1.1.1file2.JPG

jr

Jan 4, 2024, 10:14:24 AM
to vim...@googlegroups.com
hi,

On Thu, 4 Jan 2024 at 13:11, K otgc <kontheg...@gmail.com> wrote:
> Diff worked, similarly to Meld, giving a clear list of which files (photos) are or aren't in each directory.
> I'm stuck on the command to extract just the md5 values.
> In a directory with photo files I ran find -H directoryName/ ! -type d -exec md5sum {} + > sum.md5 and pressed Enter, then sort | uniq -c | sort -nr checksum’sFileName.md5 and pressed Enter, but I'm not sure how to extract just the md5 values.
>
> ubuntu@ubuntu:~/Documents$ cat sum2.md5
> 3bc3be114fb6323adc5b0ad7422d193a test1/test1.1/test1.1.1/test1.1.1file2.JPG

the command you're looking for is 'cut(1)', e.g. '$ cat sum2.md5 | cut -d' ' -f1'.

--
regards, jr.

You have the right to free speech, as long as you're not dumb enough
to actually try it.
(The Clash 'Know Your Rights')


K otgc

Jan 6, 2024, 10:00:48 AM
to vim_use
Thanks.
I ran these commands, and I'm up to the final step of using those hash values to look up all the matching filenames in the original md5 file.
I'm researching a command for that; fdupes seems to operate on whole files, but I need to match up the md5 hash values.
ubuntu@ubuntu:~/Documents$ find -H test1/ ! -type d -exec md5sum {} + > sum.md5
ubuntu@ubuntu:~/Documents$ ls
NoMachine  sum.md5  test1  test2
ubuntu@ubuntu:~/Documents$ cat sum.md5
3bc3be114fb6323adc5b0ad7422d193a  test1/test1.1/test1.1.1/test1.1.1file2.JPG
126a8a51b9d1bbd07fddc65819a542c3  test1/test1.1/test1.1.1/test1.1.1file1.JPG.json
3e7705498e8be60520841409ebc69bc1  test1/test1.1/test1.1.1/test1.1.1file1.JPG
d8e8fca2dc0f896fd7cb4cb0031ba249  test1/test2/test2.2/test2.2.2/test2.2.2file1.JPG
126a8a51b9d1bbd07fddc65819a542c3  test1/test2/test2.2/test2.2.2/test1.1.1file1.JPG.json
d8e8fca2dc0f896fd7cb4cb0031ba249  test1/test2/test2.2/test2.2.2/test2.2.2file1.JPG.json
ubuntu@ubuntu:~/Documents$ sort|uniq -c|sort -nr sum.md5 |cut -d ' ' -f1
126a8a51b9d1bbd07fddc65819a542c3
126a8a51b9d1bbd07fddc65819a542c3
3e7705498e8be60520841409ebc69bc1
3bc3be114fb6323adc5b0ad7422d193a
d8e8fca2dc0f896fd7cb4cb0031ba249
d8e8fca2dc0f896fd7cb4cb0031ba249

rwmit...@gmail.com

Jan 6, 2024, 11:55:24 AM
to vim_use

You have your commands out of order.
You need to use cut on the original file to extract just the md5 values, then pipe that to sort | uniq -c | sort -nr
to generate a list of md5 values with counts of how often they occur. Counts greater than 1 indicate duplicates.

From that, select each md5 value with a count > 1, then use grep to find that md5 with its filenames in the original file.
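
Something along these lines should do it (untested sketch, using the sum.md5 file from your session; the awk filter is just one way to pick out counts > 1):

$ cut -d' ' -f1 sum.md5 | sort | uniq -c | sort -nr   # hashes with their counts
$ cut -d' ' -f1 sum.md5 | sort | uniq -c \
    | awk '$1 > 1 {print $2}' \
    | while read -r h; do grep "^$h" sum.md5; done    # duplicated hashes with filenames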

jr

Jan 6, 2024, 12:25:16 PM
to vim...@googlegroups.com
hi,

(you do realise we're somewhat OT for this forum ? :-))

On Sat, 6 Jan 2024 at 15:00, K otgc <kontheg...@gmail.com> wrote:
> Thanks.
> I ran these commands, and I'm up to the final step of using those hash values to look up all the matching filenames in the original md5 file.
> I'm researching a command for that; fdupes seems to operate on whole files, but I need to match up the md5 hash values.
> ubuntu@ubuntu:~/Documents$ find -H test1/ ! -type d -exec md5sum {} + > sum.md5
> ubuntu@ubuntu:~/Documents$ ls
> NoMachine sum.md5 test1 test2
> ubuntu@ubuntu:~/Documents$ cat sum.md5
> 3bc3be114fb6323adc5b0ad7422d193a test1/test1.1/test1.1.1/test1.1.1file2.JPG
> 126a8a51b9d1bbd07fddc65819a542c3 test1/test1.1/test1.1.1/test1.1.1file1.JPG.json
> 3e7705498e8be60520841409ebc69bc1 test1/test1.1/test1.1.1/test1.1.1file1.JPG
> d8e8fca2dc0f896fd7cb4cb0031ba249 test1/test2/test2.2/test2.2.2/test2.2.2file1.JPG
> 126a8a51b9d1bbd07fddc65819a542c3 test1/test2/test2.2/test2.2.2/test1.1.1file1.JPG.json
> d8e8fca2dc0f896fd7cb4cb0031ba249 test1/test2/test2.2/test2.2.2/test2.2.2file1.JPG.json
> ubuntu@ubuntu:~/Documents$ sort|uniq -c|sort -nr sum.md5 |cut -d ' ' -f1
> 126a8a51b9d1bbd07fddc65819a542c3
> 126a8a51b9d1bbd07fddc65819a542c3
> 3e7705498e8be60520841409ebc69bc1
> 3bc3be114fb6323adc5b0ad7422d193a
> d8e8fca2dc0f896fd7cb4cb0031ba249
> d8e8fca2dc0f896fd7cb4cb0031ba249


ubuntu@ubuntu:~/Documents$ find -H test1/ ! -type d -exec md5sum {} + > sum.md5

why not use '-type f'? anyway, the following should do what you're looking for:

$ find -H test1/ ! -type d -exec md5sum {} + | awk -f kotgc.awk

the awk code is:
-----<snip>-----
# collect every filename seen for each md5 value
# ($1 = hash, $2 = filename; note $2 misses anything after a space in a name)
{
    if ($1 in arr)
        arr[$1] = arr[$1] ", " $2
    else
        arr[$1] = $2
}

# at the end, print only the hashes that collected more than one filename
END {
    for (m in arr)
        if (arr[m] ~ /,/)
            print m " " arr[m]
}
-----<snip>-----
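
On your sum.md5 sample this should print two lines: the 126a8a... hash followed by its two .json paths, and the d8e8fc... hash followed by its two test2.2.2file1 paths, each list comma-separated.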

K otgc

Jan 7, 2024, 12:20:21 AM
to vim_use
Thanks.
Yes, the Vim question has morphed into a programming question, which is not only 'OT' for this forum but definitely 'OT' for my skills.
I simply need Google Photos to download my photos; imagine computer-illiterate people trying this!

I'm stuck on:
step: generating a list of md5 values, with their counts of how often they occur (counts >1 indicate duplicates);
step: selecting md5 values with counts >1;
step: using grep to find that md5 with its (fileName.jpg?) entries in the original (TakeoutAlbumYears?) file.

Here's where I'm at:
ubuntu@ubuntu:~/Documents$ sort |uniq -c|sort -nr fileNamesCutOutOfSum.md5
126a8a51b9d1bbd07fddc65819a542c3
126a8a51b9d1bbd07fddc65819a542c3
3e7705498e8be60520841409ebc69bc1
3bc3be114fb6323adc5b0ad7422d193a
d8e8fca2dc0f896fd7cb4cb0031ba249
d8e8fca2dc0f896fd7cb4cb0031ba249
^C
ubuntu@ubuntu:~/Documents$ cat fileNamesCutOutOfSum.md5
3bc3be114fb6323adc5b0ad7422d193a
126a8a51b9d1bbd07fddc65819a542c3
3e7705498e8be60520841409ebc69bc1
d8e8fca2dc0f896fd7cb4cb0031ba249
126a8a51b9d1bbd07fddc65819a542c3
d8e8fca2dc0f896fd7cb4cb0031ba249
ubuntu@ubuntu:~/Documents$ sort |uniq -c|sort -nr fileNamesCutOutOfSum.md5 > fNCOOSSorted.md5
^C
ubuntu@ubuntu:~/Documents$ ls
fileNamesCutOutOfSum.md5  fNCOOSSorted.md5  NoMachine  sum.md5  test1  test2
ubuntu@ubuntu:~/Documents$ cat fNCOOSSorted.md5
126a8a51b9d1bbd07fddc65819a542c3
126a8a51b9d1bbd07fddc65819a542c3
3e7705498e8be60520841409ebc69bc1
3bc3be114fb6323adc5b0ad7422d193a
d8e8fca2dc0f896fd7cb4cb0031ba249
d8e8fca2dc0f896fd7cb4cb0031ba249

jr

Jan 7, 2024, 12:55:06 AM
to vim...@googlegroups.com
hi,

On Sun, 7 Jan 2024 at 05:20, K otgc <kontheg...@gmail.com> wrote:
> ...
> I'm stuck on:

why does my (proposed) awk solution not work for you, i.e. output lines
formatted "md5sum file1, file2[, .., fileN]"? (admittedly there's no
"count", and I haven't coded in awk for a while, so there will be
"neater" ways of writing it, I'm sure)

> step: generating a list of md5 values, with their counts of how often they occur (counts >1 indicate duplicates);
> step: selecting md5 values with counts >1;
> step: using grep to find that md5 with its (fileName.jpg?) entries in the original (TakeoutAlbumYears?) file.
>
> Here's where I'm at:
> ...

K otgc

Jan 7, 2024, 11:34:41 PM
to vim_use
Solved using fdupes. Thanks for the suggestions.
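
For anyone finding this thread later: fdupes does the hashing and matching in one step. A minimal sketch (directory names are placeholders; -r recurses, -d prompts which copy of each duplicate set to keep):

$ fdupes -r test1/ test2/      # list sets of duplicate files
$ fdupes -rd test1/ test2/     # same, but offer to delete the extras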