Find and Merge Duplicates


Sparkgapper

Apr 17, 2013, 3:25:47 PM
to geditcom-ii...@googlegroups.com
I have a file with 11,000+ notes, anywhere from 2 to 6 of which are duplicated across numerous individuals. The Find and Merge Duplicates .py script works admirably, but it also reports matches between unrelated individuals just because the bodies of their notes contain some of the same information. Is there any way to temporarily restrict it to 100% matches so that I can get through all these notes?

Sparkgapper

Sparkgapper

Apr 20, 2013, 10:23:50 AM
to geditcom-ii...@googlegroups.com
Is ANYONE receiving my query?  
Sparkgapper

Jim Eggert

Apr 20, 2013, 11:34:51 AM
to geditcom-ii...@googlegroups.com
A quick hack to require 100% matching in the first 50 words of the NOTES field in Find and Merge Duplicates.py is:

change
> def MergeQualityNOTE(rec1,rec2) :
>     # if either has html, must match it all, if one not, no match
>     if not(rec1.notes) :
>         if rec2.notes : return -1.
>         if rec1.htmltext != rec2.htmltext : return -1.
>         return 100.
>     elif not(rec2.notes) :
>         return -1.
>
>     # exact match
>     if rec1.notes == rec2.notes : return 100.

to
> def MergeQualityNOTE(rec1,rec2) :
>     # if either has html, must match it all, if one not, no match
>     if not(rec1.notes) :
>         if rec2.notes : return -1.
>         if rec1.htmltext != rec2.htmltext : return -1.
>         return 100.
>     elif not(rec2.notes) :
>         return -1.
>
>     # exact match
>     if rec1.notes == rec2.notes : return 100.
>     return -1.

(The only change is adding the last line "return -1.") Save the modified script in your user scripts folder.
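
To see the effect of that extra line without touching a real file, here is a minimal standalone sketch. NoteRec and merge_quality_note_strict are hypothetical names made up for this illustration; the real script receives its records from GEDitCOM II, and only the notes and htmltext attributes shown in the excerpt above are assumed.

```python
from collections import namedtuple

# Hypothetical stand-in for a note record. The real script gets its
# records from GEDitCOM II; only .notes and .htmltext are assumed here.
NoteRec = namedtuple("NoteRec", ["notes", "htmltext"])


def merge_quality_note_strict(rec1, rec2):
    """Return 100. only when the two notes are identical, else -1."""
    # if either has html, must match it all, if one not, no match
    if not rec1.notes:
        if rec2.notes:
            return -1.
        if rec1.htmltext != rec2.htmltext:
            return -1.
        return 100.
    elif not rec2.notes:
        return -1.

    # exact match
    if rec1.notes == rec2.notes:
        return 100.
    # the added line: anything short of an exact match is rejected,
    # so any partial-match code further down is never reached
    return -1.


same = merge_quality_note_strict(NoteRec("Born in Ohio", ""),
                                 NoteRec("Born in Ohio", ""))
partial = merge_quality_note_strict(NoteRec("Born in Ohio", ""),
                                    NoteRec("Born in Ohio in 1850", ""))
```

With these sample records, same comes out as 100. and partial as -1., which is the behavior asked for: notes that merely share some text no longer count as a match.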

In looking at this code, I noticed that it contains a custom function WordMatchQuality(), when it probably would have been a better idea to use the standard-library difflib module, available in Python since version 2.1.
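
As a rough idea of what a difflib-based comparison could look like, here is a small sketch. It is not a reproduction of WordMatchQuality(), whose internals are not shown in this thread; the function name word_match_ratio is hypothetical, and the 50-word limit is simply borrowed from the description above.

```python
import difflib


def word_match_ratio(text1, text2, max_words=50):
    """Similarity, from 0.0 to 1.0, of the first max_words words of two strings."""
    words1 = text1.split()[:max_words]
    words2 = text2.split()[:max_words]
    # SequenceMatcher accepts any sequences of hashable items, so it can
    # compare lists of words just as well as strings of characters.
    return difflib.SequenceMatcher(None, words1, words2).ratio()


identical = word_match_ratio("born in Ohio in 1850", "born in Ohio in 1850")
unrelated = word_match_ratio("born in Ohio", "died at sea")
```

Here identical is 1.0 and unrelated is 0.0. A script could multiply the ratio by 100 and compare it against whatever threshold it wants, with 100 meaning the compared words are identical.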

=Jim

William G. Bates

Apr 20, 2013, 11:40:57 AM
to geditcom-ii...@googlegroups.com
The next question is: how do I get to the point where I can make this change?
I tried opening the file like I would a normal script file, but nothing works.

Sparkgapper

Jim Eggert

Apr 20, 2013, 1:59:20 PM
to geditcom-ii...@googlegroups.com
From GEDitCOM II: Scripts > Reveal System Scripts in Finder
In Finder: Double-click the Editing Tools icon, then right-click Find and Merge Duplicates.py and open it with your favorite text editor, like TextEdit.
If using a Lion or later version of TextEdit, choose File > Duplicate
Do the edits.
File > Save …
In the Save dialog, navigate to your user folder then to Library/Application Support/GEDitCOM II/Scripts/Editing Tools
Save there, not in Library/Application Support/GEDitCOM II/System/Scripts/Editing Tools
From GEDitCOM II: Scripts > Refresh Scripts
Now run the script from the User Scripts section of the scripts menu.

That should do it. When saving the script, you may want to give it a distinctive name like
Find and Merge Duplicates with Matching Notes.py
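
If you would rather make the copy first and edit it in place, the copy step can be sketched in Python as below. The function install_user_script is a hypothetical helper, not part of GEDitCOM II; the destination is the user scripts folder named in the steps above, and you have to supply the path to the original script yourself (Scripts > Reveal System Scripts in Finder shows where it lives).

```python
import os
import shutil


def install_user_script(source_path, new_name, home=None):
    """Copy a script into the user's GEDitCOM II Editing Tools folder.

    Hypothetical helper for illustration. source_path is the original
    script; new_name is the file name to give the copy.
    """
    home = home or os.path.expanduser("~")
    # User scripts folder from the steps above (not the System one)
    dest_dir = os.path.join(home, "Library", "Application Support",
                            "GEDitCOM II", "Scripts", "Editing Tools")
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    dest_path = os.path.join(dest_dir, new_name)
    shutil.copyfile(source_path, dest_path)
    return dest_path
```

After copying, edit the copy, then use Scripts > Refresh Scripts in GEDitCOM II so the new script appears in the menu.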

=Jim

William G. Bates

Apr 20, 2013, 3:12:26 PM
to geditcom-ii...@googlegroups.com
That is what I needed, so I will give it a try and see what happens, but probably next week.

Sparkgapper

William G. Bates

Apr 20, 2013, 3:28:39 PM
to geditcom-ii...@googlegroups.com
Did that first part but where is the "Editing Tools" icon? (see attached screen image)
Sparkgapper
Screen shot 2013-04-20 at 1.26.31 PM.png

William G. Bates

Apr 20, 2013, 4:28:30 PM
to geditcom-ii...@googlegroups.com
Disregard that last note. I have the script open and changed as noted. However, after the Save As (I needed to change .txt to .py), the newly added script does not show up when I try to use it. This has been another of my problems: I have been unable to access any modified script in GEDitCOM II, although with AppleScript I can at least still open and run the script from AppleScript Editor. Since I am still running Snow Leopard, minus the last update, I am wondering if that may be part of the problem.
Sparkgapper
