I'm going through the same thing, trying to switch to Lightroom with a library of 60,000 photos. (By the way, iPhoto is quite unstable with a collection this large.) I found a tool called DupeGuru-PE (which is free if you follow the instructions in the license) that compares images by their contents, and unfortunately the situation is worse than I had thought. Here are the problems I'm seeing:
(a) As you mentioned, many of the originals are exact copies of the modified image.
(b) iPhoto re-saves photos for reasons other than actually editing them (resulting in quality loss). I have maybe 30,000 photos where the modified image is slightly different from the original, and I know I have not edited anywhere close to that number. I can verify that the image has been changed by looking at the histogram in Lightroom, which is clearly different for the two versions of the photo. Though I can't see any visual difference in the photo, it bothers me not to have the original.
(c) Occasionally, an entire event will have the originals from a different event. I'm assuming this is unavoidable: PhoShare has to guess based on filenames and dates because it cannot get this information from the iPhoto database.
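As a rough sketch of how case (a) could be told apart from case (b) without any image analysis: if the original and the modified file hash to the same value, they are byte-for-byte copies. This is not how DupeGuru compares images; it is a simpler stand-in that only catches exact copies, and the function names are made up for illustration. A file that differs only in embedded metadata would still count as "different" here.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_exact_copy(original, modified):
    """True if the two files have identical bytes (case (a)), False otherwise."""
    return file_digest(original) == file_digest(modified)
```

Pairs where `is_exact_copy` returns False would still need a content-based comparison to decide whether they are re-saves (b) or mismatched events (c).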
I'm finding that (a) is easy to correct using DupeGuru, (b) is tricky, and (c) is worrisome. I've had some luck by exporting the results from DupeGuru to CSV, then using regexes in Sublime Text to massage that data into a bash script that moves and removes the files as I'd like.
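For anyone who would rather skip the regex step, here is a minimal sketch of the same CSV-to-script idea in Python. The column names ("Group ID", "Folder", "Filename") are assumptions about the export and should be adjusted to match the header of your actual CSV; the function name and the review folder are hypothetical too. It keeps the first file listed in each group and writes `mv` commands for the rest, moving them to a review folder rather than deleting anything.

```python
import csv
import io
import posixpath
import shlex
from collections import defaultdict


def build_script(csv_text, review_dir="dupes_to_review"):
    """Return shell script text that moves every non-first file of each group."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumed columns; rename to match the real export header.
        path = posixpath.join(row["Folder"], row["Filename"])
        groups[row["Group ID"]].append(path)

    lines = ["#!/bin/bash", "set -eu", "mkdir -p " + shlex.quote(review_dir)]
    for paths in groups.values():
        # The first path in each group is kept; the rest are moved for review.
        for extra in paths[1:]:
            lines.append(
                "mv -n -- %s %s/" % (shlex.quote(extra), shlex.quote(review_dir))
            )
    return "\n".join(lines) + "\n"
```

Reading the generated script before running it is the whole point of this approach. `shlex.quote` takes care of spaces and apostrophes in filenames, and `mv -n` refuses to overwrite, so two files with the same name will not clobber each other in the review folder (the second one simply stays where it was).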
When I trusted my collection to iPhoto, I knew most of the organization was being stored in a proprietary database format, but for some reason I assumed there would be a reasonable way to export the data. I never expected that iPhoto would make it so hard to get at the originals.
While this process is taking much longer than I anticipated, at least it's possible. Thank you so much, Tilman, for making this program.
Dave