De-duping imports


Joe Cohen

May 22, 2008, 11:37:24 AM
to PDX Tech Calendar
An idea for dealing with some duplicates when we import sources.
This catches only some low-hanging fruit, and even there it has some
problems, but I'd be interested in understanding whether it's
acceptable.

See also
De-duping events/venues
http://groups.google.com/group/pdx-tech-calendar/browse_thread/thread/8509ba75f44bb8ec/077d5806acd47eea?lnk=gst&q=duplicates#077d5806acd47eea
RailsConf code drive
http://groups.google.com/group/pdx-tech-calendar/browse_thread/thread/e197c05cc1cf3d2f/3f142c9024a7e58e?lnk=gst&q=duplicate#3f142c9024a7e58e

If an imported event has a UID at its source, we save it as
source_uid.
When we later import an event, we check whether we already have an
event with that source_uid. If we do, we re-import it only if its
LAST-MODIFIED time at the source is later than the Calagator
event.updated_at.
I know this isn't perfect,[A] but I think it's an 80% solution.
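
To make the rule concrete, here is a rough sketch in Python rather than
in Calagator's own code. The Event class, the dict used as a store, and
the import_event function are all hypothetical names made up for
illustration; only source_uid, updated_at, and LAST-MODIFIED come from
the proposal. It also assumes that an event with no LAST-MODIFIED at
the source is skipped once we already have a copy, which the proposal
does not actually settle.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional


@dataclass
class Event:
    """Hypothetical stand-in for a stored calendar event."""
    source_uid: str
    title: str
    updated_at: datetime  # when the local copy was last saved


def import_event(store: Dict[str, Event], uid: str, title: str,
                 last_modified: Optional[datetime], now: datetime) -> str:
    """Import one source event that carries a UID.

    Returns "created", "updated", or "skipped".
    """
    existing = store.get(uid)
    if existing is None:
        # First time we see this UID: save it with its source_uid.
        store[uid] = Event(source_uid=uid, title=title, updated_at=now)
        return "created"
    # Known UID: re-import only if the source changed after our copy did.
    # Assumption: a missing LAST-MODIFIED is treated as "not newer".
    if last_modified is not None and last_modified > existing.updated_at:
        existing.title = title
        existing.updated_at = now
        return "updated"
    return "skipped"
```

Events that have no UID at the source are outside this sketch and
would still need some other duplicate check.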

--Joe

[A] Here's an example problem:
1. We import the event.
2. The event is changed at the source, e.g., it is rescheduled.
3. Someone edits the event in Calagator, e.g., corrects a typo in the
description.
4. We attempt to re-import the event.
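My proposal will lose the rescheduled time: the edit in step 3 makes
event.updated_at later than the source's LAST-MODIFIED, so the
re-import is skipped.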
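
The four steps can be replayed with made-up timestamps. This is only
an illustration: the dates and the should_reimport helper are
hypothetical, and the comparison is the one described in the proposal
(source LAST-MODIFIED later than the local updated_at).

```python
from datetime import datetime


def should_reimport(source_last_modified: datetime,
                    local_updated_at: datetime) -> bool:
    """The proposed rule: re-import only if the source is newer."""
    return source_last_modified > local_updated_at


imported_at = datetime(2008, 5, 20, 9, 0)           # 1. we import the event
source_last_modified = datetime(2008, 5, 21, 9, 0)  # 2. rescheduled at source
local_edit_at = datetime(2008, 5, 22, 9, 0)         # 3. typo fixed locally

# Without the local edit, the reschedule would be picked up.
print(should_reimport(source_last_modified, imported_at))    # True

# 4. With the local edit, updated_at is newer, so the import is skipped.
print(should_reimport(source_last_modified, local_edit_at))  # False
```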

