Hi everyone, I'm a relatively new programmer working on a project using the NYPD Motor Vehicle Collisions dataset.
I update my database each night by querying for records more recent than the latest date already in the database. The problem is that over the past month or so, I've noticed the dataset's maintainers adding and revising data irregularly. Most notably, many incidents are added a day or more after the rest of the incidents from that date (typically 3-7 days late). I've also noticed at least one death that was originally listed as an injury.
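To make the failure mode concrete, here is a minimal sketch of the nightly pull as described, using made-up in-memory rows instead of real API calls. The field names `collision_id` and `crash_date` and the helper functions `latest_date` and `incremental_pull` are assumptions for illustration only. Because the filter keeps only rows dated after the newest stored date, a late-added incident carrying an older crash date is never picked up.

```python
from datetime import date

# Hypothetical rows; the field names collision_id and crash_date are assumptions.
def latest_date(db_rows):
    """Return the most recent crash_date already stored locally."""
    return max(row["crash_date"] for row in db_rows)

def incremental_pull(source_rows, last_date):
    """The nightly query as described: keep only rows newer than last_date."""
    return [row for row in source_rows if row["crash_date"] > last_date]

db_rows = [
    {"collision_id": 1, "crash_date": date(2024, 5, 1)},
    {"collision_id": 2, "crash_date": date(2024, 5, 2)},
]

# Tonight's source: a new crash from May 3, plus a May 1 crash that was
# only published after the previous pull (a late-arriving record).
source_rows = db_rows + [
    {"collision_id": 3, "crash_date": date(2024, 5, 3)},
    {"collision_id": 4, "crash_date": date(2024, 5, 1)},
]

new_rows = incremental_pull(source_rows, latest_date(db_rows))
print([row["collision_id"] for row in new_rows])  # [3]; collision 4 is never fetched
```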
I'm wondering what the best way to handle this is. I want my data to be as up to date as possible, but as things stand my app is missing dozens of incidents that were added after the fact. Deleting and reloading data seems like bad practice, but comparing hundreds of recent incidents in my database against each new query result to detect changes also seems incredibly inefficient. Is this a typical problem for people working with these sorts of datasets?
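One pattern often used for feeds that revise recent records is to re-fetch a rolling window of recent dates each night and upsert on a stable record ID, so new rows are inserted and changed rows are overwritten without a hand-written comparison step. This is a minimal sketch under several assumptions: a `collision_id` key, SQLite for storage, hypothetical column names `injured` and `killed`, and a 10-day window picked only to cover the 3-7 day lag; fetching the rows from the dataset's API is left out.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical safety margin covering the observed 3-7 day publishing lag.
LOOKBACK_DAYS = 10

def window_start(today, lookback_days=LOOKBACK_DAYS):
    """First crash_date to re-fetch each night."""
    return today - timedelta(days=lookback_days)

def upsert_collisions(conn, rows):
    """Insert new collisions and overwrite existing ones, keyed on collision_id."""
    conn.executemany(
        """
        INSERT INTO collisions (collision_id, crash_date, injured, killed)
        VALUES (:collision_id, :crash_date, :injured, :killed)
        ON CONFLICT(collision_id) DO UPDATE SET
            crash_date = excluded.crash_date,
            injured = excluded.injured,
            killed = excluded.killed
        """,
        rows,
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE collisions ("
    "collision_id INTEGER PRIMARY KEY, crash_date TEXT, injured INTEGER, killed INTEGER)"
)

# First night: one crash recorded as an injury.
upsert_collisions(conn, [
    {"collision_id": 1, "crash_date": "2024-05-01", "injured": 1, "killed": 0},
])

# A later night: the re-fetched window reclassifies crash 1 as a death and
# includes a late-arriving crash 2 from the same date.
upsert_collisions(conn, [
    {"collision_id": 1, "crash_date": "2024-05-01", "injured": 0, "killed": 1},
    {"collision_id": 2, "crash_date": "2024-05-01", "injured": 2, "killed": 0},
])

print(window_start(date(2024, 5, 8)))  # 2024-04-28
print(conn.execute("SELECT * FROM collisions ORDER BY collision_id").fetchall())
# [(1, '2024-05-01', 0, 1), (2, '2024-05-01', 2, 0)]
```

The database does the matching on the primary key, so re-processing a few hundred recent rows each night costs little compared with pulling the full dataset.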