Hello,
Background: a large, multi-site REDCap project (all external sites are separated by DAGs).
This is a longitudinal project with repeating events and repeating instruments.
For complicated reasons, one of the external sites does not log in to the central REDCap project for data entry. Instead, they enter data into a local copy of the REDCap project (we provided them the XML) on their own REDCap instance -- and their local data then needs to be imported into the central REDCap project.
The central project needs the external data at least monthly.
Previously entered data is expected to change occasionally (for example, if errors are found, study sites should correct them); for this reason, we have taken the approach of asking the external site for a complete copy of their entire database each month -- the thought being that if they send us everything they have, we will end up with an exact duplicate of their data! But I think the import process is getting in the way of that goal.
The strategy to date for handling this external site's data has been:
- The external site exports all data, in raw CSV format.
- Site submits their CSV file to the central project coordinator.
- Central project coordinator imports the site's CSV file into the central REDCap project.
- The "Allow blank values to overwrite existing saved values?" option is set to "YES", to account for cases where the site needs to delete a text field entry (e.g., months later, they realize that a question should have been left blank).
- Originally, we used the standard "Data Import Tool", but the files became too big for our REDCap to handle. Rather than manually splitting them into smaller files, we have been using the "Big Data Import" External Module for the past several imports (an API equivalent of this import step is sketched just after this list).
- Aside from some annoyances (e.g., making sure the external site's records get assigned to their DAG in the central project), we thought this was working well.
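(For reference, if we ever move this import step to the API, I believe the equivalent call would look roughly like the sketch below -- the URL, token, and filename are placeholders, and my understanding is that overwriteBehavior=overwrite is the API counterpart of the "allow blanks to overwrite" setting.)

```python
import requests

API_URL = "https://redcap.example.edu/api/"   # placeholder -- the central project's API endpoint
API_TOKEN = "XXXXXXXXXXXXXXXX"                # placeholder -- an API token for the central project

# Read the site's raw CSV export.
with open("site_export.csv") as f:            # placeholder filename
    csv_data = f.read()

response = requests.post(API_URL, data={
    "token": API_TOKEN,
    "content": "record",
    "action": "import",
    "format": "csv",
    "type": "flat",
    "overwriteBehavior": "overwrite",  # blanks overwrite saved values, like the "YES" setting above
    "data": csv_data,
    "returnContent": "count",
})
print(response.text)  # number of records imported
```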
Problem: We (central REDCap) are not always seeing the exact same data that they (external site) see in their local copy. I think this has to do with repeating forms.
Example:
- Import #1 from the external site had a record with 5 instances of a repeating Medication form.
- After Import #1, the local site realized they had made an erroneous data entry -- so they deleted Medication instance #4 from their local copy.
- At the time of Import #2, Medication #4 is no longer in the external site's CSV export -- but that doesn't mean it will be deleted from the central project upon Import #2: since no row for Medication #4 appears in the CSV, nothing about it gets touched during the import. Thus the external site has the "right" data (Meds #1, 2, 3, 5), while the central site has incorrect data (Meds #1-5; Med #4 persists).
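To make the gap concrete, here's how I could imagine diffing two monthly exports to list exactly which repeating instances vanished between them. Just a sketch: "record_id" stands in for our actual record ID field, and the filenames are made up.

```python
import csv

def instance_keys(path):
    """Every (record, event, repeating instrument, instance) combination in a raw export."""
    with open(path, newline="") as f:
        return {
            (row["record_id"],                      # assumed record ID field name
             row.get("redcap_event_name", ""),
             row.get("redcap_repeat_instrument", ""),
             row.get("redcap_repeat_instance", ""))
            for row in csv.DictReader(f)
        }

# Anything present in last month's file but absent from this month's was deleted at the site.
deleted = instance_keys("import1.csv") - instance_keys("import2.csv")
for key in sorted(deleted):
    print(key)  # e.g. ('1001', 'baseline_arm_1', 'medication', '4')
```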
First, does the described behavior match your understanding of how multiple CSV data imports would work with repeating forms?
Based on this, it seems an import would only be guaranteed to leave the central REDCap with an "exact copy" of the external site's data if the central REDCap were essentially a "blank slate" for that site; e.g., if the DAG's entire set of records were deleted (or at least all values set to null) prior to each import. Does that seem right?
Even though the data is saved in multiple places, I'm not thrilled with the idea of deleting all of the site's current data prior to importing the newest data. And I'm not even sure how I'd "zero out" this DAG efficiently (hundreds of records, with repeating forms and events).
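For completeness, the only programmatic "blank slate" approach I can think of is the untested sketch below: export record IDs with DAG assignments, keep the ones in this site's DAG, and delete those records via the API before re-importing. (URL, token, and DAG name are placeholders; deleting records is obviously destructive, which is exactly the part that makes me nervous.)

```python
import csv
import io
import requests

API_URL = "https://redcap.example.edu/api/"   # placeholder
API_TOKEN = "XXXXXXXXXXXXXXXX"                # placeholder
SITE_DAG = "external_site"                    # placeholder -- the DAG's unique name

# Export only the record ID field, with DAG assignments included.
resp = requests.post(API_URL, data={
    "token": API_TOKEN,
    "content": "record",
    "format": "csv",
    "type": "flat",
    "fields[0]": "record_id",                 # assumed record ID field name
    "exportDataAccessGroups": "true",
})
site_ids = sorted({row["record_id"]
                   for row in csv.DictReader(io.StringIO(resp.text))
                   if row.get("redcap_data_access_group") == SITE_DAG})

# Delete those records entirely before the fresh import (destructive!).
params = {"token": API_TOKEN, "content": "record", "action": "delete"}
params.update({f"records[{i}]": rid for i, rid in enumerate(site_ids)})
requests.post(API_URL, data=params)
```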
I'd appreciate any suggestions on how to handle this -- preferably via file imports, but maybe the API is the best answer.
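One API idea I've been mulling: combine the diff above with targeted deletes, i.e., delete only the instances that disappeared from the site's export, rather than wiping the whole DAG. My understanding is that newer REDCap versions accept optional event/instrument/repeat_instance parameters on the "Delete Records" API method, but I haven't verified that against our version, so treat this as an assumption:

```python
import requests

API_URL = "https://redcap.example.edu/api/"   # placeholder
API_TOKEN = "XXXXXXXXXXXXXXXX"                # placeholder

def delete_instance(record_id, event, instrument, instance):
    """Delete a single repeating-instrument instance from one record
    (assumes a REDCap version whose Delete Records API method accepts
    these optional parameters)."""
    return requests.post(API_URL, data={
        "token": API_TOKEN,
        "content": "record",
        "action": "delete",
        "records[0]": record_id,
        "event": event,              # would be omitted in a non-longitudinal project
        "instrument": instrument,
        "repeat_instance": instance,
    })

# e.g., the Medication #4 from the example above (all values made up):
delete_instance("1001", "baseline_arm_1", "medication", "4")
```

If that works, each monthly cycle would be: import the full CSV (with blanks overwriting), then delete whatever instances the diff flags -- no blank slate required. But I'd love to hear if anyone has a cleaner way.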
Thanks all,
Jeff