Hi Amber,
As part of our migration, we did a pre-migration step of merging accounts that had the same e-mail address. In the dataverse/scripts/migration directory there is a file, scrub_duplicate_emails.
Part of what we need to do for the migration is better explain how to use this, so here are some quick notes:
- the first three commented-out queries are there to help you see the extent of the duplicate e-mails you have*
* this doesn't account for e-mails that differ only in case, e.g.
gdu...@iq.harvard.edu and
gdu...@IQ.harvard.edu. In 4.3 we made changes to disallow this (i.e. they would be considered duplicates) and again had to do some merging. BUT it would be easier/better to do that as part of this merging stage, so also on our to-do list is to modify these queries to merge those accounts as well. My guess is it is just a matter of adding some strategic "lower()" calls to the queries.
- the next query is the delete query that removes the duplicates. This will not work right away; it can only succeed after the accounts have been "merged" by the update queries.
- lastly you have the update queries - these go through and modify references so that they all point to the user account with the lowest id (based on the assumption that this is the account the user originally created). E.g., if I have two users with one dataset each, this will "move" the reference from the second user's dataset to the first user; you then have one user with two datasets and a second user with none, so the second user can now be deleted.
- some of those updates can fail - say you have granted permission on the same dataset to both user accounts. When doing the transfer, you would end up with two identical rows, which would violate a unique constraint. In this case you don't need to transfer anything, since the permission is already there for the original account (the one you will still have when all is said and done). If you're lucky none of these will fail, but if they do, there is a commented-out section in the script to help with this:
" if any of the below fail because of duplicate constraints, you will need to first delete the duplicates
here is a sample query for deleting the duplicate entries from studyfile_vdcuser (the most likey to fail))"
As you can see, this is a complicated process, so it was done fairly manually here. I hope the above gives you some guidance, but let me know how else I can help.
One word of advice: make sure you have a backup copy of your data before you start the cleanup above, in case you need to go back to it and start over.
Re: passwords, we don't copy them over in the original script, but we did copy them over after we were done with all the migration steps and were ready to go live (we didn't want anyone to try to log in before we were ready for them). I don't recall if we have that documented anywhere (Kevin may know), but it's a fairly straightforward script, especially compared to all of the above. That said, even with the old passwords copied over, users will be prompted for new passwords, since 4.0 switches to stronger password encryption.
Gustavo