Hi all,
I want to share the experience of how we upgraded from 4.2.2 to 6.1 in the Hong Kong University of Science and Technology Library.
Please be warned: This is not the official way to upgrade. It is just a sharing. Please take your own risk if you follow the same method.
We installed version 4.2.2 in 2016 and have been using this version for these years. We decided to upgrade to version 6 in late 2023.
According to the official guide, it requires following the upgrade instructions for each intermediate version.
https://guides.dataverse.org/en/latest/installation/upgrading.html
However, to upgrade from 4.2.2 to 6.1, there are 60 intermediate versions. Not yet mention the changes in the version of PostgreSQL, Solr, Java, and Glassfish4 to Payara6 between the three major dataverse versions. We also applied customization to the code. We will need to customize and test each version upgrade. If we follow the official upgrade guide, the upgrade project from 4.2.2 to 6.1 will take us a few months to complete.
We finally decided to take risks to shorten the timeframe on this project. Here is our strategy:
- Backup database and data files first.
- Follow the official installation guide for a fresh install of 6.1, including payara6.
- Create SQL files for database migration by comparing the 6.1 and 4.2.2 databases. We use pgAdmin (https://www.pgadmin.org/) to compare the database schema between two databases.pgAdmin is a free tool for PostgreSQL database.
We restore the 4.2.2 database as dvndb2 and compare its schema with the 6.1 using pgAdmin. By comparing the database schema, this allows us to identify the differences between the two versions.
The goal is to get the database migration scripts to patch 4.2.2 database to 6.1. For details, you may check here to see how to compare the schema using pgAdmin: https://www.pgadmin.org/docs/pgadmin4/8.1/schema_diff.html - We create the SQL files in the following order. We get the clauses for each stage by selecting the checkbox in pgAdmin using the filter to trim down the results (e.g., tables only exist in 6.1, tables only exist in 4.2.2, tables exist in both versions.). Remember to skip those clauses that will reset the sequence to 1. We keep trying by restoring the database, patching the database using the created SQL file for each step, and solving the issues in the SQL file until there are no errors when running the SQL file.
- Drop the tables and sequences that only exist in 4.2.2 (by comparing in pgAdmin)
- Create the new functions that only exist in 6.1 (by comparing in pgAdmin)
- Create the new tables that only exist in 6.1 (by comparing in pgAdmin)
- We break down the generated SQLs of creating tables to multiple clauses to avoid errors.
For example, break down the create table clause to something like this:
CREATE TABLE public.alternativepersistentidentifier
(
id integer NOT NULL,
authority character varying(255),
globalidcreatetime timestamp without time zone,
identifier character varying(255),
identifierregistered boolean,
protocol character varying(255),
storagelocationdesignator boolean,
dvobject_id bigint NOT NULL
);
ALTER TABLE public.alternativepersistentidentifier OWNER TO dvnapp;
CREATE SEQUENCE public.alternativepersistentidentifier_id_seq
AS integer
START WITH 1
INCREMENT BY 1
NO MINVALUE
NO MAXVALUE CACHE 1;
ALTER TABLE public.alternativepersistentidentifier_id_seq OWNER TO dvnapp;
ALTER SEQUENCE public.alternativepersistentidentifier_id_seq OWNED BY public.alternativepersistentidentifier.id;
ALTER TABLE ONLY public.alternativepersistentidentifier ALTER COLUMN id SET DEFAULT nextval('public.alternativepersistentidentifier_id_seq'::regclass);
ALTER TABLE ONLY public.alternativepersistentidentifier
ADD CONSTRAINT alternativepersistentidentifier_pkey PRIMARY KEY (id);
ALTER TABLE ONLY public.alternativepersistentidentifier
ADD CONSTRAINT fk_alternativepersistentidentifier_dvobject_id FOREIGN KEY (dvobject_id) REFERENCES public.dvobject(id); - Handle the difference for the existing tables (by comparing in pgAdmin). We reorder the sequence and solve errors until they can run successfully.
- Handle insert/update data for upgrade. We search for insert/update/delete clauses in the database migration scripts in the source code folder. Database migration scripts up to version 4.11 are in the folder scripts/database/upgrades. Database migration scripts after 4.12 are in the folder src/main/resources/db/migration. Execute them in ascending order.
- Comparing the database again to make sure the schema is the same. Upon this step, both databases should have the same schema.
- Finally, from the 6.1 database, copy the data from table flyway_schema_history and insert it back into the patched database.
- After starting the payara6, drop all indexes in Solr and reindex the Solr using dataverse API.
- Insert the license JSON files to the table license using API. The license JSON files are in the folder dvinstall/data/licenses. We set the default license to CC-BY-4.0 and patched the existing dataset license to CC0 1.0 in table termsofuseandaccess. For adding a new license via API, you can check here: https://guides.dataverse.org/en/latest/api/native-api.html?highlight=license#id246
- Update the permission of roles. Patch the column owned_id in table dataverserole to 1, who is the admin user. This will allow you to update the permission of the roles. Log in as an admin user in dataverse, and update the permission of each role as needed. For example, in our case, there is a new permission ManageFilePermission. Note that there is a bug in 6.1. Before we click "Save" in the user interface, change the column alias to something else (e.g., add a 2 after that). After updating the permissions of all roles, we restore the owner_id to null.
The bug is mentioned here: https://github.com/IQSS/dataverse/issues/8808 - Update the table setting. Since we use Crossref for DOI registration, we change the value of :DoiProvider from original value to FAKE, which works on 6.1. We added ":InstallationName" for the brand name. We added ":ShowMuteOptions", ":AlwaysMuted", ":NeverMuted" to disble some notifications. You may check if there are any new settings you like in the official guide: https://guides.dataverse.org/en/latest/installation/config.html#id290
- We also enabled external tools that were not available in 4.x, e.g., FilePreviewers. If you upgrade from 4.x, you may want to enable this as well.
https://guides.dataverse.org/en/latest/admin/external-tools.html
We are still working on final testing and going to deploy the upgraded version soon. Hope you find this sharing useful.
Best,
Franky