Database migration dry run gone wrong

182 views
Skip to first unread message

Jonathan Bandelier

unread,
Jan 17, 2022, 5:59:52 PM1/17/22
to go-cd
I recently ran gocd-database-migrator (flags -insert and --progress) on my. 20.4.0 GoCD server with its current H2 database as source and a blank PostgreSQL instance as target.

My understanding was that running this tool would only write to the target database with no impact on the source. As this was meant to be a dry run just to make sure the user on target  PostgreSQL instance had the proper privileges, I did not stop or back up GoCD before running the tool. The tool ran without  error, so I instructed the DBA to delete the target database, planning to to the real migration later, when all is ready to actually upgrade to 20.5.0.

I then found plenty database-related errors showing up in the GoCD console; I restarted the go-server service and it got worse: GoCD is no longer able to start

go-server-wrapper.log appears to indicate that the update is in progress:
 
INFO   | jvm 1    | 2022/01/17 22:18:33 | ************************************************************************
INFO   | jvm 1    | 2022/01/17 22:18:33 | WARNING: Shutting down your server at this point will lead to a database corruption. Please wait until the database upgrade completes.
INFO   | jvm 1    | 2022/01/17 22:18:33 | ************************************************************************
INFO   | jvm 1    | 2022/01/17 22:18:33 | Reading change scripts from directory /var/lib/go-server/db/h2deltas...
INFO   | jvm 1    | 2022/01/17 22:18:33 | dbdeploy v2.11
DEBUG  | wrapperp | 2022/01/17 22:18:34 | Send a packet PING : ping 00000152
INFO   | jvm 1    | 2022/01/17 22:18:34 | WrapperManager Debug: Received a packet PING : ping 00000152
INFO   | jvm 1    | 2022/01/17 22:18:34 | WrapperManager Debug: Send a packet PING : ping 00000152
DEBUG  | wrapperp | 2022/01/17 22:18:34 | read a packet PING : ping 00000152
INFO   | jvm 1    | 2022/01/17 22:18:35 | Unable to create database upgrade script for database

Is there any way for me to get out of this mess, and bring back my H2 database to the state it was before running the tool? Would emptying the h2deltas folder do the trick?

Thanks in advance for your help


    

Aravind SV

unread,
Jan 18, 2022, 9:45:39 AM1/18/22
to Jonathan Bandelier, go-cd

Hello Jonathan,

Given that the message seems to be coming from around here, there could be more information in the logs. It’s hard to say what the problem is without more information.

My understanding was that running this tool would only write to the target database with no impact on the source.

That is a reasonable expectation, I feel. However, one of the reasons the first steps in the documentation mentions backing up and stopping the server is to make recovery easier. I don’t know what happens if you run this against a running GoCD server.

I wonder if permissions got changed, or something like that. Won’t know without more information from the logs.

Cheers,
Aravind

--
You received this message because you are subscribed to the Google Groups "go-cd" group.
To unsubscribe from this group and stop receiving emails from it, send an email to go-cd+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/go-cd/785b0a66-c1b7-43b6-b18f-279ff7629bb7n%40googlegroups.com.

Jonathan Bandelier

unread,
Jan 19, 2022, 4:54:36 AM1/19/22
to go-cd
Hello,

I really appreciate you looking at this issue. It is pretty clear to me that you pinpointed the correct code extract around here, but with my limited knowledge of Java, I don't know what to make of it.

I posted 210 lines of go-server-wrapper.log on PasteBin; I hope this can help.

Please let me know if you need other details

Jonathan Bandelier

unread,
Jan 19, 2022, 5:13:56 AM1/19/22
to go-cd
I just realized that the cruise.trace.db  file was actually a human-readable log; you might find also useful info in there.

Aravind SV

unread,
Jan 19, 2022, 5:40:47 AM1/19/22
to Jonathan Bandelier, go-cd

Hello Jonathan,

but with my limited knowledge of Java, I don’t know what to make of it.

I was looking for the rest of the error message, which you showed in the pastebin. Thanks. Unfortunately, the stack-trace isn’t too helpful. I tried running the migration tool on a running GoCD server and it was fine – nothing happened.

I would try with DB and config in a fresh GoCD server:

  1. Get a fresh copy of the GoCD server (maybe the zip) of your version from https://gocd.github.io/gocd-plugin-info/gocd-releases
  2. Unzip it. Copy over these files from the old / broken server instance to the new one:
    • ./db/h2db/cruise.h2.db
    • ./config/cruise-config.xml
    • ./config/cipher.aes
  3. Start the new server. If it starts successfully, then maybe there is an issue with something like the h2deltas folder. This shouldn’t be the case unless you updated your GoCD server to 20.5.0.
  4. If it doesn’t start, you might need to recreate that DB using [these instructions](https://gist.github.com/arvindsv/c7934bee26ce1c47e9d24f0a493d3c87) and try steps 1 to 3 again to see if it helps. If it does, it might mean that your DB has somehow got corrupted. I’d say it’s very unlikely to be anything to do with the migrator – it seemed to be harmless when I tried it. But, I cannot say for sure.

If none of these work … I’m out of ideas. :(

Regards,
Aravind

PS: Please try all these on a copy in a separate directory. Not on the broken server.

Aravind SV

unread,
Jan 19, 2022, 5:45:16 AM1/19/22
to Jonathan Bandelier, go-cd

Thank you. The trace file was helpful. Not sure, but you might need to seriously consider step 4 in my previous response. The DB file seems to be corrupted.

Did you forcibly quit the server by running something like kill -9? I don’t see this happening via the migrator, but I can’t be sure.

Cheers,
Aravind

Jonathan Bandelier

unread,
Jan 19, 2022, 6:45:58 AM1/19/22
to go-cd
Thank you for this action plan.

I only used  sudo systemctl restart go-server (it's on a RedHat 7) to restart GoCD; I am not sure how graceful it is when stopping the GoCD process.

We have a very old backup of our H2 db (about 2 years old). May I please ask you to tell the pros and cons of restoring this db vs using a fresh one, as you suggested? I understand most of the configuration lives in the  cruise-config.xml files and the GIT config repositories (we use the gocd-yaml-config-plugin) , so I am not 100% clear what type of information would be lost, if using a fresh database.

Thanks again,

Jonathan

Chad Wilson

unread,
Jan 19, 2022, 8:19:12 AM1/19/22
to go...@googlegroups.com
I think it's safe to say that your H2 DB file is corrupted since the NullPointerException is within internal H2 code while trying to just open the database file, not anything GoCD is specifically trying to do.

Generic error
General error: "java.lang.NullPointerException" [50000-168]

^^ H2 DB code version 1.3.168 doesn't like the database structure it finds.

Root problem:

INFO | jvm 1 | 2022/01/17 22:18:35 | Caused by: java.lang.NullPointerException
INFO | jvm 1 | 2022/01/17 22:18:35 | at org.h2.store.Data.getValueLen(Data.java:880)
INFO | jvm 1 | 2022/01/17 22:18:35 | at org.h2.store.Data.getValueLen(Data.java:866)
INFO | jvm 1 | 2022/01/17 22:18:35 | at org.h2.index.PageBtreeIndex.getRowSize(PageBtreeIndex.java:396)
INFO | jvm 1 | 2022/01/17 22:18:35 | at org.h2.index.PageBtreeLeaf.addRow(PageBtreeLeaf.java:106)
INFO | jvm 1 | 2022/01/17 22:18:35 | at org.h2.index.PageBtreeLeaf.addRowTry(PageBtreeLeaf.java:100)
INFO | jvm 1 | 2022/01/17 22:18:35 | at org.h2.index.PageBtreeNode.addRowTry(PageBtreeNode.java:200)
INFO | jvm 1 | 2022/01/17 22:18:35 | at org.h2.index.PageBtreeNode.addRowTry(PageBtreeNode.java:200)
INFO | jvm 1 | 2022/01/17 22:18:35 | at org.h2.index.PageBtreeIndex.addRow(PageBtreeIndex.java:105)
INFO | jvm 1 | 2022/01/17 22:18:35 | at org.h2.index.PageBtreeIndex.add(PageBtreeIndex.java:96)
INFO | jvm 1 | 2022/01/17 22:18:35 | at org.h2.table.RegularTable.addRow(RegularTable.java:121)
INFO | jvm 1 | 2022/01/17 22:18:35 | ... 125 more

Which happens during

INFO | jvm 1 | 2022/01/17 22:18:35 | at org.h2.table.RegularTable.addRow(RegularTable.java:139)
INFO | jvm 1 | 2022/01/17 22:18:35 | at org.h2.store.PageStore.redo(PageStore.java:1531)
INFO | jvm 1 | 2022/01/17 22:18:35 | at org.h2.store.PageLog.recover(PageLog.java:319)
INFO | jvm 1 | 2022/01/17 22:18:35 | at org.h2.store.PageStore.recover(PageStore.java:1365)
INFO | jvm 1 | 2022/01/17 22:18:35 | at org.h2.store.PageStore.openExisting(PageStore.java:361)
INFO | jvm 1 | 2022/01/17 22:18:35 | at org.h2.store.PageStore.open(PageStore.java:285)
INFO | jvm 1 | 2022/01/17 22:18:35 | at org.h2.engine.Database.getPageStore(Database.java:2118)
INFO | jvm 1 | 2022/01/17 22:18:35 | at org.h2.engine.Database.open(Database.java:577)
INFO | jvm 1 | 2022/01/17 22:18:35 | at org.h2.engine.Database.openDatabase(Database.java:222)
INFO | jvm 1 | 2022/01/17 22:18:35 | at org.h2.engine.Database.<init>(Database.java:217)
INFO | jvm 1 | 2022/01/17 22:18:35 | at org.h2.engine.Engine.openSession(Engine.java:56)
INFO | jvm 1 | 2022/01/17 22:18:35 | at org.h2.engine.Engine.openSession(Engine.java:159)
INFO | jvm 1 | 2022/01/17 22:18:35 | at org.h2.engine.Engine.createSessionAndValidate(Engine.java:138)
INFO | jvm 1 | 2022/01/17 22:18:35 | at org.h2.engine.Engine.createSession(Engine.java:121)
INFO | jvm 1 | 2022/01/17 22:18:35 | at org.h2.engine.Engine.createSession(Engine.java:28)

So it cannot even open the database file and make sense of the file structure. The question is whether it can be recovered. Just to be clear, you don't have either manual pre-migration backups, GoCD automated backups (via /go/admin/backup) or perhaps file-system level backups of the disk (/var/lib/go-server/db) on which the H2 store was saved?

Were there errors from the migration tool during the migration? Do you have the logs for that? Hopefully not.

An Idea

Anyway, the version of H2 that is with GoCD 20.4.0 is 1.3.168. The version that the DB migrator tool uses is 1.4.200.

My theory is that the migrator tool touched your source database as part of ensuring it was in a state ready for migration (e.g applying any missing DB migrations). It has likely done so using the new version of H2 in some way which the old version does not like. It's also possible that concurrent writes to the file from two different H2 versions completely corrupted it.

If there were no errors on the migration tool itself when you ran it, you might have a valid 1.4.200 DB file that cannot be read by the version in GoCD server 20.4.0 (which is 1.3.168). If that theory is true, the key question will be can a 1.3.x jar recover or understand a database that has been migrated to 1.4.x? I suspect not, hence your errors at startup.

On the assumption/hope that the file is in a valid state for 1.4.200 you can try something similar to Aravind's instructions to get back to a state 20.4.0 can understand.

To modify for your situation
  1. Backup the "corrupt" file as is :-)
  2. Confirm the theory - is the DB valid from the perspective of H2 "1.4.200"? If you try to export it to SQL with 1.4.200 jar as per Aravind's instructions, that should help you know.
  3. If that export to script doesn't work, you might need to try the recover tool. (org.h2.tools.Recover)
  4. If export, or recover+export works, you now have a whole lot of SQL. You can then import to a new database using the 1.3.168 jar. i.e adapt the instructions to

    $ ls
    backup.sql  cruise.mv.db  h2-1.4.200.jar h2-1.3.168.jar
    
    # Import from backup.sql into a new DB file: new.cruise.mv.db
    $ java -cp h2-1.3.168.jar org.h2.tools.RunScript -url jdbc:h2:./new.cruise -user sa -script backup.sql
  5. Try that newly created database with GoCD 21.4.0.
Personally, given the different versions of H2 DB with the migrator tool and with GoCD 20.4.0 I think that will maximise your chances of getting a working DB.

Using an old DB or fresh db

To clarify, Aravind's instructions create a fresh DB file, by trying an export-to-sql and import-to-new-file, so would be a "fresh" DB, but not an empty DB, or schema-only DB :-)

I don't really have an opinion on whether using a 2 year old DB is useful for you. Depends which GoCD it was and whether there is still an upgrade path on the DB - but off top of my head there should be. You will be missing records of pipeline/stage/job runs, VSMs, links to artifacts and materials, plugin configuration, agent approvals, user preferences - all that type of stuff. The stuff that would have been reported in the migration dry run which should have a summary in the log. You can see here what's in the DB (ignores a single later change to drop usage reporting data in 20.6.0)

-Chad

Alain Trinh

unread,
Jan 24, 2022, 10:59:29 AM1/24/22
to go-cd
Hello Chad, hello Aravind,

Thank you for the feedbacks. I am Jonathan's colleague. Just to provide some background, we have 3 environments :
- Test : with H2 DB (and yes, no regular backup)
- QA: with PostgreSQL (with nightly backups, "QA" means "QA" for GoCD itself)
- Production: with PostgreSQL (with nightly backups)

I don't remember we experienced any event of corrupted DB, for the past 5 years.

We plan to update our version in all these environments, which involves the database migration process (H2 to new PostgreSQL and old PostgreSQL to new PostgeSQL). So far, as you can guess, we were practicing on the Test environment with H2 DB. Actually in our Test environment we have no concern about loss of data between attempts, while ensuring indeed that we get the same information before and after the migration within an attempt.

We managed to get the GoCD Server up and running with the old DB, and we will proceed again with caution, without trying to stop in midstream.

It is great we can count on your help, pretty like before. Thanks again.

Best regards,

Alain
Reply all
Reply to author
Forward
0 new messages