Database Migration and Backup

63 views
Skip to first unread message

Stephan Krinetzki

unread,
Sep 26, 2022, 6:02:28 AM9/26/22
to Opencast Users
Hello all,

Currently we are running Opencast 10 (very soon 11 and also 12) with MySQL 5.7. Since MySQL 5.7 will reach EOL at the end of 2023, we now need to migrate the almost 32 GB database. Our first idea was to migrate the database to MySQL 8.0, but the documentation does not seem to recommend this:

Under Opencast 11:

The EclipseLink JPA implementation which is used in Opencast supports several different databases, although some databases might require additional drivers. Official support only exists for MariaDB, MySQL, PostgreSQL and H2. Other database engines are not tested and specific issues will likely not be addressed.

  • MariaDB is the recommended database engine. It is used by most adopters and is well tested.
  • MySQL is supported but tested less than MariaDB.
  • PostgreSQL support is experimental.
  • H2 is not suitable for anything but testing and development. It cannot be used in distributed environments.
(https://docs.opencast.org/r/11.x/admin/#configuration/database/)

Under Opencast 12

The EclipseLink JPA implementation which is used in Opencast supports several different databases, although some databases might require additional drivers. Official support only exists for MariaDB, MySQL, PostgreSQL and H2. Other database engines are not tested and specific issues will likely not be addressed.
  • MariaDB is the recommended database engine. It is used by most adopters and is well tested.
  • PostgreSQL support is experimental.
  • H2 is not suitable for anything but testing and development. It cannot be used in distributed environments.
(https://docs.opencast.org/r/12.x/admin/#configuration/database/)

So our plan is to migrate either to PostgreSQL, which is probably still unsuitable, or to MariaDB, where we don't have DBAs for though. How do you all do it and does it make sense to migrate?

And while we are on the subject of databases: How does it actually look like with a database recovery? What does Opencast need from the backup to work again?

Greetings

Stephan

Greg Logan

unread,
Oct 6, 2022, 11:09:19 PM10/6/22
to us...@opencast.org
Hi Stephan,

While I can't speak to Postgre vs Maria, both should work.  The major issue with Postgres (back in the day) was keeping the DDLs up to date since we didn't have any active adopters.  Modern Opencast (Eclipselink, really) can do all of the things that we needed the DDLs for, so that's why we got rid of the files and added experimental support for Postgres back.  We are also not doing anything crazy with the database so I would expect Postgres to work correctly - it's just untested.

Regarding recording from backup, I'm not sure I understand the question.  You're wondering which table(s) can be safely truncated without losing the ability to (un|re)publish existing recordings?

G

--
To unsubscribe from this group and stop receiving emails from it, send an email to users+un...@opencast.org.

Stephan Krinetzki

unread,
Oct 17, 2022, 4:54:32 AM10/17/22
to Opencast Users, Greg Logan
Hi Greg,

sorry for the late reply, some things got in the way.

I wanted to know about the backup itself. Now if I backup my database daily and restore in case of disaster - what will I lose? Or is it all "automatically" restored from the Elasticsearch Node when I import an old dump?

Of course I'm also interested in deleting old data - because our database is currently 32GB big, saving data would be great :)

Greetings

Stephan

Greg Logan

unread,
Oct 25, 2022, 11:09:36 PM10/25/22
to Stephan Krinetzki, Opencast Users
Ok, database dive time.  Opencast treats the database as the *single source of truth* for the system.  You know how you sometimes have to reindex during a major upgrade?  That's exactly the same operation as completely replacing your existing Elasticsearch cluster.  A backup of ES is only useful for speed-of-recovery in a disaster scenario because it lets you skip reindexing in the new system.  Losing your database backup is a total loss, losing your ES backup is annoying and time consuming.

Let's talk about our important tables:

The ACL table (oc_acl_managed_acl): This is the ACLs you have available to your users in the UI.  I imagine losing this would be annoying because you would need to recreate them but I believe that the ACLs themselves are burned into the mediapackages and publications so the loss of this table doesn't immediately blow open your access rules.  I have not actually checked this though...
The asset manager tables (oc_assets_*): Losing bits of the asset manager is like losing bits of your filesystem.  You might get lucky and be fine.  You might not.  You don't want to find out the hard way :)
The jobs tables (oc_job_*): This is where the individual workflow operation handler records live.  Current records are most important, the older they get the less important they are (probably).   You can truncate the jobs tables to speed job dispatching up in a very large system, but that means you lose some historical job data in the UI.  This is usually done on year+ old recordings where you no longer care how the workflows processed.
The schedule tables (oc_scheduled_*): Depending on how you've done your scheduling this might not be deadly to lose.  If this information is entirely build from automated import then it's easy to replace.  If you're hand scheduling things though...
The search table (oc_search): This is the list of published recordings.  Don't lose this.
The workflow tables (oc_workflow_*): This is the list of recordings you see in the admin UI.  Will your recordings stay up if this data is gone?  Unclear, but possible.  You probably don't want to lose this though.

The rest of the tables are obviously still important, but are usually things which are either automated (service and host registrations), or mostly empty (user and group tables).

Your number one way to drop database usage is to remove the user tracking table, and truncate the jobs.  Both are large and grow without bounds.

Also, you'll be pleased to know that Postgres is no longer experimental for OC 13[1].  Realistically this applies to OC 12 as well, although it's obviously not formal.

G


Reply all
Reply to author
Forward
0 new messages