Multiple Opencast Admin Nodes and what is the "Ingest" Node type for?

Sven Laudel

Aug 1, 2016, 2:56:19 AM
to Opencast Users
Hi everyone,

Would it be possible to have multiple admin nodes behind a load balancer/proxy solution for high availability?
I just installed a second admin node, but on that admin node I can't see the jobs that run on the other admin node. Am I doing something wrong?

My other question: what is the new "ingest" node type for?

Best regards
Sven

Greg Logan

Aug 4, 2016, 11:42:24 PM
to us...@opencast.org

Hi Sven,

In theory this is possible, but no one has successfully implemented this as far as I'm aware. The reason you aren't seeing the workflows from the other admin node is that the indexes aren't shared - they run locally by default.

The ingest node is used to spread the load caused by ingest across nodes other than the admin node. By default, all of the ingests for the system run through the admin node, which doesn't scale once the system grows past a certain point.

G

Hans Erasmus

Aug 5, 2016, 6:29:42 AM
to us...@opencast.org
Sorry to hijack this thread, but on this subject: would it be possible to load balance multiple ingest nodes? I think this is a "cleaner" way of implementing an HA ingest service than configuring multiple ingest nodes in your capture agent (in our case Galicaster). So if you load balance your ingest nodes, will the effect still be desirable?

Wouldn't putting the indexes on shared storage between two admin nodes be the answer to Sven's question? Or am I going way off script here?

Tobias Wunden

Aug 5, 2016, 9:42:16 AM
to Opencast Matterhorn
Hi Hans,

> Sorry to hijack this thread, but on this subject, would it be possible to load balance multiple ingest nodes? I think this is a "cleaner" method of implementing HA ingest service, than configuring multiple ingest nodes in your Capture Agent, in our case Galicaster. So if one load balance your ingest nodes, will the effect still be desirable?

The reason for adding multiple ingest servers is *usually* that one wants to balance the data load on the network interface across multiple servers, maybe even in different data centers. If you put a load balancer in front of those nodes, *all* traffic will go through the load balancer. It is therefore preferable to use DNS round robin for the ingest nodes rather than an LB.
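For illustration, this is roughly what DNS round robin looks like from the client side: the ingest hostname has one address record per ingest node, and each capture agent connects straight to one of them, so no single box carries all of the upload traffic. A minimal Python sketch; the function names are hypothetical, port 8080 is only an assumed default, and the random pick on top of the resolver's (often rotated) order is an extra assumption, since many clients simply use the first address returned.

```python
import random
import socket


def resolve_all(hostname: str, port: int) -> list[str]:
    """Return each distinct address the name resolves to, in resolver order."""
    addresses: list[str] = []
    for _family, _type, _proto, _canonname, sockaddr in socket.getaddrinfo(
        hostname, port, type=socket.SOCK_STREAM
    ):
        ip = sockaddr[0]  # IPv4 and IPv6 sockaddr tuples both start with the address
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def pick_ingest_address(hostname: str, port: int = 8080) -> str:
    """Pick one resolved address at random so agents spread out client-side.

    Port 8080 is an assumed default; use whatever your ingest nodes listen on.
    """
    return random.choice(resolve_all(hostname, port))
```

For example, `pick_ingest_address("ingest.example.org")` would return one of the addresses behind that (hypothetical) round-robin name, and the upload would then go directly to that node.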

Regards,
Tobias

Hans Erasmus

Aug 5, 2016, 9:54:04 AM
to Opencast Matterhorn
Hi Tobias

Thanks for the clarification. I understood why we would load balance ingest servers, but an architecture we have used before (on other systems) is a pair of HAProxy or Pound load balancers with Keepalived running between them. The setup is very easy: you then point whatever needs to talk to the LB (in Opencast's case, the capture agents) at the floating IP assigned by Keepalived. In my opinion, this keeps the config on your CAs very simple and clean, which comes in handy when you need to change the ingest server setup, because you cannot always touch 100+ CAs. And if you need to replace one of the LBs or even a few ingest nodes, you can do that without causing downtime. So it is just another thought.

Greg Logan

Aug 5, 2016, 10:08:58 AM
to us...@opencast.org
Hi Hans,

Longer term, the goal is to build this into the CA API itself - the CA asks the admin node which ingest node it should use.  We aren't there yet, but we're working on it!

G

Stephen Marquard

Aug 5, 2016, 10:15:44 AM
to us...@opencast.org

It’s already there in the service registry endpoint:

/services/available.json?serviceType=org.opencastproject.ingest

We have two ingest servers, and the ingest jobs (mostly from Galicaster CAs) are split across them. The CA just needs to know the Opencast admin URL to get the list of available ingest servers.
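A minimal sketch of the capture agent side in Python, using only the standard library. The endpoint path and `serviceType` value are the ones given above, and the `services` / `service` / `host` field names match the JSON response quoted later in this thread; the function names are made up, and authentication is deliberately left out because the admin node may require credentials depending on your setup.

```python
import json
import urllib.parse
import urllib.request

SERVICE_TYPE = "org.opencastproject.ingest"


def available_ingest_url(admin_url: str) -> str:
    """Build the service registry query for available ingest services."""
    query = urllib.parse.urlencode({"serviceType": SERVICE_TYPE})
    return f"{admin_url.rstrip('/')}/services/available.json?{query}"


def ingest_hosts(payload: dict) -> list[str]:
    """Extract host URLs from an available.json response, keeping the server's order."""
    services = payload.get("services", {}).get("service", [])
    # Defensive assumption: a single registered service might be serialized
    # as an object rather than a one-element list.
    if isinstance(services, dict):
        services = [services]
    return [service["host"] for service in services]


def fetch_ingest_hosts(admin_url: str, timeout: float = 10.0) -> list[str]:
    """Query the admin node (no authentication handled here)."""
    with urllib.request.urlopen(available_ingest_url(admin_url), timeout=timeout) as resp:
        return ingest_hosts(json.load(resp))
```

Keeping the parsing separate from the HTTP call makes it easy to test against a saved response before pointing a CA at a live admin node.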

Cheers

Stephen

---
Stephen Marquard, Learning Technologies Co-ordinator,
Centre for Innovation in Learning and Teaching (CILT)
University of Cape Town
http://www.cilt.uct.ac.za
stephen....@uct.ac.za
Phone: +27-21-650-5037 Cell: +27-83-500-5290

Disclaimer - University of Cape Town This e-mail is subject to UCT policies and e-mail disclaimer published on our website at http://www.uct.ac.za/about/policies/emaildisclaimer/ or obtainable from +27 21 650 9111. If this e-mail is not related to the business of UCT, it is sent by the sender in an individual capacity. Please report security incidents or abuse via cs...@uct.ac.za

Stuart Phillipson

Aug 8, 2016, 12:10:11 PM
to us...@opencast.org
Just to add to this, we are up to four ingest servers. Originally we had the CAs ask the admin node which ingest node was least loaded and then use that one. That sounds intelligent, but it had problems: on the hour, hundreds of CAs would ask the admin node which ingest node was least loaded, and as there were no ingest jobs yet at that moment, the admin node would respond with the first node and far too many CAs would ingest to that one.

We solved this by having the admin node respond with a random node for each CA, which gives pretty much the same outcome as round-robin DNS. By the way, don't forget to make sure your admin node isn't running ingest too!
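The failure mode is easy to reproduce in a toy model: if every agent asks within the same second, they all see the same empty load snapshot, so "least loaded" resolves every tie to the first node. A small Python simulation, assuming four hypothetical node names and a seeded random generator for repeatability (this models the situation described, not Opencast's actual code):

```python
import random
from collections import Counter

NODES = ["ingest00", "ingest01", "ingest02", "ingest03"]  # hypothetical node names


def least_loaded(load: dict[str, int]) -> str:
    """Node with the fewest running ingest jobs; min() returns the first node on a tie."""
    return min(NODES, key=lambda node: load[node])


def simulate(strategy: str, agents: int, seed: int = 0) -> Counter:
    """Count where `agents` simultaneous requests end up under a given strategy.

    All agents ask before any ingest job exists, so they share one all-zero
    load snapshot, which is the situation at the top of the hour.
    """
    rng = random.Random(seed)
    snapshot = {node: 0 for node in NODES}
    placements: Counter = Counter()
    for _ in range(agents):
        if strategy == "least_loaded":
            placements[least_loaded(snapshot)] += 1
        elif strategy == "random":
            placements[rng.choice(NODES)] += 1
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return placements
```

`simulate("least_loaded", 200)` puts all 200 agents on `ingest00`, while `simulate("random", 200)` spreads them across the nodes, which is why a random answer behaves much like round-robin DNS here.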

Stuart Phillipson | Media Technologies Coordinator

J20 Sackville Street Building
University of Manchester
Manchester
M13 9PL
United Kingdom

e-mail: stuart.p...@manchester.ac.uk
Phone: 016130 60478

Sven Laudel

Aug 9, 2016, 4:59:22 AM
to Opencast Users
We are still playing with different setups.
At the moment we have one admin and one ingest server. Ingests from my evaluation Galicaster CA still go to the admin node. Do I have to configure something in Galicaster to make it ingest to the ingest server?
After ingesting three recordings from our Galicaster at almost the same time, the ingest server didn't do anything; no job was scheduled on it.

But your setup sounds pretty good.
  1. How do I disable ingest on the admin server?
  2. What do I need to change so that Galicaster is given a random ingest node?
I also need to extend my original questions to cover multiple engage/presentation servers.
At the moment we are testing two engage/presentation servers behind a load balancer. I would expect both presentation servers to see all recordings, because both use the same database and the same shared filesystem. But when I connect to one of them, I can only see some of the recordings; on the other presentation server I can see the rest.
Isn't it possible to use multiple presentation servers, or do I have to configure something?

Best regards
Sven

Tobias Wunden

Aug 9, 2016, 6:18:48 AM
to us...@opencast.org
Hi Sven,

> I need to extend my original questions to using multiple engage/presentation servers.
> We are playing with two engage/presentation servers at the moment which are behind a loadbalancer. Normally i would think both presentation servers should see all recordings because both use the same database and the same shared filesystem. But when i connect to one of both i only can see a few recordings. On the other presentation server  i can see the other recordings.
> Isn't it possible to use multiple presentation servers or do i have to configure something?

By default, each Engage instance uses its own Solr-based search index. When a workflow publishes to Engage, it picks a random Engage node and publishes to that node's search index, so the recording only shows up on that one node.

There are multiple solutions to this:

1) Set up a central Solr server and have the Engage nodes talk to it.
2) (Somewhat "cheap"): Modify the "Publish to Engage" operation to publish to all Engage nodes instead of just one.
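The effect, and why both options fix it, can be shown with a toy in-memory model (not Opencast code): each hypothetical `EngageNode` holds its own index unless it is given a shared one, which stands in for a central Solr server.

```python
import random
from typing import Optional


class EngageNode:
    """Toy stand-in for an Engage node and its search index (not Opencast code)."""

    def __init__(self, name: str, index: Optional[set] = None):
        self.name = name
        # Passing the same set to several nodes models a central Solr server.
        # An empty shared set is still shared, hence the `is not None` check.
        self.index = index if index is not None else set()

    def search(self) -> set:
        return set(self.index)


def publish_to_random_node(nodes: list, recording: str, rng: random.Random) -> None:
    """Default behaviour described above: only one randomly chosen node is updated."""
    rng.choice(nodes).index.add(recording)


def publish_to_all_nodes(nodes: list, recording: str) -> None:
    """Option 2: fan the publication out to every Engage node."""
    for node in nodes:
        node.index.add(recording)
```

With separate indexes and random publishing, the two nodes end up with disjoint sets of recordings, exactly the symptom Sven describes; with a shared index (option 1) or fan-out publishing (option 2), every node returns the full set.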

Regards,
Tobias

Sven Laudel

Aug 9, 2016, 10:38:10 AM
to Opencast Users
Hi Tobias,

Thanks for your reply, I will give Solr a try.
I found the documentation at https://docs.opencast.org/r/2.2.x/admin/modules/searchindex/, which seems quite outdated to me.
It mentions Solr version 1.4.1, but the latest version is 6.1. Is there any difference in using Opencast with recent versions of Solr, or does the documentation apply to them too?
I just want to deploy Solr in a Docker container (as I'm doing with ActiveMQ) using the official image from Docker Hub, where 1.4.1 is not available.

Regards
Sven

Andrew Wilson

Aug 9, 2016, 11:01:34 AM
to Opencast Users
Hi Sven,

We have four ingest nodes and an admin node. At ingest time, Galicaster contacts this REST endpoint: https://admin.com/services/available.json?serviceType=org.opencastproject.ingest

A list of available ingest nodes is returned as JSON, ordered by load (least loaded first):
{"services":{"service":[{"type":"org.opencastproject.ingest","host":"http:\/\/ingest00.com","path":"\/ingest","active":true,"online":true,"maintenance":false,"jobproducer":true,"onlinefrom":"2016-06-10T11:22:15.700+01:00","service_state":"NORMAL","state_changed":"2016-04-22T14:56:52.420+01:00","error_state_trigger":-1504358931,"warning_state_trigger":-1504358931},{"type":"org.opencastproject.ingest","host":"http:\/\/ingest01.com","path":"\/ingest","active":true,"online":true,"maintenance":false,"jobproducer":true,"onlinefrom":"2016-01-07T12:15:13.483Z","service_state":"NORMAL","state_changed":"2016-05-05T11:23:09.340+01:00","error_state_trigger":-1504358931,"warning_state_trigger":-1504358931},{"type":"org.opencastproject.ingest","host":"http:\/\/ingest02.com","path":"\/ingest","active":true,"online":true,"maintenance":false,"jobproducer":true,"onlinefrom":"2016-06-10T11:22:03.223+01:00","service_state":"NORMAL","state_changed":"2016-03-18T22:21:35.313Z","error_state_trigger":-1504358931,"warning_state_trigger":-1504358931},{"type":"org.opencastproject.ingest","host":"http:\/\/ingest03.com","path":"\/ingest","active":true,"online":true,"maintenance":false,"jobproducer":true,"onlinefrom":"2016-01-07T12:15:19Z","service_state":"NORMAL","state_changed":"2016-05-19T08:25:41.670+01:00","error_state_trigger":-1504358931,"warning_state_trigger":-1504358931},{"type":"org.opencastproject.ingest","host":"http:\/\/admin.com","path":"\/ingest","active":true,"online":true,"maintenance":false,"jobproducer":true,"onlinefrom":"2016-06-23T10:14:00.947+01:00","service_state":"NORMAL","state_changed":"2014-08-14T13:14:44.427+01:00","error_state_trigger":-1504358931,"warning_state_trigger":-1504358931}]}}

In the Galicaster config.ini, make sure you have specified 'multiple-ingest = True' under [ingest].
Galicaster will then take the least loaded node (i.e. the zeroth item in the list) and select it as the node to ingest to.

Additionally, I wrote some (bad) code to exclude the admin node and use a random ingest server, as relying on the least loaded node didn't work out for huge numbers of ingests.
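That code is not included in the thread. As a rough illustration of the same idea, here is a hypothetical Python sketch (not the original code) that works on the `service` list from the JSON above: one function mimics the "take item zero" behaviour described for `multiple-ingest = True`, the other drops the admin node and any service that is not active, online, out of maintenance and in the `NORMAL` state, then picks at random. The filtering rules are assumptions based only on the field names in the sample response.

```python
import random
from urllib.parse import urlparse


def usable(service: dict) -> bool:
    """Assumed health check built from the fields shown in the sample response."""
    return bool(
        service.get("active")
        and service.get("online")
        and not service.get("maintenance", True)
        and service.get("service_state") == "NORMAL"
    )


def least_loaded_host(services: list) -> str:
    """Mimic the behaviour described for multiple-ingest: take item zero."""
    return services[0]["host"]


def random_non_admin_host(services: list, admin_url: str, rng=random) -> str:
    """Pick a random usable ingest host, skipping the admin node by hostname."""
    admin_host = urlparse(admin_url).hostname
    candidates = [
        s["host"]
        for s in services
        if usable(s) and urlparse(s["host"]).hostname != admin_host
    ]
    if not candidates:
        raise RuntimeError("no usable ingest node other than the admin node")
    return rng.choice(candidates)
```

Comparing hostnames rather than full URLs means an admin URL of `https://admin.com` still matches the `http://admin.com` entry in the response above.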

hope this helps


-andy

Greg Logan

Aug 10, 2016, 7:20:09 PM
to us...@opencast.org
Hi Sven,

1.4.1 is mentioned because newer versions don't work out of the box :)  The schema version drifted, and we haven't had the time or resources to address this. There's an Ansible playbook at https://bitbucket.org/opencast-community/configuration-management/src/f69858b2c88f9e40e5306686091fa792715cd622/roles/solr-build/tasks/main.yml?at=master&fileviewer=file-view-default which will set up an appropriate Solr server. If you're not using Ansible, it should (hopefully) read clearly enough that you can figure it out!

In the long term, Solr is hopefully going away entirely, but again, that's a lot of work that we have not yet gotten to!

HTH,
G

Sven Laudel

Aug 11, 2016, 2:39:23 AM
to Opencast Users
Hi Greg,

Thanks for mentioning this Ansible playbook.
I'll adapt it to my own Ansible playbooks.

Regards
Sven

James Perrin

Aug 18, 2016, 6:38:26 AM
to us...@opencast.org
Hi,


On 09/08/16 09:59, 'Sven Laudel' via Opencast Users wrote:

> I need to extend my original questions to using multiple engage/presentation servers.
> We are playing with two engage/presentation servers at the moment which are behind a loadbalancer. Normally i would think both presentation servers should see all recordings because both use the same database and the same shared filesystem. But when i connect to one of both i only can see a few recordings. On the other presentation server  i can see the other recordings.
> Isn't it possible to use multiple presentation servers or do i have to configure something?
>
> Best regards
> Sven


When considering multiple presentation nodes, you need to find out where your bottleneck is: the presentation node itself, your filesystem, or whatever is actually serving the video (which might be the presentation node).

We have an NFS filesystem that is shared between all nodes and used for everything. We also serve the video files from four dedicated Apache servers, which are load balanced. This means that the presentation node only handles search requests; video and even thumbnails are all served by the Apache servers.

Regards
James
-- 
------------------------------------------------------------------------
 James S. Perrin

 Media Technologies Team
 J20, Sackville Building
 The University of Manchester
 Oxford Road, Manchester, M13 9PL

 t: +44 (0) 161 275 6945
 e: james....@manchester.ac.uk
------------------------------------------------------------------------
"The test of intellect is the refusal to belabour the obvious"
- Alfred Bester
------------------------------------------------------------------------

Hans Erasmus

Aug 18, 2016, 7:22:51 AM
to us...@opencast.org
I can second what James is saying. In our case we have one presentation node, which only handles URL requests, with two nginx servers serving the actual content. It works well so far.

Sven Laudel

Aug 18, 2016, 7:33:30 AM
to Opencast Users
Hi,

Thanks for your reply.
As I'm new to Opencast and its architecture, I didn't know that the configuration you describe was possible.
Is there any documentation on how to configure Opencast this way?

Regards
Sven

James Perrin

Aug 18, 2016, 8:57:22 AM
to us...@opencast.org

Hi,

In opencast/etc/custom.properties, look at the org.opencastproject.download.directory and org.opencastproject.download.url properties.
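The idea, assuming the setup James and Hans describe, is that published media land under the download directory on shared storage, while the download URL points at separate web servers (Apache or nginx) that serve that same directory. A hypothetical Python helper showing that mapping; the paths and hostname in the example are invented, and the assumption that the relative path is identical on disk and in the URL should be checked against your own configuration:

```python
from pathlib import PurePosixPath


def download_url_for(path: str, download_directory: str, download_url: str) -> str:
    """Translate a file under download_directory into the URL clients fetch.

    Assumes the web servers publish download_directory at download_url, so the
    relative part of the path is identical on disk and in the URL. Raises
    ValueError if the file is not under download_directory.
    """
    relative = PurePosixPath(path).relative_to(download_directory)
    return f"{download_url.rstrip('/')}/{relative.as_posix()}"
```

For example, `download_url_for("/srv/opencast/downloads/channel/mp-id/track.mp4", "/srv/opencast/downloads", "https://media.example.org/static/")` gives `https://media.example.org/static/channel/mp-id/track.mp4`. Filenames with spaces or other special characters would additionally need URL quoting (for example with `urllib.parse.quote`), which this sketch leaves out.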

Regards
James