HTTP/1.1 401 Nonce has expired/timed out

Anders Myren

Sep 1, 2014, 10:48:16 AM

to matterho...@opencast.org

Hi guys,

A couple of weeks ago we had some hiccups in our system, causing some jobs
to fail and others to get stuck in processing without Matterhorn really
doing anything. Looking into the logs, I found several oddities, but
I'm not sure whether they are related.

I'm hoping someone can give me some pointers.

First, there are some log entries which might be related to MH-10360:
"2014-09-01 14:06:47 ERROR (WorkflowServiceImpl:1175) - Update of
workflow job 350353 in the service registry failed, service registry and
workflow index may be out of sync"

Then these started showing up:
"opencast.log:2014-08-29 12:10:07 WARN (ServiceRegistryJpaImpl:1958) -
Service org.opencastproject.workflow@http://admin.dig.uib.no:8080 failed
(401) accepting Job {id:345436, version:90}"

"opencast.log:2014-08-29 12:49:57 WARN (RemoteBase:193) - Service at
http://video.dig.uib.no:80/distribution/download returned unexpected
response code 401"

And now the logs are spammed with these:
"opencast.log:2014-09-01 00:58:00 WARN (ServiceRegistryJpaImpl:1958) -
Service org.opencastproject.textanalyzer@http://admin.dig.uib.no:8080
failed (401) accepting Job {id:374836, version:4}
opencast.log-2014-09-01 00:59:11 INFO (TrustedHttpClientImpl:382) -
Sleeping 399164ms before trying request
http://admin.dig.uib.no:8080/analysis/text/dispatch again due to a
HTTP/1.1 401 Nonce has expired/timed out"

We are running 1.4.4 on RHEL, with one admin/worker server and one engage server.
I'd appreciate any help.

--
Best regards
Anders Myren
Applications Section
IT Department, UiB
Email: Anders...@adm.uib.no
Phone: 55 58 46 70

Christian Greweling

Sep 2, 2014, 2:35:13 AM

to matterho...@opencast.org

Hi Anders,

What does the statistics page in your admin UI look like?

Is one of the services in sanitize mode?

Christian

Anders Myren

Sep 4, 2014, 7:41:07 AM

to matterho...@opencast.org

Hi Christian

On 2014-09-02 08:35, Christian Greweling wrote:
> Hi Anders,
>
> What does the statistics page in your admin UI look like?
> Is one of the services in sanitize mode?
>

Nope. It has happened earlier, but I've "sanitized" it.

Tried removing the workflow index to prompt a rebuild, which took a
couple of hours to start but finally fixed the "out of sync" message.

But the info message:

(TrustedHttpClientImpl:382) - Sleeping 118573ms before trying request
http://admin.dig.uib.no:8080/workflow/dispatch again due to a HTTP/1.1
401 Nonce has expired/timed out

is still spamming the log, and it seems like everything MH tries to
do takes ages. Once an operation has been initialized, e.g. an ffmpeg job,
it completes at normal speed.
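
For anyone triaging the same symptom, the retry messages can be summarised per endpoint to see which service is being retried and for how long. The sketch below is only an illustration: it assumes the message wording quoted in this thread, assumes each log entry has been joined onto a single line (the mail client wrapped them here), and uses a made-up function name, `summarize_sleeps`.

```python
import re
from collections import defaultdict

# Pattern based on the TrustedHttpClientImpl message quoted in this thread;
# other Matterhorn versions may word it differently (assumption).
SLEEP_RE = re.compile(
    r"Sleeping (?P<ms>\d+)ms before trying request (?P<url>\S+) again"
)


def summarize_sleeps(lines):
    """Return {url: (retry_count, total_sleep_seconds)} for matching lines."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        match = SLEEP_RE.search(line)
        if match is None:
            continue  # not a retry message
        entry = totals[match.group("url")]
        entry[0] += 1
        entry[1] += int(match.group("ms"))
    return {url: (count, ms / 1000.0) for url, (count, ms) in totals.items()}


if __name__ == "__main__":
    # The two messages from this thread, each joined onto one line.
    sample = [
        "2014-09-01 00:59:11 INFO (TrustedHttpClientImpl:382) - Sleeping 399164ms "
        "before trying request http://admin.dig.uib.no:8080/analysis/text/dispatch "
        "again due to a HTTP/1.1 401 Nonce has expired/timed out",
        "(TrustedHttpClientImpl:382) - Sleeping 118573ms before trying request "
        "http://admin.dig.uib.no:8080/workflow/dispatch again due to a HTTP/1.1 "
        "401 Nonce has expired/timed out",
    ]
    for url, (count, seconds) in summarize_sleeps(sample).items():
        print(f"{url}: {count} retries, {seconds:.0f}s slept")
```

The intervals are in milliseconds, so 118573ms is roughly two minutes and 399164ms is between six and seven minutes per retry, which would fit with simple operations appearing to take ages.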

It takes forever to do simple tasks like:
Processing : Tagging metadata catalogs for archival and publication
Processing : Tagging access control lists for archival

I have compared our system.properties file with the one in the 1.4.4 tag;
there are no differences.

Got this yesterday:
2014-09-03 18:06:57 ERROR (AbstractFaultChainInitiatorObserver:101) - Error occurred during error handling, give up!
org.apache.cxf.interceptor.Fault:
Internal Exception: org.postgresql.util.PSQLException: ERROR: column t1.role does not exist
Position: 17
Error Code: 0
Call: SELECT DISTINCT t1.role FROM mh_user t0, mh_role t1 WHERE ((t1.organization = t0.organization) AND (t1.username = t0.username))
Query: ReportQuery(name="roles" referenceClass=JpaUser sql="SELECT DISTINCT t1.role FROM mh_user t0, mh_role t1 WHERE ((t1.organization = t0.organization) AND (t1.username = t0.username))")

Could it be a DB-issue?
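
One way to check the DB side of this is to ask PostgreSQL which of the columns named in the failing query actually exist. The following is a rough sketch, not something taken from the Matterhorn code: it assumes a Python DB-API cursor such as the one psycopg2 provides (hence the %s placeholders), it does not filter by schema, and `missing_columns` is a made-up helper name. The table and column names come from the query in the error above.

```python
# Columns referenced by the failing query in the error message.
REQUIRED = {
    "mh_user": {"username", "organization"},
    "mh_role": {"username", "organization", "role"},
}


def missing_columns(cursor, required=REQUIRED):
    """Return {table: set of required columns the database does not have}."""
    missing = {}
    for table, columns in required.items():
        # information_schema.columns lists the columns of every table
        # visible to the current user; no schema filter is applied here.
        cursor.execute(
            "SELECT column_name FROM information_schema.columns "
            "WHERE table_name = %s",
            (table,),
        )
        present = {row[0] for row in cursor.fetchall()}
        absent = columns - present
        if absent:
            missing[table] = absent
    return missing


# Example use (connection string is a placeholder, not from this thread):
#   import psycopg2
#   with psycopg2.connect("dbname=matterhorn") as conn:
#       print(missing_columns(conn.cursor()))
```

An empty result would mean the columns are all there and the problem lies elsewhere; a result naming `role` under `mh_role` would match the error message.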

Anders

Christian Greweling

Sep 4, 2014, 9:53:04 AM

to matterho...@opencast.org

Hi,

Are there any jobs running or queued even when no recordings are being processed or ingested?

You can see that on the statistics page. Matterhorn has a maximum number of parallel jobs for each server, configured in the database.

So if you have some dead jobs in a service, Matterhorn may only be able to run one job at a time on that service.

That would explain why your MH is working so slowly.

Try to identify the job behind that "Sleeping 118573ms" message and stop or remove it via the REST endpoints.

We had similar problems some time ago after a network/database crash.

The only way we could solve it was to delete these jobs directly in the database.

Christian

Karen Dolan

Sep 10, 2014, 8:12:24 AM

to matterho...@opencast.org

Anders,

If you haven't already seen this, there is a /docs/upgrade/1.4.0_to_1.4.1/postgres91.sql that adds the role column to mh_user.

I notice that the MH 1.4.4 modules/matterhorn-db/pom.xml references the PostgreSQL 8.4 JDBC driver, although a newer PostgreSQL JDBC driver, 9.1-903.jdbc4, has been released. I haven't been following the PostgreSQL discussions, so there may be a reason not to upgrade the reference there. But if there isn't, it might help to upgrade that reference.

<dependency>
  <groupId>postgresql</groupId>
  <artifactId>postgresql</artifactId>
  <version>8.4-701.jdbc4</version>
</dependency>

Best of luck,

Karen

Anders Myren

Sep 12, 2014, 10:00:10 AM

to matterho...@opencast.org

Karen,

Sorry for taking so long to answer, it's been one of those weeks.

On 10. sep. 2014 13:34, Karen Dolan wrote:
> Anders,
>
> If you haven't already seen this, there is
> a /docs/upgrade/1.4.0_to_1.4.1/postgres91.sql that adds the role column
> to mh_user.
>

Thanks for the tip, Karen. Having taken the time to look into it, I can't
see where that script adds the role column. The role is added (along with
other things) in docs/upgrade/1.3_to_1.4/postgres84.sql, but since our
server was installed as 1.4, we haven't used that script.

After living with these issues for a week, I'm not sure whether this is
related to the nonce timeout. After reading your trick for kickstarting
jobs (https://opencast.jira.com/wiki/display/MHDOC/Job+Dispatching), we
unintentionally set all our jobs to status running, causing all of them
to run again (!) and fail. Instead of restoring the DB from backup, we
let the jobs run. Some older jobs suddenly appeared, which might have
been clogging the system. After we cleaned up by removing these, things
look normal. For now.

BTW: Are there any harmful effects of reingesting a failed recording?
These jobs seem to belong to recordings that might have been ingested
several times.

Have a nice weekend!
Anders

> I notice that the MH 1.4.4 modules/matterhorn-db/pom.xml references the
> postgres 8.4, but there is a released postgres sql driver 9.1-903.jdbc4.
> I haven't been following the postgres discussions, so there may be a
> reason not to upgrade the reference there. But if there isn't, it might
> help to upgrade that reference.
>
> <dependency>
> <groupId>postgresql</groupId>
> <artifactId>postgresql</artifactId>
> <version>8.4-701.jdbc4</version>
> </dependency>
>
> Best of luck,
> Karen
>

Rubén Pérez

Sep 12, 2014, 10:59:34 AM

to matterho...@opencast.org

Hi,

As for your "BTW" question: there are no side effects that I know of. Historically, you could get failed recordings because the names of some generated images conflicted with existing filenames (if the previous workflow had failed after the image extraction), but this was fixed a long time ago. The other "side" effects of running a workflow with an existing mediapackage (that I can think of) only show up when the previous workflow finished correctly rather than failed; namely:

- You will get another version of the mediapackage in the archive, with all that implies (potentially more space consumed, inability to delete specific versions of the mediapackage in the archive).
- Your published media will be overwritten by the newer versions.

And then there is the obvious effect: if the previous workflow failed and was never reingested, that mediapackage was never published, so if this newer workflow succeeds, you will have one more mediapackage in the search index :D

Hope it helps (I am well aware that my explanations can easily get messy --regardless of the language in which I give them; sorry about it).

Best regards

--
Rubén Pérez Vázquez

Universität zu Köln
Regionales Rechenzentrum (RRZK)
Weyertal 121, Raum 4.07
D-50931 Köln
✆: +49-221-470-89603

Karen Dolan

Sep 12, 2014, 12:26:37 PM

to matterho...@opencast.org

Anders,

My apologies for that. I must have had too many files open at once!

> Thank's for the tip Karen. Taking the time to look into it, i can't seem to see where that adds the role column. It does add the role(along with other things) in docs/upgrade/1.3_to_1.4/postgres84.sql, but since our server was installed as 1.4, we haven't used this.

-Karen
