New metadata field

ArneWilken

Nov 7, 2023, 8:38:49 AM
to Opencast Users
Hi all,

We are thinking about adding a new field to the event metadata catalog sometime next year. It should reflect the moment in time when the event was ingested into Opencast. `dateSubmitted` seems like a good candidate for that. Does anyone have thoughts or concerns on this idea?

Best wishes,
Arne Wilken

Matthias Neugebauer

Nov 7, 2023, 9:26:05 AM
to us...@opencast.org
Hi Arne,

Can you give a definition of `dateSubmitted` and of the existing `created` and `startDate`? To me, `created` always sounded like the date and time a video was uploaded (I know that there may be other interpretations, but these are probably OK to display in the UI).

– Matthias


educast.nrw / ZHLdigital (eLectures)
ERCIS - European Research Center for Information Systems

University of Münster
Leonardo-Campus 3 - Room 327
48149 Münster
Germany

Tel: +49 251 83-38268
Mail: matthias....@uni-muenster.de
> --
> To unsubscribe from this group and stop receiving emails from it, send an email to users+un...@opencast.org.

Karen Dolan

Nov 7, 2023, 9:59:46 AM
to us...@opencast.org
Hi Arne,

Here are my 3 sequential thoughts: Using DublinCore industry-standard metadata is a good way to promote metadata sharing and a common understanding of terms among developers working on Opencast, on Opencast plugins, and on system integration. It’s problematic to link event metadata with the appearance of Opencast processing. But the spirit of the DublinCore “dateSubmitted” attribute is about taking the event material and submitting it into another context.

For example, from the DublinCore “dateSubmitted” attribute definition: an article is “submitted” to a publication, and later the article may or may not be published. A thesis is submitted to a university, and later the thesis may or may not be accepted. In this spirit, submitting the recorded event material to Opencast is similar.

If it’s problematic to use the DublinCore dateSubmitted, consider using the OC namespace, to keep OC-specific attributes separate:
https://github.com/opencast/opencast/blob/develop/modules/dublincore/src/main/java/org/opencastproject/metadata/dublincore/DublinCores.java#L77-L82
https://github.com/opencast/opencast/blob/develop/modules/dublincore/src/test/java/org/opencastproject/metadata/dublincore/DublinCoreTest.java#L43

If it makes sense to include Opencast in the life history of the event recording, use DublinCore standard metadata. Consider adding a second attribute, “issued”, in addition to “dateSubmitted”:
(1) event time - Keep date and start-time of the event that was recorded: dcTerm created
https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#http://purl.org/dc/terms/created
(2) submitted time - Add date and time that the recording was submitted to OC for processing:
https://www.dublincore.org/specifications/dublin-core/dcmi-terms/terms/dateSubmitted/
(3) publish time - Add date and time that the processed OC mediapackage was published:
https://www.dublincore.org/specifications/dublin-core/dcmi-terms/terms/issued/
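
A minimal sketch of how the three dates listed above could sit side by side in one catalog, written in Python with only the standard library. The un-namespaced root element `dublincore`, the helper name `build_date_elements`, and the timestamps are assumptions for illustration; this is not how Opencast itself writes its catalogs.

```python
import xml.etree.ElementTree as ET

DCTERMS = "http://purl.org/dc/terms/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"


def build_date_elements(created, date_submitted, issued):
    """Return a root element holding the three DC date terms (illustrative only)."""
    ET.register_namespace("dcterms", DCTERMS)
    ET.register_namespace("xsi", XSI)
    root = ET.Element("dublincore")  # placeholder root, not the real Opencast schema
    terms = (
        ("created", created),               # (1) event time
        ("dateSubmitted", date_submitted),  # (2) submitted to OC for processing
        ("issued", issued),                 # (3) published
    )
    for term, value in terms:
        element = ET.SubElement(root, f"{{{DCTERMS}}}{term}")
        element.set(f"{{{XSI}}}type", "dcterms:W3CDTF")
        element.text = value
    return root


# Made-up timestamps, chosen only to show the ordering of the three dates.
catalog = build_date_elements(
    "2023-11-01T10:00:00Z",
    "2023-11-01T12:10:00Z",
    "2023-11-01T14:35:00Z",
)
print(ET.tostring(catalog, encoding="unicode"))
```

Keeping all three as separate elements means a consumer can pick the date it needs without reinterpreting `created`.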

Regards,
Karen


Karen Dolan

Nov 7, 2023, 10:12:28 AM
to Opencast Users
Hi Arne,

Wow, that was way too wordy. In short, my thought is that if you go through the work of adding the DublinCore industry-standard attribute “dateSubmitted”, it makes sense to also add the DublinCore industry-standard “issued” date attribute, for consistency. But if you go with adding dateSubmitted to the OC namespace in the OC DublinCore service module, it doesn’t matter. It’s just another piece of OC processing metadata.

- Karen

ArneWilken

Nov 8, 2023, 4:18:25 AM
to Opencast Users, kdo...@g.harvard.edu
Thanks for the quick replies.

@Matthias: We want to add a new field precisely to avoid the discussion on what `created` and `startDate` should mean, which does not seem to be straightforward: https://github.com/opencast/opencast/issues/3473. The new field should record the date and time of the ingest of the event, and users should not be able to change it later on.

@Karen: We are not dead set on adding `dateSubmitted` per se. Our main goal is having a metadata field that represents the thing we want, namely the date and time of the ingest of an event. We merely thought `dateSubmitted` might be a good fit for that. What do you think?

I also wasn't aware of the OC namespace thing. Does that allow us to define our own metadata fields in the dublincore.xml? If so, ditching `dc_dateSubmitted` and adding `oc_ingestDate` instead might be a good solution? But the constants you linked to in `[...]/DublinCores.java` are not used anywhere?

- Arne

Lukas Kalbertodt

Nov 8, 2023, 4:34:43 AM
to us...@opencast.org
I'm very much in favor of adding metadata fields that contain useful
technical information and that cannot be changed by users. And ideally,
those metadata fields should be always present and always represent a
value one can rely on.

I didn't know about the OC namespace either, but it sounds like a good
idea to use it more. Too often, discussions about adding a new
metadata field ended with scrolling through the dcterms docs, not finding
a good match, and then giving up on the whole idea. While yes, if there
is a close fit, we should use standard metadata terms, this shouldn't
prevent us from adding other useful fields.
>> https://github.com/opencast/opencast/blob/develop/modules/dublincore/src/main/java/org/opencastproject/metadata/dublincore/DublinCores.java#L77-L82
>> https://github.com/opencast/opencast/blob/develop/modules/dublincore/src/test/java/org/opencastproject/metadata/dublincore/DublinCoreTest.java#L43
>>
>> If it makes sense to include Opencast in the life history of the
>> event recording, used DublinCore standard metadata. Consider
>> adding a second “issue” in addition to “dateSubmitted":
>> (1) event time - Keep date and start-time of the event that was
>> recorded: dcTerm created
>> https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#http://purl.org/dc/terms/created
>> (2) submitted time - Add date and time that the recording was
>> submitted to OC for processing:
>> https://www.dublincore.org/specifications/dublin-core/dcmi-terms/terms/dateSubmitted/
>> (3) publish time - Add date and time that the processed OC
>> mediapackage was published:
>> https://www.dublincore.org/specifications/dublin-core/dcmi-terms/terms/issued/
>>
>> Regards,
>> Karen

Matthias Neugebauer

Nov 8, 2023, 5:17:19 AM
to us...@opencast.org
@Arne: Well, this reminds me of https://xkcd.com/927/. IMO adding another field without a clear definition will only add to the confusion and lead to more discussions. BTW the field `created` is read-only by default.

My conclusion would be: before adding another field, let us be clear about what `created` and `startDate` mean, and only add another one if we feel something is missing.

– Matthias


educast.nrw / ZHLdigital (eLectures)
ERCIS - European Research Center for Information Systems

University of Münster
Leonardo-Campus 3 - Room 327
48149 Münster
Germany

Tel: +49 251 83-38268
Mail: matthias....@uni-muenster.de

Karen Dolan

Nov 8, 2023, 11:33:41 AM
to us...@opencast.org
Hi Arne

> @Karen: We are not dead set on adding `dateSubmitted` per se. Our main goal is having a metadata field that represents the thing we want, namely the date and time of the ingest of an event. We merely thought `dateSubmitted` might be a good fit for that. What do you think?

Yes, I also came to the conclusion that the benefits of using the DublinCore “dateSubmitted” outweigh the problems. The DublinCore standard definition of “dateSubmitted” matches the intent of submitting the raw product to be evaluated and processed. IMHO, it’s better to lean on a standard than to make up custom stuff.

> I also wasn't aware about the OC namespace thing. Does that allow us to define our own metadata fields in the dublincore.xml? If so, ditching `dc_dateSubmitted` and adding `oc_ingestDate` instead might be a good solution? But the constants you linked to in `[...]/DublinCores.java` are not used anywhere?

@Arne: What is your use case for “dateSubmitted"? Is it to identify how long a workflow has been stuck? What else do you need that date for?

Here are dates that are relevant to our site:

Scenario:
- A capture agent is scheduled to start recording in a class at 9:55am, Nov 1
- The instructor started the live class 5 minutes later at 10:00, Nov 1
- The recording for the class was stopped 5 minutes after class ended, at 12:05, Nov 1
- The recording was ingested into Opencast 5 minutes later at 12:10, Nov 1
- The recording was processed and published at 14:35, Nov 1
- The captions file was added an hour later at 15:35, Nov 1

Distinct Dates:
- Capture Agent schedule Start and Stop - Traditionally DublinCore “temporal” in the episode catalog
- Date of class-time, represented by the recording, used for searching for 10am classes in OC, engage, etc - Traditionally the DublinCore “created” in the episode catalog
- Date of mediapackage creation - Traditionally the “created” attribute of the mediapackage,  potentially confused with the “created” attribute of the episode DublinCore
- Date of ingest - the ingest workflow's start time of the first workflow associated to the mediapackage. This information is not published with the mediapackage.
- Date of processing - also represented in the workflow history of the mediapackage. This information is not published with the mediapackage.
- Date of publishing the processed mediapackage - The engage search result's “modified” attribute. Typically, the created date is the time that represents the class time represented by the recording. The modified date is not used in the UI. 
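
To make the gaps between these dates concrete, here is a small Python sketch that puts the scenario's times on one timeline and computes two of the intervals. The year 2023 and all dictionary keys are assumptions for illustration (the scenario only says "Nov 1"); none of these names are Opencast fields.

```python
from datetime import datetime


def at(hour, minute):
    # The year is assumed; the scenario above only gives "Nov 1".
    return datetime(2023, 11, 1, hour, minute)


# Hypothetical labels for the moments in the scenario.
timeline = {
    "capture_scheduled_start": at(9, 55),
    "class_start": at(10, 0),
    "capture_stop": at(12, 5),
    "ingest": at(12, 10),
    "published": at(14, 35),
    "captions_added": at(15, 35),
}

# How long the recording waited before ingest, and how long processing took.
ingest_delay = timeline["ingest"] - timeline["capture_stop"]
processing_time = timeline["published"] - timeline["ingest"]

print(ingest_delay)     # 0:05:00
print(processing_time)  # 2:25:00
```

Only the ingest and processing intervals need workflow information; the others could be derived from the episode catalog as it exists today.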

———
TMI Example of published Dates available from the Engage search endpoint (not same dates from example)

    "result": {
      "id": "e8060941-84e9-47a3-acb9-be17c4ac6792",
      "mediapackage": {
        "duration": 10722160,
        "id": "e8060941-84e9-47a3-acb9-be17c4ac6792",   <-- same Id as the event
        "start": "2023-12-06T17:59:00Z",   <-- potentially different start date than the event
      },
      "metadata": {
        "catalog": [
          {
            "id": "f81f4d01-0b40-4f2b-8d5b-afbd3d3c6f70",
            "type": "dublincore/episode",   <-- contains the XML with additional content of the event, such as the temporary capture agent schedule time.
            "mimetype": "text/xml",
            },
            "url": "https://example.edu/engage-player/e8060941-84e9-47a3/dab37fff-f1/episode.xml"
          }]
      }
      "dcCreated": "2023-12-06T13:00:00-05:00",   <-- Traditionally used to illustrate the start time of the class lecture, not the recording start. Student: “show me the 10am class recording”.
      "modified": "2023-08-29T11:37:04.505-04:00",   <-- The date that the current version of this mediapackage was last published (for when you use merge publish strategy to update files and attributes)
      "dcTitle": "J'Accuse",   <-- The following are a subset of duplicate information from the episode catalog, also relevant to the mediapackage top level.
      "dcCreator": "Donald Ostrowski",
      "dcPublisher": "arafferty",
      "dcContributor": "kking",
      "dcSpatial": "1story-304",
      "dcIsPartOf": 20240116935,
      "dcType": "P15",


From the episode.xml:
Temporal == The actual time the capture agent was told to start recording, usually 5 minutes before the start of the actual live class time.
Created == Traditionally, the start time of the actual class. Used for display and search on media module. For example: "show me my 10am class recordings"

<dcterms:contributor>ssmith</dcterms:contributor>
<dcterms:created xsi:type="dcterms:W3CDTF">2023-12-06T16:00:00Z</dcterms:created>
<dcterms:creator>Sam Smith</dcterms:creator>
<dcterms:extent xsi:type="dcterms:ISO8601">PT4H15M59.800S</dcterms:extent>
<dcterms:identifier>e8060941-84e9-47a3</dcterms:identifier>
<dcterms:isPartOf>20240116935</dcterms:isPartOf>
<dcterms:publisher>jharvard</dcterms:publisher>
<dcterms:spatial>building1-room3</dcterms:spatial>
<dcterms:temporal xsi:type="dcterms:Period">start=2023-12-06T17:59:00Z; end=2023-12-06T22:14:59Z; scheme=W3C-DTF;</dcterms:temporal>
<dcterms:title>J'Accuse</dcterms:title>
<dcterms:type>P15</dcterms:type>
</dublincore>
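
For readers who want to compare `created` with the `temporal` period programmatically, here is a rough Python sketch that parses the two values from the fragment above. The helper names `parse_period` and `parse_utc` are made up for this example, and the timestamp parser only handles the `Z`-suffixed UTC form shown here.

```python
from datetime import datetime


def parse_period(value):
    """Split a 'start=...; end=...; scheme=...;' period string into a dict."""
    fields = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, val = part.partition("=")
        fields[key.strip()] = val.strip()
    return fields


def parse_utc(timestamp):
    # Assumes the 'YYYY-MM-DDThh:mm:ssZ' form used in the fragment above.
    return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")


period = parse_period(
    "start=2023-12-06T17:59:00Z; end=2023-12-06T22:14:59Z; scheme=W3C-DTF;"
)
created = parse_utc("2023-12-06T16:00:00Z")

# Length of the scheduled capture, and how far 'created' is from its start.
capture_duration = parse_utc(period["end"]) - parse_utc(period["start"])
created_to_capture_start = parse_utc(period["start"]) - created

print(capture_duration)          # 4:15:59
print(created_to_capture_start)  # 1:59:00
```

The computed capture duration agrees, to the second, with the `extent` value `PT4H15M59.800S` in the same fragment.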

Karen Dolan

Nov 8, 2023, 11:42:43 AM
to us...@opencast.org
FYI - our site borrowed the DublinCore “Type" field in the spirit that it describes the nature of the event in the context of our site: P= project, S=section, L= lecture. It is a required attribute for our site’s process.

      "dcType": "P15",
.
<dcterms:type>P15</dcterms:type>
</dublincore>

Martin Schamberger

Oct 17, 2024, 3:31:03 AM
to Opencast Users
Sorry for bringing this up again, but with regard to the upcoming Lifecycle-Management (https://github.com/opencast/opencast/pull/6139) I think a field representing the date of ingest is necessary.

> My conclusion would be: before adding another field, let us be clear what `created` and `startDate` mean and only add another one, if we feel something is missing.

At the moment there may be different definitions for `created` and `startDate`, but in fact they are always the same (see: https://github.com/opencast/opencast/issues/3473) and moreover can be altered by users.

Think of a lifecycle rule "Delete all events with a `created` date older than 2 years" being in place:

- User uploads a lecture recording held 3 years ago.
- User wants to display the "correct" date in the player and changes the "Start date" in the EditorUI/AdminUI/ltitools/whatever to a date 3 years ago. `created` and `startDate` are synced.
- Lifecycle rule triggers deletion.
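
The failure mode in these steps can be sketched in a few lines of Python. Everything here is hypothetical: the `Event` class, the `ingest_date` field, and `is_expired` are stand-ins for illustration and are not part of the lifecycle management pull request.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    title: str
    created: datetime      # user-editable, kept in sync with startDate
    ingest_date: datetime  # hypothetical read-only field set at ingest


def is_expired(reference_date, now, max_age=timedelta(days=2 * 365)):
    """Return True if reference_date is more than max_age before now."""
    return now - reference_date > max_age


now = datetime(2024, 10, 17)
event = Event(
    title="Backdated lecture",
    created=datetime(2021, 10, 17),      # user set the date to 3 years ago
    ingest_date=datetime(2024, 10, 16),  # actually uploaded yesterday
)

expired_by_created = is_expired(event.created, now)     # rule would delete it
expired_by_ingest = is_expired(event.ingest_date, now)  # rule would keep it
print(expired_by_created, expired_by_ingest)  # True False
```

The same rule gives opposite answers depending on which date it reads, which is the argument for an ingest date that users cannot edit.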

As Karen pointed out there may be alternatives - but are these accessible for lifecycle rules?
> - Date of ingest - the ingest workflow's start time of the first workflow associated to the mediapackage. This information is not published with the mediapackage.
> - Date of processing - also represented in the workflow history of the mediapackage. This information is not published with the mediapackage.
> - Date of publishing the processed mediapackage - The engage search result's “modified” attribute. Typically, the created date is the time that represents the class time represented by the recording. The modified date is not used in the UI.

Regards,
Martin

Schulte Olaf Andreas (ID)

Oct 28, 2024, 5:44:04 PM
to us...@opencast.org

Dear All

Welcome to the 12th episode of this topic, cf. the discussion at https://groups.google.com/a/opencast.org/g/dev/c/vuopMKcgJQc eight years ago when Matthias was “not yet very familiar with the Opencast code base”. That still holds true for me, so consider these comments non-technical.

I agree “created” and “startDate” should be allowed to hold different values, but not required to:

“created” refers to dcterms:Created and is the “Date of creation of the resource” where the resource is both the video and the content it shows, often a lecture (https://www.dublincore.org/specifications/dublin-core/dcmi-terms/terms/created/). This to me is a canonical date, at least for lecture recording.

“startDate” is not part of the DC terms (https://www.dublincore.org/specifications/dublin-core/dcmi-terms/) and was created to address differences between the time of a lecture and the actual recording time of a video, cf. mail attached. This to me is more on the technical side and required for scheduling, I think.

Things are different with uploads, though: here, the two values should be allowed to be either identical or different. There’s no reason to differentiate the two if the teacher uploads an explanatory video out of the blue. However, if she replaces a missing/failed lecture recording with one she did at home, she needs to backdate it for the video to have the correct date of the lecture held (dcterms:Created) in the context of the lecture series. Or she dates it ahead for a lecture she knows she will be missing. So editing dcterms:Created should be allowed when uploading, and the startDate becomes canonical.

In conclusion, I would argue for separating the two and using them for life cycle management as fits your requirements: If you want to manage lecture recordings en masse, use dcterms:Created. If you define a retention rule for uploads, use startDate and let users manipulate dcterms:Created to indicate the video is from a different date than the upload date. I’m not sure, but I think the life cycle management as implemented gives us the freedom to do this.

Regards

Olaf A.


[OC Users] [MH-12250] Synchronize Dublin Core date created and start date in DC temporal - Opencast.eml

Katrin Ihler

Oct 29, 2024, 7:53:35 AM
to us...@opencast.org

Hi Olaf,

I don't have time to go into the details right now, but there are actually three dates related to recordings: created, the bibliographic start date (when the lecture happened/will happen) and the technical start date (used by the capture agent for scheduling; it can differ from the bibliographic one for technical reasons, e.g. start five minutes early and end five minutes later). If I remember correctly, the first two are part of the dublin core catalog (though start date might be a custom term as Olaf said) and displayed as metadata, while the last is part of the schedule information and not part of the catalog at all.

I'm not disagreeing that the usage in Opencast is occasionally nonsensical, especially where created is concerned, which is pretty useless as it is in my opinion. But this is a pretty complex topic and we need to be careful not to misrepresent the actual state of things.

And I would actually say things should be opposite from what Olaf said: The (bibliographic) start date should be when the lecture happened and can be changed (which is already the case), and created should be something that is set automatically e.g. on creation of the event and/or the actual ingest (here is where people usually disagree, I don't really have an opinion) and be read-only. It's possible this is at odds with the way created is defined by DC, but I think it's closer to the way Opencast handles things right now.

Cheers,

Katrin

Opencast-DevOps Teamlead
elan e.V.
Karlstr. 23
D-26123 Oldenburg

elan-ev.de

David Graf

Oct 29, 2024, 9:18:57 AM
to us...@opencast.org

Dear all

I agree here with Katrin. In short, after an internal discussion, this is how we (University of Bern) see things:

  • startDate: When the lecture happened (for recordings) or when the lecturer wants his/her students to watch the video (for uploads). This needs to be editable, since for example a lecture may take place on a different date than initially planned.
  • createDate: When the event (containing the schedule information for the lecture or the video) was created/ingested. Should be read-only. This createDate can also be crucial for support since it carries information about when a possible error might have occurred on our systems.
  • A third date that may exist here might be “when the video was recorded”. In our opinion this date is of no (great) use and can be ignored by Opencast. Meaning: A lecturer uploads today (createDate) a video she recorded yesterday (possible name: recordingDate) which replaces a missing/failed lecture recording from the day before yesterday (startDate).
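
The split between an editable startDate and a read-only createDate could look roughly like this in Python. The class `EventDates` and its attribute names are invented for this sketch and do not correspond to any Opencast class; the dates follow the upload example in the last bullet.

```python
from datetime import date, timedelta


class EventDates:
    """Editable start_date, read-only create_date (illustrative only)."""

    def __init__(self, create_date, start_date):
        self._create_date = create_date
        self.start_date = start_date  # plain attribute, freely editable

    @property
    def create_date(self):
        # No setter is defined, so assignment raises AttributeError.
        return self._create_date


today = date(2024, 10, 29)
dates = EventDates(create_date=today, start_date=today)

# The lecturer backdates the upload to the missed lecture two days earlier.
dates.start_date = today - timedelta(days=2)

# Trying to change the creation date is rejected.
try:
    dates.create_date = today - timedelta(days=2)
    create_date_editable = True
except AttributeError:
    create_date_editable = False

print(dates.start_date, dates.create_date, create_date_editable)
```

The recording date from the third bullet is left out on purpose, matching the view that Opencast can ignore it.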

 

Cheers,

David Graf (UniBe)

Karen Dolan

Oct 29, 2024, 10:02:31 AM
to Opencast Users
My 2¢ is that there are Student facing & System processing dates and that the 8 use cases below should be addressed some place in Opencast, which can be in workflow metadata, logs, or other db tables, but not necessarily all of it in the published metadata.

Student Facing Dates:
1.a Datetime of the official time as displayed in the course catalog ( i.e. Lectures on Tues-Wed at 10AM-11:30AM)
1.b Datetime that the recording material “should” be represented to the student (i.e. the live event had to happen early, at 8AM, but its recording should represent the regular 10AM lecture so as not to confuse students looking for the 10AM class recording. Or the instructor will be out for surgery and so is republishing a lecture from last year as the 10AM lecture for this year)

Workflow/Systems Processing Dates:
Scheduling: 
2. Date that the capture agent was scheduled to capture a specific recording for a course (this may happen before the term starts).
3. Datetime for starting the recording capture agent (usually 5 minutes or so before the live event starts)
4. Datetime for stopping the capture agent (usually 5 minutes or so after the official end of the live event)
Processing:
5. Date that the recording material was ingested into Opencast (this may happen 0-24 hours or so after the end of a live event).
6. Datetime that the live event actually started (usually a little before or a little after the predetermined start time. i.e. “Since everyone is here, let's start the class now…”. Correlates to the start trim time)
Publishing:
7. Datetime of publishing the processed material to an engagement service.
8. Datetime of publishing a modified part of the published processed material.

- Karen



--------------------------------
Karen Dolan
Software Engineering Team for Teaching and Learning
Harvard University
Division of Continuing Education
125 Mount Auburn Street
Cambridge, MA 02138
karen...@harvard.edu






Schulte Olaf Andreas (ID)

Oct 29, 2024, 6:45:59 PM
to us...@opencast.org

It’s pretty hard to continue this thread now that Martin’s request to differentiate two metadata fields for the sake of life cycle management has become a discussion on 10+ dates, but let’s try to find some common ground:

  • We seem to agree there are two different use cases, scheduled lecture recording and uploads; plus, they converge if uploads are added to lecture recordings, replacing lectures not recorded or adding videos to a series.
  • There is external metadata (student-facing, says Karen; we also serve library systems) and there are internal dates (“Workflow/Systems Processing Dates”); the former are closer to the DC world, the latter to the OC namespace.
  • We seem to agree adding new metadata fields is probably not a good idea until we fleshed out what we currently got and how it is used.

And that’s it already. Could people please comment on whether these are perspectives we share, so we can move on to differentiating the existing datetime information along those lines? Other suggestions to structure this thread are welcome, of course.

O

majo...@gmail.com

Oct 30, 2024, 4:13:51 AM
to us...@opencast.org
The fact that this is - approximately - the 12th iteration of a
discussion about metadata may well indicate that the current
approach of handling metadata dates is just not flexible enough ;)

Here at Uni Wien we do not use scheduling at all and users are allowed
to change metadata after uploading/processing. Moreover we have a lot
of uploads that are no lecture recordings, but movies, TV broadcasts,
historical video footage... hence my fear, that the common ground is
quite small.

I completely agree with Karen's proposal to differentiate between
technical/system processing/internal dates (the more the better) and
student facing/bibliographical/DCC/external dates.

The former should be more or less immutable, while the latter may
be altered by users.
It would be great if adopters could configure whether and which
technical metadata is used/linked to student
facing/bibliographical/DCC/external dates.

As a side effect, admins/support would have a better overview of the
entire lifecycle of an event.

I just noticed, that there was a meeting of the architecture board on
Monday - and metadata was discussed as well.
https://hedgedoc.uni-osnabrueck.de/wriz3MMpQIq0qox8v8moOA

Regards,
Martin


Katrin Ihler

Oct 30, 2024, 5:39:16 AM
to us...@opencast.org
> until we fleshed out what we currently got and how it is used.

I think that, for me, would be the most important step. We can all talk
until we're blue in the face but the fact remains that even more
experienced Opencast developers like myself are not 100% certain about
the status quo until we actually look at the code (which is probably
already a problem). And then there are the different use cases from the
different institutions. I think compiling an overview and then a
proposal on how to move forward that will be agreeable to at least most
of us would be the way to go. If that means more metadata in the end to
better differentiate things, that's fine by me.

I would also argue that we need to find a balance between being able to
configure things and for things to reliably exist and have a predefined
structure and meaning, because otherwise writing integrations like
Tobira becomes a nightmare (a point that was made during the
architecture meeting before the DACH conference).

Cheers,

Katrin


--
Opencast-DevOps Teamlead

ELAN e.V.