Text limits in LRMI


Hugh Paterson III

May 3, 2022, 12:12:14 PM
to lr...@googlegroups.com, DCMI-LRMI Task Group

Greetings,

My understanding from reading the documents for LRMI is that many of the expected values include the data type "text". This appears to be defined as schema.org/Text.


However, my understanding is that there is no technical definition of what can or cannot be included in schema.org/Text. For example, it does not say whether control characters are allowed, or whether a particular character encoding such as UTF-8 (as opposed to UTF-16 or Latin-1) is expected. This seems to be a hole in the LRMI standard, at least for implementation purposes. See the discussion here: https://github.com/schemaorg/schemaorg/issues/3033
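To make the gap concrete: because schema.org/Text says nothing about control characters, an implementer currently has to choose a local rule. The Python sketch below assumes one such hypothetical policy, which is not part of LRMI or schema.org: flag any character in Unicode category "Cc" (C0 controls, DEL and C1 controls) except tab, line feed and carriage return. The function name `find_control_chars` is illustrative only.

```python
import unicodedata

# Whitespace control characters this hypothetical policy still allows.
ALLOWED_CONTROLS = {"\t", "\n", "\r"}


def find_control_chars(value: str) -> list[tuple[int, str]]:
    """Return (index, code point) pairs for disallowed control characters.

    Hypothetical local policy: any character in Unicode category "Cc"
    is rejected unless it is tab, LF or CR. schema.org/Text itself
    defines no such rule.
    """
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(value)
        if unicodedata.category(ch) == "Cc" and ch not in ALLOWED_CONTROLS
    ]


print(find_control_chars("Key Stage 3"))        # []
print(find_control_chars("Key\x00Stage\x1b3"))  # [(3, 'U+0000'), (9, 'U+001B')]
```

A stricter or looser rule (for example, the XML 1.0 `Char` production) could be swapped in; the point is only that the choice currently falls to each implementation.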

Is my understanding correct?

- Hugh

Phil Barker

May 4, 2022, 5:37:54 AM
to lr...@googlegroups.com


On 03/05/2022 17:11, Hugh Paterson III wrote:

> Greetings,
>
> My understanding from reading the documents for LRMI is that many of the expected values include the data type "text". This appears to be defined as schema.org/Text.

Yes, that is true of the properties as adopted by schema.org, for example schema:educationalLevel. It is a feature of schema.org that the ranges are loosely defined and very often include Text. This makes publishing data easier, but consuming it harder than if the range were more tightly constrained; that is a schema.org design decision.
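As an illustration of the consumer-side cost: a consumer reading schema:educationalLevel may meet a plain Text string, an object such as a DefinedTerm, or a list mixing both. A minimal Python sketch, assuming compact JSON-LD already parsed into dicts and lists, with unprefixed schema.org terms and a `name` that is a plain string; `educational_levels` is a hypothetical helper name.

```python
from typing import Any


def educational_levels(resource: dict[str, Any]) -> list[str]:
    """Collect educationalLevel values as plain labels.

    Sketch only: a value may be a plain Text string, an object such as a
    DefinedTerm (use its "name", else its "@id"), or a list mixing both.
    """
    raw = resource.get("educationalLevel", [])
    values = raw if isinstance(raw, list) else [raw]
    labels = []
    for v in values:
        if isinstance(v, str):
            labels.append(v)                       # plain schema:Text value
        elif isinstance(v, dict):
            label = v.get("name") or v.get("@id")  # e.g. a DefinedTerm
            if label:
                labels.append(label)
    return labels


doc = {
    "educationalLevel": [
        "Key Stage 3",
        {"@type": "DefinedTerm", "name": "Upper secondary"},
    ]
}
print(educational_levels(doc))  # ['Key Stage 3', 'Upper secondary']
```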

The same property is also defined in the Dublin Core LRMI namespace, as http://purl.org/dcx/lrmi-terms/educationalLevel, where the value is expected to be a schema:DefinedTerm or a skos:Concept. This is arguably better suited to environments where metadata producers are expected to conform more closely to a standard, so that their data is easier to consume.

In both cases the expected value type is defined using schema:rangeIncludes, which (intentionally) leaves enough leeway for us to say that the properties are the same (in RDF terms).
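Where a stricter profile like the dcx one is wanted, a consumer might check that each value is typed as a schema:DefinedTerm or skos:Concept. Because schema:rangeIncludes does not forbid other types, the Python sketch below reports warnings instead of rejecting data. It assumes `@type` values have already been expanded to full IRIs (accepting both http and https forms of the schema.org IRI); `check_strict_levels` is a hypothetical name.

```python
from typing import Any

# Full IRIs of the types the dcx LRMI educationalLevel property expects.
EXPECTED_TYPES = {
    "http://schema.org/DefinedTerm",
    "https://schema.org/DefinedTerm",
    "http://www.w3.org/2004/02/skos/core#Concept",
}


def check_strict_levels(values: list[Any]) -> list[str]:
    """Return warnings for values outside a hypothetical strict profile.

    rangeIncludes is non-exclusive, so mismatches are reported as
    warnings rather than treated as invalid data.
    """
    warnings = []
    for i, v in enumerate(values):
        if not isinstance(v, dict):
            warnings.append(f"value {i}: plain literal, expected a term or concept")
            continue
        types = v.get("@type", [])
        types = [types] if isinstance(types, str) else types
        if not EXPECTED_TYPES.intersection(types):
            warnings.append(f"value {i}: @type {types} not among expected types")
    return warnings
```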

> However, my understanding is that there is no technical definition of what can or cannot be included in schema.org/Text. For example, it does not say whether control characters are allowed, or whether a particular character encoding such as UTF-8 (as opposed to UTF-16 or Latin-1) is expected. This seems to be a hole in the LRMI standard, at least for implementation purposes. See the discussion here: https://github.com/schemaorg/schemaorg/issues/3033
>
> Is my understanding correct?

That is my understanding too.

I assume that, when it comes to character encodings such as UTF-8, the encoding would be set for the file as a whole, e.g. in the charset parameter of the HTTP Content-Type header or in the "encoding" attribute of the XML declaration.
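A sketch of how a consumer could honour such a file-level encoding in Python: the charset parameter is read from a Content-Type value with the standard library's `email.message.Message.get_content_charset()`, and for XML, handing raw bytes to `xml.etree.ElementTree.fromstring` lets the parser apply the encoding declaration itself. The UTF-8 fallback when no charset is present is an assumption of this sketch, not something LRMI or schema.org specifies.

```python
import xml.etree.ElementTree as ET
from email.message import Message


def decode_http_body(body: bytes, content_type: str, fallback: str = "utf-8") -> str:
    """Decode a response body using the charset parameter of its Content-Type.

    The fallback used when no charset is given is an assumption.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset() or fallback  # returned lower-cased
    return body.decode(charset)


# HTTP: the charset applies to the whole body, not to individual Text values.
latin1_body = "Café".encode("latin-1")
print(decode_http_body(latin1_body, "text/plain; charset=ISO-8859-1"))  # Café

# XML: pass bytes, not str, so the parser reads the encoding declaration.
xml_bytes = '<?xml version="1.0" encoding="ISO-8859-1"?><name>Café</name>'.encode("latin-1")
print(ET.fromstring(xml_bytes).text)  # Café
```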

Phil


> - Hugh
--

Phil Barker. http://people.pjjk.net/phil
CETIS LLP: a cooperative consultancy for innovation in education technology.
PJJK Limited: technology to enhance learning; information systems for education.

CETIS is a co-operative limited liability partnership, registered in England number OC399090
PJJK Limited is registered in Scotland as a private limited company, number SC569282.
