Exercise metadata


Matthew Leingang

Aug 13, 2025, 12:25:24 PM
to pretext...@googlegroups.com
Hi everybody,

Where is the best place to give metadata about an exercise element?

I'm working on a course, and I'm conceiving of keeping exercises in a shared repository, from which individual problem sets could <xi:include> individual exercises. It would be helpful if the exercise elements had information attached to them, such as the author, date of creation, difficulty level, learning objective, Bloom type, or depth of knowledge level.

Right now, I'm primarily considering a use case of human authors scanning the XML source, and using the metadata to decide what problems to include. If that's the only use case, ever, then a big comment would be sufficient. But another possible application (which I am *not* requesting at this time, to be clear) is in an instructor edition publication, where the metadata could be used to annotate or decorate the exercise. For that, it would be better to embed metadata into the XML source. Or if someone sets an exam and wants to analyze which topics were best understood, structured metadata would be helpful.

The only metadata container I know of in PreTeXt is <bibinfo>, so I've been playing around with adding it to exercises, and adding other metadata properties using Dublin Core as in https://www.dublincore.org/specifications/dublin-core/dc-xml-guidelines/. I know this violates the document schema, but AFAICT the extra elements are silently skipped during processing. So is it that bad after all?

Interested to hear the sages' thoughts on this.

Best,
Matthew



--

Matthew Leingang (he/his)
Clinical Professor of Mathematics
Assistant Director of Undergraduate Studies

Department of Mathematics
Courant Institute of Mathematical Sciences
New York University

Schedule an appointment with me at https://calendly.com/leingang

Mitch Keller

Aug 13, 2025, 12:37:23 PM
to pretext...@googlegroups.com
I’d suggest using extra attributes. Unknown elements are at high risk of being silently dropped along the way, while unknown attributes are generally just copied along in case someone is using extra XSL that might want to make use of it. Attributes generally should not be text displayed to readers of a document, but I think that since you’re thinking about this as metadata and likely to use it down the road to perhaps produce text that is displayed, it would be a reasonable approach. (Perhaps that expands “diff-chain” to show “Compute a derivative using the chain rule” via some auxiliary list you’re maintaining, for instance.)
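
A minimal sketch of the attribute approach, assuming a hypothetical `objective` attribute (not part of the PreTeXt schema) and an author-maintained lookup table. It reads the token from each exercise and expands it into display text, as in the "diff-chain" example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical source: "objective" is an illustrative extra attribute,
# not something PreTeXt defines.
SOURCE = """
<exercises>
  <exercise xml:id="ex-chain-1" objective="diff-chain">
    <statement><p>Differentiate sin(x^2).</p></statement>
  </exercise>
  <exercise xml:id="ex-prod-1" objective="diff-product">
    <statement><p>Differentiate x e^x.</p></statement>
  </exercise>
</exercises>
"""

# Auxiliary list maintained by the author: token -> readable description.
OBJECTIVES = {
    "diff-chain": "Compute a derivative using the chain rule",
    "diff-product": "Compute a derivative using the product rule",
}

# ElementTree exposes xml:id under the reserved XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def describe_objectives(xml_text):
    """Map each exercise's xml:id to the expanded objective text."""
    root = ET.fromstring(xml_text)
    result = {}
    for ex in root.iter("exercise"):
        token = ex.get("objective")
        # Fall back to the raw token if the table has no entry for it.
        result[ex.get(XML_ID)] = OBJECTIVES.get(token, token)
    return result


for ex_id, text in describe_objectives(SOURCE).items():
    print(ex_id, "->", text)
```

Keeping only a short token in the source and the readable text in a separate table fits the point that attributes should not hold reader-facing text directly.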



Matthew Leingang

Aug 13, 2025, 2:38:18 PM
to PreTeXt support
That makes sense.  But the Dublin Core document I linked to makes the opposite recommendation: metadata properties should be elements, and their values should be the content of those elements. Attributes of the property elements can qualify the value. For instance, if your learning objectives come from a registered taxonomy, an attribute could specify that. Or if you wanted to give metadata in different languages, you could use the xml:lang attribute on the property's element.

Another disadvantage of using attributes is that they are single-valued. If you want to say that an exercise covers a range of subjects, with elements you could add <dc:subject>differentiation</dc:subject> and <dc:subject>chain rule</dc:subject>, and maybe also for instance <dc:subject>trigonometric-functions</dc:subject>. With attributes, you can only have one dc:subject. Smushing them into a CSV string makes it harder to search for specific values.
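
To make the comparison concrete, here is a rough sketch of reading the same three subjects both ways. The `<bibinfo>` placement inside `<exercise>` mirrors the experiment described earlier in this thread and is not schema-valid PreTeXt; the `subject` attribute is hypothetical.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace

# Element style: one dc:subject element per value.
ELEMENT_STYLE = f"""
<exercise xmlns:dc="{DC}">
  <bibinfo>
    <dc:subject>differentiation</dc:subject>
    <dc:subject>chain rule</dc:subject>
    <dc:subject>trigonometric-functions</dc:subject>
  </bibinfo>
</exercise>
"""

# Attribute style: all values packed into one delimited string
# ("subject" is a hypothetical attribute name).
ATTRIBUTE_STYLE = """
<exercise subject="differentiation,chain rule,trigonometric-functions"/>
"""


def subjects_from_elements(xml_text):
    """Each value is its own node, so no string parsing is needed."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(f"{{{DC}}}subject")]


def subjects_from_attribute(xml_text):
    """The single attribute value has to be split by convention."""
    root = ET.fromstring(xml_text)
    raw = root.get("subject", "")
    return [s.strip() for s in raw.split(",") if s.strip()]


print(subjects_from_elements(ELEMENT_STYLE))
print(subjects_from_attribute(ATTRIBUTE_STYLE))
```

Both functions recover the same list, but the attribute version depends on every author agreeing on the delimiter and never using it inside a value.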

For now, I can live with extra elements being silently dropped. If anything like my pipe dreams were realized, I suppose I could add an XML preprocessing stage to convert metadata elements to attributes. 
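
Such a preprocessing stage might look roughly like the following sketch. The attribute names (`dc-subject`, `dc-creator`) and the comma delimiter are assumptions made for illustration, as is the `<bibinfo>` child of `<exercise>`; nothing here is an existing PreTeXt feature.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace


def metadata_elements_to_attributes(exercise):
    """Move Dublin Core children of <bibinfo> into attributes on <exercise>.

    Repeated properties are joined with commas. Attribute names such as
    "dc-subject" are hypothetical, chosen only for this sketch.
    """
    bibinfo = exercise.find("bibinfo")
    if bibinfo is None:
        return exercise

    collected = {}
    for child in list(bibinfo):  # copy the list: we remove while iterating
        if child.tag.startswith(f"{{{DC}}}"):
            name = child.tag.split("}", 1)[1]  # strip the "{namespace}" part
            collected.setdefault(name, []).append((child.text or "").strip())
            bibinfo.remove(child)

    for name, values in collected.items():
        exercise.set(f"dc-{name}", ",".join(values))

    # Drop the container if nothing else is left inside it.
    if len(bibinfo) == 0:
        exercise.remove(bibinfo)
    return exercise


source = f"""
<exercise xmlns:dc="{DC}">
  <bibinfo>
    <dc:creator>A. Author</dc:creator>
    <dc:subject>differentiation</dc:subject>
    <dc:subject>chain rule</dc:subject>
  </bibinfo>
  <statement><p>Differentiate sin(x^2).</p></statement>
</exercise>
"""
converted = metadata_elements_to_attributes(ET.fromstring(source))
print(converted.attrib)
```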

Best,
Matthew

Rob Beezer

Aug 14, 2025, 12:09:37 PM
to pretext...@googlegroups.com
Dear Matthew,

I like where you are going here. Through the years, we have consciously tried
not to get bogged down with all the intricacies of quality metadata (along with
not getting bogged down with new syntax for mathematics). Oscar has done a nice
job of reforming document-level stuff with #bibinfo. And maybe it is time to
think more carefully about the ideas you propose.

Alex J suggests that WeBWorK problems as bare PG code will just be brought
on-board by authors from somewhere else. David F suggests we record that
origin. There are other places where we might record the provenance of some
material, such as a TikZ diagram "borrowed" from "somewhere else."

Could we agree on a #provenance metadata-ish container, for #exercise and other
elements (e.g. STACK exercises)? Maybe populated with a subset of Dublin Core
(DC) items so we don't have to think about designing it (and we get a namespace
for free)?

exercise
    title
    provenance
        dc:date
        dc:creator
        ...

I know this does not cover your entire use case. But it is more broadly
applicable for us than just a collection of exercises. Maybe a second container
just for exercises, describing their topic coverage, difficulty, etc.

Since you do not plan (yet!) to do anything with these, we can just explicitly
kill them in "pretext-common.xsl" so they never bleed through to output. This
is already standard practice for #title since it gets used lots of places in
different ways and different forms (ToC, file names, etc). I could see a nice
exercise in making a friendly catalog of your exercises by mining and presenting
the metadata (reducing the strain of doing visual scans).
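
As a rough illustration of that catalog idea, the sketch below mines title, date, and creator from the proposed layout. The #provenance container is only a proposal in this thread, so the structure, the sample data, and the `<exercises>` wrapper are all assumptions.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace
NS = {"dc": DC}

# Hypothetical source following the proposed exercise/provenance layout.
SOURCE = f"""
<exercises xmlns:dc="{DC}">
  <exercise>
    <title>Chain rule warm-up</title>
    <provenance>
      <dc:date>2025-08-13</dc:date>
      <dc:creator>A. Author</dc:creator>
    </provenance>
  </exercise>
  <exercise>
    <title>Product rule practice</title>
  </exercise>
</exercises>
"""


def build_catalog(xml_text):
    """Return one row per exercise; missing metadata becomes an empty string."""
    root = ET.fromstring(xml_text)
    rows = []
    for ex in root.iter("exercise"):
        rows.append({
            "title": ex.findtext("title", default=""),
            "date": ex.findtext("provenance/dc:date", default="", namespaces=NS),
            "creator": ex.findtext("provenance/dc:creator", default="", namespaces=NS),
        })
    return rows


for row in build_catalog(SOURCE):
    print(f'{row["title"]:<25} {row["date"]:<12} {row["creator"]}')
```

Exercises without a provenance block still appear in the catalog, which makes gaps in the metadata easy to spot.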

Mitch is right about attributes and DC is right about elements. My rule of
thumb for PreTeXt is that what a reader sees goes in elements and what is a
property or behavior-modifier or enumerated token belongs in an attribute. We
routinely stuff several items in an attribute (space- or comma-delimited). I
just saw a DC example stuff a bunch of keywords (or topics) into one element,
separated by spaces. (Not sure I would *ever* do that.) Not sure DC buys us a
whole lot, but maybe it can't hurt, and the namespace will avoid some confusion
(e.g. #theorem already has a #creator child).

Once you capture stuff as XML it is pretty easy to write transforms that morph
it into other forms of XML, so you could have some freedom to experiment, so
long as use does not spread too far.

Short answer: two new metadata containers, #provenance and #????, to be killed
on-sight, and ill-defined as to children?

Maybe discussion could/should move to "pretext-dev"? @Matthew - you could
request a membership to that group?

Rob

Matthew Leingang

Aug 15, 2025, 1:34:26 PM
to PreTeXt support
Will do. Thanks for considering!