Some titles have HTML tags

Purna Srivatsa

Nov 15, 2025, 8:12:14 AM
to OpenAlex Community
E.g., https://api.openalex.org/w1918865178 returns:

"title":"<i>Superman</i>, a regulator of floral homeotic genes in<i>Arabidopsis</i>"

"display_name":"<i>Superman</i>, a regulator of floral homeotic genes in<i>Arabidopsis</i>"

Is this a bug, or are titles populated this way by design?

Kevin McCurley

Nov 19, 2025, 1:46:21 PM
to OpenAlex Community
If you look at the Crossref schema, it allows some markup in titles, including "face markup" such as <i>. Since Crossref is currently the primary supplier of data to OpenAlex, you should expect things like this. The specification also allows MathML (mml:math), and https://api.crossref.org/works/10.1016/j.jnt.2025.08.009 from Elsevier is an example where that happens. Another Elsevier example is https://api.crossref.org/works/10.1016/j.jnt.2025.08.019, which uses a mix of UTF-8 characters and an HTML entity for the '>' symbol.
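If you need plain text from titles like these (for display or matching), one option is to drop the tags and decode the entities. This is a minimal sketch using only Python's standard-library html.parser; the names strip_title_markup and _TextExtractor are hypothetical, and it assumes the title is a markup fragment rather than a full document. It keeps only character data, so MathML collapses to its bare text tokens, which is lossy.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: collects character data and ignores every tag
    (face markup like <i>, MathML like <mml:math>, and so on)."""

    def __init__(self) -> None:
        # convert_charrefs=True decodes entities such as &gt; or &#8805; in the text
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_title_markup(title: str) -> str:
    """Return a plain-text approximation of a Crossref/OpenAlex title string."""
    parser = _TextExtractor()
    parser.feed(title)
    parser.close()  # flush any text still buffered at the end of the input
    return "".join(parser.parts)


example = "<i>Superman</i>, a regulator of floral homeotic genes in<i>Arabidopsis</i>"
print(strip_title_markup(example))
```

On the title quoted in the question this prints "Superman, a regulator of floral homeotic genes inArabidopsis": the source string has no space before the second <i>, so removing the tags cannot restore one.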

Some publishers encode mathematics in titles using some variant of TeX. An example from the European Mathematical Society is https://api.crossref.org/works/10.4171/JEMS/1454. Springer also does this; see https://api.crossref.org/works/10.1007/978-3-032-01855-7_2 for example. Unfortunately, they use the double delimiter $$, which is meant for display mathematics rather than inline mathematics (and is discouraged in LaTeX, since it is only a plain TeX primitive). The American Mathematical Society uses a mix of conventions, including TeX mathematics without delimiters: https://api.crossref.org/works/10.1090/bproc/150
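For delimited TeX like the EMS and Springer cases, a regular expression is often enough to locate the math, assuming the title never uses $ literally and contains no escaped \$. This is a hypothetical sketch (find_tex_math and normalize_display_delimiters are illustrative names, and the sample strings are made up): it tries $$...$$ before $...$ so a double delimiter is not misread, and optionally rewrites $$ to inline $. It cannot detect undelimited TeX such as the AMS example.

```python
import re

# Try $$...$$ first so "$$x$$" is not read as two empty $...$ spans.
_TEX_MATH = re.compile(r"\$\$(.+?)\$\$|\$(.+?)\$", re.DOTALL)
_DISPLAY_MATH = re.compile(r"\$\$(.+?)\$\$", re.DOTALL)


def find_tex_math(title: str) -> list[str]:
    """Return the TeX fragments delimited by $...$ or $$...$$ in a title."""
    return [
        m.group(1) if m.group(1) is not None else m.group(2)
        for m in _TEX_MATH.finditer(title)
    ]


def normalize_display_delimiters(title: str) -> str:
    """Rewrite $$...$$ as inline $...$, since a title is inline text."""
    return _DISPLAY_MATH.sub(r"$\1$", title)


sample = "On $$L^p$$ bounds for $x$"  # made-up title for illustration
print(find_tex_math(sample))
print(normalize_display_delimiters(sample))
```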

I think the bottom line is that you cannot make many assumptions about the format of titles. It's a string blob.
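Given that, one defensive approach is to keep the raw title and sniff for likely conventions before deciding how to render or compare it, rather than assuming a single format. The patterns below are hypothetical heuristics, not taken from the Crossref schema or any OpenAlex documentation. They can misfire (a literal dollar sign in a title will look like TeX, for example), and the only undelimited TeX they catch is backslash commands.

```python
import re

# Heuristic patterns (assumptions, not a spec) for conventions seen in titles.
_PATTERNS = {
    "mathml": re.compile(r"<\s*(?:mml:)?math\b", re.IGNORECASE),
    "html_tag": re.compile(r"</?[a-zA-Z][\w:-]*[^<>]*>"),
    "html_entity": re.compile(r"&(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);"),
    "tex_delimited": re.compile(r"\$[^$]+\$"),
    "tex_command": re.compile(r"\\[a-zA-Z]+"),
}


def sniff_title_markup(title: str) -> set[str]:
    """Return labels for the markup conventions a title *appears* to use."""
    return {name for name, pattern in _PATTERNS.items() if pattern.search(title)}


print(sniff_title_markup("<i>Superman</i>, a regulator of floral homeotic genes in<i>Arabidopsis</i>"))
```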
