HTML or Unicode in titles?


Jörg Prante

Aug 1, 2024, 4:44:15 AM
to OpenAlex Community
Hello,

I would like to know if there are any plans to clean up HTML tags in titles such as in W2315752015:

The First Example of a Crystalline Subvalent Organolanthanum Complex: [K([18]crown-6)- (η<sup>2</sup>-C<sub>6</sub>H<sub>6</sub>)<sub>2</sub>][(LaCp<sup>tt</sup><sub>2</sub>)<sub>2</sub>(μ-η<sup>6</sup>:η<sup>6</sup>-C<sub>6</sub>H<sub>6</sub>)]•2C<sub>6</sub>H<sub>6</sub> (Cp<sup>tt</sup> = η<sup>5</sup>-C<sub>5</sub>H<sub>3</sub>Bu<sup>t</sup><sub>2</sub>-1,3)

Presumably, HTML tags such as <sup> and <sub> are either preferred, or they are simply carried over from the original sources and kept in OpenAlex for reference or title matching.

Since the HTML tags can be replaced with Unicode characters - see https://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts - it seems feasible, and helpful to the community, to use Unicode instead.
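
To illustrate the idea, here is a minimal Python sketch of such a replacement. It is only an assumption of how a cleanup could look, not anything OpenAlex provides; the name tags_to_unicode is hypothetical. Because Unicode has no complete superscript or subscript alphabet, the sketch converts only digits and the signs + - = ( ) and leaves every other tagged span (for example <sup>tt</sup>) untouched.

```python
import re

# Characters with well-defined Unicode superscript and subscript forms.
PLAIN = "0123456789+-=()"
SUP = str.maketrans(
    PLAIN,
    "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079"
    "\u207a\u207b\u207c\u207d\u207e",
)
SUB = str.maketrans(
    PLAIN,
    "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089"
    "\u208a\u208b\u208c\u208d\u208e",
)
TABLES = {"sup": SUP, "sub": SUB}

# Matches a non-nested <sup>...</sup> or <sub>...</sub> span.
SPAN_RE = re.compile(r"<(sup|sub)>([^<>]*)</\1>")


def tags_to_unicode(title: str) -> str:
    """Replace <sup>/<sub> spans by Unicode where every character is mappable."""

    def replace(match: re.Match) -> str:
        tag, content = match.group(1), match.group(2)
        if content and all(ch in PLAIN for ch in content):
            return content.translate(TABLES[tag])
        # Keep the original markup when no faithful Unicode form exists.
        return match.group(0)

    return SPAN_RE.sub(replace, title)


print(tags_to_unicode("C<sub>6</sub>H<sub>6</sub>"))
```

Keeping unmappable spans as they are avoids silently losing information, at the cost of a mixed representation in some titles.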

There are rare examples of titles that seem to exceed the maximum length and have been cut off, which leaves the HTML tags unbalanced. Unfortunately, I do not have the W identifiers of such titles at hand.
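
Such titles could probably be found with a simple check. The following Python sketch is a heuristic under my own assumptions: it looks only at <sup> and <sub>, and the function name has_unbalanced_tags is hypothetical. It flags a title when a closing tag does not match the most recent opening tag, when an opening tag is never closed, or when the string ends in the middle of a tag.

```python
import re

# Complete opening or closing <sup>/<sub> tags.
TAG_RE = re.compile(r"<(/?)(sup|sub)>")
# A tag that was cut off at the very end of the string, e.g. "</su".
PARTIAL_RE = re.compile(r"</?[a-z]*$")


def has_unbalanced_tags(title: str) -> bool:
    """Return True if <sup>/<sub> tags in the title are not properly paired."""
    stack = []
    for match in TAG_RE.finditer(title):
        is_closing = match.group(1) == "/"
        name = match.group(2)
        if not is_closing:
            stack.append(name)
        elif not stack or stack.pop() != name:
            # Closing tag without a matching opening tag.
            return True
    # Leftover opening tags, or a tag truncated at the end of the title.
    return bool(stack) or PARTIAL_RE.search(title) is not None


print(has_unbalanced_tags("C<sub>6</sub>H<sub>6"))
```

A title that legitimately ends with a "<" character would be flagged as well, so any hits would need a manual look.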

Would it be advisable to start a private effort to clean up title strings?

Best regards,

Jörg