Abstract has some extra illegible characters

Purna Srivatsa

Nov 13, 2025, 7:51:00 AM
to OpenAlex Community
E.g. https://api.openalex.org/w1540483267

The reconstructed abstract (and even the inverted index itself) contains HTML markup:

From the API:
,"abstract_inverted_index":{"<div":[0],"class=\"htmlview":[1],"paragraph\"><b>A</b>":


Reconstructed text:

"<div class="htmlview paragraph"><b>A</b> model based on an ionization equilibrium analysis, that can relate the ion...."
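The tags survive reconstruction because they are themselves tokens in the inverted index, so rebuilding the text simply puts them back in order. A minimal Python sketch of the usual reconstruction, assuming `abstract_inverted_index` maps each token to the list of integer positions where it occurs (the helper name `reconstruct_abstract` is hypothetical):

```python
import json
from typing import Optional


def reconstruct_abstract(inverted_index: Optional[dict[str, list[int]]]) -> Optional[str]:
    """Rebuild word order from an OpenAlex-style abstract_inverted_index.

    Each key is a whitespace-delimited token and each value lists the
    positions where that token occurs. Returns None when the index is null.
    """
    if not inverted_index:
        return None
    # Place each token at every position it occupies.
    positions: dict[int, str] = {}
    for token, indices in inverted_index.items():
        for i in indices:
            positions[i] = token
    # Join tokens in positional order with single spaces.
    return " ".join(positions[i] for i in sorted(positions))


# A fragment shaped like the API response quoted above (truncated after "model").
raw = r'{"abstract_inverted_index": {"<div": [0], "class=\"htmlview": [1], "paragraph\"><b>A</b>": [2], "model": [3]}}'
work = json.loads(raw)
print(reconstruct_abstract(work["abstract_inverted_index"]))
```

Running it prints `<div class="htmlview paragraph"><b>A</b> model`, matching the start of the reconstructed text above; removing the markup has to be a separate step.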


Bianca Kramer

Nov 13, 2025, 10:46:15 AM
to Purna Srivatsa, OpenAlex Community
Hi Purna, all,

This is a direct representation of the abstract as it is in Crossref: 

"\u003Cjats:p\u003E<div class=\"htmlview paragraph\"><b>A</b> model based on an ionization equilibrium analysis,

While OpenAlex also sources abstracts in other ways, one direct route is taking them from Crossref, where they are deposited by the publisher. So any lapses in quality (like JATS/HTML tags) are also the responsibility of the publisher :-) 
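If plain text is needed, the tags have to be stripped on the consuming side. A minimal sketch using Python's standard-library `html.parser`, assuming a lossy strip is acceptable (any TeX or MathML markup in an abstract would be flattened to its text content); `strip_markup` and `_TextCollector` are hypothetical names. The `\u003C` in the Crossref JSON is just an escaped `<`, which `json.loads` decodes:

```python
import json
import re
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects character data and discards every tag (HTML or JATS)."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode &amp; etc. into text
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_markup(text: str) -> str:
    """Return the text content with all tags removed and whitespace collapsed."""
    parser = _TextCollector()
    parser.feed(text)
    parser.close()
    return re.sub(r"\s+", " ", "".join(parser.parts)).strip()


# Shaped like the Crossref value quoted above, truncated and closed for the example.
crossref_json = r'{"abstract": "\u003Cjats:p\u003E<div class=\"htmlview paragraph\"><b>A</b> model based on an ionization equilibrium analysis\u003C/jats:p\u003E"}'
abstract = json.loads(crossref_json)["abstract"]
print(strip_markup(abstract))
```

This prints `A model based on an ionization equilibrium analysis`. A tolerant parser is used rather than an XML parser because publisher deposits like this one are not well-formed (the `<div>` is never closed).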

kind regards,
Bianca


--
You received this message because you are subscribed to the Google Groups "OpenAlex Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openalex-commun...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/openalex-community/c8726876-3fea-49c3-9d5e-e0695f6d3cb0n%40googlegroups.com.

Purna Srivatsa

Nov 13, 2025, 2:12:25 PM
to OpenAlex Community
I see. Thanks for the clarification.

Also, I see some results have null for the abstract.

But the HTML page for the paper does have an abstract.

Why do these papers have null abstracts, and is there a plan to re-populate these works with the abstract text?

Kevin McCurley

Nov 14, 2025, 1:47:37 PM
to OpenAlex Community
In some cases the publisher may regard the abstract as copyrighted material that they do not wish to disclose. In today's world, where some parts of publishing are dominated by commercial publishers, this is to be expected. In other cases the abstract may simply be too complicated to reformat in JATS for reporting to Crossref. We face that problem ourselves, since we allow TeX mathematics (even display mathematics) and itemized lists in abstracts, but JATS has poor support for these in an abstract. The JATS standard for abstracts is too limited: https://jats.nlm.nih.gov/publishing/tag-library/1.4/element/abstract.html

You probably shouldn't expect all abstracts to appear in OpenAlex until the publishing world changes further.

Kaveh Bazargan

Nov 14, 2025, 4:09:03 PM
to Kevin McCurley, OpenAlex Community
I had no idea JATS limits the use of XML elements in abstracts. So can you not use MathML in abstracts?



--
Kaveh Bazargan PhD
Director
River Valley ● X ● LinkedIn ● ORCID BlueSky
Accelerating the Communication of Research
https://rivervalley.io/news/river-valleys-review-awarded-ismte-innovation-award  https://bit.ly/446djXt  https://www.sspnet.org/community/news/announcing-winners-for-the-2025-epic-awards/#:~:text=Gold%20%E2%80%A2%20River%20Valley%20Technologies%20%E2%80%93%20ReView%203.0:%20Next%20Generation%20Peer%20Review

Kevin McCurley

Nov 14, 2025, 4:51:14 PM
to OpenAlex Community
Well, sort of. An <abstract> may contain <sec>, and <tex-math> can be used inside of <sec>, but a <sec> requires a <title> inside of it. That is not really representative of abstracts as they occur in scholarly communication, unless the title is just "Abstract" (which feels redundant). It's noteworthy that neither JATS nor MathML is used by authors; they are mostly used as middleware. The rendering of MathML in browsers is almost always inferior to the rendering of mathematics in TeX.
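The nesting described above can be sketched with Python's `xml.etree.ElementTree`. This illustrates the structure only, following the description in this post; it is not validated against the JATS DTD, and `jats_abstract_with_tex` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET


def jats_abstract_with_tex(title: str, paragraph: str, tex: str) -> str:
    """Serialize <abstract><sec><title/><p/><tex-math/></sec></abstract>.

    The <sec> carries the <title> it requires, and <tex-math> sits inside
    the <sec>. Not validated against the JATS DTD.
    """
    abstract = ET.Element("abstract")
    sec = ET.SubElement(abstract, "sec")
    ET.SubElement(sec, "title").text = title
    ET.SubElement(sec, "p").text = paragraph
    # ElementTree escapes &, < and > in text content automatically.
    ET.SubElement(sec, "tex-math").text = tex
    return ET.tostring(abstract, encoding="unicode")


print(jats_abstract_with_tex("Abstract", "We bound the error by", r"\( |e| \le C h^2 \)"))
```

The output shows the awkwardness being described: the only natural <title> for such a <sec> is "Abstract", which repeats what the enclosing element already says.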

Kaveh Bazargan

Nov 14, 2025, 4:55:15 PM
to Kevin McCurley, OpenAlex Community
Thank you. Very informative. Yes, the final rendering of math should be done with the TeX engine. We keep it as MathML but can embed nuances of spacing in the MathML too. 

Regards
Kaveh

Purna Srivatsa

Nov 15, 2025, 8:09:58 AM
to OpenAlex Community
Hi,
Just following up. From what I gather, the publisher reports it to Crossref, and OpenAlex then fetches it from Crossref?
Who does the reformatting into JATS for reporting to Crossref? Is it the publisher?

> In some cases the publisher may regard the abstract as copyrighted material that they do not wish to disclose.

For these cases, could we have a different value, indicator, or field that clearly says the abstract was not disclosed, instead of null? That is, a way to separate data formatting issues from non-disclosure issues?
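One way such an indicator could look, sketched as a Python data model. `AbstractStatus` and `WorkAbstract` are hypothetical and not part of the OpenAlex schema; the categories simply follow the reasons given in this thread (withheld by the publisher, or too complex to express in JATS):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AbstractStatus(Enum):
    """Hypothetical reasons for a missing abstract (not an OpenAlex field)."""
    AVAILABLE = "available"
    NOT_DISCLOSED = "not_disclosed"      # publisher withheld the abstract
    NOT_CONVERTIBLE = "not_convertible"  # could not be expressed in JATS
    UNKNOWN = "unknown"                  # no information either way


@dataclass
class WorkAbstract:
    abstract: Optional[str]
    status: AbstractStatus

    def __post_init__(self) -> None:
        # Keep the fields consistent: text is present exactly when AVAILABLE.
        if (self.abstract is not None) != (self.status is AbstractStatus.AVAILABLE):
            raise ValueError("abstract text must be present exactly when status is AVAILABLE")


withheld = WorkAbstract(abstract=None, status=AbstractStatus.NOT_DISCLOSED)
print(withheld.status.value)  # not_disclosed
```

Keeping the reason in its own field means a null abstract stays null, while consumers can still tell a withheld abstract apart from a formatting failure.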