Filtering where there are commas

Ed Summers

Mar 19, 2025, 7:33:05 AM
to OpenAlex Community
Hi OpenAlex:

While looking up some DOIs from an existing dataset in OpenAlex I discovered that trying to filter on a DOI with a comma in it causes the API to throw a somewhat difficult to understand error. For example:

https://api.openalex.org/works?filter=doi:10.1103/physrevd.72,031

returns an error

Invalid query parameter in 031101.

It also returns an error when the comma is URL encoded:

https://api.openalex.org/works?filter=doi:10.1103/physrevd.72%2C031101

I honestly don’t know if commas are valid in DOIs, and in this case the DOI isn’t valid, but I was thinking perhaps the API should still be able to look them up without returning an error that doesn’t really make much sense?

I see from the docs that the comma is used to filter by multiple fields [1]? Is there a way to escape commas that might appear in values?

Thanks for your consideration, and for your excellent service.

//Ed

[1] https://docs.openalex.org/how-to-use-the-api/get-lists-of-entities/filter-entity-lists#intersection-and

PS. If it would be preferable for me to file this as an issue in a GitHub repository, just point me in the right direction.
Message has been deleted

Samuel Mok

Mar 19, 2025, 8:03:52 AM
to Ed Summers, OpenAlex Community
Commas are apparently valid in DOIs; incredibly, all Unicode 2.0 chars are valid!
DOI registrars do set rules on top of the DOI specification that limit the character set for various reasons; see e.g. Crossref's own rules. That doesn't change the underlying standard, though, so older DOIs and those from other registrars could contain a wider set of characters than Crossref accepts.

Back to the error: indeed, the issue is that commas are interpreted by the OpenAlex API as part of the query structure instead of the value. Encoding the comma will not change that, because the incoming query is first parsed by the API and only then split on delimiters like = and ,.
Fixing this would require a change in how the API interprets delimiters. This'll probably need a specific check for this edge case, something like: if a comma is preceded by filter=doi:<...> don't immediately use as a delimiter; first, somehow, check whether the comma is part of a DOI instead. I think that'll be pretty gnarly to handle, and I'm not sure it's worth it. Oh, and this'll probably also need some changes in the backend, as I imagine the parser/ingestor currently doesn't handle all the possible delimiter chars that can be part of a DOI.
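
To illustrate why encoding the comma makes no difference, here is a minimal Python sketch of the parse-then-split order described above. It is not OpenAlex code: `parse_filter` is a hypothetical stand-in that only mimics that order, using the standard library's `urllib.parse`.

```python
from urllib.parse import urlsplit, parse_qs


def parse_filter(url: str) -> list[str]:
    """Hypothetical stand-in for a server-side parser (not OpenAlex code).

    Parses the query string first, then splits the filter value on commas.
    """
    query = urlsplit(url).query
    # parse_qs percent-decodes values, so %2C has already become "," here.
    filter_value = parse_qs(query)["filter"][0]
    return filter_value.split(",")


raw = "https://api.openalex.org/works?filter=doi:10.1103/physrevd.72,031101"
encoded = "https://api.openalex.org/works?filter=doi:10.1103/physrevd.72%2C031101"

print(parse_filter(raw))      # ['doi:10.1103/physrevd.72', '031101']
print(parse_filter(encoded))  # ['doi:10.1103/physrevd.72', '031101']
```

In both cases the second piece, 031101, has no field name in front of it, which would be consistent with the "Invalid query parameter in 031101" message.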

Interesting question though, I never realized the DOI valid charspace was so broad!

Cheers,
Samuel

--
You received this message because you are subscribed to the Google Groups "OpenAlex Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openalex-commun...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/openalex-community/80EE9C08-3241-4898-AD24-C5345A0F4563%40pobox.com.

Ed Summers

Mar 19, 2025, 8:05:59 AM
to Gabor Schubert, OpenAlex Community
Thanks, I suspected as much, although I know what is in the wild varies a bit from what is currently allowed.

But I guess my question remains, should there be a way to escape commas during filter queries?
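
To make the question concrete, here is a rough Python sketch of what escape-aware splitting could look like. The backslash convention and the `split_filter` helper are purely hypothetical; nothing in this thread indicates that the API supports any escape syntax today.

```python
def split_filter(value: str, sep: str = ",", esc: str = "\\") -> list[str]:
    """Split on sep, treating esc + char as a literal char (hypothetical syntax)."""
    parts: list[str] = []
    current: list[str] = []
    chars = iter(value)
    for ch in chars:
        if ch == esc:
            # Keep the next character literally; a trailing escape is kept as-is.
            current.append(next(chars, esc))
        elif ch == sep:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


# The escaped comma stays inside the DOI value; the bare comma still separates filters.
print(split_filter(r"doi:10.1103/physrevd.72\,031101,is_oa:true"))
# ['doi:10.1103/physrevd.72,031101', 'is_oa:true']
```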

//Ed

> On Mar 19, 2025, at 7:54 AM, Gabor Schubert <gabor.sch...@gmail.com> wrote:
>
> Commas are apparently not valid in DOIs, according to Crossref's documentation: https://www.crossref.org/documentation/member-setup/constructing-your-dois/ The only accepted characters are letters of the Roman alphabet a-z (case insensitive), numbers 0-9, and "-._;()/" (hyphen, period, underscore, semicolon, parentheses, forward slash). Some older DOIs might contain ">", "<" or "#" signs (https://www.crossref.org/documentation/member-setup/constructing-your-dois/suffixes-containing-special-characters/), like this one: https://api.openalex.org/works?filter=doi:10.1002/(SICI)1521-3951(199911)216:1<135::AID-PSSB135>3.0.CO;2-# Interestingly, the latter contains a ":" (colon), which is not explicitly documented by Crossref.
>
> Gabor Schubert
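
For anyone cleaning a dataset before lookup, the character set quoted above can be turned into a quick pre-check. This is only a sketch in Python: the helper name is made up, and the pattern encodes just the characters listed in the quoted Crossref guidance, not the DOI standard itself.

```python
import re

# Characters listed in the quoted Crossref guidance:
# a-z (case insensitive), 0-9, and - . _ ; ( ) /
CROSSREF_RECOMMENDED = re.compile(r"[A-Za-z0-9\-._;()/]+")


def uses_only_recommended_chars(doi: str) -> bool:
    """Return True if every character of doi is in the quoted Crossref set."""
    return CROSSREF_RECOMMENDED.fullmatch(doi) is not None


# The DOI from this thread fails because of the comma.
print(uses_only_recommended_chars("10.1103/physrevd.72,031101"))  # False
# The same string with a period in place of the comma passes.
print(uses_only_recommended_chars("10.1103/physrevd.72.031101"))  # True
# The older SICI-style DOI fails on ":", "<", ">" and "#".
print(uses_only_recommended_chars(
    "10.1002/(SICI)1521-3951(199911)216:1<135::AID-PSSB135>3.0.CO;2-#"
))  # False
```

A False result does not mean the DOI is invalid, only that it falls outside the quoted list and may need special handling before being placed in a filter.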

Ed Summers

Mar 19, 2025, 8:26:11 AM
to Samuel Mok, OpenAlex Community

> On Mar 19, 2025, at 8:03 AM, Samuel Mok <sam...@gmail.com> wrote:
>
> This'll probably need a specific check for this edge case, something like: if a comma is preceded by filter=doi:<...> don't immediately use as a delimiter,

I don’t think it’s only an issue with doi fields, but any field you are filtering by where the value happens to contain a comma?

//Ed

Gabor Schubert

Mar 19, 2025, 8:33:03 AM
to OpenAlex Community
It seems that someone reported the same problem for title search in openalexR on GitHub in June 2024: https://github.com/ropensci/openalexR/issues/254 . According to the answer, OpenAlex was aware of the problem, but as far as I can see it's still not working for title search either: https://api.openalex.org/works?filter=title.search:%22Hello,John%22

Gabor
