How to access full text for articles without PMCID


Malay Gaherwar

Jan 26, 2026, 10:26:10 AM
to Europe PMC Developer Forum

I am building a data pipeline for my university to extract full-text content for a large set of Open Access records matching the query "breast cancer". When I run this query on the Europe PMC website, I get:
Full text: In Europe PMC (656,815)
Full text: Unpaywall link (63,878)
and a total of 921,629 results. Can someone clarify for how many of these I can get the full text via the API, and for how many I can get the metadata?

My current logic uses the /search endpoint with resultType=core and cursorMark for deep paging. I have two distinct workflows:

  1. For records with a PMCID, I successfully used the /{PMCID}/fullTextXML endpoint to get JATS XML for around 400k records.

  2. For Open Access records without a PMCID, I am currently parsing the fullTextUrlList to find links where availabilityCode is "F" and documentStyle is "html". Sometimes there is also a PDF link, which helps a lot, but the HTML pages do not contain the full text.
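The link-selection step in workflow 2 could look roughly like the sketch below. It assumes the JSON form of a resultType=core record, where fullTextUrlList holds a fullTextUrl array whose entries carry availabilityCode, documentStyle and url; the function name pick_full_text_urls and the example.org URLs are made up for illustration. PDF links are ranked ahead of HTML because the HTML links often lack the full text.

```python
def pick_full_text_urls(record, codes=("F",), styles=("pdf", "html")):
    """Return candidate full-text URLs from one core search record.

    Assumed record layout (JSON output of resultType=core):
    record["fullTextUrlList"]["fullTextUrl"] -> list of dicts with
    "availabilityCode", "documentStyle" and "url".
    """
    entries = (record.get("fullTextUrlList") or {}).get("fullTextUrl") or []
    wanted = [
        e for e in entries
        if e.get("availabilityCode") in codes
        and e.get("documentStyle") in styles
        and e.get("url")
    ]
    # Order by position in `styles`, so "pdf" comes before "html" by default.
    wanted.sort(key=lambda e: styles.index(e["documentStyle"]))
    return [e["url"] for e in wanted]


# Hypothetical record, for illustration only.
sample = {
    "fullTextUrlList": {
        "fullTextUrl": [
            {"availabilityCode": "F", "documentStyle": "html",
             "url": "https://example.org/article.html"},
            {"availabilityCode": "S", "documentStyle": "pdf",
             "url": "https://example.org/paywalled.pdf"},
            {"availabilityCode": "F", "documentStyle": "pdf",
             "url": "https://example.org/article.pdf"},
        ]
    }
}
print(pick_full_text_urls(sample))
```

Records with no fullTextUrlList at all simply yield an empty list, so the caller can log them and move on.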

My question: how can I retrieve the full text for these no-PMCID records (articles, books, etc.)?

Any advice on improving the robustness of this "no-PMCID" workflow would be greatly appreciated.

Best regards


Madhumiethaa Jayaprabha Palanisamy

Jan 27, 2026, 6:41:19 AM
to Europe PMC Developer Forum, Malay Gaherwar
Hi Malay,

Thanks for reaching out.

We provide bulk downloads of the Open Access full-text content available in Europe PMC via FTP, including both XML and PDFs. The XML set is updated weekly, while the PDF set is updated monthly. Each file is mapped to its corresponding PMCID. You can explore the bulk download options here: https://europepmc.org/downloads/openaccess

Although the FTP doesn’t support query-based downloads directly, you can combine it with the Search API to achieve the same result. For example, refine your query using the open access and full-text filters, "breast cancer" AND OPEN_ACCESS:Y AND HAS_FT:Y, and retrieve all matching PMCIDs. (See the full syntax here: https://europepmc.org/searchsyntax#fulltextavailability). If you only need IDs, set resultType=idList in the API call. Once you have the PMCIDs, download the OA bulk set from FTP and filter it locally to keep only the files corresponding to your PMCIDs. This approach avoids a large number of API calls.
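As a rough sketch of the ID-collection step, the loop below pages through the search endpoint with resultType=idList and cursorMark and gathers the PMCIDs into a set. The function names, the pageSize of 1000, and the JSON keys (resultList, result, pmcid, nextCursorMark) are assumptions about the response layout and should be checked against a real response; matching the resulting set against the FTP files is left out.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def fetch_page(query, cursor_mark, page_size=1000):
    """Fetch one page of idList results as a dict (performs a network call)."""
    params = urllib.parse.urlencode({
        "query": query,
        "resultType": "idList",
        "cursorMark": cursor_mark,
        "pageSize": page_size,  # assumed to be within the allowed maximum
        "format": "json",
    })
    with urllib.request.urlopen(f"{SEARCH_URL}?{params}", timeout=60) as resp:
        return json.load(resp)


def collect_pmcids(query, fetch=fetch_page):
    """Follow cursorMark paging and return the set of PMCIDs found."""
    pmcids = set()
    cursor = "*"  # initial cursor value
    while True:
        page = fetch(query, cursor)
        results = page.get("resultList", {}).get("result", [])
        for item in results:
            if item.get("pmcid"):
                pmcids.add(item["pmcid"])
        next_cursor = page.get("nextCursorMark")
        # Stop on an empty page, a missing cursor, or a cursor that no longer moves.
        if not results or not next_cursor or next_cursor == cursor:
            break
        cursor = next_cursor
    return pmcids


if __name__ == "__main__":
    ids = collect_pmcids('"breast cancer" AND OPEN_ACCESS:Y AND HAS_FT:Y')
    print(len(ids))
```

Passing `fetch` as a parameter keeps the paging logic testable without network access; the resulting set can then be used to keep only the matching files from the bulk download.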

And yes, only the subset of OA articles with a PMCID will have full text directly available in Europe PMC. For all other OA records, your approach using fullTextUrlList looks good. It contains links to the PDF, HTML, or DOI, whichever are available.

Hope this helps.

Kind regards,
Madhu

Malay Gaherwar

Jan 28, 2026, 5:13:46 AM
to Europe PMC Developer Forum, mad...@ebi.ac.uk, Malay Gaherwar
Thanks Madhu for the detailed response.

I will check out the bulk download via FTP option as well. But again, it only helps me in the cases where a PMCID is present.

For articles with no PMCID, as I said, I tried the fullTextUrlList option. But most of them only have HTML links, and when I download the HTML I get only the website shell and not the actual content. I can share the code in case that helps, but could you tell me the best practice for downloading the full text of such OA articles without a PMCID?

Could you also explain the numbers I mentioned? This is quite important for our future paper:
Full text: In Europe PMC (656,815). Does this mean that this many records should have a PMCID? If yes, why did my code stop at 400k (with no errors)?
Full text: Unpaywall link (63,878). Can I download this full text or not? And via which method?
Total: 921,629 results. I assume this also includes paid articles that are linked externally, for example via an Elsevier link. Is there a way I can download just the metadata of the records that are not OA?

Best regards,
Malay

Madhumiethaa Jayaprabha Palanisamy

Jan 29, 2026, 12:13:08 PM
to Europe PMC Developer Forum, Malay Gaherwar, Madhumiethaa Jayaprabha Palanisamy
Hi Malay,
If a full-text article is open access but does not have a PMCID, it is not downloadable via FTP. But if it is an open access preprint with full text, you can access it programmatically using the fullTextXML API, e.g. https://www.ebi.ac.uk/europepmc/webservices/rest/PPR1006247/fullTextXML.
For other open access full-text articles, there is no supported best practice in Europe PMC for downloading the content. You can only check the fullTextUrlList, which has links to external publisher sites.
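For the preprint case, a minimal sketch of calling the fullTextXML endpoint with a PPR identifier might look like this. The helper names are hypothetical, and treating HTTP 404 as "no full text available" is an assumption about how the service responds for records without XML.

```python
import urllib.error
import urllib.request

BASE_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest"


def full_text_xml_url(record_id):
    """Build the fullTextXML URL for a PMCID or preprint (PPR) identifier."""
    return f"{BASE_URL}/{record_id}/fullTextXML"


def fetch_full_text_xml(record_id, timeout=30.0):
    """Return the XML as bytes, or None if the service answers 404.

    Assumption: a 404 means no full-text XML exists for this record.
    """
    try:
        with urllib.request.urlopen(full_text_xml_url(record_id), timeout=timeout) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise  # other HTTP errors are left for the caller to handle


if __name__ == "__main__":
    xml = fetch_full_text_xml("PPR1006247")  # the preprint ID from the example URL
    print("no full text" if xml is None else f"{len(xml)} bytes")
```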

About the reported numbers:

Full text: In Europe PMC (656,815) 
This does not mean that all these articles have a PMCID, as mentioned above. This count includes all articles that are freely available to read within Europe PMC, but only a subset is open access and downloadable. Also, as mentioned earlier, full-text content can be downloaded or accessed from Europe PMC only for articles that are open access and explicitly marked with OPEN_ACCESS:Y. When you restrict your query using OPEN_ACCESS:Y ("breast cancer" AND OPEN_ACCESS:Y AND HAS_FT:Y), the count drops to around 400k, which explains why your code stops at that number without errors.

Full text: Unpaywall link (63,878)
These are all articles with a known link to a free and legal copy of the full text, but which are not available to read or download within Europe PMC. These links are usually provided by Unpaywall. Hence, Europe PMC does not support downloading full text for these articles.

And yes, the total (921,629 results) includes open access articles, non-open-access articles, and articles linking to external publisher platforms.

You can download the metadata using the Europe PMC Search API with the lite result type, which returns the key metadata. There is also an FTP download for the metadata of all full-text Europe PMC articles on the downloads page: https://europepmc.org/downloads
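A small sketch of the metadata route, assuming the JSON output of resultType=lite: it builds the search URL and reduces each result to a handful of fields. The field names in FIELDS (including isOpenAccess with "Y"/"N" values) and the helper names are assumptions to verify against an actual response. Filtering on isOpenAccess locally is one possible way to keep only the non-OA records.

```python
import urllib.parse

SEARCH_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"

# Assumed names of fields present in lite results.
FIELDS = ("id", "source", "pmid", "pmcid", "doi", "title",
          "authorString", "journalTitle", "pubYear", "isOpenAccess")


def lite_search_url(query, cursor_mark="*", page_size=1000):
    """Build a search URL that requests lite (key metadata) results as JSON."""
    params = urllib.parse.urlencode({
        "query": query,
        "resultType": "lite",
        "cursorMark": cursor_mark,
        "pageSize": page_size,
        "format": "json",
    })
    return f"{SEARCH_URL}?{params}"


def key_metadata(page):
    """Reduce one parsed JSON page to a list of small metadata dicts."""
    results = page.get("resultList", {}).get("result", [])
    return [{name: r.get(name) for name in FIELDS} for r in results]


def non_open_access(rows):
    """Keep rows not flagged as open access (assumes the flag is 'Y' for OA)."""
    return [row for row in rows if row.get("isOpenAccess") != "Y"]


# Hypothetical page, for illustration only.
page = {"resultList": {"result": [
    {"id": "1", "source": "MED", "title": "An OA article", "isOpenAccess": "Y"},
    {"id": "2", "source": "MED", "title": "A paywalled article", "isOpenAccess": "N"},
]}}
print(lite_search_url('"breast cancer"'))
print([row["id"] for row in non_open_access(key_metadata(page))])
```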

Hope this helps. Do let me know if anything is unclear or if you have any further questions.

Best regards,
Madhu