Dear Renaud
Thank you for your message and for your interest in Europe PMC and our work.
The machine-learning pipeline we use for literature processing is available here:
https://github.com/ML4LitS/CAPITAL
Within this pipeline, the model specifically used for gene name extraction is part of the annotation models repository:
https://github.com/ML4LitS/annotation_models
These repositories contain the components used to train and run the models that identify biological entities such as genes from article titles, abstracts, and other text sources.
If your goal is to perform gene name recognition from titles and abstracts, the annotation models repository is the most relevant starting point. The CAPITAL pipeline provides additional context on how the models are used, together with our Annotations API, within a broader literature processing workflow.
Please feel free to reach out if you have further questions or need clarification on any of the components.
Best wishes
Santosh