Dear Colleagues,
Thank you all for contributing to the GA4GH Connect session "Literature derived variant evidence: toward GA4GH compatible representation". What a productive and engaging discussion! A big thank you to Anaïs Mottaz and Michael Baudis for an excellent session, and to all participants for their valuable contributions. Your input made this a truly meaningful conversation.
‼️ Please take a moment to review the meeting notes and share any edits or additions to help keep the summary accurate and complete. If you notice any mistakes, have resources to add, or want to share more details, please feel free to jump in and update the notes - we welcome your input!
A link to the recording is available in the linked meeting notes.
________
Key Takeaways
🔶 Literature-based variant search tools like Variomes provide higher coverage and recall for new or understudied variants than curated databases, making them a vital complement for curation and evidence tracking.
🔶 GA4GH standards (VRS, Cat-VRS, VRSATILE) enable precise representation of imprecise/underspecified variants from literature, facilitating interoperability, discovery, and integration with knowledge bases like CIViC or Beacon.
🔶 Addressing gaps in CNV annotations for cancer requires hybridising observational data with literature evidence, using data-driven frameworks to identify drivers and improve pathogenicity assessments.
________
Meeting Summary
The meeting explored how literature-derived variant evidence can be better integrated with GA4GH standards to improve discoverability, representation, provenance, and supporting evidence. This session focused on advancing variant representation and extraction from scientific literature to bridge observational and interpretative data in genomics.
Anaïs Mottaz from the SIB Text Mining Group presented tools like LitVar and Variomes for searching variants in PubMed and PMC. She emphasised the need for database-independent normalisation using HGVS and GA4GH standards such as VRS (Variation Representation Specification) and Cat-VRS to handle underspecified or categorical variants, improve interoperability, and potentially integrate with Beacon for unified access.
Discussions highlighted challenges such as non-SNP variants, multilingual support (e.g. Japanese case reports), confidence scoring, and evidence extraction using language models.
Michael Baudis discussed somatic copy number variants (CNVs) in cancer, pointing out the "pathogenicity annotation gap" caused by under-annotation and proposing data-driven approaches with Cat-VRS to integrate functional evidence and observational data from resources like Progenetix.
Attendees, including experts from GKS, explored collaborations for stepwise implementation, precise vs. categorical representations, and hybrid evidence aggregation.
________
Action Items
🔶 Join the GKS Think Tank channel to collaborate on use cases for variant extraction, normalisation, and representation (e.g. starting with SNVs/small indels and abstracts/full-text papers).
🔶 Develop and test a prototype for representing literature-extracted variants in JSON files using Cat-VRS and VRS, focusing on a small literature subset for benchmarking and validation.
🔶 Explore API development for variant annotation in non-English languages, potentially incorporating machine translation and multilingual models.
🔶 Coordinate with teams (e.g. Microsoft/EVAG collaboration) to incorporate Variomes into broader literature search assessments and manual validation efforts.
🔶 Bring CNV data to GKS for enhancing Cat-VRS support, including rearrangements and fusions.
________
If there's anything we missed or anything you'd like to add, feel free to reach out!
Please take a moment to complete our Exit Survey: https://forms.ga4gh.org/t/qjHm9R1EGmus
Your feedback is invaluable and helps us improve future GA4GH Plenary and Connect meetings, ensuring they continue to provide the best possible experience for our community.
Thanks again for your engagement and contributions!
Best,
Beatrice