Hi
I am posting here as I think the answers will be of interest to everybody.
My questions are:
- What is vectorised? Title and abstract? Full text, if available?
- Which embeddings model does OpenAlex use, and how stable is it, i.e. how often will it be updated? Along the same lines: if it is updated, can I still select the old embeddings model (for repeatability of searches and to track changes over time)?
- Which similarity metric is used? I assume cosine similarity?
- I assume that this will be deterministic, i.e. I will get the same works back each time?
It would be great if the following extensions could be added:
- Vectorise a document locally and supply the vector to the API. Use case: I have a handful of articles and want to find additional papers similar to these articles.
- Instead of getting only the top 50 back, get everything above a certain similarity threshold. This would make an exploratory systematic literature search possible using semantic search.
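To make the two requested extensions concrete, here is a minimal local sketch of the behaviour I have in mind. It assumes cosine similarity is the metric (which is one of my questions above), it combines the seed articles by averaging their vectors (just one possible choice), and the work IDs and toy 3-dimensional vectors are made up; it does not use or describe any actual OpenAlex API.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Element-wise mean of several vectors (assumed way to combine seed articles)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def above_threshold(query, corpus, threshold):
    """Return (work_id, score) for every work with similarity >= threshold.

    Results are sorted by descending score; ties are broken by work_id,
    so the same input always yields the same order (deterministic).
    """
    hits = []
    for work_id, vec in corpus.items():
        score = cosine_similarity(query, vec)
        if score >= threshold:
            hits.append((work_id, score))
    hits.sort(key=lambda h: (-h[1], h[0]))
    return hits


# Hypothetical example: two locally vectorised seed articles and a tiny corpus.
seeds = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]
corpus = {"W1": [1.0, 0.1, 0.0], "W2": [0.0, 1.0, 0.0], "W3": [0.9, 0.0, 0.1]}
hits = above_threshold(centroid(seeds), corpus, threshold=0.5)
```

Unlike a fixed top 50, the threshold returns however many works qualify, which is what an exploratory systematic search needs; the tie-break on the work ID is what keeps the result order reproducible.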
Thanks a lot,
Rainer
---
Dr. Rainer M. Krug (PhD Conservation Ecology, SUN; MSc Conservation Biology, UCT; Dipl. Phys. Germany)
Senior Data Specialist
Environmental Bioinformatics,
SIB Swiss Institute of Bioinformatics
Zurich
Senckenberg Biodiversity and Climate Research Centre,
Senckenberg Society for Nature Research
Frankfurt Main