
Download Sentence-transformers All-mpnet-base-v2


Candace Patanella
Jan 9, 2024, 5:08:12 PM
The all-* models were trained on all available training data (more than 1 billion training pairs) and are designed as general-purpose models. The all-mpnet-base-v2 model provides the best quality, while all-MiniLM-L6-v2 is 5 times faster and still offers good quality. Toggle All models to see all evaluated models, or visit the Hugging Face Model Hub to view all existing sentence-transformers models.
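
For reference, a minimal sketch of loading either model with the sentence-transformers library:

```python
from sentence_transformers import SentenceTransformer

# Downloads the model from the Hugging Face Hub on first use.
model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 768) -- one 768-dimensional vector per sentence
```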




Download https://t.co/C4UEuT3zIb






So I have been using two sentence transformers: 'sentence-transformers/all-MiniLM-L12-v2' and 'sentence-transformers/all-mpnet-base-v2'. I thought they were both working well and that I could use either of them for good document retrieval results. But I have tried their hosted inference APIs and the results were pretty disappointing.


Without sentence-transformers, you can use the model like this: first, you pass your input through the transformer model, then you apply the right pooling operation on top of the contextualized word embeddings.
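
A sketch of that procedure, following the usage snippet on the model card: run the tokenizer and transformer, mean-pool the token embeddings under the attention mask, and L2-normalize the result.

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

def mean_pooling(model_output, attention_mask):
    # Average token embeddings, ignoring padding positions.
    token_embeddings = model_output[0]  # last hidden state
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * mask, 1) / torch.clamp(mask.sum(1), min=1e-9)

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-mpnet-base-v2")
model = AutoModel.from_pretrained("sentence-transformers/all-mpnet-base-v2")

encoded = tokenizer(["This is an example sentence"], padding=True,
                    truncation=True, return_tensors="pt")
with torch.no_grad():
    output = model(**encoded)

embeddings = mean_pooling(output, encoded["attention_mask"])
embeddings = F.normalize(embeddings, p=2, dim=1)  # the model card normalizes as well
```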


We also released the model NegBLEURT, finetuned on the CANNOT dataset, which makes it significantly more sensitive to negations than its base model. Additionally, I finetuned a Sentence Transformer, all-mpnet-base-v2-negation (it can be used with the sentence-transformers module). Again, it deals much better with negations.


You can check it out at dmlls/all-mpnet-base-v2-negation. I also added a few examples to the widget so that anyone can play with the model directly from the model page. I recommend comparing scores with its base model (sentence-transformers/all-mpnet-base-v2) to verify that the finetuning did work.
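
A quick way to do that comparison, with a made-up sentence pair that differs only by a negation:

```python
from sentence_transformers import SentenceTransformer, util

base = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
negation = SentenceTransformer("dmlls/all-mpnet-base-v2-negation")

a, b = "I like apples.", "I do not like apples."  # hypothetical test pair
for name, model in [("base", base), ("negation", negation)]:
    emb = model.encode([a, b], convert_to_tensor=True)
    print(name, round(util.cos_sim(emb[0], emb[1]).item(), 3))
# The finetuned model should assign this pair a noticeably lower score.
```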






This configuration enables text2vec-transformers, sets it as the default vectorizer, and sets the parameters for the Transformers Docker container, including using the sentence-transformers-multi-qa-MiniLM-L6-cos-v1 image and disabling CUDA acceleration.
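
A docker-compose sketch of such a setup, using the environment variables from the Weaviate documentation (image tags and versions here are assumptions):

```yaml
services:
  weaviate:
    image: semitechnologies/weaviate   # pin a real version tag in practice
    environment:
      ENABLE_MODULES: text2vec-transformers
      DEFAULT_VECTORIZER_MODULE: text2vec-transformers
      TRANSFORMERS_INFERENCE_API: http://t2v-transformers:8080
  t2v-transformers:
    image: semitechnologies/transformers-inference:sentence-transformers-multi-qa-MiniLM-L6-cos-v1
    environment:
      ENABLE_CUDA: '0'   # disable CUDA acceleration
```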


I was able to work around the issue by using Hugging Face and downloading the whole model along with the mean_pooling method from the sentence-transformers/all-mpnet-base-v2 model card on Hugging Face. That is the sentence similarity model I am using to score the similarity of two sentences (the similarity of a user utterance and the statements in my knowledge base).


Since we built the model on top of the all-mpnet-base-v2 model, any other project using this embedding can use our bit embedding as a drop-in replacement and reduce inference costs by almost a factor of 10.
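
The post doesn't show how the bit embeddings are built; purely as an illustration of the idea, one common recipe binarizes each dimension by sign and packs the bits, shrinking a 768-dimensional float32 vector from 3072 bytes to 96:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
emb = model.encode(["an example document"])   # float32, shape (1, 768)

# Sign-threshold binarization -- illustrative only; the project's actual
# quantization scheme may differ.
bits = np.packbits(emb > 0, axis=-1)          # uint8, shape (1, 96)
print(emb.nbytes, "->", bits.nbytes)          # 3072 -> 96 bytes per vector
```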


Another option is using Ortex and an ONNX model. Here is an example Livebook that uses this approach: Running the all-mpnet-base-v2 sentence transformer in Elixir using Ortex (GitHub). In that case, using a serving does not improve performance that much, at least when running on CPU, so one can also just call Ortex.run directly.
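
Before porting to Elixir, the exported model can be sanity-checked from Python with onnxruntime; a sketch assuming the model was exported to model.onnx with the usual input_ids/attention_mask inputs:

```python
import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-mpnet-base-v2")
session = ort.InferenceSession("model.onnx")  # path to your exported model (assumed)

enc = tokenizer(["an example sentence"], return_tensors="np")
token_embeddings = session.run(
    None, {"input_ids": enc["input_ids"], "attention_mask": enc["attention_mask"]}
)[0]                                          # last hidden state, (1, seq, 768)

# Mean-pool over tokens, same as the PyTorch version above.
mask = enc["attention_mask"][..., None]
sentence_embedding = (token_embeddings * mask).sum(1) / mask.sum(1)
```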


Next, we load the question-answering embeddings generated with the sentence transformer sentence-transformers/all-mpnet-base-v2 into an Aurora PostgreSQL DB cluster, which serves as our vector database via the pgvector vector store in LangChain:
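
A sketch of that load step with LangChain's community integrations (the collection name and connection string are placeholders, and LangChain's module paths move between releases):

```python
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import PGVector

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")

store = PGVector.from_texts(
    texts=["passage one ...", "passage two ..."],   # your Q&A passages
    embedding=embeddings,
    collection_name="qa_embeddings",                # placeholder
    connection_string="postgresql+psycopg2://user:pass@aurora-endpoint:5432/db",  # placeholder
)
docs = store.similarity_search("some question", k=3)
```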


Some models have been fine-tuned on massive information retrieval data and can be used to retrieve documents based on a short query (for example, multi-qa-mpnet-base-dot-v1). Others are better suited to semantic similarity tasks where you are trying to find the most similar documents to a given document (for example, all-mpnet-base-v2). There are even multilingual models (for example, paraphrase-multilingual-mpnet-base-v2). For a good overview of different models with their evaluation metrics, see the Pretrained Models page in the Sentence Transformers documentation.
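
For instance, the retrieval-tuned model is meant to score a short query against passages with a dot product; a minimal sketch:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/multi-qa-mpnet-base-dot-v1")

docs = ["Vector databases index embeddings for fast similarity search.",
        "Bananas are rich in potassium."]
doc_emb = model.encode(docs, convert_to_tensor=True)
query_emb = model.encode("How do vector databases work?", convert_to_tensor=True)

scores = util.dot_score(query_emb, doc_emb)[0]  # this model was trained for dot product
print(docs[scores.argmax().item()])
```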


Be sure to review the documentation for the model you are using. Many models will silently truncate content beyond a certain number of tokens. The all-mpnet-base-v2 model card, for example, says that "input text longer than 384 word pieces is truncated".
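
The limit is exposed programmatically, so you can detect over-long inputs before they are silently cut off:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
print(model.max_seq_length)  # 384 for this model

text = "some long document ..."
n_tokens = len(model.tokenizer(text)["input_ids"])
if n_tokens > model.max_seq_length:
    # Everything past the limit is dropped; chunk the document instead.
    print(f"warning: {n_tokens} tokens, only {model.max_seq_length} will be used")
```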


As an example, I decided to download all my tweets (about 20k) and build a semantic searcher on top. For my first prototype I used all-mpnet-base-v2 from sentence-transformers, a relatively small model (438 MB) that should run on any CPU or GPU. It worked fine as long as I used relatively common words that the model had seen, but it didn't do so well for my tweets in other languages (Spanish, mostly). The next step was to try the Instructor models. They are larger, but I have an 8 GB GPU on my machine that can load instructor-xl into memory. I tried both the large and xl models, and my subjective impression was that xl was indeed more accurate.
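
A sketch of that kind of prototype with the built-in semantic_search helper (the tweet list is a placeholder):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

tweets = ["first tweet ...", "second tweet ..."]   # ~20k strings in practice
corpus_emb = model.encode(tweets, convert_to_tensor=True, show_progress_bar=True)

query_emb = model.encode("machine learning", convert_to_tensor=True)
hits = util.semantic_search(query_emb, corpus_emb, top_k=5)[0]
for hit in hits:
    print(f"{hit['score']:.3f}  {tweets[hit['corpus_id']]}")
```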


The results showed that the top two performers were TF-IDF and the Sentence Transformer model all-MiniLM-L12-v2, which achieved the highest percentage of correct identifications in the cosine-similarity comparison. In addition to having very high accuracy, TF-IDF had an extremely fast vector creation time. Among the Sentence Transformers, all-MiniLM-L12-v2 was more than twice as fast on a CPU as all-mpnet-base-v2, while having slightly better accuracy.
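
The TF-IDF baseline is easy to reproduce with scikit-learn; a minimal sketch of the cosine-similarity comparison:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["a known reference document", "another reference document"]  # placeholder
vectorizer = TfidfVectorizer()
doc_vecs = vectorizer.fit_transform(docs)        # sparse TF-IDF matrix

query_vec = vectorizer.transform(["a query document"])
sims = cosine_similarity(query_vec, doc_vecs)[0]
print(docs[sims.argmax()])                       # best match by cosine similarity
```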


The results of the classification tasks show that the sentence transformers as a group performed well, with all-mpnet-base-v2 and multi-qa-mpnet-base-dot-v1 taking the lead with the highest F1 scores. Nevertheless, the top contenders from the similarity comparison, TF-IDF and all-MiniLM-L12-v2, also performed equally well in the classification task.


Next, we conducted experiments with HDBSCAN. We tracked the number of outliers and the number of suggested clusters with the default parameters. We knew that FT content does not produce a high number of outlier articles and used this as an indicator of clusterability. Results are reported in Table 4 below. It became apparent that the Sentence Transformers, all-MiniLM-L12-v2 and all-mpnet-base-v2, allowed for clustering a much higher proportion of our training data without generating a high number of outliers. It was decided that the performance metrics alone would not provide enough information to select a candidate model.
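
Counting outliers with default-parameter HDBSCAN takes only a few lines on top of the embeddings (the corpus here is a placeholder; HDBSCAN needs more than a handful of documents):

```python
import hdbscan
import numpy as np
from sentence_transformers import SentenceTransformer

docs = ["article one ...", "article two ..."]   # replace with the real corpus
model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
embeddings = np.asarray(model.encode(docs))

clusterer = hdbscan.HDBSCAN()                   # default parameters, as in the experiment
labels = clusterer.fit_predict(embeddings)

n_outliers = int((labels == -1).sum())          # HDBSCAN labels outliers as -1
n_clusters = int(labels.max()) + 1
print(f"{n_clusters} clusters, {n_outliers}/{len(labels)} outliers")
```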


Finally, the clusters were assessed qualitatively via BERTopic and bulk. A Streamlit dashboard was created for BERTopic topic modelling, making it possible to quickly toggle between contenders and assess the quality of the topics produced. all-MiniLM-L12-v2 and all-mpnet-base-v2 showed the most coherent topics.
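
A sketch of plugging one of the shortlisted embedding models into BERTopic (again with a placeholder corpus):

```python
from bertopic import BERTopic
from sentence_transformers import SentenceTransformer

docs = ["news article one ...", "news article two ..."]  # replace with the real corpus

embedding_model = SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2")
topic_model = BERTopic(embedding_model=embedding_model)
topics, probs = topic_model.fit_transform(docs)

print(topic_model.get_topic_info())  # one row per discovered topic
```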


Around the same time as the evaluation phase of the project, a benchmarking piece known as the Massive Text Embedding Benchmark (MTEB) was released on the Hugging Face blog. This benchmarking project detailed the performance of a range of text embedding models from the Hugging Face Hub (including two of our shortlisted models: all-MiniLM-L12-v2 and all-mpnet-base-v2) across different embedding tests. Each of the embedding test categories included a wide variety of datasets with short and long text inputs and different languages (see the datasets on the MTEB page).


The MTEB clustering task shows that all-mpnet-base-v2 performed well, ranking in the top 5 of 47 models, while all-MiniLM-L12-v2 placed 12th out of 47 and was the top-performing model with the smallest embedding dimension in this task. In the classification task, both models also performed similarly to each other. Nevertheless, it is worth noting that the majority of models tested in MTEB generate very large embedding dimensions. Although these larger embeddings demonstrate good clustering/classification capability, this factor unfortunately makes such models much harder to implement in a commercial context, due to the computational requirements for testing and running the model as well as for storing the outputs.
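
Numbers like these can be reproduced with the mteb package, which evaluates any sentence-transformers model on the benchmark's tasks; a sketch with a single clustering task picked as an example:

```python
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2")

# One clustering task as an example; the full benchmark covers many more.
evaluation = MTEB(tasks=["TwentyNewsgroupsClustering"])
results = evaluation.run(model, output_folder="mteb_results")
```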


We can use several different models, as shown on this webpage. Each has its own advantages and disadvantages in terms of size, speed, and performance. We'll use the same model the official documentation uses for computing semantic similarity, "all-MiniLM-L6-v2". You may be interested in "all-mpnet-base-v2", which has better performance but is five times slower than the one we'll use.



