Integration of gensim to the Hugging Face Hub


Omar Sanseviero

Jul 18, 2021, 5:08:25 AM
to Gensim
Hi all,

At Hugging Face, we are collaborating with open source libraries in the ecosystem, such as spaCy, Sentence Transformers and more, to integrate them with the Model Hub.

The idea is to make it as easy as possible for your users to share their models and get access to models from other members of the community. 

I think it would be very cool to have some integration with Gensim. Users would get:

- Free hosting of models
- Built-in file versioning
- Hosted Inference API and widgets to try out the models
- Code snippets, filters to find models, and other features to help with discoverability

We have documentation on the integration process, but in summary, an integration usually has three parts:
  • Downstream support: allow users to download pretrained models from the Hub
  • Upstream support: allow users to upload models to the Hub
  • Hub features: code snippets, widgets, Inference API, etc.
If it's of interest, I think beginning with the first and last points would be very cool. I imagine a couple of steps:
1. Create a gensim organization on hf.co.
2a. Upload two or three models to this organization.
2b. [optional] We can have a script that uses the GitHub API to create all models, but I would do this as a follow-up after doing 2a.
3. Open a PR in gensim to modify the API so that it also allows loading from the Hub.

I can do 2 and 3 (although let me know if you would prefer to do 2 to try the Hub). I can send a draft for 3. 
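
To make step 3 a bit more concrete, here is a rough sketch of what loading from the Hub could look like. It is only an illustration, not the proposed gensim API: the function names, the cache directory, and the assumption that a model is stored as a single word2vec-format file are all hypothetical. It relies on the Hub's file URL layout (`/<repo_id>/resolve/<revision>/<filename>`) and on gensim's existing `KeyedVectors.load_word2vec_format`.

```python
from pathlib import Path
from urllib.parse import quote
from urllib.request import urlretrieve

HUB_ENDPOINT = "https://huggingface.co"


def hub_file_url(repo_id, filename, revision="main"):
    """Build the direct-download URL for one file in a Hub model repository."""
    return (
        f"{HUB_ENDPOINT}/{quote(repo_id)}"
        f"/resolve/{quote(revision, safe='')}/{quote(filename)}"
    )


def load_keyed_vectors_from_hub(repo_id, filename, revision="main",
                                cache_dir="~/gensim-hub-cache", binary=False):
    """Hypothetical helper: fetch a word2vec-format file once, then load it.

    Assumes the repository holds the vectors as a single file in word2vec
    text or binary format; other gensim model types would need other loaders.
    """
    # Imported lazily so that the URL helper works without gensim installed.
    from gensim.models import KeyedVectors

    target = Path(cache_dir).expanduser() / repo_id / revision / filename
    if not target.exists():
        # Download only when the file is not already in the local cache.
        target.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(hub_file_url(repo_id, filename, revision), str(target))
    return KeyedVectors.load_word2vec_format(str(target), binary=binary)
```

A call would then look like `load_keyed_vectors_from_hub("gensim/example-model", "vectors.txt")`, where both the repository and the file name are placeholders. A real integration would more likely build on the `huggingface_hub` client library, which handles caching and authentication, rather than on raw URLs.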

Happy to hear your thoughts,
Omar


Radim Řehůřek

Aug 5, 2021, 5:34:01 AM
to Gensim
Thanks Omar! Both for the offer and also for your proof-of-concept implementation.

For anyone following along – Omar's PR triggered an internal discussion around the future of gensim-data. We'll likely be amending the way publishing models and corpora in gensim-data works, and this Hugging Face integration got caught in the middle.

So, we didn't drop the ball on the HF Hub integration. But seeing this through will likely take some time, to prevent wasted effort.

Best,
Radim

Omar Sanseviero

Aug 10, 2021, 4:06:56 PM
to Gensim
Thank you for reviewing the proof of concept!

I think the points brought up in the discussion are quite interesting. I agree that finding the right way to distribute the models and corpora is important. I would be glad to revive this integration at a later date, once that's done.

Cheers,
Omar
