BM25 ranking function

Luciano Perezzini

Apr 3, 2019, 5:29:11 PM
to Gensim
Hello!

As far as I understand, the BM25 ranking function is derived from the classic probabilistic retrieval model, but because its formula resembles TF-IDF, it is sometimes treated as a VSM (or BoW) ranking function. (I'm a computer science student getting deeper into Information Retrieval, so apologies if I'm missing some basic theory.)
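For reference, one commonly cited form of the BM25 score is written out below; it shows why the function looks like TF-IDF (an IDF weight multiplied by a saturating, length-normalized term-frequency factor). The notation is the usual textbook one and is not taken from Gensim's code: $f(q_i, D)$ is the frequency of query term $q_i$ in document $D$, $|D|$ is the document length, $\mathrm{avgdl}$ is the average document length, $N$ is the number of documents, $n(q_i)$ is the number of documents containing $q_i$, and $k_1$, $b$ are free parameters. The IDF shown is the variant with 1 added inside the logarithm; other variants exist.

```latex
\[
\operatorname{score}(D, Q) = \sum_{i=1}^{|Q|} \operatorname{IDF}(q_i)\,
  \frac{f(q_i, D)\,(k_1 + 1)}
       {f(q_i, D) + k_1\left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)},
\qquad
\operatorname{IDF}(q_i) = \ln\!\left(\frac{N - n(q_i) + 0.5}{n(q_i) + 0.5} + 1\right)
\]
```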

In Gensim, BM25 is implemented as a standalone ranking function; it is not part of the TF-IDF module. My problem is the following: a BM25 object takes the corpus as a list, not a generator. This becomes a problem when I want to initialize a BM25 object with a large collection of documents that does not fit in memory.
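To make the constraint concrete, here is a rough sketch of what a streaming BM25 could look like. It is not Gensim's implementation: the class `StreamingBM25`, its `corpus_factory` argument, and the default `k1` and `b` values are all hypothetical names and choices made for illustration. The idea is that BM25 only needs corpus-wide statistics (document count, average length, document frequencies), which can be collected in one pass over a re-iterable stream, after which documents are scored one at a time in a second pass.

```python
import math
from collections import Counter


class StreamingBM25:
    """Hypothetical BM25 scorer that never holds the whole corpus in memory."""

    def __init__(self, corpus_factory, k1=1.5, b=0.75):
        # corpus_factory: zero-argument callable returning a fresh iterator
        # over tokenized documents (each a list of str). A callable is used
        # because a plain generator can only be consumed once.
        self.corpus_factory = corpus_factory
        self.k1 = k1
        self.b = b
        self.doc_freq = Counter()  # term -> number of documents containing it
        self.num_docs = 0
        total_len = 0
        # First pass: gather corpus-wide statistics only.
        for doc in corpus_factory():
            self.num_docs += 1
            total_len += len(doc)
            self.doc_freq.update(set(doc))
        self.avgdl = total_len / self.num_docs if self.num_docs else 0.0

    def idf(self, term):
        n = self.doc_freq.get(term, 0)
        # Variant with +1 inside the log, which keeps the IDF non-negative.
        return math.log((self.num_docs - n + 0.5) / (n + 0.5) + 1.0)

    def score(self, query, doc):
        if not doc or self.avgdl == 0:
            return 0.0
        tf = Counter(doc)
        norm = self.k1 * (1 - self.b + self.b * len(doc) / self.avgdl)
        return sum(
            self.idf(t) * tf[t] * (self.k1 + 1) / (tf[t] + norm)
            for t in query
            if t in tf
        )

    def scores(self, query):
        # Second pass: stream the corpus again, yielding one score per document.
        for doc in self.corpus_factory():
            yield self.score(query, doc)


def tiny_corpus():
    # Stand-in for reading tokenized documents lazily from disk.
    yield ["bm25", "ranking"]
    yield ["tf", "idf", "ranking"]
    yield ["gensim", "corpus", "generator"]


bm25 = StreamingBM25(tiny_corpus)
query_scores = list(bm25.scores(["bm25"]))
```

The cost of this design is two passes over the data per index build plus one pass per query, so in practice one would likely add an inverted index or at least cache per-document lengths; the sketch only shows that a list is not strictly required.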

So, is there a straightforward solution that I'm overlooking? Or does Gensim simply not support large document collections in BM25 at the moment?

Thanks in advance! If I'm missing something, pointers to materials (papers) on this topic would be great.

Radim Řehůřek

Apr 4, 2019, 2:46:51 AM
to Gensim
If it's true that BM25 needs a list and doesn't support a generator, I'd consider that a bug. Can you file a reproducible ticket at https://github.com/RaRe-Technologies/gensim/issues? We'll have a look.

Cheers,
Radim

Luciano Perezzini

Apr 4, 2019, 8:06:08 AM
to Gensim
Thanks for the response, Radim! I'll file the issue.