Hello!
We know the BM25 ranking function is derived from the classic probabilistic retrieval model but, because its formula looks like a TF-IDF one, it is sometimes considered a VSM (or BoW) ranking function... or at least that is what I understood (I'm a computer science student getting deeper into Information Retrieval, so sorry if I'm missing some "basic" theory...).
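For reference, this is the textbook form of the scoring function I have in mind. The notation is my own choice: $f(q_i, D)$ is the frequency of query term $q_i$ in document $D$, $|D|$ is the document length, $\mathrm{avgdl}$ is the average document length in the collection, $N$ is the number of documents, $n(q_i)$ is the number of documents containing $q_i$, and $k_1$, $b$ are free parameters. The first factor plays the role of IDF and the second that of a saturated, length-normalized TF, which is why it reads like a TF-IDF weight.

```latex
\mathrm{score}(D, Q) = \sum_{i=1}^{|Q|} \mathrm{IDF}(q_i)\,
  \frac{f(q_i, D)\,(k_1 + 1)}
       {f(q_i, D) + k_1 \left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)},
\qquad
\mathrm{IDF}(q_i) = \log \frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}
```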
I can see that in Gensim, BM25 is implemented as a standalone ranking function; it is not part of the TF-IDF module. My problem is the following: a BM25 object takes a list as its corpus, not a generator. This is an obvious problem when I want to initialize a BM25 object with a large collection of documents that does not fit in memory.
So, is there a straightforward solution that I am not considering right now? Or is it that Gensim does not support large document collections in BM25 at the moment?
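To make the question concrete, here is a minimal sketch of the kind of streaming behaviour I am after. It is plain Python and does not use Gensim's BM25 API at all; `StreamingBM25` and `TokenizedLines` are hypothetical names of my own, the tokenization is just lowercasing and whitespace splitting, and the IDF uses a "+ 1" inside the log (one common smoothing variant, an assumption on my part) to keep weights non-negative. The idea is one pass over a re-iterable corpus to collect document frequencies and the average length, then a second pass per query that keeps only the top results.

```python
import heapq
import math
from collections import Counter


class TokenizedLines:
    """Hypothetical re-iterable corpus: one document per line of a text file."""

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        # The file is reopened on every iteration, so nothing is held in memory.
        with open(self.path, encoding="utf-8") as handle:
            for line in handle:
                yield line.lower().split()


class StreamingBM25:
    """BM25 statistics gathered in one streaming pass over tokenized documents."""

    def __init__(self, corpus, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.num_docs = 0
        self.doc_freq = Counter()  # term -> number of documents containing it
        total_len = 0
        for tokens in corpus:  # any iterable of token lists; documents are not stored
            self.num_docs += 1
            total_len += len(tokens)
            self.doc_freq.update(set(tokens))
        self.avgdl = total_len / self.num_docs if self.num_docs else 0.0

    def idf(self, term):
        n = self.doc_freq.get(term, 0)
        # Assumed variant: "+ 1" inside the log avoids negative weights.
        return math.log((self.num_docs - n + 0.5) / (n + 0.5) + 1.0)

    def score(self, query, tokens):
        if not self.avgdl:
            return 0.0
        tf = Counter(tokens)
        norm = self.k1 * (1 - self.b + self.b * len(tokens) / self.avgdl)
        return sum(
            self.idf(t) * tf[t] * (self.k1 + 1) / (tf[t] + norm)
            for t in query
            if t in tf
        )

    def top_k(self, query, corpus, k=10):
        # Second streaming pass: only the k best (score, doc_index) pairs are kept.
        scored = ((self.score(query, tokens), i) for i, tokens in enumerate(corpus))
        return heapq.nlargest(k, scored)
```

With this, something like `StreamingBM25(TokenizedLines("docs.txt")).top_k(["cat"], TokenizedLines("docs.txt"), k=5)` would rank documents without ever materializing the corpus as a list (`docs.txt` is a placeholder path). The cost is a full pass over the collection per query, since there is no inverted index.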
Thanks in advance! If I'm missing something, pointers to materials (papers) on this topic would be great!