estimating hyperparameters


Skipper Seabold

Oct 12, 2011, 7:40:33 PM
to gen...@googlegroups.com
Hi,

I'm pretty new to topic modelling, but am learning as I go. So far I
like gensim a lot. Thanks.

Quick question, I ran David Blei's lda-c code over some data as well
as gensim's LdaModel with update_every=0 (I expect a lot of topic
drift). IIUC, this option for batch lda should be very similar to the
lda-c code except that it's computationally a bit more efficient? My
inferred topics were a little different, however, and I'm wondering if
it could be because of the concentration parameters used or maybe just
that my topics aren't all that robust?

Blei's lda-c code has the option to estimate the concentration
parameter alpha via maximum likelihood, which I used, but gensim
doesn't look like it can. Is this something that's desirable in gensim
and just hasn't been implemented or is there something else keeping it
from being used? I'd be happy to code it up and submit a pull request,
if this would be helpful.

Cheers,

Skipper

Radim

Oct 13, 2011, 1:09:50 PM
to gensim
Hi Skipper,


> Quick question, I ran David Blei's lda-c code over some data as well
> as gensim's LdaModel with update_every=0 (I expect a lot of topic
> drift). IIUC, this option for batch lda should be very similar to the
> lda-c code except that it's computationally a bit more efficient? My

similar yes, but not the same. The code in gensim is based on Matt
Hoffman's online-lda package (links are in the source). If I recall
correctly, Blei's code estimates the word-topic distributions directly
(and calls them beta), whereas online lda puts a Dirichlet prior on them
(parametrized by eta).

> inferred topics were a little different, however, and I'm wondering if
> it could be because of the concentration parameters used or maybe just
> that my topics aren't all that robust?

Both alpha and eta hyper-parameters affect the resulting topics, as
does the decay parameter in the online version. In theory, the algo
always converges to the same solution. In practice, one usually stops
training well before complete convergence, so the results can differ.


> Blei's lda-c code has the option to estimate the concentration
> parameter alpha via maximum likelihood, which I used, but gensim
> doesn't look like it can. Is this something that's desirable in gensim
> and just hasn't been implemented or is there something else keeping it
> from being used? I'd be happy to code it up and submit a pull request,
> if this would be helpful.

It is desirable -- getting rid of free parameters is always a good
thing :)

This function used to exist in older versions of gensim -- the ones
based directly on Blei's LDA-C, before I switched to online-lda. See
`optAlpha` in http://trac.assembla.com/gensim/browser/tags/release-0.7.6/src/gensim/models/ldamodel.py

But I never found it useful: in my experience, alpha just kept getting
smaller and smaller with each iteration, and never seemed to converge
(in both LDA-C and my re-implementation). More importantly, the topics
coming out of the online-lda algorithm seemed more coherent and made
more sense, which, frankly, was more important to me than optimizing
some likelihood quantity. So I never looked back.
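For the record, the alpha update in question is a Newton iteration on the Dirichlet likelihood (Blei et al., appendix A.4.2), run in log-space so alpha stays positive. Below is a minimal sketch of that update -- applied here to exact Dirichlet samples rather than the variational sufficient statistics, just to check that it recovers a known alpha; all names are mine, not lda-c's:

```python
import numpy as np
from scipy.special import digamma, polygamma

def fit_symmetric_alpha(ss, n_docs, n_topics, n_iter=100, tol=1e-8):
    """ML estimate of a symmetric Dirichlet alpha via Newton's method.

    ss is the sufficient statistic: the sum over documents and topics of
    E[log theta_dk] (here, simply the sum of log topic proportions).
    """
    log_a = np.log(0.1)  # optimize log(alpha) so alpha stays positive
    for _ in range(n_iter):
        a = np.exp(log_a)
        # gradient and curvature of the Dirichlet log-likelihood wrt alpha
        df = n_docs * n_topics * (digamma(n_topics * a) - digamma(a)) + ss
        d2f = n_docs * n_topics * (n_topics * polygamma(1, n_topics * a)
                                   - polygamma(1, a))
        step = df / (d2f * a + df)  # Newton step, rewritten in log-space
        log_a -= step
        if abs(step) < tol:
            break
    return np.exp(log_a)

# sanity check: recover a known alpha from synthetic "documents"
rng = np.random.default_rng(0)
theta = rng.dirichlet([0.5] * 10, size=5000)  # 5000 docs, 10 topics
alpha_hat = fit_symmetric_alpha(np.log(theta).sum(), n_docs=5000, n_topics=10)
assert 0.45 < alpha_hat < 0.55
```

Whether the iteration converges (or drifts ever smaller, as described above) depends on the statistics fed into it, not on this update rule itself.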

TL;DR: please share your experience with alpha auto-tuning. And with
topics coming out of lda-c vs. online-lda. If your auto-tune
implementation behaves well and is helpful, a pull request will be
most welcome!

Cheers,
Radim

Skipper Seabold

Oct 13, 2011, 4:35:50 PM
to gen...@googlegroups.com
On Thu, Oct 13, 2011 at 1:09 PM, Radim <radimr...@seznam.cz> wrote:
> Hi Skipper,
>> Quick question, I ran David Blei's lda-c code over some data as well
>> as gensim's LdaModel with update_every=0 (I expect a lot of topic
>> drift). IIUC, this option for batch lda should be very similar to the
>> lda-c code except that it's computationally a bit more efficient? My
>
> similar yes, but not the same. The code in gensim is based on Matt
> Hoffman's online-lda package (links are in the source). If I recall
> correctly, Blei's code estimates the word-topic distributions directly
> (and calls it beta), whereas online lda puts a dirichlet prior on it
> (parametrized by eta).

Ah, ok.


>> inferred topics were a little different, however, and I'm wondering if
>> it could be because of the concentration parameters used or maybe just
>> that my topics aren't all that robust?
>
> Both alpha and eta hyper-parameters affect the resulting topics, as
> does the decay parameter in the online version. In theory, the algo
> always converges to the same solution. In practice, one usually stops
> training well before complete convergence, so the results can differ.

If I may ask one more question, what do these lines in my log indicate?

"132/1000 documents converged within 50 iterations"

Correct me if I'm wrong, but this suggests to me that the
intermediate optimization of the variational parameters wasn't
actually all that good here and therefore my per-document posteriors
over topics aren't all that good. Blei et al. [2003] suggest that the
number of iterations required here is usually on the order of the
number of words in the document, and the Hoffman source uses 100
instead of 50. Would it make sense to change the VAR_MAXITER to #
words + 50 for better performance in my case? My document sizes are
all over the place.

>
>> Blei's lda-c code has the option to estimate the concentration
>> parameter alpha via maximum likelihood, which I used, but gensim
>> doesn't look like it can. Is this something that's desirable in gensim
>> and just hasn't been implemented or is there something else keeping it
>> from being used? I'd be happy to code it up and submit a pull request,
>> if this would be helpful.
>
> It is desirable -- getting rid of free parameters is always a good
> thing :)
>
> This function used to exist in older versions of gensim -- the ones
> based directly on Blei's LDA-C, before I switched to online-lda. See
> `optAlpha` in http://trac.assembla.com/gensim/browser/tags/release-0.7.6/src/gensim/models/ldamodel.py
>

Good to know. Will have a look.

> But I never found it useful: in my experience, alpha just kept getting
> smaller and smaller with each iteration, and never seemed to converge
> (in both LDA-C and my re-implementation). More importantly, the topics
> coming out of the online-lda algorithm seemed more coherent and made
> more sense, which, frankly, was more important to me than optimizing
> some likelihood quantity. So I never looked back.

Agreed on practicality over purity. Running the lda-c on my corpus
did, however, seem to converge to a reasonable value, which I used in
the online-lda.

> TL;DR: please share your experience with alpha auto-tuning. And with
> topics coming out of lda-c vs. online-lda. If your auto-tune
> implementation behaves well and is helpful, a pull request will be
> most welcome!

Thanks for the detailed answer. My intuition is not yet very strong
wrt the algorithms and why my topics would differ, so I appreciate the
guidance.

Aside: What I'm most interested in is speed right now, and I do have
access to a cluster; hence my interest in gensim. If I can find the
time, I was thinking of doing some profiling and trying to push some
of the loops down into Cython and calling dgemm directly using tokyo
[1]. It's been my experience that repeated calls to numpy.dot even
with ATLAS BLAS add quite a bit of overhead. Will report on my
progress if I ever make any.

Skipper

[1] Up to date fork: https://github.com/wesm/tokyo
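For reference, one way to call dgemm directly from Python without Cython is through scipy's raw BLAS wrappers (scipy.linalg.blas in recent scipy versions -- these may not have existed in this form at the time); a minimal sketch:

```python
import numpy as np
from scipy.linalg.blas import dgemm

# Fortran-ordered inputs let the BLAS call avoid an internal copy
a = np.asfortranarray(np.random.rand(4, 3))
b = np.asfortranarray(np.random.rand(3, 5))

c = dgemm(alpha=1.0, a=a, b=b)  # computes 1.0 * (a @ b) via BLAS directly
assert np.allclose(c, a.dot(b))
```

This skips some of numpy.dot's dispatch overhead for repeated small multiplies, which is the overhead mentioned above.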

Radim

Oct 14, 2011, 12:25:16 PM
to gensim
Hi Skipper,

> If I may ask one more question, what do these lines in my log indicate?
>
> "132/1000 documents converged within 50 iterations"

Depends on where in the log you see them: if near the beginning of
training, it's ok. If near the end, something went wrong.


> Correct me if I'm wrong, but this suggests to me that the
> intermediate optimization of the variational parameters wasn't
> actually all that good here and therefore my per-document posteriors
> over topics aren't all that good. Blei et al. [2003] suggest that the
> number of iterations required here is usually on the order of the
> number of words in the document, and the Hoffman source uses 100
> instead of 50. Would it make sense to change the VAR_MAXITER to #
> words + 50 for better performance in my case? My document sizes are
> all over the place.

Sure, you can increase that param to whatever value you like. The
reason why I set it to a smaller value is that in online training, the
convergence happens over many smaller mini-batches. With large
corpora, there will be many mini-batches, and there's not much point
training on any particular document excessively. If a document is
important, it (=its latent structure) will appear in the stream again
later, boosting its impact. Why 50 and not 100? I tried both and
didn't see much difference in results (=just another magic constant).

This reasoning might not apply to batch training where the entire
corpus is used for each training iteration. Here underutilizing
available info in an update is costly, because each update is a
monumental sweep through the entire training set (a la LDA-C).
Interestingly, the online algo with mini-batches can converge faster
than the batch algo -- exactly because it can use the information
earlier, helping in updates down the stream (see Hoffman's NIPS
article).
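The mini-batch blending described above can be sketched as follows (following Hoffman et al.'s update rule; rho, tau0, kappa and the toy lambda values are illustrative -- gensim's decay parameter plays the role of kappa):

```python
import numpy as np

def rho(t, tau0=64.0, kappa=0.7):
    """Step size for mini-batch t; kappa in (0.5, 1] is needed for
    convergence in theory (kappa corresponds to gensim's decay)."""
    return (tau0 + t) ** -kappa

lam = np.ones((2, 4))           # current topic-word variational parameters
lam_hat = np.full((2, 4), 5.0)  # estimate computed from one mini-batch alone

# each mini-batch nudges lambda toward its local estimate, by less and less
for t in range(500):
    lam = (1.0 - rho(t)) * lam + rho(t) * lam_hat

assert rho(0) > rho(10) > rho(100)           # step sizes shrink over time
assert np.allclose(lam, lam_hat, atol=0.05)  # yet lambda still converges
```

Early mini-batches move lambda a lot, later ones less and less -- which is why excessive per-document iteration inside any single mini-batch buys little.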


> Agreed on practicality over purity. Running the lda-c on my corpus
> did, however, seem to converge to a reasonable value, which I used in
> the online-lda.

Oh, interesting. What value was that? Can you post some overall
statistics about your training corpus/training params? There's much
talk of LDA in the literature, but practical results are hard to come by.

Best,
Radim

Skipper Seabold

Oct 14, 2011, 12:54:45 PM
to gen...@googlegroups.com
On Fri, Oct 14, 2011 at 12:25 PM, Radim <radimr...@seznam.cz> wrote:
> Hi Skipper,
>
>> If I may ask one more question, what do these lines in my log indicate?
>>
>> "132/1000 documents converged within 50 iterations"
>
> Depends on where in the log you see them: if near the beginning of
> training, it's ok. If near the end, something went wrong.


That was near the "end," though perhaps this indicates I need more passes?
 

This was my intuition as well. As a test, I raised it to max(100, # of words in document) and it didn't seem to make much difference in the early passes before I killed it.
 

>> Agreed on practicality over purity. Running the lda-c on my corpus
>> did, however, seem to converge to a reasonable value, which I used in
>> the online-lda.
>
> Oh, interesting. What value was that? Can you post some overall
> statistics about your training corpus/training params? There's much
> talk of LDA in literature, but practical results are hard to come by.


I have about 25k documents with about 700k unique words. I'm using a vocabulary of the top 10k by tf-idf. Right now I'm using LDA to identify a first pass at the latent structure of topics, to do some more stop-word pruning, and to improve my intuition about the algorithms and models. We expect that ultimately the structure of the topics is more complex. I'm new to this stuff though, obviously.

I ran the lda-c code using 50 topics for 30 passes with 100 iterations of the em step and let it estimate alpha. I gave a starting value of 1/50, but it looks like it might have used alpha=1, judging by the logs. After about 15 passes it converged to an alpha of ~.038 and stayed there. The problem is that this first run took 60 hours on my machine. This is how I came to online-lda and settled on gensim + pyro, though I actually found gensim during the data cleaning stage. Very clear OO approach and python generators for the win. Cheers on some good work and very helpful code. If there's any other info I can provide, let me know.

In any event, maybe I'm too worried about topic drift (the texts span a good deal of time) and maybe I should just run online lda.

Skipper


Skipper Seabold

Oct 15, 2011, 4:53:53 PM
to gen...@googlegroups.com

This is incorrect. I had a bug in my change (I summed the word count
over the wrong axis). Now on the first pass I get convergence for
40-90% of my documents for the variational parameters, which is a huge
improvement over the 5-25% convergence I was seeing.
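That axis mix-up is easy to make because a gensim-style bag-of-words document is a list of (term_id, count) pairs, so the two columns mean very different things; a minimal illustration:

```python
import numpy as np

# a gensim-style bag-of-words document: (term_id, count) pairs
doc = [(0, 2), (5, 1), (9, 4)]

totals = np.sum(doc, axis=0)  # -> [sum of term ids, sum of counts]
n_words = totals[1]           # the actual document length: 2 + 1 + 4
wrong = totals[0]             # sum of vocabulary ids -- meaningless here

assert n_words == 7
```

Summing over the wrong axis (or taking the wrong column) silently produces a plausible-looking but wrong iteration budget.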

>>
>> > Agreed on practicality over purity. Running the lda-c on my corpus
>> > did, however, seem to converge to a reasonable value, which I used in
>> > the online-lda.
>>
>> Oh, interesting. What value was that? Can you post some overall
>> statistics about your training corpus/training params? There's much
>> talk of LDA in literature, but practical results are hard to come by.
>>
>
> I have about 25k documents with about 700k unique words. I'm using a
> vocabularly of the top 10k by tf-idf. Right now I'm doing two things - using
> LDA to try and identify a first pass at the latent structure of topics, do
> some more stop word pruning, and to improve my intuition about the
> algorithms and models. We expect that ultimately the structure of the topics
> is more complex. I'm new to this stuff though obviously.
>

One other stat is that I have rather long documents. Mean word count
is 8k with a max count of 800k. This could be why I needed to increase
the number of iterations. If you're interested in my change, I can
make it an option to LdaModel and push it.

Radim

Oct 15, 2011, 6:10:04 PM
to gensim
On Oct 15, 10:53 pm, Skipper Seabold <jsseab...@gmail.com> wrote:
> > I have about 25k documents with about 700k unique words. I'm using a
> > vocabularly of the top 10k by tf-idf. Right now I'm doing two things - using
> > LDA to try and identify a first pass at the latent structure of topics, do
> > some more stop word pruning, and to improve my intuition about the
> > algorithms and models. We expect that ultimately the structure of the topics
> > is more complex. I'm new to this stuff though obviously.
>
> One other stat is that I have rather long documents. Mean word count
> is 8k with a max count of 800k. This could be why I needed to increase
> the number of iterations. If you're interested in my change, I can
> make it an option to LdaModel and push it.

Interesting dataset you have :) This is all using the batch LDA algo,
right? (not online mini-batches)

There I'm in favour of increasing the default internal accuracy
parameters (self.VAR_MAXITER and self.VAR_THRESH), as per earlier
discussion. But changing one magic value to another (50 to 100) seems
somehow unsatisfactory... can you think of a better way to get rid of
them?

With the batch algo, we're allowed to go over the training data
multiple times, so maybe we can sacrifice one pass (or a part of it),
and estimate some sensible defaults, automatically?

Radim

Skipper Seabold

Oct 17, 2011, 4:43:37 PM
to gen...@googlegroups.com
On Sat, Oct 15, 2011 at 6:10 PM, Radim <radimr...@seznam.cz> wrote:
> On Oct 15, 10:53 pm, Skipper Seabold <jsseab...@gmail.com> wrote:
>> > I have about 25k documents with about 700k unique words. I'm using a
>> > vocabularly of the top 10k by tf-idf. Right now I'm doing two things - using
>> > LDA to try and identify a first pass at the latent structure of topics, do
>> > some more stop word pruning, and to improve my intuition about the
>> > algorithms and models. We expect that ultimately the structure of the topics
>> > is more complex. I'm new to this stuff though obviously.
>>
>> One other stat is that I have rather long documents. Mean word count
>> is 8k with a max count of 800k. This could be why I needed to increase
>> the number of iterations. If you're interested in my change, I can
>> make it an option to LdaModel and push it.
>
> Interesting dataset you have :) This is all using the batch LDA algo,
> right? (not online mini-batches)

Yes, this is the batch algorithm.

> There I'm in favour of increasing the default internal accuracy
> parameters (self.VAR_MAXITER and self.VAR_THRESH), as per earlier
> discussion. But changing one magic value to another (50 to 100) seems
> somehow unsatisfactory... can you think of a better way to get rid of
> them?

Let the user set them?

Default would be current defaults:
LdaModel(..., var_maxiter = 50, var_thresh = 1e-3)

Optionally:
LdaModel(..., var_maxiter = None, var_thresh = 1e-5)

Estimate the maxiter based on per-document word count? Or make this
behavior var_maxiter='est' to be more explicit, and let None
literally iterate until var_thresh is met or some really big upper
bound is hit?

>
> With the batch algo, we're allowed to go over the training data
> multiple times, so maybe we can sacrifice one pass (or a part of it),
> and estimate some sensible defaults, automatically?
>

Right, I'm currently iterating over mine around 20 times just to be on
the safe side. Ideally I would like to check how much change there is
in the last few iterations.

I just added a line or two in the inference method.

if update_every == 0:  # sometimes I thought I saw it changed to 1?
    VAR_MAXITER = max(100, int(1.25 * sum(doc, axis=0)[1]))

(I've imported all the methods from numpy to avoid getattribute
overhead ... hopeful micro-optimizations)

Though this chokes if it gets an empty document (user error?). Of
course it could be saved after the first pass and used again, though I
doubt there's too much difference in practice between summing again vs
getting and slicing an attribute list/array.
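For what it's worth, the proposed options could resolve along these lines -- resolve_var_maxiter is a hypothetical helper, not gensim API, and the empty-document case falls back to the floor of 100 instead of choking:

```python
def resolve_var_maxiter(var_maxiter, doc, upper_bound=10000):
    """Hypothetical resolution of the var_maxiter options discussed above."""
    if var_maxiter is None:
        # iterate until var_thresh is met, or a really big upper bound
        return upper_bound
    if var_maxiter == 'est':
        # scale with document length; doc is (term_id, count) pairs
        n_words = sum(cnt for _, cnt in doc)
        return max(100, int(1.25 * n_words))  # empty doc -> floor of 100
    return var_maxiter  # an explicit int, e.g. the current default of 50

doc = [(0, 2), (5, 1), (9, 4)]
assert resolve_var_maxiter(50, doc) == 50
assert resolve_var_maxiter('est', [(0, 200)]) == 250
assert resolve_var_maxiter('est', []) == 100  # no more choking
```

The max() floor both guards the empty-document case and keeps short documents from getting an absurdly small iteration budget.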

Skipper

Radim

Oct 19, 2011, 3:14:51 PM
to gensim
> Let the user set them?
>
> Default would be current defaults:
> LdaModel(..., var_maxiter = 50, var_thresh = 1e-3)
>
> Optionally:
> LdaModel(..., var_maxiter = None, var_thresh = 1e-5)

Well the user can already set them, via self.VAR_MAXITER and
self.VAR_THRESH :)

But the truth is, the parameter list to LDA is already too long and
unwieldy. We could create a config object instead of passing params
around like this, but that feels ungensim-like. If there is no way to
make LDA robust enough (with automatic parameter resolution or
otherwise), then I'd sooner remove LDA from gensim, as the algo is
obviously "not for humans" (gensim's motto). `eta` and `alpha` are
already a sore in my eye.


> Estimates the maxiter based on per document word count? Or this
> behavior is var_maxiter='est' to be more explicit and let None
> literally iterate until var_thresh is met or some really big upper
> bound is hit?

Sounds good. I'll open a github issue for setting these params
automatically. Thanks for bringing this up,

Radim

Skipper Seabold

Oct 19, 2011, 3:24:41 PM
to gen...@googlegroups.com
On Wed, Oct 19, 2011 at 3:14 PM, Radim <radimr...@seznam.cz> wrote:
>> Let the user set them?
>>
>> Default would be current defaults:
>> LdaModel(..., var_maxiter = 50, var_thresh = 1e-3)
>>
>> Optionally:
>> LdaModel(..., var_maxiter = None, var_thresh = 1e-5)
>
> Well the user can already set them, via self.VAR_MAXITER and
> self.VAR_THRESH :)
>
> But the truth is, the parameter list to LDA is already too long and
> unwieldy. We could create a config object instead of passing params
> around like this, but that feels ungensim-like. If there is no way to
> make LDA robust enough (with automatic parameter resolution or
> otherwise), then I'd sooner remove LDA from gensim, as the algo is
> obviously "not for humans" (gensim's motto). `eta` and `alpha` are
> already a sore in my eye.

Ha, ok. I'm probably -1 on a config file option. If you do want to
drop LDA, I wonder whether it might find a home in scikit-learn, or if
it's too domain specific. If you want to keep it, maybe it would make
sense to split up the model instantiation from the training - then you
can split the options over the two methods __init__ and train.

>
>> Estimates the maxiter based on per document word count? Or this
>> behavior is var_maxiter='est' to be more explicit and let None
>> literally iterate until var_thresh is met or some really big upper
>> bound is hit?
>
> Sounds good. I'll open a github issue for setting these params
> automatically. Thanks for bringing this up,

I have a branch here with the changes already plus a few other
optimizations, so you can see if it's too much with the params.

https://github.com/jseabold/gensim/tree/speedup-olda

Timmy Wilson

Oct 19, 2011, 4:28:26 PM
to gen...@googlegroups.com
> I'd sooner remove LDA from gensim, as the algo is
> obviously "not for humans" (gensim's motto). `eta` and `alpha` are
> already a sore in my eye.

There really should be a distributed, general, python library for
probabilistic modeling

Pymc is close, but not distributed

I'm just learning probabilistic modeling, and can attest to it not
being for humans, but it's certainly powerful and useful for a broad
range of problems!!

Norvig does a nice job of underlining the importance of statistical
modeling here -- http://norvig.com/chomsky.html

Radim -- i would love to see gensim build out the LDA + topic modeling stuff :]

Radim

Oct 20, 2011, 6:24:16 PM
to gensim
On Oct 19, 10:28 pm, Timmy Wilson <tim...@smarttypes.org> wrote:
>
> Radim -- i would love to see gensim build out the LDA + topic modeling stuff :]


Yeah me too. Gensim is open-source... so you know what's coming:

If you want it, make it :)

Radim


Timmy Wilson

Oct 20, 2011, 8:04:59 PM
to gen...@googlegroups.com
Sounds good ;]

i'm learning --

just found this tutorial which was helpful in general --
http://videolectures.net/ecmlpkdd09_park_sldair/

one thing i'm not sure of -- he claims making alpha asymmetric causes
fitting to take longer

i assume that's true -- but i think alpha's still useful for
increasing/decreasing term-topic sparsity

ie -- lowering alpha increases sparsity

i think that's correct --- ????

it seems to jibe w/ this:

> In theory, the algo always converges to the same solution.
> In practice, one usually stops training well before complete
> convergence, so the results can differ.

and this:

https://lists.cs.princeton.edu/pipermail/topic-models/2011-October/001603.html

if that's correct -- it's worth noting that there are benefits to
choosing sparsity at the expense of precision --
http://www.youtube.com/watch?v=ZmNOAtZIgIk

Radim

Oct 23, 2011, 1:23:58 PM
to gensim
Hi Timmy,

On Oct 21, 2:04 am, Timmy Wilson <tim...@smarttypes.org> wrote:
> Sounds good ;]
>
> i'm learning --
>
> just found this tutorial which was helpful in general --http://videolectures.net/ecmlpkdd09_park_sldair/
>
> one thing i'm not sure of -- he claims making alpha asymmetric causes
> fitting to take longer
>
> i assume that's true -- but i think alpha;s still useful for
> increasing/decreasing term-topic sparsity
>
> ie -- lowering alpha increases sparsity
>
> i think that's correct --- ????
>
> it seems to jive w/ this:
>
> > In theory, the algo always converges to the same solution.
> > In practice, one usually stops training well before complete
> > convergence, so the results can differ.
>
> and this:
>
> https://lists.cs.princeton.edu/pipermail/topic-models/2011-October/00...
>
> if that's correct -- it's worth noting that there are benefits to
> choosing sparsity at the expense of precision --http://www.youtube.com/watch?v=ZmNOAtZIgIk


Sparsity and convergence (precision) are two orthogonal issues -- you
can train an inaccurate sparse representation and vice versa.

Re. sparsity: gensim actually supports asymmetric alpha -- you can
set it to a vector, instead of a single number! That support is still
experimental (=untested ;-) but I had been planning to test it out for
a long time, and gain some insight into how it works. An asymmetric
prior makes more sense imo (=not all topics are equally likely, I
imagine something like Zipf's law holds here as well), and could lead
to better results. See also "why priors matter", by Wallach and Mimno.
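Both points above can be checked quickly with numpy (values purely illustrative): a small symmetric alpha yields peaked, sparse-looking per-document topic mixtures, while a Zipf-shaped asymmetric alpha makes some topics a priori more likely than others:

```python
import numpy as np

rng = np.random.default_rng(42)
K = 10

# small symmetric alpha -> peaked (sparse-ish) topic mixtures per document
sparse = rng.dirichlet([0.1] * K, size=2000)
diffuse = rng.dirichlet([10.0] * K, size=2000)
assert sparse.max(axis=1).mean() > diffuse.max(axis=1).mean()

# Zipf-shaped asymmetric alpha: expected proportions follow alpha / alpha.sum()
alpha = 1.0 / np.arange(1, K + 1)
theta = rng.dirichlet(alpha, size=20000)
assert np.allclose(theta.mean(axis=0), alpha / alpha.sum(), atol=0.01)
```

The second check is the Zipf-like intuition above in miniature: under an asymmetric prior, topic k is expected to get an alpha_k-proportional share of each document before seeing any data.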

If you want to work on this part of gensim, please let me know. I'd be
grateful for experiments and improvements.

Cheers,
Radim

Timmy Wilson

Oct 25, 2011, 4:44:36 PM
to gen...@googlegroups.com
> If you want to work on this part of gensim, please let me know. I'd be
> grateful for experiments and improvements.

I'm excited to experiment/play-with/bend the model

I'm still messing w/ my db + data collection routines (the boring stuff)

Next on my list is spending more time w/ gensim + the new 'Similarity Server'

Radim

Oct 26, 2011, 1:58:36 PM
to gensim
On Oct 13, 7:09 pm, Radim <radimrehu...@seznam.cz> wrote:
> But I never found it useful: in my experience, alpha just kept getting
> smaller and smaller with each iteration, and never seemed to converge
> (in both LDA-C and my re-implementation). More importantly, the topics

Apparently other ppl had a similar experience re. fitting of alpha:
https://lists.cs.princeton.edu/pipermail/topic-models/2011-October/001628.html

Interesting.

Radim

Skipper Seabold

Nov 11, 2011, 5:35:59 PM
to gen...@googlegroups.com
On Thu, Oct 13, 2011 at 1:09 PM, Radim <radimr...@seznam.cz> wrote:

Hi Radim,

Can you (or someone) sanity check me here? Looking at the lda-c code,
I see that f here

http://trac.assembla.com/gensim/browser/tags/release-0.7.6/src/gensim/models/ldamodel.py#L414

is the same as their code. Both look correct for alpha as a scalar
*except* the last term. According to A.4.2 in Blei, Ng, and Jordan,
this should be

term3 = np.sum(np.sum((alpha-1)*(gammaln(gamma) -
gammaln(np.sum(gamma,1)[:,None])), axis=1))

which is the same if alpha is a scalar or a 1d array. This does *not* equal

(alpha - 1) * alpha_suff_stats

when alpha is a scalar, unless I am missing something.

Any thoughts?

Thanks,

Skipper

Skipper Seabold

Nov 11, 2011, 6:38:23 PM
to gen...@googlegroups.com

Doh. gammaln should be digamma. Sorry for the noise.

Radim

Nov 14, 2011, 4:52:16 PM
to gensim
Yeah, the connection between the sufficient statistics and digammas is
explained in section A.1 of that same paper (pure magic!). Gensim
doesn't use that code anymore, but if you find any errors or problems,
please do tell.
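With that correction (digamma in place of gammaln), the two forms discussed above do coincide for scalar alpha -- a quick numeric check, where the gamma values are arbitrary stand-ins for per-document variational parameters:

```python
import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(1)
gamma = rng.uniform(0.5, 5.0, size=(8, 4))  # docs x topics variational params
alpha = 0.038                               # scalar, as in the lda-c run above

# per-document, per-topic expectation E[log theta_dk]
elog = digamma(gamma) - digamma(gamma.sum(axis=1, keepdims=True))

term3 = np.sum((alpha - 1) * elog)  # the A.4.2 form, scalar alpha
ss = elog.sum()                     # the "alpha sufficient statistics"
assert np.isclose(term3, (alpha - 1) * ss)
```

For a scalar, (alpha - 1) simply factors out of the double sum, which is why the sufficient-statistics form works.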

Best,
Radim




Skipper Seabold

Nov 14, 2011, 7:57:07 PM
to gen...@googlegroups.com
On Mon, Nov 14, 2011 at 4:52 PM, Radim <radimr...@seznam.cz> wrote:
> Yeah, the connection between the sufficient statistics and digammas is
> explained in section A.1 of that same paper (pure magic!). Gensim
> doesn't use that code anymore, but if you find any errors or problems,
> please do tell.
>

Looks good to me. I've brought the code back, in a branch of my fork,
except I'm just using a solver from scipy.optimize after doing
inference on all the documents, instead of trying to roll my own (bfgs
takes at most a few seconds with one parameter). It seems to be
working very well on the code I've got running. I'm only using it for
batch updating though; I haven't added the online updates yet.

Radim

Nov 15, 2011, 6:36:11 AM
to gensim
> Looks good to me. I've brought the code back in in a branch in my
> fork, except I'm just using a solver from scipy.optimize after doing
> inference on all the documents instead of trying to roll my own (bfgs
> takes a few seconds at best with one parameter). It seems to be

Hah, I didn't even know it was there. Scipy is like an RPG game --
full of hidden goodies and treasures.

-rr