Visualizing topic models with pyLDAvis


Ben Mabey

May 29, 2015, 5:51:04 PM
to gen...@googlegroups.com

Hi all,
I just released a Python library for interactive topic model visualization. It integrates with the IPython notebook and has some helpers to quickly visualize gensim's LdaModel.


Check out the project page and the example notebooks on how to use it with gensim.

The gensim support right now is pretty basic. If you end up using it and find ways of improving or expanding it (e.g. supporting the different model types), please send a pull request my way. :)

Happy topic modelling!

-Ben

Christopher S. Corley

May 30, 2015, 11:13:32 PM
to gensim
Wow, this is great! I've used LDAvis in the past, and not having to break out of my gensim workflow to go to R is certainly welcomed.

Thanks so much for sharing!

Chris.

--
You received this message because you are subscribed to the Google Groups "gensim" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gensim+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ben Mabey

Jun 3, 2015, 1:43:31 AM
to gen...@googlegroups.com


On Saturday, May 30, 2015 at 9:13:32 PM UTC-6, Christopher Corley wrote:
Wow, this is great! I've used LDAvis in the past, and not having to break out of my gensim workflow to go to R is certainly welcomed.

Hi Chris,
When you used LDAvis in the past, did you use gensim to fit the model? I ask because I have a small bug in how I extract data from a gensim model and corpus to feed it into the visualization. The bug causes problems in the visualization. I'm not sure how bad, but things are certainly off[1]. If you could send me how you have pulled out this data in the past, I would appreciate it.

This is how I am pulling out the distributions and other needed information:
https://github.com/bmabey/pyLDAvis/blob/master/pyLDAvis/gensim.py#L13-L29
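
For readers following along without the link, here is a minimal sketch of the corpus-level inputs an LDAvis-style visualization needs besides the fitted model: the vocabulary, per-document lengths, and per-term frequencies. It is not the code in pyLDAvis's gensim.py. It assumes a gensim-style bag-of-words corpus (each document a list of (term_id, count) pairs) and a plain id-to-token dict; the function name `corpus_stats` and the toy data are hypothetical.

```python
def corpus_stats(corpus, id2token):
    """Return (vocab, doc_lengths, term_frequency), all in the same term order.

    corpus   -- iterable of documents, each a list of (term_id, count) pairs
    id2token -- dict mapping term id to token string
    """
    # Fix the vocabulary order once (ascending term id) and reuse it everywhere.
    term_ids = sorted(id2token)
    vocab = [id2token[i] for i in term_ids]
    position = {term_id: pos for pos, term_id in enumerate(term_ids)}

    doc_lengths = []
    term_frequency = [0] * len(vocab)
    for doc in corpus:
        # Document length is the total token count of the document.
        doc_lengths.append(sum(count for _, count in doc))
        for term_id, count in doc:
            term_frequency[position[term_id]] += count
    return vocab, doc_lengths, term_frequency


# Toy data, made up for illustration.
id2token = {0: "topic", 1: "model", 2: "gensim"}
corpus = [[(0, 2), (1, 1)], [(1, 3), (2, 1)], [(0, 1), (2, 2)]]
vocab, doc_lengths, term_frequency = corpus_stats(corpus, id2token)
```

The point of building `position` once is that every per-term array ends up in the same order as `vocab`, which is the property the visualization relies on.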

Thanks,
Ben


1. See the last three comments on this PR for an explanation of what is wrong with the gensim visualizations: https://github.com/cpsievert/LDAvis/issues/32
 

Radim Řehůřek

Jul 17, 2015, 7:08:03 AM
to gen...@googlegroups.com, b...@benmabey.com
Hello Ben,

that's a really awesome package, I just tried it :)

I also used my own data preparation (not your gensim.py), but I think I can see where the bug is:

The term-topic matrix is ordered differently than your vocabulary:

The order of words in
doesn't match the order of words in

As a result, I guess you'd get random labels for the relevant tokens.
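
To illustrate the kind of mismatch described above, here is a sketch with made-up data (not the actual pyLDAvis or gensim code). If the columns of the topic-term matrix follow term-id order while the vocabulary list is built in some other order, such as alphabetical, the top terms of each topic get the wrong labels. Building the vocabulary in the same term-id order fixes it. All names below are hypothetical.

```python
# Columns of topic_term are indexed by term id: 0, 1, 2.
id2token = {0: "zebra", 1: "apple", 2: "mango"}
topic_term = [
    [0.7, 0.2, 0.1],  # topic 0 is dominated by term id 0 ("zebra")
    [0.1, 0.1, 0.8],  # topic 1 is dominated by term id 2 ("mango")
]


def top_term(row, vocab):
    """Return the vocabulary label of the highest-probability column in a topic row."""
    best = max(range(len(row)), key=row.__getitem__)
    return vocab[best]


# Wrong: vocabulary ordered alphabetically, unrelated to the column order.
misaligned_vocab = sorted(id2token.values())
wrong_labels = [top_term(row, misaligned_vocab) for row in topic_term]

# Right: vocabulary ordered by term id, matching the matrix columns.
aligned_vocab = [id2token[i] for i in range(len(id2token))]
right_labels = [top_term(row, aligned_vocab) for row in topic_term]

print(wrong_labels)
print(right_labels)
```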

Re. extensions: it would be awesome to have a way of exploring the documents-topics relationship too -- same way you can now visualize terms-topics relationships. I don't think I can do this myself, but if you need assistance on the Python/gensim side, I'd be happy to help!

Best,
Radim

Ben Mabey

Jul 17, 2015, 11:31:13 AM
to gen...@googlegroups.com
Thanks Radim!

On 7/17/15 5:08 AM, Radim Řehůřek wrote:
Hello Ben,

that's a really awesome package, I just tried it :)

I also used my own data preparation (not your gensim.py), but I think I can see where the bug is:

The term-topic matrix is ordered differently than your vocabulary:

The order of words in
doesn't match the order of words in

As a result, I guess you'd get random labels for the relevant tokens.


Okay, thanks for pointing that out. The way I was calculating those before was more explicit about vocab ordering so maybe I should revert to something like that: https://github.com/bmabey/pyLDAvis/blob/e228444dab02511d99ff985cfd5c478664c6e3be/pyLDAvis/gensim.py#L25

Would you mind sharing with me how you prepared your data or suggest the best (accurate and efficient) way of doing it?

As an aside, it turns out that the main problem I was seeing in the gensim visualization is specific to the model fit and reveals a yet-to-be-fixed bug in [py]LDAvis (discussed here: https://github.com/cpsievert/LDAvis/issues/32).



Re. extensions: it would be awesome to have a way of exploring the documents-topics relationship too -- same way you can now visualize terms-topics relationships. I don't think I can do this myself, but if you need assistance on the Python/gensim side, I'd be happy to help!

Yes, I agree. The initial goal of LDAvis was never to be a corpus browser, but I think it could maybe list the top documents (a title or snippet of each) per topic. That way we could still keep the data sent to the browser minimal and avoid the need for a server component. (Right now, being able to ship a static file with the embedded JSON is a nice feature.) The original LDAvis paper, in its future work section, also makes a comment about visualizing correlations of topics to "provide insight into what is happening on the document level without actually displaying entire documents".
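
As a rough sketch of the "top documents per topic" idea (an assumption about how it could work, not an existing pyLDAvis feature), one could precompute a short list of titles per topic from the document-topic distributions and embed only that list in the JSON. The function name `top_docs_per_topic` and the toy data are hypothetical.

```python
import heapq
import json


def top_docs_per_topic(doc_topic, titles, n=2):
    """For each topic, return the n documents with the highest weight on that topic.

    doc_topic -- list of rows, one per document, each a list of topic weights
    titles    -- list of document titles, in the same order as doc_topic
    """
    num_topics = len(doc_topic[0])
    result = {}
    for k in range(num_topics):
        # Pick the indices of the n documents with the largest weight on topic k.
        best = heapq.nlargest(n, range(len(doc_topic)), key=lambda d: doc_topic[d][k])
        result[k] = [{"title": titles[d], "weight": doc_topic[d][k]} for d in best]
    return result


# Toy data, made up for illustration.
doc_topic = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.6, 0.4],
    [0.3, 0.7],
]
titles = ["doc A", "doc B", "doc C", "doc D"]
top_docs = top_docs_per_topic(doc_topic, titles, n=2)

# Only this small structure would need to travel with the static page.
payload = json.dumps(top_docs)
```

Because only n titles per topic are serialized, the payload grows with the number of topics rather than with the size of the corpus.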

If you have some concrete ideas feel free to add them to the project's GH issues page and tag it as an enhancement: https://github.com/bmabey/pyLDAvis/issues

Thanks again,
Ben



Akanksha Tiwari

Jul 21, 2015, 3:03:02 AM
to gen...@googlegroups.com
Hello Ben,

I am a newbie to topic modelling and I wanted to try the package, but I am having some issues installing it on a Windows machine that uses the gcc compiler provided with MinGW.

Actually, the problem is with the scikit-bio package, which is a requirement. I tried installing scikit-bio independently and realized that the problem was that the SSE2 extension was not being enabled, so I explicitly enabled it in the setup.py of scikit-bio and the package installed successfully.

But when installing pyLDAvis, it again seems to be compiling the C files in skbio.alignment._ssw_wrapper.c. I just wanted to know where in the pyLDAvis code I can specify the compiler argument '-msse2' to enable SSE2.

Regards,
Akanksha

Tolani Jaiye-Tikolo

Aug 11, 2015, 11:32:40 AM
to gensim

Hi Ben,



I just tried visualizing the gensim LDA with pyLDAvis. But unfortunately, I can't get it to display anything in the browser nor in the IPython notebook. Please find attached my example in this link.

Thanks
Tee

lanc...@gmail.com

Aug 7, 2017, 5:12:44 AM
to gensim
Hi, your link has expired. Have you solved the problem? I can't use pyLDAvis to display my LDA model. Every time, it says the kernel has died and is restarting. I'd like to know why.

On Tuesday, August 11, 2015 at 11:32:40 PM UTC+8, Tolani Jaiye-Tikolo wrote:

Ivan Menshikh

Aug 9, 2017, 2:12:31 AM
to gensim
Hello,

Take a look at this and this tutorial.

Hiba Aleqabie

Mar 15, 2018, 6:02:48 AM
to gensim
Hello Mr. Menshikh,
Can you share the tutorial you mentioned? The page has an error and is not showing anything.
Regards

Ivan Menshikh

Mar 15, 2018, 10:56:18 PM
to gensim

Hiba Aleqabie

Mar 15, 2018, 11:05:04 PM
to gen...@googlegroups.com
Thank you, sir.
