Dear BigARTM team!
Thanks for your product!
I am trying to build a model to mine economic content, similar to the work of Murat Apishev and colleagues (Mining Ethnic Content Online with Additively Regularized Topic Models). I have about 150000 documents, and the collection's dictionary has about 710000 terms (words and the most common bigrams). I also have a dictionary of the economic terms that I want to mine (about 3000 terms: words and bigrams). For each term in this economic dictionary I know which topic it belongs to; there are 22 topics.
I’ve built an ARTM model (modifying the dictionary values for economic and non-economic words as Murat suggested, and using the most common regularizers). What I want to do now is add a-priori information about each economic word’s class. The classes do not intersect and are unbalanced (some contain about 10-20 economic terms, others about 600).
Can LabelRegularizationPhiRegularizer help with that? I saw examples of how it works for document classification tasks, but that does not work for me, because the economic content I want to mine is a small part of my collection and I am not interested in the rest of it.
Thanks in advance for any suggestions!
Very sincerely,
Maria
Dear Alex, thanks for your reply!
The custom dictionary part is clear, thanks for the detailed documentation! But my question was how to add a-priori information about each word’s class to this ‘white list’ of words. I applied DecorrelatorPhiRegularizer; it works, but not in the way I need. The idea is to tell the model that terms w11,…,w1s1 always belong to class1, w21,…,w2s2 always belong to class2, …, and w221,…,w22s22 always belong to class22.
That is something like this: http://www.machinelearning.ru/wiki/images/2/22/Voron-2013-ptm.pdf p.35, section 5.2, see paragraph “Topic relevance data” on p.36.
Thanks for your time!
Very sincerely,
Maria
Hi,
In this case you may use SmoothSparsePhi with a custom dictionary, as described here: http://docs.bigartm.org/en/stable/tutorials/python_userguide/regularizers_and_scores.html (look at the end of this tutorial, starting from "Let’s return to the dictionaries").
Kind regards,
Alex
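To make the "terms of class k belong only to topic k" idea concrete, here is a minimal pure-Python sketch of the per-topic coefficients such a setup would need. It does not call BigARTM at all. The toy terms, the topic names, the helper `build_phi_prior`, and the +1/-1 values are all hypothetical and chosen only for illustration. A positive coefficient means "smooth this term in this topic" and a negative one means "sparse it". One plausible way to use the result, stated here as an assumption rather than a tested recipe, is to create one SmoothSparsePhi regularizer per topic, restricted to that topic and given a custom dictionary carrying these values.

```python
# Hypothetical toy data: class label -> economic terms known to belong to it.
class_terms = {
    "class1": ["inflation", "interest_rate"],
    "class2": ["unemployment"],
}
# Hypothetical extra topics meant to absorb the non-economic part of the collection.
background_topics = ["background1"]


def build_phi_prior(class_terms, background_topics, smooth=1.0, sparse=-1.0):
    """Return {topic: {term: coefficient}} for the white-listed terms.

    A term gets `smooth` (positive) in the topic of its own class and
    `sparse` (negative) in every other topic, including background topics.
    Terms outside the white list get no entry, i.e. no prior at all.
    """
    topics = list(class_terms) + list(background_topics)
    prior = {topic: {} for topic in topics}
    for own_class, terms in class_terms.items():
        for term in terms:
            for topic in topics:
                prior[topic][term] = smooth if topic == own_class else sparse
    return prior


prior = build_phi_prior(class_terms, background_topics)
for topic, coefficients in prior.items():
    print(topic, coefficients)
```

Because the classes are unbalanced (10-20 terms in some, about 600 in others), the `smooth` and `sparse` magnitudes may need to differ per class; the sketch keeps them constant for simplicity.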