Add or delete topics?


Svetozar Lashin

Nov 2, 2016, 12:26:00 PM
to bigart...@googlegroups.com
Dear bigartm users,

I have two very simple questions, but either my math is not good enough to find the answers or there is something wrong with the description(s).

1. Is it possible to exclude some topics while using a trained model for categorizing new texts?
2. Is it possible to use custom topics in the same situation, like, "here's a bunch of texts, this is a new topic, it is called custom_topic1"?

Best wishes,
Svetozar

Oleksandr Frei

Nov 2, 2016, 12:53:46 PM
to Svetozar Lashin, bigartm-users
Hi

The answer depends on which API you use. Is it the Python API or the BigARTM command-line interface?

1. In the Python API, removing topics is possible but not documented. I can provide an example. If you use the command-line interface, this is not supported in BigARTM v0.8.1, but this pull request has already implemented the feature, so it will be available in BigARTM v0.8.2.
2. It depends... To add new topics you'd have to define the corresponding p(w|t) values in the phi matrix. Would you like to supply your own p(w|t) values for the new topics, or would you rather add more topics and let BigARTM infer them from some new documents?

Kind regards,
Alex



Svetozar Lashin

Nov 3, 2016, 3:39:59 AM
to bigartm-users
Thank you, Alex.

1. Right now I am trying to use the Python API, but that may change depending on my future goals. So, as soon as the feature is completely available, it will also be documented, right?

2. Rather the second option, I think.

Best,
Svetozar

Oleksandr Frei

Dec 9, 2016, 4:05:31 AM
to bigartm-users
Hi Svetozar,

It has been a long time since we discussed adding/removing topics from an existing model in BigARTM, but finally there is a solution (on the master branch):

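import artm

model = artm.ARTM(topic_names=...)
topic_names = list(model.topic_names)  # Copy the current topic names
del topic_names[5]   # Delete, let's say, the topic at index 5 (zero-based)
model.reshape_topics(topic_names)  # Keep only the columns that are still listed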

This will exclude the column at index 5 from the phi matrix. You may also use reshape_topics() to add or reorder columns. If you add columns, they will be initialized with zeros.
Note that there is also a setter for the ARTM.topic_names field, but it only allows you to rename existing topics, i.e. the list of topic names must be of the same length, and the names will be set as new labels for the existing columns (no add/delete or reorder in the phi matrix).
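
To make the difference between the two operations concrete, here is a minimal pure-Python sketch of the semantics described above. It does not call BigARTM at all: the phi matrix is modelled as a plain dict of rows, and the helper names reshape_columns and rename_columns are hypothetical, chosen only for this illustration.

```python
def reshape_columns(phi, old_names, new_names):
    """Mimic the described reshape: keep, drop, reorder, or add columns by name.

    phi maps each word to a list of p(w|t) values, one per name in old_names.
    Columns whose names are not in old_names are new and start at zero.
    """
    index = {name: i for i, name in enumerate(old_names)}
    return {
        word: [row[index[name]] if name in index else 0.0 for name in new_names]
        for word, row in phi.items()
    }


def rename_columns(old_names, new_names):
    """Mimic the described setter: relabel only, so the length must not change."""
    if len(new_names) != len(old_names):
        raise ValueError("renaming requires the same number of topic names")
    return list(new_names)


old_names = ["topic_0", "topic_1", "topic_2"]
phi = {"cat": [0.7, 0.1, 0.2], "dog": [0.3, 0.9, 0.8]}

# Drop topic_1, move topic_2 to the front, and add a new zero-filled column.
new_names = ["topic_2", "topic_0", "custom_topic1"]
new_phi = reshape_columns(phi, old_names, new_names)

# Relabel the original columns without touching any values.
renamed = rename_columns(old_names, ["sports", "politics", "science"])
```

In this toy model the added custom_topic1 column is all zeros, which mirrors the note above that added columns are initialized with zeros and still need values before they are useful.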

This is documented here:
And here:

If you add new columns, you'll have to initialize them manually via model.master.attach_model(). So there is still no out-of-the-box solution in BigARTM to infer new topics from new documents, but I know a few people who are looking into this and will hopefully come up with a solution.
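
As a rough illustration of what "initializing a new column manually" could mean, the sketch below estimates p(w|t) for a custom topic from a handful of texts by plain relative word frequency. This is an assumption made for the example, not BigARTM's inference, and the helper estimate_topic_column is hypothetical. Writing the resulting values into the model would still go through model.master.attach_model(), which is not shown here.

```python
from collections import Counter


def estimate_topic_column(texts, vocabulary):
    """Return a dict word -> p(w|t) from relative frequencies in texts.

    Only words present in the vocabulary are counted, so the values sum to 1
    over the vocabulary. Tokenization is a naive lowercase whitespace split.
    """
    vocab = set(vocabulary)
    counts = Counter(
        word for text in texts for word in text.lower().split() if word in vocab
    )
    total = sum(counts.values())
    if total == 0:
        raise ValueError("none of the texts contain vocabulary words")
    return {word: counts[word] / total for word in vocabulary}


vocabulary = ["cat", "dog", "fish"]
texts = ["cat cat dog", "Dog cat"]
column = estimate_topic_column(texts, vocabulary)  # cat: 3 of 5, dog: 2 of 5
```

Words that never occur in the example texts get probability zero here; in practice some smoothing would probably be needed, but that choice is outside what this thread covers.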

Kind regards,
Alex