Infer document topic distributions for new data on learned model

Viacheslav Seledkin


Mar 13, 2017, 4:12:12 AM

to bigartm-users

Hello!

I am in the process of writing a Go wrapper for BigARTM, https://github.com/vseledkin/goartm. For now it supports loading a learned model and running inference on unseen documents via the ArtmRequestTransformMasterModel API method.

My concern is: why do the results vary from call to call? They are related to each other but significantly different. Do I have to set some additional parameters in MasterModelConfig, such as NumDocumentPasses? I tried that, but it made no difference. I understand that inference in such models is a kind of sampling and therefore probabilistic in nature, but having many passes or sampling cycles should stabilize the result.

How can I manage this?

Oleksandr Frei


Mar 14, 2017, 5:41:25 AM

to Viacheslav Seledkin, bigartm-users

Hi Viacheslav,

ArtmRequestTransformMasterModel should return the same results from run to run, so what you describe sounds like a bug. May I take a look at your script that triggers this behavior?

Also, let's try to simplify a few things:

1. If you use theta regularizers (SmoothSparseTheta or TopicSelectionTheta), could you please turn them off and let me know if you still observe non-deterministic behavior?

2. If you use MasterModelConfig.cache_theta = true or MasterModelConfig.reuse_theta = true, could you please try to switch off both options?

3. If you use ArtmInitializeModel, please set a fixed seed (for example InitializeModelArgs.seed = 123).

4. Do you observe non-deterministic results from ArtmRequestTransformMasterModel without import/export of the model? E.g. you fit the model with ArtmFitOfflineMasterModel (or ArtmFitOnlineMasterModel), and then immediately call ArtmRequestTransformMasterModel several times. Does this give the expected result?

5. Do you use in-memory processing of batches (ImportBatchesArgs / ArtmDisposeBatch)?

6. Please confirm that you don't use ArtmRequestProcessBatches. This is an older (and somewhat more feature-complete) counterpart of ArtmRequestTransformMasterModel; it has a flag, ProcessBatchesArgs.use_random_theta, and if that flag is set to true then non-deterministic behavior is expected. But I assume this is not your case.

If we don't find the cause, I'll try to reproduce this behavior and investigate.
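The repeated-call check from item 4 can be sketched in a few lines. This is only an illustration, not part of BigARTM: `transform` is a hypothetical callable standing in for whatever wrapper code calls ArtmRequestTransformMasterModel and returns the theta matrix as a list of rows, and the tolerance is an arbitrary choice.

```python
from typing import Callable, List, Tuple

Theta = List[List[float]]  # one row per document, one column per topic


def max_abs_diff(a: Theta, b: Theta) -> float:
    """Largest element-wise absolute difference between two theta matrices."""
    if len(a) != len(b):
        raise ValueError("theta matrices have different numbers of rows")
    worst = 0.0
    for row_a, row_b in zip(a, b):
        if len(row_a) != len(row_b):
            raise ValueError("theta rows have different lengths")
        for x, y in zip(row_a, row_b):
            worst = max(worst, abs(x - y))
    return worst


def check_determinism(
    transform: Callable[[], Theta], runs: int = 5, tol: float = 1e-6
) -> Tuple[bool, float]:
    """Call `transform` `runs` times and compare every result to the first one.

    Returns (is_deterministic, largest_difference_seen).
    """
    baseline = transform()
    worst = 0.0
    for _ in range(runs - 1):
        worst = max(worst, max_abs_diff(baseline, transform()))
    return worst <= tol, worst


def fake_transform() -> Theta:
    # Hypothetical stand-in for the real wrapper call; always returns the same theta.
    return [[0.7, 0.3], [0.1, 0.9]]


if __name__ == "__main__":
    ok, worst = check_determinism(fake_transform)
    print("deterministic:", ok, "max difference:", worst)
```

Running the same check once right after fitting and once after export/import of the model helps narrow down at which step the differences appear.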

Kind regards,

Alex


Viacheslav Seledkin


Mar 14, 2017, 9:39:49 AM

to Oleksandr Frei, bigartm-users

I found a bug in my batch creation procedure (the local batch dictionary). I am processing batch protobuf messages manually, since I do not have wrappers for the batch processing API methods yet. Now everything works as expected. Thanks!
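For anyone else building batches by hand, here is a rough sketch of the idea behind a local batch dictionary and a sanity check for it. It is not the actual goartm code and does not reproduce the exact bug: it uses plain Python dicts instead of protobuf messages, and the field names `token`, `item`, `token_id` and `token_weight` are assumptions modelled on my understanding of the BigARTM Batch message. The point is that each batch carries its own token list and every item refers to tokens by index into that list, so the indices must be assigned consistently.

```python
from typing import Dict, List


def build_batch(docs: List[List[str]]) -> Dict:
    """Build a batch-like dict with a local dictionary shared by all items.

    Token indices are assigned in order of first appearance, so the same
    input always produces the same batch.
    """
    token_index: Dict[str, int] = {}
    tokens: List[str] = []
    items = []
    for doc in docs:
        counts: Dict[int, float] = {}
        for word in doc:
            if word not in token_index:
                token_index[word] = len(tokens)
                tokens.append(word)
            idx = token_index[word]
            counts[idx] = counts.get(idx, 0.0) + 1.0
        ids = sorted(counts)  # fixed order of entries inside each item
        items.append({"token_id": ids, "token_weight": [counts[i] for i in ids]})
    return {"token": tokens, "item": items}


def validate_batch(batch: Dict) -> List[str]:
    """Return a list of problems found in the batch (empty if none)."""
    problems = []
    tokens = batch["token"]
    if len(set(tokens)) != len(tokens):
        problems.append("duplicate tokens in local dictionary")
    for n, item in enumerate(batch["item"]):
        if len(item["token_id"]) != len(item["token_weight"]):
            problems.append(f"item {n}: token_id and token_weight lengths differ")
        for i in item["token_id"]:
            if not 0 <= i < len(tokens):
                problems.append(f"item {n}: token_id {i} is outside the local dictionary")
    return problems


if __name__ == "__main__":
    batch = build_batch([["topic", "model", "topic"], ["model", "inference"]])
    print(batch)
    print(validate_batch(batch))
```

Building the same batch twice and comparing the results is a cheap way to catch a dictionary that is assembled in a different order on each run.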

--

------------------------------------------------------------------------------------

Viacheslav Seledkin

11001010101011101 make soft not bugs 010101001001011

The real part of any non-trivial zero of the ζ(x) = Σ (n^-x) is 1/2.

Is it true?
