Infer document topic distributions for new data on learned model


Viacheslav Seledkin

Mar 13, 2017, 4:12:12 AM
to bigartm-users
Hello!

I am in the process of writing a Go wrapper for BigARTM (https://github.com/vseledkin/goartm). For now it supports loading a learned model and running inference on unseen documents via the ArtmRequestTransformMasterModel API method. My concern is: why do the results vary from call to call? The results are clearly related to each other, but they differ noticeably. Do I have to set some additional parameters in MasterModelConfig, like NumDocumentPasses? I tried that, but it made no difference. I understand that inference in such models is a kind of sampling, which is probabilistic in nature, but many passes/sampling cycles should stabilize the result.
How do I manage this?

Oleksandr Frei

Mar 14, 2017, 5:41:25 AM
to Viacheslav Seledkin, bigartm-users
Hi Viacheslav,

ArtmRequestTransformMasterModel should return the same results from run to run, so what you describe sounds like a bug. May I take a look at your script that triggers this behavior?

Also, let's try to simplify a few things:

1. If you use theta regularizers (SmoothSparseTheta or TopicSelectionTheta), could you please turn them off and let me know if you still observe non-deterministic behavior?
2. If you use MasterModelConfig.cache_theta = true or MasterModelConfig.reuse_theta = true, could you please try to switch off both options?
3. If you use ArtmInitializeModel, please set a fixed seed (for example InitializeModelArgs.seed = 123).
4. Do you observe non-deterministic results from ArtmRequestTransformMasterModel without import/export of the model? E.g. you infer the model with ArtmFitOfflineMasterModel (or ArtmFitOnlineMasterModel), and then immediately call ArtmRequestTransformMasterModel several times. Does this give the expected result?
5. Do you use in-memory processing of batches (ImportBatchesArgs / ArtmDisposeBatch)?
6. Please confirm that you don't use ArtmRequestProcessBatches. This is an old (and somewhat more feature-complete) version of ArtmTransformMasterModel, which has a flag ProcessBatchesArgs.use_random_theta; if this is set to True, then non-deterministic behavior is expected. But I assume this is not your case.
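Step 4 boils down to running the transform several times on the same input and comparing the outputs. Below is a minimal Go sketch of such a check, under stated assumptions: `Theta`, `TransformFunc`, `maxAbsDiff` and `checkDeterminism` are hypothetical names, and the transform is passed in as a plain function instead of being bound to any real goartm or BigARTM call. A deterministic transform should report a deviation of 0 (or at most floating-point noise).

```go
package main

import (
	"fmt"
	"math"
)

// Theta is a hypothetical representation of a topic-distribution matrix:
// Theta[d][t] is the weight of topic t in document d.
type Theta [][]float64

// TransformFunc is a hypothetical placeholder for whatever wrapper call runs
// ArtmRequestTransformMasterModel on a fixed set of batches and returns theta.
type TransformFunc func() (Theta, error)

// maxAbsDiff returns the largest element-wise difference between two theta
// matrices, or an error if their shapes differ.
func maxAbsDiff(a, b Theta) (float64, error) {
	if len(a) != len(b) {
		return 0, fmt.Errorf("document count differs: %d vs %d", len(a), len(b))
	}
	maxDiff := 0.0
	for d := range a {
		if len(a[d]) != len(b[d]) {
			return 0, fmt.Errorf("topic count differs for document %d", d)
		}
		for t := range a[d] {
			maxDiff = math.Max(maxDiff, math.Abs(a[d][t]-b[d][t]))
		}
	}
	return maxDiff, nil
}

// checkDeterminism calls transform `runs` times on the same input and returns
// the largest deviation of any later result from the first one.
func checkDeterminism(transform TransformFunc, runs int) (float64, error) {
	first, err := transform()
	if err != nil {
		return 0, err
	}
	worst := 0.0
	for i := 1; i < runs; i++ {
		next, err := transform()
		if err != nil {
			return 0, err
		}
		diff, err := maxAbsDiff(first, next)
		if err != nil {
			return 0, err
		}
		worst = math.Max(worst, diff)
	}
	return worst, nil
}

func main() {
	// A fake, fully deterministic transform used only to exercise the check.
	fixed := func() (Theta, error) {
		return Theta{{0.7, 0.3}, {0.1, 0.9}}, nil
	}
	worst, err := checkDeterminism(fixed, 5)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("max deviation across runs: %g\n", worst)
}
```

In a real wrapper, `fixed` would be replaced by a closure that calls the transform on the same batches each time, so any non-zero deviation points at state that changes between calls.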

If we don't find the cause, I'll try to reproduce this behavior and investigate.

Kind regards,
Alex


--
You received this message because you are subscribed to the Google Groups "bigartm-users" group.

Viacheslav Seledkin

Mar 14, 2017, 9:39:49 AM
to Oleksandr Frei, bigartm-users
I found a bug in my batch creation procedure (in the batch's local dictionary). I build the batch protobuf messages manually, since I do not have wrappers for the batch processing API methods yet. Now everything works as expected! Thanks!
--

------------------------------------------------------------------------------------

Viacheslav Seledkin

11001010101011101 make soft not bugs 010101001001011

The real part of any non-trivial zero of the ζ(x) =  Σ (n^-x) is 1/2. 

Is it true?
