Epoch Logger for LDA Model

Felix Selgert

Jun 29, 2024, 4:16:20 PM
to Gensim
Hi All,
I want to build an epoch logger for Gensim's LDA model, using this example as a blueprint: https://radimrehurek.com/gensim/models/callbacks.html
I tried replacing the "CallbackAny2Vec" class with a callback class that the LDA model supports (Perplexity, Coherence, etc.), but I always get an AttributeError:
'EpochLogger' object has no attribute 'logger'

Does anyone know whether it is possible at all to have a simple epoch logger that prints information at the start and end of each training epoch?

Many thanks,
Felix
 

Gordon Mohr

Jul 1, 2024, 3:37:15 AM
to Gensim
Unfortunately the docs page you've linked isn't as clear as it probably should be about differing behaviors/requirements/examples for different models. Still, from a quick glance at it & related source code, I think what you want to do should be possible in just a few lines of code. 

Without seeing your code or the full error traceback, it's hard to guess what might be going wrong. (The error message excerpt you've shown implies some code may be trying to access a `logger` attribute that's not defined.) Can you show your `EpochLogger` code & the full traceback you're getting?

- Gordon

Felix Selgert

Jul 11, 2024, 2:00:54 PM
to gen...@googlegroups.com

Dear Gordon, thank you for taking this up. I appreciate your help. This is my code:

import gensim
from gensim.models.callbacks import CoherenceMetric

class EpochLogger(CoherenceMetric):
    
    def __init__(self):
        self.epoch = 0

    def on_epoch_begin(self, model):
        print("Epoch #{} start".format(self.epoch))

    def on_epoch_end(self, model):
        print("Epoch #{} end".format(self.epoch))
        self.epoch += 1


id2word = dictionary
corpus = bow_corpus
epoch_logger = EpochLogger()

lda_model = gensim.models.ldamodel.LdaModel(corpus=corpus,
                                                id2word=id2word,
                                                num_topics=10, # number of topics
                                                random_state=100, # random state for reproducing the results
                                                chunksize=1000, # number of documents loaded into memory at once
                                                update_every=1, # number of chunks processed at once
                                                eval_every=10, # estimation of log-perplexity (measure of model quality)
                                                passes=10, # number of times the algorithm goes through the entire dataset
                                                alpha='symmetric', # prior for the document-topic distribution
                                                eta ='symmetric',  # prior for the topic-word distribution
                                                iterations=100, # number of iterations
                                                per_word_topics=True, # if set to True, the model also computes a list of the most likely topics for each word
                                                callbacks=[epoch_logger])


And this is the full traceback:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[16], line 6
      3 corpus = bow_corpus
      4 epoch_logger = EpochLogger()
----> 6 lda_model = gensim.models.ldamodel.LdaModel(corpus=corpus,
      7                                                 id2word=id2word,
      8                                                 num_topics=10, # number of topics
      9                                                 random_state=100, # random state for reproducing the results
     10                                                 chunksize=1000, # number of documents loaded into memory at once
     11                                                 update_every=1, # number of chunks processed at once
     12                                                 eval_every=10, # estimation of log-perplexity (measure of model quality)
     13                                                 passes=10, # number of times the algorithm goes through the entire dataset
     14                                                 alpha='symmetric', # prior for the document-topic distribution
     15                                                 eta ='symmetric',  # prior for the topic-word distribution
     16                                                 iterations=100, # number of iterations
     17                                                 per_word_topics=True, # if set to True, the model also computes a list of the most likely topics for each word
     18                                                 callbacks=[epoch_logger])
     21 # To see what gensim is doing, we have to look in the log file!

File /opt/conda/lib/python3.11/site-packages/gensim/models/ldamodel.py:521, in LdaModel.__init__(self, corpus, num_topics, id2word, distributed, chunksize, passes, update_every, alpha, eta, decay, offset, eval_every, iterations, gamma_threshold, minimum_probability, random_state, ns_conf, minimum_phi_value, per_word_topics, callbacks, dtype)
    519 use_numpy = self.dispatcher is not None
    520 start = time.time()
--> 521 self.update(corpus, chunks_as_numpy=use_numpy)
    522 self.add_lifecycle_event(
    523     "created",
    524     msg=f"trained {self} in {time.time() - start:.2f}s",
    525 )

File /opt/conda/lib/python3.11/site-packages/gensim/models/ldamodel.py:973, in LdaModel.update(self, corpus, chunksize, decay, offset, passes, update_every, eval_every, iterations, gamma_threshold, chunks_as_numpy)
    970 if self.callbacks:
    971     # pass the list of input callbacks to Callback class
    972     callback = Callback(self.callbacks)
--> 973     callback.set_model(self)
    974     # initialize metrics list to store metric values after every epoch
    975     self.metrics = defaultdict(list)

File /opt/conda/lib/python3.11/site-packages/gensim/models/callbacks.py:480, in Callback.set_model(self, model)
    478     # store diff diagonals of previous epochs
    479     self.diff_mat = Queue()
--> 480 if any(metric.logger == "visdom" for metric in self.metrics):
    481     if not VISDOM_INSTALLED:
    482         raise ImportError("Please install Visdom for visualization")

File /opt/conda/lib/python3.11/site-packages/gensim/models/callbacks.py:480, in <genexpr>(.0)
    478     # store diff diagonals of previous epochs
    479     self.diff_mat = Queue()
--> 480 if any(metric.logger == "visdom" for metric in self.metrics):
    481     if not VISDOM_INSTALLED:
    482         raise ImportError("Please install Visdom for visualization")

AttributeError: 'EpochLogger' object has no attribute 'logger'


I am running the code in a jupyter notebook on the jupyter hub of our university.

Many thanks,
Felix


Gordon Mohr

Jul 11, 2024, 3:07:55 PM
to Gensim
I'm not sure I fully understand the code around the (LDA-focused) `Callback` class – from naming, it seems it may have aspired to be an interface of extension, but in fact it just serves as a concrete collection of `Metric` items inside a model. (It's not easy to create your own specialized `Callback` to be supplied instead; you can only supply custom `Metric`s, as you seem to have discovered.)

Despite the fact that the nearby `CallbackAny2Vec` (for the other word2vec/etc models) has an `on_epoch_begin()` method, the LDA-related `Callback` and assorted `Metric` classes do not, so there isn't an easy place to hang your code that you want to run at an epoch's beginning. 

And specifically, from the traceback, it appears existing code is assuming that your `EpochLogger` (and any other `metric` it might handle) would have a `.logger` attribute, and your `EpochLogger` instance doesn't. And looking at the `CoherenceMetric` class from which you've extended your `EpochLogger`, its `__init__()` (if it had been called) would have ensured the instance has at least a `.logger` of `None` (if nothing else specified). 
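
As a rough illustration of that last point, here is a minimal pure-Python sketch. `StandInMetric` is a hypothetical stand-in, not gensim's actual `CoherenceMetric`; it only mimics the assumed behavior of an `__init__()` that sets `.logger`. A subclass that overrides `__init__()` without calling the parent's version never gets the attribute:

```python
class StandInMetric:
    """Hypothetical stand-in for a gensim metric class; not gensim code."""

    def __init__(self, logger=None):
        # The parent constructor is what creates the `.logger` attribute.
        self.logger = logger


class BrokenLogger(StandInMetric):
    def __init__(self):
        # Parent __init__() is never called, so `.logger` is never set.
        self.epoch = 0


class FixedLogger(StandInMetric):
    def __init__(self):
        super().__init__()  # sets self.logger = None
        self.epoch = 0


broken_has_logger = hasattr(BrokenLogger(), "logger")  # attribute is missing
fixed_logger_value = FixedLogger().logger              # attribute exists, is None
```

Whether calling `super().__init__()` alone would be enough with the real `CoherenceMetric` is a separate question, since that class presumably expects further arguments to compute anything useful.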

Though I'm not sure any of these would be enough, some suggested steps could be:

* derive your `EpochLogger` from the more-generic `Metric`, rather than a specific `CoherenceMetric` which has extra functionality you don't need or use

* ensure your new logging pseudo-`Metric` behaves like the others in ways assumed by the other code - for example, has at the very least a `.logger` value that is `None` – if not `'shell'`, see below. (There may be other implied requirements you'll have to tackle as you hit them.)

* don't rely on your custom `Metric`'s methods `on_epoch_end()` or `on_epoch_begin()` being called - that's not a behavior of the LDA model & `Callback`. Only the (hard-to-customize) `.on_epoch_end()` of the (hard-to-swap-out) `Callback` class is called, which then asks for each `Metric`'s value via `.get_value()`. (It looks like `.get_value()` is only called once at the end of each epoch, but I'm not sure that's a guarantee, and none of the parameters shared with `.get_value()` can help determine things like which epoch has been reached.)

* it appears that simply by returning a `.logger` of `'shell'`, you can get the arbitrary value your `Metric` returns from `.get_value()` reported at the end of each epoch; if that (or other custom behavior upon each call to `.get_value()`) isn't enough to achieve your ultimate aim, you might need to look into deeper customization of the `LdaModel` class, as this idiosyncratic 'callback' system isn't very general.
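
Putting the steps above together, an untested sketch might look like the following. It assumes, from the traceback and a quick reading of the source, that `Callback` only reads `.logger` from each metric, calls `str()` on it, and calls `.get_value()` with keyword arguments once per pass. The `title` and `viz_env` attributes are set defensively on the same assumption, and the `ImportError` fallback exists only so the class can be tried without gensim installed:

```python
try:
    # Base class suggested above.
    from gensim.models.callbacks import Metric
except ImportError:
    # Fallback so the class below can still be exercised without gensim.
    Metric = object


class EpochLogger(Metric):
    """Pseudo-metric that reports each finished epoch (pass)."""

    def __init__(self, logger=None, title="EpochLogger"):
        # Attributes that gensim's `Callback` is assumed to read from a metric.
        self.logger = logger
        self.title = title
        self.viz_env = None
        self.epoch = 0

    def __str__(self):
        return self.title

    def get_value(self, **kwargs):
        # Assumed to be called once at the end of every pass; kwargs are ignored.
        print("Epoch #{} end".format(self.epoch))
        self.epoch += 1
        return self.epoch  # number of epochs completed so far


# Stand-alone check that simulates three end-of-epoch calls.
epoch_logger = EpochLogger()
values = [
    epoch_logger.get_value(topics=None, model=None, other_model=None)
    for _ in range(3)
]
```

An instance would then be passed as `callbacks=[EpochLogger()]` to `LdaModel`, as in the original post. There is still no hook for the start of an epoch, and with `logger='shell'` the returned value would presumably go through gensim's own logging rather than `print()`, so it would only be visible if Python logging is configured to show INFO messages.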

- Gordon
