Perplexity calculation for language models


tokes...@gmail.com

Aug 18, 2015, 12:25:14 PM
to Keras-users
I am following along with the lstm_text_generation.py example and trying to get perplexity scores for new sentences. I want to make sure I am doing this correctly.

Steps:
for a new string:
    - grab segments (x) and the next character (next_char) of that segment
    - model.predict(x) gives probabilities of the output (preds)
    - grab the probability of the next_char in preds so this is p(next_char | segment)
    - use the probabilities for the entropy calculation

Here is some sample code, thanks.

import numpy as np

# Assumes the globals from lstm_text_generation.py are in scope:
# model, chars, char_indices, maxlen, step

def perplexity(string):
    # keep only characters that are in the model's vocabulary
    string = ''.join([c for c in string.lower() if c in char_indices])

    p = []  # p(next_char | segment) for each window
    sentences = []
    next_chars = []

    # slide a window of length maxlen over the string, step characters at a time
    for i in range(0, len(string) - maxlen, step):
        sentences.append(string[i : i + maxlen])
        next_chars.append(string[i + maxlen])

    count = len(sentences)
    print(count)

    # one-hot encode each window
    x = np.zeros((count, maxlen, len(chars)))
    for i, sentence in enumerate(sentences):
        for t, char in enumerate(sentence):
            x[i, t, char_indices[char]] = 1.0
    preds = model.predict(x)

    # probability the model assigned to the actual next character
    for i, next_char in enumerate(next_chars):
        p.append(preds[i, char_indices[next_char]])

    # base-2 perplexity: 2 to the average negative log2-likelihood per character
    return np.power(2, -np.log2(p).sum() / count)
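
As a sanity check on the return line: the base-2 perplexity 2^(-(1/N) * sum(log2 p_i)) is algebraically the same as the geometric-mean form (prod p_i)^(-1/N). A minimal self-contained sketch with hypothetical probabilities (no model needed):

```python
import numpy as np

# hypothetical per-character probabilities p(next_char | segment)
p = np.array([0.5, 0.25, 0.125])
count = len(p)

# exponentiated average negative log2-likelihood, as in the function above
ppl_log = np.power(2, -np.log2(p).sum() / count)

# equivalent geometric-mean form
ppl_geo = np.prod(p) ** (-1.0 / count)

print(ppl_log, ppl_geo)  # both 4.0
```

If the two disagree on your own data, the usual culprit is mixing log bases (np.log vs np.log2) between the entropy sum and the exponentiation.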

cedric....@gmail.com

May 1, 2016, 2:30:04 PM
to Keras-users, tokes...@gmail.com
Hi,

I'm stuck on the same issue right now.

Have you by any chance figured it out?

Kind regards
Cedric

geo...@sanctuary.ai

May 14, 2018, 4:31:34 PM
to Keras-users
Did you ever get confirmation that this is correct?