However, you cannot pickle a generator.
In my specific code, what is the cleanest way to save and load the
state of the example stream?
More generally, do you have a pattern for this, as a more general
pylearn concept?
Thanks!
def get_train_example():
    for l in open(HYPERPARAMETERS["TRAIN_SENTENCES"]):
        prevwords = []
        for w in l.split():
            w = w.strip()
            prevwords.append(wordmap.id(w))
            if len(prevwords) >= HYPERPARAMETERS["WINDOW_SIZE"]:
                yield prevwords[-HYPERPARAMETERS["WINDOW_SIZE"]:]
def get_train_minibatch():
    minibatch = []
    for e in get_train_example():
        minibatch.append(e)
        if len(minibatch) >= HYPERPARAMETERS["MINIBATCH_SIZE"]:
            assert len(minibatch) == HYPERPARAMETERS["MINIBATCH_SIZE"]
            yield minibatch
            minibatch = []
class ExampleStream(object):
    def __init__(self):
        self.file = open(HYPERPARAMETERS["TRAIN_SENTENCES"])
        self.prevwords = []
        self.line = []
        self.line_pos = 0
        self.line_no = 0
        ...
    def __iter__(self):
        return self
    def next(self):
        # Refill the token buffer from the next non-empty line.
        while self.line_pos == len(self.line):
            row = self.file.readline()
            if not row:
                raise StopIteration  # end of file
            self.line = row.split()
            self.line_pos = 0
            self.line_no += 1
        token = self.line[self.line_pos]
        self.line_pos += 1
        return token
Rewrite your other function as another class with a __iter__ function
and a next() function.
Now you can use "for token in ExampleStream()"-style syntax, but you
can also make your class picklable (overriding __setstate__ and
__getstate__ as necessary, to seek through the example file when
reloading the stream).
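Here is a minimal sketch of that pickling idea. Everything here is hypothetical: TokenStream and its path argument stand in for the HYPERPARAMETERS["TRAIN_SENTENCES"] lookup, and the assumption is that recording the file offset with tell() in __getstate__ and restoring it with seek() in __setstate__ is enough to resume the stream.

```python
import pickle

class TokenStream(object):
    """Iterate over whitespace-separated tokens in a text file.

    Picklable: __getstate__ replaces the (unpicklable) file object
    with its byte offset, and __setstate__ reopens and seeks.
    """

    def __init__(self, path):
        self.path = path
        self.file = open(self.path)
        self.line = []      # tokens of the current line
        self.line_pos = 0   # index of the next token to return

    def __iter__(self):
        return self

    def __next__(self):
        # Refill the token buffer from the next non-empty line.
        while self.line_pos == len(self.line):
            row = self.file.readline()
            if not row:
                raise StopIteration  # end of file
            self.line = row.split()
            self.line_pos = 0
        token = self.line[self.line_pos]
        self.line_pos += 1
        return token

    next = __next__  # Python 2 spelling of the iterator protocol

    def __getstate__(self):
        state = self.__dict__.copy()
        state["offset"] = self.file.tell()
        del state["file"]  # file handles cannot be pickled
        return state

    def __setstate__(self, state):
        offset = state.pop("offset")
        self.__dict__.update(state)
        # Reopen the example file and seek back to where we stopped.
        self.file = open(self.path)
        self.file.seek(offset)
```

The buffered tokens (self.line, self.line_pos) travel through the pickle as ordinary state, so the stream resumes mid-line correctly; only the file handle itself is rebuilt on load.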
There may be an easier way if you permit yourself more freedom in
terms of restructuring the code, but that would be the persistable way
of doing what you're already doing with generators.
James
Thanks.
> Rewrite your other function as another class with a __iter__ function
> and a next() function.
It turns out that you can just put the current generator code into
__iter__ and 'yield' the results.
You then don't even need a next() method.
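For example, a hypothetical sketch of that simplification (path and window_size stand in for the HYPERPARAMETERS lookups; the body of __iter__ is the old generator code, essentially unchanged):

```python
class ExampleStream(object):
    """Yield sliding windows of tokens from a text file.

    __iter__ is itself a generator function, so each for-loop over
    an instance gets a fresh generator; no next() method is needed.
    """

    def __init__(self, path, window_size):
        self.path = path
        self.window_size = window_size

    def __iter__(self):
        for l in open(self.path):
            prevwords = []  # windows do not cross line boundaries
            for w in l.split():
                prevwords.append(w)
                if len(prevwords) >= self.window_size:
                    yield prevwords[-self.window_size:]
```

Note the trade-off versus the explicit next() class: with a generator inside __iter__, the iteration state lives in the generator again, so you are back to something that cannot be pickled mid-stream.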