How to use the replay buffer in tf_agents for a contextual bandit that predicts and trains on a daily basis

T V

Apr 27, 2022, 1:09:45 PM

to TensorFlow Developers

I am using the tf_agents library for a contextual bandits use case.

In this use case, predictions are made several times a day (between 20k and 30k predictions per day, one for each user), and training happens only on all the data predicted 4 days earlier, since the labels for predictions take 3 days to be observed.
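The schedule above reduces to simple date arithmetic. Below is a minimal sketch, assuming a "day" is a calendar date and that the labels for a prediction made on day d are complete once more than 3 days have passed (so they are usable on day d + 4). The helper names are hypothetical and not part of tf_agents.

```python
from datetime import date, timedelta
from typing import List

LABEL_DELAY_DAYS = 3  # labels take 3 days to be observed (from the post)
TRAIN_LAG_DAYS = 4    # train on predictions made 4 days earlier (from the post)


def training_date(today: date) -> date:
    """Prediction date whose experience is trained on `today`."""
    return today - timedelta(days=TRAIN_LAG_DAYS)


def labels_ready(prediction_date: date, today: date) -> bool:
    """Assumption: labels are usable once more than LABEL_DELAY_DAYS have passed."""
    return today - prediction_date > timedelta(days=LABEL_DELAY_DAYS)


def retained_dates(today: date) -> List[date]:
    """Prediction dates that must still be stored at the start of `today`, oldest first."""
    return [today - timedelta(days=k) for k in range(TRAIN_LAG_DAYS, 0, -1)]
```

The oldest retained date is always the training date, which is why keeping exactly 4 days of history is enough.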

The driver seems to collect only batch_size experiences at a time (since the maximum step length is 1 for contextual bandits), and the replay buffer seems to have the same constraint, handling only batch_size experiences.
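For reference, tf_agents' `TFUniformReplayBuffer(data_spec, batch_size, max_length)` stores up to `max_length` items per batch slot, so its total capacity is `batch_size * max_length`, and each `add_batch` call writes one item per slot. Repeated driver steps therefore accumulate in the buffer rather than being capped at `batch_size`. The sketch below only does the sizing arithmetic for the figures in the post (30,000 observations per run, batch size 16, 3 runs per day, 4 days kept); the helper names are hypothetical, and it assumes a final partial batch still occupies a full step.

```python
import math

# Figures taken from the post.
OBS_PER_RUN = 30_000
BATCH_SIZE = 16
RUNS_PER_DAY = 3
DAYS_KEPT = 4


def steps_per_run(obs_per_run: int, batch_size: int) -> int:
    """Batched steps needed to cover one run (one item per slot per step)."""
    return math.ceil(obs_per_run / batch_size)


def required_max_length(obs_per_run: int, batch_size: int,
                        runs_per_day: int, days_kept: int) -> int:
    """Per-slot max_length so that batch_size * max_length holds the whole window."""
    return steps_per_run(obs_per_run, batch_size) * runs_per_day * days_kept


max_length = required_max_length(OBS_PER_RUN, BATCH_SIZE, RUNS_PER_DAY, DAYS_KEPT)
capacity = BATCH_SIZE * max_length
print(steps_per_run(OBS_PER_RUN, BATCH_SIZE), max_length, capacity)  # → 1875 22500 360000
```

Note that a single uniform buffer has no notion of which day an item came from, so sampling from it or calling `gather_all()` would mix all 4 days; that is one reason to keep each day (or each run) separate.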

I want to use a checkpointer to save all the predictions (the experience from the driver, stored in the replay buffer) from the past 4 days and, on each day, train only on the first (oldest) of the 4 saved days.
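The "keep 4 days, train on the oldest" idea is a first-in, first-out window keyed by date. Below is a minimal in-memory sketch; `DailyExperienceWindow` is a hypothetical class (not a tf_agents API), and each "run" is whatever one prediction run produced, for example the trajectories a driver observer collected. It does not persist anything, so separate daily jobs would still need to save it somehow.

```python
from collections import deque
from datetime import date, timedelta
from typing import Any, List, Optional, Tuple


class DailyExperienceWindow:
    """Hypothetical in-memory FIFO of per-day experience (not a tf_agents class).

    Each entry is (prediction_date, runs), where `runs` holds one item per
    prediction run made that day, in the order the runs were added.
    """

    def __init__(self, lag_days: int = 4):
        self.lag_days = lag_days  # train on the day this many days back
        self._days: deque = deque()

    def add_run(self, day: date, run: Any) -> None:
        # Open a new entry for a new day, otherwise append to that day's entry.
        if not self._days or self._days[-1][0] != day:
            self._days.append((day, []))
        self._days[-1][1].append(run)

    def pop_training_day(self, today: date) -> Optional[Tuple[date, List[Any]]]:
        cutoff = today - timedelta(days=self.lag_days)
        # Drop stale days whose training date was missed (e.g. a skipped job).
        while self._days and self._days[0][0] < cutoff:
            self._days.popleft()
        # Release the oldest day only when it is exactly `lag_days` old.
        if self._days and self._days[0][0] == cutoff:
            return self._days.popleft()
        return None
```

Because release is keyed on dates rather than on the window length, it does not matter whether today's runs are added before or after popping the training day.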

I am unsure how to do the following; any help is greatly appreciated.

1. How to run the driver and save the replay buffer using checkpoints for an entire day. A day contains, say, 3 prediction runs, and each run makes predictions on 30,000 observations (with, say, a batch size of 16), so I need multiple saves per day.
2. How to save the replay buffers for the past 4 days (12 prediction runs) and retrieve only the first 3 prediction runs (the replay buffer and the driver run) to train on each day.
3. How to configure the driver, replay buffer and checkpointer given #1 and #2 above.
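One way to get "multiple saves per day" (#1) and "retrieve only the oldest day's runs" (#2) is to save each prediction run separately under a per-day directory and prune old directories after training. Below is a minimal sketch using only the standard library: pickle files stand in for checkpoints, and the function names and the `<root>/<YYYY-MM-DD>/run_<k>.pkl` layout are hypothetical. Within tf_agents, an option to explore is a `tf_agents.utils.common.Checkpointer` constructed with `replay_buffer=...` and pointed at a per-run directory, but whether that fits this schedule is untested here.

```python
import pickle
import shutil
from datetime import date
from pathlib import Path
from typing import Any, List


def save_run(root: Path, day: date, run_idx: int, records: Any) -> Path:
    """Write one prediction run to <root>/<YYYY-MM-DD>/run_<k>.pkl."""
    day_dir = root / day.isoformat()
    day_dir.mkdir(parents=True, exist_ok=True)
    path = day_dir / f"run_{run_idx}.pkl"
    with path.open("wb") as f:
        pickle.dump(records, f)
    return path


def load_day(root: Path, day: date) -> List[Any]:
    """Load every run saved for `day`, in run order; [] if none exist."""
    day_dir = root / day.isoformat()
    if not day_dir.is_dir():
        return []
    # Sort numerically so run_10 comes after run_9.
    paths = sorted(day_dir.glob("run_*.pkl"), key=lambda p: int(p.stem.split("_")[1]))
    runs = []
    for p in paths:
        with p.open("rb") as f:
            runs.append(pickle.load(f))
    return runs


def prune_before(root: Path, cutoff: date) -> None:
    """Delete every day directory dated strictly before `cutoff`."""
    for day_dir in root.iterdir():
        if day_dir.is_dir() and date.fromisoformat(day_dir.name) < cutoff:
            shutil.rmtree(day_dir)
```

A daily job would then call `load_day(root, today - timedelta(days=4))` to get the 3 runs to train on, call `prune_before(root, today - timedelta(days=3))` once training succeeds, and call `save_run` after each of the day's prediction runs.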