I'm wondering whether the following sort of inference problem can be solved using a negative log likelihood function. The following example should serve.
Imagine that we have three groups of entities and some observations of these entities. We suspect that the observations are modeled by a random draw from a normal distribution, so we want to estimate the parameters of that distribution given the group. For the sake of the example, imagine that group 1 produces observations x1 ~ N(1, .1), group 2 produces x2 ~ N(2, .2), and group 3 produces x3 ~ N(3, .3). Can I feed data that fits this description to a model that will estimate these distributions?
Here's some code:
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

# Build synthetic data; 200 samples from each group as described.
size = 200
samples = tfd.Normal([1., 2., 3.], [.1, .2, .3]).sample(size)
x1 = np.ones(size).reshape(-1,1)
x1 = np.concatenate([x1, samples[:,0].numpy().reshape(-1,1)], axis = 1)
x2 = np.ones(size).reshape(-1,1)*2
x2 = np.concatenate([x2, samples[:,1].numpy().reshape(-1,1)], axis = 1)
x3 = np.ones(size).reshape(-1,1)*3
x3 = np.concatenate([x3, samples[:,2].numpy().reshape(-1,1)], axis = 1)
data = np.concatenate([x1, x2, x3])
# define a loss function
negloglik = lambda y, rv_y: -rv_y.log_prob(y)
inputs = tf.keras.Input(shape=(1,))
# A simple model for learning the distributions that produced the data
x = tf.keras.layers.Dense(2)(inputs)
output = tfp.layers.DistributionLambda(
    lambda t: tfd.Normal(loc=t[..., :1],
                         scale=1e-3 + tf.math.softplus(t[..., 1:])))(x)
model = tf.keras.Model(inputs, output)
model.compile(optimizer='adam', loss=negloglik)
# fit the model to the data
model.fit(data[:,0], data[:,1], epochs=1000)
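For context, the per-sample loss that `negloglik` computes is just the analytic negative log likelihood of a normal distribution. A quick sanity check in plain NumPy (the helper name `negloglik_normal` and the sample numbers are mine, not part of the TFP API):

```python
import numpy as np

# For a normal distribution, -log p(y) has the closed form
#   0.5 * log(2*pi*scale^2) + (y - loc)^2 / (2*scale^2),
# which is what -rv_y.log_prob(y) evaluates per sample.
def negloglik_normal(y, loc, scale):
    return 0.5 * np.log(2 * np.pi * scale**2) + (y - loc)**2 / (2 * scale**2)

# e.g. a point one standard deviation from the mean of N(1, .1)
print(negloglik_normal(1.1, loc=1.0, scale=0.1))
```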
The result of this code is a tfd.Distribution lambda with batch shape 600. If we investigate the loc and scale of the batches, we find that they are all very close to (2, .1). What I expected was that loc and scale would depend on the single input feature (1, 2, or 3), so that loc would be approximately 1, 2, or 3. What we have instead looks much more like a model that finds the parameters of a single normal distribution fitting the target data while ignoring the input features.
Now, obviously, for a problem like this, we could just model each group separately. But what if we don't really know what the groups are and instead want a learner to infer distributions from more complex input features? Maybe there are four features in our data that seem to interact to predict, with some uncertainty, the target value. Maybe there are four hundred features... I think you can see where this is going, but feel free to request a clarification. In any event, I think the goal is clear: is there a way to modify the code above to find a distribution for each data point?
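For reference, the per-group baseline mentioned above is easy to sketch in plain NumPy (variable names and the seed are my own; for a normal distribution the maximum-likelihood estimates are just the per-group sample mean and standard deviation):

```python
import numpy as np

rng = np.random.default_rng(0)
size = 200
# synthetic data mirroring the setup: group id in column 0, observation in column 1
groups = [(1, 1.0, 0.1), (2, 2.0, 0.2), (3, 3.0, 0.3)]
data = np.concatenate([
    np.column_stack([np.full(size, g, dtype=float), rng.normal(mu, sd, size)])
    for g, mu, sd in groups
])

# per-group maximum-likelihood estimates of loc and scale
estimates = {}
for g, mu, sd in groups:
    obs = data[data[:, 0] == g, 1]
    estimates[g] = (obs.mean(), obs.std())
print(estimates)
```

This recovers (loc, scale) close to (1, .1), (2, .2), and (3, .3), which is the behavior I was hoping the Keras model would learn from the group feature.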