question about contextualization and reasoning code


Xuejiao Tang

Jan 15, 2020, 10:00:40 AM
to Visual Commonsense Reasoning
Hi Rowan,

I read the code, and I have some questions. As I understand it, the following two lines correspond to the grounding part of the R2C model:

q_rep, q_obj_reps = self.embed_span(question, question_tags, question_mask, obj_reps['obj_reps'])
a_rep, a_obj_reps = self.embed_span(answers, answer_tags, answer_mask, obj_reps['obj_reps'])

In the contextualization part, Eq. (2), alpha_{i,j} = softmax_j(r_i W q_j) and q~_i = sum_j alpha_{i,j} q_j, is used to compute the attended query. The corresponding code is below, but I'm a little confused: does qa_similarity correspond to the r_i W q_j term in the equation? qa_similarity and question_mask are used to compute the weights, and the weights are then fed into torch.einsum to get attended_q, but I cannot match the equation to the code.

qa_similarity = self.span_attention(
    q_rep.view(q_rep.shape[0] * q_rep.shape[1], q_rep.shape[2], q_rep.shape[3]),
    a_rep.view(a_rep.shape[0] * a_rep.shape[1], a_rep.shape[2], a_rep.shape[3]),
).view(a_rep.shape[0], a_rep.shape[1], q_rep.shape[2], a_rep.shape[2])
qa_attention_weights = masked_softmax(qa_similarity, question_mask[..., None], dim=2)
attended_q = torch.einsum('bnqa,bnqd->bnad', (qa_attention_weights, q_rep))
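For reference, the equation-to-code correspondence can be sketched in NumPy with toy shapes. The bilinear weight W here is a hypothetical stand-in for whatever span_attention has learned; the point is only that a softmax over the question dimension followed by the einsum gives q~_i = sum_j alpha_{i,j} q_j:

```python
import numpy as np

# Toy shapes (assumed for illustration): batch=1, num_answers=1, q_len=3, a_len=2, dim=4
rng = np.random.default_rng(0)
q_rep = rng.standard_normal((1, 1, 3, 4))   # question word reps q_j
a_rep = rng.standard_normal((1, 1, 2, 4))   # answer word reps r_i
W = rng.standard_normal((4, 4))             # bilinear weight (lives inside span_attention)

# similarity[b, n, q, a] = q_j^T W r_i, matching qa_similarity's [..., q_len, a_len] layout
similarity = np.einsum('bnqd,de,bnae->bnqa', q_rep, W, a_rep)

# softmax over the question dimension (dim=2), as in masked_softmax(..., dim=2)
weights = np.exp(similarity - similarity.max(axis=2, keepdims=True))
weights /= weights.sum(axis=2, keepdims=True)

# attended_q[b, n, a] = sum_j alpha_{i,j} q_j -- the same einsum as in the snippet
attended_q = np.einsum('bnqa,bnqd->bnad', weights, q_rep)
assert attended_q.shape == (1, 1, 2, 4)
```

So each answer word i gets its own weighted mixture of the question words, which is exactly Eq. (2) with the batch and answer axes carried along.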


# Have a second attention over the objects, do A by Objs
# [batch_size, 4, answer_length, num_objs]
atoo_similarity = self.obj_attention(
    a_rep.view(a_rep.shape[0], a_rep.shape[1] * a_rep.shape[2], -1),
    obj_reps['obj_reps'],
).view(a_rep.shape[0], a_rep.shape[1], a_rep.shape[2], obj_reps['obj_reps'].shape[1])
atoo_attention_weights = masked_softmax(atoo_similarity, box_mask[:, None, None])
attended_o = torch.einsum('bnao,bod->bnad', (atoo_attention_weights, obj_reps['obj_reps']))
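The object attention works the same way, except the softmax runs over the object dimension and box_mask zeroes out padded objects. A minimal NumPy sketch (toy shapes; a plain dot product stands in for the learned obj_attention) shows the effect of the mask:

```python
import numpy as np

# Toy shapes (assumed): batch=1, num_answers=1, a_len=2, num_objs=3, dim=4
rng = np.random.default_rng(1)
a_rep = rng.standard_normal((1, 1, 2, 4))
obj_reps = rng.standard_normal((1, 3, 4))
box_mask = np.array([[1.0, 1.0, 0.0]])      # third object is padding

# dot-product similarity as a stand-in for self.obj_attention
similarity = np.einsum('bnad,bod->bnao', a_rep, obj_reps)

# masked softmax over the object dimension: padded objects get ~zero weight
masked = np.where(box_mask[:, None, None, :] > 0, similarity, -1e9)
weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
assert np.allclose(weights[..., 2], 0.0)    # the padded object contributes nothing

# attended_o[b, n, a] = weighted sum of object reps, as in the einsum above
attended_o = np.einsum('bnao,bod->bnad', weights, obj_reps)
assert attended_o.shape == (1, 1, 2, 4)
```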



In the reasoning part, the input is the torch.cat of a_rep, attended_o, and attended_q. But why use self.reasoning_use_answer, self.reasoning_use_obj, and so on? I saw that all of these boolean parameters are true.


 
reasoning_inp = torch.cat([x for x, to_pool in [(a_rep, self.reasoning_use_answer),
                                                (attended_o, self.reasoning_use_obj),
                                                (attended_q, self.reasoning_use_question)]
                           if to_pool], -1)
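These flags are presumably just ablation switches: with all of them true the concat uses every tensor, and turning one off simply drops that block from the input. A small NumPy sketch of the same list-comprehension pattern (hypothetical flag names, toy tensors):

```python
import numpy as np

# Toy tensors with the same trailing dim
a_rep = np.ones((1, 1, 2, 4))
attended_o = np.ones((1, 1, 2, 4)) * 2
attended_q = np.ones((1, 1, 2, 4)) * 3

# All three flags on, as in the default config
reasoning_use_answer, reasoning_use_obj, reasoning_use_question = True, True, True
parts = [x for x, to_pool in [(a_rep, reasoning_use_answer),
                              (attended_o, reasoning_use_obj),
                              (attended_q, reasoning_use_question)]
         if to_pool]
reasoning_inp = np.concatenate(parts, -1)
assert reasoning_inp.shape == (1, 1, 2, 12)   # 3 blocks of dim 4

# Turning a flag off drops that block from the concat (an ablation)
parts = [x for x, to_pool in [(a_rep, True), (attended_o, False), (attended_q, True)]
         if to_pool]
assert np.concatenate(parts, -1).shape == (1, 1, 2, 8)
```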





Sorry for the black text; I don't know how to change the color.
best regards,
Xuejiao





Rowan Zellers

Jan 16, 2020, 7:27:04 PM
to Visual Commonsense Reasoning
Hi Xuejiao,

Yes, qa_similarity is the r_{i}Wq_{j} term. The weight W lives inside the span_attention module.
Hope that helps!

thanks,
Rowan