Hello, thanks for your great work on FlowMDM! I have a question about the pe_bias matrix.
The code comments describe it as: pe_bias --> [T, T] matrix with -inf and 0's limiting where the attention during APE mode focuses (0's), i.e., inside each subsequence. Then, in the BPE_Rotary branch:
    if pe_bias != None:
        assert (w.int() == w).all(), "w should be 0 or 1 when using multitext at training"
        pe_bias[w.squeeze() == 1] = 0 # need to zero bias out for the relative PE batch elements
I am completely confused. As I understand it, this matrix is added to the QK dot products when using abs_pos, so I don't see what the code above is meant to do.
Any response would be appreciated!
Wait... maybe it is because rot_pos_emb does not need the -inf entries, since it already has an attention horizon? Is that why pe_bias should be zeroed out?
I hope someone can answer my question, thanks.
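To make the guess above concrete, here is a minimal sketch of how I currently read that line. It is not the FlowMDM implementation. It assumes (my assumption, not confirmed by the authors) that pe_bias carries a batch dimension at that point, i.e. [B, T, T], and that w holds one 0/1 flag per batch element, with 1 meaning "this element uses relative PE". Plain Python lists stand in for tensors so it runs without dependencies, and the helper names `block_diagonal_bias` and `zero_bias_for_relative` are hypothetical.

```python
NEG_INF = float("-inf")


def block_diagonal_bias(lengths):
    """Build a [T, T] bias: 0 inside each subsequence, -inf across subsequences."""
    total = sum(lengths)
    bias = [[NEG_INF] * total for _ in range(total)]
    start = 0
    for n in lengths:
        for i in range(start, start + n):
            for j in range(start, start + n):
                bias[i][j] = 0.0  # attention allowed within the same subsequence
        start += n
    return bias


def zero_bias_for_relative(pe_bias, w):
    """Emulate `pe_bias[w.squeeze() == 1] = 0` on an assumed [B, T, T] nested list.

    For every batch element flagged with 1, the whole [T, T] bias is reset to 0,
    so adding it to the QK scores no longer blocks any position.
    """
    for b, flag in enumerate(w):
        assert flag in (0, 1), "w should be 0 or 1"
        if flag == 1:
            size = len(pe_bias[b])
            pe_bias[b] = [[0.0] * size for _ in range(size)]
    return pe_bias


# Two subsequences of lengths 2 and 3, so T = 5 (illustrative values).
lengths = [2, 3]
# Batch of two: element 0 uses absolute PE (w = 0), element 1 uses relative PE (w = 1).
w = [0, 1]
pe_bias = [block_diagonal_bias(lengths) for _ in w]
pe_bias = zero_bias_for_relative(pe_bias, w)

for b, flag in enumerate(w):
    print(f"batch element {b} (w = {flag}):")
    for row in pe_bias[b]:
        print("  ", row)
```

Under this reading, the element with w = 0 keeps the block-diagonal mask (attention stays inside each subsequence), while the element with w = 1 gets an all-zero bias, so the bias adds nothing to its QK scores. Whether the rotary branch then limits attention by some other mechanism is exactly what I would like the authors to confirm.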