Question about pe_bias build for position embedding #18

1311894932 · 2024-12-15T05:07:00Z

Hello, thanks for your great work on FlowMDM! I have a question about the `pe_bias` matrix.

In the code, `pe_bias` is described as a "[T, T] matrix with -inf and 0's limiting where the attention during APE mode focuses (0's), i.e., inside each subsequence". Then, in `BPE_Rotary`:
```python
if pe_bias != None:
    assert (w.int() == w).all(), "w should be 0 or 1 when using multitext at training"
    pe_bias[w.squeeze() == 1] = 0  # need to zero bias out for the relative PE batch elements
```
I am completely confused by this. As I understand it, this matrix is added to the QK dot products when absolute positional encoding (`abs_pos`) is used, and I don't understand what the code above is meant to do.
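To make the reading above concrete, here is a minimal sketch, assuming `pe_bias` is a `[T, T]` additive bias (0 inside each subsequence, -inf elsewhere) that is added to the scaled QK scores before the softmax. The helpers `build_pe_bias` and `attention_with_bias` and the `lengths` argument are hypothetical illustrations, not FlowMDM's actual API.

```python
import math
import torch

def build_pe_bias(lengths):
    """Hypothetical helper: [T, T] bias, 0 inside each subsequence, -inf elsewhere."""
    T = sum(lengths)
    bias = torch.full((T, T), float("-inf"))
    start = 0
    for n in lengths:
        # frames may only attend to frames of their own subsequence
        bias[start:start + n, start:start + n] = 0.0
        start += n
    return bias

def attention_with_bias(q, k, v, bias):
    """Scaled dot-product attention with an additive bias on the QK scores."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])  # [T, T]
    scores = scores + bias  # -inf entries receive exactly zero weight after softmax
    weights = torch.softmax(scores, dim=-1)
    return weights @ v, weights

torch.manual_seed(0)
lengths = [3, 2]  # two subsequences of 3 and 2 frames
T, d = sum(lengths), 4
q, k, v = torch.randn(T, d), torch.randn(T, d), torch.randn(T, d)
out, weights = attention_with_bias(q, k, v, build_pe_bias(lengths))
print(weights[0])  # entries 3 and 4 are exactly 0: frame 0 ignores the second subsequence
```

Every row keeps at least one 0 entry (its own block), so no row is all -inf and the softmax stays well defined.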

Any response would be appreciated!

1311894932 · 2024-12-15T09:37:14Z

Wait, maybe this is because `rot_pos_emb` does not need the -inf entries, since it already has its own attention horizon? So the `pe_bias` should be zeroed out for those batch elements?
I hope someone can answer my question, thanks.
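The hypothesis above can be sketched under some assumptions: a training batch mixes elements using absolute PE (`w == 0`) and relative/rotary PE (`w == 1`), `pe_bias` holds one `[T, T]` block mask per batch element (shape `[B, T, T]`), and `w` has shape `[B, 1]`. These shapes and the example values are hypothetical, not taken from FlowMDM. Under them, `w.squeeze() == 1` is a `[B]` boolean index that selects whole `[T, T]` slices, and setting those to 0 makes the bias a no-op for the rotary elements, leaving their attention range to the rotary branch (e.g., its attention horizon).

```python
import torch

# Assumed APE-mode bias: 0 inside each subsequence, -inf elsewhere (subsequences of 3 and 2 frames)
lengths = [3, 2]
inside = torch.block_diag(*[torch.ones(n, n) for n in lengths]).bool()  # [T, T]
ape_bias = torch.zeros(inside.shape).masked_fill(~inside, float("-inf"))

B = 3
# Assumed layout: one [T, T] bias per batch element -> [B, T, T]
pe_bias = ape_bias.unsqueeze(0).repeat(B, 1, 1)
# Assumed meaning: w == 1 marks relative (rotary) PE elements, w == 0 absolute PE elements
w = torch.tensor([[0.0], [1.0], [0.0]])  # [B, 1]

if pe_bias is not None:
    assert (w.int() == w).all(), "w should be 0 or 1 when using multitext at training"
    # w.squeeze() has shape [B]; the boolean index selects whole [T, T] slices
    pe_bias[w.squeeze() == 1] = 0  # rotary elements: no subsequence mask from pe_bias

print(torch.isinf(pe_bias).sum(dim=(1, 2)))  # -inf counts per element: 12, 0, 12
```

In this reading, the absolute-PE elements keep their block-diagonal mask, while the rotary element's bias becomes all zeros and no longer restricts attention.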
