
Model training produces NaN after replacing softmax attention with linear attention #32

Open
wzh326 opened this issue Dec 22, 2024 · 2 comments

@wzh326 commented Dec 22, 2024

After replacing softmax attention with linear attention, the model produces NaN during training. I found that this happens whenever the activation (feature map) is not bounded within 1. Has anyone else run into this? How did you handle this kind of problem when replacing softmax attention with linear attention?
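For reference, a minimal sketch of what vanilla linear attention (Katharopoulos et al. style) usually looks like, assuming PyTorch and `(batch, heads, seq_len, dim)` tensors; the ELU+1 feature map and the `eps` value here are illustrative choices, not the FLatten code. A common NaN source is the normalizer in the denominator collapsing to zero when the feature map is not kept strictly positive:

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """Vanilla (non-causal) linear attention sketch.

    q, k, v: (batch, heads, seq_len, dim)
    The ELU+1 feature map keeps phi(x) > 0, so the normalizer z
    below stays strictly positive. With an unbounded or sign-changing
    feature map, z can underflow to ~0 and the division blows up
    to Inf/NaN, which then propagates through the gradients.
    """
    q = F.elu(q) + 1.0                                   # phi(q) > 0
    k = F.elu(k) + 1.0                                   # phi(k) > 0
    kv = torch.einsum('bhnd,bhne->bhde', k, v)           # sum_n phi(k_n) v_n^T
    z = 1.0 / (torch.einsum('bhnd,bhd->bhn', q, k.sum(dim=2)) + eps)
    return torch.einsum('bhnd,bhde,bhn->bhne', q, kv, z)
```

If your setup diverges even with this kind of positive feature map plus an epsilon in the denominator, the problem may lie elsewhere (e.g. learning rate or missing normalization layers), so sharing the exact code would help.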

@jiaobin commented Dec 23, 2024

I'm running into the same problem. Have you solved it?

@tian-qing001 (Collaborator) commented

Hi @jiaobin @wzh326.
Are you using FLatten or vanilla linear attention? Could you share your code and settings? That would help us pinpoint the problem.
