QS8 / QU8 PReLU microkernels #7738
base: master
swamipreksha commented Jan 30, 2025
- Implementations for various ISAs:
  - x86 AVX2
  - Scalar ISA
- Unit tests
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
2adbcf3 to 4af1bc9
src/qs8-vpreluc/avx2.c.in
    __m256i vacc${N} = _mm256_blendv_epi8(va${N}_sub, _mm256_mullo_epi32(va${N}_sub, vslope), _mm256_cmpgt_epi32(_mm256_setzero_si256(), va${N}_sub));

    $for N in range(2*SIMD_TILE):
      __m256 vscale${N} = _mm256_blendv_ps(vnegative_multiplier, vpositive_multiplier, _mm256_castsi256_ps(_mm256_cmpgt_epi32(va${N}_sub, _mm256_setzero_si256())));
Use the same condition here too? Both for consistency, and to rely less on compiler smartness to effectively do CSE. In fact, consider computing the comparison explicitly with an intermediate?
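The suggestion above can be sketched in scalar C. This is a minimal illustration, not the PR's kernel: the function name `qs8_prelu_sketch`, its parameter list, and the round-half-away-from-zero requantization are all assumptions made for the example. What it shows is the sign test being evaluated once into a named intermediate, which then drives both the accumulator selection and the multiplier selection, so the two can never disagree and nothing depends on the compiler merging two comparisons.

```c
#include <stdint.h>

/* Hypothetical scalar model of one QS8 PReLU element with a constant slope.
 * The name, parameters, and rounding mode are assumptions for illustration;
 * this is not the code from the pull request. */
static int8_t qs8_prelu_sketch(int8_t x, int32_t input_zero_point,
                               int32_t slope, float positive_multiplier,
                               float negative_multiplier,
                               int32_t output_zero_point) {
  /* Remove the input zero point so that the sign test is meaningful. */
  const int32_t x_sub = (int32_t) x - input_zero_point;

  /* Evaluate the comparison exactly once and keep it in an intermediate. */
  const int is_negative = x_sub < 0;

  /* Both selections reuse the same predicate. */
  const int32_t acc = is_negative ? x_sub * slope : x_sub;
  const float scale = is_negative ? negative_multiplier : positive_multiplier;

  /* Requantize: scale, round half away from zero (assumed), add the output
   * zero point, then clamp to the int8 range. */
  const float scaled = (float) acc * scale;
  int32_t out = (int32_t) (scaled < 0.0f ? scaled - 0.5f : scaled + 0.5f);
  out += output_zero_point;
  if (out < INT8_MIN) out = INT8_MIN;
  if (out > INT8_MAX) out = INT8_MAX;
  return (int8_t) out;
}
```

In the AVX2 template the analogous change would presumably store the result of a single `_mm256_cmpgt_epi32` in a named variable and pass it to `_mm256_blendv_epi8` directly and to `_mm256_blendv_ps` through `_mm256_castsi256_ps`, with the blend operand order chosen to match the mask's polarity.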
We have addressed all of your review comments. Please let us know if any other changes are required.
4af1bc9 to cfac799
Thank you for the PR, this is great work.
Just a few remaining minor nits.
aa81513 to fee5c12
- Implementations for various ISAs:
  - x86 AVX2
  - Scalar ISA
- Unit tests

Signed-off-by: Ravi Kumar Soni <[email protected]>
Signed-off-by: Swami, Preksha <[email protected]>