QS8 / QU8 PReLU microkernels #7738

Open
wants to merge 1 commit into
base: master

swamipreksha

  • Implementations for various ISAs:
    • x86 AVX2
    • Scalar ISA
  • Unit tests

google-cla bot commented Jan 30, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@dsharlet
Collaborator

Thanks for the PR. Unfortunately, it looks like this implements the old PReLU kernels. We now support PReLU as a binary operator and have removed the old PReLU operator: #6962, #7034

Could you please add binary operator implementations of the kernels you would like to have instead?
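
For readers following along, here is a minimal reference sketch of what "PReLU as a binary operator" means at the element level: the slope is an ordinary second input rather than a per-channel weight owned by a dedicated operator. The function names are hypothetical and are not XNNPACK APIs. The split into a tensor-slope and a constant-slope variant is an assumption based on the `vprelu` and `vpreluc` directory names in this PR.

```c
#include <stddef.h>

// Hypothetical reference, not an XNNPACK API.
// PReLU as a binary elementwise operation on two same-sized inputs:
//   out[i] = a[i]             if a[i] >= 0
//   out[i] = a[i] * slope[i]  otherwise
void prelu_binary_ref(size_t n, const float* a, const float* slope, float* out) {
  for (size_t i = 0; i < n; i++) {
    out[i] = a[i] < 0.0f ? a[i] * slope[i] : a[i];
  }
}

// Assumed "constant second operand" variant: one slope value is applied
// to every element of the first input.
void prelu_binary_const_ref(size_t n, const float* a, float slope, float* out) {
  for (size_t i = 0; i < n; i++) {
    out[i] = a[i] < 0.0f ? a[i] * slope : a[i];
  }
}
```

This float version ignores quantization entirely; the QS8 / QU8 kernels additionally subtract zero points and rescale the result.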

src/operators/binary-elementwise-nd.c (resolved)
src/xnnpack/microparams.h (outdated, resolved)
src/qs8-vpreluc/avx2.c.in (outdated, resolved)
src/qs8-vpreluc/scalar.c.in (outdated, resolved)
src/qs8-vprelu/avx2.c.in (outdated, resolved)
src/qs8-vprelu/scalar.c.in (outdated, resolved)
src/qs8-vprelu/scalar.c.in (outdated, resolved)

__m256i vacc${N} = _mm256_blendv_epi8(va${N}_sub, _mm256_mullo_epi32(va${N}_sub, vslope), _mm256_cmpgt_epi32(_mm256_setzero_si256(), va${N}_sub));

$for N in range(2*SIMD_TILE):
  __m256 vscale${N} = _mm256_blendv_ps(vnegative_multiplier, vpositive_multiplier, _mm256_castsi256_ps(_mm256_cmpgt_epi32(va${N}_sub, _mm256_setzero_si256())));

Collaborator

Use the same condition here too? That would help both with consistency and with relying less on the compiler to perform common subexpression elimination (CSE). In fact, consider computing the comparison once, explicitly, and storing it in an intermediate.
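
The suggestion can be illustrated with a scalar analogue of the two AVX2 lines above. This is a sketch only: the function name, the parameter names, and the final rescale, round, and clamp steps are assumptions for illustration and are not taken from the PR. The point it shows is that the sign test is evaluated once and the same result drives both selections, instead of testing `0 > x` in one place and `x > 0` in another.

```c
#include <stdint.h>

// Hypothetical scalar analogue of the reviewed lines; not XNNPACK code.
int8_t qs8_prelu_ref(int8_t a, int8_t slope,
                     int32_t a_zero_point, int32_t slope_zero_point,
                     float positive_multiplier, float negative_multiplier,
                     int32_t output_zero_point) {
  const int32_t a_sub = (int32_t) a - a_zero_point;
  const int32_t slope_sub = (int32_t) slope - slope_zero_point;

  // Compute the comparison once and keep it in an explicit intermediate.
  const int is_negative = a_sub < 0;

  // Both selections reuse the same condition.
  const int32_t acc = is_negative ? a_sub * slope_sub : a_sub;
  const float scale = is_negative ? negative_multiplier : positive_multiplier;

  // Assumed requantization: rescale, add the output zero point,
  // round half away from zero, then clamp to the int8 range.
  const float scaled = (float) acc * scale + (float) output_zero_point;
  int32_t rounded = (int32_t) (scaled + (scaled >= 0.0f ? 0.5f : -0.5f));
  if (rounded > INT8_MAX) rounded = INT8_MAX;
  if (rounded < INT8_MIN) rounded = INT8_MIN;
  return (int8_t) rounded;
}
```

In the AVX2 template, the equivalent would be to store the result of a single `_mm256_cmpgt_epi32` in a named mask vector and pass that mask to both blends.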

Author

We have addressed all of your review comments. Please let us know if any other changes are required.

Collaborator

@dsharlet left a comment

Thank you for the PR; this is great work.

Just a few remaining minor nits.

src/configs/binary-elementwise-config.c (outdated, resolved)
src/configs/binary-elementwise-config.c (outdated, resolved)
src/qs8-vprelu/gen/qs8-vprelu-avx2-u16.c (resolved)

@swamipreksha force-pushed the qs8_qu8_vprelu branch 3 times, most recently from aa81513 to fee5c12 (February 6, 2025 07:57)

- Implementations for various ISAs:
  - x86 AVX2
  - Scalar ISA
- Unit tests

Signed-Off-by: Ravi Kumar Soni <[email protected]>
Signed-off-by: Swami, Preksha <[email protected]>