Help the compiler vectorize adjacent_difference#4958
Merged
StephanTLavavej merged 27 commits intomicrosoft:mainfrom Oct 30, 2024
Merged
Help the compiler vectorize adjacent_difference#4958StephanTLavavej merged 27 commits intomicrosoft:mainfrom
adjacent_difference#4958StephanTLavavej merged 27 commits intomicrosoft:mainfrom
Conversation
adjacent_differentceadjacent_difference
This comment was marked as resolved.
This comment was marked as resolved.
Contributor
Interestingly it vectorizes if we use |
StephanTLavavej
requested changes
Oct 24, 2024
AlexGuteniev
commented
Oct 24, 2024
StephanTLavavej
approved these changes
Oct 26, 2024
Member
|
Thanks! 😻 I pushed minor nitpicks and a significant fix for C++14/17. Speedups look good on my 5950X:
|
Member
|
I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed. |
StephanTLavavej
approved these changes
Oct 29, 2024
Member
|
I had to push an additional commit to fix the overlap check for heterogeneous types. |
AlexGuteniev
commented
Oct 30, 2024
Member
|
Thanks for helping the compiler, said the author of the presentation, Don't Help The Compiler 😹 😻 🎉 |
This was referenced Oct 30, 2024
Merged
Contributor
Author
|
Good news -- the 8 bit case is auto-vectorized now, despite DevCom-10745948 is not marked as resolved. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
📜 The approach
The following things prevented the original algorithm from vectorization:
🛑 Correctness concern
The standard defines exact steps for this algorithm. The optimization alters the steps.
In particular the standard wants the subtracted value to be saved from the previous iteration, rather than being read again.
The two below sections explain what precautions are made to make the change unobservable, so I hope the change is correct.
✅ Checks for eligibility
The following checks were added:
There's no need in check for integral types or so, since the compiler makes the final decision anyway, and it may be able optimize even something that wouldn't pass a strict check.
Apparently there's no rule that the source and the destination ranges may not overlap.
We should handle aliasing.
Unlike the #4431 precedent, we can't yield to the compiler here. The compiler is able to insert overlaps check that prevents vectorization and go to the scalar fallback in case of checks failure, but:
So we do our own checks.
Then we tell the compiler with
__restrictthat we already checked, and it should not bother. This is done in a separate function, because the__restrictis not aliased within scope, so saying__restrictwithin the original algorithm would apparently be a lie.The extra check by the compiler, if not prevented would slightly add run time and dead code size.
😾 Compiler warnings
We have a great feature called integral promotion. Smaller types are converted to integers, and there is a warning about converting them back. Local pragma suppresses them in benchmark, but not in the test.
@StephanTLavavej used a function object with
static_castto avoid warnings in the test.⏱️ Benchmark results
🥇 Results interpretation