-
Notifications
You must be signed in to change notification settings - Fork 1.2k
implementing the backslash-free string optimization #2596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
The check for backslashes seems like a prime candidate for SIMD or SWAR approaches. |
|
@lemire nice! |
|
@camel-cdr Yes, but we don’t want to put it behind runtime dispatching. |
That would be great. The priority is to make sure we do not get any performance regression. |
LLVM 16, Apple M4 testsBenchmark: Main branch: This PR: Conclusion: Essentially no change.Benchmark: Main: This PR: Conclusion: a regressionGCC 15, Intel Ice Lake testsBenchmark: Main branch: This PR: Conclusion: Essentially no change.Benchmark: Main: This PR: Conclusion: a regression |
|
@CarlosEduR Did you have a chance to run your own tests? I'd like someone to confirm my good results. |
|
I am getting a performance regression on the |
|
Possibly @camel-cdr is correct, we might need better machine specific optimization. |
|
@lemire it seems there was actually a slight regression, wasn't there? Based on your comment: Main: PR: I just ran the benchmarks in my machine, Intel i5-10400F (Comet Lake, GCC 13.3): Benchmark: Main branch: This PR: Benchmark: Main branch: This PR: |
|
@CarlosEduR This is a useful comment. It confirms the regression on a different system. We'll have to go with a different design. |
|
A reasonable thing to try would be to only apply the trick for short strings. |
|
I can run the benchmarks with the updated version later today. |
You may but I don't have high hopes for the current PR. My current take is that we might need go down deeper. Not necessarily more code, but the attempt at filtering early does not seem to help. |
This builds on top of a PR by @CarlosEduR. #2211
Fixes #1470
It looks like the proper design to me right now, but we need to run benchmarks to see if it is at least performance neutral.