From 272708884cb750f12f5c74a00e6620c19dc6d567 Mon Sep 17 00:00:00 2001
From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date: Thu, 8 Feb 2024 10:08:39 -0300
Subject: [PATCH] x86: Do not prefer ERMS for memset on Zen3+
Content-type: text/plain; charset=UTF-8

For AMD Zen3+ architecture, the performance of the vectorized loop is
slightly better than ERMS.

Checked on x86_64-linux-gnu on Zen3.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
---
 sysdeps/x86/dl-cacheinfo.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
index f34d12846c..5a98f70364 100644
--- a/sysdeps/x86/dl-cacheinfo.h
+++ b/sysdeps/x86/dl-cacheinfo.h
@@ -1021,6 +1021,11 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
      minimum value is fixed. */
   rep_stosb_threshold = TUNABLE_GET (x86_rep_stosb_threshold,
                                      long int, NULL);
+  if (cpu_features->basic.kind == arch_kind_amd
+      && !TUNABLE_IS_INITIALIZED (x86_rep_stosb_threshold))
+    /* For AMD Zen3+ architecture, the performance of the vectorized loop is
+       slightly better than ERMS. */
+    rep_stosb_threshold = SIZE_MAX;
 
   TUNABLE_SET_WITH_BOUNDS (x86_data_cache_size, data, 0, SIZE_MAX);
   TUNABLE_SET_WITH_BOUNDS (x86_shared_cache_size, shared, 0, SIZE_MAX);
--
2.39.3
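
The effect of the hunk above can be modelled outside glibc as a small, standalone C sketch. It is not glibc code: `enum cpu_kind`, `pick_rep_stosb_threshold`, and `would_use_rep_stosb` are hypothetical names introduced here for illustration, the boolean parameter stands in for `TUNABLE_IS_INITIALIZED`, and the "length strictly above threshold" dispatch rule is a simplifying assumption about how memset consumes the value.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the values of cpu_features->basic.kind.  */
enum cpu_kind { CPU_KIND_INTEL, CPU_KIND_AMD, CPU_KIND_OTHER };

/* Hypothetical helper mirroring the patched logic:
   - tunable_value is what TUNABLE_GET would have returned;
   - tunable_set_by_user models TUNABLE_IS_INITIALIZED.
   On AMD, when the user did not set the tunable, the threshold becomes
   SIZE_MAX; in every other case the tunable value is kept.  */
size_t
pick_rep_stosb_threshold (enum cpu_kind kind, size_t tunable_value,
                          bool tunable_set_by_user)
{
  if (kind == CPU_KIND_AMD && !tunable_set_by_user)
    return SIZE_MAX;
  return tunable_value;
}

/* Assumed, simplified dispatch rule: REP STOSB is chosen only when the
   length is strictly above the threshold.  With SIZE_MAX no length can
   qualify, so the vectorized loop is always taken.  */
bool
would_use_rep_stosb (size_t len, size_t threshold)
{
  return len > threshold;
}
```

Under this model, an explicit user setting still wins on AMD, which matches the `!TUNABLE_IS_INITIALIZED` guard in the patch; in glibc the corresponding tunable is `glibc.cpu.x86_rep_stosb_threshold`, set through the `GLIBC_TUNABLES` environment variable.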