You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
88 lines
3.4 KiB
88 lines
3.4 KiB
From a68dabd45f3591456ecf7e35f6a6077db79f6bc6 Mon Sep 17 00:00:00 2001 |
|
From: "Darrick J. Wong" <djwong@kernel.org> |
|
Date: Wed, 15 Mar 2023 15:59:35 +0100 |
|
Subject: [PATCH] xfs: fix off-by-one error in xfs_btree_space_to_height |
|
|
|
Source kernel commit: c0f399ff51495ac8d30367418f4f6292ecd61fbe |
|
|
|
Lately I've been stress-testing extreme-sized rmap btrees by using the |
|
(new) xfs_db bmap_inflate command to clone bmbt mappings billions of |
|
times and then using xfs_repair to build new rmap and refcount btrees. |
|
This of course is /much/ faster than actually FICLONEing a file billions |
|
of times. |
|
|
|
Unfortunately, xfs_repair fails in xfs_btree_bload_compute_geometry with |
|
EOVERFLOW, which indicates that xfs_mount.m_rmap_maxlevels is not |
|
sufficiently large for the test scenario. For a 1TB filesystem (~67 |
|
million AG blocks, 4 AGs) the btheight command reports: |
|
|
|
$ xfs_db -c 'btheight -n 4400801200 -w min rmapbt' /dev/sda |
|
rmapbt: worst case per 4096-byte block: 84 records (leaf) / 45 keyptrs (node) |
|
level 0: 4400801200 records, 52390491 blocks |
|
level 1: 52390491 records, 1164234 blocks |
|
level 2: 1164234 records, 25872 blocks |
|
level 3: 25872 records, 575 blocks |
|
level 4: 575 records, 13 blocks |
|
level 5: 13 records, 1 block |
|
6 levels, 53581186 blocks total |
|
|
|
The AG is sufficiently large to build this rmap btree. Unfortunately, |
|
m_rmap_maxlevels is 5. Augmenting the loop in the space->height |
|
function to report height, node blocks, and blocks remaining produces |
|
this: |
|
|
|
ht 1 node_blocks 45 blockleft 67108863 |
|
ht 2 node_blocks 2025 blockleft 67108818 |
|
ht 3 node_blocks 91125 blockleft 67106793 |
|
ht 4 node_blocks 4100625 blockleft 67015668 |
|
final height: 5 |
|
|
|
The goal of this function is to compute the maximum height btree that |
|
can be stored in the given number of ondisk fsblocks. Starting with the |
|
top level of the tree, each iteration through the loop adds the fanout |
|
factor of the next level down until we run out of blocks. IOWs, maximum |
|
height is achieved by using the smallest fanout factor that can apply |
|
to that level. |
|
|
|
However, the loop setup is not correct. Top level btree blocks are |
|
allowed to contain fewer than minrecs items, so the computation is |
|
incorrect because the first time through the loop it should be using a |
|
fanout factor of 2. With this corrected, the above becomes: |
|
|
|
ht 1 node_blocks 2 blockleft 67108863 |
|
ht 2 node_blocks 90 blockleft 67108861 |
|
ht 3 node_blocks 4050 blockleft 67108771 |
|
ht 4 node_blocks 182250 blockleft 67104721 |
|
ht 5 node_blocks 8201250 blockleft 66922471 |
|
final height: 6 |
|
|
|
Fixes: 9ec691205e7d ("xfs: compute the maximum height of the rmap btree when reflink enabled") |
|
Signed-off-by: Darrick J. Wong <djwong@kernel.org> |
|
Reviewed-by: Dave Chinner <dchinner@redhat.com> |
|
Signed-off-by: Carlos Maiolino <cem@kernel.org> |
|
Signed-off-by: Pavel Reichl <preichl@redhat.com> |
|
--- |
|
libxfs/xfs_btree.c | 7 ++++++- |
|
1 file changed, 6 insertions(+), 1 deletion(-) |
|
|
|
diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c |
|
index 65d38637..38a3092d 100644 |
|
--- a/libxfs/xfs_btree.c |
|
+++ b/libxfs/xfs_btree.c |
|
@@ -4663,7 +4663,12 @@ xfs_btree_space_to_height( |
|
const unsigned int *limits, |
|
unsigned long long leaf_blocks) |
|
{ |
|
- unsigned long long node_blocks = limits[1]; |
|
+ /* |
|
+ * The root btree block can have fewer than minrecs pointers in it |
|
+ * because the tree might not be big enough to require that amount of |
|
+ * fanout. Hence it has a minimum size of 2 pointers, not limits[1]. |
|
+ */ |
|
+ unsigned long long node_blocks = 2; |
|
unsigned long long blocks_left = leaf_blocks - 1; |
|
unsigned int height = 1; |
|
|
|
-- |
|
2.40.0 |
|
|
|
|