From e536b1fedf777ad8958a6f299d9d59db6299e697 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?SZEDER=20G=C3=A1bor?= Date: Mon, 21 Oct 2019 18:00:39 +0200 Subject: [PATCH 1/5] Documentation: mention more worktree-specific exceptions MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit If a directory in $GIT_DIR is overridden when $GIT_COMMON_DIR is set, then usually all paths within that directory are overridden as well. There are a couple of exceptions, though, and two of them, namely 'refs/rewritten' and 'logs/HEAD' are not mentioned in 'gitrepository-layout'. Document them as well. Signed-off-by: SZEDER Gábor Signed-off-by: Junio C Hamano --- Documentation/gitrepository-layout.txt | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/Documentation/gitrepository-layout.txt b/Documentation/gitrepository-layout.txt index 216b11ee88..f4066fd026 100644 --- a/Documentation/gitrepository-layout.txt +++ b/Documentation/gitrepository-layout.txt @@ -96,9 +96,9 @@ refs:: directory. The 'git prune' command knows to preserve objects reachable from refs found in this directory and its subdirectories. - This directory is ignored (except refs/bisect and - refs/worktree) if $GIT_COMMON_DIR is set and - "$GIT_COMMON_DIR/refs" will be used instead. + This directory is ignored (except refs/bisect, + refs/rewritten and refs/worktree) if $GIT_COMMON_DIR is + set and "$GIT_COMMON_DIR/refs" will be used instead. refs/heads/`name`:: records tip-of-the-tree commit objects of branch `name` @@ -240,8 +240,8 @@ remotes:: logs:: Records of changes made to refs are stored in this directory. See linkgit:git-update-ref[1] for more information. This - directory is ignored if $GIT_COMMON_DIR is set and - "$GIT_COMMON_DIR/logs" will be used instead. + directory is ignored (except logs/HEAD) if $GIT_COMMON_DIR is + set and "$GIT_COMMON_DIR/logs" will be used instead. logs/refs/heads/`name`:: Records all changes made to the branch tip named `name`. From 7cb8c929d76c12750fdece2e5da75d207938d3b9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?SZEDER=20G=C3=A1bor?= Date: Mon, 21 Oct 2019 18:00:40 +0200 Subject: [PATCH 2/5] path.c: clarify trie_find()'s in-code comment MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit A fairly long comment describes trie_find()'s behavior and shows examples, but it's slightly incomplete/inaccurate. Update this comment to specify how trie_find() handles a negative return value from the given match function. Furthermore, update the list of examples to include not only two but three levels of path components. This makes the examples slightly more complicated, but it can illustrate the behavior in more corner cases. Finally, basically everything refers to the data stored for a key as "value", with two confusing exceptions: - The type definition of the match function calls its corresponding parameter 'data'. Rename that parameter to 'value'. (check_common(), the only function of this type already calls it 'value'). - The table of examples above trie_find() has a "val from node" column, which has nothing to do with the value stored in the trie: it's a "prefix of the key for which the trie contains a value" that led to that node. Rename that column header to "prefix to node". Note that neither the original nor the updated description and examples correspond 100% to the current implementation, because the implementation is a bit buggy, but the comment describes the desired behavior. The bug will be fixed in the last patch of this series. Signed-off-by: SZEDER Gábor Signed-off-by: Junio C Hamano --- path.c | 45 ++++++++++++++++++++++++++++----------------- 1 file changed, 28 insertions(+), 17 deletions(-) diff --git a/path.c b/path.c index 25e97b8c3f..d10f0e0d8e 100644 --- a/path.c +++ b/path.c @@ -236,30 +236,41 @@ static void *add_to_trie(struct trie *root, const char *key, void *value) return old; } -typedef int (*match_fn)(const char *unmatched, void *data, void *baton); +typedef int (*match_fn)(const char *unmatched, void *value, void *baton); /* * Search a trie for some key. Find the longest /-or-\0-terminated - * prefix of the key for which the trie contains a value. Call fn - * with the unmatched portion of the key and the found value, and - * return its return value. If there is no such prefix, return -1. + * prefix of the key for which the trie contains a value. If there is + * no such prefix, return -1. Otherwise call fn with the unmatched + * portion of the key and the found value. If fn returns 0 or + * positive, then return its return value. If fn returns negative, + * then call fn with the next-longest /-terminated prefix of the key + * (i.e. a parent directory) for which the trie contains a value, and + * handle its return value the same way. If there is no shorter + * /-terminated prefix with a value left, then return the negative + * return value of the most recent fn invocation. * * The key is partially normalized: consecutive slashes are skipped. * - * For example, consider the trie containing only [refs, - * refs/worktree] (both with values). - * - * | key | unmatched | val from node | return value | - * |-----------------|------------|---------------|--------------| - * | a | not called | n/a | -1 | - * | refs | \0 | refs | as per fn | - * | refs/ | / | refs | as per fn | - * | refs/w | /w | refs | as per fn | - * | refs/worktree | \0 | refs/worktree | as per fn | - * | refs/worktree/ | / | refs/worktree | as per fn | - * | refs/worktree/a | /a | refs/worktree | as per fn | - * |-----------------|------------|---------------|--------------| + * For example, consider the trie containing only [logs, + * logs/refs/bisect], both with values, but not logs/refs. * + * | key | unmatched | prefix to node | return value | + * |--------------------|----------------|------------------|--------------| + * | a | not called | n/a | -1 | + * | logstore | not called | n/a | -1 | + * | logs | \0 | logs | as per fn | + * | logs/ | / | logs | as per fn | + * | logs/refs | /refs | logs | as per fn | + * | logs/refs/ | /refs/ | logs | as per fn | + * | logs/refs/b | /refs/b | logs | as per fn | + * | logs/refs/bisected | /refs/bisected | logs | as per fn | + * | logs/refs/bisect | \0 | logs/refs/bisect | as per fn | + * | logs/refs/bisect/ | / | logs/refs/bisect | as per fn | + * | logs/refs/bisect/a | /a | logs/refs/bisect | as per fn | + * | (If fn in the previous line returns -1, then fn is called once more:) | + * | logs/refs/bisect/a | /refs/bisect/a | logs | as per fn | + * |--------------------|----------------|------------------|--------------| */ static int trie_find(struct trie *root, const char *key, match_fn fn, void *baton) From 8a64881b44e03264fb0c3c26fc00a01c12cd67ff Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?SZEDER=20G=C3=A1bor?= Date: Mon, 21 Oct 2019 18:00:41 +0200 Subject: [PATCH 3/5] path.c: mark 'logs/HEAD' in 'common_list' as file MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit 'logs/HEAD', i.e. HEAD's reflog, is a file, but its entry in 'common_list' has the 'is_dir' bit set. Unset that bit to make it consistent with what 'logs/HEAD' is supposed to be. This doesn't make a difference in behavior: check_common() is the only function that looks at the 'is_dir' bit, and that function either returns 0, or '!exclude', which for 'logs/HEAD' results in 0 as well. Signed-off-by: SZEDER Gábor Signed-off-by: Junio C Hamano --- path.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/path.c b/path.c index d10f0e0d8e..7f243f3c56 100644 --- a/path.c +++ b/path.c @@ -113,7 +113,7 @@ static struct common_dir common_list[] = { { 0, 1, 0, "info" }, { 0, 0, 1, "info/sparse-checkout" }, { 1, 1, 0, "logs" }, - { 1, 1, 1, "logs/HEAD" }, + { 1, 0, 1, "logs/HEAD" }, { 0, 1, 1, "logs/refs/bisect" }, { 0, 1, 1, "logs/refs/rewritten" }, { 0, 1, 1, "logs/refs/worktree" }, From c72fc40d0904cbf3199258c2f471c2351e024b1e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?SZEDER=20G=C3=A1bor?= Date: Mon, 21 Oct 2019 18:00:42 +0200 Subject: [PATCH 4/5] path.c: clarify two field names in 'struct common_dir' MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit An array of 'struct common_dir' instances is used to specify whether various paths in $GIT_DIR are specific to a worktree, or are common, i.e. belong to main worktree. The names of two fields in this struct are somewhat confusing or ambigious: - The path is recorded in the struct's 'dirname' field, even though several entries are regular files e.g. 'gc.pid', 'packed-refs', etc. Rename this field to 'path' to reduce confusion. - The field 'exclude' tells whether the path is excluded... from where? Excluded from the common dir or from the worktree? It means the former, but it's ambigious. Rename this field to 'is_common' to make it unambigious what it means. This, however, means the exact opposite of what 'exclude' meant, so we have to negate the field's value in all entries as well. The diff is best viewed with '--color-words'. Signed-off-by: SZEDER Gábor Signed-off-by: Junio C Hamano --- path.c | 66 +++++++++++++++++++++++++++++----------------------------- 1 file changed, 33 insertions(+), 33 deletions(-) diff --git a/path.c b/path.c index 7f243f3c56..81e9bfe7a9 100644 --- a/path.c +++ b/path.c @@ -101,36 +101,36 @@ struct common_dir { /* Not considered garbage for report_linked_checkout_garbage */ unsigned ignore_garbage:1; unsigned is_dir:1; - /* Not common even though its parent is */ - unsigned exclude:1; - const char *dirname; + /* Belongs to the common dir, though it may contain paths that don't */ + unsigned is_common:1; + const char *path; }; static struct common_dir common_list[] = { - { 0, 1, 0, "branches" }, - { 0, 1, 0, "common" }, - { 0, 1, 0, "hooks" }, - { 0, 1, 0, "info" }, - { 0, 0, 1, "info/sparse-checkout" }, - { 1, 1, 0, "logs" }, - { 1, 0, 1, "logs/HEAD" }, - { 0, 1, 1, "logs/refs/bisect" }, - { 0, 1, 1, "logs/refs/rewritten" }, - { 0, 1, 1, "logs/refs/worktree" }, - { 0, 1, 0, "lost-found" }, - { 0, 1, 0, "objects" }, - { 0, 1, 0, "refs" }, - { 0, 1, 1, "refs/bisect" }, - { 0, 1, 1, "refs/rewritten" }, - { 0, 1, 1, "refs/worktree" }, - { 0, 1, 0, "remotes" }, - { 0, 1, 0, "worktrees" }, - { 0, 1, 0, "rr-cache" }, - { 0, 1, 0, "svn" }, - { 0, 0, 0, "config" }, - { 1, 0, 0, "gc.pid" }, - { 0, 0, 0, "packed-refs" }, - { 0, 0, 0, "shallow" }, + { 0, 1, 1, "branches" }, + { 0, 1, 1, "common" }, + { 0, 1, 1, "hooks" }, + { 0, 1, 1, "info" }, + { 0, 0, 0, "info/sparse-checkout" }, + { 1, 1, 1, "logs" }, + { 1, 0, 0, "logs/HEAD" }, + { 0, 1, 0, "logs/refs/bisect" }, + { 0, 1, 0, "logs/refs/rewritten" }, + { 0, 1, 0, "logs/refs/worktree" }, + { 0, 1, 1, "lost-found" }, + { 0, 1, 1, "objects" }, + { 0, 1, 1, "refs" }, + { 0, 1, 0, "refs/bisect" }, + { 0, 1, 0, "refs/rewritten" }, + { 0, 1, 0, "refs/worktree" }, + { 0, 1, 1, "remotes" }, + { 0, 1, 1, "worktrees" }, + { 0, 1, 1, "rr-cache" }, + { 0, 1, 1, "svn" }, + { 0, 0, 1, "config" }, + { 1, 0, 1, "gc.pid" }, + { 0, 0, 1, "packed-refs" }, + { 0, 0, 1, "shallow" }, { 0, 0, 0, NULL } }; @@ -331,8 +331,8 @@ static void init_common_trie(void) if (common_trie_done_setup) return; - for (p = common_list; p->dirname; p++) - add_to_trie(&common_trie, p->dirname, p); + for (p = common_list; p->path; p++) + add_to_trie(&common_trie, p->path, p); common_trie_done_setup = 1; } @@ -349,10 +349,10 @@ static int check_common(const char *unmatched, void *value, void *baton) return 0; if (dir->is_dir && (unmatched[0] == 0 || unmatched[0] == '/')) - return !dir->exclude; + return dir->is_common; if (!dir->is_dir && unmatched[0] == 0) - return !dir->exclude; + return dir->is_common; return 0; } @@ -376,8 +376,8 @@ void report_linked_checkout_garbage(void) return; strbuf_addf(&sb, "%s/", get_git_dir()); len = sb.len; - for (p = common_list; p->dirname; p++) { - const char *path = p->dirname; + for (p = common_list; p->path; p++) { + const char *path = p->path; if (p->ignore_garbage) continue; strbuf_setlen(&sb, len); From f45f88b2e483649cd063a7dc7826c03025683e56 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?SZEDER=20G=C3=A1bor?= Date: Mon, 21 Oct 2019 18:00:43 +0200 Subject: [PATCH 5/5] path.c: don't call the match function without value in trie_find() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit 'logs/refs' is not a working tree-specific path, but since commit b9317d55a3 (Make sure refs/rewritten/ is per-worktree, 2019-03-07) 'git rev-parse --git-path' has been returning a bogus path if a trailing '/' is present: $ git -C WT/ rev-parse --git-path logs/refs --git-path logs/refs/ /home/szeder/src/git/.git/logs/refs /home/szeder/src/git/.git/worktrees/WT/logs/refs/ We use a trie data structure to efficiently decide whether a path belongs to the common dir or is working tree-specific. As it happens b9317d55a3 triggered a bug that is as old as the trie implementation itself, added in 4e09cf2acf (path: optimize common dir checking, 2015-08-31). - According to the comment describing trie_find(), it should only call the given match function 'fn' for a "/-or-\0-terminated prefix of the key for which the trie contains a value". This is not true: there are three places where trie_find() calls the match function, but one of them is missing the check for value's existence. - b9317d55a3 added two new keys to the trie: 'logs/refs/rewritten' and 'logs/refs/worktree', next to the already existing 'logs/refs/bisect'. This resulted in a trie node with the path 'logs/refs/', which didn't exist before, and which doesn't have a value attached. A query for 'logs/refs/' finds this node and then hits that one callsite of the match function which doesn't check for the value's existence, and thus invokes the match function with NULL as value. - When the match function check_common() is invoked with a NULL value, it returns 0, which indicates that the queried path doesn't belong to the common directory, ultimately resulting the bogus path shown above. Add the missing condition to trie_find() so it will never invoke the match function with a non-existing value. check_common() will then no longer have to check that it got a non-NULL value, so remove that condition. I believe that there are no other paths that could cause similar bogus output. AFAICT the only other key resulting in the match function being called with a NULL value is 'co' (because of the keys 'common' and 'config'). However, as they are not in a directory that belongs to the common directory the resulting working tree-specific path is expected. Signed-off-by: SZEDER Gábor Signed-off-by: Junio C Hamano --- path.c | 11 ++++++----- t/t0060-path-utils.sh | 2 ++ 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/path.c b/path.c index 81e9bfe7a9..4b69360c71 100644 --- a/path.c +++ b/path.c @@ -299,9 +299,13 @@ static int trie_find(struct trie *root, const char *key, match_fn fn, /* Matched the entire compressed section */ key += i; - if (!*key) + if (!*key) { /* End of key */ - return fn(key, root->value, baton); + if (root->value) + return fn(key, root->value, baton); + else + return -1; + } /* Partial path normalization: skip consecutive slashes */ while (key[0] == '/' && key[1] == '/') @@ -345,9 +349,6 @@ static int check_common(const char *unmatched, void *value, void *baton) { struct common_dir *dir = value; - if (!dir) - return 0; - if (dir->is_dir && (unmatched[0] == 0 || unmatched[0] == '/')) return dir->is_common; diff --git a/t/t0060-path-utils.sh b/t/t0060-path-utils.sh index c7b53e494b..501e1a288d 100755 --- a/t/t0060-path-utils.sh +++ b/t/t0060-path-utils.sh @@ -288,6 +288,8 @@ test_git_path GIT_COMMON_DIR=bar index .git/index test_git_path GIT_COMMON_DIR=bar HEAD .git/HEAD test_git_path GIT_COMMON_DIR=bar logs/HEAD .git/logs/HEAD test_git_path GIT_COMMON_DIR=bar logs/refs/bisect/foo .git/logs/refs/bisect/foo +test_git_path GIT_COMMON_DIR=bar logs/refs bar/logs/refs +test_git_path GIT_COMMON_DIR=bar logs/refs/ bar/logs/refs/ test_git_path GIT_COMMON_DIR=bar logs/refs/bisec/foo bar/logs/refs/bisec/foo test_git_path GIT_COMMON_DIR=bar logs/refs/bisec bar/logs/refs/bisec test_git_path GIT_COMMON_DIR=bar logs/refs/bisectfoo bar/logs/refs/bisectfoo