From 750b2e4785e5956122b3c565af65eb1929714fba Mon Sep 17 00:00:00 2001 From: Jeff King Date: Mon, 28 Apr 2014 12:16:31 -0400 Subject: [PATCH] t3910: show failure of core.precomposeunicode with decomposed filenames If you have existing decomposed filenames in your git repository (e.g., that were created with older versions of git that did not precompose unicode), a modern git with core.precomposeunicode set does not handle them well. The problem is that we normalize the paths coming from the disk into their precomposed form, and then compare them against the literal bytes in the index. This makes things better if you have the precomposed form in the index. It makes things worse if you actually have the decomposed form in the index. As a result, paths with decomposed filenames may have their precomposed variants listed as untracked files (even though the precomposed variants do not exist on-disk at all). This patch just adds a test to demonstrate the breakage. Some possible fixes are: 1. Tell everyone that NFD in the git repo is wrong, and they should make a new commit to normalize all their in-repo files to be precomposed. This is probably not the right thing to do, because it still doesn't fix checkouts of old history. And it spreads the problem to people on byte-preserving filesystems (like ext4), because now they have to start precomposing their filenames as they are adde to git. 2. Do all index filename comparisons using a UTF-8 aware comparison function when core.precomposeunicode is set. This would probably have bad performance, and somewhat defeats the point of converting the filenames at the readdir level in the first place. 3. Convert index filenames to their precomposed form when we read the index from disk. This would be efficient, but we would have to be careful not to write the precomposed forms back out to disk. 4. Introduce some infrastructure to efficiently match up the precomposed/decomposed forms. We already do something similar for case-insensitive files using name-hash.c. We might be able to adapt that strategy here. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- t/t3910-mac-os-precompose.sh | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/t/t3910-mac-os-precompose.sh b/t/t3910-mac-os-precompose.sh index e4ba6013e4..23aa61ef09 100755 --- a/t/t3910-mac-os-precompose.sh +++ b/t/t3910-mac-os-precompose.sh @@ -140,6 +140,16 @@ test_expect_success "Add long precomposed filename" ' git add * && git commit -m "Long filename" ' + +test_expect_failure 'handle existing decomposed filenames' ' + echo content >"verbatim.$Adiarnfd" && + git -c core.precomposeunicode=false add "verbatim.$Adiarnfd" && + git commit -m "existing decomposed file" && + >expect && + git ls-files --exclude-standard -o "verbatim*" >untracked && + test_cmp expect untracked +' + # Test if the global core.precomposeunicode stops autosensing # Must be the last test case test_expect_success "respect git config --global core.precomposeunicode" '