kernel/git - git - PowerEL Git System

Commit Graph

Author	SHA1	Message	Date
Junio C Hamano	88134a8417	Merge branch 'ds/path-walk-2' "git pack-objects" learns to find delta bases from blobs at the same path, using the --path-walk API. * ds/path-walk-2: pack-objects: allow --shallow and --path-walk path-walk: add new 'edge_aggressive' option pack-objects: thread the path-based compression pack-objects: refactor path-walk delta phase scalar: enable path-walk during push via config pack-objects: enable --path-walk via config repack: add --path-walk option t5538: add tests to confirm deltas in shallow pushes pack-objects: introduce GIT_TEST_PACK_PATH_WALK p5313: add performance tests for --path-walk pack-objects: update usage to match docs pack-objects: add --path-walk option pack-objects: extract should_attempt_deltas()	2025-06-17 10:44:38 -07:00
Junio C Hamano	b5afd0a7ee	Merge branch 'kn/passing-leak-tests' Remove the leftover hints to the test framework to mark tests that do not pass the leak checker tests, as they should no longer be needed. * kn/passing-leak-tests: t: remove unexpected SANITIZE_LEAK variables	2025-05-28 07:59:56 -07:00
Junio C Hamano	6e5fb398d3	Merge branch 'ds/sparse-apply-add-p' "git apply" and "git add -i/-p" code paths no longer unnecessarily expand sparse-index while working. * ds/sparse-apply-add-p: p2000: add performance test for patch-mode commands reset: integrate sparse index with --patch git add: make -p/-i aware of sparse index apply: integrate with the sparse index	2025-05-27 13:59:09 -07:00
Karthik Nayak	368d8c86f7	t: remove unexpected SANITIZE_LEAK variables As of `1fc7ddf35b` (test-lib: unconditionally enable leak checking, 2024-11-20), both the `GIT_TEST_PASSING_SANITIZE_LEAK` and `TEST_PASSES_SANITIZE_LEAK` variables no longer have any meaning, the leak checks are enabled by default. However, some newly added tests include them by mistake. Let's clean this up. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Acked-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-20 15:09:33 -07:00
Derrick Stolee	5f711504d9	repack: add --path-walk option Since 'git pack-objects' supports a --path-walk option, allow passing it through in 'git repack'. This presents interesting testing opportunities for comparing the different repacking strategies against each other. Add the --path-walk option to the performance tests in p5313. For the microsoft/fluentui repo [1] checked out at a specific commit [2], the --path-walk tests in p5313 look like this: Test this tree ------------------------------------------------------------------------- 5313.18: thin pack with --path-walk 0.08(0.06+0.02) 5313.19: thin pack size with --path-walk 18.4K 5313.20: big pack with --path-walk 2.10(7.80+0.26) 5313.21: big pack size with --path-walk 19.8M 5313.22: shallow fetch pack with --path-walk 1.62(3.38+0.17) 5313.23: shallow pack size with --path-walk 33.6M 5313.24: repack with --path-walk 81.29(96.08+0.71) 5313.25: repack size with --path-walk 142.5M [1] https://github.com/microsoft/fluentui [2] e70848ebac1cd720875bccaa3026f4a9ed700e08 Along with the earlier tests in p5313, I'll instead reformat the comparison as follows: Repack Method Pack Size Time --------------------------------------- Hash v1 439.4M 87.24s Hash v2 161.7M 21.51s Path Walk 142.5M 81.29s There are a few things to notice here: 1. The benefits of --name-hash-version=2 over --name-hash-version=1 are significant, but --path-walk still compresses better than that option. 2. The --path-walk command is still using --name-hash-version=1 for the second pass of delta computation, using the increased name hash collisions as a potential method for opportunistic compression on top of the path-focused compression. 3. The --path-walk algorithm is currently sequential and does not use multiple threads for delta compression. Threading will be implemented in a future change so the computation time will improve to better compete in this metric. There are small benefits in size for my copy of the Git repository: Repack Method Pack Size Time --------------------------------------- Hash v1 248.8M 30.44s Hash v2 249.0M 30.15s Path Walk 213.2M 142.50s As well as in the nodejs/node repository [3]: Repack Method Pack Size Time --------------------------------------- Hash v1 739.9M 71.18s Hash v2 764.6M 67.82s Path Walk 698.1M 208.10s [3] https://github.com/nodejs/node This benefit also repeats in my copy of the Linux kernel repository: Repack Method Pack Size Time --------------------------------------- Hash v1 2.5G 554.41s Hash v2 2.5G 549.62s Path Walk 2.2G 1562.36s It is important to see that even when the repository shape does not have many name-hash collisions, there is a slight space boost to be found using this method. As this repacking strategy was released in Git for Windows 2.47.0, some users have reported cases where the --path-walk compression is slightly worse than the --name-hash-version=2 option. In those cases, it may be beneficial to combine the two options. However, there has not been a released version of Git that has both options and I don't have access to these repos for testing. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 12:15:39 -07:00
Derrick Stolee	3ce9e5f293	p5313: add performance tests for --path-walk The previous change added a --path-walk option to 'git pack-objects'. Create a performance test that demonstrates the time and space benefits of the feature. In order to get an appropriate comparison, we need to avoid reusing deltas and recompute them from scratch. Compare the creation of a thin pack representing a small push and the creation of a relatively large non-thin pack. Running on my copy of the Git repository results in this data (removing the repack tests for --name-hash-version): Test this tree ------------------------------------------------------------------------ 5313.2: thin pack with --name-hash-version=1 0.02(0.01+0.01) 5313.3: thin pack size with --name-hash-version=1 1.6K 5313.4: big pack with --name-hash-version=1 2.55(4.20+0.26) 5313.5: big pack size with --name-hash-version=1 16.4M 5313.6: shallow fetch pack with --name-hash-version=1 1.24(2.03+0.08) 5313.7: shallow pack size with --name-hash-version=1 12.2M 5313.10: thin pack with --name-hash-version=2 0.03(0.01+0.01) 5313.11: thin pack size with --name-hash-version=2 1.6K 5313.12: big pack with --name-hash-version=2 1.91(3.23+0.20) 5313.13: big pack size with --name-hash-version=2 16.4M 5313.14: shallow fetch pack with --name-hash-version=2 1.06(1.57+0.10) 5313.15: shallow pack size with --name-hash-version=2 12.5M 5313.18: thin pack with --path-walk 0.03(0.01+0.01) 5313.19: thin pack size with --path-walk 1.6K 5313.20: big pack with --path-walk 2.05(3.24+0.27) 5313.21: big pack size with --path-walk 16.3M 5313.22: shallow fetch pack with --path-walk 1.08(1.66+0.07) 5313.23: shallow pack size with --path-walk 12.4M This can be reformatted as follows: Pack Type Hash v1 Hash v2 Path Walk --------------------------------------------------- thin pack (time) 0.02s 0.03s 0.03s (size) 1.6K 1.6K 1.6K big pack (time) 2.55s 1.91s 2.05s (size) 16.4M 16.4M 16.3M shallow pack (time) 1.24s 1.06s 1.08s (size) 12.2M 12.5M 12.4M Note that the timing is slower because there is no threading in the --path-walk case (yet). Also, the shallow pack cases are really not using the --path-walk logic right now because it is disabled until some additions are made to the path walk API. The cases where the --path-walk option really shines is when the default name-hash is overwhelmed with unhelpful collisions. An open source example can be found in the microsoft/fluentui repo [1] at a certain commit [2]. [1] https://github.com/microsoft/fluentui [2] e70848ebac1cd720875bccaa3026f4a9ed700e08 Running the tests on this repo results in the following comparison table: Pack Type Hash v1 Hash v2 Path Walk --------------------------------------------------- thin pack (time) 0.36s 0.12s 0.08s (size) 1.2M 22.0K 18.4K big pack (time) 2.00s 2.90s 2.21s (size) 20.4M 25.9M 19.5M shallow pack (time) 1.41s 1.80s 1.65s (size) 34.4M 33.7M 33.6M Notice in particular that in the small thin pack, the time performance has improved from 0.36s for --name-hash-version=1 to 0.08s and this is likely due to the improved size of the resulting pack: 18.4K instead of 1.2M. The relatively new --name-hash-version=2 is competitive with --path-walk (0.12s and 22.0K) but not quite as successful. Finally, running this on a copy of the Linux kernel repository results in these data points: Pack Type Hash v1 Hash v2 Path Walk --------------------------------------------------- thin pack (time) 0.03s 0.13s 0.03s (size) 4.6K 4.6K 4.6K big pack (time) 15.29s 12.32s 13.92s (size) 201.1M 159.1M 158.5M shallow pack (time) 10.88s 22.93s 22.74s (size) 269.2M 273.8M 267.7M Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 12:15:38 -07:00
Derrick Stolee	ecf9ba20e3	p2000: add performance test for patch-mode commands The previous three changes contributed performance improvements to 'git apply', 'git add -p', and 'git reset -p' when using a sparse index. The improvement to 'git apply' also improved 'git checkout -p'. Add performance tests to demonstrate this (and to help validate that performance remains good in the future). In the truncated test output below, we see that the full checkout performance changes within noise expectations, but the sparse index cases improve 33% and then 96% for 'git add -p' and 41% and then 95% for 'git reset -p'. 'git checkout -p' improves immediatley by 91% because it does not need any change to its builtin. Test HEAD~4 HEAD~3 HEAD~2 HEAD~1 ------------------------------------------------------------------------------------- 2000.118: ... git add -p (full-v3) 0.79 0.79 +0.0% 0.82 +3.8% 0.82 +3.8% 2000.119: ... git add -p (full-v4) 0.74 0.76 +2.7% 0.74 +0.0% 0.76 +2.7% 2000.120: ... git add -p (sparse-v3) 1.94 1.28 -34.0% 0.07 -96.4% 0.07 -96.4% 2000.121: ... git add -p (sparse-v4) 1.93 1.28 -33.7% 0.06 -96.9% 0.06 -96.9% 2000.122: ... git checkout -p (full-v3) 1.18 1.18 +0.0% 1.18 +0.0% 1.19 +0.8% 2000.123: ... git checkout -p (full-v4) 1.10 1.12 +1.8% 1.11 +0.9% 1.11 +0.9% 2000.124: ... git checkout -p (sparse-v3) 1.31 0.11 -91.6% 0.11 -91.6% 0.11 -91.6% 2000.125: ... git checkout -p (sparse-v4) 1.29 0.11 -91.5% 0.11 -91.5% 0.11 -91.5% 2000.126: ... git reset -p (full-v3) 0.81 0.80 -1.2% 0.83 +2.5% 0.83 +2.5% 2000.127: ... git reset -p (full-v4) 0.78 0.77 -1.3% 0.77 -1.3% 0.78 +0.0% 2000.128: ... git reset -p (sparse-v3) 1.58 0.92 -41.8% 0.91 -42.4% 0.07 -95.6% 2000.129: ... git reset -p (sparse-v4) 1.58 0.92 -41.8% 0.92 -41.8% 0.07 -95.6% It is worth noting that if our test was more involved and had multiple hunks to evaluate, then the time spent in 'git apply' would dominate due to multiple index loads and writes. As it stands, we need the sparse index improvement in 'git add -p' itself to confirm this performance improvement. Since the change for 'git add -i' is identical, we avoid a second test case for that similar operation. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 12:02:47 -07:00
Junio C Hamano	cc14ba68d7	Merge branch 'ps/meson-build-perf-bench' The build procedure based on Meson learned to drive the benchmarking tests. * ps/meson-build-perf-bench: meson: wire up benchmarking options meson: wire up benchmarks t/perf: fix benchmarks with out-of-tree builds t/perf: use configured PERL_PATH t/perf: fix benchmarks with alternate repo formats	2025-05-05 14:56:25 -07:00
Junio C Hamano	87b0875425	Merge branch 'jk/p5332-testfix' A test fix. * jk/p5332-testfix: p5332: drop "+" from --stdin-packs input	2025-04-29 14:21:32 -07:00
Junio C Hamano	8bb81ccfad	Merge branch 'js/git-perf-env-override' Developer support fix.. * js/git-perf-env-override: perf: do allow `GIT_PERF_*` to be overridden again	2025-04-29 14:21:26 -07:00
Patrick Steinhardt	5756ccd181	t/perf: fix benchmarks with out-of-tree builds The "perf-lib.sh" script is sourced by all of our benchmarking suites to make available common infrastructure. The script assumes that build and source directory are the same, which works for our Makefile. But the assumption breaks with both CMake and Meson, where the build directory can be located in an arbitrary place. Adapt the script so that it works with out-of-tree builds. Most importantly, this requires us to figure out the location of the build directory: - When running benchmarks via our Makefile the build directory is the same as the source directory. We already know to derive the test directory ("t/") via `$(pwd)/..`, which works because we chdir into "t/perf" before executing benchmarks. We can thus derive the build directory by appending another "/.." to that path. - When running benchmarks via Meson the build directory is located at an arbitrary location. The build system thus has to make the path known by exporting the `GIT_BUILD_DIR` environment variable. This change prepares us for wiring up benchmarks in Meson. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-28 13:13:52 -07:00
Patrick Steinhardt	d84b990883	t/perf: use configured PERL_PATH Our benchmarks use a couple of Perl scripts to compute results. These Perl scripts get executed directly, and as the shebang is hardcoded to "/usr/bin/perl" this will fail on any system where the Perl interpreter is located in a different path. Our build infrastructure already lets users configure the location of Perl, which ultimately gets written into the GIT-BUILD-OPTIONS file. This file is being sourced by "test-lib.sh", and consequently we already have the "PERL_PATH" variable available that contains its configured location. Use "PERL_PATH" to execute Perl scripts, which makes them work on more esoteric systems like NixOS. Furthermore, adapt the shebang to use env(1) to execute Perl so that users who have Perl in PATH, but in a non-standard location can execute the script directly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-28 13:13:51 -07:00
Patrick Steinhardt	5a6b9c8155	t/perf: fix benchmarks with alternate repo formats Many of our benchmarks operate on a user-defined repository that we copy over before running the benchmarked logic. To keep unintentional side effects caused by on-disk state at bay we skip copying some files. This includes for example hooks, but also the repo's configuration. It is quite sensible to not copy over the configuration, as it is quite easy to inadvertently carry over configuration that may significantly impact the performance measurements. But we cannot fully ignore the configuration either, as it may contain information about the repository format. This will cause failures when for example using a repository with SHA256 object format or the reftable ref format. Fix the issue by parsing the reference and object formats from the source repository and passing them to git-init(1). Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-28 13:13:51 -07:00
Jeff King	1aa50636fd	p5332: drop "+" from --stdin-packs input This perf script creates a midx by running "git multi-pack-index write" with the "--stdin-packs" option. We feed that stdin by running "find" on .git/objects/pack, using sed to strip off everything but the basename. But that sed invocation also does something peculiar: it adds a "+" to the start of each pack name. This causes the multi-pack-index command to barf. The modified name does not match any pack it knows about, so it ends up with an empty list of packs to put in the midx. And thus nothing matches the --preferred-pack option we pass, which causes it die(). The fix is to remove the extra "+" (which also lets us simplify the sed invocation a bit, as it is now just stripping the leading directories). But that leaves the mystery of why it was ever there in the first place. The answer is that an earlier iteration of the patch series had a concept of "disjoint" packs in the midx. And one of its patches here: https://lore.kernel.org/git/c52d7e7b27a9add4f58b8334db4fe4498af1c90f.1701198172.git.me@ttaylorr.com/ taught read_packs_from_stdin() to treat a leading "+" as marking a disjoint pack. But in the second version of the series, which was ultimately merged, that disjoint concept went away, and the code to parse "+" did likewise. The regular regression tests were adjusted to match, but this case in t/perf was forgotten. Signed-off-by: Jeff King <peff@peff.net> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-22 11:08:24 -07:00
Johannes Schindelin	32b74b9809	perf: do allow `GIT_PERF_` to be overridden again A common way to run Git's performance benchmarks on repositories other than Git's own repository (which is not exactly large when compared to actually large repositories) is to run them like this: GIT_PERF_LARGE_REPO=/path/to/my/large/repo \ ./p1234-.sh -ivx Contrary to developers' common expectations, this failed to work when Git was built with a different `GIT_PERF_LARGE_REPO` value specified at build time: That build-time option would have been written to the `GIT-BUILD-OPTIONS` file, which in turn would have been sourced by `test-lib.sh`, which in turn would have been sourced by `perf-lib.sh`, which in turn would have been sourced by the perf test script, _overriding_ the environment variable specified in the way illustrated above. Since perf tests are not run as part of the build, this most likely unintended behavior was not caught and certainly not fixed, as the `GIT_PERF_` values would have been empty at build-time. However, in `4638e8806e` (Makefile: use common template for GIT-BUILD-OPTIONS, 2024-12-06), a subtle change of behavior was introduced: Whereas before, a couple of build-time options (the `GIT_PERF_` ones included) were written to `GIT-BUILD-OPTIONS` only when their values were non-empty. With this commit, they are also written when they are empty. The consequence is that above-mentioned way to run the perf tests will not only fail to pick up the desired `GIT_PERF_` settings when they were specified differently while building Git, instead the desired settings will be only respected when specified _while building_ Git. Let's work around the original issue, i.e. let `GIT_PERF_` environment variables override what is recorded in `GIT-BUILD-OPTIONS`. Note that this is just the tip of the iceberg, there are a couple of `GIT_TEST_` options that may want a similar fix in `test-lib.sh`. Due to time constraints on my side, this here patch focuses exclusively on the `GIT_PERF_` settings. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-20 14:13:05 -07:00
Philippe Blain	1665f12fa0	p7821: fix instructions for testing with threads In `7b31b55db1` (perf: amend the grep tests to test grep.threads, 2017-12-29), p7821 was tweaked to test the performance of 'git grep' under different number of threads. These tests are run if GIT_PERF_GREP_THREADS is set to a list of thread numbers, but the comment at the top of the file instead mentions GIT_PERF_7821_THREADS. Fix the comment. Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-14 14:48:12 -07:00
Philippe Blain	3d358ad524	p9210: fix 'scalar clone' when running from a detached HEAD In p9210-scalar-clone.sh, we test using 'scalar clone' to clone $GIT_PERF_LARGE_REPO (copied locally as 'to-clone'), which defaults to the git.git checkout we are running the test from. When --branch is not specified (as in this test), 'scalar clone' tries to get the default branch of the remote repository by parsing the output of 'git ls-remote --symref $URL HEAD', as implemented in scalar.c:remote_default_branch. When the git.git checkout we are running the test from is in detached HEAD, this fails and we fall back to using the name of the currently checked out branch in the newly initialized repository, which in this case is the value returned earlier in cmd_clone by repo_default_branch_name. We then invoke 'git checkout -t origin/$branch', with $branch being the name we got from remote_default_branch. This invocation fails if '$branch' does not exist as a branch in the current git.git checkout. Fix this by creating a local branch in 'to-clone' in the setup test "enable server-side partial clone", making sure to use '-B' in case a branch named 'test-branch' already exists. Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-28 20:30:56 -07:00
Philippe Blain	d17cd9768c	p7821: fix test_perf invocation for prereqs Since `5dccd9155f` (t/perf: add iteration setup mechanism to perf-lib, 2022-04-04), perf tests need to declare their prerequisites with '--prereq', after the test title. p7821 was forgotten in that commit, such that running that test on a machine where the PCRE prereq is not satisfied aborts the test with: error: bug in the test script: test_wrapper_ needs 2 positional parameters Fix this by correcting the two 'test_perf' invocations in that test suite. Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-28 20:30:56 -07:00
Junio C Hamano	092180990d	Merge branch 'ad/set-default-target-in-makefiles' Correct the default target in Documentation/Makefile, and future-proof all Makefiles from similar breakages by declaring the default target (which happens to be "all") upfront. * ad/set-default-target-in-makefiles: Makefile: set default goals in makefiles	2025-02-25 14:19:36 -08:00
Adam Dinwoodie	5309c1e9fb	Makefile: set default goals in makefiles Explicitly set the default goal at the very top of various makefiles. This is already present in some makefiles, but not all of them. In particular, this corrects a regression introduced in `a38edab7c8` (Makefile: generate doc versions via GIT-VERSION-GEN, 2024-12-06). That commit added some config files as build targets for the Documentation directory, and put the target configuration in a sensible place. Unfortunately, that sensible place was above any other build target definitions, meaning the default goal changed to being those configuration files only, rather than the HTML and man page documentation. Signed-off-by: Adam Dinwoodie <adam@dinwoodie.org> Helped-by: Junio C Hamano <gitster@pobox.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-18 09:02:26 -08:00
Junio C Hamano	aae91a86fb	Merge branch 'ds/name-hash-tweaks' "git pack-objects" and its wrapper "git repack" learned an option to use an alternative path-hash function to improve delta-base selection to produce a packfile with deeper history than window size. * ds/name-hash-tweaks: pack-objects: prevent name hash version change test-tool: add helper for name-hash values p5313: add size comparison test pack-objects: add GIT_TEST_NAME_HASH_VERSION repack: add --name-hash-version option pack-objects: add --name-hash-version option pack-objects: create new name-hash function version	2025-02-12 10:08:51 -08:00
Derrick Stolee	7f9870794f	test-tool: add helper for name-hash values Add a new test-tool helper, name-hash, to output the value of the name-hash algorithms for the input list of strings, one per line. Since the name-hash values can be stored in the .bitmap files, it is important that these hash functions do not change across Git versions. Add a simple test to t5310-pack-bitmaps.sh to provide some testing of the current values. Due to how these functions are implemented, it would be difficult to change them without disturbing these values. The paths used for this test are carefully selected to demonstrate some of the behavior differences of the two current name hash versions, including which conditions will cause them to collide. Create a performance test that uses test_size to demonstrate how collisions occur for these hash algorithms. This test helps inform someone as to the behavior of the name-hash algorithms for their repo based on the paths at HEAD. My copy of the Git repository shows modest statistics around the collisions of the default name-hash algorithm: Test this tree -------------------------------------------------- 5314.1: paths at head 4.5K 5314.2: distinct hash value: v1 4.1K 5314.3: maximum multiplicity: v1 13 5314.4: distinct hash value: v2 4.2K 5314.5: maximum multiplicity: v2 9 Here, the maximum collision multiplicity is 13, but around 10% of paths have a collision with another path. In a more interesting example, the microsoft/fluentui [1] repo had these statistics at time of committing: Test this tree -------------------------------------------------- 5314.1: paths at head 19.5K 5314.2: distinct hash value: v1 8.2K 5314.3: maximum multiplicity: v1 279 5314.4: distinct hash value: v2 17.8K 5314.5: maximum multiplicity: v2 44 [1] https://github.com/microsoft/fluentui That demonstrates that of the nearly twenty thousand path names, they are assigned around eight thousand distinct values. 279 paths are assigned to a single value, leading the packing algorithm to sort objects from those paths together, by size. With the v2 name hash function, the maximum multiplicity lowers to 44, leaving some room for further improvement. In a more extreme example, an internal monorepo had a much worse collision rate: Test this tree -------------------------------------------------- 5314.1: paths at head 227.3K 5314.2: distinct hash value: v1 72.3K 5314.3: maximum multiplicity: v1 14.4K 5314.4: distinct hash value: v2 166.5K 5314.5: maximum multiplicity: v2 138 Here, we can see that the v2 name hash function provides somem improvements, but there are still a number of collisions that could lead to repacking problems at this scale. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-27 13:21:43 -08:00
Derrick Stolee	30696be71f	p5313: add size comparison test As custom options are added to 'git pack-objects' and 'git repack' to adjust how compression is done, use this new performance test script to demonstrate their effectiveness in performance and size. The recently-added --name-hash-version option allows for testing different name hash functions. Version 2 intends to preserve some of the locality of version 1 while more often breaking collisions due to long filenames. Distinguishing objects by more of the path is critical when there are many name hash collisions and several versions of the same path in the full history, giving a significant boost to the full repack case. The locality of the hash function is critical to compressing something like a shallow clone or a thin pack representing a push of a single commit. This can be seen by running pt5313 on the open source fluentui repository [1]. Most commits will have this kind of output for the thin and big pack cases, though certain commits (such as [2]) will have problematic thin pack size for other reasons. [1] https://github.com/microsoft/fluentui [2] a637a06df05360ce5ff21420803f64608226a875 Checked out at the parent of [2], I see the following statistics: Test HEAD --------------------------------------------------------------- 5313.2: thin pack with version 1 0.37(0.44+0.02) 5313.3: thin pack size with version 1 1.2M 5313.4: big pack with version 1 2.04(7.77+0.23) 5313.5: big pack size with version 1 20.4M 5313.6: shallow fetch pack with version 1 1.41(2.94+0.11) 5313.7: shallow pack size with version 1 34.4M 5313.8: repack with version 1 95.70(676.41+2.87) 5313.9: repack size with version 1 439.3M 5313.10: thin pack with version 2 0.12(0.12+0.06) 5313.11: thin pack size with version 2 22.0K 5313.12: big pack with version 2 2.80(5.43+0.34) 5313.13: big pack size with version 2 25.9M 5313.14: shallow fetch pack with version 2 1.77(2.80+0.19) 5313.15: shallow pack size with version 2 33.7M 5313.16: repack with version 2 33.68(139.52+2.58) 5313.17: repack size with version 2 160.5M To make comparisons easier, I will reformat this output into a different table style: \| Test \| V1 Time \| V2 Time \| V1 Size \| V2 Size \| \|--------------\|---------\|---------\|---------\|---------\| \| Thin Pack \| 0.37 s \| 0.12 s \| 1.2 M \| 22.0 K \| \| Big Pack \| 2.04 s \| 2.80 s \| 20.4 M \| 25.9 M \| \| Shallow Pack \| 1.41 s \| 1.77 s \| 34.4 M \| 33.7 M \| \| Repack \| 95.70 s \| 33.68 s \| 439.3 M \| 160.5 M \| The v2 hash function successfully differentiates the CHANGELOG.md files from each other, which leads to significant improvements in the thin pack (simulating a push of this commit) and the full repack. There is some bloat in the "big pack" scenario and essentially the same results for the shallow pack. In the case of the Git repository, these numbers show some of the issues with this approach: \| Test \| V1 Time \| V2 Time \| V1 Size \| V2 Size \| \|--------------\|---------\|---------\|---------\|---------\| \| Thin Pack \| 0.02 s \| 0.02 s \| 1.1 K \| 1.1 K \| \| Big Pack \| 1.69 s \| 1.95 s \| 13.5 M \| 14.5 M \| \| Shallow Pack \| 1.26 s \| 1.29 s \| 12.0 M \| 12.2 M \| \| Repack \| 29.51 s \| 29.01 s \| 237.7 M \| 238.2 M \| Here, the attempts to remove conflicts in the v2 function seem to cause slight bloat to these sizes. This shows that the Git repository benefits a lot from cross-path delta pairs. The results are similar with the nodejs/node repo: \| Test \| V1 Time \| V2 Time \| V1 Size \| V2 Size \| \|--------------\|---------\|---------\|---------\|---------\| \| Thin Pack \| 0.02 s \| 0.02 s \| 1.6 K \| 1.6 K \| \| Big Pack \| 4.61 s \| 3.26 s \| 56.0 M \| 52.8 M \| \| Shallow Pack \| 7.82 s \| 7.51 s \| 104.6 M \| 107.0 M \| \| Repack \| 88.90 s \| 73.75 s \| 740.1 M \| 764.5 M \| Here, the v2 name-hash causes some size bloat more often than it reduces the size, but it also universally improves performance time, which is an interesting reversal. This must mean that it is helping to short-circuit some delta computations even if it is not finding the most efficient ones. The performance improvement cannot be explained only due to the I/O cost of writing the resulting packfile. The Linux kernel repository was the initial target of the default name hash value, and its naming conventions are practically build to take the most advantage of the default name hash values: \| Test \| V1 Time \| V2 Time \| V1 Size \| V2 Size \| \|--------------\|----------\|----------\|---------\|---------\| \| Thin Pack \| 0.17 s \| 0.07 s \| 4.6 K \| 4.6 K \| \| Big Pack \| 17.88 s \| 12.35 s \| 201.1 M \| 159.1 M \| \| Shallow Pack \| 11.05 s \| 22.94 s \| 269.2 M \| 273.8 M \| \| Repack \| 727.39 s \| 566.95 s \| 2.5 G \| 2.5 G \| Here, the thin and big packs gain some performance boosts in time, with a modest gain in the size of the big pack. The shallow pack, however, is more expensive to compute, likely because similarly-named files across different directories are farther apart in the name hash ordering in v2. The repack also gains benefits in computation time but no meaningful change to the full size. Finally, an internal Javascript repo of moderate size shows significant gains when repacking with --name-hash-version=2 due to it having many name hash collisions. However, it's worth noting that only the full repack case has significant differences from the v1 name hash: \| Test \| V1 Time \| V2 Time \| V1 Size \| V2 Size \| \|-----------\|-----------\|----------\|---------\|---------\| \| Thin Pack \| 8.28 s \| 7.28 s \| 16.8 K \| 16.8 K \| \| Big Pack \| 12.81 s \| 11.66 s \| 29.1 M \| 29.1 M \| \| Shallow \| 4.86 s \| 4.06 s \| 42.5 M \| 44.1 M \| \| Repack \| 3126.50 s \| 496.33 s \| 6.2 G \| 855.6 M \| Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-27 13:21:43 -08:00
Junio C Hamano	73b7e03e9e	Merge branch 'jk/describe-perf' "git describe" optimization. * jk/describe-perf: describe: split "found all tags" and max_candidates logic describe: stop traversing when we run out of names describe: stop digging for max_candidates+1 t/perf: add tests for git-describe t6120: demonstrate weakness in disjoint-root handling	2024-12-15 17:54:25 -08:00
Junio C Hamano	e5b71577a6	Merge branch 'tb/use-test-file-size-more' Use the right helper program to measure file size in performance tests. * tb/use-test-file-size-more: t/perf: use 'test_file_size' in more places	2024-12-04 10:14:45 +09:00
Taylor Blau	3f97f1bce6	t/perf: use 'test_file_size' in more places The perf test suite prefers to use test_file_size over 'wc -c' when inside of a test_size block. One advantage is that accidentally writign "wc -c file" (instead of "wc -c <file") does not inadvertently break the tests (since the former will include the filename in the output of wc). Both of the two uses of test_size use "wc -c", but let's convert those to the more conventional test_file_size helper instead. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-22 09:44:34 +09:00
Jeff King	bb0830c682	t/perf: add tests for git-describe We don't have a perf script for git-describe, despite it often being accused of slowness. Let's add a few simple tests to start with. Rather than use the existing tags from our test repo, we'll make our own so that we have a known quantity and position. We'll add a "new" tag near the tip of HEAD, and an "old" one that is at the very bottom. And then our tests are: 1. Describing HEAD naively requires walking all the way down to the old tag as we collect candidates. This gives us a baseline for what "slow" looks like. 2. Doing the same with --candidates=1 can potentially be fast, because we can quie after finding "new". But we don't, and it's also slow. 3. Likewise we should be able to quit when there are no more tags to find. This can happen naturally if a repo has few tags, but also if you restrict the set of tags with --match. Here are the results running against linux.git. Note that I have a commit-graph built for the repo, so "slow" here is ~700ms. Without a commit graph it's more like 9s! Test HEAD -------------------------------------------------------------- 6100.2: describe HEAD 0.70(0.66+0.04) 6100.3: describe HEAD with one max candidate 0.70(0.66+0.04) 6100.4: describe HEAD with one tag 0.70(0.64+0.06) Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-07 13:28:22 +09:00
Andrew Kreimer	050e0ef6ea	t/perf: fix typos Fix typos via codespell. Signed-off-by: Andrew Kreimer <algonell@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-10-10 13:31:13 -07:00
Derrick Stolee	4b707a6e99	p1500: add is-base performance tests The previous two changes introduced a commit walking heuristic for finding the most likely base branch for a given source. This algorithm walks first-parent histories until reaching a collision. This walk _should_ be very fast. Exceptions include cases where a commit-graph file does not exist, leading to a full walk of all reachable commits to compute generation numbers, or a case where no collision in the first-parent history exists, leading to a walk of all first-parent history to the root commits. The p1500 test script guarantees a complete commit-graph file during its setup, so we will not test that scenario. Do create a new root commit in an effort to test the scenario of parallel first-parent histories. Even with the extra root commit, these tests take no longer than 0.02 seconds on my machine for the Git repository. However, the results are slightly more interesting in a copy of the Linux kernel repository: Test --------------------------------------------------------------- 1500.2: ahead-behind counts: git for-each-ref 0.12 1500.3: ahead-behind counts: git branch 0.12 1500.4: ahead-behind counts: git tag 0.12 1500.5: contains: git for-each-ref --merged 0.04 1500.6: contains: git branch --merged 0.04 1500.7: contains: git tag --merged 0.04 1500.8: is-base check: test-tool reach (refs) 0.03 1500.9: is-base check: test-tool reach (tags) 0.03 1500.10: is-base check: git for-each-ref 0.03 1500.11: is-base check: git for-each-ref (disjoint-base) 0.07 Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-14 10:10:06 -07:00
Taylor Blau	0b7500dc66	t/perf: implement performance tests for pseudo-merge bitmaps Implement a straightforward performance test demonstrating the benefit of pseudo-merge bitmaps by measuring how long it takes to count reachable objects in a few different scenarios: - without bitmaps, to demonstrate a reasonable baseline - with bitmaps, but without pseudo-merges - with bitmaps and pseudo-merges Results from running this test on git.git are as follows: Test this tree ----------------------------------------------------------------------------------- 5333.2: git rev-list --count --all --objects (no bitmaps) 3.54(3.45+0.08) 5333.3: git rev-list --count --all --objects (no pseudo-merges) 0.43(0.40+0.03) 5333.4: git rev-list --count --all --objects (with pseudo-merges) 0.12(0.11+0.01) On a private repository which is much larger, and has many spikey parts of history that aren't merged into the 'master' branch, the results are as follows: Test this tree --------------------------------------------------------------------------------------- 5333.1: git rev-list --count --all --objects (no bitmaps) 122.29(121.31+0.97) 5333.2: git rev-list --count --all --objects (no pseudo-merges) 21.88(21.30+0.58) 5333.3: git rev-list --count --all --objects (with pseudo-merges) 5.05(4.77+0.28) Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-05-24 11:40:44 -07:00
Beat Bolli	108e18acc3	t/perf: avoid redundant use of cat Take care to redirect stdin, otherwise the output of wc would also contain the file name. Signed-off-by: Beat Bolli <dev+git@drbeat.li> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-03-16 11:08:56 -07:00
Junio C Hamano	0fea6b73f1	Merge branch 'tb/multi-pack-verbatim-reuse' Streaming spans of packfile data used to be done only from a single, primary, pack in a repository with multiple packfiles. It has been extended to allow reuse from other packfiles, too. * tb/multi-pack-verbatim-reuse: (26 commits) t/perf: add performance tests for multi-pack reuse pack-bitmap: enable reuse from all bitmapped packs pack-objects: allow setting `pack.allowPackReuse` to "single" t/test-lib-functions.sh: implement `test_trace2_data` helper pack-objects: add tracing for various packfile metrics pack-bitmap: prepare to mark objects from multiple packs for reuse pack-revindex: implement `midx_pair_to_pack_pos()` pack-revindex: factor out `midx_key_to_pack_pos()` helper midx: implement `midx_preferred_pack()` git-compat-util.h: implement checked size_t to uint32_t conversion pack-objects: include number of packs reused in output pack-objects: prepare `write_reused_pack_verbatim()` for multi-pack reuse pack-objects: prepare `write_reused_pack()` for multi-pack reuse pack-objects: pass `bitmapped_pack`'s to pack-reuse functions pack-objects: keep track of `pack_start` for each reuse pack pack-objects: parameterize pack-reuse routines over a single pack pack-bitmap: return multiple packs via `reuse_partial_packfile_from_bitmap()` pack-bitmap: simplify `reuse_partial_packfile_from_bitmap()` signature ewah: implement `bitmap_is_empty()` pack-bitmap: pass `bitmapped_pack` struct to pack-reuse functions ...	2024-01-12 16:09:56 -08:00
Junio C Hamano	ec5ab1482d	Merge branch 'js/update-urls-in-doc-and-comment' Stale URLs have been updated to their current counterparts (or archive.org) and HTTP links are replaced with working HTTPS links. * js/update-urls-in-doc-and-comment: doc: refer to internet archive doc: update links for andre-simon.de doc: switch links to https doc: update links to current pages	2023-12-18 14:10:12 -08:00
Taylor Blau	ba47d88795	t/perf: add performance tests for multi-pack reuse To ensure that we don't regress either the size or runtime performance of multi-pack reuse, add a performance test to measure both of these. The test partitions the objects in GIT_TEST_PERF_LARGE_REPO into 1, 10, and 100 packs, and then tries to perform a "clone" at each stage with both single- and multi-pack reuse enabled. Note that the `repack_into_n_chunks()` function in this new test script differs from the existing `repack_into_n()`. The former partitions the repository into N equal-sized chunks, while the latter produces N packs of five commits each (plus their objects), and then another pack with the remainder. On git.git, I can produce the following results on my machine: Test this tree -------------------------------------------------------------------------------- 5332.3: clone for 1-pack scenario (single-pack reuse) 1.57(2.99+0.15) 5332.4: clone size for 1-pack scenario (single-pack reuse) 231.8M 5332.5: clone for 1-pack scenario (multi-pack reuse) 1.79(2.96+0.21) 5332.6: clone size for 1-pack scenario (multi-pack reuse) 231.7M 5332.9: clone for 10-pack scenario (single-pack reuse) 3.89(16.75+0.35) 5332.10: clone size for 10-pack scenario (single-pack reuse) 209.9M 5332.11: clone for 10-pack scenario (multi-pack reuse) 1.56(2.99+0.17) 5332.12: clone size for 10-pack scenario (multi-pack reuse) 224.4M 5332.15: clone for 100-pack scenario (single-pack reuse) 8.24(54.31+0.59) 5332.16: clone size for 100-pack scenario (single-pack reuse) 278.3M 5332.17: clone for 100-pack scenario (multi-pack reuse) 2.13(2.44+0.33) 5332.18: clone size for 100-pack scenario (multi-pack reuse) 357.9M Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-14 14:38:09 -08:00
Junio C Hamano	98d0a1f93e	Merge branch 'vd/for-each-ref-unsorted-optimization' "git for-each-ref --no-sort" still sorted the refs alphabetically which paid non-trivial cost. It has been redefined to show output in an unspecified order, to allow certain optimizations to take advantage of. * vd/for-each-ref-unsorted-optimization: t/perf: add perf tests for for-each-ref ref-filter.c: use peeled tag for '*' format fields for-each-ref: clean up documentation of --format ref-filter.c: filter & format refs in the same callback ref-filter.c: refactor to create common helper functions ref-filter.c: rename 'ref_filter_handler()' to 'filter_one()' ref-filter.h: add functions for filter/format & format-only ref-filter.h: move contains caches into filter ref-filter.h: add max_count and omit_empty to ref_format ref-filter.c: really don't sort when using --no-sort	2023-12-09 16:37:50 -08:00
Josh Soref	d05b08cd52	doc: switch links to https These sites offer https versions of their content. Using the https versions provides some protection for users. Signed-off-by: Josh Soref <jsoref@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-11-26 10:07:05 +09:00
Victoria Dye	294bfc2441	t/perf: add perf tests for for-each-ref Add performance tests for 'for-each-ref'. The tests exercise different combinations of filters/formats/options, as well as the overall performance of 'git for-each-ref \| git cat-file --batch-check' to demonstrate the performance difference vs. 'git for-each-ref' with "%(fieldname)" format specifiers. All tests are run against a repository with 40k loose refs - 10k commits, each having a unique: - branch - custom ref (refs/custom/special_) - annotated tag pointing at the commit - annotated tag pointing at the other annotated tag (i.e., a nested tag) After those tests are finished, the refs are packed with 'pack-refs --all' and the same tests are rerun. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-11-16 14:03:01 +09:00
Patrick Steinhardt	13420028e5	global: convert trivial usages of `test <expr> -a/-o <expr>` Our coding guidelines say to not use `test` with `-a` and `-o` because it can easily lead to bugs. Convert trivial cases where we still use these to instead instead concatenate multiple invocations of `test` via `&&` and `\|\|`, respectively. While not all of the converted instances can cause ambiguity, it is worth getting rid of all of them regardless: - It becomes easier to reason about the code as we do not have to argue why one use of `-a`/`-o` is okay while another one isn't. - We don't encourage people to use these expressions. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-11-11 09:21:00 +09:00
Shuqi Liang	f9815878c1	check-attr: integrate with sparse-index Set the requires-full-index to false for "check-attr". Add a test to ensure that the index is not expanded whether the files are outside or inside the sparse-checkout cone when the sparse index is enabled. The `p2000` tests demonstrate a ~63% execution time reduction for 'git check-attr' using a sparse index. Test before after ----------------------------------------------------------------------- 2000.106: git check-attr -a f2/f4/a (full-v3) 0.05 0.05 +0.0% 2000.107: git check-attr -a f2/f4/a (full-v4) 0.05 0.05 +0.0% 2000.108: git check-attr -a f2/f4/a (sparse-v3) 0.04 0.02 -50.0% 2000.109: git check-attr -a f2/f4/a (sparse-v4) 0.04 0.01 -75.0% Helped-by: Victoria Dye <vdye@github.com> Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-08-11 09:44:52 -07:00
Junio C Hamano	a813d9e239	Merge branch 'sl/worktree-sparse' "git worktree" learned to work better with sparse index feature. * sl/worktree-sparse: worktree: integrate with sparse-index	2023-06-23 11:21:16 -07:00
Shuqi Liang	8fac776f44	worktree: integrate with sparse-index The index is read in 'worktree.c' at two points: 1.The 'validate_no_submodules' function, which checks if there are any submodules present in the worktree. 2.The 'check_clean_worktree' function, which verifies if a worktree is 'clean', i.e., there are no untracked or modified but uncommitted files. This is done by running the 'git status' command, and an error message is thrown if the worktree is not clean. Given that 'git status' is already sparse-aware, the function is also sparse-aware. Hence we can just set the requires-full-index to false for "git worktree". Add tests that verify that 'git worktree' behaves correctly when the sparse index is enabled and test to ensure the index is not expanded. The `p2000` tests demonstrate a ~20% execution time reduction for 'git worktree' using a sparse index: (Note:the p2000 test results didn't reflect the huge speedup because of the index reading time is minuscule comparing to the filesystem operations.) Test before after ----------------------------------------------------------------------- 2000.102: git worktree add....(full-v3) 3.15 2.82 -10.5% 2000.103: git worktree add....(full-v4) 3.14 2.84 -9.6% 2000.104: git worktree add....(sparse-v3) 2.59 2.14 -16.4% 2000.105: git worktree add....(sparse-v4) 2.10 1.57 -25.2% Helped-by: Victoria Dye <vdye@github.com> Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com> Acked-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-06-12 12:13:58 -07:00
Shuqi Liang	48c5fbfb89	diff-tree: integrate with sparse index The index is read in 'cmd_diff_tree' at two points: 1. The first index read was added in `fd66bcc31f` (diff-tree: read the index so attribute checks work in bare repositories, 2017-12-06) to deal with reading '.gitattributes' content. `77efbb366a` (attr: be careful about sparse directories, 2021-09-08) established that, in a sparse index, we do _not_ try to load a '.gitattributes' file from within a sparse directory. 2. The second index access point is involved in rename detection, specifically when reading from stdin.This was initially added in `f0c6b2a2fd` ([PATCH] Optimize diff-tree -[CM]--stdin, 2005-05-27), where 'setup' was set to 'DIFF_SETUP_USE_SIZE_CACHE \|DIFF_SETUP_USE_CACHE'. That assignment was later modified to drop the'DIFF_SETUP_USE_CACHE' in `ff7fe37b05` (diff.c: move read_index() code back to the caller, 2018-08-13).However, 'DIFF_SETUP_USE_SIZE_CACHE' seems to be unused as of `6e0b8ed6d3` (diff.c: do not use a separate "size cache"., 2007-05-07) and nothing about 'detect_rename' otherwise indicates index usage. Hence we can just set the requires-full-index to false for "diff-tree". Add tests that verify that 'git diff-tree' behaves correctly when the sparse index is enabled and test to ensure the index is not expanded. The `p2000` tests demonstrate a ~98% execution time reduction for 'git diff-tree' using a sparse index: Test before after ----------------------------------------------------------------------- 2000.94: git diff-tree HEAD (full-v3) 0.05 0.04 -20.0% 2000.95: git diff-tree HEAD (full-v4) 0.06 0.05 -16.7% 2000.96: git diff-tree HEAD (sparse-v3) 0.59 0.01 -98.3% 2000.97: git diff-tree HEAD (sparse-v4) 0.61 0.01 -98.4% 2000.98: git diff-tree HEAD -- f2/f4/a (full-v3) 0.05 0.05 +0.0% 2000.99: git diff-tree HEAD -- f2/f4/a (full-v4) 0.05 0.04 -20.0% 2000.100: git diff-tree HEAD -- f2/f4/a (sparse-v3) 0.58 0.01 -98.3% 2000.101: git diff-tree HEAD -- f2/f4/a (sparse-v4) 0.55 0.01 -98.2% Helped-by: Victoria Dye <vdye@github.com> Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-05-18 10:40:33 -07:00
Junio C Hamano	5ca11547bb	Merge branch 'sl/diff-files-sparse' Teach "diff-files" not to expand sparse-index unless needed. * sl/diff-files-sparse: diff-files: integrate with sparse index t1092: add tests for `git diff-files`	2023-05-15 13:59:06 -07:00
Shuqi Liang	8c30be9176	diff-files: integrate with sparse index Remove full index requirement for `git diff-files`. Refactor the ensure_expanded and ensure_not_expanded functions by introducing a common helper function, ensure_index_state. Add test to ensure the index is no expanded in `git diff-files`. The `p2000` tests demonstrate a ~96% execution time reduction for 'git diff-files' and a ~97% execution time reduction for 'git diff-files' for a file using a sparse index: Test before after ----------------------------------------------------------------------- 2000.94: git diff-files (full-v3) 0.09 0.08 -11.1% 2000.95: git diff-files (full-v4) 0.09 0.09 +0.0% 2000.96: git diff-files (sparse-v3) 0.52 0.02 -96.2% 2000.97: git diff-files (sparse-v4) 0.51 0.02 -96.1% 2000.98: git diff-files -- f2/f4/a (full-v3) 0.06 0.07 +16.7% 2000.99: git diff-files -- f2/f4/a (full-v4) 0.08 0.08 +0.0% 2000.100: git diff-files -- f2/f4/a (sparse-v3) 0.46 0.01 -97.8% 2000.101: git diff-files -- f2/f4/a (sparse-v4) 0.51 0.02 -96.1% Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-05-09 14:26:36 -07:00
Junio C Hamano	849c8b3dbf	Merge branch 'tb/pack-revindex-on-disk' The on-disk reverse index that allows mapping from the pack offset to the object name for the object stored at the offset has been enabled by default. * tb/pack-revindex-on-disk: t: invert `GIT_TEST_WRITE_REV_INDEX` config: enable `pack.writeReverseIndex` by default pack-revindex: introduce `pack.readReverseIndex` pack-revindex: introduce GIT_TEST_REV_INDEX_DIE_ON_DISK pack-revindex: make `load_pack_revindex` take a repository t5325: mark as leak-free pack-write.c: plug a leak in stage_tmp_packfiles()	2023-04-27 16:00:59 -07:00
Junio C Hamano	7ac228c994	Merge branch 'rn/sparse-describe' "git describe --dirty" learns to work better with sparse-index. * rn/sparse-describe: describe: enable sparse index for describe	2023-04-21 15:35:04 -07:00
Junio C Hamano	d47ee0a565	Merge branch 'sl/sparse-write-tree' "git write-tree" learns to work better with sparse-index. * sl/sparse-write-tree: write-tree: integrate with sparse index	2023-04-17 18:05:11 -07:00
Taylor Blau	a8dd7e05b1	config: enable `pack.writeReverseIndex` by default Back in `e37d0b8730` (builtin/index-pack.c: write reverse indexes, 2021-01-25), Git learned how to read and write a pack's reverse index from a file instead of in-memory. A pack's reverse index is a mapping from pack position (that is, the order that objects appear together in a ".pack") to their position in lexical order (that is, the order that objects are listed in an ".idx" file). Reverse indexes are consulted often during pack-objects, as well as during auxiliary operations that require mapping between pack offsets, pack order, and index index. They are useful in GitHub's infrastructure, where we have seen a dramatic increase in performance when writing ".rev" files[1]. In particular: - an ~80% reduction in the time it takes to serve fetches on a popular repository, Homebrew/homebrew-core. - a ~60% reduction in the peak memory usage to serve fetches on that same repository. - a collective savings of ~35% in CPU time across all pack-objects invocations serving fetches across all repositories in a single datacenter. Reverse indexes are also beneficial to end-users as well as forges. For example, the time it takes to generate a pack containing the objects for the 10 most recent commits in linux.git (representing a typical push) is significantly faster when on-disk reverse indexes are available: $ { git rev-parse HEAD && printf '^' && git rev-parse HEAD~10 } >in $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} pack-objects --delta-base-offset --revs --stdout <in >/dev/null' Benchmark 1: git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null Time (mean ± σ): 543.0 ms ± 20.3 ms [User: 616.2 ms, System: 58.8 ms] Range (min … max): 521.0 ms … 577.9 ms 10 runs Benchmark 2: git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null Time (mean ± σ): 245.0 ms ± 11.4 ms [User: 335.6 ms, System: 31.3 ms] Range (min … max): 226.0 ms … 259.6 ms 13 runs Summary 'git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null' ran 2.22 ± 0.13 times faster than 'git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null' The same is true of writing a pack containing the objects for the 30 most-recent commits: $ { git rev-parse HEAD && printf '^' && git rev-parse HEAD~30 } >in $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} pack-objects --delta-base-offset --revs --stdout <in >/dev/null' Benchmark 1: git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null Time (mean ± σ): 866.5 ms ± 16.2 ms [User: 1414.5 ms, System: 97.0 ms] Range (min … max): 839.3 ms … 886.9 ms 10 runs Benchmark 2: git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null Time (mean ± σ): 581.6 ms ± 10.2 ms [User: 1181.7 ms, System: 62.6 ms] Range (min … max): 567.5 ms … 599.3 ms 10 runs Summary 'git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null' ran 1.49 ± 0.04 times faster than 'git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null' ...and savings on trivial operations like computing the on-disk size of a single (packed) object are even more dramatic: $ git rev-parse HEAD >in $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} cat-file --batch-check="%(objectsize:disk)" <in' Benchmark 1: git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" <in Time (mean ± σ): 305.8 ms ± 11.4 ms [User: 264.2 ms, System: 41.4 ms] Range (min … max): 290.3 ms … 331.1 ms 10 runs Benchmark 2: git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" <in Time (mean ± σ): 4.0 ms ± 0.3 ms [User: 1.7 ms, System: 2.3 ms] Range (min … max): 1.6 ms … 4.6 ms 1155 runs Summary 'git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" <in' ran 76.96 ± 6.25 times faster than 'git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" <in' In the more than two years since `e37d0b8730` was merged, Git's implementation of on-disk reverse indexes has been thoroughly tested, both from users enabling `pack.writeReverseIndexes`, and from GitHub's deployment of the feature. The latter has been running without incident for more than two years. This patch changes Git's behavior to write on-disk reverse indexes by default when indexing a pack, which should make the above operations faster for everybody's Git installation after a repack. (The previous commit explains some potential drawbacks of using on-disk reverse indexes in certain limited circumstances, that essentially boil down to a trade-off between time to generate, and time to access. For those limited cases, the `pack.readReverseIndex` escape hatch can be used). [1]: https://github.blog/2021-04-29-scaling-monorepo-maintenance/#reverse-indexes Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-13 07:55:46 -07:00
Junio C Hamano	7727da99df	Merge branch 'ds/ahead-behind' "git for-each-ref" learns '%(ahead-behind:<base>)' that computes the distances from a single reference point in the history with bunch of commits in bulk. * ds/ahead-behind: commit-reach: add tips_reachable_from_bases() for-each-ref: add ahead-behind format atom commit-reach: implement ahead_behind() logic commit-graph: introduce `ensure_generations_valid()` commit-graph: return generation from memory commit-graph: simplify compute_generation_numbers() commit-graph: refactor compute_topological_levels() for-each-ref: explicitly test no matches for-each-ref: add --stdin option	2023-04-06 13:38:21 -07:00
Shuqi Liang	1a65b41b38	write-tree: integrate with sparse index Update 'git write-tree' to allow using the sparse-index in memory without expanding to a full one. The recursive algorithm for update_one() was already updated in `2de37c5` (cache-tree: integrate with sparse directory entries, 2021-03-03) to handle sparse directory entries in the index. Hence we can just set the requires-full-index to false for "write-tree". The `p2000` tests demonstrate a ~96% execution time reduction for 'git write-tree' using a sparse index: Test before after ----------------------------------------------------------------- 2000.78: git write-tree (full-v3) 0.34 0.33 -2.9% 2000.79: git write-tree (full-v4) 0.32 0.30 -6.3% 2000.80: git write-tree (sparse-v3) 0.47 0.02 -95.8% 2000.81: git write-tree (sparse-v4) 0.45 0.02 -95.6% Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-04 12:50:54 -07:00

1 2 3 4 5 ...

401 Commits (next)