kernel/git - git - PowerEL Git System

Commit Graph

Author	SHA1	Message	Date
Patrick Steinhardt	5e7fe8a7b8	commit-reach: use `size_t` to track indices when computing merge bases The functions `repo_get_merge_bases_many()` and friends accepts an array of commits as well as a parameter that indicates how large that array is. This parameter is using a signed integer, which leads to a couple of warnings with -Wsign-compare. Refactor the code to use `size_t` to track indices instead and adapt callers accordingly. While most callers are trivial, there are two callers that require a bit more scrutiny: - builtin/merge-base.c:show_merge_base() subtracts `1` from the `rev_nr` before calling `repo_get_merge_bases_many_dirty()`, so if the variable was `0` it would wrap. This code is fine though because its only caller will execute that code only when `argc >= 2`, and it follows that `rev_nr >= 2`, as well. - bisect.ccheck_merge_bases() similarly subtracts `1` from `rev_nr`. Again, there is only a single caller that populates `rev_nr` with `good_revs.nr`. And because a bisection always requires at least one good revision it follws that `rev_nr >= 1`. Mark the file as -Wsign-compare-clean. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-27 08:12:40 -08:00
Patrick Steinhardt	85ee0680e2	commit-reach: use `size_t` to track indices in `get_reachable_subset()` Similar as with the preceding commit, adapt `get_reachable_subset()` so that it tracks array indices via `size_t` instead of using signed integers to fix a couple of -Wsign-compare warnings. Adapt callers accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-27 08:11:45 -08:00
Patrick Steinhardt	04aeeeaab1	commit-reach: fix type of `min_commit_date` The `can_all_from_reach_with_flag()` function accepts a parameter that allows callers to cut off traversal at a specified commit date. This parameter is of type `time_t`, which is a signed type, while we end up comparing it to a commit's `date` field, which is of the unsigned type `timestamp_t`. Fix the parameter to be of type `timestamp_t`. There is only a single caller in "upload-pack.c" that sets this parameter, and that caller knows to pass in a `timestamp_t` already. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-27 08:11:45 -08:00
Derrick Stolee	e32eaf73b0	commit-reach: add get_branch_base_for_tip Add a new reachability algorithm that intends to discover (from a heuristic) which branch was used as the starting point for a given commit. Add focused tests using the 'test-tool reach' command. In repositories that use pull requests (or merge requests) to advance one or more "protected" branches, the history of that reference can be recovered by following the first-parent history in most cases. Most are completed using no-fast-forward merges, though squash merges are quite common. Less common is rebase-and-merge, which still validates this assumption. Finally, the case that breaks this assumption is the fast-forward update (with potential rebasing). Even in this case, the previous commit commonly appears in the first-parent history of the branch. Similar assumptions can be made for a topic branch created by a single user with the intention to merge back into another branch. Using 'git commit', 'git merge', and 'git cherry-pick' from HEAD will default to having the first-parent commit be the previous commit at HEAD. This history changes only with commands such as 'git reset' or 'git rebase', where the command names also imply that the branch is starting from a new location. With this movement of branches in mind, the following heuristic is proposed as a way to determine the base branch for a given source branch: Among a list of candidate base branches, select the candidate that minimizes the number of commits in the first-parent history of the source that are not in the first-parent history of the candidate. Prior third-party solutions to this problem have used this optimization criteria, but have relied upon extracting the first-parent history and comparing those lists as tables instead of using commit-graph walks. Given current command-line interface options, this optimization criteria is not easy to detect directly. Even using the command git rev-list --count --first-parent <base>..<source> does not measure this count, as it uses full reachability from <base> to determine which commits to remove from the range '<base>..<source>'. This may lead to one asking if we should instead be using the full reachability of the candidate and only the first-parent history of the source. This, unfortunately, does not work for repositories that use long-lived branches and automation to merge across those branches. In extremely large repositories, merging into a single trunk may not be feasible. This is usually due to the desired frequency of updates (thousands of engineers doing daily work) combined with the time required to perform a validation build. These factors combine to create significant risk of semantic merge conflicts, leading to build breaks on the trunk. In response, repository maintainers can create a single Level Zero (L0) trunk and multiple Level One (L1) branches. By partitioning the engineers by organization, these engineers may see lower risk of semantic merge conflicts as well as be protected against build breaks in other L1 branches. The key to making this system work is a semi-automated process of merging L1 branches into the L0 trunk and vice-versa. In a large enough organization, these L1 branches may further split into L2 or L3 branches, but the same principles apply for merging across deeper levels. If these automated merges use a typical merge with the second parent bringing in the "new" content, then each L0 and L1 branch can track its previous positions by following first-parent history, which appear as parallel paths (until reaching the first place where the branches diverged). If we also walk to second parents, then the histories overlap significantly and cannot be distinguished except for very-recent changes. For this reason, the first-parent condition should be symmetrical across the base and source branches. Another common case for desiring the result of this optimization method is the use of release branches. When releasing a version of a repository, a branch can be used to track that release. Any updates that are worth fixing in that release can be merged to the release branch and shipped with only the necessary fixes without any new features introduced in the trunk branch. The 'maint-2.<X>' branches represent this pattern in the Git project. The microsoft/git fork uses 'vfs-2.<X>.<Y>' branches to track the changes that are custom to that fork on top of each upstream Git release 2.<X>.<Y>. This application doesn't need the symmetrical first-parent condition, but the use of first-parent histories does not change the results for these branches. To determine the base branch from a list of candidates, create a new method in commit-reach.c that performs a single* commit-graph walk. The core concept is to walk first-parents starting at the candidate bases and the source, tracking the "best" base to reach a given commit. Use generation numbers to ensure that a commit is walked at most once and all children have been explored before visiting it. When reaching a commit that is reachable from both a base and the source, we will then have a guarantee that this is the closest intersection of first-parent histories. Track the best base to reach that commit and return it as a result. In rare cases involving multiple root commits, the first-parent history of the source may never intersect any of the candidates and thus a null result is returned. * There are up to two walks, since we require all commits to have a computed generation number in order to avoid incorrect results. This is similar to the need for computed generation numbers in ahead_behind() as implemented in `fd67d149bd` (commit-reach: implement ahead_behind() logic, 2023-03-20). In order to track the "best" base, use a new commit slab that stores an integer. This value defaults to zero upon initialization, so use -1 to track that the source commit can reach this commit and use 'i + 1' to track that the ith base can reach this commit. When multiple bases can reach a commit, minimize the index to break ties. This allows the caller to specify an order to the bases that determines some amount of preference when the heuristic does not result in a unique result. The trickiest part of the integer slab is what happens when reaching a collision among the histories of the bases and the history of the source. This is noticed when viewing the first parent and seeing that it has a slab value that differs in sign (negative or positive). In this case, the collision commit is stored in the method variable 'branch_point' and its slab value is set to -1. The index of the best base (so far) is stored in the method variable 'best_index'. It is possible that there are multiple commits that have the branch_point as its first parent, leading to multiple updates of best_index. The result is determined when 'branch_point' is visited in the commit walk, giving the guarantee that all commits that could reach 'branch_point' were visited. Several interesting cases of collisions and different results are tested in the t6600-test-reach.sh script. Recall that this script also tests the algorithm in three possible states involving the commit-graph file and how many commits are written in the file. This provides some coverage of the need (and lack of need) for the ensure_generations_valid() method. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-14 10:10:05 -07:00
Johannes Schindelin	caaf1a2942	commit-reach(repo_get_merge_bases_many_dirty): pass on errors (Actually, this commit is only about passing on "missing commits" errors, but adding that to the commit's title would have made it too long.) The `merge_bases_many()` function was just taught to indicate parsing errors, and now the `repo_get_merge_bases_many_dirty()` function is aware of that, too. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-29 08:06:01 -08:00
Johannes Schindelin	5317380521	commit-reach(repo_get_merge_bases_many): pass on "missing commits" errors The `merge_bases_many()` function was just taught to indicate parsing errors, and now the `repo_get_merge_bases_many()` function is aware of that, too. Naturally, there are a lot of callers that need to be adjusted now, too. Next stop: `repo_get_merge_bases_dirty()`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-29 08:06:01 -08:00
Johannes Schindelin	f87056ce40	commit-reach(get_octopus_merge_bases): pass on "missing commits" errors The `merge_bases_many()` function was just taught to indicate parsing errors, and now the `repo_get_merge_bases()` function (which is also surfaced via the `get_merge_bases()` macro) is aware of that, too. Naturally, the callers need to be adjusted now, too. Next step: adjust `repo_get_merge_bases_many()`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-29 08:06:01 -08:00
Johannes Schindelin	76e2a09999	commit-reach(repo_get_merge_bases): pass on "missing commits" errors The `merge_bases_many()` function was just taught to indicate parsing errors, and now the `repo_get_merge_bases()` function (which is also surfaced via the `repo_get_merge_bases()` macro) is aware of that, too. Naturally, there are a lot of callers that need to be adjusted now, too. Next step: adjust the callers of `get_octopus_merge_bases()`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-29 08:06:01 -08:00
Johannes Schindelin	207c40e1e4	commit-reach(repo_in_merge_bases_many): optionally expect missing commits Currently this function treats unrelated commit histories the same way as commit histories with missing commit objects. Typically, missing commit objects constitute a corrupt repository, though, and should be reported as such. The next commits will make it so, but there is one exception: In `git fetch --update-shallow` we _expect_ commit objects to be missing, and we do want to treat the now-incomplete commit histories as unrelated. To allow for that, let's introduce an additional parameter that is passed to `repo_in_merge_bases_many()` to trigger this behavior, and use it in the two callers in `shallow.c`. This commit changes behavior slightly: unless called from the `shallow.c` functions that set the `ignore_missing_commits` bit, any non-existing tip commit that is passed to `repo_in_merge_bases_many()` will now result in an error. Note: When encountering missing commits while traversing the commit history in search for merge bases, with this commit there won't be a change in behavior just yet, their children will still be interpreted as root commits. This bug will get fixed by follow-up commits. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-28 09:47:03 -08:00
Junio C Hamano	72871b198f	Merge branch 'ab/remove-implicit-use-of-the-repository' Code clean-up around the use of the_repository. * ab/remove-implicit-use-of-the-repository: libs: use "struct repository " argument, not "the_repository" post-cocci: adjust comments for recent repo_ migration cocci: apply the "revision.h" part of "the_repository.pending" cocci: apply the "rerere.h" part of "the_repository.pending" cocci: apply the "refs.h" part of "the_repository.pending" cocci: apply the "promisor-remote.h" part of "the_repository.pending" cocci: apply the "packfile.h" part of "the_repository.pending" cocci: apply the "pretty.h" part of "the_repository.pending" cocci: apply the "object-store.h" part of "the_repository.pending" cocci: apply the "diff.h" part of "the_repository.pending" cocci: apply the "commit.h" part of "the_repository.pending" cocci: apply the "commit-reach.h" part of "the_repository.pending" cocci: apply the "cache.h" part of "the_repository.pending" cocci: add missing "the_repository" macros to "pending" cocci: sort "the_repository" rules by header cocci: fix incorrect & verbose "the_repository" rules cocci: remove dead rule from "the_repository.pending.cocci"	2023-04-06 13:38:30 -07:00
Ævar Arnfjörð Bjarmason	cb338c23d6	cocci: apply the "commit-reach.h" part of "the_repository.pending" Apply the part of "the_repository.pending.cocci" pertaining to "commit-reach.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-28 07:36:36 -07:00
Derrick Stolee	cbfe360b14	commit-reach: add tips_reachable_from_bases() Both 'git for-each-ref --merged=<X>' and 'git branch --merged=<X>' use the ref-filter machinery to select references or branches (respectively) that are reachable from a set of commits presented by one or more --merged arguments. This happens within reach_filter(), which uses the revision-walk machinery to walk history in a standard way. However, the commit-reach.c file is full of custom searches that are more efficient, especially for reachability queries that can terminate early when reachability is discovered. Add a new tips_reachable_from_bases() method to commit-reach.c and call it from within reach_filter() in ref-filter.c. This affects both 'git branch' and 'git for-each-ref' as tested in p1500-graph-walks.sh. For the Linux kernel repository, we take an already-fast algorithm and make it even faster: Test HEAD~1 HEAD ------------------------------------------------------------------- 1500.5: contains: git for-each-ref --merged 0.13 0.02 -84.6% 1500.6: contains: git branch --merged 0.14 0.02 -85.7% 1500.7: contains: git tag --merged 0.15 0.03 -80.0% (Note that we remove the iterative 'git rev-list' test from p1500 because it no longer makes sense as a comparison to 'git for-each-ref' and would just waste time running it for these comparisons.) The algorithm is implemented in commit-reach.c in the method tips_reachable_from_base(). This method takes a string_list of tips and assigns the 'util' for each item with the value 1 if the base commit can reach those tips. Like other reachability queries in commit-reach.c, the fastest way to search for "can A reach B?" is to do a depth-first search up to the generation number of B, preferring to explore first parents before later parents. While we must walk all reachable commits up to that generation number when the answer is "no", the depth-first search can answer "yes" much faster than other approaches in most cases. This search becomes trickier when there are multiple targets for the depth-first search. The commits with lower generation number are more likely to be within the history of the start commit, but we don't want to waste time searching commits of low generation number if the commit target with lowest generation number has already been found. The trick here is to take the input commits and sort them by generation number in ascending order. Track the index within this order as min_generation_index. When we find a commit, if its index in the list is equal to min_generation_index, then we can increase the generation number boundary of our search to the next-lowest value in the list. With this mechanism, the number of commits to search is minimized with respect to the depth-first search heuristic. We will walk all commits up to the minimum generation number of a commit that is _not_ reachable from the start, but we will walk only the necessary portion of the depth-first search for the reachable commits of lower generation. Add extra tests for this behavior in t6600-test-reach.sh as the interesting data shape of that repository can sometimes demonstrate corner case bugs. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-20 12:17:33 -07:00
Derrick Stolee	fd67d149bd	commit-reach: implement ahead_behind() logic Fully implement the commit-counting logic required to determine ahead/behind counts for a batch of commit pairs. This is a new library method within commit-reach.h. This method will be linked to the for-each-ref builtin in the next change. The interface for ahead_behind() uses two arrays. The first array of commits contains the list of all starting points for the walk. This includes all tip commits _and_ base commits. The second array specifies base/tip pairs by pointing to commits within the first array, by index. The second array also stores the resulting ahead/behind counts for each of these pairs. This implementation of ahead_behind() allows multiple bases, if desired. Even with multiple bases, there is only one commit walk used for counting the ahead/behind values, saving time when the base/tip ranges overlap significantly. This interface for ahead_behind() also makes it very easy to call ensure_generations_valid() on the entire array of bases and tips. This call is necessary because it is critical that the walk that counts ahead/behind values never walks a commit more than once. Without generation numbers on every commit, there is a possibility that a commit date skew could cause the walk to revisit a commit and then double-count it. For this reason, it is strongly recommended that 'git ahead-behind' is only run in a repository with a commit-graph file that covers most of the reachable commits, storing precomputed generation numbers. If no commit-graph exists, this walk will be much slower as it must walk all reachable commits in ensure_generations_valid() before performing the counting logic. It is possible to detect if generation numbers are available at run time and redirect the implementation to another algorithm that does not require this property. However, that implementation requires a commit walk per base/tip pair _and_ can be slower due to the commit date heuristics required. Such an implementation could be considered in the future if there is a reason to include it, but most Git hosts should already be generating a commit-graph file as part of repository maintenance. Most Git clients should also be generating commit-graph files as part of background maintenance or automatic GCs. Now, let's discuss the ahead/behind counting algorithm. The first array of commits are considered the starting commits. The index within that array will play a critical role. We create a new commit slab that maps commits to a bitmap. For a given commit (anywhere in the history), its bitmap stores information relative to which of the input commits can reach that commit. The ith bit will be on if the ith commit from the starting list can reach that commit. It is important to notice that these bitmaps are not the typical "reachability bitmaps" that are stored in .bitmap files. Instead of signalling which objects are reachable from the current commit, they instead signal "which starting commits can reach me?" It is also important to know that the bitmap is not necessarily "complete" until we walk that commit. We will perform a commit walk by generation number in such a way that we can guarantee the bitmap is correct when we visit that commit. At the beginning of the ahead_behind() method, we initialize the bitmaps for each of the starting commits. By enabling the ith bit for the ith starting commit, we signal "the ith commit can reach itself." We walk commits by popping the commit with maximum generation number out of the queue, guaranteeing that we will never walk a child of that commit in any future steps. As we walk, we load the bitmap for the current commit and perform two main steps. The _second_ step examines each parent of the current commit and adds the current commit's bitmap bits to each parent's bitmap. (We create a new bitmap for the parent if this is our first time seeing that parent.) After adding the bits to the parent's bitmap, the parent is added to the walk queue. Due to this passing of bits to parents, the current commit has a guarantee that the ith bit is enabled on its bitmap if and only if the ith commit can reach the current commit. The first step of the walk is to examine the bitmask on the current commit and decide which ranges the commit is in or not. Due to the "bit pushing" in the second step, we have a guarantee that the ith bit of the current commit's bitmap is on if and only if the ith starting commit can reach it. For each ahead_behind_count struct, check the base_index and tip_index to see if those bits are enabled on the current bitmap. If exactly one bit is enabled, then increment the corresponding 'ahead' or 'behind' count. This increment is the reason we _absolutely need_ to walk commits at most once. The only subtle thing to do with this walk is to check to see if a parent has all bits on in its bitmap, in which case it becomes "stale" and is marked with the STALE bit. This allows queue_has_nonstale() to be the terminating condition of the walk, which greatly reduces the number of commits walked if all of the commits are nearby in history. It avoids walking a large number of common commits when there is a deep history. We also use the helper method insert_no_dup() to add commits to the priority queue without adding them multiple times. This uses the PARENT2 flag. Thus, we must clear both the STALE and PARENT2 bits of all commits, in case ahead_behind() is called multiple times in the same process. Co-authored-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-20 12:17:33 -07:00
Abhishek Kumar	d7f92784c6	commit-graph: return 64-bit generation number In a preparatory step for introducing corrected commit dates, let's return timestamp_t values from commit_graph_generation(), use timestamp_t for local variables and define GENERATION_NUMBER_INFINITY as (2 ^ 63 - 1) instead. We rename GENERATION_NUMBER_MAX to GENERATION_NUMBER_V1_MAX to represent the largest topological level we can store in the commit data chunk. With corrected commit dates implemented, we will have two such *_MAX variables to denote the largest offset and largest topological level that can be stored. Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-18 16:21:18 -08:00
Carlo Marcelo Arenas Belón	c1ea625f72	commit-reach: avoid is_descendant_of() shim `d91d6fbf26` (commit-reach: create repo_is_descendant_of(), 2020-06-17) adds a repository aware version of is_descendant_of() and a backward compatibility shim that is barely used. Update all callers to directly use the new repo_is_descendant_of() function instead; making the codebase simpler and pushing more the_repository references higher up the stack. Helped-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-23 16:36:53 -07:00
Junio C Hamano	b99a579f8e	Merge branch 'sb/more-repo-in-api' The in-core repository instances are passed through more codepaths. * sb/more-repo-in-api: (23 commits) t/helper/test-repository: celebrate independence from the_repository path.h: make REPO_GIT_PATH_FUNC repository agnostic commit: prepare free_commit_buffer and release_commit_memory for any repo commit-graph: convert remaining functions to handle any repo submodule: don't add submodule as odb for push submodule: use submodule repos for object lookup pretty: prepare format_commit_message to handle arbitrary repositories commit: prepare logmsg_reencode to handle arbitrary repositories commit: prepare repo_unuse_commit_buffer to handle any repo commit: prepare get_commit_buffer to handle any repo commit-reach: prepare in_merge_bases[_many] to handle any repo commit-reach: prepare get_merge_bases to handle any repo commit-reach.c: allow get_merge_bases_many_0 to handle any repo commit-reach.c: allow remove_redundant to handle any repo commit-reach.c: allow merge_bases_many to handle any repo commit-reach.c: allow paint_down_to_common to handle any repo commit: allow parse_commit* to handle any repo object: parse_object to honor its repository argument object-store: prepare has_{sha1, object}_file to handle any repo object-store: prepare read_object_file to deal with any repo ...	2019-02-05 14:26:09 -08:00
Stefan Beller	4d5430f747	commit-reach: prepare in_merge_bases[_many] to handle any repo Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-14 17:22:40 +09:00
Stefan Beller	21a9651ba3	commit-reach: prepare get_merge_bases to handle any repo Similarly to previous patches, the get_merge_base functions are used often in the code base, which makes migrating them hard. Implement the new functions, prefixed with 'repo_' and hide the old functions behind a wrapper macro. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-14 17:22:40 +09:00
Junio C Hamano	291123e69b	Merge branch 'ds/add-missing-tags' The history traversal used to implement the tag-following has been optimized by introducing a new helper. * ds/add-missing-tags: remote: make add_missing_tags() linear test-reach: test get_reachable_subset commit-reach: implement get_reachable_subset	2018-11-13 22:37:24 +09:00
Junio C Hamano	6b37389f85	Merge branch 'rj/header-cleanup' Code cleanup. * rj/header-cleanup: commit-reach.h: add missing declarations (hdr-check) ewok_rlw.h: add missing 'inline' to function definition fetch-object.h: add missing declaration (hdr-check)	2018-11-06 15:50:23 +09:00
Derrick Stolee	fcb2c0769d	commit-reach: implement get_reachable_subset The existing reachability algorithms in commit-reach.c focus on finding merge-bases or determining if all commits in a set X can reach at least one commit in a set Y. However, for two commits sets X and Y, we may also care about which commits in Y are reachable from at least one commit in X. Implement get_reachable_subset() which answers this question. Given two arrays of commits, 'from' and 'to', return a commit_list with every commit from the 'to' array that is reachable from at least one commit in the 'from' array. The algorithm is a simple walk starting at the 'from' commits, using the PARENT2 flag to indicate "this commit has already been added to the walk queue". By marking the 'to' commits with the PARENT1 flag, we can determine when we see a commit from the 'to' array. We remove the PARENT1 flag as we add that commit to the result list to avoid duplicates. The order of the resulting list is a reverse of the order that the commits are discovered in the walk. There are a couple shortcuts to avoid walking more than we need: 1. We determine the minimum generation number of commits in the 'to' array. We do not walk commits with generation number below this minimum. 2. We count how many distinct commits are in the 'to' array, and decrement this count when we discover a 'to' commit during the walk. If this number reaches zero, then we can terminate the walk. Tests will be added using the 'test-tool reach' helper in a subsequent commit. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-03 00:12:06 +09:00
Ramsay Jones	1406725b88	commit-reach.h: add missing declarations (hdr-check) Add the necessary #includes and forward declarations to allow the header file to pass the 'hdr-check' target. Note that, since this header includes the commit-slab implementation header file (indirectly via commit-slab.h), some of the commit-slab inline functions (e.g contains_cache_at_peek()) will not compile without the complete type of 'struct commit'. Hence, we replace the forward declaration of 'struct commit' with the an #include of the 'commit.h' header file. It is possible, using the 'commit-slab-{decl,impl}.h' files, to avoid this inclusion of the 'commit.h' header. Commit `a9f1f1f9f8` ("commit-slab.h: code split", 2018-05-19) separated the commit-slab interface from its implementation, to allow for the definition of a public commit-slab data structure. This enabled us to avoid including the commit-slab implementation in a header file, which could result in the replication of the commit-slab functions in each compilation unit in which it was included. Indeed, if you compile with optimizations disabled, then run this script: $ cat -n dup-static.sh 1 #!/bin/sh 2 3 nm $1 \| grep ' t ' \| cut -d' ' -f3 \| sort \| uniq -c \| 4 sort -rn \| grep -v ' 1' $ $ ./dup-static.sh git \| grep contains 24 init_contains_cache_with_stride 24 init_contains_cache 24 contains_cache_peek 24 contains_cache_at_peek 24 contains_cache_at 24 clear_contains_cache $ you will find 24 copies of the commit-slab routines for the contains_cache. Of course, when you enable optimizations again, these duplicate static functions (mostly) disappear. Compiling with gcc at -O2, leaves two static functions, thus: $ nm commit-reach.o \| grep contains_cache 0000000000000870 t contains_cache_at_peek.isra.1.constprop.6 $ nm ref-filter.o \| grep contains_cache 00000000000002b0 t clear_contains_cache.isra.14 $ However, using a shared 'contains_cache' would result in all six of the above functions as external public functions in the git binary. At present, only three of these functions are actually called, so the trade-off seems to favour letting the compiler inline the commit-slab functions. Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-10-29 10:14:21 +09:00
Ramsay Jones	0009d3501b	headers: normalize the spelling of some header guards Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-10-18 13:39:35 +09:00
Derrick Stolee	4fbcca4eff	commit-reach: make can_all_from_reach... linear The can_all_from_reach_with_flags() algorithm is currently quadratic in the worst case, because it calls the reachable() method for every 'from' without tracking which commits have already been walked or which can already reach a commit in 'to'. Rewrite the algorithm to walk each commit a constant number of times. We also add some optimizations that should work for the main consumer of this method: fetch negotitation (haves/wants). The first step includes using a depth-first-search (DFS) from each 'from' commit, sorted by ascending generation number. We do not walk beyond the minimum generation number or the minimum commit date. This DFS is likely to be faster than the existing reachable() method because we expect previous ref values to be along the first-parent history. If we find a target commit, then we mark everything in the DFS stack as a RESULT. This expands the set of targets for the other 'from' commits. We also mark the visited commits using 'assign_flag' to prevent re- walking the same commits. We still need to clear our flags at the end, which is why we will have a total of three visits to each commit. Performance was measured on the Linux repository using 'test-tool reach can_all_from_reach'. The input included rows seeded by tag values. The "small" case included X-rows as v4.[0-9]* and Y-rows as v3.[0-9]. This mimics a (very large) fetch that says "I have all major v3 releases and want all major v4 releases." The "large" case included X-rows as "v4." and Y-rows as "v3.*". This adds all release-candidate tags to the set, which does not greatly increase the number of objects that are considered, but does increase the number of 'from' commits, demonstrating the quadratic nature of the previous code. Small Case: Before: 1.52 s After: 0.26 s Large Case: Before: 3.50 s After: 0.27 s Note how the time increases between the two cases in the two versions. The new code increases relative to the number of commits that need to be walked, but not directly relative to the number of 'from' commits. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-07-20 15:38:56 -07:00
Derrick Stolee	1792bc1250	test-reach: test can_all_from_reach_with_flags The can_all_from_reach_with_flags method is used by ok_to_give_up in upload-pack.c to see if we have done enough negotiation during a fetch. This method is intentionally created to preserve state between calls to assist with stateful negotiation, such as over SSH. To make this method testable, add a new can_all_from_reach method that does the initial setup and final tear-down. We will later use this method in production code. Call the method from 'test-tool reach' for now. Since this is a many-to-many reachability query, add a new type of input to the 'test-tool reach' input format. Lines "Y:<committish>" create a list of commits to be the reachability targets from the commits in the 'X' list. In the context of fetch negotiation, the 'X' commits are the 'want' commits and the 'Y' commits are the 'have' commits. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-07-20 15:38:56 -07:00
Derrick Stolee	ba3ca1edce	commit-reach: move can_all_from_reach_with_flags There are several commit walks in the codebase. Group them together into a new commit-reach.c file and corresponding header. After we group these walks into one place, we can reduce duplicate logic by calling equivalent methods. The can_all_from_reach_with_flags method is used in a stateful way by upload-pack.c. The parameters are very flexible, so we will be able to use its commit walking logic for many other callers. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-07-20 15:38:55 -07:00
Derrick Stolee	920f93ca1c	commit-reach: move commit_contains from ref-filter There are several commit walks in the codebase. Group them together into a new commit-reach.c file and corresponding header. After we group these walks into one place, we can reduce duplicate logic by calling equivalent methods. All methods are direct moves, except we also make the commit_contains() method public so its consumers in ref-filter.c can still call it. We can also test this method in a test-tool in a later commit. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-07-20 15:38:54 -07:00
Derrick Stolee	1d614d41e5	commit-reach: move ref_newer from remote.c There are several commit walks in the codebase. Group them together into a new commit-reach.c file and corresponding header. After we group these walks into one place, we can reduce duplicate logic by calling equivalent methods. The ref_newer() method is used by 'git push -f' to check if a force-push is necessary. By making the method public, we make it possible to test the method directly without setting up an envieronment where a 'git push' call makes sense. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-07-20 15:38:54 -07:00
Derrick Stolee	5227c38566	commit-reach: move walk methods from commit.c There are several commit walks in the codebase. Group them together into a new commit-reach.c file and corresponding header. After we group these walks into one place, we can reduce duplicate logic by calling equivalent methods. The method declarations in commit.h are not touched by this commit and will be moved in a following commit. Many consumers need to point to commit-reach.h and that would bloat this commit. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-07-20 15:38:54 -07:00

29 Commits (a819a3da85655031a23abae0f75d0910697fb92c)