name-rev has a MERGE_TRAVERSAL_WEIGHT to say that traversing a second or
later parent of a merge should be 65535 times more expensive than a
first-parent traversal, as per ac076c29ae (name-rev: Fix non-shortest
description, 2007-08-27). The point of this weight is to prefer names
like
v2.32.0~1471^2
over names like
v2.32.0~43^2~15^2~11^2~20^2~31^2
which are two equally valid names in git.git for the same commit. Note
that the first follows 1472 parent traversals compared to a mere 125 for
the second. Weighting all traversals equally would clearly prefer the
second name since it has fewer parent traversals, but humans aren't
going to be traversing commits and they tend to have an easier time
digesting names with fewer segments. The fact that the former only has
two segments (~1471, ^2) makes it much simpler than the latter which has
six segments (~43, ^2, ~15, etc.). Since name-rev is meant to "find
symbolic names suitable for human digestion", we prefer fewer segments.
However, the particular rule implemented in name-rev would actually
prefer
v2.33.0-rc0~11^2~1
over
v2.33.0-rc0~20^2
because both have precisely one second parent traversal, and it gives
the tie breaker to shortest number of total parent traversals. Fewer
segments is more important for human consumption than number of hops, so
we'd rather see the latter which has one fewer segment.
Include the generation in is_better_name() and use a new
effective_distance() calculation so that we prefer fewer segments in
the printed name over fewer total parent traversals performed to get the
answer.
== Side-note on tie-breakers ==
When there are the same number of segments for two different names, we
actually use the name of an ancestor commit as a tie-breaker as well.
For example, for the commit cbdca289fb in the git.git repository, we
prefer the name v2.33.0-rc0~112^2~1 over v2.33.0-rc0~57^2~5. This is
because:
* cbdca289fb is the parent of 25e65b6dd5, which implies the name for
cbdca289fb should be the first parent of the preferred name for
25e65b6dd5
* 25e65b6dd5 could be named either v2.33.0-rc0~112^2 or
v2.33.0-rc0~57^2~4, but the former is preferred over the latter due
to fewer segments
* combine the two previous facts, and the name we get for cbdca289fb
is "v2.33.0-rc0~112^2~1" rather than "v2.33.0-rc0~57^2~5".
Technically, we get this for free out of the implementation since we
only keep track of one name for each commit as we walk history (and
re-add parents to the queue if we find a better name for those parents),
but the first bullet point above ensures users get results that feel
more consistent.
== Alternative Ideas and Meanings Discussed ==
One suggestion that came up during review was that shortest
string-length might be easiest for users to consume. However, such a
scheme would be rather computationally expensive (we'd have to track all
names for each commit as we traversed the graph) and would additionally
come with the possibly perplexing result that on a linear segment of
history we could rapidly swap back and forth on names:
MYTAG~3^2 would be preferred over MYTAG~9998
MYTAG~3^2~1 would NOT be preferred over MYTAG~9999
MYTAG~3^2~2 might be preferred over MYTAG~10000
Another item that came up was possible auxiliary semantic meanings for
name-rev results either before or after this patch. The basic answer
was that the previous implementation had no known useful auxiliary
semantics, but that for many repositories (most in my experience), the
new scheme does. In particular, the new name-rev output can often be
used to answer the question, "How or when did this commit get merged?"
Since that usefulness depends on how merges happen within the repository
and thus isn't universally applicable, details are omitted here but you
can see them at [1].
[1] https://lore.kernel.org/git/CABPp-BEeUM+3NLKDVdak90_UUeNghYCx=Dgir6=8ixvYmvyq3Q@mail.gmail.com/
Finally, it was noted that the algorithm could be improved by just
explicitly tracking the number of segments and using both it and
distance in the comparison, instead of giving a magic number that tries
to blend the two (and which therefore might give suboptimal results in
repositories with really huge numbers of commits that periodically merge
older code). However, "[this patch] seems to give us a much better
results than the current code, so let's take it and leave further
futzing outside the scope."
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git - fast, scalable, distributed revision control system
Git is a fast, scalable, distributed revision control system with an
unusually rich command set that provides both high-level operations
and full access to internals.
Git is an Open Source project covered by the GNU General Public
License version 2 (some parts of it are under different licenses,
compatible with the GPLv2). It was originally written by Linus
Torvalds with help of a group of hackers around the net.
Please read the file INSTALL for installation instructions.
Many Git online resources are accessible from https://git-scm.com/
including full documentation and Git related tools.
See Documentation/gittutorial.txt to get started, then see
Documentation/giteveryday.txt for a useful minimum set of commands, and
Documentation/git-<commandname>.txt for documentation of each command.
If git has been correctly installed, then the tutorial can also be
read with man gittutorial or git help tutorial, and the
documentation of each command with man git-<commandname> or git help <commandname>.
CVS users may also want to read Documentation/gitcvs-migration.txt
(man gitcvs-migration or git help cvs-migration if git is
installed).
Issues which are security relevant should be disclosed privately to
the Git Security mailing list git-security@googlegroups.com.
The maintainer frequently sends the "What's cooking" reports that
list the current status of various development topics to the mailing
list. The discussion following them give a good reference for
project status, development direction and remaining tasks.
The name "git" was given by Linus Torvalds when he wrote the very
first version. He described the tool as "the stupid content tracker"
and the name as (depending on your mood):
random three-letter combination that is pronounceable, and not
actually used by any common UNIX command. The fact that it is a
mispronunciation of "get" may or may not be relevant.
stupid. contemptible and despicable. simple. Take your pick from the
dictionary of slang.
"global information tracker": you're in a good mood, and it actually
works for you. Angels sing, and a light suddenly fills the room.
"goddamn idiotic truckload of sh*t": when it breaks