Commit-graph, midx, and pack idx files all have both a lookup table of
oids and an oid fanout table. In midx and pack idx files, we take the
final entry of the fanout table as the source of truth for the number of
entries, and then verify that the size of the lookup table matches that.
But for commit-graph files, we do the opposite: we use the size of the
lookup table as the source of truth, and then check the final fanout
entry against it.
As noted in 4169d89645 (commit-graph: check consistency of fanout
table, 2023-10-09), either is correct. But there are a few reasons to
prefer the fanout table as the source of truth:
1. The fanout entries are 32-bits on disk, and that defines the
maximum number of entries we can store. But since the size of the
lookup table is only bounded by the filesystem, it can be much
larger. And hence computing it as the commit-graph does means that
we may truncate the result when storing it in a uint32_t.
2. We read the fanout first, then the lookup table. If we're verifying
the chunks as we read them, then we'd want to take the fanout as
truth (we have nothing yet to check it against) and then we can
check that the lookup table matches what we already know.
3. It is pointlessly inconsistent with the midx and pack idx code.
Since the three have to do similar size and bounds checks, it is
easier to reason about all three if they use the same approach.
So this patch moves the assignment of g->num_commits to the fanout
parser, and then we can check the size of the lookup chunk as soon as we
try to load it.
There's already a test covering this situation, which munges the final
fanout entry to 2^32-1. In the current code we complain that it does not
agree with the table size. But now that we treat the munged value as the
source of truth, we'll complain that the lookup table is the wrong size
(again, either is correct). So we'll have to update the message we
expect (and likewise for an earlier test which does similar munging).
There's a similar test for this situation on the midx side, but rather
than making a very-large fanout value, it just truncates the lookup
table. We could do that here, too, but the very-large fanout value
actually shows an interesting corner case. On a 32-bit system,
multiplying to find the expected table size would cause an integer
overflow. Using st_mult() would detect that, but cause us to die()
rather than falling back to the non-graph code path. Checking the size
using division (as we do with existing chunk-size checks) avoids the
overflow entirely, and the test demonstrates this when run on a 32-bit
system.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git - fast, scalable, distributed revision control system
Git is a fast, scalable, distributed revision control system with an
unusually rich command set that provides both high-level operations
and full access to internals.
Git is an Open Source project covered by the GNU General Public
License version 2 (some parts of it are under different licenses,
compatible with the GPLv2). It was originally written by Linus
Torvalds with help of a group of hackers around the net.
Please read the file INSTALL for installation instructions.
Many Git online resources are accessible from https://git-scm.com/
including full documentation and Git related tools.
See Documentation/gittutorial.txt to get started, then see
Documentation/giteveryday.txt for a useful minimum set of commands, and
Documentation/git-<commandname>.txt for documentation of each command.
If git has been correctly installed, then the tutorial can also be
read with man gittutorial or git help tutorial, and the
documentation of each command with man git-<commandname> or git help <commandname>.
CVS users may also want to read Documentation/gitcvs-migration.txt
(man gitcvs-migration or git help cvs-migration if git is
installed).
Those wishing to help with error message, usage and informational message
string translations (localization l10) should see po/README.md
(a po file is a Portable Object file that holds the translations).
Issues which are security relevant should be disclosed privately to
the Git Security mailing list git-security@googlegroups.com.
The maintainer frequently sends the "What's cooking" reports that
list the current status of various development topics to the mailing
list. The discussion following them give a good reference for
project status, development direction and remaining tasks.
The name "git" was given by Linus Torvalds when he wrote the very
first version. He described the tool as "the stupid content tracker"
and the name as (depending on your mood):
random three-letter combination that is pronounceable, and not
actually used by any common UNIX command. The fact that it is a
mispronunciation of "get" may or may not be relevant.
stupid. contemptible and despicable. simple. Take your pick from the
dictionary of slang.
"global information tracker": you're in a good mood, and it actually
works for you. Angels sing, and a light suddenly fills the room.
"goddamn idiotic truckload of sh*t": when it breaks