* js/objects-larger-than-4gb-on-windows:
ci: run expensive tests on push builds to integration branches
t5608: mark >4GB tests as EXPENSIVE
test-tool synthesize: add precomputed SHA-256 pack for 4 GiB + 1
test-tool synthesize: precompute pack for 4 GiB + 1
test-tool synthesize: use the unsafe hash for speed
t5608: add regression test for >4GB object clone
test-tool: add a helper to synthesize large packfiles
delta, packfile: use size_t for delta header sizes
odb, packfile: use size_t for streaming object sizes
git-zlib: handle data streams larger than 4GB
index-pack, unpack-objects: use size_t for object size
Derrick Stolee suggested [1] that expensive tests should be run at a
regular cadence rather than on every PR iteration. Gate GIT_TEST_LONG
on push builds to the integration branches (next, master, main, maint)
so that the EXPENSIVE prereq is satisfied there but not during PR
validation, where the extra minutes of wall-clock time do not justify
themselves.
[1] https://lore.kernel.org/git/e1e8837f-7374-4079-ba87-ab95dd156e33@gmail.com/
Helped-by: Derrick Stolee <derrickstolee@github.com>
Assisted-by: Claude Opus 4.6
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Even with precomputed pack constants that reduced the helper's
runtime from minutes to seconds, the >4GB clone tests still take
200-850 seconds across CI jobs. The bottleneck is no longer the
pack generation but the clone operations themselves: transporting,
unpacking, and indexing 4 GiB of data through unpack-objects and
index-pack is inherently expensive.
As Jeff King pointed out [1], t5608 alone takes 160 seconds on his
laptop while the rest of the entire test suite finishes in under 90
seconds, and the test's disk footprint (4+ GiB source repo, then
two clones) is problematic for developers who use RAM disks for
their trash directories.
Gate the >4GB tests on the EXPENSIVE prereq (which requires
GIT_TEST_LONG to be set) in addition to SIZE_T_IS_64BIT, keeping
them out of normal local test runs.
[1] https://lore.kernel.org/git/20260501063805.GA2038915@coredump.intra.peff.net/
Assisted-by: Claude Opus 4.6
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a SHA-256 entry to the fast_packs[] table. The pack prefix and
deflate block structure are identical to SHA-1 (the pack format does
not encode the hash algorithm in its header). Only the suffix differs:
SHA-256 OIDs are 32 bytes instead of 20, giving a 609-byte suffix
compared to 513 for SHA-1, and a different pack checksum.
The constants were generated by running the generic path inside a
repository initialized with --object-format=sha256.
Assisted-by: Claude Opus 4.6
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The synthesize helper hashes roughly 8 GiB of data through SHA-1 to
produce a 4 GiB + 1 pack (4 GiB for the pack checksum, 4 GiB for
the blob OID). Since the blob content is all NUL bytes, every byte
in the resulting pack file is deterministic for a given blob size and
hash algorithm.
Add a fast path that writes the pack from precomputed constants:
a 25-byte prefix (pack header, object header, zlib header, first
block header), the zero-filled bulk with periodic 5-byte deflate
block headers, and a 513-byte suffix (tree, two commits, empty tree,
pack SHA-1 checksum). This eliminates all SHA-1 and adler32
computation, making the helper purely I/O-bound.
The precomputed constants are stored in a struct fast_pack array
keyed by hash algorithm format_id, so that adding SHA-256 support
later requires only adding another array entry with its suffix.
The constants were generated by running the generic path and
extracting the non-zero bytes from the resulting pack file.
Benchmarks generating a 4 GiB + 1 pack (3 runs each, SHA1DC on
x86_64):
generic path: 88s / 81s / 140s
fast path: 14s / 13s / 15s
On CI, where t5608 currently takes 200-850 seconds depending on the
job, the fast path cuts the pack-generation phase from minutes to
seconds, leaving only the clone operations themselves.
Assisted-by: Claude Opus 4.6
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King pointed out on the mailing list [1] that t5608's new >4GB
test cases dominate the entire test suite runtime: 160 seconds on his
laptop when the rest of the suite finishes in under 90 seconds, and
305-850 seconds across CI jobs. The bottleneck is that the synthesize
helper hashes roughly 8 GB of data through SHA-1 (4 GB for the pack
checksum plus 4 GB for the blob OID) for a 4 GB+1 blob.
Since the helper generates known test data, collision detection is
unnecessary. Switch from repo->hash_algo to unsafe_hash_algo(), which
uses hardware-accelerated SHA-1 (via OpenSSL or Apple CommonCrypto)
when available.
Benchmarks on an x86_64 machine generating a 4 GB+1 pack (2 runs
each, interleaved):
SHA-1 backend Run 1 Run 2
SHA1DC (safe) 75s 80s
OpenSSL (unsafe) 21s 19s
The effect scales linearly. At 64 MB with 10 randomized interleaved
runs, the OpenSSL unsafe backend shows a 5.4x improvement (median
0.202s vs 1.088s) with tight variance (stdev 0.028s vs 0.095s).
The speedup is only realized when the build has a fast unsafe backend
compiled in. The CI's linux-TEST-vars job already sets
OPENSSL_SHA1_UNSAFE=YesPlease; macOS benefits from Apple CommonCrypto
when configured. On builds without a separate unsafe backend (such as
the default Windows builds), unsafe_hash_algo() returns the regular
collision-detecting implementation and the change is a no-op.
[1] https://lore.kernel.org/git/20260501063805.GA2038915@coredump.intra.peff.net/
Assisted-by: Claude Opus 4.6
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The shift overflow bug in index-pack and unpack-objects caused incorrect
object size calculation when the encoded size required more than 32 bits
of shift. This would result in corrupted or failed unpacking of objects
larger than 4GB.
Add a test that creates a pack file containing a 4GB+ blob using the
new 'test-tool synthesize pack --reachable-large' command, then clones
the repository to verify the fix works correctly.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To test Git's behavior with very large pack files, we need a way to
generate such files quickly.
A naive approach using only readily-available Git commands would take
over 10 hours for a 4GB pack file, which is prohibitive.
Side-stepping Git's machinery and actual zlib compression by writing
uncompressed content with the appropriate zlib header makes things
much faster. The fastest method using this approach generates many
small, unreachable blob objects and takes about 1.5 minutes for 4GB.
However, this cannot be used because we need to test git clone, which
requires a reachable commit history.
Generating many reachable commits with small, uncompressed blobs takes
about 4 minutes for 4GB. But this approach 1) does not reproduce the
issues we want to fix (which require individual objects larger than
4GB) and 2) is comparatively slow because of the many SHA-1
calculations.
The approach taken here generates a single large blob (filled with NUL
bytes), along with the trees and commits needed to make it reachable.
This takes about 2.5 minutes for 4.5GB, which is the fastest option
that produces a valid, clonable repository with an object large enough
to trigger the bugs we want to test.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The delta header decoding functions return unsigned long, which
truncates on Windows for objects larger than 4GB. Introduce size_t
variants get_delta_hdr_size_sz() and get_size_from_delta_sz() that
preserve the full 64-bit size, and use them in packed_object_info()
where the size is needed for streaming decisions.
This was originally authored by LordKiRon <https://github.com/LordKiRon>,
who preferred not to reveal their real name and therefore agreed that I
take over authorship.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The odb_read_stream structure uses unsigned long for the size field,
which is 32-bit on Windows even in 64-bit builds. When streaming
objects larger than 4GB, the size would be truncated to zero or an
incorrect value, resulting in empty files being written to disk.
Change the size field in odb_read_stream to size_t and introduce
unpack_object_header_sz() to return sizes via size_t pointer. Since
object_info.sizep remains unsigned long for API compatibility, use
temporary variables where the types differ, with comments noting the
truncation limitation for code paths that still use unsigned long.
Widening the producers to size_t in this way introduces a handful of
silent size_t -> unsigned long narrowings on Windows, all in
builtin/pack-objects.c, where the consumers are still typed
unsigned long. Make those narrowings explicit with
cast_size_t_to_ulong() so they assert loudly the moment an object
actually exceeds ULONG_MAX bytes:
- oe_get_size_slow() returns unsigned long but holds a size_t
locally; cast at the return.
- write_reuse_object() passes a size_t into check_pack_inflate(),
whose expect parameter is unsigned long; cast at the call.
- check_object() routes a size_t through SET_SIZE() and
SET_DELTA_SIZE(), both of which take unsigned long via
oe_set_size() / oe_set_delta_size(); cast at the three call
sites in the OBJ_OFS_DELTA / OBJ_REF_DELTA branches and in the
non-delta default arm.
The cast-only treatment is deliberately a stop-gap. Properly
widening oe_set_size, oe_get_size_slow's return type,
check_pack_inflate's expect parameter, object_info.sizep,
patch_delta, and the OE_SIZE_BITS bit-fields cascades into a series
that is too large to be reviewable, so the proper widening is
deferred to a follow-up topic. Until then,
cast_size_t_to_ulong() at least makes the truncation explicit at
the source: it documents the boundary, and on a 64-bit non-Windows
platform it is a no-op.
This was originally authored by LordKiRon <https://github.com/LordKiRon>,
who preferred not to reveal their real name and therefore agreed that I
take over authorship.
Helped-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On Windows, zlib's `uLong` type is 32-bit even on 64-bit systems. When
processing data streams larger than 4GB, the `total_in` and `total_out`
fields in zlib's `z_stream` structure wrap around, which caused the
sanity checks in `zlib_post_call()` to trigger `BUG()` assertions.
The git_zstream wrapper now tracks its own 64-bit totals rather than
copying them from zlib. The sanity checks compare only the low bits,
using `maximum_unsigned_value_of_type(uLong)` to mask appropriately for
the platform's `uLong` size.
This is based on work by LordKiRon in git-for-windows#6076.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When unpacking objects from a packfile, the object size is decoded
from a variable-length encoding. On platforms where unsigned long is
32-bit (such as Windows, even in 64-bit builds), the shift operation
overflows when decoding sizes larger than 4GB. The result is a
truncated size value, causing the unpacked object to be corrupted or
rejected.
Fix this by changing the size variable to size_t, which is 64-bit on
64-bit platforms, and ensuring the shift arithmetic occurs in 64-bit
space.
Declare the per-byte continuation variable `c` as size_t as well,
matching the canonical varint decoder unpack_object_header_buffer()
in packfile.c. With c as size_t the expression (c & 0x7f) << shift
is naturally size_t-typed, so the explicit cast that an earlier
iteration carried at the use site is no longer needed.
While at it, add the same overflow guard that
unpack_object_header_buffer() carries: if the cumulative shift would
exceed bitsizeof(size_t) - 7, refuse the input rather than invoking
undefined behavior. Unlike unpack_object_header_buffer(), which
labels this case "bad object header", report it as the platform
limit it actually is: a header may be perfectly well-formed and
still encode a size we cannot represent locally (notably on a
32-bit build consuming a packfile produced on a 64-bit host).
This was originally authored by LordKiRon <https://github.com/LordKiRon>,
who preferred not to reveal their real name and therefore agreed that I
take over authorship.
Helped-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
6cc6d1b4c6 (Documentation: update add --force option + ignore=all
config, 2026-02-06) added text describing both the ignore=none and
ignore=all behaviors. The former had minor formatting and grammatical
errors, while the latter was a bit garbled. I have tried to tweak the
wording on the latter to make it read as I think was intended, and fixed
the minor grammatical issues with both as well.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
a8215a2051 (send-email: add client certificate options, 2026-03-02)
added documentation for sendemail.smtpSSLClientKey that says it works
"in conjunction with `sendemail.smtpSSLClientKey`" -- referring to
itself. It appears that `sendemail.smtpSSLClientCert` was the intended
reference; fix it.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix various issues in the release notes -- missing/wrong articles, typo,
indentation, quote consistency, and wording improvement or corrections.
Other than the indentation fix for "The way combined list-object filter
options...", this patch is much easier to view with --color-words.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As writing version 2 MIDX files by default breaks older versions of
Git and its reimplementations, use V2 only when necessary.
* jk/midx-write-v1-by-default:
MIDX: revert the default version to v1
We introduced midx version 2 in b2ec8e90c2 (midx: do not require packs
to be sorted in lexicographic order, 2026-02-24) and now write it by
default. The rationale was that older versions should ignore the v2 midx
and fall back to using the packs (just like we do for other midx
errors). Unfortunately this is not the case, as we have a hard die()
when we see an unknown midx version.
As a result, writing a midx with Git 2.54-rc2 puts the repository into a
state that is unusable with Git 2.53. And this midx write may happen
behind the scenes as part of normal operations, like fetch.
Let's switch back to writing v1 by default to avoid regressing the case
where multiple versions of Git are used on the same repository.
There is one gotcha, though: the v2 format is required for some new
features, like midx compaction, and running "git multi-pack-index
compact" will complain when asked to write a v1 index. The user must set
midx.version to "2" to make the feature work.
So instead of always using v1, we'll base the default on whether the
requested feature requires v2. That does mean that running midx
compaction will create a repository that can't be read by older versions
of Git. But we never do that by default; only people experimenting with
the new feature will be affected.
We have to adjust the test expectation in t5319, since it will now
generate v1 files. And our "auto-select v2" is covered by the tests in
t5335, which continue to check that compaction works without having to
set midx.version manually (and also explicitly check that asking for v1
with compaction reports the problem).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- correct translation of pathspec msgs
Corrects cases where the “pathspec” is translated as if it was a
path
- correct translation of refspec msgs
Corrects cases where the “refspec” were not consistently translated
- correct translation of credential msgs
Corrects cases where the “credential” were not correctly translated
Signed-off-by: Stefan Björnelund <stefan.bjornelund.gnome@gmail.com>
Modified-by: Peter Krefting <peter@softwolves.pp.se>
Translate 198 previously fuzzy or untranslated messages, bringing the
total number of translated messages to 6226.
Reviewed-by: 依云 <lilydjwg@gmail.com>
Reviewed-by: Fangyi Zhou <me@fangyi.io>
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
The glossary entry is a list of terms and their definitions, so
multi-paragraph definitions need "+" continuation lines to indicate
that they are part of a single entry.
When an entry contains a sub-list (say, a bulleted list), the final "+"
may become ambiguous: is it connecting the next paragraph to the final
entry of the sub-list, or to the original list of definition paragraphs?
Asciidoc generally connects it to the former, even when we mean the
latter, and you end up with the next paragraph indented incorrectly,
like this:
glob
...defines glob...
Two consecutive asterisks ("**") in patterns matched
against full pathname may have special meaning:
- ...some special meaning of **...
- ...another special meaning of **...
- Other consecutive asterisks are considered invalid.
Glob magic is incompatible with literal magic.
That final "Glob magic is incompatible" paragraph is in the wrong spot.
It should be at the same level as "Two consecutive asterisks", as it is
not part of the final "Other consecutive asterisks" bullet point.
The same problem appears in several other spots in the glossary.
Usually we'd fix this by using "--" markers, which put the sub-list into
its own block. But there's a catch: in some of these spots we are
already in an open block, and nesting open blocks is a problem. It seems
to work for me using Asciidoc 10.2.1, but Asciidoctor 2.0.26 makes a
mess of it (our intent to open a new block seems to close the old one).
Fortunately there's a work-around: when using a "+" list-continuation,
the number of empty lines above the continuation indicates which level
of parent list to continue. So by adding an empty line after our
unordered list (before the "+"), we should be able to continue the
definition list item.
But asciidoc being asciidoc, of course that is not the end of the story.
That technique works fine for the "glob" and "attr" lists in this patch,
but under the "refs" item it works for only 1 of the 2 lists! I can't
figure out why, and this may be an asciidoctor bug. But we can work
around it by using "--" open-block markers here, since we're not
already in an open block.
So using the extra blank line for the first two instances, and "--"
markers for the second two, this patch produces identical output from
"doc-diff HEAD^ HEAD" for both --asciidoctor and --asciidoc modes.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
GitHub Actions started complaining about use of Node.js 20 and I was
wondering why only one job uses actions/checkout@v4, while everybody
else already uses actions/checkout@v5.
It turns out that it is caused by a semantic mismerge between
e75cd059 (ci: check formatting of our Rust code, 2025-10-15) that
added a new use of actions/checkout@v4 that happened very close to
another change 63541ed9 (build(deps): bump actions/checkout from 4
to 5, 2025-10-16) that updated all uses of actions/checkout@v4 to
use vactions/checkout@v5.
Update the leftover and the last use of actions/checkout@v4 to use
actions/checkout@v5 to help ourselves to move away from Node.js 20.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
I claimed in 3c18135b (doc: am: say that --message-id adds a trailer,
2026-02-09) that `git am --message-id` adds a Git trailer. But that
isn’t the case; for the case of a commit message with a subject, body,
and no trailer block:
<subject>
<paragrah>
It just appends the line right after `paragraph`:
<subject>
<paragraph>
Message-ID: <message-id_trailer.323@msgid.xyz>
It does work for two other cases though, namely subject-only and with an
existing trailer block.
This is at best an inconsistency and arguably a bug, but we’re at the
trailing end of the release cycle now. So reverting the doc is safer
than making msg-id act as a trailer, for now.
Revert this hunk from commit 3c18135b except the only useful
change (“Also use inline-verbatim for `Message-ID`”).
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We used writev() in limited code paths and supplied emulation for
platforms without working writev(), but the emulation was too
faithful to the spec to make the result useless to send even 64kB;
revert the topic and plan to restart the effort later.
* jc/no-writev-does-not-work:
Revert "compat/posix: introduce writev(3p) wrapper"
Revert "wrapper: introduce writev(3p) wrappers"
Revert "sideband: use writev(3p) to send pktlines"
Revert "cmake: use writev(3p) wrapper as needed"