In 37dc6d8104 (builtin/repack.c: implement support for
`--max-cruft-size`, 2023-10-02), we exposed new functionality that
allowed repositories to specify the behavior of when we should combine
multiple cruft packs together.
This feature was designed to ensure that we never repacked cruft packs
which were larger than the given threshold in order to provide tighter
I/O bounds for repositories that have many unreachable objects. In
essence, specifying '--max-cruft-size=N' instructed 'repack' to
aggregate cruft packs together (in order of ascending size) until the
combine size grows past 'N', and then make a new cruft pack whose
contents includes the packs we rolled up.
But this isn't quite how it works in practice. Suppose for example that
we have two cruft packs which are each 100MiB in size. One might expect
specifying "--max-cruft-size=200M" would combine these two packs
together, and then avoid repacking them until a pruning GC takes place.
In reality, 'repack' would try and aggregate these together, but writing
a pack that is strictly smaller than 200 MiB (since pack-objects'
"--max-pack-size" provides a strict bound for packs containing more than
one object).
So instead we'll write out a pack that is, say, 199 MiB in size, and
then another 1 MiB pack containing the balance. If we later repack the
repository without adding any new unreachable objects, we'll repeat the
same exercise again, making the same 199 MiB and 1 MiB packs each time.
This happens because of a poor choice to bolt the '--max-cruft-size'
functionality onto pack-objects' '--max-pack-size', forcing us to
generate packs which are always smaller than the provided threshold and
thus subject to repacking.
The following commit will introduce a new flag that implements something
similar to the behavior above. Let's prepare for that by making repack's
'--max-cruft-size' flag behave as an cruft pack-specific override for
'--max-pack-size'.
Do so by temporarily repurposing the 'collapse_small_cruft_packs()'
function to instead generate a cruft pack using the same instructions as
if we didn't specify any maximum pack size. The calling code looks
something like:
if (args->max_pack_size && !cruft_expiration) {
collapse_small_cruft_packs(in, args->max_pack_size, existing);
} else {
for_each_string_list_item(item, &existing->non_kept_packs)
fprintf(in, "-%s.pack\n", item->string);
for_each_string_list_item(item, &existing->cruft_packs)
fprintf(in, "-%s.pack\n", item->string);
}
This patch makes collapse_small_cruft_packs() behave identically to the
'else' arm of the conditional above. This repurposing of
'collapse_small_cruft_packs()' is intentional, since it will set us up
nicely to introduce the new behavior in the following commit.
Naturally, there is some test fallout in the test which exercises the
old meaning of '--max-cruft-size'. Mark that test as failing for now to
be dealt with in the following commit. Likewise, add a new test which
explicitly tests the behavior of '--max-cruft-size' to place a hard
limit on the size of any generated cruft pack(s).
Note that this is a breaking change, as it alters the user-visible
behavior of '--max-cruft-size'. But I'm OK changing this behavior in
this instance, since the behavior wasn't accurate to begin with.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `--missing={print,print-info}` option for git-rev-list(1) prints
missing objects found while performing the object walk in the form:
$ git rev-list --missing=print-info <rev>
?<oid> [SP <token>=<value>]... LF
Add support for printing missing objects in a NUL-delimited format when
the `-z` option is enabled.
$ git rev-list -z --missing=print-info <rev>
<oid> NUL missing=yes NUL [<token>=<value> NUL]...
In this mode, values containing special characters or spaces are printed
as-is without being escaped or quoted. Instead of prefixing the missing
OID with '?', a separate `missing=yes` token/value pair is appended.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `--boundary` option for git-rev-list(1) prints boundary objects
found while performing the object walk in the form:
$ git rev-list --boundary <rev>
-<oid> LF
Add support for printing boundary objects in a NUL-delimited format when
the `-z` option is enabled.
$ git rev-list -z --boundary <rev>
<oid> NUL boundary=yes NUL
In this mode, instead of prefixing the boundary OID with '-', a separate
`boundary=yes` token/value pair is appended.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When walking objects, git-rev-list(1) prints each object entry on a
separate line. Some options, such as `--objects`, may print additional
information about tree and blob object on the same line in the form:
$ git rev-list --objects <rev>
<tree/blob oid> SP [<path>] LF
Note that in this form the SP is appended regardless of whether the tree
or blob object has path information available. Paths containing a
newline are also truncated at the newline.
Introduce the `-z` option for git-rev-list(1) which reformats the output
to use NUL-delimiters between objects and associated info in the
following form:
$ git rev-list -z --objects <rev>
<oid> NUL [path=<path> NUL]
In this form, the start of each record is signaled by an OID entry that
is all hexidecimal and does not contain any '='. Additional path info
from `--objects` is appended to the record as a token/value pair
`path=<path>` as-is without any truncation.
For now, the `--objects` flag is the only options that can be used in
combination with `-z`. In a subsequent commit, NUL-delimited support for
other options is added. Other options that do not make sense when used
in combination with `-z` are rejected.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
curl supports a few options to control when and how often it should
instruct the OS to send TCP keepalives, like KEEPIDLE, KEEPINTVL, and
KEEPCNT. Until this point, there hasn't been a way for users to change
what values are used for these options, forcing them to rely on curl's
defaults.
But we do unconditionally enable TCP keepalives without giving users an
ability to tweak any fine-grained parameters. Ordinarily this isn't a
problem, particularly for users that have fast-enough connections,
and/or are talking to a server that has generous or nonexistent
thresholds for killing a connection it hasn't heard from in a while.
But it can present a problem when one or both of those assumptions fail.
For instance, I can reliably get an in-progress clone to be killed from
the remote end when cloning from some forges while using trickle to
limit my clone's bandwidth.
For those users and others who wish to more finely tune the OS's
keepalive behavior, expose configuration and environment variables which
allow setting curl's KEEPIDLE, KEEPINTVL, and KEEPCNT options.
Note that while KEEPIDLE and KEEPINTVL were added in curl 7.25.0,
KEEPCNT was added much more recently in curl 8.9.0. Per f7c094060c
(git-curl-compat: remove check for curl 7.25.0, 2024-10-23), both
KEEPIDLE and KEEPINTVL are set unconditionally. But since we may be
compiled with a curl that isn't as new as 8.9.0, only set KEEPCNT when
we have CURLOPT_TCP_KEEPCNT to begin with.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- Switch the synopsis to a synopsis block which automatically
formats placeholders in italics and keywords in monospace
- Use _<placeholder>_ instead of <placeholder> in the description
- Use `backticks` for keywords and more complex option
descriptions. The new rendering engine applies synopsis rules to
these spans.
Possible values for some variables, that were mentioned in the description
prose, are now made into enumerated list.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Because the "[remote "nick"] fetch = ..." configuration variables
have the nickname in the second part, the nicknames are case
sensitive, unlike the first and the third component (i.e.
"remote.origin.fetch" and "Remote.origin.FETCH" are the same thing,
but "remote.Origin.fetch" and "remote.origin.fetch" are different).
Let's follow the way Git works in general and compare the remote
names case sensitively when processing advertised remotes.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The new followRemoteHEAD feature is triggered for almost every fetch,
causing us to ask the server about the remote "HEAD" and to consider
updating our local tracking HEAD symref. This patch limits the feature
only to the case when we are fetching a remote using its configured
refspecs (typically into its refs/remotes/ hierarchy). There are two
reasons for this.
One is efficiency. E.g., the fixes in 6c915c3f85 (fetch: do not ask for
HEAD unnecessarily, 2024-12-06) and 20010b8c20 (fetch: avoid ls-refs
only to ask for HEAD symref update, 2025-03-08) were aimed at reducing
the work we do when we would not be able to update HEAD anyway. But they
do not quite cover all cases. The remaining one is:
git fetch origin refs/heads/foo:refs/remotes/origin/foo
which _sometimes_ can update HEAD, but usually not. And that leads us to
the second point, which is being simple and explainable.
The code for updating the tracking HEAD symref requires both that we
learned which ref the remote HEAD points at, and that the server
advertised that ref to us. But because the v2 protocol narrows the
server's advertisement, the command above would not typically update
HEAD at all, unless it happened to point to the "foo" branch. Or even
weirder, it probably _would_ update if the server is very old and
supports only the v0 protocol, which always gives a full advertisement.
This creates confusing behavior for the user: sometimes we may try to
update HEAD and sometimes not, depending on vague rules.
One option here would be to loosen the update code to accept the remote
HEAD even if the server did not advertise that ref. I think that could
work, but it may also lead to interesting corner cases (e.g., creating a
dangling symref locally, even though the branch is not unborn on the
server, if we happen not to have fetched it).
So let's instead simplify the rules: we'll only consider updating the
tracking HEAD symref when we're doing a full fetch of the remote's
configured refs. This is easy to implement; we can just set a flag at
the moment we realize we're using the configured refspecs. And we can
drop the special case code added by 6c915c3f85 and 20010b8c20, since
this covers those cases. The existing tests from those commits still
pass.
In t5505, an incidental call to "git fetch <remote> <refspec>" updated
HEAD, which caused us to adjust the test in 3f763ddf28 (fetch: set
remote/HEAD if it does not exist, 2024-11-22). We can now adjust that
back to how it was before the feature was added.
Even though t5505 is incidentally testing our new desired behavior,
we'll add an explicit test in t5510 to make sure it is covered.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When BreakingChanges.txt was added in 57ec9254eb (docs: introduce
document to announce breaking changes, 2024-06-14) there was no
corresponding change to the Makefile to build it. Fix that by adding it
to the TECH_DOCS target.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Separate the paragraphs in the description of `--exclude` with a `+`
rather than an empty line to indent the whole description rather than
just the first paragraph.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When merge-ort was written, I did not at first allow rename detection to
be disabled, because I suspected that most folks disabling rename
detection were doing so solely for performance reasons. Since I put a
lot of working into providing dramatic speedups for rename detection
performance as used by the merge machinery, I wanted to know if there
were still real world repositories where rename detection was
problematic from a performance perspective. We have had years now to
collect such information, and while we never received one, waiting
longer with the option disabled seems unlikely to help surface such
issues at this point. Also, there has been at least one request to
allow rename detection to be disabled for behavioral rather than
performance reasons (see the thread including
https://lore.kernel.org/git/CABPp-BG-Nx6SCxxkGXn_Fwd2wseifMFND8eddvWxiZVZk0zRaA@mail.gmail.com/
), so let's start heeding the config and command line settings.
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The graph for `--ancestry-path=H D..M` should contain commit C.
Signed-off-by: Han Jiang <jhcarl0814@gmail.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This note was added to the restore command docs in 46e91b663b
(checkout: split part of it to new command 'restore', 2019-04-25),
but it is now inaccurate. The underlying builtin `add -i` implementation,
made default in 0527ccb1b5 (add -i: default to the built-in implementation,
2021-11-30), supports pathspecs, so `git restore -p <pathspec>...` has
worked for all users since then. I bisected to verify this was the commit
that added support.
Signed-off-by: Adam Johnson <me@adamj.eu>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While 'git-reflog(1)' currently allows users to expire reflogs and
delete individual entries, it lacks functionality to completely remove
reflogs for specific references. This becomes problematic in
repositories where reflogs are not needed but continue to accumulate
entries despite setting 'core.logAllRefUpdates=false'.
Add a new 'drop' subcommand to git-reflog that allows users to delete
the entire reflog for a specified reference. Include an '--all' flag to
enable dropping all reflogs from all worktrees and an addon flag
'--single-worktree', to only drop all reflogs from the current worktree.
While here, remove an extraneous newline in the file.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When breaking changes are enabled we continue to install documentation
of the git-pack-redundant(1) command even though it is completely
disabled and thus inaccessible. Improve this by only installing the
documentation in case breaking changes aren't enabled.
Based-on-patch-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As part of Git 3.0, remove the hidden synonym for "--annotate-stdin"
for real. As this does not change the fact that it used to be
called "--stdin" in older version of Git, keep that passage in the
documentation for "--annotate-stdin".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
fast-export has a --signed-tags= option that controls how to handle tag
signatures. However, there is no equivalent for commit signatures; it
just silently strips the signature out of the commit (analogously to
--signed-tags=strip).
While signatures are generally problematic for fast-export/fast-import
(because hashes are likely to change), if they're going to support tag
signatures, there's no reason to not also support commit signatures.
So, implement a --signed-commits= option that mirrors the --signed-tags=
option.
On the fast-export side, try to be as much like signed-tags as possible,
in both implementation and in user-interface. This will change the
default behavior to '--signed-commits=abort' from what is now
'--signed-commits=strip'. In order to provide an escape hatch for users
of third-party tools that call fast-export and do not yet know of the
--signed-commits= option, add an environment variable
'FAST_EXPORT_SIGNED_COMMITS_NOABORT=1' that changes the default to
'--signed-commits=warn-strip'.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --signed-tags= option takes one of five arguments specifying how to
handle signed tags during export. Among these arguments, 'strip' is to
'warn-strip' as 'verbatim' is to 'warn' (the unmentioned argument is
'abort', which stops the fast-export process entirely). That is,
signatures are either stripped or copied verbatim while exporting, with
or without a warning.
Match the pattern and rename 'warn' to 'warn-verbatim' to make it clear
that it instructs fast-export to copy signatures verbatim.
To maintain backwards compatibility, 'warn' is still recognized as
deprecated synonym of 'warn-verbatim'.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The documentation is using the historical mode for titles, which is a
setext-style (i.e., two-line) section title.
The issue with this mode is that starting block delimiters (e.g.,
`----`) can be confused with a section title when they are exactly the
same length as the preceding line. In the original documentation, this
is taken care of for English by the writer, but it is not the case for
translations where these delimiters are hidden. A translator can
generate a line that is exactly the same length as the following block
delimiter, which leads to this line being considered as a title.
To safeguard against this issue, add a blank line before and after
block delimiters where block is at root level, else add a "+" line
before block delimiters to link it to the preceding paragraph.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit bc26f7690a (clone: make it possible to specify --tags,
2025-02-06) added a new paragraph in the middle of this list item. By
adding an empty line rather than using a list continuation, we broke the
list continuation, with the new paragraph ending up funnily indented.
Restore the chain of list continuations.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some future breaking changes would remove certain parts of the
default repository, which were still described even when the
documents were built for the future with WITH_BREAKING_CHANGES.
* pw/repo-layout-doc-update:
docs: fix repository-layout when building with breaking changes
Meson-based build procedure forgot to build some docs, which has
been corrected.
* pw/build-meson-technical-and-howto-docs:
meson: fix building technical and howto docs
Since commit 8ccc75c245 (remote: announce removal of "branches/" and
"remotes/", 2025-01-22) enabling WITH_BREAKING_CHANGES when building git
removes support for reading branches from ".git/branches" and remotes
from ".git/remotes". However those locations are still documented in
gitrepository-layout.adoc even though the build does not support them.
Rectify this by adding a new document attribute "with-breaking-changes"
and use it to make the inclusion of those sections of the documentation
conditional. Note that the name of the attribute does not match the test
prerequisite WITHOUT_BREAKING_CHANGES added in c5bc9a7f94 (Makefile:
wire up build option for deprecated features, 2025-01-22). This is to
avoid the awkward double negative ifndef::without_breaking_changes for
documentation that should be included when WITH_BREAKING_CHANGES is
enabled. The test prerequisite will be renamed to match the
documentation attribute in a future patch series.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update a few more instances of Documentation/*.txt files which have been
renamed to *.adoc.
Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit ec14d4ecb5 (builtin.h: take over documentation from
api-builtin.txt, 2017-08-02) deleted api-builtin.txt and moved the
contents into builtin.h. Most of the references were fixed in
d85e9448dd (new-command.txt: update reference to builtin docs,
2023-02-04), but one remained. Fix it.
Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The top-level .gitattributes file contains entries for the Documentation
tree. Documentation/.gitattributes has not been touched since it was
added in 14f9e128d3 (Define the project whitespace policy, 2008-02-10).
Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The point behind a compile-time switch is to ensure that we have a
mechanism to hide myriad of backward incompatible changes that may
be prepared and accumulated over time, yet make them available for
testing any time during the development toward the big version
boundary. Add a few words to stress that point.
Since the document was first written, we have added the CI job that
the document anticipated us to have. Rephrase to state the current
status.
The discussion in [*1*] made us abandon the "feature.git3" based
runtime switching of behaviour and instead adopt the compile-time
switching mechanism, but a stray sentence about runtime switching
still remained in the final text by mistake. Remove it.
[Reference]
*1* https://lore.kernel.org/git/xmqqldzel6ug.fsf@gitster.g/
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Let's add a design doc about how we could improve handling liarge blobs
using "Large Object Promisors" (LOPs). It's a set of features with the
goal of using special dedicated promisor remotes to store large blobs,
and having them accessed directly by main remotes and clients.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
What happens to submodules during merge has been documented in a
bit more detail.
* lo/doc-merge-submodule-update:
merge-strategies.adoc: detail submodule merge
Assorted fixes and improvements to the build procedure based on
meson.
* ps/build-meson-fixes-0130:
gitlab-ci: restrict maximum number of link jobs on Windows
meson: consistently use custom program paths to resolve programs
meson: fix overwritten `git` variable
meson: prevent finding sed(1) in a loop
meson: improve handling of `sane_tool_path` option
meson: improve PATH handling
meson: drop separate version library
meson: stop linking libcurl into all executables
meson: introduce `libgit_curl` dependency
meson: simplify use of the common-main library
meson: inline the static 'git' library
meson: fix OpenSSL fallback when not explicitly required
meson: fix exec path with enabled runtime prefix
When our asciidoc files were renamed from "*.txt" to "*.adoc" in
1f010d6bdf (doc: use .adoc extension for AsciiDoc files, 2025-01-20)
the "meson.build" file in "Documentation" was updated but the
"meson.build" files in the "technical" and "howto" subdirectories were
not. This causes the meson build to fail when configured with
-Ddocs=html. Fix this by updating the relevant "meson.build" files.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The diffs queued from git-diff-pairs(1) are flushed when stdin is
closed. To enable greater flexibility, allow control over when the diff
queue is flushed by writing a single NUL byte on stdin between input
file pairs. Diff output between flushes is separated by a single NUL
byte.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Through git-diff(1), a single diff can be generated from a pair of blob
revisions directly. Unfortunately, there is not a mechanism to compute
batches of specific file pair diffs in a single process. Such a feature
is particularly useful on the server-side where diffing between a large
set of changes is not feasible all at once due to timeout concerns.
To facilitate this, introduce git-diff-pairs(1) which acts as a backend
passing its NUL-terminated raw diff format input from stdin through diff
machinery to produce various forms of output such as patch or raw.
The raw format was originally designed as an interchange format and
represents the contents of the diff_queued_diff list making it possible
to break the diff pipeline into separate stages. For example,
git-diff-tree(1) can be used as a frontend to compute file pairs to
queue and feed its raw output to git-diff-pairs(1) to compute patches.
With this, batches of diffs can be progressively generated without
having to recompute renames or retrieve object context. Something like
the following:
git diff-tree -r -z -M $old $new |
git diff-pairs -p -z
should generate the same output as `git diff-tree -p -M`. Furthermore,
each line of raw diff formatted input can also be individually fed to a
separate git-diff-pairs(1) process and still produce the same output.
Based-on-patch-by: Jeff King <peff@peff.net>
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We renamed from .txt to .adoc all the asciidoc source files and
necessary includes. We also need to adjust the build-docdep tool to
work on files whose suffix is .adoc when computing the documentation
dependencies.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The .txt extensions were changed to .adoc in 1f010d6bdf (doc: use .adoc
extension for AsciiDoc files, 2025-01-20). This left broken links in
the generated howto-index.html.
Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Removal of ".git/branches" and ".git/remotes" support in the
BreakingChanges document has been further clarified.
* jc/3.0-branches-remotes-update:
BreakingChanges: clarify branches/ and remotes/
"git refs migrate" can optionally be told not to migrate the reflog.
* kn/ref-migrate-skip-reflog:
builtin/refs: add '--no-reflog' flag to drop reflogs
The value of "uname -s" is by default sent over the wire as a part
of the "version" capability.
* ua/os-version-capability:
agent: advertise OS name via agent capability
t5701: add setup test to remove side-effect dependency
version: extend get_uname_info() to hide system details
version: refactor get_uname_info()
version: refactor redact_non_printables()
version: replace manual ASCII checks with isprint() for clarity
At now, we have already implemented the ref consistency checks for both
"files-backend" and "packed-backend". Although we would check some
redundant things, it won't cause trouble. So, let's integrate it into
the "git-fsck(1)" command to get feedback from the users. And also by
calling "git refs verify" in "git-fsck(1)", we make sure that the new
added checks don't break.
Introduce a new function "fsck_refs" that initializes and runs a child
process to execute the "git refs verify" command. In order to provide
the user interface create a progress which makes the total task be 1.
It's hard to know how many loose refs we will check now. We might
improve this later.
Then, introduce the option to allow the user to disable checking ref
database consistency. Put this function in the very first execution
sequence of "git-fsck(1)" due to that we don't want the existing code of
"git-fsck(1)" which would implicitly check the consistency of refs to
die the program.
Last, update the test to exercise the code.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When there is a "sorted" trait in the header of the "packed-refs" file,
it means that each entry is sorted increasingly by comparing the
refname. We should add checks to verify whether the "packed-refs" is
sorted in this case.
Update the "packed_fsck_ref_header" to know whether there is a "sorted"
trail in the header. It may seem that we could record all refnames
during the parsing process and then compare later. However, this is not
a good design due to the following reasons:
1. Because we need to store the state across the whole checking
lifetime, we would consume a lot of memory if there are many entries
in the "packed-refs" file.
2. We cannot reuse the existing compare function "cmp_packed_ref_records"
which cause repetition.
Because "cmp_packed_ref_records" needs an extra parameter "struct
snaphost", extract the common part into a new function
"cmp_packed_ref_records" to reuse this function to compare.
Then, create a new function "packed_fsck_ref_sorted" to parse the file
again and user the new fsck message "packedRefUnsorted(ERROR)" to report
to the user if the file is not sorted.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"packed-backend.c::next_record" will parse the ref entry to check the
consistency. This function has already checked the following things:
1. Parse the main line of the ref entry to inspect whether the oid is
not correct. Then, check whether the next character is oid. Then
check the refname.
2. If the next line starts with '^', it would continue to parse the
peeled oid and check whether the last character is '\n'.
As we decide to implement the ref consistency check for "packed-refs",
let's port these two checks and update the test to exercise the code.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In "packed-backend.c::create_snapshot", if there is a header (the line
which starts with '#'), we will check whether the line starts with "#
pack-refs with: ". However, we need to consider other situations and
discuss whether we need to add checks.
1. If the header does not exist, we should not report an error to the
user. This is because in older Git version, we never write header in
the "packed-refs" file. Also, we do allow no header in "packed-refs"
in runtime.
2. If the header content does not start with "# packed-ref with: ", we
should report an error just like what "create_snapshot" does. So,
create a new fsck message "badPackedRefHeader(ERROR)" for this.
3. If the header content is not the same as the constant string
"PACKED_REFS_HEADER". This is expected because we make it extensible
intentionally and runtime "create_snapshot" won't complain about
unknown traits. In order to align with the runtime behavior. There is
no need to report.
As we have analyzed, we only need to check the case 2 in the above. In
order to do this, use "open_nofollow" function to get the file
descriptor and then read the "packed-refs" file via "strbuf_read". Like
what "create_snapshot" and other functions do, we could split the line
by finding the next newline in the buffer. When we cannot find a
newline, we could report an error.
So, create a function "packed_fsck_ref_next_line" to find the next
newline and if there is no such newline, use
"packedRefEntryNotTerminated(ERROR)" to report an error to the user.
Then, parse the first line to apply the checks. Update the test to
exercise the code.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The calls to `find_program()` in our documentation don't use our custom
program path. This variable gets populated on Windows with the location
of Git for Windows so that we can use it to provide our build tools.
Consequently, we may not be able to find all necessary binaries on
Windows.
Adapt the calls to use the program path to fix this. While at it, drop
`required: true` arguments, which are the default anyway.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're assigning the `git` variable in three places:
- In "meson.build" to store the external Git executable.
- In "meson.build" to store the compiled Git executable.
- In "Documentation/meson.build" to store the external Git executable,
a second time.
The last case is only needed because we overwrite the original variable
with the built version. Rename the variable used for the built Git
executable so that we don't have to resolve the external Git executable
multiple times.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're searching for the sed(1) executable in a loop, which will make us
try to find it multiple times. Starting with the preceding commit we
already declare a variable for that program in the top-level build file.
Use it so that we only need to search for the program once.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Submodule merges are, in general, similar to other merges based on oid
three-way-merge. When a conflict happens, however, Git has two special
cases (introduced in 68d03e4a6e) on handling the conflict before
yielding it to the user. From the merge-ort and merge-recursive sources:
- "Case #1: a is contained in b or vice versa": both strategies try to
perform a fast-forward in the submodules if the commit referred by the
conflicted submodule is descendant of another;
- "Case #2: There are one or more merges that contain a and b in the
submodule. If there is only one, then present it as a suggestion to the
user, but leave it marked unmerged so the user needs to confirm the
resolution."
Add a small paragraph on merge-strategies.adoc describing this behavior.
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As we have created an empty .git/branches/ hierarchy until fairly
recently, these directories may be found in modern repositories, but
it is highly unlikely that they are being used.
Reported-by: Jakub Wilk <jwilk@jwilk.net>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Correct the default target in Documentation/Makefile, and
future-proof all Makefiles from similar breakages by declaring the
default target (which happens to be "all") upfront.
* ad/set-default-target-in-makefiles:
Makefile: set default goals in makefiles
"git merge-tree --stdin" has been improved (including a workaround
for a deadlock).
* pw/merge-tree-stdin-deadlock-fix:
merge-tree: fix link formatting in html docs
merge-tree: improve docs for --stdin
merge-tree: only use basic merge config
merge-tree: remove redundant code
merge-tree --stdin: flush stdout to avoid deadlock
The documentation of "git commit" and "git rebase" now refer to
commit titles as such, not "subject".
* mh/doc-commit-title-not-subject:
doc: use 'title' consistently
Two configuration variables about SSL authentication material that
weren't mentioned in the documentations are now mentioned.
* ac/doc-http-ssl-type-config:
docs: indicate http.sslCertType and sslKeyType
The "git refs migrate" subcommand converts the backend used for ref
storage. It always migrates reflog data as well as refs. Introduce an
option to exclude reflogs from migration, allowing them to be discarded
when they are unnecessary.
This is particularly useful in server-side repositories, where reflogs
are typically not expected. However, some repositories may still have
them due to historical reasons, such as bugs, misconfigurations, or
administrative decisions to enable reflogs for debugging. In such
repositories, it would be optimal to drop reflogs during the migration.
To address this, introduce the '--no-reflog' flag, which prevents reflog
migration. When this flag is used, reflogs from the original reference
backend are migrated. Since only the new reference backend remains in
the repository, all previous reflogs are permanently discarded.
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As some issues that can happen with a Git client can be operating system
specific, it can be useful for a server to know which OS a client is
using. In the same way it can be useful for a client to know which OS
a server is using.
Our current agent capability is in the form of "package/version" (e.g.,
"git/1.8.3.1"). Let's extend it to include the operating system name (os)
i.e in the form "package/version-os" (e.g., "git/1.8.3.1-Linux").
Including OS details in the agent capability simplifies implementation,
maintains backward compatibility, avoids introducing a new capability,
encourages adoption across Git-compatible software, and enhances
debugging by providing complete environment information without affecting
functionality. The operating system name is retrieved using the 'sysname'
field of the `uname(2)` system call or its equivalent.
However, there are differences between `uname(1)` (command-line utility)
and `uname(2)` (system call) outputs on Windows. These discrepancies
complicate testing on Windows platforms. For example:
- `uname(1)` output: MINGW64_NT-10.0-20348.3.4.10-87d57229.x86_64\
.2024-02-14.20:17.UTC.x86_64
- `uname(2)` output: Windows.10.0.20348
On Windows, uname(2) is not actually system-supplied but is instead
already faked up by Git itself. We could have overcome the test issue
on Windows by implementing a new `uname` subcommand in `test-tool`
using uname(2), but except uname(2), which would be tested against
itself, there would be nothing platform specific, so it's just simpler
to disable the tests on Windows.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git rev-list --missing=" learned to accept "print-info" that gives
known details expected of the missing objects, like path and type.
* jt/rev-list-missing-print-info:
rev-list: extend print-info to print missing object type
rev-list: add print-info action to print missing object path
Lazy-loading missing files in a blobless clone on demand is costly
as it tends to be one-blob-at-a-time. "git backfill" is introduced
to help bulk-download necessary files beforehand.
* ds/backfill:
backfill: assume --sparse when sparse-checkout is enabled
backfill: add --sparse option
backfill: add --min-batch-size=<n> option
backfill: basic functionality and tests
backfill: add builtin boilerplate
A previous commit introduced a "promisor.acceptFromServer" configuration
variable with only "None" or "All" as valid values.
Let's introduce "KnownName" and "KnownUrl" as valid values for this
configuration option to give more choice to a client about which
promisor remotes it might accept among those that the server advertised.
In case of "KnownName", the client will accept promisor remotes which
are already configured on the client and have the same name as those
advertised by the client. This could be useful in a corporate setup
where servers and clients are trusted to not switch names and URLs, but
where some kind of control is still useful.
In case of "KnownUrl", the client will accept promisor remotes which
have both the same name and the same URL configured on the client as the
name and URL advertised by the server. This is the most secure option,
so it should be used if possible.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a server S knows that some objects from a repository are available
from a promisor remote X, S might want to suggest to a client C cloning
or fetching the repo from S that C may use X directly instead of S for
these objects.
Note that this could happen both in the case S itself doesn't have the
objects and borrows them from X, and in the case S has the objects but
knows that X is better connected to the world (e.g., it is in a
$LARGEINTERNETCOMPANY datacenter with petabit/s backbone connections)
than S. Implementation of the latter case, which would require S to
omit in its response the objects available on X, is left for future
improvement though.
Then C might or might not, want to get the objects from X. If S and C
can agree on C using X directly, S can then omit objects that can be
obtained from X when answering C's request.
To allow S and C to agree and let each other know about C using X or
not, let's introduce a new "promisor-remote" capability in the
protocol v2, as well as a few new configuration variables:
- "promisor.advertise" on the server side, and:
- "promisor.acceptFromServer" on the client side.
By default, or if "promisor.advertise" is set to 'false', a server S will
not advertise the "promisor-remote" capability.
If S doesn't advertise the "promisor-remote" capability, then a client C
replying to S shouldn't advertise the "promisor-remote" capability
either.
If "promisor.advertise" is set to 'true', S will advertise its promisor
remotes with a string like:
promisor-remote=<pr-info>[;<pr-info>]...
where each <pr-info> element contains information about a single
promisor remote in the form:
name=<pr-name>[,url=<pr-url>]
where <pr-name> is the urlencoded name of a promisor remote and
<pr-url> is the urlencoded URL of the promisor remote named <pr-name>.
For now, the URL is passed in addition to the name. In the future, it
might be possible to pass other information like a filter-spec that the
client may use when cloning from S, or a token that the client may use
when retrieving objects from X.
It is C's responsibility to arrange how it can reach X though, so pieces
of information that are usually outside Git's concern, like proxy
configuration, must not be distributed over this protocol.
It might also be possible in the future for "promisor.advertise" to have
other values. For example a value like "onlyName" could prevent S from
advertising URLs, which could help in case C should use a different URL
for X than the URL S is using. (The URL S is using might be an internal
one on the server side for example.)
By default or if "promisor.acceptFromServer" is set to "None", C will
not accept to use the promisor remotes that might have been advertised
by S. In this case, C will not advertise any "promisor-remote"
capability in its reply to S.
If "promisor.acceptFromServer" is set to "All" and S advertised some
promisor remotes, then on the contrary, C will accept to use all the
promisor remotes that S advertised and C will reply with a string like:
promisor-remote=<pr-name>[;<pr-name>]...
where the <pr-name> elements are the urlencoded names of all the
promisor remotes S advertised.
In a following commit, other values for "promisor.acceptFromServer" will
be implemented, so that C will be able to decide the promisor remotes it
accepts depending on the name and URL it received from S. So even if
that name and URL information is not used much right now, it will be
needed soon.
Helped-by: Taylor Blau <me@ttaylorr.com>
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The first line of a commit message is variously called 'title' or
'subject'.
Prefer 'title' unless discussing email.
Signed-off-by: M Hickford <mirth.hickford@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the html documentation the link to the "OUTPUT" section is surrounded
by square brackets. Fix this by adding explicit link text to the cross
reference.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a section for --stdin in the list of options and document that it
implies -z so readers know how to parse the output. Also correct the
merge status documentation for --stdin as if the status is less than
zero "git merge-tree" dies before printing it.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Explicitly set the default goal at the very top of various makefiles.
This is already present in some makefiles, but not all of them.
In particular, this corrects a regression introduced in a38edab7c8
(Makefile: generate doc versions via GIT-VERSION-GEN, 2024-12-06). That
commit added some config files as build targets for the Documentation
directory, and put the target configuration in a sensible place.
Unfortunately, that sensible place was above any other build target
definitions, meaning the default goal changed to being those
configuration files only, rather than the HTML and man page
documentation.
Signed-off-by: Adam Dinwoodie <adam@dinwoodie.org>
Helped-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git clone" learned to make a shallow clone for a single commit
that is not necessarily be at the tip of any branch.
* tc/clone-single-revision:
builtin/clone: teach git-clone(1) the --revision= option
parse-options: introduce die_for_incompatible_opt2()
clone: introduce struct clone_opts in builtin/clone.c
clone: add tags refspec earlier to fetch refspec
clone: refactor wanted_peer_refs()
clone: make it possible to specify --tags
clone: cut down on global variables in clone.c
All the documentation .txt files have been renamed to .adoc to help
content aware editors.
* bc/doc-adoc-not-txt:
Remove obsolete ".txt" extensions for AsciiDoc files
doc: use .adoc extension for AsciiDoc files
gitattributes: mark AsciiDoc files as LF-only
editorconfig: add .adoc extension
doc: update gitignore for .adoc extension
When 'remote.<name>.followRemoteHEAD' was added in b7f7d16562 (fetch:
add configuration for set_head behaviour, 2024-11-29), its description
was added to remote.txt in between the two paragraphs describing
'remote.<name>.serverOption'. Reunite these two paragraphs.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git -c help.autocorrect=0 psuh" shows the suggested typofix,
unlike the previous attempt in the base topic.
* da/help-autocorrect-one-fix:
help: add "show" as a valid configuration value
help: show the suggested command when help.autocorrect is false
"[help] autocorrect = 1" used to be a way to say "please wait for
0.1 second after suggesting a typofix of the command name before
running that command"; now it means "yes, if there is a plausible
typofix for the command name, please run it immediately".
* sc/help-autocorrect-one:
help: interpret boolean string values for help.autocorrect
"git pack-objects" and its wrapper "git repack" learned an option
to use an alternative path-hash function to improve delta-base
selection to produce a packfile with deeper history than window
size.
* ds/name-hash-tweaks:
pack-objects: prevent name hash version change
test-tool: add helper for name-hash values
p5313: add size comparison test
pack-objects: add GIT_TEST_NAME_HASH_VERSION
repack: add --name-hash-version option
pack-objects: add --name-hash-version option
pack-objects: create new name-hash function version
The -X renormalize (or merge.renormalize config) option is intended to
reduce conflicts due to normalization of newer versions of history. It
does so by renormalizing files that it is about to do a three-way
content merge on. Some folks thought it would renormalize all files
throughout the tree, and the previous wording wasn't clear enough to
dispell that misconception. Update the docs to make it clear that the
merge machinery will only apply renormalization to files which need a
three-way content merge.
(Technically, the merge machinery also does renormalization on
modify/delete conflicts, in order to see if the modification was merely
a normalization; if so, it can accept the delete and not report a
conflict. But it's not clear that this piece needs to be explained to
users, and trying to distinguish it might feel like splitting hairs and
overcomplicating the explanation, so we leave it out.)
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do not seem to centrally document exhaustively ways to spell
Boolean values.
The description in the Environment Variables of git(1) section
assumes that the reader is already familiar with how "Boolean valued
configuration variables" are specified, without referring to
anything, so there is no way for the readers to find out more.
The description of `bool` in the section on "--type
<type>" in "git config --help" might be the place to do so, but it
is not telling us all that much.
The description of Boolean valued placeholders in the pretty formats
section of "git log --help" enumerates the possible values with "etc."
implying there may be other synonyms; shrink the list of samples and
instead refer to the canonical and authoritative source of truth, which
now is git-config(1).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In Git v2.44.0 support for 'git archive' over HTTP protocol
was added, but it was nowhere documented how it should be
enabled in git-http-backend.
Add missing documentation.
Signed-off-by: Piotr Szlazak <piotr.szlazak@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The git-clone(1) command has the option `--branch` that allows the user
to select the branch they want HEAD to point to. In a non-bare
repository this also checks out that branch.
Option `--branch` also accepts a tag. When a tag name is provided, the
commit this tag points to is checked out and HEAD is detached. Thus
`--branch` can be used to clone a repository and check out a ref kept
under `refs/heads` or `refs/tags`. But some other refs might be in use
as well. For example Git forges might use refs like `refs/pull/<id>` and
`refs/merge-requests/<id>` to track pull/merge requests. These refs
cannot be selected upon git-clone(1).
Add option `--revision` to git-clone(1). This option accepts a fully
qualified reference, or a hexadecimal commit ID. This enables the user
to clone and check out any revision they want. `--revision` can be used
in conjunction with `--depth` to do a minimal clone that only contains
the blob and tree for a single revision. This can be useful for
automated tests running in CI systems.
Using option `--branch` and `--single-branch` together is a similar
scenario, but serves a different purpose. Using these two options, a
singlet remote tracking branch is created and the fetch refspec is set
up so git-fetch(1) will receive updates on that branch from the remote.
This allows the user work on that single branch.
Option `--revision` on contrary detaches HEAD, creates no tracking
branches, and writes no fetch refspec.
Signed-off-by: Toon Claes <toon@iotcl.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
[jc: removed unnecessary TEST_PASSES_SANITIZE_LEAK from the test]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Option --no-tags was added in 0dab2468ee (clone: add a --no-tags option
to clone without tags, 2017-04-26). At the time there was no need to
support --tags as well, although there was some conversation about
it[1].
To simplify the code and to prepare for future commits, invert the flag
internally. Functionally there is no change, because the flag is
default-enabled passing `--tags` has no effect, so there's no need to
add tests for this.
[1]: https://lore.kernel.org/git/CAGZ79kbHuMpiavJ90kQLEL_AR0BEyArcZoEWAjPPhOFacN16YQ@mail.gmail.com/
Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
0a01d41ee4 (http: add support for different sslcert and sslkey types.,
2023-03-20) added useful SSL config options, but did not document them.
Signed-off-by: Andrew Carter <andrew@emailcarter.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Additional information about missing objects found in git-rev-list(1)
can be printed by specifying the `print-info` missing action for the
`--missing` option. Extend this action to also print missing object type
information inferred from its containing object. This token follows the
form `type=<type>` and specifies the expected object type of the missing
object.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Acked-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Missing objects identified through git-rev-list(1) can be printed by
setting the `--missing=print` option. Additional information about the
missing object, such as its path and type, may be present in its
containing object.
Add the `print-info` missing action for the `--missing` option that,
when set, prints additional insight about the missing object inferred
from its containing object. Each line of output for a missing object is
in the form: `?<oid> [<token>=<value>]...`. The `<token>=<value>` pairs
containing additional information are separated from each other by a SP.
The value is encoded in a token specific fashion, but SP or LF contained
in value are always expected to be represented in such a way that the
resulting encoded value does not have either of these two problematic
bytes. This format is kept generic so it can be extended in the future
to support additional information.
For now, only a missing object path info is implemented. It follows the
form `path=<path>` and specifies the full path to the object from the
top-level tree. A path containing SP or special characters is enclosed
in double-quotes in the C style as needed. In a subsequent commit,
missing object type info will also be added.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Acked-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous change introduced the '--[no-]sparse' option for the 'git
backfill' command, but did not assume it as enabled by default. However,
this is likely the behavior that users will most often want to happen.
Without this default, users with a small sparse-checkout may be confused
when 'git backfill' downloads every version of every object in the full
history.
However, this is left as a separate change so this decision can be reviewed
independently of the value of the '--[no-]sparse' option.
Add a test of adding the '--sparse' option to a repo without sparse-checkout
to make it clear that supplying it without a sparse-checkout is an error.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
One way to significantly reduce the cost of a Git clone and later fetches is
to use a blobless partial clone and combine that with a sparse-checkout that
reduces the paths that need to be populated in the working directory. Not
only does this reduce the cost of clones and fetches, the sparse-checkout
reduces the number of objects needed to download from a promisor remote.
However, history investigations can be expensive as computing blob diffs
will trigger promisor remote requests for one object at a time. This can be
avoided by downloading the blobs needed for the given sparse-checkout using
'git backfill' and its new '--sparse' mode, at a time that the user is
willing to pay that extra cost.
Note that this is distinctly different from the '--filter=sparse:<oid>'
option, as this assumes that the partial clone has all reachable trees and
we are using client-side logic to avoid downloading blobs outside of the
sparse-checkout cone. This avoids the server-side cost of walking trees
while also achieving a similar goal. It also downloads in batches based on
similar path names, presenting a resumable download if things are
interrupted.
This augments the path-walk API to have a possibly-NULL 'pl' member that may
point to a 'struct pattern_list'. This could be more general than the
sparse-checkout definition at HEAD, but 'git backfill --sparse' is currently
the only consumer.
Be sure to test this in both cone mode and not cone mode. Cone mode has the
benefit that the path-walk can skip certain paths once they would expand
beyond the sparse-checkout. Non-cone mode can describe the included files
using both positive and negative patterns, which changes the possible return
values of path_matches_pattern_list(). Test both kinds of matches for
increased coverage.
To test this, we can create a blobless sparse clone, expand the
sparse-checkout slightly, and then run 'git backfill --sparse' to see
how much data is downloaded. The general steps are
1. git clone --filter=blob:none --sparse <url>
2. git sparse-checkout set <dir1> ... <dirN>
3. git backfill --sparse
For the Git repository with the 'builtin' directory in the
sparse-checkout, we get these results for various batch sizes:
| Batch Size | Pack Count | Pack Size | Time |
|-----------------|------------|-----------|-------|
| (Initial clone) | 3 | 110 MB | |
| 10K | 12 | 192 MB | 17.2s |
| 15K | 9 | 192 MB | 15.5s |
| 20K | 8 | 192 MB | 15.5s |
| 25K | 7 | 192 MB | 14.7s |
This case matters less because a full clone of the Git repository from
GitHub is currently at 277 MB.
Using a copy of the Linux repository with the 'kernel/' directory in the
sparse-checkout, we get these results:
| Batch Size | Pack Count | Pack Size | Time |
|-----------------|------------|-----------|------|
| (Initial clone) | 2 | 1,876 MB | |
| 10K | 11 | 2,187 MB | 46s |
| 25K | 7 | 2,188 MB | 43s |
| 50K | 5 | 2,194 MB | 44s |
| 100K | 4 | 2,194 MB | 48s |
This case is more meaningful because a full clone of the Linux
repository is currently over 6 GB, so this is a valuable way to download
a fraction of the repository and no longer need network access for all
reachable objects within the sparse-checkout.
Choosing a batch size will depend on a lot of factors, including the
user's network speed or reliability, the repository's file structure,
and how many versions there are of the file within the sparse-checkout
scope. There will not be a one-size-fits-all solution.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Users may want to specify a minimum batch size for their needs. This is only
a minimum: the path-walk API provides a list of OIDs that correspond to the
same path, and thus it is optimal to allow delta compression across those
objects in a single server request.
We could consider limiting the request to have a maximum batch size in the
future. For now, we let the path-walk API batches determine the
boundaries.
To get a feeling for the value of specifying the --min-batch-size parameter,
I tested a number of open source repositories available on GitHub. The
procedure was generally:
1. git clone --filter=blob:none <url>
2. git backfill
Checking the number of packfiles and the size of the .git/objects/pack
directory helps to identify the effects of different batch sizes.
For the Git repository, we get these results:
| Batch Size | Pack Count | Pack Size | Time |
|-----------------|------------|-----------|-------|
| (Initial clone) | 2 | 119 MB | |
| 25K | 8 | 290 MB | 24s |
| 50K | 5 | 290 MB | 24s |
| 100K | 4 | 290 MB | 29s |
Other than the packfile counts decreasing as we need fewer batches, the
size and time required is not changing much for this small example.
For the nodejs/node repository, we see these results:
| Batch Size | Pack Count | Pack Size | Time |
|-----------------|------------|-----------|--------|
| (Initial clone) | 2 | 330 MB | |
| 25K | 19 | 1,222 MB | 1m 22s |
| 50K | 11 | 1,221 MB | 1m 24s |
| 100K | 7 | 1,223 MB | 1m 40s |
| 250K | 4 | 1,224 MB | 2m 23s |
| 500K | 3 | 1,216 MB | 4m 38s |
Here, we don't have much difference in the size of the repo, though the
500K batch size results in a few MB gained. That comes at a cost of a
much longer time. This extra time is due to server-side delta
compression happening as the on-disk deltas don't appear to be reusable
all the time. But for smaller batch sizes, the server is able to find
reasonable deltas partly because we are asking for objects that appear
in the same region of the directory tree and include all versions of a
file at a specific path.
To contrast this example, I tested the microsoft/fluentui repo, which
has been known to have inefficient packing due to name hash collisions.
These results are found before GitHub had the opportunity to repack the
server with more advanced name hash versions:
| Batch Size | Pack Count | Pack Size | Time |
|-----------------|------------|-----------|--------|
| (Initial clone) | 2 | 105 MB | |
| 5K | 53 | 348 MB | 2m 26s |
| 10K | 28 | 365 MB | 2m 22s |
| 15K | 19 | 407 MB | 2m 21s |
| 20K | 15 | 393 MB | 2m 28s |
| 25K | 13 | 417 MB | 2m 06s |
| 50K | 8 | 509 MB | 1m 34s |
| 100K | 5 | 535 MB | 1m 56s |
| 250K | 4 | 698 MB | 1m 33s |
| 500K | 3 | 696 MB | 1m 42s |
Here, a larger variety of batch sizes were chosen because of the great
variation in results. By asking the server to download small batches
corresponding to fewer paths at a time, the server is able to provide
better compression for these batches than it would for a regular clone.
A typical full clone for this repository would require 738 MB.
This example justifies the choice to batch requests by path name,
leading to improved communication with a server that is not optimally
packed.
Finally, the same experiment for the Linux repository had these results:
| Batch Size | Pack Count | Pack Size | Time |
|-----------------|------------|-----------|---------|
| (Initial clone) | 2 | 2,153 MB | |
| 25K | 63 | 6,380 MB | 14m 08s |
| 50K | 58 | 6,126 MB | 15m 11s |
| 100K | 30 | 6,135 MB | 18m 11s |
| 250K | 14 | 6,146 MB | 18m 22s |
| 500K | 8 | 6,143 MB | 33m 29s |
Even in this example, where the default name hash algorithm leads to
decent compression of the Linux kernel repository, there is value for
selecting a smaller batch size, to a limit. The 25K batch size has the
fastest time, but uses 250 MB more than the 50K batch size. The 500K
batch size took much more time due to server compression time and thus
we should avoid large batch sizes like this.
Based on these experiments, a batch size of 50,000 was chosen as the
default value.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The default behavior of 'git backfill' is to fetch all missing blobs that
are reachable from HEAD. Document and test this behavior.
The implementation is a very simple use of the path-walk API, initializing
the revision walk at HEAD to start the path-walk from all commits reachable
from HEAD. Ignore the object arrays that correspond to tree entries,
assuming that they are all present already.
The path-walk API provides lists of objects in batches according to a
common path, but that list could be very small. We want to balance the
number of requests to the server with the ability to have the process
interrupted with minimal repeated work to catch up in the next run.
Based on some experiments (detailed in the next change) a minimum batch
size of 50,000 is selected for the default.
This batch size is a _minimum_. As the path-walk API emits lists of blob
IDs, they are collected into a list of objects for a request to the
server. When that list is at least the minimum batch size, then the
request is sent to the server for the new objects. However, the list of
blob IDs from the path-walk API could be much longer than the batch
size. At this moment, it is unclear if there is a benefit to split the
list when there are too many objects at the same path.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In anticipation of implementing 'git backfill', populate the necessary files
with the boilerplate of a new builtin. Mark the builtin as experimental at
this time, allowing breaking changes in the near future, if necessary.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a literal value for showing the suggested autocorrection
for consistency with the rest of the help.autocorrect options.
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make the handling of false boolean values for help.autocorrect
consistent with the handling of value 0 by showing the suggested
commands but not running them.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Following the procedure we established to introduce breaking
changes for Git 3.0, allow an early opt-in for removing support of
$GIT_DIR/branches/ and $GIT_DIR/remotes/ directories to configure
remotes.
* ps/3.0-remote-deprecation:
remote: announce removal of "branches/" and "remotes/"
builtin/pack-redundant: remove subcommand with breaking changes
ci: repurpose "linux-gcc" job for deprecations
ci: merge linux-gcc-default into linux-gcc
Makefile: wire up build option for deprecated features
Doc and short-help text for "show-index" has been clarified to
stress that the command reads its data from the standard input.
* jc/show-index-h-update:
show-index: the short help should say the command reads from its input
Doc updates.
* ja/doc-commit-markup-updates:
doc: migrate git-commit manpage secondary files to new format
doc: convert git commit config to new format
doc: make more direct explanations in git commit options
doc: the mode param of -u of git commit is optional
doc: apply new documentation guidelines to git commit
Introduce a new API to visit objects in batches based on a common
path, or by type.
* ds/path-walk-1:
path-walk: drop redundant parse_tree() call
path-walk: reorder object visits
path-walk: mark trees and blobs as UNINTERESTING
path-walk: visit tags and cached objects
path-walk: allow consumer to specify object types
t6601: add helper for testing path-walk API
test-lib-functions: add test_cmp_sorted
path-walk: introduce an object walk by path
The new '--name-hash-version' option for 'git repack' is a simple
pass-through to the underlying 'git pack-objects' subcommand. However,
this subcommand may have other options and a temporary filename as part
of the subcommand execution that may not be predictable or could change
over time.
The existing test_subcommand method requires an exact list of arguments
for the subcommand. This is too rigid for our needs here, so create a
new method, test_subcommand_flex. Use it to check that the
--name-hash-version option is passing through.
Since we are modifying the 'git repack' command, let's bring its usage
in line with the Documentation's synopsis. This removes it from the
allow list in t0450 so it will remain in sync in the future.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous change introduced a new pack_name_hash_v2() function that
intends to satisfy much of the hash locality features of the existing
pack_name_hash() function while also distinguishing paths with similar
final components of their paths.
This change adds a new --name-hash-version option for 'git pack-objects'
to allow users to select their preferred function version. This use of
an integer version allows for future expansion and a direct way to later
store a name hash version in the .bitmap format.
For now, let's consider how effective this mechanism is when repacking a
repository with different name hash versions. Specifically, we will
execute 'git pack-objects' the same way a 'git repack -adf' process
would, except we include --name-hash-version=<n> for testing.
On the Git repository, we do not expect much difference. All path names
are short. This is backed by our results:
| Stage | Pack Size | Repack Time |
|-----------------------|-----------|-------------|
| After clone | 260 MB | N/A |
| --name-hash-version=1 | 127 MB | 129s |
| --name-hash-version=2 | 127 MB | 112s |
This example demonstrates how there is some natural overhead coming from
the cloned copy because the server is hosting many forks and has not
optimized for exactly this set of reachable objects. But the full repack
has similar characteristics for both versions.
Let's consider some repositories that are hitting too many collisions
with version 1. First, let's explore the kinds of paths that are
commonly causing these collisions:
* "/CHANGELOG.json" is 15 characters, and is created by the beachball
[1] tool. Only the final character of the parent directory can
differentiate different versions of this file, but also only the two
most-significant digits. If that character is a letter, then this is
always a collision. Similar issues occur with the similar
"/CHANGELOG.md" path, though there is more opportunity for
differences In the parent directory.
* Localization files frequently have common filenames but
differentiates via parent directories. In C#, the name
"/strings.resx.lcl" is used for these localization files and they
will all collide in name-hash.
[1] https://github.com/microsoft/beachball
I've come across many other examples where some internal tool uses a
common name across multiple directories and is causing Git to repack
poorly due to name-hash collisions.
One open-source example is the fluentui [2] repo, which uses beachball
to generate CHANGELOG.json and CHANGELOG.md files, and these files have
very poor delta characteristics when comparing against versions across
parent directories.
| Stage | Pack Size | Repack Time |
|-----------------------|-----------|-------------|
| After clone | 694 MB | N/A |
| --name-hash-version=1 | 438 MB | 728s |
| --name-hash-version=2 | 168 MB | 142s |
[2] https://github.com/microsoft/fluentui
In this example, we see significant gains in the compressed packfile
size as well as the time taken to compute the packfile.
Using a collection of repositories that use the beachball tool, I was
able to make similar comparisions with dramatic results. While the
fluentui repo is public, the others are private so cannot be shared for
reproduction. The results are so significant that I find it important to
share here:
| Repo | --name-hash-version=1 | --name-hash-version=2 |
|----------|-----------------------|-----------------------|
| fluentui | 440 MB | 161 MB |
| Repo B | 6,248 MB | 856 MB |
| Repo C | 37,278 MB | 6,755 MB |
| Repo D | 131,204 MB | 7,463 MB |
Future changes could include making --name-hash-version implied by a config
value or even implied by default during a full repack.
It is important to point out that the name hash value is stored in the
.bitmap file format, so we must force --name-hash-version=1 when bitmaps
are being read or written. Later, the bitmap format could be updated to
be aware of the name hash version so deltas can be quickly computed
across the bitmapped/not-bitmapped boundary. To promote the safety of
this parameter, the validate_name_hash_version() method will die() if
the given name-hash version is incorrect and will disable newer versions
if not yet compatible with other features, such as --write-bitmap-index.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit extends the functionality of `git gc`
by adding a new option, `--expire-to=<dir>`. Previously,
this feature was implemented in 91badeba32 (builtin/repack.c:
implement `--expire-to` for storing pruned objects, 2022-10-24),
which allowing users to specify a directory where unreachable
and expired cruft packs are stored during garbage collection.
However, users had to run `git repack --cruft --expire-to=<dir>`
followed by `git prune` to achieve similar results within `git gc`.
By introducing `--expire-to=<dir>` directly into `git gc`,
we simplify the process for users who wish to manage their
repository's cleanup more efficiently. This change involves
passing the `--expire-to=<dir>` parameter through to `git repack`,
making it easier for users to set up a backup location for cruft
packs that will be pruned.
Due to the original `git gc --prune=now` deleting all unreachable
objects by passing the `-a` parameter to git repack. With the
addition of the `--cruft` and `--expire-to` options, it is necessary
to modify this default behavior: instead of deleting these
unreachable objects, they should be merged into a cruft pack and
collected in a specified directory. Therefore, we do not pass `-a`
to the repack command but instead pass `--cruft`, `--expire-to`,
and `--cruft-expiration=now` to repack.
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The trailer.* configuration variables are currently only described in
git-interpret-trailers(1) but affect git-commit and git-tag as well.
Move that section into its own config/trailer.txt file and also include
it in git-config(1).
Signed-off-by: Julian Prein <julian@druckdev.xyz>
Acked-by: Eric Sesterhenn <eric.sesterhenn@x41-dsec.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Back when Git was in its infancy, remotes were configured via separate
files in "branches/" (back in 2005). This mechanism was replaced later
that year with the "remotes/" directory. Both mechanisms have eventually
been replaced by config-based remotes, and it is very unlikely that
anybody still uses these directories to configure their remotes.
Both of these directories have been marked as deprecated, one in 2005
and the other one in 2011. Follow through with the deprecation and
finally announce the removal of these features in Git 3.0.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
[jc: with a small tweak to the help message]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Document that it is insecure to use Personal Access Tokens, which
some hosting providers take as username/password, embedded in URLs.
* mh/doc-credential-helpers-with-pat:
docs: discuss caching personal access tokens
docs: list popular credential helpers
The meson build procedure for Documentation/technical/ hierarchy was
missing necessary dependencies, which has been corrected.
* sj/meson-doc-technical-dependency-fix:
meson: fix missing deps for technical articles
Since we no longer have any AsciiDoc files that end in ".txt", don't
modify them with .gitattributes or ignore them with .gitignore.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We presently use the ".txt" extension for our AsciiDoc files. While not
wrong, most editors do not associate this extension with AsciiDoc,
meaning that contributors don't get automatic editor functionality that
could be useful, such as syntax highlighting and prose linting.
It is much more common to use the ".adoc" extension for AsciiDoc files,
since this helps editors automatically detect files and also allows
various forges to provide rich (HTML-like) rendering. Let's do that
here, renaming all of the files and updating the includes where
relevant. Adjust the various build scripts and makefiles to use the new
extension as well.
Note that this should not result in any user-visible changes to the
documentation.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We presently use the ".txt" extension for our AsciiDoc files. While not
wrong, most editors do not associate this extension with AsciiDoc,
meaning that contributors don't get automatic editor functionality that
could be useful, such as syntax highlighting and prose linting.
Instead, in a future commit, we're going to move to using the more
common ".adoc" extension for these files, which many editors
intrinsically recognize as an AsciiDoc file. To avoid contributors
accidentally checking in generated files, ignore the new extension for
generated files in the documentation .gitignore files.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We centrally explain that "--no-whatever" is the way to countermand
the "--whatever" option. Explain that a configured default and the
value specified by an environment variable can be overridden by the
corresponding command line option, too.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Also prevent git-commit manpage to refer to itself in the config
description by using a variable.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- Use imperative mood
- make use of the placeholder format to simplify style
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix the synopsis to reflect the option description.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- switch the synopsis to a synopsis block which will automatically
format placeholders in italics and keywords in monospace
- use _<placeholder>_ instead of <placeholder> in the description
- use `backticks for keywords and more complex option
descriptions`. The new rendering engine will apply synopsis rules to
these spans.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We need an explicit `depends: documentation_deps` so that all of our
Documentation targets know they require asciidoc.conf. This shows up
as parallel build failures with it not yet being available.
Other targets look OK already.
Signed-off-by: Sam James <sam@gentoo.org>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A help.autocorrect value of 1 is currently interpreted as "wait 1
decisecond", which can be confusing to users who believe they are setting a
boolean value to turn the autocorrect feature on.
Interpret the value of help.autocorrect as either one of the accepted list
of special values ("never", "immediate", ...), a boolean or an integer. If
the value is 1, it is no longer interpreted as a decisecond value of 0.1s
but as a true boolean, the equivalent of "immediate". If the value is 2 or
more, continue treating it as a decisecond wait time.
False boolean string values ("off", "false", "no") are now equivalent to
"never", meaning that guessed values are still shown but nothing is
executed. True boolean string values are interpreted as "immediate".
Signed-off-by: Scott Chacon <schacon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- Switch the synopsis to a 'synopsis' block which will
automatically format placeholders in italics and keywords in
monospace
- Use _<placeholder>_ instead of <placeholder> in the description
- Use backticks for keywords and more complex option
descriptions. The new rendering engine will apply synopsis rules to
these spans.
While at it, also convert an option description to imperative mood.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- Switch the synopsis to a synopsis block which will automatically
format placeholders in italics and keywords in monospace
- Use _<placeholder>_ instead of <placeholder> in the description
- Use `backticks` for keywords and more complex option
descriptions. The new rendering engine will apply synopsis rules to
these spans.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Describe problems storing personal access tokens in git-credential-cache
and suggest alternatives.
Research suggests that many users are confused about this:
> the point of passwords is that (ideally) you memorise them [so]
> they're never stored anywhere in plain text. Yet GitHub's personal
> access token system seems to basically force you to store the token in
> plain text?
https://stackoverflow.com/questions/46645843/where-to-store-my-git-personal-access-token#comment89963004_46645843
Signed-off-by: M Hickford <mirth.hickford@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-credential-store saves credentials unencrypted on disk. It is the
least secure choice of credential helper. Nevertheless, it appears
several times more popular than any other credential helper [1].
Inform users about more secure alternatives.
[1] https://stackoverflow.com/questions/35942754/how-can-i-save-username-and-password-in-git
Signed-off-by: M Hickford <mirth.hickford@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2a9dfdf260 (difftool docs: de-duplicate configuration sections, 2022-09-07)
moved the difftool documentation, but missed moving this "include" line that
includes the generated list of diff tools, as referenced in the moved text.
Restore the correct position of the included list.
Signed-off-by: Adam Johnson <me@adamj.eu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Following the `CodingGuidlines`, since these placeholders are literal
they should be typeset verbatim, so fix some that aren't.
Signed-off-by: Matthew Hughes <matthewhughes934@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Correct verb tense, add missing words, avoid double blank lines,
and rephrase things that don’t read well to me like “Turn this linkage
to relative paths”.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 1bc1e94091 (doc: option value may be separate for valid reasons,
2024-11-25) added a paragraph discussing tilde-expansion of, e.g.,
~/directory/file.
The tilde character has a special meaning to asciidoc tools. In this
particular case, AsciiDoc matches up the two tildes in "e.g.
~/directory/file or ~u/d/f" and sets the text between them using
subscript. In the manpage, where subscripting is not possible, this
renders as "e.g. /directory/file oru/d/f".
These paths are literal values, which our coding guidelines want typeset
as verbatim using backticks. Do that. One effect of this is indeed that
the asciidoc tools stop interpreting tilde and other special characters.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The two-line heading added in 8525e92886 (Document HOME environment
variable, 2024-12-09) uses too many tilde characters, so the heading
isn't detected as such. Both AsciiDoc and Asciidoctor end up
misrendering this in different ways.
Use the correct number of tilde characters to fix this.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The build procedure based on meson learned to generate HTML
documention pages.
* ps/build-meson-html:
Documentation: wire up sanity checks for Meson
t/Makefile: make "check-meson" work with Dash
meson: install static files for HTML documentation
meson: generate articles
Documentation: refactor "howto-index.sh" for out-of-tree builds
Documentation: refactor "api-index.sh" for out-of-tree builds
meson: generate user manual
Documentation: inline user-manual.conf
meson: generate HTML pages for all man page categories
meson: fix generation of merge tools
meson: properly wire up dependencies for our docs
meson: wire up support for AsciiDoctor
The developer documentation has been updated to give the latest
info on gitk and git-gui maintainer.
* as/gitk-git-gui-repo-update:
Update the official repo of gitk
Wire up sanity checks for Meson to verify that no man pages are missing.
This check is similar to the same check we already have for our tests.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we generate man pages, articles and user manual with Meson the
only thing that is still missing in an installation of HTML documents is
a couple of static files. Wire these up to finalize Meson's support for
generating HTML documentation.
Diffing an installation that uses our Makefile with an installation that
uses Meson only surfaces a couple of discepancies now:
- Meson doesn't install "everyday.html" and "git-remote-helpers.html".
These files are marked as obsolete and don't contain any useful
information anymore: they simply point to their modern equivalents.
- Meson doesn't install "*.txt" files when asking for HTML docs. I'm
not sure why our Makefiles do this in the first place, and it does
seem like the resulting installation is fully functional even
without those files.
Other than that, both layout and file contents are the exact same.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While the Meson build system already knows to generate man pages and our
user manual, it does not yet generate the random assortment of articles
that we have. Plug this gap.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "howto-index.sh" is used to generate an index of our how-to docs. It
receives as input the paths to these documents, which would typically be
relative to the "Documentation/" directory in Makefile-based builds. In
an out-of-tree build though it will get relative that may be rooted
somewhere else entirely.
The file paths do end up in the generated index, and the expectation is
that they should always start with "howto/". But for out-of-tree builds
we would populate it with the paths relative to the build directory,
which is wrong.
Fix the issue by using `$(basename "$file")` to generate the path. While
at it, move the script into "howto/" to align it with the location of
the comparable "api-index.sh" script.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "api-index.sh" script generates an index of API-related
documentation. The script does not handle out-of-tree builds and thus
cannot be used easily by Meson.
Refactor it to be independent of locations by both accepting a source
directory where the API docs live as well as a path to an output file.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our documentation contains a user manual that gives people a short
introduction to Git. Our Makefile knows to generate the manual into
three different formats: an HTML page, a PDF and an info page. The Meson
build instructions don't yet generate any of these.
While wiring up all these formats I hit a couple of road blocks with how
we generate our info pages. Even though I eventually resolved these, it
made me question whether anybody actually uses info pages in the first
place. Checking through a couple of downstream consumers I couldn't find
a single user of either the info pages nor of our PDF manual in Arch
Linux, Debian, Fedora, Ubuntu, FreeBSD or OpenBSDFedora. So it's rather
safe to assume that there aren't really any users out there, and thus
the added complexity does not seem worth it.
Wire up support for building the user manual in HTML format and
conciously skip over the other two formats. This is basically a form of
silent deprecation: if people out there use the other two formats they
will eventually complain about them missing in Meson, which means we can
wire them up at a later point. If they don't we can phase out these
formats eventually.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When generating our user manual we set up a bit of extra configuration
compared to our normal configuration. This is done by having an extra
"user-manual.conf" file that Asciidoc seems to pull in automatically due
to matching filenames with "user-manual.txt". This dependency is quite
hidden though and thus easy to miss. Furthermore, it seems that Asciidoc
does not know to pull it in for out-of-tree builds where we use relative
paths.
The setup in AsciiDoctor is somewhat different: instead of having two
sets of configuration, we condition the use of manual-specific configs
based on whether the document type is "book". And as we only build our
user manual with that type this is sufficient.
Use the same trick for our user manual by inlining the configuration
into "asciidoc.conf.in" and making it conditional on whether or not
"doctype-book" is defined.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When generating HTML pages for our man pages we only generate them for
category 1 in Meson, which are the pages corresponding to our built-in
commands. I cannot tell why I added this filter though: our Makefile
installs all man pages, so a Meson-based build misses out on many of
them.
Fix this by removing the filter.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our buildsystems generate a list of diff and merge tools that ultimately
end up in our documentation. And while Meson does wire up the logic, it
tries to use the TOOL_MODE environment variable to set up the mode. This
is wrong though: the mode is set via an argument that we have fixed to
'diff' mode by accident.
Fix this such that merge tools are properly generated.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A couple of Meson documentation targets use `meson.current_source_dir()`
to resolve inputs. This has the downside that it does not automagically
make Meson track these inputs as a dependency. After all, string
arguments really can be anything, even if they happen to match an actual
filesystem path.
Adapt these build targets to instead use inputs.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While our Makefile supports both Asciidoc and AsciiDoctor, our Meson
build instructions only support the former. Wire up support for the
latter, as well.
Our Makefile always favors Asciidoc, but Meson will automatically figure
out which of both to use based on whether they are installed or not. To
keep compatibility with our Makefile it favors Asciidoc over Asciidoctor
in case both are available.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Point out:
- current maintaner
- contribution flow is via the mailing list
Signed-off-by: Alexander Shopov <ash@kambanaria.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Let's wait for git-gui, gitk, and possibly po/ and delay the tagging
of the -rc1. Many people are already offline for the end-of-year
holidays and it is a slow week, and 'master' front has too many new
things graduated from 'next' a bit too early for me to feel
comfortable.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git refs migrate" learned to also migrate the reflog data across
backends.
* kn/reflog-migration:
refs: mark invalid refname message for translation
refs: add support for migrating reflogs
refs: allow multiple reflog entries for the same refname
refs: introduce the `ref_transaction_update_reflog` function
refs: add `committer_info` to `ref_transaction_add_update()`
refs: extract out refname verification in transactions
refs/files: add count field to ref_lock
refs: add `index` field to `struct ref_udpate`
refs: include committer info in `ref_update` struct
A topic to optionally build with meson, which has graduated to
'master' recently, broke Documentation pipeline with asciidoctor
for the normal Makefile build as well as meson-based one, which
have been corrected.
* ma/asciidoctor-build-fixes:
asciidoctor-extensions.rb.in: inject GIT_DATE
asciidoctor-extensions.rb.in: add missing word
asciidoctor-extensions.rb.in: delete existing <refmiscinfo/>
A topic to optionally build with meson, which has graduated to
'master' recently, has regressed the normal Makefile build, which
is being corrected.
* ps/build-hotfix:
meson: add options to override build information
GIT-VERSION-GEN: fix overriding GIT_BUILT_FROM_COMMIT and GIT_DATE
GIT-VERSION-GEN: fix overriding GIT_VERSION
Makefile: introduce template for GIT-VERSION-GEN
Makefile: drop unneeded indirection for GIT-VERSION-GEN outputs
Makefile: stop including "GIT-VERSION-FILE" in docs
"git range-diff" learned to optionally show and compare merge
commits in the ranges being compared, with the --diff-merges
option.
* js/range-diff-diff-merges:
range-diff: introduce the convenience option `--remerge-diff`
range-diff: optionally include merge commits' diffs in the analysis
After a38edab7c8 (Makefile: generate doc versions via GIT-VERSION-GEN,
2024-12-06), we no longer inject GIT_DATE when building with
Asciidoctor.
Replace the <date/> tag in the XML to inject the value of GIT_DATE.
Unlike <refmiscinfo/> as handled in a recent commit, we have no reason
to expect that this tag might be missing, so there's no need for "maybe
remove, then add" and we can just outright replace the one that
Asciidoctor has generated based on the mtime of the source file.
Compared to pre-a38edab7c8, we now end up injecting this also in the
build of Git.3pm, which until now has been using the mtime of Git.pm.
That is arguably even a good change since it results in more
reproducible builds.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit a38edab7c8 (Makefile: generate doc versions via GIT-VERSION-GEN,
2024-12-06) stopped providing an attribute value "Git $(GIT_VERSION)" to
asciidoc/Asciidoctor over the command line. Instead, we now provide the
attribute to asciidoc through a generated asciidoc.conf, where the value
is generated as "Git @GIT_VERSION@".
In the similar mechanism for Asciidoctor, we forgot the "Git" prefix.
Restore it.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After the recent a38edab7c8 (Makefile: generate doc versions via
GIT-VERSION-GEN, 2024-12-06), building with Asciidoctor results in
manpages where the headers no longer contain "Git Manual" and the
footers no longer identify the built Git version.
Before a38edab7c8, we used to just provide a few attributes to
Asciidoctor (and asciidoc). Commit 7a30134358 (asciidoctor-extensions:
provide `<refmiscinfo/>`, 2019-09-16) noted that older versions of
Asciidoctor didn't propagate those attributes into the built XML files,
so we started injecting them ourselves from this script. With newer
versions of Asciidoctor, we'd end up with some harmless duplication
among the tags in the final XML.
Post-a38edab7c8, we don't provide these attributes and Asciidoctor
inserts empty-ish values. After our additions from 7a30134358, we get
<refmiscinfo class="source"> </refmiscinfo>
<refmiscinfo class="manual"> </refmiscinfo>
<refmiscinfo class="source">2.47.1.[...]</refmiscinfo>
<refmiscinfo class="manual">Git Manual</refmiscinfo>
When these are handled, it appears to be first come first served,
meaning that our additions have no effect and we regress as described in
the first paragraph.
Remove existing "source" or "manual" <refmiscinfo/> tags before adding
ours. I considered removing all <refmiscinfo/> to get a nice clean
slate, instead of just those two that we want to replace to be a bit
more precise. I opted for the latter. Maybe one day, Asciidoctor learns
to insert something useful there which `xmlto` can pick up and make good
use of -- let's not interfere.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/build-hotfix:
meson: add options to override build information
GIT-VERSION-GEN: fix overriding GIT_BUILT_FROM_COMMIT and GIT_DATE
GIT-VERSION-GEN: fix overriding GIT_VERSION
Makefile: introduce template for GIT-VERSION-GEN
Makefile: drop unneeded indirection for GIT-VERSION-GEN outputs
Makefile: stop including "GIT-VERSION-FILE" in docs
The short help text given by "git show-index -h" says
$ git show-index -h
usage: git show-index [--object-format=<hash-algorithm>]
--[no-]object-format <hash-algorithm>
specify the hash algorithm to use
The command takes a pack .idx file from its standard input. The
user has to _know_ this, as there is no indication from this output.
Give a hint that the data to work on is fed from its standard input.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We inject various different kinds of build information into build
artifacts, like the version string or the commit from which Git was
built. Add options to let users explicitly override this information
with Meson.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
GIT-VERSION-GEN tries to derive the version that Git is being built from
via multiple different sources in the following order:
1. A file called "version" in the source tree's root directory, if it
exists.
2. The current commit in case Git is built from a Git repository.
3. Otherwise, we use a fallback version stored in a variable which is
bumped whenever a new Git version is getting tagged.
It used to be possible to override the version by overriding the
`GIT_VERSION` Makefile variable (e.g. `make GIT_VERSION=foo`). This
worked somewhat by chance, only: `GIT-VERSION-GEN` would write the
actual Git version into `GIT-VERSION-FILE`, not the overridden value,
but when including the file into our Makefile we would not override the
`GIT_VERSION` variable because it has already been set by the user. And
because our Makefile used the variable to propagate the version to our
build tools instead of using `GIT-VERSION-FILE` the resulting build
artifacts used the overridden version.
But that subtle mechanism broke with 4838deab65 (Makefile: refactor
GIT-VERSION-GEN to be reusable, 2024-12-06) and subsequent commits
because the version information is not propagated via the Makefile
variable anymore, but instead via the files that `GIT-VERSION-GEN`
started to write. And as the script never knew about the `GIT_VERSION`
environment variable in the first place it uses one of the values listed
above instead of the overridden value.
Fix this issue by making `GIT-VERSION-GEN` handle the case where
`GIT_VERSION` has been set via the environment.
Note that this requires us to introduce a new GIT_VERSION_OVERRIDE
variable that stores a potential user-provided value, either via the
environment or via "config.mak". Ideally we wouldn't need it and could
just continue to use GIT_VERSION for this. But unfortunately, Makefiles
will first include all sub-Makefiles before figuring out whether it
needs to re-make any of them [1]. Consequently, if there already is a
GIT-VERSION-FILE, we would have slurped in its value of GIT_VERSION
before we call GIT-VERSION-GEN, and because GIT-VERSION-GEN now uses
that value as an override it would mean that the first generated value
for GIT_VERSION will remain unchanged.
Furthermore we have to move the include for "GIT-VERSION-FILE" after the
includes for "config.mak" and related so that GIT_VERSION_OVERRIDE can
be set to the value provided by "config.mak".
[1]: https://www.gnu.org/software/make/manual/html_node/Remaking-Makefiles.html
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a new template to call GIT-VERSION-GEN. This will allow us to
iterate on how exactly the script is called in subsequent commits
without having to adapt all call sites every time.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some of the callsites of GIT-VERSION-GEN generate the target file with a
"+" suffix first and then move the file into place when the new contents
are different compared to the old contents. This allows us to avoid a
needless rebuild by not updating timestamps of the target file when its
contents will remain unchanged anyway.
In fact though, this exact logic is already handled in GIT-VERSION-GEN,
so doing this manually is pointless. This is a leftover from an earlier
version of 4838deab65 (Makefile: refactor GIT-VERSION-GEN to be
reusable, 2024-12-06), where the script didn't handle that logic for us.
Drop the needless indirection.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We include "GIT-VERSION-FILE" in our docs Makefile, but don't actually
use the "GIT_VERSION" variable that it provides. This is a leftover from
the conversion to make "GIT-VERSION-GEN" generate version information
in-place by substituting placeholders in 4838deab65 (Makefile: refactor
GIT-VERSION-GEN to be reusable, 2024-12-06) and subsequent commits,
where all usages of the variable were removed.
Stop including the file.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the input rev_info has UNINTERESTING starting points, we want to be
sure that the UNINTERESTING flag is passed appropriately through the
objects. To match how this is done in places such as 'git pack-objects', we
use the mark_edges_uninteresting() method.
This method has an option for using the "sparse" walk, which is similar in
spirit to the path-walk API's walk. To be sure to keep it independent, add a
new 'prune_all_uninteresting' option to the path_walk_info struct.
To check how the UNINTERSTING flag is spread through our objects, extend the
'test-tool path-walk' command to output whether or not an object has that
flag. This changes our tests significantly, including the removal of some
objects that were previously visited due to the incomplete implementation.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The rev_info that is specified for a path-walk traversal may specify
visiting tag refs (both lightweight and annotated) and also may specify
indexed objects (blobs and trees). Update the path-walk API to walk
these objects as well.
When walking tags, we need to peel the annotated objects until reaching
a non-tag object. If we reach a commit, then we can add it to the
pending objects to make sure we visit in the commit walk portion. If we
reach a tree, then we will assume that it is a root tree. If we reach a
blob, then we have no good path name and so add it to a new list of
"tagged blobs".
When the rev_info includes the "--indexed-objects" flag, then the
pending set includes blobs and trees found in the cache entries and
cache-tree. The cache entries are usually blobs, though they could be
trees in the case of a sparse index. The cache-tree stores
previously-hashed tree objects but these are cleared out when staging
objects below those paths. We add tests that demonstrate this.
The indexed objects come with a non-NULL 'path' value in the pending
item. This allows us to prepopulate the 'path_to_lists' strmap with
lists for these paths.
The tricky thing about this walk is that we will want to combine the
indexed objects walk with the commit walk, especially in the future case
of walking objects during a command like 'git repack'.
Whenever possible, we want the objects from the index to be grouped with
similar objects in history. We don't want to miss any paths that appear
only in the index and not in the commit history.
Thus, we need to be careful to let the path stack be populated initially
with only the root tree path (and possibly tags and tagged blobs) and go
through the normal depth-first search. Afterwards, if there are other
paths that are remaining in the paths_to_lists strmap, we should then
iterate through the stack and visit those objects recursively.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We add the ability to filter the object types in the path-walk API so
the callback function is called fewer times.
This adds the ability to ask for the commits in a list, as well. We
re-use the empty string for this set of objects because these are passed
directly to the callback function instead of being part of the
'path_stack'.
Future changes will add the ability to visit annotated tags.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add some tests based on the current behavior, doing interesting checks
for different sets of branches, ranges, and the --boundary option. This
sets a baseline for the behavior and we can extend it as new options are
introduced.
Store and output a 'batch_nr' value so we can demonstrate that the paths are
grouped together in a batch and not following some other ordering. This
allows us to test the depth-first behavior of the path-walk API. However, we
purposefully do not test the order of the objects in the batch, so the
output is compared to the expected output through a sort.
It is important to mention that the behavior of the API will change soon as
we start to handle UNINTERESTING objects differently, but these tests will
demonstrate the change in behavior.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In anticipation of a few planned applications, introduce the most basic form
of a path-walk API. It currently assumes that there are no UNINTERESTING
objects, and does not include any complicated filters. It calls a function
pointer on groups of tree and blob objects as grouped by path. This only
includes objects the first time they are discovered, so an object that
appears at multiple paths will not be included in two batches.
These batches are collected in 'struct type_and_oid_list' objects, which
store an object type and an oid_array of objects.
The data structures are documented in 'struct path_walk_context', but in
summary the most important are:
* 'paths_to_lists' is a strmap that connects a path to a
type_and_oid_list for that path. To avoid conflicts in path names,
we make sure that tree paths end in "/" (except the root path with
is an empty string) and blob paths do not end in "/".
* 'path_stack' is a string list that is added to in an append-only
way. This stores the stack of our depth-first search on the heap
instead of using recursion.
* 'path_stack_pushed' is a strmap that stores path names that were
already added to 'path_stack', to avoid repeating paths in the
stack. Mostly, this saves us from quadratic lookups from doing
unsorted checks into the string_list.
The coupling of 'path_stack' and 'path_stack_pushed' is protected by the
push_to_stack() method. Call this instead of inserting into these
structures directly.
The walk_objects_by_path() method initializes these structures and
starts walking commits from the given rev_info struct. The commits are
used to find the list of root trees which populate the start of our
depth-first search.
The core of our depth-first search is in a while loop that continues
while we have not indicated an early exit and our 'path_stack' still has
entries in it. The loop body pops a path off of the stack and "visits"
the path via the walk_path() method.
The walk_path() method gets the list of OIDs from the 'path_to_lists'
strmap and executes the callback method on that list with the given path
and type. If the OIDs correspond to tree objects, then iterate over all
trees in the list and run add_children() to add the child objects to
their own lists, adding new entries to the stack if necessary.
In testing, this depth-first search approach was the one that used the
least memory while iterating over the object lists. There is still a
chance that repositories with too-wide path patterns could cause memory
pressure issues. Limiting the stack size could be done in the future by
limiting how many objects are being considered in-progress, or by
visiting blob paths earlier than trees.
There are many future adaptations that could be made, but they are left for
future updates when consumers are ready to take advantage of those features.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>