In the next commit we will start to parse more commits via the
commit-graph. This change will lead to a segfault though because we try
to access the tree of a commit via `repo_get_commit_tree()`, but:
- The commit has been parsed via the commit-graph, and thus its
`maybe_tree` field is not yet populated.
- We cannot use the commit-graph to populate the commit's tree because
we're in the process of writing the commit-graph.
The consequence is that we'll get a `NULL` pointer for the tree in
`write_graph_chunk_data()`.
In theory we are already mindful of this situation, as we explicitly use
`repo_parse_commit_no_graph()` to parse the commit without the help of
the commit-graph. But that doesn't do the trick as the commit is already
marked as parsed, so the function will not re-populate it. And as the
commit-graph has been closed, neither will `get_commit_tree_oid()` be
able to load the tree for us.
It seems like this issue can only be hit under artificial circumstances:
the error was hit via `git_test_write_commit_graph_or_die()`, which is
run by git-commit(1) and git-merge(1) in case `GIT_TEST_COMMIT_GRAPH=1`:
$ GIT_TEST_COMMIT_GRAPH=1 meson test t7507-commit-verbose \
--test-args=-ix -i
...
++ git -c commit.verbose=true commit --amend
hint: Waiting for your editor to close the file...
./test-lib.sh: line 1012: 55895 Segmentation fault (core dumped) git -c commit.verbose=true commit --amend
To the best of my knowledge, this is the only case where we end up
writing a commit-graph in the same process that might have already
consulted the commit-graph to look up arbitrary objects. But regardless
of that, this feels like a bigger accident that is just waiting to
happen.
Make the code more robust by extending `repo_parse_commit_no_graph()` to
unparse a commit first in case we detect it's coming from a graph. This
ensures that we will re-read the object without it, and thus we will
populate `maybe_tree` properly.
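The fix can be illustrated with a toy model. This is a minimal sketch, not Git's code. The structs, the `from_graph` flag and the helpers `parse_from_graph()`, `parse_from_object()`, `unparse()`, `parse_no_graph_old()` and `parse_no_graph_new()` are hypothetical stand-ins, loosely modelled on `struct commit`, the commit-graph position check and `unparse_commit()`. Under the old logic, a graph-backed commit that is already parsed keeps a NULL `maybe_tree`. Unparsing it first forces a re-read that fills the tree in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for Git's structures. */
struct tree {
	int id;
};

struct commit {
	int parsed;              /* set once the commit has been parsed */
	int from_graph;          /* nonzero if parsed via the commit-graph */
	struct tree *maybe_tree; /* only filled in when read from the object */
};

static struct tree the_tree = { 42 };

/* Simulates a commit-graph parse: the tree is loaded lazily, so it stays NULL. */
static void parse_from_graph(struct commit *c)
{
	c->parsed = 1;
	c->from_graph = 1;
	c->maybe_tree = NULL;
}

/* Simulates reading the commit object itself, which populates the tree. */
static void parse_from_object(struct commit *c)
{
	c->parsed = 1;
	c->from_graph = 0;
	c->maybe_tree = &the_tree;
}

/* Hypothetical counterpart of unparse_commit(): forget all parsed state. */
static void unparse(struct commit *c)
{
	c->parsed = 0;
	c->from_graph = 0;
	c->maybe_tree = NULL;
}

/* Old behaviour: an already-parsed commit is returned untouched. */
static int parse_no_graph_old(struct commit *c)
{
	if (!c->parsed)
		parse_from_object(c);
	return 0;
}

/* New behaviour: a commit that came from the graph is unparsed first. */
static int parse_no_graph_new(struct commit *c)
{
	if (c->parsed && c->from_graph)
		unparse(c);
	if (!c->parsed)
		parse_from_object(c);
	assert(c->maybe_tree != NULL);
	return 0;
}
```

The re-parse clears `from_graph`. A second call therefore finds the commit already parsed from the object and leaves it alone, which is consistent with each commit being re-parsed at most once.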
This fix shouldn't have any performance consequences: the function is
only ever called in the "commit-graph.c" code, and we'll only re-parse
the commit at most once.
Add an exclusion to our Coccinelle rules so that it doesn't complain
about us accessing `maybe_tree` directly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `lookup_commit_reference_gently()` can be used to look up a
committish by object ID. As such, the function knows to peel, for
example, tag objects so that we eventually end up with the commit.
The function is used quite a lot throughout our tree. One such user is
"shallow.c" via `assign_shallow_commits_to_refs()`. The intent of this
function is to figure out whether a shallow push is missing any objects
that are required to satisfy the ref updates, and if so, which of the
ref updates is missing objects.
This is done by painting the tree with `UNINTERESTING`. We start
painting by calling `refs_for_each_ref()` so that we can mark all
existing referenced objects as the boundary of objects that we already
have, and which are supposed to be fully connected. The reference tips
are then parsed via `lookup_commit_reference_gently()`, and the commit
is then marked as uninteresting.
But references may not necessarily point to a committish, and if many
of them don't, this step takes a lot of time. This is mostly due to
the way that `lookup_commit_reference_gently()` is implemented: before
we learn about the type of the object we already call `parse_object()`
on the object ID. This has two consequences:
- We parse all objects, including trees and blobs, even though we
don't need their contents.
- More importantly though, `parse_object()` will cause us to check
whether the object ID matches its contents.
Combined, this means that we inflate and hash every non-committish
object, and that of course ends up being both CPU- and memory-intensive.
Improve the logic so that we first use `peel_object()`. This function
won't parse the object for us, and thus it allows us to learn about the
object's type before we parse and return it.
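The difference between the old and new lookup order can be sketched in a self-contained way. The object model, the `full_parses` counter and both lookup functions below are hypothetical. `parse_full()` stands in for `parse_object()` (inflate plus hash check), and the peel loop stands in for `peel_object()`. The sketch also assumes a tag's target is known without a full parse, which glosses over how Git actually reads tags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object model; the kinds mirror Git's object types. */
enum obj_type { OBJ_COMMIT, OBJ_TREE, OBJ_BLOB, OBJ_TAG };

struct object {
	enum obj_type type;    /* assumed cheap to learn without a full parse */
	struct object *tagged; /* peel target, only meaningful for tags */
};

/* Counts how often we pay for a full parse (inflate + hash check). */
static int full_parses;

static void parse_full(struct object *o)
{
	assert(o != NULL);
	full_parses++; /* stands in for inflating and verifying the object */
}

/* Old order: fully parse every object we visit, then check its type. */
static struct object *lookup_commit_old(struct object *o)
{
	while (o) {
		parse_full(o);
		if (o->type != OBJ_TAG)
			break;
		o = o->tagged;
	}
	return (o && o->type == OBJ_COMMIT) ? o : NULL;
}

/* New order: peel using type information only, then parse just commits. */
static struct object *lookup_commit_new(struct object *o)
{
	while (o && o->type == OBJ_TAG)
		o = o->tagged; /* peel without a full parse */
	if (!o || o->type != OBJ_COMMIT)
		return NULL;   /* trees and blobs are never parsed */
	parse_full(o);
	return o;
}
```

For a ref list containing a blob, a tree, a commit and a tag pointing at that commit, the old order performs five full parses while the new one performs two, since trees and blobs are rejected by type alone.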
The following benchmark pushes a single object from a shallow clone into
a repository that has 100,000 refs. These refs were created by listing
all objects via `git rev-list --objects --all` and creating refs for
a subset of them, so many of those refs point to non-commit objects.
Benchmark 1: git-receive-pack (rev = HEAD~)
Time (mean ± σ): 62.571 s ± 0.413 s [User: 58.331 s, System: 4.053 s]
Range (min … max): 62.191 s … 63.010 s 3 runs
Benchmark 2: git-receive-pack (rev = HEAD)
Time (mean ± σ): 38.339 s ± 0.192 s [User: 36.220 s, System: 1.992 s]
Range (min … max): 38.176 s … 38.551 s 3 runs
Summary
git-receive-pack . </tmp/input (rev = HEAD) ran
1.63 ± 0.01 times faster than git-receive-pack . </tmp/input (rev = HEAD~)
This leads to a sizeable speedup as we now skip reading and parsing
non-commit objects. Before this change we spent around 40% of the time
in `assign_shallow_commits_to_refs()`, whereas after the change we only spend
around 1.2% of the time in there. Almost the entire remainder of the
time is spent in git-rev-list(1) to perform the connectivity checks.
Besides the speedup, this change also leads to a massive reduction in
allocations. Before:
HEAP SUMMARY:
in use at exit: 352,480,441 bytes in 97,185 blocks
total heap usage: 2,793,820 allocs, 2,696,635 frees, 67,271,456,983 bytes allocated
And after:
HEAP SUMMARY:
in use at exit: 17,524,978 bytes in 22,393 blocks
total heap usage: 33,313 allocs, 10,920 frees, 407,774,251 bytes allocated
Note that when all references refer to commits, performance stays
roughly the same, as expected. The following benchmark was executed
with 600k commits:
Benchmark 1: git-receive-pack (rev = HEAD~)
Time (mean ± σ): 9.101 s ± 0.006 s [User: 8.800 s, System: 0.520 s]
Range (min … max): 9.095 s … 9.106 s 3 runs
Benchmark 2: git-receive-pack (rev = HEAD)
Time (mean ± σ): 9.128 s ± 0.094 s [User: 8.820 s, System: 0.522 s]
Range (min … max): 9.019 s … 9.188 s 3 runs
Summary
git-receive-pack (rev = HEAD~) ran
1.00 ± 0.01 times faster than git-receive-pack (rev = HEAD)
This will be improved in the next commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* 'jx/zh_CN' of github.com:jiangxin/git:
l10n: zh_CN: standardize glossary terms
l10n: zh_CN: updated translation for 2.53
l10n: zh_CN: fix inconsistent use of standard vs. wide colons
Add preferred Chinese terminology notes and align existing translations
to the updated glossary. AI-assisted review was used to check and
improve legacy translations.
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
Replace mixed usage of standard (ASCII) colons ':' with full-width
(wide) colons '：' in Chinese translations to ensure typographic
consistency, as reported by CAESIUS-TIM [1].
Full-width punctuation is preferred in Chinese localization for better
readability and adherence to typesetting conventions.
[1]: https://github.com/git-l10n/git-po/issues/884
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
* 'master' of https://github.com/j6t/git-gui:
git-gui: mark *.po files at any directory level as UTF-8
git-gui i18n: Update Bulgarian translation (558t)
git-gui i18n: Update Bulgarian translation (557t)
When Gitk shows a commit that changes a file in po/glossary, the
patch text shows mojibake instead of correctly decoded UTF-8 text.
Gitk retrieves the encoding attribute to decide how to treat the bytes
that make up the patch text. There is an attribute definition declaring
all files as US-ASCII, and a later attribute definition overrides this.
But the override, which specifies UTF-8, applies only to *.po files in
directory po/ and does not apply to subdirectories.
Widen the pattern to apply to all directory levels.
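Git matches gitattributes patterns using gitignore-style rules. Under those rules, `*` does not cross a `/`, while a `**/` component matches zero or more directories. The simplified matcher below is a hypothetical illustration of that difference, not Git's `wildmatch()`. It assumes the widened pattern has the form `po/**/*.po`, which is not shown here, and `po/glossary/bg.po` is used only as an example path.

```c
#include <assert.h>
#include <stddef.h>

// Simplified glob matcher, NOT Git's wildmatch():
//   "*"   matches any run of characters except '/'
//   "**/" matches zero or more leading directories
// All other pattern characters must match literally.
static int glob_match(const char *p, const char *s)
{
	if (!*p)
		return !*s;

	if (p[0] == '*' && p[1] == '*' && p[2] == '/') {
		// Zero directories...
		if (glob_match(p + 3, s))
			return 1;
		// ...or skip up to and including any later '/'.
		for (const char *t = s; *t; t++)
			if (*t == '/' && glob_match(p + 3, t + 1))
				return 1;
		return 0;
	}

	if (*p == '*') {
		// Try every split point, but never let '*' cross a '/'.
		for (const char *t = s; ; t++) {
			if (glob_match(p + 1, t))
				return 1;
			if (!*t || *t == '/')
				return 0;
		}
	}

	return *p == *s && glob_match(p + 1, s + 1);
}

// Hypothetical entry point: does an attribute pattern apply to a path?
static int attr_pattern_matches(const char *pattern, const char *path)
{
	assert(pattern != NULL && path != NULL);
	return glob_match(pattern, path);
}
```

With these rules `po/*.po` matches `po/bg.po` but not `po/glossary/bg.po`, whereas `po/**/*.po` matches both.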
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
- Translate new string (558t)
- Add graves for disambiguation
- Improve glossary translation (96t) and synchronize with git
Signed-off-by: Alexander Shopov <ash@kambanaria.org>
Upstream symbolic link support on Windows from Git-for-Windows.
* js/symlink-windows:
mingw: special-case index entries for symlinks with buggy size
mingw: emulate `stat()` a little more faithfully
mingw: try to create symlinks without elevated permissions
mingw: add support for symlinks to directories
mingw: implement basic `symlink()` functionality (file symlinks only)
mingw: implement `readlink()`
mingw: allow `mingw_chdir()` to change to symlink-resolved directories
mingw: support renaming symlinks
mingw: handle symlinks to directories in `mingw_unlink()`
mingw: add symlink-specific error codes
mingw: change default of `core.symlinks` to false
mingw: factor out the retry logic
mingw: compute the correct size for symlinks in `mingw_lstat()`
mingw: teach dirent about symlinks
mingw: let `mingw_lstat()` error early upon problems with reparse points
mingw: drop the separate `do_lstat()` function
mingw: implement `stat()` with symlink support
mingw: don't call `GetFileAttributes()` twice in `mingw_lstat()`
Dscho observed that SVN tests take too much time in the CI leak
checking tasks, but most of that time is spent not in our code but in
libsvn code (which happens to be written in Perl), whose leaks are of
little value for us to discover. Skip SVN, P4, and CVS tests in the
leak checking tasks.
* js/ci-leak-skip-svn:
ci: skip CVS and P4 tests in leaks job, too
ci(*-leaks): skip the git-svn tests to save time
"git bugreport" and "git version --build-options" learned to
report whether the 'gettext' feature is enabled, to make it easier
to diagnose problems around l10n.
* jx/build-options-gettext:
help: report on whether or not gettext is enabled
Remove implicit reliance on the_repository global in the APIs
around tree objects and make it explicit which repository to work
in.
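Making the repository an explicit parameter means that configuration such as `core.maxTreeDepth` is read from the repository being worked on rather than from a global. The sketch below is hypothetical throughout. `repo_parse_tree_sketch()`, the `depth` field and the place where the limit is checked are illustrations only. They do not reproduce the signature of `repo_parse_tree()`, the layout of the real repo settings, or where Git actually enforces the limit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, minimal stand-ins; Git's real structs are far richer. */
struct repo_settings {
	int max_tree_depth; /* modelled on core.maxTreeDepth */
};

struct repository {
	struct repo_settings settings;
};

struct tree {
	int depth;  /* nesting depth of this tree, for illustration only */
	int parsed;
};

/*
 * Repo-aware parser sketch: every configuration-dependent decision
 * reads from the repository passed in, never from a process-wide global.
 */
static int repo_parse_tree_sketch(struct repository *r, struct tree *t)
{
	assert(r != NULL && t != NULL);
	if (t->parsed)
		return 0;
	if (t->depth > r->settings.max_tree_depth)
		return -1; /* refuse trees deeper than this repository allows */
	t->parsed = 1;
	return 0;
}
```

Two repositories with different limits can then treat the same tree differently within one process, which a single `the_repository` global cannot express.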
* rs/tree-wo-the-repository:
cocci: remove obsolete the_repository rules
cocci: convert parse_tree functions to repo_ variants
tree: stop using the_repository
tree: use repo_parse_tree()
path-walk: use repo_parse_tree_gently()
pack-bitmap-write: use repo_parse_tree()
delta-islands: use repo_parse_tree()
bloom: use repo_parse_tree()
add-interactive: use repo_parse_tree_indirect()
tree: add repo_parse_tree*()
environment: move access to core.maxTreeDepth into repo settings
The logic that avoids reusing MIDX files with a wrong checksum was
broken, which has been corrected.
* tb/midx-write-corrupt-checksum-fix:
midx-write.c: assume checksum-invalid MIDXs require an update
t/t5319-multi-pack-index.sh: drop early 'test_done'
"git repack --geometric" did not work with promisor packs, which
has been corrected.
* ps/geometric-repacking-with-promisor-remotes:
builtin/repack: handle promisor packs with geometric repacking
repack-promisor: extract function to remove redundant packs
repack-promisor: extract function to finalize repacking
repack-geometry: extract function to compute repacking split
builtin/pack-objects: exclude promisor objects with "--stdin-packs"
The latest release candidate notes say that there is a new contributor:
Jean-Noël Avila via GitGitGadget, ...
But this is a familiar face, just in a G.G. Gadget trench coat.
Also map the rest of the idents in the history.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>