kernel/git - git - PowerEL Git System

Commit Graph

Author	SHA1	Message	Date
Jeff King	86b008ee61	t: add library for munging chunk-format files When testing corruption of files using the chunk format (like commit-graphs and midx files), it's helpful to be able to modify bytes in specific chunks. This requires being able both to read the table-of-contents (to find the chunk to modify) but also to adjust it (to account for size changes in the offsets of subsequent chunks). We have some tests already which corrupt chunk files, but they have some downsides: 1. They are very brittle, as they manually compute the expected size of a particular instance of the file (e.g., see the definitions starting with NUM_OBJECTS in t5319). 2. Because they rely on manual offsets and don't read the table-of-contents, they're limited to overwriting bytes. But there are many interesting corruptions that involve changing the sizes of chunks (especially smaller-than-expected ones). This patch adds a perl script which makes such corruptions easy. We'll use it in subsequent patches. Note that we could get by with just a big "perl -e" inside the helper function. I chose to put it in a separate script for two reasons. One, so we don't have to worry about the extra layer of shell quoting. And two, the script is kind of big, and running the tests with "-x" would repeatedly dump it into the log output. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:55:00 -07:00
Jeff King	570b8b8836	chunk-format: note that pair_chunk() is unsafe The pair_chunk() function is provided as an easy helper for parsing chunks that just want a pointer to a set of bytes. But every caller has a hidden bug: because we return only the pointer without the matching chunk size, the callers have no clue how many bytes they are allowed to look at. And as a result, they may read off the end of the mmap'd data when the on-disk file does not match their expectations. Since chunk files are typically used for local-repository data like commit-graph files and midx's, the security implications here are pretty mild. The worst that can happen is that you hand somebody a corrupted repository tarball, and running Git on it does an out-of-bounds read and crashes. So it's worth being more defensive, but we don't need to drop everything and fix every caller immediately. I noticed the problem because the pair_chunk_fn() callback does not look at its chunk_size argument, and wanted to annotate it to silence -Wunused-parameter. We could do that now, but we'd lose the hint that this code should be audited and fixed. So instead, let's set ourselves up for going down that path: 1. Provide a pair_chunk() function that does return the size, which prepares us for fixing these cases. 2. Rename the existing function to pair_chunk_unsafe(). That gives us an easy way to grep for cases which still need to be fixed, and the name should cause anybody adding new calls to think twice before using it. There are no callers of the "safe" version yet, but we'll add some in subsequent patches. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:55:00 -07:00
Victoria Dye	2cdb796101	files-backend.c: avoid stat in 'loose_fill_ref_dir' Modify the 'readdir' loop in 'loose_fill_ref_dir' to, rather than 'stat' a file to determine whether it is a directory or not, use 'get_dtype'. Currently, the loop uses 'stat' to determine whether each dirent is a directory itself or not in order to construct the appropriate ref cache entry. If 'stat' fails (returning a negative value), the dirent is silently skipped; otherwise, 'S_ISDIR(st.st_mode)' is used to check whether the entry is a directory. On platforms that include an entry's d_type in the 'dirent' struct, this extra 'stat' check is redundant. We can use the 'get_dtype' method to extract this information on platforms that support it (i.e. where NO_D_TYPE_IN_DIRENT is unset), and derive it with 'stat' on platforms that don't. Because 'stat' is an expensive call, this confers a modest-but-noticeable performance improvement when iterating over large numbers of refs (approximately 20% speedup in 'git for-each-ref' in a 30k ref repo). Unlike other existing usage of 'get_dtype', the 'follow_symlink' arg is set to 1 to replicate the existing handling of symlink dirents. This unfortunately requires calling 'stat' on the associated entry regardless of platform, but symlinks in the loose ref store are highly unlikely since they'd need to be created manually by a user. Note that this patch also changes the condition for skipping creation of a ref entry from "when 'stat' fails" to "when the d_type is anything other than DT_REG or DT_DIR". If a dirent's d_type is DT_UNKNOWN (either because the platform doesn't support d_type in dirents or some other reason) or DT_LNK, 'get_dtype' will try to derive the underlying type with 'stat'. If the 'stat' fails, the d_type will remain 'DT_UNKNOWN' and the dirent will be skipped. However, it will also be skipped if it is any other valid d_type (e.g. DT_FIFO for named pipes, DT_LNK for a nested symlink).
Git does not handle these properly anyway, so we can safely constrain accepted types to directories and regular files. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:53:14 -07:00
Victoria Dye	aa79636fe7	dir.[ch]: add 'follow_symlink' arg to 'get_dtype' Add a 'follow_symlink' boolean option to 'get_dtype()'. If 'follow_symlink' is enabled, DT_LNK (in addition to DT_UNKNOWN) d_types trigger the stat-based d_type resolution, using 'stat' instead of 'lstat' to get the type of the followed symlink. Note that symlinks are not followed recursively, so a symlink pointing to another symlink will still resolve to DT_LNK. Update callers in 'diagnose.c' to specify 'follow_symlink = 0' to preserve current behavior. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:53:13 -07:00
Victoria Dye	6dc1004333	dir.[ch]: expose 'get_dtype' Move 'get_dtype()' from 'diagnose.c' to 'dir.c' and add its declaration to 'dir.h' so that it is accessible to callers in other files. The function and its documentation are moved verbatim except for a small addition to the description clarifying what the 'path' arg represents. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:53:13 -07:00
Victoria Dye	5305474ec4	ref-cache.c: fix prefix matching in ref iteration Update 'cache_ref_iterator_advance' to skip over refs that are not matched by the given prefix. Currently, a ref entry is considered "matched" if the entry name is fully contained within the prefix: * prefix: "refs/heads/v1" * entry: "refs/heads/v1.0" OR if the prefix is fully contained in the entry name: * prefix: "refs/heads/v1.0" * entry: "refs/heads/v1" The first case is always correct, but the second is only correct if the ref cache entry is a directory, for example: * prefix: "refs/heads/example" * entry: "refs/heads/" Modify the logic in 'cache_ref_iterator_advance' to reflect these expectations: 1. If 'overlaps_prefix' returns 'PREFIX_EXCLUDES_DIR', then the prefix and ref cache entry do not overlap at all. Skip this entry. 2. If 'overlaps_prefix' returns 'PREFIX_WITHIN_DIR', then the prefix matches inside this entry if it is a directory. Skip if the entry is not a directory, otherwise iterate over it. 3. Otherwise, 'overlaps_prefix' returned 'PREFIX_CONTAINS_DIR', indicating that the cache entry (directory or not) is fully contained by or equal to the prefix. Iterate over this entry. Note that condition 2 relies on the names of directory entries having the appropriate trailing slash. The existing function documentation of 'create_dir_entry' explicitly calls out the trailing slash requirement, so this is a safe assumption to make. This bug generally doesn't have any user-facing impact, since it requires: 1. using a non-empty prefix without a trailing slash in an iteration like 'for_each_fullref_in', 2. the callback to said iteration not reapplying the original filter (as for-each-ref does) to ensure unmatched refs are skipped, and 3. the repository having one or more refs that match part of, but not all of, the prefix. However, there are some niche scenarios that meet those criteria (specifically, 'rev-parse --bisect' and '(log|show|shortlog) --bisect').
Add tests covering those cases to demonstrate the fix in this patch. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:53:13 -07:00
John Cai	e95bafc52f	merge-ort: initialize repo in index state initialize_attr_index() does not initialize the repo member of attr_index. Starting in `44451a2e5e` (attr: teach "--attr-source=<tree>" global option to "git", 2023-05-06), this became a problem because istate->repo gets passed down the call chain starting in git_check_attr(). This gets passed all the way down to replace_refs_enabled(), which segfaults when accessing r->gitdir. Fix this by initializing the repository in the index state. Signed-off-by: John Cai <johncai86@gmail.com> Helped-by: Christian Couder <christian.couder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 14:42:02 -07:00
Sergey Organov	7c446ac790	completion: complete '--dd' '--dd' only makes sense for 'git log' and 'git show', so add it to __git_log_show_options which is referenced in the completion for these two commands. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 12:47:29 -07:00
Sergey Organov	c8e5cb0658	diff-merges: introduce '--dd' option This option provides a shortcut to request a diff with respect to the first parent for any kind of commit, universally. It is implemented as a pure synonym for "--diff-merges=first-parent --patch". It gives the user a quick and universal way to see exactly what changes were brought to a branch by merges as well as by regular commits. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 12:47:29 -07:00
Sergey Organov	be3820c60c	diff-merges: improve --diff-merges documentation * Put descriptions of convenience shortcuts first, so they are the first things the reader sees, rather than the lengthy details. * Get rid of the very long line containing all the --diff-merges formats by replacing them with <format>, and putting each supported format on its own line. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 12:47:29 -07:00
Štěpán Němec	cebfaaa333	doc/cat-file: make synopsis and description less confusing The DESCRIPTION's "first form" is actually the 1st, 2nd, 3rd and 5th form in SYNOPSIS, the "second form" is the 4th one. Interestingly, this state of affairs was introduced in `97fe725075` (cat-file docs: fix SYNOPSIS and "-h" output, 2021-12-28) with the claim of "Now the two will match again." ("the two" being DESCRIPTION and SYNOPSIS)... The description also suffers from other correctness and clarity issues, e.g., the "first form" paragraph discusses -p, -s and -t, but leaves out -e, which is included in the corresponding SYNOPSIS section; the second paragraph mentions <format>, which doesn't occur in SYNOPSIS at all, and of the three batch options, really only describes the behavior of --batch-check. Also the mention of "drivers" seems an implementation detail not adding much clarity in a short summary (and isn't expanded upon in the rest of the man page, either). Rather than trying to maintain one-to-one (or N-to-M) correspondence between the DESCRIPTION and SYNOPSIS forms, creating duplication and providing opportunities for error, shorten the former into a concise summary describing the two general modes of operation: batch and non-batch, leaving details to the subsequent manual sections. While here, fix a grammar error in the description of -e and make the following further minor improvements: NAME: shorten ("content or type and size" isn't the whole story; say "details" and leave the actual details to later sections) SYNOPSIS and --help: move the (--textconv | --filters) form before --batch, closer to the other non-batch forms Signed-off-by: Štěpán Němec <stepnem@smrk.net> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 12:46:33 -07:00
谢致邦 (XIE Zhibang)	1627e6b4e4	doc: correct the 50 characters soft limit (+) The soft limit of the first line of the commit message should be "no more than 50 characters" or "50 characters or less", but not "less than 50 character". This is an addition to commit `c2c349a15c` (doc: correct the 50 characters soft limit, 2023-09-28). Signed-off-by: 谢致邦 (XIE Zhibang) <Yeking@Red54.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 12:07:26 -07:00
Elijah Newren	5fbcdb2082	documentation: add missing parenthesis Diff best viewed with --color-diff. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 12:06:47 -07:00
Elijah Newren	798cddfa51	documentation: add missing quotes Diff best viewed with --color-diff. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 12:06:47 -07:00
Elijah Newren	845c6ca90e	documentation: add missing fullstops Diff best viewed with --color-diff. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 12:06:47 -07:00
Elijah Newren	4d542687fc	documentation: add some commas where they are helpful Diff best viewed with --color-diff. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 12:06:44 -07:00
Elijah Newren	42bdb80a08	documentation: fix whitespace issues Get rid of extraneous whitespace, replace tab-after-fullstop with space, etc. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 12:06:29 -07:00
Elijah Newren	2150b6fb47	documentation: fix capitalization Diff best viewed with --color-diff. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 12:06:29 -07:00
Elijah Newren	f4e1851a29	documentation: fix punctuation Diff best viewed with --color-diff. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 12:06:29 -07:00
Elijah Newren	9a9fd289cc	documentation: use clearer prepositions Diff best viewed with --color-diff. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 12:06:29 -07:00
Elijah Newren	0cac690e1a	documentation: add missing hyphens Diff best viewed with --color-diff. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 12:06:29 -07:00
Elijah Newren	f22fdf33af	documentation: remove unnecessary hyphens Diff best viewed with --color-diff. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 12:06:29 -07:00
Elijah Newren	0a4f051f93	documentation: add missing article Diff best viewed with --color-diff. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 12:06:29 -07:00
Elijah Newren	3771d00257	documentation: fix choice of article Diff best viewed with --color-diff. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 12:06:29 -07:00
Elijah Newren	03b3431e6a	documentation: whitespace is already generally plural Diff best viewed with --color-diff. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 12:06:29 -07:00
Elijah Newren	6cc668c0ab	documentation: fix singular vs. plural Diff best viewed with --color-diff. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 12:06:29 -07:00
Elijah Newren	401a4e257e	documentation: fix verb vs. noun Diff best viewed with --color-diff. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 12:06:29 -07:00
Elijah Newren	af181e4dbd	documentation: fix adjective vs. noun Diff best viewed with --color-diff. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 12:06:29 -07:00
Elijah Newren	5676b04a44	documentation: fix verb tense Diff best viewed with --color-diff. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 12:06:29 -07:00
Elijah Newren	7f7e6bbe06	documentation: employ consistent verb tense for a list Diff best viewed with --color-diff. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 12:06:29 -07:00
Elijah Newren	ce14cc0b00	documentation: fix subject/verb agreement Diff best viewed with --color-diff. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 12:06:29 -07:00
Elijah Newren	859a6d6045	documentation: remove extraneous words Diff best viewed with --color-diff. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 12:06:29 -07:00
Elijah Newren	8936352242	documentation: add missing words Diff best viewed with --color-diff. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 12:06:29 -07:00
Elijah Newren	dbe33c5ad0	documentation: fix apostrophe usage Diff best viewed with --color-diff. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 12:06:29 -07:00
Elijah Newren	384f7d17d2	documentation: fix typos Diff best viewed with --color-diff. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 12:06:24 -07:00
Elijah Newren	82e81edf71	documentation: fix small error Diff best viewed with --color-diff. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 12:04:21 -07:00
Elijah Newren	cf6cac2005	documentation: wording improvements Diff best viewed with --color-diff. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 12:04:21 -07:00
Andy Koppe	2b09d16aba	pretty: fix ref filtering for %(decorate) formats Mark pretty formats containing "%(decorate" as requiring decoration in userformat_find_requirements(), same as "%d" and "%D". Without this, cmd_log_init_finish() didn't invoke load_ref_decorations() with the decoration_filter it puts together, and hence filtering options such as --decorate-refs were quietly ignored. Amend one of the %(decorate) checks in t4205-log-pretty-formats.sh to test this. Signed-off-by: Andy Koppe <andy.koppe@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 11:25:13 -07:00
Jeff King	c1b754d059	repack: free existing_cruft array after use We allocate an array of packed_git pointers so that we can sort the list of cruft packs, but we never free the array, causing a small leak. Note that we don't need to free the packed_git structs themselves; they're owned by the repository object. Signed-off-by: Jeff King <peff@peff.net> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 10:27:34 -07:00
Junio C Hamano	ffbf6a748d	doc: update list archive reference to use lore.kernel.org No disrespect to other mailing list archives, but the local part of their URLs will become pretty much meaningless once the archives go out of service, and we learned that lesson the hard way when $gmane stopped serving. Let's point into https://lore.kernel.org/ for an article that can be found there, because the local part of the URL has the Message-Id: that can be used to find the same message in other archives, even if lore goes down. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-06 16:46:59 -07:00
Jeff King	badf2fe1c3	daemon: free listen_addr before returning We build up a string list of listen addresses from the command-line arguments, but never free it. This causes t5811 to complain of a leak (though curiously it seems to do so only when compiled with gcc, not with clang). To handle this correctly, we have to do a little refactoring: - there are two exit points from the main function, depending on whether we are entering the main loop or serving a single client (since rather than a traditional fork model, we re-exec ourselves with the extra "--serve" argument to accommodate Windows). We don't need --listen at all in the --serve case, of course, but it is passed along by the parent daemon, which simply copies all of the command-line options it got. - we just "return serve()" to run the main loop, giving us no chance to do any cleanup So let's use a "ret" variable to store the return code, and give ourselves a single exit point at the end. That gives us one place to do cleanup. Note that this code also uses the "use a no-dup string-list, but allocate strings we add to it" trick, meaning string_list_clear() will not realize it should free them. We can fix this by switching to a "dup" string-list, but using the "append_nodup" function to add to it (this is preferable to tweaking the strdup_strings flag before clearing, as it puts all the subtle memory-ownership code together). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-05 14:54:58 -07:00
Jeff King	8ef8da4842	revision: clear decoration structs during release_revisions() The point of release_revisions() is to free memory associated with the rev_info struct, but we have several "struct decoration" members that are left untouched. Since the previous commit introduced a function to do that, we can just call it. We do have to provide some specialized callbacks to map the void pointers onto real ones (the alternative would be casting the existing function pointers; this generally works because "void *" is usually interchangeable with a struct pointer, but it is technically forbidden by the standard). Since the line-log code does not expose the type it stores in the decoration (nor of course the function to free it), I put this behind a generic line_log_free() entry point. It's possible we may need to add more line-log specific bits anyway (running t4211 shows a number of other leaks in the line-log code). While this doubtless cleans up many leaks triggered by the test suite, the only script which becomes leak-free is t4217, as it does very little beyond a simple traversal (its existing leak was from the use of --children, which is now fixed). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-05 14:54:57 -07:00
Jeff King	771868243c	decorate: add clear_decoration() function There's not currently any way to free the resources associated with a decoration struct. As a result, we have several memory leaks which cannot easily be plugged. Let's add a "clear" function and make use of it in the example code of t9004. This removes the only leak from that script, so we can mark it as passing the leak sanitizer. Curiously this leak is found only when running SANITIZE=leak with clang, but not with gcc. But it is a bog-standard leak: we allocate some memory in a local variable struct, and then exit main() without releasing it. I'm not sure why gcc doesn't find it. After this patch, both compilers report it as leak-free. Note that the clear function takes a callback to free the individual entries. That's not needed for our example (which is just decorating with ints), but will be for real callers. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-05 14:54:55 -07:00
Taylor Blau	3c1e2c2113	builtin/repack.c: avoid making cruft packs preferred When doing a `--geometric` repack, we make sure that the preferred pack (if writing a MIDX) is the largest pack that we didn't repack. That has the effect of keeping the preferred pack in sync with the pack containing a majority of the repository's reachable objects. But if the repository happens to double in size, we'll repack everything. Here we don't specify any `--preferred-pack`, and instead let the MIDX code choose. In the past, that worked fine, since there would only be one pack to choose from: the one we just wrote. But it's no longer necessarily the case that there is one pack to choose from. It's possible that the repository also has a cruft pack. If the cruft pack happens to come earlier in lexical order (and has an earlier mtime than any non-cruft pack), we'll pick that pack as preferred. This makes it impossible to reuse chunks of the reachable pack verbatim from pack-objects, so it is sub-optimal. Luckily, this is a somewhat rare circumstance to be in, since we would have to repack the entire repository during a `--geometric` repack, and the cruft pack would have to sort ahead of the pack we just created. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-05 13:26:11 -07:00
Taylor Blau	37dc6d8104	builtin/repack.c: implement support for `--max-cruft-size` Cruft packs are an alternative mechanism for storing a collection of unreachable objects whose mtimes are recent enough to avoid being pruned out of the repository. When cruft packs were first introduced back in `b757353676` (builtin/pack-objects.c: --cruft without expiration, 2022-05-20) and `a7d493833f` (builtin/pack-objects.c: --cruft with expiration, 2022-05-20), the recommended workflow consisted of: - Repacking periodically, either by packing anything loose in the repository (via `git repack -d`) or producing a geometric sequence of packs (via `git repack --geometric=<d> -d`). - Every so often, splitting the repository into two packs, one cruft to store the unreachable objects, and another non-cruft pack to store the reachable objects. Repositories may (out of band with the above) choose periodically to prune out some unreachable objects which have aged out of the grace period by generating a pack with `--cruft-expiration=<approxidate>`. This allowed repositories to maintain relatively few packs on average, and quarantine unreachable objects together in a cruft pack, avoiding the pitfalls of holding unreachable objects as loose while they age out (for more, see some of the details in `3d89a8c118` (Documentation/technical: add cruft-packs.txt, 2022-05-20)). This all works, but can be costly from an I/O-perspective when frequently repacking a repository that has many unreachable objects. This problem is exacerbated when those unreachable objects are rarely (if ever) pruned. Since there is at most one cruft pack in the above scheme, each time we update the cruft pack it must be rewritten from scratch.
Because much of the pack is reused, this is a relatively inexpensive operation from a CPU-perspective, but is very costly in terms of I/O since we end up rewriting basically the same pack (plus any new unreachable objects that have entered the repository since the last time a cruft pack was generated). At the time, we decided against implementing more robust support for multiple cruft packs. This patch implements that support which we were lacking. Introduce a new option `--max-cruft-size` which allows repositories to accumulate cruft packs up to a given size, after which point a new generation of cruft packs can accumulate until it reaches the maximum size, and so on. To generate a new cruft pack, the process works like so: - Sort a list of any existing cruft packs in ascending order of pack size. - Starting from the beginning of the list, group cruft packs together while the accumulated size is smaller than the maximum specified pack size. - Combine the objects in these cruft packs together into a new cruft pack, along with any other unreachable objects which have since entered the repository. Once a cruft pack grows beyond the size specified via `--max-cruft-size` the pack is effectively frozen. This limits the I/O churn up to a quadratic function of the value specified by the `--max-cruft-size` option, instead of behaving quadratically in the number of total unreachable objects. When pruning unreachable objects, we bypass the new code paths which combine small cruft packs together, and instead start from scratch, passing in the appropriate `--max-pack-size` down to `pack-objects`, putting it in charge of keeping the resulting set of cruft packs sized correctly. This may seem like further I/O churn, but in practice it isn't so bad. We could prune old cruft packs for whom all or most objects are removed, and then generate a new cruft pack with just the remaining set of objects. 
But this additional complexity buys us relatively little, because most objects end up being pruned anyway, so the I/O churn is well contained. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-05 13:26:11 -07:00
Taylor Blau	b5b1f4c0ec	builtin/repack.c: parse `--max-pack-size` with OPT_MAGNITUDE The repack builtin takes a `--max-pack-size` command-line argument which it uses to feed into any of the pack-objects children that it may spawn when generating a new pack. This option is parsed with OPT_STRING, meaning that we'll accept anything as input, punting on more fine-grained validation until we get down into pack-objects. This is fine, but it's wasteful to spend an entire sub-process just to figure out that one of its options is bogus. Instead, parse the value of `--max-pack-size` with OPT_MAGNITUDE in 'git repack', and then pass the known-good result down to pack-objects. Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-05 13:18:54 -07:00
Štěpán Němec	f0a39ba504	t/README: fix multi-prerequisite example With the broken quoting the test wouldn't even parse correctly, but there's also the '==' instead of POSIX '=' (of the shells I tested, busybox ash, bash and ksh (93 and OpenBSD) accept '==', dash and zsh do not), and 'print 2' from Python 2 days. (I assume the test failing due to 3 != 4 is intentional or immaterial.) Fixes: `93a5724613` ("test-lib: Add support for multiple test prerequisites") Signed-off-by: Štěpán Němec <stepnem@smrk.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-05 12:55:38 -07:00
Štěpán Němec	72fac03522	doc/gitk: s/sticked/stuck/ The terminology was changed in `b0d12fc9b2` (Use the word 'stuck' instead of 'sticked'). Signed-off-by: Štěpán Němec <stepnem@smrk.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-05 12:55:38 -07:00
Štěpán Němec	a62a7060a5	git-jump: admit to passing merge mode args to ls-files There's even an example of such usage in the README. Fixes: `67ba13e5a4` ("git-jump: pass "merge" arguments to ls-files") Signed-off-by: Štěpán Němec <stepnem@smrk.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-05 12:55:38 -07:00
Štěpán Němec	043465a6cf	doc/diff-options: improve wording of the log.diffMerges mention Fix the grammar ("which default value is") and reword to match other similar descriptions (say "configuration variable" instead of "parameter", link to git-config(1)). Signed-off-by: Štěpán Němec <stepnem@smrk.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-05 12:55:38 -07:00
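
As a rough illustration of the concern raised in 570b8b8836 (chunk-format: note that pair_chunk() is unsafe) and of the table-of-contents handling mentioned in 86b008ee61, the Python sketch below derives each chunk's size from the table of contents and rejects any chunk that would run past the end of the file. It is not Git's C code or the perl script from the commit. The name `read_chunk_table`, its parameters, and the entry layout used here (a 4-byte chunk ID plus an 8-byte big-endian offset, closed by one terminating entry) are assumptions made for this sketch.

```python
import struct

# Assumed layout of one table-of-contents entry:
# a 4-byte chunk ID followed by an 8-byte big-endian file offset.
TOC_ENTRY = struct.Struct(">4sQ")


def read_chunk_table(data, toc_offset, num_chunks):
    """Map each chunk ID to (offset, size), rejecting out-of-bounds chunks.

    Hypothetical helper: the size of a chunk is taken to be the distance
    to the offset stored in the next entry (the last real chunk is
    followed by a terminating entry).
    """
    entries = [
        TOC_ENTRY.unpack_from(data, toc_offset + i * TOC_ENTRY.size)
        for i in range(num_chunks + 1)  # +1 for the terminating entry
    ]
    chunks = {}
    for (chunk_id, start), (_, end) in zip(entries, entries[1:]):
        # Returning the size along with the offset lets the caller know
        # how many bytes it may look at; refuse anything past the file end.
        if not start <= end <= len(data):
            raise ValueError(f"chunk {chunk_id!r} is out of bounds")
        chunks[chunk_id] = (start, end - start)
    return chunks
```

A caller that receives `(offset, size)` can compare `size` against the length it expects before reading. A bare pointer gives it no way to make that check, which is the gap the commit describes.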

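
The three cases listed in 5305474ec4 (ref-cache.c: fix prefix matching in ref iteration) can be sketched as follows. This is a Python approximation, not the C code in ref-cache.c: the `Overlap` enum mirrors the PREFIX_* names from the commit message, while `should_iterate` and the function signatures are hypothetical names chosen for this sketch.

```python
from enum import Enum


class Overlap(Enum):
    CONTAINS_DIR = "prefix contains entry"   # entry name starts with the prefix
    WITHIN_DIR = "prefix within entry"       # prefix starts with the entry name
    EXCLUDES_DIR = "no overlap"


def overlaps_prefix(entry_name, prefix):
    """Classify how a ref cache entry name relates to an iteration prefix."""
    for a, b in zip(entry_name, prefix):
        if a != b:
            return Overlap.EXCLUDES_DIR
    if len(prefix) <= len(entry_name):
        # The prefix ran out first (or both are equal).
        return Overlap.CONTAINS_DIR
    # The entry name ran out first: the prefix points somewhere inside it.
    return Overlap.WITHIN_DIR


def should_iterate(entry_name, is_dir, prefix):
    """Apply the three rules from the commit message to one entry."""
    state = overlaps_prefix(entry_name, prefix)
    if state is Overlap.EXCLUDES_DIR:
        return False
    if state is Overlap.WITHIN_DIR:
        # Only a directory (name ending in "/") can hold refs matching
        # the rest of the prefix; a plain ref such as "refs/heads/v1"
        # must be skipped for the prefix "refs/heads/v1.0".
        return is_dir
    return True
```

With these definitions, the entry "refs/heads/v1" is skipped for the prefix "refs/heads/v1.0" unless it is a directory, which is the case the commit fixes.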

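
The grouping steps described in 37dc6d8104 (builtin/repack.c: implement support for `--max-cruft-size`) might be sketched like this. It is an illustration only, not the code in builtin/repack.c: the function name, the dictionary input, and the exact stopping rule (stop before adding a pack that would bring the accumulated size up to the limit) are assumptions, and the last is just one reading of "while the accumulated size is smaller than the maximum specified pack size".

```python
def select_cruft_packs_to_combine(pack_sizes, max_cruft_size):
    """Pick the existing cruft packs to roll up into a new cruft pack.

    pack_sizes maps a (hypothetical) pack name to its size in bytes.
    """
    selected = []
    accumulated = 0
    # Step 1: consider existing cruft packs in ascending order of size.
    for name, size in sorted(pack_sizes.items(), key=lambda item: item[1]):
        # Step 2: keep grouping while the running total stays below the
        # limit (assumed interpretation of the commit message).
        if accumulated + size >= max_cruft_size:
            break
        selected.append(name)
        accumulated += size
    # Step 3 (not shown): the selected packs, plus any newly unreachable
    # objects, would be combined into one new cruft pack.
    return selected
```

Packs left out of the selection are the ones the commit message calls "effectively frozen": they are already at or near the limit and are not rewritten.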
71557 Commits (36c9c44fa4b5c745b24a2e6444de20df9f4a1f5c)
