kernel/git - git - PowerEL Git System

Commit Graph

Author	SHA1	Message	Date
Junio C Hamano	d61ff9c237	Merge branch 'ps/object-file-cleanup' into ps/object-store-cleanup * ps/object-file-cleanup: object-store: merge "object-store-ll.h" and "object-store.h" object-store: remove global array of cached objects object: split out functions relating to object store subsystem object-file: drop `index_blob_stream()` object-file: split up concerns of `HASH_*` flags object-file: split out functions relating to object store subsystem object-file: move `xmmap()` into "wrapper.c" object-file: move `git_open_cloexec()` to "compat/open.c" object-file: move `safe_create_leading_directories()` into "path.c" object-file: move `mkdir_in_gitdir()` into "path.c"	2025-04-24 11:37:21 -07:00
Junio C Hamano	29860f3282	Merge branch 'ja/doc-reset-mv-rm-markup-updates' Doc mark-up updates. * ja/doc-reset-mv-rm-markup-updates: doc: add markup for characters in Guidelines doc: fix asciidoctor synopsis processing of triple-dots doc: convert git-mv to new documentation format doc: move synopsis git-mv commands in the synopsis section doc: convert git-rm to new documentation format doc: fix synopsis analysis logic doc: convert git-reset to new documentation format	2025-04-23 13:58:51 -07:00
Josh Heinrichs	eb2d7beb0e	maintenance: fix launchctl calendar intervals When using the launchctl scheduler, the weekly job runs daily, and the daily job runs on the first six days of each month. This appears to be due to specifying "Day" in the calendar intervals, which according to launchd.plist(5) is for specifying days of the month rather than days of the week. The behaviour of running a job on the 0th day is undocumented, but in my testing appears to be the same as not specifying "Day" in the calendar interval, in which case the job will run daily. Use "Weekday" in the calendar intervals, which is the correct way to schedule jobs to run on specific days of the week. Signed-off-by: Josh Heinrichs <joshiheinrichs@gmail.com> Acked-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-23 12:58:52 -07:00
Ahelenia Ziemiańska	86eef3541e	global: mark usage strings and string tables const Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-21 21:01:19 -07:00
Usman Akinyemi	b502a648ef	builtin/difftool: remove unnecessary if statement Since we already teach the `repo_config()` in "f29f1990b5 (config: teach repo_config to allow `repo` to be NULL, 2025-03-08)" to allow `repo` to be NULL, no need to check if `repo` is NULL before calling `repo_config()`. Suggested-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-20 14:17:22 -07:00
Usman Akinyemi	2e4e439ec2	builtin/add: remove unnecessary if statement Since we already teach the `repo_config()` in "f29f1990b5 (config: teach repo_config to allow `repo` to be NULL, 2025-03-08)" to allow `repo` to be NULL, no need to check if `repo` is NULL before calling `repo_config()`. Suggested-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-20 14:17:20 -07:00
Junio C Hamano	72801dfde1	Merge branch 'ua/update-update-server-info' Code simplification. * ua/update-update-server-info: builtin/update-server-info: remove unnecessary if statement	2025-04-17 10:28:19 -07:00
Junio C Hamano	c3ebf18eb2	Merge branch 'en/merge-recursive-debug' Remove remnants of the recursive merge strategy backend, which was superseded by the ort merge strategy. * en/merge-recursive-debug: builtin/{merge,rebase,revert}: remove GIT_TEST_MERGE_ALGORITHM tests: remove GIT_TEST_MERGE_ALGORITHM and test_expect_merge_algorithm merge-recursive.[ch]: thoroughly debug these merge, sequencer: switch recursive merges over to ort sequencer: switch non-recursive merges over to ort merge-ort: enable diff-algorithms other than histogram builtin/merge-recursive: switch to using merge_ort_generic() checkout: replace merge_trees() with merge_ort_nonrecursive()	2025-04-17 10:28:18 -07:00
Junio C Hamano	fe7ae3b87e	Merge branch 'kn/blame-porcelain-unblamable' "git blame --porcelain" mode now talks about unblamable lines and lines that are blamed to an ignored commit. * kn/blame-porcelain-unblamable: blame: print unblamable and ignored commits in porcelain mode	2025-04-17 10:28:18 -07:00
Junio C Hamano	b45113f581	Merge branch 'jk/fetch-follow-remote-head-fix' "git fetch [<remote>]" with only the configured fetch refspec should be the only thing to update refs/remotes/<remote>/HEAD, but the code was overly eager to do so in other cases. * jk/fetch-follow-remote-head-fix: fetch: make set_head() call easier to read fetch: don't ask for remote HEAD if followRemoteHEAD is "never" fetch: only respect followRemoteHEAD with configured refspecs	2025-04-17 10:28:17 -07:00
Patrick Steinhardt	791aeddfa2	parse-options: detect mismatches in integer signedness It was reported that "t5620-backfill.sh" fails on s390x and sparc64 in a test that exercises the "--min-batch-size" command line option. The symptom was that the option didn't seem to have an effect: we didn't fetch objects with a batch size of 20, but instead fetched all objects at once. As it turns out, the root cause is that `--min-batch-size` uses `OPT_INTEGER()` to parse the command line option. While this macro expects the caller to pass a pointer to an integer, we instead pass a pointer to a `size_t`. This coincidentally works on most platforms, but it breaks apart on the mentioned platforms because they are big endian. This issue isn't specific to git-backfill(1): there are a couple of other places where we have the same type confusion going on. This indicates that the issue really is the interface that the parse-options subsystem provides -- it is simply too easy to get this wrong as there isn't any kind of compiler warning, and things just work on the most common systems. Address the systemic issue by introducing two new build asserts `BARF_UNLESS_SIGNED()` and `BARF_UNLESS_UNSIGNED()`. As the names already hint at, those macros will cause a compiler error when passed a value that is not signed or unsigned, respectively. Adapt `OPT_INTEGER()`, `OPT_UNSIGNED()` as well as `OPT_MAGNITUDE()` to use those asserts. This uncovers a small set of sites where we indeed have the same bug as in git-backfill(1). Adapt all of them to use the correct option. Reported-by: Todd Zullinger <tmz@pobox.com> Reported-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Helped-by: SZEDER Gábor <szeder.dev@gmail.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-17 08:15:16 -07:00
Patrick Steinhardt	09705696f7	parse-options: introduce precision handling for `OPTION_INTEGER` The `OPTION_INTEGER` option type accepts a signed integer. The type of the underlying integer is a simple `int`, which restricts the range of values accepted by such options. But there is a catch: because the caller provides a pointer to the value via the `.value` field, which is a simple void pointer. This has two consequences: - There is no check whether the passed value is sufficiently long to store the entire range of `int`. This can lead to integer wraparound in the best case and out-of-bounds writes in the worst case. - Even when a caller knows that they want to store a value larger than `INT_MAX` they don't have a way to do so. In practice this doesn't tend to be a huge issue because users typically don't end up passing huge values to most commands. But the parsing logic is demonstrably broken, and it is too easy to get the calling convention wrong. Improve the situation by introducing a new `precision` field into the structure. This field gets assigned automatically by `OPT_INTEGER_F()` and tracks the size of the passed value. Like this it becomes possible for the caller to pass arbitrarily-sized integers and the underlying logic knows to handle it correctly by doing range checks. Furthermore, convert the code to use `strtoimax()` intstead of `strtol()` so that we can also parse values larger than `LONG_MAX`. Note that we do not yet assert signedness of the passed variable, which is another source of bugs. This will be handled in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-17 08:15:15 -07:00
Patrick Steinhardt	785c17df78	parse-options: rename `OPT_MAGNITUDE()` to `OPT_UNSIGNED()` With the preceding commit, `OPT_INTEGER()` has learned to support unit factors. Consequently, the major differencen between `OPT_INTEGER()` and `OPT_MAGNITUDE()` isn't the support of unit factors anymore, as both of them do support them now. Instead, the difference is that one handles signed and the other handles unsigned integers. Adapt the name of `OPT_MAGNITUDE()` accordingly by renaming it to `OPT_UNSIGNED()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-17 08:15:15 -07:00
Patrick Steinhardt	d012ceb5f3	global: use designated initializers for options While we expose macros for most of our different option types understood by the "parse-options" subsystem, not every combination of fields that has one as that would otherwise quickly lead to an explosion of macros. Instead, we just initialize structures manually for those variants of fields that don't have a macro. Callsites that open-code these structure initialization don't use designated initializers though and instead just provide values for each of the fields that they want to initialize. This has three significant downsides: - Callsites need to specify all values up to the last field that they care about. This often includes fields that should simply be left at their default zero-initialized state, which adds distraction. - Any reader not deeply familiar with the layout of the structure has a hard time figuring out what the respective initializers mean. - Reordering or introducing new fields in the middle of the structure is impossible without adapting all callsites. Convert all sites to instead use designated initializers, which we have started using in our codebase quite a while ago. This allows us to skip any default-initialized fields, gives the reader context by specifying the field names and allows us to reorder or introduce new fields where we want to. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-17 08:15:15 -07:00
Ramsay Jones	c9a51775a3	builtin/gc.c: correct RAM calculation when using sysinfo The man page for sysinfo(2) on Linux states that (from v2.3.48) the sizes of the memory and swap fields, of the returned structure, are given as multiples of 'mem_unit' bytes. In earlier versions (prior to v2.3.23 on i386 in particular), the 'mem_unit' field was not part of the structure, and all sizes were measured in bytes. The man page does not discuss the motivation for this change, but it is possible that the change was intended for the, relatively rare, 32-bit platform with more than 4GB of memory. The total_ram() function makes the assumption that the 'totalram' field of the 'struct sysinfo' is measured in bytes, or alternatively that the 'mem_unit' field is always equal to one. Having writen a program to call the sysinfo() function and print the structure fields, it seems that, on Linux x84_64 and i686 anyway, the 'mem_unit' field is indeed set to one (note that the 32-bit system had only 2GB ram). However, cygwin also has an sysinfo() implementation, which gives the following values: $ ./sysinfo uptime: 21381 loads: 0, 0, 0 total ram: 2074637 free ram: 843237 shared ram: 0 buffer ram: 0 total swap: 327680 free swap: 306932 procs: 15 total high: 0 free high: 0 mem_unit: 4096 total ram: 8497713152 $ [This laptop has 8GB ram, so a little bit seems to be missing. ;) ] Modify the total_ram() function to allow for the possibility that the memory size is not specified in bytes (ie 'mem_unit' is greater than one). Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-16 20:43:45 -07:00
Junio C Hamano	a271b05066	Merge branch 'ps/cat-file-filter-batch' "git cat-file --batch" and friends learned to allow "--filter=" to omit certain objects, just like the transport layer does. * ps/cat-file-filter-batch: builtin/cat-file: use bitmaps to efficiently filter by object type builtin/cat-file: deduplicate logic to iterate over all objects pack-bitmap: introduce function to check whether a pack is bitmapped pack-bitmap: add function to iterate over filtered bitmapped objects pack-bitmap: allow passing payloads to `show_reachable_fn()` builtin/cat-file: support "object:type=" objects filter builtin/cat-file: support "blob:limit=" objects filter builtin/cat-file: support "blob:none" objects filter builtin/cat-file: wire up an option to filter objects builtin/cat-file: introduce function to report object status builtin/cat-file: rename variable that tracks usage	2025-04-16 13:54:21 -07:00
Junio C Hamano	47478802da	Merge branch 'kn/non-transactional-batch-updates' Updating multiple references have only been possible in all-or-none fashion with transactions, but it can be more efficient to batch multiple updates even when some of them are allowed to fail in a best-effort manner. A new "best effort batches of updates" mode has been introduced. * kn/non-transactional-batch-updates: update-ref: add --batch-updates flag for stdin mode refs: support rejection in batch updates during F/D checks refs: implement batch reference update support refs: introduce enum-based transaction error types refs/reftable: extract code from the transaction preparation refs/files: remove duplicate duplicates check refs: move duplicate refname update check to generic layer refs/files: remove redundant check in split_symref_update()	2025-04-16 13:54:19 -07:00
Junio C Hamano	01a6e244f9	Merge branch 'ps/maintenance-reflog-expire' "git maintenance" learns a new task to expire reflog entries. * ps/maintenance-reflog-expire: builtin/maintenance: introduce "reflog-expire" task builtin/gc: split out function to expire reflog entries builtin/reflog: make functions regarding `reflog_expire_options` public builtin/reflog: stop storing per-reflog expiry dates globally builtin/reflog: stop storing default reflog expiry dates globally reflog: rename `cmd_reflog_expire_cb` to `reflog_expire_options`	2025-04-16 13:54:19 -07:00
Junio C Hamano	1a1661bd41	Merge branch 'jt/rev-list-z' "git rev-list" learns machine-parsable output format that delimits each field with NUL. * jt/rev-list-z: rev-list: support NUL-delimited --missing option rev-list: support NUL-delimited --boundary option rev-list: support delimiting objects with NUL bytes rev-list: refactor early option parsing rev-list: inline `show_object_with_name()` in `show_object()`	2025-04-16 13:54:18 -07:00
Junio C Hamano	743d3a54f2	Merge branch 'ab/rm-sign-compare' Some warnings from "-Wsign-compare" for builtin/rm.c have been squelched. * ab/rm-sign-compare: rm: fix sign comparison warnings	2025-04-16 13:54:17 -07:00
Junio C Hamano	518ed014f6	Merge branch 'jt/ref-transaction-abort-fix' A ref transaction corner case fix. * jt/ref-transaction-abort-fix: builtin/fetch: avoid aborting closed reference transaction	2025-04-16 13:54:17 -07:00
Junio C Hamano	7b03646f85	Merge branch 'js/comma-semicolon-confusion' Code clean-up. * js/comma-semicolon-confusion: detect-compiler: detect clang even if it found CUDA clang: warn when the comma operator is used compat/regex: explicitly mark intentional use of the comma operator wildmatch: avoid using of the comma operator diff-delta: avoid using the comma operator xdiff: avoid using the comma operator unnecessarily clar: avoid using the comma operator unnecessarily kwset: avoid using the comma operator unnecessarily rebase: avoid using the comma operator unnecessarily remote-curl: avoid using the comma operator unnecessarily	2025-04-15 13:50:17 -07:00
Junio C Hamano	a8c207797f	Merge branch 'jt/clone-guess-remote-head-fix' "git clone" still gave the message about the default branch name; this message has been turned into an advice message that can be turned off. * jt/clone-guess-remote-head-fix: advice: allow disabling default branch name advice builtin/clone: suppress unexpected default branch advice remote: allow `guess_remote_head()` to suppress advice	2025-04-15 13:50:16 -07:00
Junio C Hamano	d690c44846	Merge branch 'ds/maintenance-loose-objects-batchsize' The job to coalesce loose objects into packfiles in "git maintenance" now has configurable batch size. * ds/maintenance-loose-objects-batchsize: maintenance: add loose-objects.batchSize config maintenance: force progress/no-quiet to children	2025-04-15 13:50:16 -07:00
Junio C Hamano	03633a288c	Merge branch 'kn/reflog-drop' "git reflog" learns "drop" subcommand, that discards the entire reflog data for a ref. * kn/reflog-drop: reflog: implement subcommand to drop reflogs reflog: improve error for when reflog is not found	2025-04-15 13:50:15 -07:00
Junio C Hamano	ee847e0034	Merge branch 'ps/object-wo-the-repository' The object layer has been updated to take an explicit repository instance as a parameter in more code paths. * ps/object-wo-the-repository: hash: stop depending on `the_repository` in `null_oid()` hash: fix "-Wsign-compare" warnings object-file: split out logic regarding hash algorithms delta-islands: stop depending on `the_repository` object-file-convert: stop depending on `the_repository` pack-bitmap-write: stop depending on `the_repository` pack-revindex: stop depending on `the_repository` pack-check: stop depending on `the_repository` environment: move access to "core.bigFileThreshold" into repo settings pack-write: stop depending on `the_repository` and `the_hash_algo` object: stop depending on `the_repository` csum-file: stop depending on `the_repository`	2025-04-15 13:50:15 -07:00
Patrick Steinhardt	68cd492a3e	object-store: merge "object-store-ll.h" and "object-store.h" The "object-store-ll.h" header has been introduced to keep transitive header dependendcies and compile times at bay. Now that we have created a new "object-store.c" file though we can easily move the last remaining additional bit of "object-store.h", the `odb_path_map`, out of the header. Do so. As the "object-store.h" header is now equivalent to its low-level alternative we drop the latter and inline it into the former. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-15 08:24:37 -07:00
Patrick Steinhardt	70c0f9db4e	object-file: split up concerns of `HASH_*` flags The functions `hash_object_file()`, `write_object_file()` and `index_fd()` reuse the same set of flags to alter their behaviour. This not only adds confusion, but given that every function only supports a subset of the flags it becomes very hard to see which flags can be passed to what function. Last but not least, this entangles the implementation of all three function families. Split up concerns by creating separate flags for each of the function families. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-15 08:24:36 -07:00
Patrick Steinhardt	d9f517d051	object-file: split out functions relating to object store subsystem While we have the "object-store.h" header, most of the functionality for object stores is actually hosted in "object-file.c". This makes it hard to find relevant functions and causes us to mix up concerns. Split out functions relating to the object store subsystem into a new "object-store.c" file. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-15 08:24:36 -07:00
Patrick Steinhardt	1a99fe8010	object-file: move `safe_create_leading_directories()` into "path.c" The `safe_create_leading_directories()` function and its relatives are located in "object-file.c", which is not a good fit as they provide generic functionality not related to objects at all. Move them into "path.c", which already hosts `safe_create_dir()` and its relative `safe_create_dir_in_gitdir()`. "path.c" is free of `the_repository`, but the moved functions depend on `the_repository` to read the "core.sharedRepository" config. Adapt the function signature to accept a repository as argument to fix the issue and adjust callers accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-15 08:24:35 -07:00
Patrick Steinhardt	d1fa670de0	object-file: move `mkdir_in_gitdir()` into "path.c" The `mkdir_in_gitdir()` function is similar to `safe_create_dir()`, but the former is hosted in "object-file.c" whereas the latter is hosted in "path.c". The latter code unit makes way more sense though as the logic has nothing to do with object files in particular. Move the file into "path.c". While at it, we: - Rename the function to `safe_create_dir_in_gitdir()` so that the function names are similar to one another. - Remove the dependency on `the_repository` by making the callers pass the repository instead. Adjust callers accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-15 08:24:34 -07:00
Jean-Noël Avila	1d5378a8c4	doc: convert git-mv to new documentation format - Switch the synopsis to a synopsis block which will automatically format placeholders in italics and keywords in monospace - Use _<placeholder>_ instead of <placeholder> in the description - Use `backticks` for keywords and more complex option descriptions. The new rendering engine will apply synopsis rules to these spans. Unfortunately, there's an inconsistency in the synopsis style, where the ellipsis is used to indicate that the option can be repeated, but it can also be used in Git's three-dot notation to indicate a range of commits. The rendering engine will not be able to distinguish between these two cases. Signed-off-by: Jean-Noël Avila <jn.avila@free.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-14 14:43:52 -07:00
Jean-Noël Avila	8d34d3379f	doc: move synopsis git-mv commands in the synopsis section This also entails changing the help output for the command to match the new synopsis. Signed-off-by: Jean-Noël Avila <jn.avila@free.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-14 14:43:52 -07:00
Jeff King	f9356f9cb4	fetch: make set_head() call easier to read We ignore any error returned from set_head(), but `638060dcb9` (fetch set_head: refactor to use remote directly, 2025-01-26) left its call in a noop "if" conditional as a sort of note-to-self. When `c834d1a7ce` (fetch: only respect followRemoteHEAD with configured refspecs, 2025-03-18) added a "do_set_head" flag, it was rolled into the same conditional, putting set_head() on the right-hand side of a short-circuit AND. That's not wrong, but it really hides the point of the line, which is (maybe) calling the function. Instead, let's have a full if() block for the flag, and then our comment (with some rewording) will be sufficient to clarify the error handling. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-09 09:03:47 -07:00
Usman Akinyemi	9ec327d922	builtin/update-server-info: remove unnecessary if statement Since we already teach the `repo_config()` in `f29f1990` (config: teach repo_config to allow `repo` to be NULL, 2025-03-08) to allow `repo` to be NULL, no need to check if `repo` is NULL before calling `repo_config()`. Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-08 14:47:37 -07:00
Junio C Hamano	0dfca98881	Merge branch 'ps/object-wo-the-repository' into ps/object-file-cleanup * ps/object-wo-the-repository: hash: stop depending on `the_repository` in `null_oid()` hash: fix "-Wsign-compare" warnings object-file: split out logic regarding hash algorithms delta-islands: stop depending on `the_repository` object-file-convert: stop depending on `the_repository` pack-bitmap-write: stop depending on `the_repository` pack-revindex: stop depending on `the_repository` pack-check: stop depending on `the_repository` environment: move access to "core.bigFileThreshold" into repo settings pack-write: stop depending on `the_repository` and `the_hash_algo` object: stop depending on `the_repository` csum-file: stop depending on `the_repository`	2025-04-08 14:28:17 -07:00
Elijah Newren	170e30d695	builtin/{merge,rebase,revert}: remove GIT_TEST_MERGE_ALGORITHM This environment variable existed to allow the testsuite to reuse all the merge-related tests in the testsuite while easily flipping between the 'recursive' and the 'ort' backends. Now that we have removed merge-recursive and remapped 'recursive' to mean 'ort', we don't need this scaffolding anymore. Remove it from these three builtins. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-08 13:59:14 -07:00
Elijah Newren	75cd9ae05f	merge, sequencer: switch recursive merges over to ort More precisely, replace calls to merge_recursive() with merge_ort_recursive(). Also change t7615 to quit calling out recursive; it is not needed anymore, and we are in fact using ort now. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-08 13:59:12 -07:00
Elijah Newren	77c029493a	builtin/merge-recursive: switch to using merge_ort_generic() Switch from merge-recursive to merge-ort. Adjust the following testcases due to the switch: * t6430: most of the test differences here were due to improved D/F conflict handling explained in more detail in `ef52778708` (merge tests: expect improved directory/file conflict handling in ort, 2020-10-26). These changes weren't made to this test back in that commit simply because I had been looking at `git merge` rather than `git merge-recursive`. The final test in this testsuite, though, was expunged because it was looking for specific output, and the calls to output_commit_title() were discarded from merge_ort_internal() in its adaptation from merge_recursive_internal(); see `8119214f4e` (merge-ort: implement merge_incore_recursive(), 2020-12-16). * t6434: This test is built entirely around rename/delete conflicts, which had a suboptimal handling under merge-recursive. As explained in more detail in commits `1f3c9ba707` ("t6425: be more flexible with rename/delete conflict messages", 2020-08-10) and `727c75b23f` ("t6404, t6423: expect improved rename/delete handling in ort backend", 2020-10-26), rename/delete conflicts should each have two entries in the index rather than just one. Adjust the expectations for all the tests in this testcase to see the two entries per rename/delete conflict. * t6424: merge-recursive had a special check-if-toplevel-trees-match check that it ran at the beginning on both the merge-base and the other side being merged in. In such a case, it exited early and printed an "Already up to date." message. merge-ort got rid of this, and instead checks the merge base tree matching the other side throughout the tree instead of just at the toplevel, allowing it to avoid recursing into various subtrees. As part of that, it got rid of the specialty toplevel message. That message hasn't been missed for years from `git merge`, so I don't think it is necessary to keep it just for `git merge-recursive`, especially since the latter is rarely used. (git itself only references it in the testsuite, whereas it used to power one of the three rebase backends that existed once upon a time.) Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-08 13:59:11 -07:00
Elijah Newren	b5dff2bd61	checkout: replace merge_trees() with merge_ort_nonrecursive() Replace the use of merge_trees() from merge-recursive.[ch] with the merge-ort equivalent. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-08 13:59:11 -07:00
Junio C Hamano	23ee5065c2	Merge branch 'tb/incremental-midx-part-2' Incrementally updating multi-pack index files. * tb/incremental-midx-part-2: midx: implement writing incremental MIDX bitmaps pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators pack-bitmap.c: keep track of each layer's type bitmaps ewah: implement `struct ewah_or_iterator` pack-bitmap.c: apply pseudo-merge commits with incremental MIDXs pack-bitmap.c: compute disk-usage with incremental MIDXs pack-bitmap.c: teach `rev-list --test-bitmap` about incremental MIDXs pack-bitmap.c: support bitmap pack-reuse with incremental MIDXs pack-bitmap.c: teach `show_objects_for_type()` about incremental MIDXs pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs pack-bitmap.c: open and store incremental bitmap layers pack-revindex: prepare for incremental MIDX bitmaps Documentation: describe incremental MIDX bitmaps Documentation: remove a "future work" item from the MIDX docs	2025-04-08 11:43:14 -07:00
Junio C Hamano	c6b3824a19	Merge branch 'tb/refspec-fetch-cleanup' Code clean-up. * tb/refspec-fetch-cleanup: refspec: replace `refspec_item_init()` with fetch/push variants refspec: remove refspec_item_init_or_die() refspec: replace `refspec_init()` with fetch/push variants refspec: treat 'fetch' as a Boolean value	2025-04-08 11:43:13 -07:00
Karthik Nayak	221e8fcb7f	update-ref: add --batch-updates flag for stdin mode When updating multiple references through stdin, Git's update-ref command normally aborts the entire transaction if any single update fails. This atomic behavior prevents partial updates. Introduce a new batch update system, where the updates the performed together similar but individual updates are allowed to fail. Add a new `--batch-updates` flag that allows the transaction to continue even when individual reference updates fail. This flag can only be used in `--stdin` mode and builds upon the batch update support added to the refs subsystem in the previous commits. When enabled, failed updates are reported in the following format: rejected SP (<old-oid> \| <old-target>) SP (<new-oid> \| <new-target>) SP <rejection-reason> LF Update the documentation to reflect this change and also tests to cover different scenarios where an update could be rejected. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-08 07:59:49 -07:00
Karthik Nayak	76e760b999	refs: introduce enum-based transaction error types Replace preprocessor-defined transaction errors with a strongly-typed enum `ref_transaction_error`. This change: - Improves type safety and function signature clarity. - Makes error handling more explicit and discoverable. - Maintains existing error cases, while adding new error cases for common scenarios. This refactoring paves the way for more comprehensive error handling which we will utilize in the upcoming commits to add batch reference update support. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-08 07:57:20 -07:00
Patrick Steinhardt	8e0a1ec076	builtin/maintenance: introduce "reflog-expire" task By default, git-maintenance(1) uses the "gc" task to ensure that the repository is well-maintained. This can be changed, for example by either explicitly configuring which tasks should be enabled or by using the "incremental" maintenance strategy. If so, git-maintenance(1) does not know to expire reflog entries, which is a subtask that git-gc(1) knows to perform for the user. Consequently, the reflog will grow indefinitely unless the user manually trims it. Introduce a new "reflog-expire" task that plugs this gap: - When running the task directly, then we simply execute `git reflog expire --all`, which is the same as git-gc(1). - When running git-maintenance(1) with the `--auto` flag, then we only run the task in case the "HEAD" reflog has at least N reflog entries that would be discarded. By default, N is set to 100, but this can be configured via "maintenance.reflog-expire.auto". When a negative integer has been provided we always expire entries, zero causes us to never expire entries, and a positive value specifies how many entries need to exist before we consider pruning the entries. Note that the condition for the `--auto` flags is merely a heuristic and optimized for being fast. This is because `git maintenance run --auto` will be executed quite regularly, so scanning through all reflogs would likely be too expensive in many repositories. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-08 07:53:27 -07:00
Patrick Steinhardt	3fef24ac3f	builtin/gc: split out function to expire reflog entries We're about to introduce a new task for git-maintenance(1) that knows to expire reflog entries. The logic will be shared with git-gc(1), which already knows how to do this. Pull out the common logic into a separate function so that we can share the implementation between both builtins. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-08 07:53:27 -07:00
Patrick Steinhardt	d20fc193b6	builtin/reflog: make functions regarding `reflog_expire_options` public Make functions that are required to manage `reflog_expire_options` available elsewhere by moving them into "reflog.c" and exposing them in the corresponding header. The functions will be used in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-08 07:53:27 -07:00
Patrick Steinhardt	964f364de9	builtin/reflog: stop storing per-reflog expiry dates globally As described in the preceding commit, the per-reflog expiry dates are stored in a global pair of variables. Refactor the code so that they are contained in `struct reflog_expire_options` to make the structure useful in other contexts. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-08 07:53:26 -07:00
Patrick Steinhardt	8565827570	builtin/reflog: stop storing default reflog expiry dates globally When expiring reflog entries, it is possible to configure expiry dates that depend on the name of the reflog. This requires us to store a couple of different expiry dates: - The default expiry date for reflog entries that aren't otherwise specified. - The per-reflog expiry date. - The currently active set of expiry dates for a given reference. While the last item is stored in `struct reflog_expire_options`, the other items aren't, which makes it hard to reuse the structure in other places. Refactor the code so that the default expiry date is stored as part of the structure. The per-reflog expiry dates will be adapted accordingly in the subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-08 07:53:26 -07:00
Patrick Steinhardt	2ed8008399	reflog: rename `cmd_reflog_expire_cb` to `reflog_expire_options` We're about to expose `struct cmd_reflog_expire_cb` via "reflog.h" so that we can also use this structure in "builtin/gc.c". Once we make it accessible to a wider scope though it becomes awkwardly named, as it isn't only useful in the context of a callback. Instead, the function is containing all kinds of options relevant to whether or not a reflog entry should be expired. Rename the structure to `reflog_expire_options` to prepare for this. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-08 07:53:25 -07:00
Karthik Nayak	4d253071dd	blame: print unblamable and ignored commits in porcelain mode The 'git-blame(1)' command allows users to ignore specific revisions via the '--ignore-rev <rev>' and '--ignore-revs-file <file>' flags. These flags are often combined with the 'blame.markIgnoredLines' and 'blame.markUnblamableLines' config options. These config options prefix ignored and unblamable lines with a '?' and '*', respectively. However, this option was never extended to the porcelain mode of 'git-blame(1)'. Since the documentation does not indicate this exclusion, it is a bug. Fix this by printing 'ignored' and 'unblamable' respectively for the options when using the porcelain modes. Helped-by: Patrick Steinhardt <ps@pks.im> Helped-by: Toon Claes <toon@iotcl.com> Helped-by: Phillip Wood <phillip.wood123@gmail.com> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-07 14:50:18 -07:00
Patrick Steinhardt	8002e8ee18	builtin/cat-file: use bitmaps to efficiently filter by object type While it is now possible to filter objects by type, this mechanism is for now mostly a convenience. Most importantly, we still have to iterate through the whole packfile to find all objects of a specific type. This can be prohibitively expensive depending on the size of the packfiles. It isn't really possible to do better than this when only considering a packfile itself, as the order of objects is not fixed. But when we have a packfile with a corresponding bitmap, either because the packfile itself has one or because the multi-pack index has a bitmap for it, then we can use these bitmaps to improve the runtime. While bitmaps are typically used to compute reachability of objects, they also contain one bitmap per object type that encodes which object has what type. So instead of reading through the whole packfile(s), we can use the bitmaps and iterate through the type-specific bitmap. Typically, only a subset of packfiles will have a bitmap. But this isn't really much of a problem: we can use bitmaps when available, and then use the non-bitmap walk for every packfile that isn't covered by one. Overall, this leads to quite a significant speedup depending on how many objects of a certain type exist. The following benchmarks have been executed in the Chromium repository, which has a 50GB packfile with almost 25 million objects. As expected, there isn't really much of a change in performance without an object filter: Benchmark 1: cat-file with no-filter (revision = HEAD~) Time (mean ± σ): 89.675 s ± 4.527 s [User: 40.807 s, System: 10.782 s] Range (min … max): 83.052 s … 96.084 s 10 runs Benchmark 2: cat-file with no-filter (revision = HEAD) Time (mean ± σ): 88.991 s ± 2.488 s [User: 42.278 s, System: 10.305 s] Range (min … max): 82.843 s … 91.271 s 10 runs Summary cat-file with no-filter (revision = HEAD) ran 1.01 ± 0.06 times faster than cat-file with no-filter (revision = HEAD~) We still have to scan through all objects as we yield all of them, so using the bitmap in this case doesn't really buy us anything. What is noticeable in this benchmark is that we're I/O-bound, not CPU-bound, as can be seen from the user/system runtimes, which combined are way lower than the overall benchmarked runtime. But when we do use a filter we can see a significant improvement: Benchmark 1: cat-file with filter=object:type=commit (revision = HEAD~) Time (mean ± σ): 86.444 s ± 4.081 s [User: 36.830 s, System: 11.312 s] Range (min … max): 80.305 s … 93.104 s 10 runs Benchmark 2: cat-file with filter=object:type=commit (revision = HEAD) Time (mean ± σ): 2.089 s ± 0.015 s [User: 1.872 s, System: 0.207 s] Range (min … max): 2.073 s … 2.119 s 10 runs Summary cat-file with filter=object:type=commit (revision = HEAD) ran 41.38 ± 1.98 times faster than cat-file with filter=object:type=commit (revision = HEAD~) This is because we don't have to scan through all packfiles anymore, but can instead directly look up relevant objects. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-07 14:43:52 -07:00
Patrick Steinhardt	d5ec7027bc	builtin/cat-file: deduplicate logic to iterate over all objects Pull out a common function that allows us to iterate over all objects in a repository. Right now the logic is trivial and would only require two function calls, making this refactoring a bit pointless. But in the next commit we will iterate on this logic to make use of bitmaps, so this is about to become a bit more complex. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-07 14:43:52 -07:00
Patrick Steinhardt	3d45483846	pack-bitmap: allow passing payloads to `show_reachable_fn()` The `show_reachable_fn` callback is used by a couple of functions to present reachable objects to the caller. The function does not provide a way for the caller to pass a payload though, which is functionality that we'll require in a subsequent commit. Change the callback type to accept a payload and adapt all callsites accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-07 14:43:51 -07:00
Patrick Steinhardt	8fa9fe171a	builtin/cat-file: support "object:type=" objects filter Implement support for the "object:type=" filter in git-cat-file(1), which causes us to omit all objects that don't match the provided object type. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-07 14:43:51 -07:00
Patrick Steinhardt	dbe1b32d59	builtin/cat-file: support "blob:limit=" objects filter Implement support for the "blob:limit=" filter in git-cat-file(1), which causes us to omit all blobs that are bigger than a certain size. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-07 14:43:50 -07:00
Patrick Steinhardt	3794e9bf98	builtin/cat-file: support "blob:none" objects filter Implement support for the "blob:none" filter in git-cat-file(1), which causes us to omit all blobs. Note that this new filter requires us to read the object type via `oid_object_info_extended()` in `batch_object_write()`. But as we try to optimize away reading objects from the database the `data->info.typep` pointer may not be set. We thus have to adapt the logic to conditionally set the pointer in cases where the filter is given. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-07 14:43:50 -07:00
Patrick Steinhardt	eb83e4c64b	builtin/cat-file: wire up an option to filter objects In batch mode, git-cat-file(1) enumerates all objects and prints them by iterating through both loose and packed objects. This works without considering their reachability at all, and consequently most options to filter objects as they exist in e.g. git-rev-list(1) are not applicable. In some situations it may still be useful though to filter objects based on properties that are inherent to them. This includes the object size as well as its type. Such a filter already exists in git-rev-list(1) with the `--filter=` command line option. While this option supports a couple of filters that are not applicable to our usecase, some of them are quite a neat fit. Wire up the filter as an option for git-cat-file(1). This allows us to reuse the same syntax as in git-rev-list(1) so that we don't have to reinvent the wheel. For now, we die when any of the filter options has been passed by the user, but they will be wired up in subsequent commits. Further note that the filters that we are about to introduce don't significantly speed up the runtime of git-cat-file(1). While we can skip emitting a lot of objects in case they are uninteresting to us, the majority of time is spent reading the packfile, which is bottlenecked by I/O and not the processor. This will change though once we start to make use of bitmaps, which will allow us to skip reading the whole packfile. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-07 14:43:50 -07:00
Patrick Steinhardt	1914ae0d70	builtin/cat-file: introduce function to report object status We have multiple callsites that report the status of an object, for example when the objec tis missing or its name is ambiguous. We're about to add a couple more such callsites to report on "excluded" objects. Prepare for this by introducing a new function `report_object_status()` that encapsulates the functionality. Note that this function also flushes stdout, which is a requirement so that request-response style batched modes can learn about the status before proceeding to the next object. We already flush correctly at all existing callsites, even though the flush in `batch_one_object()` only comes after the switch statement. That flush is now redundant, and we could in theory deduplicate it by moving it into all branches that don't use `report_object_status()`. But that doesn't quite feel sensible: - The duplicate flush should ultimately just be a no-op for us and thus shouldn't impact performance significantly. - By keeping the flush in `report_object_status()` we ensure that all future callers get semantics correct. So let's just be pragmatic and live with the duplicated flush. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-07 14:43:49 -07:00
Patrick Steinhardt	84a1d0039a	builtin/cat-file: rename variable that tracks usage The usage strings for git-cat-file(1) that we pass to `parse_options()` and `usage_msg_optf()` are stored in a variable called `usage`. This variable shadows the declaration of `usage()`, which we'll want to use in a subsequent commit. Rename the variable to `builtin_catfile_usage`, which is in line with how the variable is typically called in other builtins. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-07 14:43:49 -07:00
Junio C Hamano	8a753b9a44	Merge branch 'jh/hash-init-fixes' An earlier code refactoring of the hash machinery missed a few required calls to init_fn. * jh/hash-init-fixes: index-pack, unpack-objects: restore missing ->init_fn	2025-04-07 14:23:18 -07:00
Junio C Hamano	58a8c38226	Merge branch 'tb/combine-cruft-below-size' "git repack" learned "--combine-cruft-below-size" option that controls how cruft-packs are combined. * tb/combine-cruft-below-size: repack: begin combining cruft packs with `--combine-cruft-below-size` repack: avoid combining cruft packs with `--max-cruft-size` t/t7704-repack-cruft.sh: consolidate `write_blob()` t/t7704-repack-cruft.sh: clarify wording in --max-cruft-size tests t/t5329-pack-objects-cruft.sh: evict 'repack'-related tests	2025-04-07 14:23:18 -07:00
Junio C Hamano	477cc3b6c7	Merge branch 'jc/name-rev-stdin' Using "git name-rev --stdin" as an example, improve the framework to prepare tests to pretend to be in the future where the breaking changes have already happened. * jc/name-rev-stdin: name-rev: remove "--stdin" support t6120: further modernize t6120: avoid hiding "git" exit status t: introduce WITH_BREAKING_CHANGES prerequisite t: extend test_lazy_prereq t: document test_lazy_prereq	2025-04-07 14:23:17 -07:00
Arnav Bhate	d2fc29380a	rm: fix sign comparison warnings There are multiple places in loops, where a signed and an unsigned data type are compared. Git uses a mix of signed and unsigned types to store lengths of arrays. This sometimes leads to using a signed index for an array whose length is stored in an unsigned variable or vice versa. get_ours_cache_pos is a special case where i, though derived from a signed variable is never negative. Move this part to the caller side and make i an unsigned argument of the function. Rename i to pos to make it descriptive, now that it is a function argument. Replace signed data types with unsigned data types and vice versa wherever necessary. Where both signed and unsigned data types have been used, define a new variable in the scope of the for loop for use as the iterator. Remove #define DISABLE_SIGN_COMPARE_WARNINGS. Signed-off-by: Arnav Bhate <bhatearnav@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-29 01:04:40 -07:00
Junio C Hamano	ff926a6d1b	Merge branch 'en/random-cleanups' Miscellaneous code clean-ups. * en/random-cleanups: merge-ort: remove extraneous word in comment merge-ort: fix accidental strset<->strintmap t7615: be more explicit about diff algorithm used t6423: fix a comment that accidentally reversed two commits stash: remove merge-recursive.h include	2025-03-29 16:39:10 +09:00
Junio C Hamano	27fe152e88	Merge branch 'tb/multi-cruft-pack-refresh-fix' Certain "cruft" objects would have never been refreshed when there are multiple cruft packs in the repository, which has been corrected. * tb/multi-cruft-pack-refresh-fix: builtin/pack-objects.c: freshen objects from existing cruft packs	2025-03-29 16:39:09 +09:00
Junio C Hamano	650b2e2fdb	Merge branch 'jk/fetch-ref-prefix-cleanup' In protocol v2 where the refs advertisement is constrained, we try to tell the server side not to limit the advertisement when there is no specific need to, which has been the source of confusion and recent bugs. Revamp the logic to simplify. * jk/fetch-ref-prefix-cleanup: fetch: use ref prefix list to skip ls-refs fetch: avoid ls-refs only to ask for HEAD symref update fetch: stop protecting additions to ref-prefix list fetch: ask server to advertise HEAD for config-less fetch refspec_ref_prefixes(): clean up refspec_item logic t5516: beef up exact-oid ref prefixes test t5516: drop NEEDSWORK about v2 reachability behavior t5516: prefer "oid" to "sha1" in some test titles t5702: fix typo in test name	2025-03-29 16:39:08 +09:00
Junio C Hamano	eb7923be1f	Merge branch 'en/merge-ort-prepare-to-remove-recursive' First step of deprecating and removing merge-recursive. * en/merge-ort-prepare-to-remove-recursive: am: switch from merge_recursive_generic() to merge_ort_generic() merge-ort: fix merge.directoryRenames=false t3650: document bug when directory renames are turned off merge-ort: support having merge verbosity be set to 0 merge-ort: allow rename detection to be disabled merge-ort: add new merge_ort_generic() function	2025-03-29 16:39:07 +09:00
Junio C Hamano	8d6413a1be	Merge branch 'ps/refname-avail-check-optim' The code paths to check whether a refname X is available (by seeing if another ref X/Y exists, etc.) have been optimized. * ps/refname-avail-check-optim: refs: reuse iterators when determining refname availability refs/iterator: implement seeking for files iterators refs/iterator: implement seeking for packed-ref iterators refs/iterator: implement seeking for ref-cache iterators refs/iterator: implement seeking for reftable iterators refs/iterator: implement seeking for merged iterators refs/iterator: provide infrastructure to re-seek iterators refs/iterator: separate lifecycle from iteration refs: stop re-verifying common prefixes for availability refs/files: batch refname availability checks for initial transactions refs/files: batch refname availability checks for normal transactions refs/reftable: batch refname availability checks refs: introduce function to batch refname availability checks builtin/update-ref: skip ambiguity checks when parsing object IDs object-name: allow skipping ambiguity checks in `get_oid()` family object-name: introduce `repo_get_oid_with_flags()`	2025-03-29 16:39:07 +09:00
Junio C Hamano	01d17c0530	Merge branch 'cc/signed-fast-export-import' "git fast-export \| git fast-import" learns to deal with commit and tag objects with embedded signatures a bit better. * cc/signed-fast-export-import: fast-export, fast-import: add support for signed-commits fast-export: do not modify memory from get_commit_buffer git-fast-export.adoc: clarify why 'verbatim' may not be a good idea fast-export: rename --signed-tags='warn' to 'warn-verbatim' fast-export: fix missing whitespace after switch git-fast-import.adoc: add missing LF in the BNF	2025-03-29 16:39:07 +09:00
Johannes Schindelin	38c696d66b	rebase: avoid using the comma operator unnecessarily The comma operator is a somewhat obscure C feature that is often used by mistake and can even cause unintentional code flow. Better use a semicolon instead. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-28 17:38:08 -07:00
Junio C Hamano	1a764cdbdc	Merge branch 'ua/some-builtins-wo-the-repository' A handful of built-in command implementations have been rewritten to use the repository instance supplied by git.c:run_builtin(), its caller. * ua/some-builtins-wo-the-repository: builtin/checkout-index: stop using `the_repository` builtin/for-each-ref: stop using `the_repository` builtin/ls-files: stop using `the_repository` builtin/pack-refs: stop using `the_repository` builtin/send-pack: stop using `the_repository` builtin/verify-commit: stop using `the_repository` builtin/verify-tag: stop using `the_repository` config: teach repo_config to allow `repo` to be NULL	2025-03-26 16:26:10 +09:00
Junio C Hamano	de35b7b3ff	Merge branch 'sj/ref-consistency-checks-more' "git fsck" becomes more careful when checking the refs. * sj/ref-consistency-checks-more: builtin/fsck: add `git refs verify` child process packed-backend: check whether the "packed-refs" is sorted packed-backend: add "packed-refs" entry consistency check packed-backend: check whether the refname contains NUL characters packed-backend: add "packed-refs" header consistency check packed-backend: check if header starts with "# pack-refs with: " packed-backend: check whether the "packed-refs" is regular file builtin/refs: get worktrees without reading head information t0602: use subshell to ensure working directory unchanged	2025-03-26 16:26:10 +09:00
Junio C Hamano	f50df872a4	Merge branch 'jt/diff-pairs' A post-processing filter for "diff --raw" output has been introduced. * jt/diff-pairs: builtin/diff-pairs: allow explicit diff queue flush builtin: introduce diff-pairs command diff: add option to skip resolving diff statuses diff: return diff_filepair from diff queue helpers	2025-03-26 16:26:09 +09:00
Justin Tobler	c039a46e99	builtin/clone: suppress unexpected default branch advice In `199f44cb2e` (builtin/clone: allow remote helpers to detect repo, 2024-02-27), clones started partially initializing the refdb before executing the remote helpers by creating a HEAD file and "refs/" directory. This has resulted in some scenarios where git-clone(1) now prints the default branch name advice message where it previously did not. A side-effect of the HEAD file already existing, is that computation of the default branch name is handled later in execution. This matters because prior to `97abaab5f6` (refs: drop `git_default_branch_name()`, 2024-05-17), the default branch value would be computed during its first execution and cached. Subsequent invocations would simply return the cached value. Since the next `git_default_branch_name()` call site, which is invoked through `guess_remote_head()`, is not configured to suppress the advice message, computing the default branch name results in the advice message being printed. Configure `guess_remote_head()` to suppress the advice message, restoring the previous behavior. Signed-off-by: Justin Tobler <jltobler@gmail.com> Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-25 16:09:28 -07:00
Justin Tobler	d5d284df91	remote: allow `guess_remote_head()` to suppress advice The `repo_default_branch_name()` invoked through `guess_remote_head()` is configured to always display the default branch advice message. Adapt `guess_remote_head()` to accept flags and convert the `all` parameter to a flag. Add the `REMOTE_GUESS_HEAD_QUIET` flag to to enable suppression of advice messages. Call sites are updated accordingly. Signed-off-by: Justin Tobler <jltobler@gmail.com> Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-25 16:09:27 -07:00
Derrick Stolee	6540560fd6	maintenance: add loose-objects.batchSize config The 'loose-objects' task of 'git maintenance run' first deletes loose objects that exit within packfiles and then collects loose objects into a packfile. This second step uses an implicit limit of fifty thousand that cannot be modified by users. Add a new config option that allows this limit to be adjusted or ignored entirely. While creating tests for this option, I noticed that actually there was an off-by-one error due to the strict comparison in the limit check. I considered making the limit check turn true on equality, but instead I thought to use INT_MAX as a "no limit" barrier which should mean it's never possible to hit the limit. Thus, a new decrement to the limit is provided if the value is positive. (The restriction to positive values is to avoid underflow if INT_MIN is configured.) Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-23 23:06:01 -07:00
Derrick Stolee	286183da99	maintenance: force progress/no-quiet to children The --no-quiet option for 'git maintenance run' is supposed to indicate that progress should happen even while ignoring the value of isatty(2). However, Git implicitly asks child processes to check isatty(2) since these arguments are not passed through. The pass through of --no-quiet will be useful in a test in the next change. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-23 23:06:01 -07:00
Taylor Blau	27afc272c4	midx: implement writing incremental MIDX bitmaps Now that the pack-bitmap machinery has learned how to read and interact with an incremental MIDX bitmap, teach the pack-bitmap-write.c machinery (and relevant callers from within the MIDX machinery) to write such bitmaps. The details for doing so are mostly straightforward. The main changes are as follows: - find_object_pos() now makes use of an extra MIDX parameter which is used to locate the bit positions of objects which are from previous layers (and thus do not exist in the current layer's pack_order field). (Note also that the pack_order field is moved into struct write_midx_context to further simplify the callers for write_midx_bitmap()). - bitmap_writer_build_type_index() first determines how many objects precede the current bitmap layer and offsets the bits it sets in each respective type-level bitmap by that amount so they can be OR'd together. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-21 04:34:16 -07:00
Justin Tobler	b9fadeead7	builtin/fetch: avoid aborting closed reference transaction As part of the reference transaction commit phase, the transaction is set to a closed state regardless of whether it was successful of not. Attempting to abort a closed transaction via `ref_transaction_abort()` results in a `BUG()`. In `c92abe71df` (builtin/fetch: fix leaking transaction with `--atomic`, 2024-08-22), logic to free a transaction after the commit phase is moved to the centralized exit path. In cases where the transaction commit failed, this results in a closed transaction being aborted and signaling a bug. Free the transaction and set it to NULL when the commit fails. This allows the exit path to correctly handle the error without attempting to abort the transaction. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-21 03:59:46 -07:00
Taylor Blau	484d7adcda	repack: begin combining cruft packs with `--combine-cruft-below-size` The previous commit changed the behavior of repack's '--max-cruft-size' to specify a cruft pack-specific override for '--max-pack-size'. Introduce a new flag, '--combine-cruft-below-size' which is a replacement for the old behavior of '--max-cruft-size'. This new flag does explicitly what it says: it combines together cruft packs which are smaller than a given threshold, and leaves alone ones which are larger. This accomplishes the original intent of '--max-cruft-size', which was to avoid repacking cruft packs larger than the given threshold. The new behavior is slightly different. Instead of building up small packs together until the threshold is met, '--combine-cruft-below-size' packs up all cruft packs smaller than the threshold. This means that we may make a pack much larger than the given threshold (e.g., if you aggregate 5 packs which are each 99 MiB in size with a threshold of 100 MiB). But that's OK: the point isn't to restrict the size of the cruft packs we generate, it's to avoid working with ones that have already grown too large. If repositories still want to limit the size of the generated cruft pack(s), they may use '--max-cruft-size'. There's some minor test fallout as a result of the slight differences in behavior between the old meaning of '--max-cruft-size' and the behavior of '--combine-cruft-below-size'. In the test which is now called "--combine-cruft-below-size combines packs", we need to use the new flag over the old one to exercise that test's intended behavior. The remainder of the changes there are to improve the clarity of the comments. Suggested-by: Elijah Newren <newren@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-21 03:42:07 -07:00
Taylor Blau	0855ed966c	repack: avoid combining cruft packs with `--max-cruft-size` In `37dc6d8104` (builtin/repack.c: implement support for `--max-cruft-size`, 2023-10-02), we exposed new functionality that allowed repositories to specify the behavior of when we should combine multiple cruft packs together. This feature was designed to ensure that we never repacked cruft packs which were larger than the given threshold in order to provide tighter I/O bounds for repositories that have many unreachable objects. In essence, specifying '--max-cruft-size=N' instructed 'repack' to aggregate cruft packs together (in order of ascending size) until the combine size grows past 'N', and then make a new cruft pack whose contents includes the packs we rolled up. But this isn't quite how it works in practice. Suppose for example that we have two cruft packs which are each 100MiB in size. One might expect specifying "--max-cruft-size=200M" would combine these two packs together, and then avoid repacking them until a pruning GC takes place. In reality, 'repack' would try and aggregate these together, but writing a pack that is strictly smaller than 200 MiB (since pack-objects' "--max-pack-size" provides a strict bound for packs containing more than one object). So instead we'll write out a pack that is, say, 199 MiB in size, and then another 1 MiB pack containing the balance. If we later repack the repository without adding any new unreachable objects, we'll repeat the same exercise again, making the same 199 MiB and 1 MiB packs each time. This happens because of a poor choice to bolt the '--max-cruft-size' functionality onto pack-objects' '--max-pack-size', forcing us to generate packs which are always smaller than the provided threshold and thus subject to repacking. The following commit will introduce a new flag that implements something similar to the behavior above. Let's prepare for that by making repack's '--max-cruft-size' flag behave as an cruft pack-specific override for '--max-pack-size'. Do so by temporarily repurposing the 'collapse_small_cruft_packs()' function to instead generate a cruft pack using the same instructions as if we didn't specify any maximum pack size. The calling code looks something like: if (args->max_pack_size && !cruft_expiration) { collapse_small_cruft_packs(in, args->max_pack_size, existing); } else { for_each_string_list_item(item, &existing->non_kept_packs) fprintf(in, "-%s.pack\n", item->string); for_each_string_list_item(item, &existing->cruft_packs) fprintf(in, "-%s.pack\n", item->string); } This patch makes collapse_small_cruft_packs() behave identically to the 'else' arm of the conditional above. This repurposing of 'collapse_small_cruft_packs()' is intentional, since it will set us up nicely to introduce the new behavior in the following commit. Naturally, there is some test fallout in the test which exercises the old meaning of '--max-cruft-size'. Mark that test as failing for now to be dealt with in the following commit. Likewise, add a new test which explicitly tests the behavior of '--max-cruft-size' to place a hard limit on the size of any generated cruft pack(s). Note that this is a breaking change, as it alters the user-visible behavior of '--max-cruft-size'. But I'm OK changing this behavior in this instance, since the behavior wasn't accurate to begin with. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-21 03:42:07 -07:00
Justin Tobler	340e7523c0	rev-list: support NUL-delimited --missing option The `--missing={print,print-info}` option for git-rev-list(1) prints missing objects found while performing the object walk in the form: $ git rev-list --missing=print-info <rev> ?<oid> [SP <token>=<value>]... LF Add support for printing missing objects in a NUL-delimited format when the `-z` option is enabled. $ git rev-list -z --missing=print-info <rev> <oid> NUL missing=yes NUL [<token>=<value> NUL]... In this mode, values containing special characters or spaces are printed as-is without being escaped or quoted. Instead of prefixing the missing OID with '?', a separate `missing=yes` token/value pair is appended. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-21 03:40:03 -07:00
Justin Tobler	1c3c1ab3d2	rev-list: support NUL-delimited --boundary option The `--boundary` option for git-rev-list(1) prints boundary objects found while performing the object walk in the form: $ git rev-list --boundary <rev> -<oid> LF Add support for printing boundary objects in a NUL-delimited format when the `-z` option is enabled. $ git rev-list -z --boundary <rev> <oid> NUL boundary=yes NUL In this mode, instead of prefixing the boundary OID with '-', a separate `boundary=yes` token/value pair is appended. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-21 03:40:02 -07:00
Justin Tobler	c3d59c2e70	rev-list: support delimiting objects with NUL bytes When walking objects, git-rev-list(1) prints each object entry on a separate line. Some options, such as `--objects`, may print additional information about tree and blob object on the same line in the form: $ git rev-list --objects <rev> <tree/blob oid> SP [<path>] LF Note that in this form the SP is appended regardless of whether the tree or blob object has path information available. Paths containing a newline are also truncated at the newline. Introduce the `-z` option for git-rev-list(1) which reformats the output to use NUL-delimiters between objects and associated info in the following form: $ git rev-list -z --objects <rev> <oid> NUL [path=<path> NUL] In this form, the start of each record is signaled by an OID entry that is all hexidecimal and does not contain any '='. Additional path info from `--objects` is appended to the record as a token/value pair `path=<path>` as-is without any truncation. For now, the `--objects` flag is the only options that can be used in combination with `-z`. In a subsequent commit, NUL-delimited support for other options is added. Other options that do not make sense when used in combination with `-z` are rejected. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-21 03:40:02 -07:00
Justin Tobler	c9907a1916	rev-list: refactor early option parsing Before invoking `setup_revisions()`, the `--missing` and `--exclude-promisor-objects` options are parsed early. In a subsequent commit, another option is added that must be parsed early. Refactor the code to parse both options in a single early pass. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-21 03:40:02 -07:00
Justin Tobler	1481e29112	rev-list: inline `show_object_with_name()` in `show_object()` The `show_object_with_name()` function only has a single call site. Inline call to `show_object_with_name()` in `show_object()` so the explicit function can be cleaned up and live closer to where it is used. While at it, factor out the code that prints the OID and newline for both objects with and without a name. In a subsequent commit, `show_object()` is modified to support printing object information in a NUL-delimited format. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-21 03:40:02 -07:00
Taylor Blau	459e54b549	refspec: replace `refspec_item_init()` with fetch/push variants For similar reasons as in the previous refactoring of `refspec_init()` into `refspec_init_fetch()` and `refspec_init_push()`, apply the same refactoring to `refspec_item_init()`. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-21 01:45:16 -07:00
Taylor Blau	ec6829e484	refspec: remove refspec_item_init_or_die() There are two callers of this function, which ensures that a dispatched call to refspec_item_init() does not fail. In the following commit, we're going to add fetch/push-specific variants of refspec_item_init(), which will turn one function into two. To avoid introducing yet another pair of new functions (such as refspec_item_init_push_or_die() and refspec_item_init_fetch_or_die()), let's remove the thin wrapper entirely. This duplicates a single line of code among two callers, but thins the refspec.h API by one function, and prevents introducing two more in the following commit. Note that we still have a trailing Boolean argument in the function `refspec_item_init()`. The following commit will address this. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-21 01:45:16 -07:00
Taylor Blau	3809633d0a	refspec: treat 'fetch' as a Boolean value Since `6d4c057859` (refspec: introduce struct refspec, 2018-05-16), we have macros called REFSPEC_FETCH and REFSPEC_PUSH. This confusingly suggests that we might introduce other modes in the future, which, while possible, is highly unlikely. But these values are treated as a Boolean, and stored in a struct field called 'fetch'. So the following: if (refspec->fetch == REFSPEC_FETCH) { ... } , and if (refspec->fetch) { ... } are equivalent. Let's avoid renaming the Boolean values "true" and "false" here and remove the two REFSPEC_ macros mentioned above. Since this value is truly a Boolean and will only ever take on a value of 0 or 1, we can declare it as a single bit unsigned field. In practice this won't shrink the size of 'struct refspec', but it more clearly indicates the intent. Note that this introduces some awkwardness like: refspec_item_init_or_die(&spec, refspec, 1); , where it's unclear what the final "1" does. This will be addressed in the following commits. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-21 01:45:15 -07:00
Jensen Huang	d39f04b638	index-pack, unpack-objects: restore missing ->init_fn Commit `0578f1e66a` ("global: adapt callers to use generic hash context helpers") accidentally removed `->init_fn`, which is required for OpenSSL 3+ SHA1. This fixes the following error on fetch: fatal: fetch-pack: invalid index-pack output Signed-off-by: Jensen Huang <hmz007@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-18 12:27:33 -07:00
Jeff King	aab0f899d9	fetch: don't ask for remote HEAD if followRemoteHEAD is "never" When we are going to consider updating the refs/remotes/*/HEAD symref, we have to ask the remote side where its HEAD points. But if we know that the feature is disabled by config, we don't need to bother! This saves a little bit of work and network communication for the server. And even a little bit of effort on the client, as our local set_head() function did a bit of work matching the remote HEAD before realizing that we're not going to do anything with it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-18 12:21:26 -07:00
Jeff King	c834d1a7ce	fetch: only respect followRemoteHEAD with configured refspecs The new followRemoteHEAD feature is triggered for almost every fetch, causing us to ask the server about the remote "HEAD" and to consider updating our local tracking HEAD symref. This patch limits the feature only to the case when we are fetching a remote using its configured refspecs (typically into its refs/remotes/ hierarchy). There are two reasons for this. One is efficiency. E.g., the fixes in `6c915c3f85` (fetch: do not ask for HEAD unnecessarily, 2024-12-06) and `20010b8c20` (fetch: avoid ls-refs only to ask for HEAD symref update, 2025-03-08) were aimed at reducing the work we do when we would not be able to update HEAD anyway. But they do not quite cover all cases. The remaining one is: git fetch origin refs/heads/foo:refs/remotes/origin/foo which _sometimes_ can update HEAD, but usually not. And that leads us to the second point, which is being simple and explainable. The code for updating the tracking HEAD symref requires both that we learned which ref the remote HEAD points at, and that the server advertised that ref to us. But because the v2 protocol narrows the server's advertisement, the command above would not typically update HEAD at all, unless it happened to point to the "foo" branch. Or even weirder, it probably _would_ update if the server is very old and supports only the v0 protocol, which always gives a full advertisement. This creates confusing behavior for the user: sometimes we may try to update HEAD and sometimes not, depending on vague rules. One option here would be to loosen the update code to accept the remote HEAD even if the server did not advertise that ref. I think that could work, but it may also lead to interesting corner cases (e.g., creating a dangling symref locally, even though the branch is not unborn on the server, if we happen not to have fetched it). So let's instead simplify the rules: we'll only consider updating the tracking HEAD symref when we're doing a full fetch of the remote's configured refs. This is easy to implement; we can just set a flag at the moment we realize we're using the configured refspecs. And we can drop the special case code added by `6c915c3f85` and `20010b8c20`, since this covers those cases. The existing tests from those commits still pass. In t5505, an incidental call to "git fetch <remote> <refspec>" updated HEAD, which caused us to adjust the test in `3f763ddf28` (fetch: set remote/HEAD if it does not exist, 2024-11-22). We can now adjust that back to how it was before the feature was added. Even though t5505 is incidentally testing our new desired behavior, we'll add an explicit test in t5510 to make sure it is covered. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-18 12:21:25 -07:00
Elijah Newren	947e219fb6	am: switch from merge_recursive_generic() to merge_ort_generic() Switch from merge-recursive to merge-ort. Adjust the following testcases due to the switch: * t4151: This test left an untracked file in the way of the merge. merge-recursive could only sometimes tell when untracked files were in the way, and by the time it discovers others, it has already made too many changes to back out of the merge. So, instead of writing the results to e.g. 'file1' it would instead write them to 'file1~branch1'. This is confusing for users, because they might not notice 'file1~branch1' and accidentally add and commit 'file1'. In contrast, merge-ort correctly notices the file in the way before making any changes and aborts. Since this test didn't care about the file in the way, just remove it before calling git-am. * t4255: Usage of merge-ort allows us to change two known failures into successes. * t6427: As noted a few commits ago, the choice of conflict label for diff3 markers for the ancestor commit was previously handled by merge-recursive.c rather than by callers. Since that has now changed, `git am` needs to specify that label. Although the previous conflict label ("constructed merge base") was already fairly somewhat slanted towards `git am`, let's use wording more along the lines of the related command-line flag from `git apply` and function involved to tie it more closely to `git am`. Signed-off-by: Elijah Newren <newren@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-18 09:49:08 -07:00
Junio C Hamano	77f32ba430	Merge branch 'tb/multi-cruft-pack-refresh-fix' into tb/combine-cruft-below-size * tb/multi-cruft-pack-refresh-fix: builtin/pack-objects.c: freshen objects from existing cruft packs	2025-03-17 17:00:38 -07:00
Karthik Nayak	d1270689a1	reflog: implement subcommand to drop reflogs While 'git-reflog(1)' currently allows users to expire reflogs and delete individual entries, it lacks functionality to completely remove reflogs for specific references. This becomes problematic in repositories where reflogs are not needed but continue to accumulate entries despite setting 'core.logAllRefUpdates=false'. Add a new 'drop' subcommand to git-reflog that allows users to delete the entire reflog for a specified reference. Include an '--all' flag to enable dropping all reflogs from all worktrees and an addon flag '--single-worktree', to only drop all reflogs from the current worktree. While here, remove an extraneous newline in the file. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-17 16:58:11 -07:00
Karthik Nayak	52f2dfb084	reflog: improve error for when reflog is not found The 'git reflog expire' prints the error message '<ref> points nowhere!' when used with a non-existent ref. This message is a bit confusing and vague. Modify the message to be more clear and direct. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-17 16:58:11 -07:00
Elijah Newren	e40eefba02	stash: remove merge-recursive.h include stash was modified to use merge_ort_nonrecursive() instead of merge_recursive_generic() back in commit `874cf2a604` (stash: apply stash using 'merge_ort_nonrecursive()', 2022-05-10). That makes the inclusion of merge-recursive.h unnecessary. In preparation for the removal of merge-recursive.h, remove the unnecessary include. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-17 15:39:03 -07:00
Taylor Blau	08f612ba70	builtin/pack-objects.c: freshen objects from existing cruft packs Once an object is written into a cruft pack, we can only freshen it by writing a new loose or packed copy of that object with a more recent mtime. Prior to `61568efa95` (builtin/pack-objects.c: support `--max-pack-size` with `--cruft`, 2023-08-28), we typically had at most one cruft pack in a repository at any given time. So freshening unreachable objects was straightforward when already rewriting the cruft pack (and its .mtimes file). But `61568efa95` changes things: 'pack-objects' now supports writing multiple cruft packs when invoked with `--cruft` and the `--max-pack-size` flag. Cruft packs are rewritten until they reach some size threshold, at which point they are considered "frozen", and will only be modified in a pruning GC, or if the threshold itself is adjusted. Prior to this patch, however, this process breaks down when we attempt to freshen an object packed in an earlier cruft pack, and that cruft pack is larger than the threshold and thus will survive the repack. When this is the case, it is impossible to freshen objects in cruft pack(s) when those cruft packs are larger than the threshold. This is because we would avoid writing them in the new cruft pack entirely, for a couple of reasons. 1. When enumerating packed objects via 'add_objects_in_unpacked_packs()' we pass the SKIP_IN_CORE_KEPT_PACKS, which is used to avoid looping over the packs we're going to retain (which are marked as kept in-core by 'read_cruft_objects()'). This means that we will avoid enumerating additional packed copies of objects found in any cruft packs which are larger than the given size threshold. Thus there is no opportunity to call 'create_object_entry()' whatsoever. 2. We likewise will discard the loose copy (if one exists) of any unreachable object packed in a cruft pack that is larger than the threshold. Here our call path is 'add_unreachable_loose_objects()', which uses the 'add_loose_object()' callback. That function will eventually land us in 'want_object_in_pack()' (via 'add_cruft_object_entry()'), and we'll discard the object as it appears in one of the packs which we marked as kept in-core. This means in effect that it is impossible to freshen an unreachable object once it appears in a cruft pack larger than the given threshold. Instead, we should pack an additional copy of an unreachable object we want to freshen even if it appears in a cruft pack, provided that the cruft copy has an mtime which is before the mtime of the copy we are trying to pack/freshen. This is sub-optimal in the sense that it requires keeping an additional copy of unreachable objects upon freshening, but we don't have a better alternative without the ability to make in-place modifications to existing .mtimes files. In order to implement this, we have to adjust the behavior of 'want_found_object()'. When 'pack-objects' is told that we're not going to retain any cruft packs (i.e. the set of packs marked as kept in-core does not contain a cruft pack), the behavior is unchanged. But when there is at least one cruft pack that we're holding onto, it is no longer sufficient to reject a copy of an object found in that cruft pack for that reason alone. In this case, we only want to reject a candidate object when copies of that object either: - exists in a non-cruft pack that we are retaining, regardless of that pack's mtime, or - exists in a cruft pack with an mtime at least as recent as the copy we are debating whether or not to pack, in which case freshening would be redundant. To do this, keep track of whether or not we have any cruft packs in our in-core kept list with a new 'ignore_packed_keep_in_core_has_cruft' flag. When we end up in this new special case, we replace a call to 'has_object_kept_pack()' to 'want_cruft_object_mtime()', and only reject objects when we have a copy in an existing cruft pack with at least as recent an mtime as our candidate (in which case "freshening" would be redundant). Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-13 11:48:04 -07:00
Patrick Steinhardt	cec2b6f55a	refs/iterator: separate lifecycle from iteration The ref and reflog iterators have their lifecycle attached to iteration: once the iterator reaches its end, it is automatically released and the caller doesn't have to care about that anymore. When the iterator should be released before it has been exhausted, callers must explicitly abort the iterator via `ref_iterator_abort()`. This lifecycle is somewhat unusual in the Git codebase and creates two problems: - Callsites need to be very careful about when exactly they call `ref_iterator_abort()`, as calling the function is only valid when the iterator itself still is. This leads to somewhat awkward calling patterns in some situations. - It is impossible to reuse iterators and re-seek them to a different prefix. This feature isn't supported by any iterator implementation except for the reftable iterators anyway, but if it was implemented it would allow us to optimize cases where we need to search for specific references repeatedly by reusing internal state. Detangle the lifecycle from iteration so that we don't deallocate the iterator anymore once it is exhausted. Instead, callers are now expected to always call a newly introduce `ref_iterator_free()` function that deallocates the iterator and its internal state. Note that the `dir_iterator` is somewhat special because it does not implement the `ref_iterator` interface, but is only used to implement other iterators. Consequently, we have to provide `dir_iterator_free()` instead of `dir_iterator_release()` as the allocated structure itself is managed by the `dir_iterator` interfaces, as well, and not freed by `ref_iterator_free()` like in all the other cases. While at it, drop the return value of `ref_iterator_abort()`, which wasn't really required by any of the iterator implementations anyway. Furthermore, stop calling `base_ref_iterator_free()` in any of the backends, but instead call it in `ref_iterator_free()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-12 11:31:18 -07:00

1 2 3 4 5 ...

12753 Commits (v2.50.1)