kernel/git - git - PowerEL Git System

Commit Graph

Author	SHA1	Message	Date
Phillip Wood	b103881d4f	midx repack: avoid integer overflow on 32 bit systems On a 32 bit system "git multi-pack-index --repack --batch-size=120M" failed with fatal: size_t overflow: 6038786 * 1289 The calculation to estimated size of the objects in the pack referenced by the multi-pack-index uses st_mult() to multiply the pack size by the number of referenced objects before dividing by the total number of objects in the pack. As size_t is 32 bits on 32 bit systems this calculation easily overflows. Fix this by using 64bit arithmetic instead. Also fix a potential overflow when caluculating the total size of the objects referenced by the multipack index with a batch size larger than SIZE_MAX / 2. In that case total_size += estimated_size can overflow as both total_size and estimated_size can be greater that SIZE_MAX / 2. This is addressed by using saturating arithmetic for the addition. Although estimated_size is of type uint64_t by the time we reach this sum it is bounded by the batch size which is of type size_t and so casting estimated_size to size_t does not truncate the value. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-22 14:48:36 -07:00
Johannes Sixt	bfb0fa7099	Merge branch 'top-panel-search-highlight' of github.com:bnfour/gitk * 'top-panel-search-highlight' of github.com:bnfour/gitk: gitk: do not hard-code color of search results in commit list Signed-off-by: Johannes Sixt <j6t@kdbg.org>	2025-05-22 19:15:31 +02:00
Alex Mironov	2e60aabc75	name-hash: don't add sparse directories in threaded lazy init Ensure that logic added in `5f11669586` (name-hash: don't add directories to name_hash, 2021-04-12) also applies in multithreaded hashtable init path. As per the original single-threaded change above: sparse directory entries represent a directory that is outside the sparse-checkout definition. These are not paths to blobs, so should not be added to the name_hash table. Instead, they should be added to the directory hashtable when 'ignore_case' is true. Add a condition to avoid placing sparse directories into the name_hash hashtable. This avoids filling the table with extra entries that will never be queried. Signed-off-by: Alex Mironov <alexandrfox@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-21 14:51:08 -07:00
Karthik Nayak	368d8c86f7	t: remove unexpected SANITIZE_LEAK variables As of `1fc7ddf35b` (test-lib: unconditionally enable leak checking, 2024-11-20), both the `GIT_TEST_PASSING_SANITIZE_LEAK` and `TEST_PASSES_SANITIZE_LEAK` variables no longer have any meaning, the leak checks are enabled by default. However, some newly added tests include them by mistake. Let's clean this up. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Acked-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-20 15:09:33 -07:00
Justin Tobler	68cb0b5253	builtin/receive-pack: add option to skip connectivity check During git-receive-pack(1), connectivity of the object graph is validated to ensure that the received packfile does not leave the repository in a broken state. This is done via git-rev-list(1) and walking the objects, which can be expensive for large repositories. Generally, this check is critical to avoid an incomplete received packfile from corrupting a repository. Server operators may have additional knowledge though around exactly how Git is being used on the server-side which can be used to facilitate more efficient connectivity computation of incoming objects. For example, if it can be ensured that all objects in a repository are connected and do not depend on any missing objects, the connectivity of newly written objects can be checked by walking the object graph containing only the new objects from the updated tips and identifying the missing objects which represent the boundary between the new objects and the repository. These boundary objects can be checked in the canonical repository to ensure the new objects connect as expected and thus avoid walking the rest of the object graph. Git itself cannot make the guarantees required for such an optimization as it is possible for a repository to contain an unreachable object that references a missing object without the repository being considered corrupt. Introduce the --skip-connectivity-check option for git-receive-pack(1) which bypasses this connectivity check to give more control to the server-side. Note that without proper server-side validation of newly received objects handled outside of Git, usage of this option risks corrupting a repository. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-20 11:43:36 -07:00
Justin Tobler	95262afe78	t5410: test receive-pack connectivity check As part of git-recieve-pack(1), the connectivity of objects is checked. Add a test validating that git-receive-pack(1) fails due to an incoming packfile that would leave the repository with missing objects. Instead of creating a new test file, "t5410" is generalized for receive-pack testing. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-20 11:43:36 -07:00
Johannes Sixt	9d60ba03d6	Merge branch 'yh/fix-non-themed-combobox' * yh/fix-non-themed-combobox: gitk: Legacy widgets doesn't have combobox	2025-05-20 19:42:52 +02:00
Junio C Hamano	8613c2bb6c	The sixteenth batch Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-19 16:02:48 -07:00
Junio C Hamano	90eedabbf7	Merge branch 'ps/reftable-read-block-perffix' Performance regression in not-yet-released code has been corrected. * ps/reftable-read-block-perffix: reftable: fix perf regression when reading blocks of unwanted type	2025-05-19 16:02:48 -07:00
Junio C Hamano	2b3303166b	Merge branch 'ly/reftable-writer-leakfix' Leakfix. * ly/reftable-writer-leakfix: reftable/writer: fix memory leak when `writer_index_hash()` fails reftable/writer: fix memory leak when `padded_write()` fails	2025-05-19 16:02:47 -07:00
Junio C Hamano	a9dcacbf2a	Merge branch 'jk/oidmap-cleanup' Code cleanup. * jk/oidmap-cleanup: raw_object_store: drop extra pointer to replace_map oidmap: add size function oidmap: rename oidmap_free() to oidmap_clear()	2025-05-19 16:02:47 -07:00
Junio C Hamano	9af978fa04	Merge branch 'rc/t1001-test-path-is-file' Test update. * rc/t1001-test-path-is-file: t1001: replace 'test -f' with 'test_path_is_file'	2025-05-19 16:02:47 -07:00
Junio C Hamano	6660b42929	Merge branch 'ly/am-split-stgit-leakfix' Leakfix. * ly/am-split-stgit-leakfix: builtin/am: fix memory leak in `split_mail_stgit_series`	2025-05-19 16:02:46 -07:00
Junio C Hamano	effbd42255	Merge branch 'bc/make-avoid-unneeded-rebuild-with-compdb-dir' Build performance fix. * bc/make-avoid-unneeded-rebuild-with-compdb-dir: Makefile: avoid constant rebuilds with compilation database	2025-05-19 16:02:46 -07:00
Junio C Hamano	ae0b60e009	Merge branch 'ag/doc-send-email' The `send-email` documentation has been updated with OAuth2.0 related examples. * ag/doc-send-email: docs: add credential helper for outlook and gmail in OAuth list of helpers docs: improve send-email documentation send-mail: improve checks for valid_fqdn	2025-05-19 16:02:45 -07:00
Junio C Hamano	4bb72548fc	Merge branch 'sc/bundle-uri-use-all-refs-in-bundle' Bundle-URI feature did not use refs recorded in the bundle other than normal branches as anchoring points to optimize the follow-up fetch during "git clone"; now it is told to utilize all. * sc/bundle-uri-use-all-refs-in-bundle: bundle-uri: add test for bundle-uri clones with tags bundle-uri: copy all bundle references ino the refs/bundle space	2025-05-19 16:02:45 -07:00
Junio C Hamano	0b8d22fd40	Merge branch 'pw/sequencer-reflog-use-after-free' Use-after-free fix in the sequencer. * pw/sequencer-reflog-use-after-free: sequencer: rework reflog message handling sequencer: move reflog message functions	2025-05-19 16:02:44 -07:00
Ramsay Jones	187ce0222f	configure.ac: upgrade to a compilation check for sysinfo Commit `f5e3c6c57d` ("meson: do a full usage-based compile check for sysinfo", 2025-04-25) updated the 'sysinfo()' check, as part of the meson build, due to the failure of the check on Solaris. Prior to that commit, the meson build only checked the availability of the '<sys/sysinfo.h>' header file. On Solaris, both the header and the 'sysinfo()' function exist, but are completely unrelated to the same function on Linux (and cygwin). Commit `50dec7c566` ("config.mak.uname: add sysinfo() configuration for cygwin", 2025-04-17) added a similar 'sysinfo()' check to the autoconf build. This check looked for the 'sysinfo()' function itself, rather than just the header, but it will fail (incorrectly set HAVE_SYSINFO) for the same reason. In order to correctly identify the 'sysinfo()' function we require as part of 'git-gc' (used in the 'total_ram() function), we also upgrade to a compilation check, in a similar way to the meson commit. Note that since commit `c9a51775a3` ("builtin/gc.c: correct RAM calculation when using sysinfo", 2025-04-17) both the 'totalram' and 'mem_unit' fields of the 'struct sysinfo' are used, so the new check includes both of those fields in the compile check. Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-19 11:34:00 -07:00
Ramsay Jones	837f637cf5	meson.build: correct setting of GIT_EXEC_PATH For the non-'runtime prefix' case, the meson build sets the GIT_EXEC_PATH build variable to an absolute path equivalent to <prefix>/libexec/git-core. In comparison, the default make build sets it to a relative path equivalent to 'libexec/git-core'. Indeed, the make build requires the use of some means outside of the Makefile (eg. config.mak[.*] or the command-line) to set GIT_EXEC_PATH to anything other than 'libexec/git-core'. For example, the make invocation: $ make gitexecdir=/some/other/bin all install will build git with GIT_EXEC_PATH set to '/some/other/bin' and install the 'library' executables to that location. However, without setting the 'gitexecdir' make variable, irrespective of the 'runtime prefix' setting, the GIT_EXEC_PATH is always set to 'libexec/git-core'. The meson built-in 'libexecdir' option can be used to provide a similar configurability. The default value for the option is 'libexec'. Attempting to set the option to '' on the command-line, will reset it to the '.' string, presumably to ensure a relative path value. This commit allows the meson build, similar to the above, to configure the project like: $ meson setup --buildtype=debugoptimized -Dprefix=$HOME -Dpcre2=disabled \ -Dlibexecdir=/some/other/bin build so that the GIT_EXEC_PATH is set to '/some/other/bin'. Absent the -Dlibexecdir argument, the GIT_EXEC_PATH is set to 'libexec/git-core'. In order to correct the value of GIT_EXEC_PATH, default the value to the static string value 'libexec/git-core', and only override if the value of the 'libexecdir' option has a value different to 'libexec' or '.'. Also, like the Makefile, add a check for an absolute path when the runtime prefix option is true (and if so, error out). Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-19 11:34:00 -07:00
Ramsay Jones	46a626c389	meson: correct path to system config/attribute files The path to the system-wide config and attributes files are not being set correctly in the meson build. Unless explicitly overridden on the command line during setup, the 'gitconfig' and 'gitattributes' options are defaulting to absolute paths in the '/etc' system directory. This is only appropriate if the <prefix> is set specifically to '/usr'. The directory in which these files are placed is generally referred to as the 'system configuration directory' or 'sysconfdir' for short. When the prefix is '/usr' then the sysconfdir is usually set to '/etc', but any other value for prefix results in the relative directory value 'etc' instead. (eg if prefix is '/usr/local', then the 'etc' relative value results in a system configuration directory of '/usr/local/etc'). When setting the 'sysconfdir' builtin option value, the meson system uses exactly this algorithm, so we can use get_option('sysconfdir') directly when setting the (non-overridden) build variables. In order to allow for overriding from the command line, remove the default values specified for the 'gitconfig' and 'gitattributes' options in the 'meson_options.txt' file. This allows the user to specify any pathname for those options, while being able to test for the unset (empty) value. An absolute pathname will be used unchanged and a relative pathname will be appended to '<prefix>/'. These values are then used to set the 'ETC_GITCONFIG' and 'ETC_GITATTRIBUTES' build variables which are, in turn, passed to the compiler as '-D' arguments. When the 'gitconfig' or 'gitattributes' options are not used, then use the built-in 'sysconfdir' and set the ETC_GITCONFIG build variable to the string "<sysconfdir>/gitconfig". Similarly, set ETC_ATTRIBUTES to "<sysconfdir>/gitattributes". Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-19 11:34:00 -07:00
Ramsay Jones	bdb38432f3	meson: correct install location of YAML.pm When executing an 'meson install' the YAML.pm file is incorrectly placed in the <prefix>/share/perl5/Git/SVN directory. The YAML.pm file should be placed in a 'Memoize' subdirectory instead. In order to correct the location, update the 'install_dir' of the relevant target in the 'perl/Git/SVN/Memoize/meson.build' file. Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-19 11:34:00 -07:00
Ramsay Jones	f783b3fe74	meson.build: quote the GITWEBDIR build configuration The build configuration options with (non-empty) values, for example filesystem paths potentially containing spaces, have been set using the '.set_quoted()' method. However, the GITWEBDIR value has been set using the '.set()' method instead. In order to correctly quote the GITWEBDIR value, replace the '.set()' method with '.set_quoted()'. Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-19 11:33:59 -07:00
Eli Schwartz	cddcee7f64	meson: reformat default options to workaround bug in `meson configure` Since `13cb20fc46` ("meson: fix compilation with Visual Studio", 2025-01-22) it has not been possible to list build options via `meson configure`. This is due to Meson's static analysis of build options failing to handle constant folding, and thinking we set a totally invalid default `-std=`. This is reported upstream but we anyways need to work with existing versions. It turns out there is a simple solution: turn the entire default option into a conditional branch, which means Meson sees either nothing, or everything. As a result, Git users can once again see pretty-printed options before building. Reported-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Bug: https://github.com/mesonbuild/meson/issues/14623 Signed-off-by: Eli Schwartz <eschwartz@gentoo.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-19 11:32:27 -07:00
K Jayatheerth	7649d316ce	docs: replace git_config to repo_config Since this document was written, the built-in API has been updated a few times, but the document was left stale. Adjust to the current best practices by calling repo_config() on the repository instance the subcommand implementation receives as a parameter, instead of calling git_config() that used to be the common practice. Signed-off-by: K Jayatheerth <jayatheerthkulkarni2005@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-19 10:53:12 -07:00
K Jayatheerth	a1dcf6b289	docs: clarify cmd_psuh signature and explain UNUSED macro The sample program, as written, would no longer build for at least two reasons: - Since this document was first written, the convention to call a subcommand implementation has changed, and cmd_psuh() now needs to accept the fourth parameter, repository. - These days, compiler warning options for developers include one that detects and complains about unused parameters, so ones that are deliberately unused have to be marked as such. Update the old-style examples to adjust to the current practices, with explanations as needed. Signed-off-by: K Jayatheerth <jayatheerthkulkarni2005@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-19 10:52:24 -07:00
K Jayatheerth	3749b8a795	docs: remove unused mentoring mailing list reference The git-mentoring group was initially created to help newcomers with their development itches. However, in practice, most of their questions were already being addressed directly on the mailing list, and contributors consistently received helpful responses there. Remove the mentoring group details from the Documentation. Signed-off-by: K Jayatheerth <jayatheerthkulkarni2005@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-19 10:51:19 -07:00
Elijah Newren	29d7bf1951	merge-tree: add a new --quiet flag Git Forges may be interested in whether two branches can be merged while not being interested in what the resulting merge tree is nor which files conflicted. For such cases, add a new --quiet flag which will make use of the new mergeability_only flag added to merge-ort in the previous commit. This option allows the merge machinery to, in the outer layer of the merge: * exit early when a conflict is detected * avoid writing (most) merged blobs/trees to the object store Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 15:09:14 -07:00
Elijah Newren	c6d5ca10e3	merge-ort: add a new mergeability_only option Git Forges may be interested in whether two branches can be merged while not being interested in what the resulting merge tree is nor which files conflicted. For such cases, add a new mergeability_only option. This option allows the merge machinery to, in the "outer layer" of the merge: * exit upon first[-ish] conflict * avoid (not prevent) writing merged blobs/trees to the object store I have a number of qualifiers there, so let me explain each: "outer layer": Note that since the recursive merge of merge bases (corresponding to call_depth > 0) can conflict without the outer final merge (corresponding to call_depth == 0) conflicting, we can't short-circuit nor avoid writing merged blobs/trees to the object store during those inner merges. "first-ish conflict": The current patch only exits early from process_entries() on the first conflict it detects, but conflicts could have been detected in a previous function call, namely detect_and_process_renames(). However: * conflicts detected by detect_and_process_renames() are quite rare conflict types * the detection would still come after regular rename detection (which is the expensive part of detect_and_process_renames()), so it is not saving us much in computation time given that process_entries() directly follows detect_and_process_renames() * [this overlaps with the next bullet point] process_entries() is the place where virtually all object writing occurs (object writing is sometimes more of a concern for Forges than computation time), so exiting early here isn't saving us much in object writes either * the code changes needed to handle an earlier exit are slightly more invasive in detect_and_process_renames() than for process_entries(). Given the rareness of the even earlier conflicts, the limited savings we'd get from exiting even earlier, and in an attempt to keep this patch simpler, we don't guarantee that we actually exit on the first conflict detected. We can always revisit this decision later if we decide that a further micro-optimization to exit slightly earlier in rare cases is worthwhile. "avoid (not prevent) writing objects": The detect_and_process_renames() call can also write objects to the object store, when rename/rename conflicts involve one (or more) files that have also been modified on both sides. Because of this alternate call path leading to handle_content_merges(), our "early exit" does not prevent writing objects entirely, even within the "outer layer" (i.e. even within call_depth == 0). I figure that's fine though, since we're already writing objects for the inner merges (i.e. for call_depth > 0), which are likely going to represent vastly more objects than files involved in rename/rename+modify/modify cases in the outer merge, on average. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 15:09:14 -07:00
Elijah Newren	e42667241d	sequencer: make it clearer that commit descriptions are just comments Every once in a while, users report that editing the commit summaries in the todo list does not get reflected in the rebase operation, suggesting that users are (a) only using one-line commit messages, and (b) not understanding that the commit summaries are merely helpful comments to help them find the right hashes. It may be difficult to correct users' poor commit messages, but we can at least try to make it clearer that the commit summaries are not directives of some sort by inserting a comment character. Hopefully that leads to them looking a little further and noticing the hints at the bottom to use 'reword' or 'edit' directives. Yes, this change may look funny at first since it hardcodes '#' rather than using comment_line_str. However: * comment_line_str exists to allow disambiguation between lines in a commit message and lines that are instructions to users editing the commit message. No such disambiguation is needed for these comments that occur on the same line after existing directives * the exact "comment" character(s) on regular pick lines used aren't actually important; I could have used anything, including completely random variable length text for each line and it'd work because we ignore everything after 'pick' and the hash. * The whole point of this change is to signal to users that they should NOT be editing any part of the line after the hash (and if they do so, their edits will be ignored), while the whole point of comment_line_str is to allow highly flexible editing. So making it more general by using comment_line_str actually feels counterproductive. * The character for merge directives absolutely must be '#'; that has been deeply hardcoded for a long time (see below), and will break if some other comment character is used instead. In a desire to have pick and merge directives be similar, I use the same comment character for both. * Perhaps merge directives could be fixed to not be inflexible about the comment character used, if someone feels highly motivated, but I think that should be done in a separate follow-on patch. Here are (some of?) the locations where '#' has already been hardcoded for a long time for merges: 1) In check_label_or_ref_arg(): case TODO_LABEL: /* * '#' is not a valid label as the merge command uses it to * separate merge parents from the commit subject. / 2) In do_merge(): / * For octopus merges, the arg starts with the list of revisions to be * merged. The list is optionally followed by '#' and the oneline. / merge_arg_len = oneline_offset = arg_len; for (p = arg; p - arg < arg_len; p += strspn(p, " \t\n")) { if (!p) break; if (p == '#' && (!p[1] \|\| isspace(p[1]))) { 3) In label_oid(): if ((buf->len == the_hash_algo->hexsz && !get_oid_hex(label, &dummy)) \|\| (buf->len == 1 && label == '#') \|\| hashmap_get_from_hash(&state->labels, strihash(label), label)) { /* * If the label already exists, or if the label is a * valid full OID, or the label is a '#' (which we use * as a separator between merge heads and oneline), we * append a dash and a number to make it unique. */ Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 12:28:27 -07:00
Derrick Stolee	ecf9ba20e3	p2000: add performance test for patch-mode commands The previous three changes contributed performance improvements to 'git apply', 'git add -p', and 'git reset -p' when using a sparse index. The improvement to 'git apply' also improved 'git checkout -p'. Add performance tests to demonstrate this (and to help validate that performance remains good in the future). In the truncated test output below, we see that the full checkout performance changes within noise expectations, but the sparse index cases improve 33% and then 96% for 'git add -p' and 41% and then 95% for 'git reset -p'. 'git checkout -p' improves immediatley by 91% because it does not need any change to its builtin. Test HEAD~4 HEAD~3 HEAD~2 HEAD~1 ------------------------------------------------------------------------------------- 2000.118: ... git add -p (full-v3) 0.79 0.79 +0.0% 0.82 +3.8% 0.82 +3.8% 2000.119: ... git add -p (full-v4) 0.74 0.76 +2.7% 0.74 +0.0% 0.76 +2.7% 2000.120: ... git add -p (sparse-v3) 1.94 1.28 -34.0% 0.07 -96.4% 0.07 -96.4% 2000.121: ... git add -p (sparse-v4) 1.93 1.28 -33.7% 0.06 -96.9% 0.06 -96.9% 2000.122: ... git checkout -p (full-v3) 1.18 1.18 +0.0% 1.18 +0.0% 1.19 +0.8% 2000.123: ... git checkout -p (full-v4) 1.10 1.12 +1.8% 1.11 +0.9% 1.11 +0.9% 2000.124: ... git checkout -p (sparse-v3) 1.31 0.11 -91.6% 0.11 -91.6% 0.11 -91.6% 2000.125: ... git checkout -p (sparse-v4) 1.29 0.11 -91.5% 0.11 -91.5% 0.11 -91.5% 2000.126: ... git reset -p (full-v3) 0.81 0.80 -1.2% 0.83 +2.5% 0.83 +2.5% 2000.127: ... git reset -p (full-v4) 0.78 0.77 -1.3% 0.77 -1.3% 0.78 +0.0% 2000.128: ... git reset -p (sparse-v3) 1.58 0.92 -41.8% 0.91 -42.4% 0.07 -95.6% 2000.129: ... git reset -p (sparse-v4) 1.58 0.92 -41.8% 0.92 -41.8% 0.07 -95.6% It is worth noting that if our test was more involved and had multiple hunks to evaluate, then the time spent in 'git apply' would dominate due to multiple index loads and writes. As it stands, we need the sparse index improvement in 'git add -p' itself to confirm this performance improvement. Since the change for 'git add -i' is identical, we avoid a second test case for that similar operation. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 12:02:47 -07:00
Derrick Stolee	efab7dc1f4	reset: integrate sparse index with --patch Similar to the previous change for 'git add -p', the reset builtin checked for integration with the sparse index after possibly redirecting its logic toward the interactive logic. This means that the builtin would expand the sparse index to a full one upon read. Move this check earlier within cmd_reset() to improve performance here. Add tests to guarantee that we are not universally expanding the index. Add behavior tests to check that we are doing the same operations as a full index. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 12:02:47 -07:00
Derrick Stolee	02ed8555f6	git add: make -p/-i aware of sparse index It is slow to expand a sparse index in-memory due to parsing of trees. We aim to minimize that performance cost when possible. 'git add -p' uses 'git apply' child processes to modify the index, but still there are some expansions that occur. It turns out that control flows out of cmd_add() in the interactive cases before the lines that confirm that the builtin is integrated with the sparse index. Moving that integration point earlier in cmd_add() allows 'git add -i' and 'git add -p' to operate without expanding a sparse index to a full one. Add test cases that confirm that these interactive add options work with the sparse index. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 12:01:51 -07:00
Derrick Stolee	952de281fe	apply: integrate with the sparse index The sparse index allows storing directory entries in the index, marked with the skip-wortkree bit and pointing to a tree object. This may be an unexpected data shape for some implementation areas, so we are rolling it out incrementally on a builtin-per-builtin basis. This change enables the sparse index for 'git apply'. The main motivation for this change is that 'git apply' is used as a child process of 'git add -p' and expanding the sparse index for each of those child processes can lead to significant performance issues. The good news is that the actual index manipulation code used by 'git apply' is already integrated with the sparse index, so the only product change is to mark the builtin as allowing the sparse index so it isn't inflated on read. The more involved part of this change is around adding tests that verify how 'git apply' behaves in a sparse-checkout environment and whether or not the index expands in certain operations. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 12:00:33 -07:00
Moumita Dhar	ea8a71b40d	userdiff: extend Bash pattern to cover more shell function forms The previous function regex required explicit matching of function bodies using `{`, `(`, `((`, or `[[`, which caused several issues: - It failed to capture valid functions where `{` was on the next line due to line continuation (`\`). - It did not recognize functions with single command body, such as `x () echo hello`. Replacing the function body matching logic with `.*$`, ensures that everything on the function definition line is captured. Additionally, the word regex is refined to better recognize shell syntax, including additional parameter expansion operators and command-line options. Signed-off-by: Moumita Dhar <dhar61595@gmail.com> Acked-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 11:52:41 -07:00
Jeff King	141f8c8c05	object-file: drop support for writing objects with unknown types Since "hash-object --literally" no longer supports objects with unknown types, there are now no callers of write_object_file_literally() and its helpers. Let's drop them to simplify the code. In particular, this gets rid of some ugly copy-and-paste code from write_object_file_literally(), which is a parallel implementation of write_object_file(). When the split was originally made, the two weren't that long, but commits like `63a6745a07` (object-file: update the loose object map when writing loose objects, 2023-10-01) ended up having to duplicate some tricky code. This patch drops all of that duplication and should make things less error-prone going forward. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 09:43:12 -07:00
Jeff King	f710fd7b49	hash-object: handle --literally with OPT_NEGBIT Since we recently removed the hash_literally() function, the hash-object --literally option has been simplified to just removing the INDEX_FORMAT_CHECK flag. Rather than pass it around as a separate bool, we can just have the option parser remove the bit from the set of flags directly. This simplifies the helper functions. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 09:43:11 -07:00
Jeff King	931e5ca507	hash-object: merge HASH_* and INDEX_* flags The hash-object command has its own custom flag bits that it sets based on command-line options. But since we dropped hash_literally() in the previous commit, the only thing we do with those flag bits is convert them directly into "index_flags" to pass to index_fd(). This extra layer of indirection makes the code harder to read and reason about. Let's just use the INDEX_* flags directly. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 09:43:11 -07:00
Jeff King	65a6a79b42	hash-object: stop allowing unknown types When passed the "--literally" option, hash-object will allow any arbitrary string for its "-t" type option. Such objects are only useful for testing or debugging, as they cannot be used in the normal way (e.g., you cannot fetch their contents!). Let's drop this feature, which will eventually let us simplify the object-writing code. This is technically backwards incompatible, but since such objects were never really functional, it seems unlikely that anybody will notice. We will retain the --literally flag, as it also instructs hash-object not to worry about other format issues (e.g., type-specific things that fsck would complain about). The documentation does not need to be updated, as it was always vague about which checks we're loosening (it uses only the phrase "any garbage"). The code change is a bit hard to verify from just the patch text. We can drop our local hash_literally() helper, but it was really just wrapping write_object_file_literally(). We now replace that with calling index_fd(), as we do for the non-literal code path, but dropping the INDEX_FORMAT_CHECK flag. This ends up being the same semantically as what the _literally() code path was doing (modulo handling unknown types, which is our goal). We'll be able to clean up these code paths a bit more in subsequent patches. The existing test is flipped to show that we now reject the unknown type. The additional "extra-long type" test is now redundant, as we bail early upon seeing a bogus type. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 09:43:11 -07:00
Jeff King	b5643b60ac	t: add lib-loose.sh This commit adds a shell library for writing raw loose objects into the object database. Normally this is done with hash-object, but the specific intent here is to allow broken objects that hash-object may not support. We'll convert several cases that use "hash-object --literally" to write objects with invalid types. That works currently, but dropping this dependency will allow us to remove that feature and simplify the object-writing code. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 09:43:11 -07:00
Jeff King	f2ed511a2f	t/helper: add zlib test-tool It's occasionally useful when testing or debugging to be able to do raw zlib inflate/deflate operations (e.g., to check the bytes of a specific loose or packed object). Even though zlib's deflate algorithm is used by many other programs, this is surprisingly hard to do in a portable way. E.g., gzip can do this if you manually munge some header bytes. But the result is somewhat arcane, and we don't assume gzip is available anyway. Likewise, pigz will handle raw zlib, but we can't assume it is available. So let's introduce a short test helper for just doing zlib operations. We'll use it in subsequent patches to add some new tests, but it would also have come in handy a few times in the past: - The hard-coded pack data from `3b910d0c5e` (add tests for indexing packs with delta cycles, 2013-08-23) could probably be generated on the fly. - Likewise we could avoid the hard-coded data from `0b1493c2d4` (git_inflate(): skip zlib_post_call() sanity check on Z_NEED_DICT, 2025-02-25). Though note this would require support for more zlib options. - It would have helped with the debugging documented in `41dfbb2dbe` (howto: add article on recovering a corrupted object, 2013-10-25). I'll leave refactoring existing tests for another day, but I hope the examples above show the general utility. I aimed for simplicity in the code. In particular, it will read all input into a memory buffer, rather than streaming. That makes the zlib loops harder to get wrong (which has been a source of subtle bugs in the past). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 09:43:11 -07:00
Jeff King	d2956385a9	oid_object_info(): drop type_name strbuf We provide a mechanism for callers to get the object type as a raw string, rather than an object_type enum. This was in theory useful for returning types that are not representable in the enum, but we consider any such type to be an error, and there are no callers that use the strbuf anymore. Let's drop support to simplify the code a bit. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 09:43:10 -07:00
Jeff King	4ae0e9423c	fsck: stop using object_info->type_name strbuf When fsck-ing a loose object, we use object_info's type_name strbuf to record the parsed object type as a string. For most objects this is redundant with the object_type enum, but it does let us report the string when we encounter an object with an unknown type (for which there is no matching enum value). There are a few downsides, though: 1. The code to report these cases is not actually robust. Since we did not pass a strbuf to unpack_loose_header(), we only retrieved types from headers up to 32 bytes. In longer cases, we'd simply say "object corrupt or missing". 2. This is the last caller that uses object_info's type_name strbuf support. It would be nice to refactor it so that we can simplify that code. 3. Likewise, we'll check the hash of the object using its unknown type (again, as long as that type is short enough). That depends on the hash_object_file_literally() code, which we'd eventually like to get rid of. So we can simplify things by bailing immediately in read_loose_object() when we encounter an unknown type. This has a few user-visible effects: a. Instead of producing a single line of error output like this: error: 26ed13ce3564fbbb44e35bde42c7da717ea004a6: object is of unknown type 'bogus': .git/objects/26/ed13ce3564fbbb44e35bde42c7da717ea004a6 we'll now issue two lines (the first from read_loose_object() when we see the unparsable header, and the second from the fsck code, since we couldn't read the object): error: unable to parse type from header 'bogus 4' of .git/objects/26/ed13ce3564fbbb44e35bde42c7da717ea004a6 error: 26ed13ce3564fbbb44e35bde42c7da717ea004a6: object corrupt or missing: .git/objects/26/ed13ce3564fbbb44e35bde42c7da717ea004a6 This is a little more verbose, but this sort of error should be rare (such objects are almost impossible to work with, and cannot be transferred between repositories as they are not representable in packfiles). And as a bonus, reporting the broken header in full could help with debugging other cases (e.g., a header like "blob xyzzy\0" would fail in parsing the size, but previously we'd not have showed the offending bytes). b. An object with an unknown type will be reported as corrupt, without actually doing a hash check. Again, I think this is unlikely to matter in practice since such objects are totally unusable. We'll update one fsck test to match the new error strings. And we can remove another test that covered the case of an object with an unknown type _and_ a hash corruption. Since we'll skip the hash check now in this case, the test is no longer interesting. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 09:43:10 -07:00
Jeff King	b32b434bfe	oid_object_info_convert(): stop using string for object type In oid_object_info_convert(), we convert objects between their sha1 and sha256 variants. To do this, we naturally need to know the type, which we get from oid_object_info_extended() using its type_name strbuf option. But getting the value as a string (versus an object_type enum) is not helpful. Since we do not allow unknown types, the regular enum is sufficient. And the resulting code is a bit simpler, as we no longer have to manage the extra allocation nor convert the string to an enum ourselves. Note that at first glance, it might seem like we should retain the error check for "type == -1" to catch bogus types found by the underlying parser. But we don't need it, as an unknown type would have yielded an error from the call to oid_object_info_extended(), which would already have caused us to return an error. In fact, I suspect this was always impossible to trigger. Even when we were converting the string to a type enum ourselves, an invalid type would never have escaped oid_object_info_extended(), since we never passed the (now removed) OBJECT_INFO_ALLOW_UNKNOWN_TYPE option. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 09:43:10 -07:00
Jeff King	aac2abeca7	cat-file: use type enum instead of buffer for -t option Now that we no longer support OBJECT_INFO_ALLOW_UNKNOWN_TYPE, there is no need to pass a strbuf into oid_object_info_extended() to record the type. The regular object_type enum is sufficient to capture all of the types we will allow. This simplifies the code a bit, and will eventually let us drop object_info's type_name strbuf support. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 09:43:10 -07:00
Jeff King	ae24b032a0	object-file: drop OBJECT_INFO_ALLOW_UNKNOWN_TYPE flag Since cat-file dropped its "--allow-unknown-type" option in the previous commit, there are no more uses of the internal flag that implemented it. Let's drop it. That in turn lets us drop the strbuf parameter of unpack_loose_header(), which now is always NULL. And without that, we can drop all of the additional code to inflate larger headers into the strbuf. Arguably we could drop ULHR_TOO_LONG, as no callers really care about the distinction from ULHR_BAD. But it's easy enough to retain, and it does let us produce a slightly more specific message in one instance. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 09:43:10 -07:00
Jeff King	f227fc7d43	cat-file: make --allow-unknown-type a noop The cat-file command has some minor support for handling objects with "unknown" types. I.e., strings that are not "blob", "commit", "tree", or "tag". In theory this could be used for debugging or experimenting with extensions to Git. But in practice this support is not very useful: 1. You can get the type and size of such objects, but nothing else. Not even the contents! 2. Only loose objects are supported, since packfiles use numeric ids for the types, rather than strings. 3. Likewise you cannot ever transfer objects between repositories, because they cannot be represented in the packfiles used for the on-the-wire protocol. The support for these unknown types complicates the object-parsing code, and has led to bugs such as `b748ddb7a4` (unpack_loose_header(): fix infinite loop on broken zlib input, 2025-02-25). So let's drop it. The first step is to remove the user-facing parts, which are accessible only via cat-file. This is technically backwards-incompatible, but given the limitations listed above, these objects couldn't possibly be useful in any workflow. However, we can't just rip out the option entirely. That would hurt a caller who ran: git cat-file -t --allow-unknown-object <oid> and fed it normal, well-formed objects. There --allow-unknown-type was doing nothing, but we wouldn't want to start bailing with an error. So to protect any such callers, we'll retain --allow-unknown-type as a noop. The code change is fairly small (but we'll able to clean up more code in follow-on patches). The test updates drop any use of the option. We still retain tests that feed the broken objects to cat-file without --allow-unknown-type, as we should continue to confirm that those objects are rejected. Note that in one spot we can drop a layer of loop, re-indenting the body; viewing the diff with "-w" helps there. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 09:43:09 -07:00
Jeff King	53eeed0a81	object-file.h: fix typo in variable declaration This should be "compat", not "comapt". Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 09:43:09 -07:00
Lucas Seiki Oshiro	da692298ac	json-writer: describe the usage of jw_* functions Provide an overview of the set of functions used for manipulating `json_writer`s, by describing what functions should be used for each JSON-related task. Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Patrick Steinhardt <ps@pks.im> Helped-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com> Acked-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 09:33:07 -07:00
Lucas Seiki Oshiro	fba60a4841	json-writer: add docstrings to jw_* functions Add a docstring for each function that manipulates json_writers. Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Patrick Steinhardt <ps@pks.im> Helped-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com> Acked-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 09:33:06 -07:00
Junio C Hamano	cb96e1697a	The fifteenth batch Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-15 17:27:23 -07:00

1 2 3 4 5 ...

77273 Commits (v2.50.0-rc2) All Branches Search

77273 Commits (v2.50.0-rc2)

All Branches