kernel/git - git - PowerEL Git System

Commit Graph

Author	SHA1	Message	Date
Jeff King	f2ed511a2f	t/helper: add zlib test-tool It's occasionally useful when testing or debugging to be able to do raw zlib inflate/deflate operations (e.g., to check the bytes of a specific loose or packed object). Even though zlib's deflate algorithm is used by many other programs, this is surprisingly hard to do in a portable way. E.g., gzip can do this if you manually munge some header bytes. But the result is somewhat arcane, and we don't assume gzip is available anyway. Likewise, pigz will handle raw zlib, but we can't assume it is available. So let's introduce a short test helper for just doing zlib operations. We'll use it in subsequent patches to add some new tests, but it would also have come in handy a few times in the past: - The hard-coded pack data from `3b910d0c5e` (add tests for indexing packs with delta cycles, 2013-08-23) could probably be generated on the fly. - Likewise we could avoid the hard-coded data from `0b1493c2d4` (git_inflate(): skip zlib_post_call() sanity check on Z_NEED_DICT, 2025-02-25). Though note this would require support for more zlib options. - It would have helped with the debugging documented in `41dfbb2dbe` (howto: add article on recovering a corrupted object, 2013-10-25). I'll leave refactoring existing tests for another day, but I hope the examples above show the general utility. I aimed for simplicity in the code. In particular, it will read all input into a memory buffer, rather than streaming. That makes the zlib loops harder to get wrong (which has been a source of subtle bugs in the past). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 09:43:11 -07:00
Junio C Hamano	6dbc41631d	Merge branch 'ds/fix-thin-fix' "git index-pack --fix-thin" used to abort to prevent a cycle in delta chains from forming in a corner case even when there is no such cycle. * ds/fix-thin-fix: index-pack: allow revisiting REF_DELTA chains t5309: create failing test for 'git index-pack' test-tool: add pack-deltas helper	2025-05-12 14:22:49 -07:00
Junio C Hamano	a819a3da85	Merge branch 'ps/reftable-api-revamp' Overhaul of the reftable API. * ps/reftable-api-revamp: reftable/table: move printing logic into test helper reftable/constants: make block types part of the public interface reftable/table: introduce iterator for table blocks reftable/table: add `reftable_table` to the public interface reftable/block: expose a generic iterator over reftable records reftable/block: make block iterators reseekable reftable/block: store block pointer in the block iterator reftable/block: create public interface for reading blocks git-zlib: use `struct z_stream_s` instead of typedef reftable/block: rename `block_reader` to `reftable_block` reftable/block: rename `block` to `block_data` reftable/table: move reading block into block reader reftable/block: simplify how we track restart points reftable/blocksource: consolidate code into a single file reftable/reader: rename data structure to "table" reftable: fix formatting of the license header	2025-04-29 14:21:30 -07:00
Junio C Hamano	5a6de390d8	Merge branch 'az/tighten-string-array-constness' Code clean-up. * az/tighten-string-array-constness: global: mark usage strings and string tables const	2025-04-29 14:21:28 -07:00
Derrick Stolee	89d557b950	test-tool: add pack-deltas helper When trying to demonstrate certain behavior in tests, it can be helpful to create packfiles that have specific delta structures. 'git pack-objects' uses various algorithms to select deltas based on their compression rates, but that does not always demonstrate all possible packfile shapes. This becomes especially important when wanting to test 'git index-pack' and its ability to parse certain pack shapes. We have prior art in t/lib-pack.sh, where certain delta structures are produced by manually writing certain opaque pack contents. However, producing these script updates is cumbersome and difficult to do as a contributor. Instead, create a new test-tool, 'test-tool pack-deltas', that reads a list of instructions for which objects to include in a packfile and how those objects should be written in delta form. At the moment, this only supports REF_DELTAs as those are the kinds of deltas needed to exercise a bug in 'git index-pack'. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-28 15:37:25 -07:00
Junio C Hamano	2bc5414c41	Merge branch 'ps/parse-options-integers' Update parse-options API to catch mistakes to pass address of an integral variable of a wrong type/size. * ps/parse-options-integers: parse-options: detect mismatches in integer signedness parse-options: introduce precision handling for `OPTION_UNSIGNED` parse-options: introduce precision handling for `OPTION_INTEGER` parse-options: rename `OPT_MAGNITUDE()` to `OPT_UNSIGNED()` parse-options: support unit factors in `OPT_INTEGER()` global: use designated initializers for options parse: fix off-by-one for minimum signed values	2025-04-24 17:25:34 -07:00
Junio C Hamano	36d8035d27	Merge branch 'ps/object-file-cleanup' Code clean-up. * ps/object-file-cleanup: object-store: merge "object-store-ll.h" and "object-store.h" object-store: remove global array of cached objects object: split out functions relating to object store subsystem object-file: drop `index_blob_stream()` object-file: split up concerns of `HASH_*` flags object-file: split out functions relating to object store subsystem object-file: move `xmmap()` into "wrapper.c" object-file: move `git_open_cloexec()` to "compat/open.c" object-file: move `safe_create_leading_directories()` into "path.c" object-file: move `mkdir_in_gitdir()` into "path.c"	2025-04-24 17:25:33 -07:00
Ahelenia Ziemiańska	86eef3541e	global: mark usage strings and string tables const Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-21 21:01:19 -07:00
Patrick Steinhardt	bc288c5929	parse-options: introduce precision handling for `OPTION_UNSIGNED` This commit is the equivalent to the preceding commit, but instead of introducing precision handling for `OPTION_INTEGER` we introduce it for `OPTION_UNSIGNED`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-17 08:15:16 -07:00
Patrick Steinhardt	09705696f7	parse-options: introduce precision handling for `OPTION_INTEGER` The `OPTION_INTEGER` option type accepts a signed integer. The type of the underlying integer is a simple `int`, which restricts the range of values accepted by such options. But there is a catch: because the caller provides a pointer to the value via the `.value` field, which is a simple void pointer. This has two consequences: - There is no check whether the passed value is sufficiently long to store the entire range of `int`. This can lead to integer wraparound in the best case and out-of-bounds writes in the worst case. - Even when a caller knows that they want to store a value larger than `INT_MAX` they don't have a way to do so. In practice this doesn't tend to be a huge issue because users typically don't end up passing huge values to most commands. But the parsing logic is demonstrably broken, and it is too easy to get the calling convention wrong. Improve the situation by introducing a new `precision` field into the structure. This field gets assigned automatically by `OPT_INTEGER_F()` and tracks the size of the passed value. Like this it becomes possible for the caller to pass arbitrarily-sized integers and the underlying logic knows to handle it correctly by doing range checks. Furthermore, convert the code to use `strtoimax()` intstead of `strtol()` so that we can also parse values larger than `LONG_MAX`. Note that we do not yet assert signedness of the passed variable, which is another source of bugs. This will be handled in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-17 08:15:15 -07:00
Patrick Steinhardt	785c17df78	parse-options: rename `OPT_MAGNITUDE()` to `OPT_UNSIGNED()` With the preceding commit, `OPT_INTEGER()` has learned to support unit factors. Consequently, the major differencen between `OPT_INTEGER()` and `OPT_MAGNITUDE()` isn't the support of unit factors anymore, as both of them do support them now. Instead, the difference is that one handles signed and the other handles unsigned integers. Adapt the name of `OPT_MAGNITUDE()` accordingly by renaming it to `OPT_UNSIGNED()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-17 08:15:15 -07:00
Patrick Steinhardt	d012ceb5f3	global: use designated initializers for options While we expose macros for most of our different option types understood by the "parse-options" subsystem, not every combination of fields that has one as that would otherwise quickly lead to an explosion of macros. Instead, we just initialize structures manually for those variants of fields that don't have a macro. Callsites that open-code these structure initialization don't use designated initializers though and instead just provide values for each of the fields that they want to initialize. This has three significant downsides: - Callsites need to specify all values up to the last field that they care about. This often includes fields that should simply be left at their default zero-initialized state, which adds distraction. - Any reader not deeply familiar with the layout of the structure has a hard time figuring out what the respective initializers mean. - Reordering or introducing new fields in the middle of the structure is impossible without adapting all callsites. Convert all sites to instead use designated initializers, which we have started using in our codebase quite a while ago. This allows us to skip any default-initialized fields, gives the reader context by specifying the field names and allows us to reorder or introduce new fields where we want to. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-17 08:15:15 -07:00
Junio C Hamano	9bdd7ecf7e	Merge branch 'ps/test-wo-perl-prereq' "make test" used to have a hard dependency on (basic) Perl; tests have been rewritten help environment with NO_PERL test the build as much as possible. * ps/test-wo-perl-prereq: t5703: refactor test to not depend on Perl t5316: refactor `max_chain()` to not depend on Perl t0210: refactor trace2 scrubbing to not use Perl t0021: refactor `generate_random_characters()` to not depend on Perl t/lib-httpd: refactor "one-time-perl" CGI script to not depend on Perl t/lib-t6000: refactor `name_from_description()` to not depend on Perl t/lib-gpg: refactor `sanitize_pgp()` to not depend on Perl t: refactor tests depending on Perl for textconv scripts t: refactor tests depending on Perl to print data t: refactor tests depending on Perl substitution operator t: refactor tests depending on Perl transliteration operator Makefile: stop requiring Perl when running tests meson: stop requiring Perl when tests are enabled t: adapt existing PERL prerequisites t: introduce PERL_TEST_HELPERS prerequisite t: adapt `test_readlink()` to not use Perl t: adapt `test_copy_bytes()` to not use Perl t: adapt character translation helpers to not use Perl t: refactor environment sanitization to not use Perl t: skip chain lint when PERL_PATH is unset	2025-04-16 13:54:20 -07:00
Junio C Hamano	ee847e0034	Merge branch 'ps/object-wo-the-repository' The object layer has been updated to take an explicit repository instance as a parameter in more code paths. * ps/object-wo-the-repository: hash: stop depending on `the_repository` in `null_oid()` hash: fix "-Wsign-compare" warnings object-file: split out logic regarding hash algorithms delta-islands: stop depending on `the_repository` object-file-convert: stop depending on `the_repository` pack-bitmap-write: stop depending on `the_repository` pack-revindex: stop depending on `the_repository` pack-check: stop depending on `the_repository` environment: move access to "core.bigFileThreshold" into repo settings pack-write: stop depending on `the_repository` and `the_hash_algo` object: stop depending on `the_repository` csum-file: stop depending on `the_repository`	2025-04-15 13:50:15 -07:00
Patrick Steinhardt	68cd492a3e	object-store: merge "object-store-ll.h" and "object-store.h" The "object-store-ll.h" header has been introduced to keep transitive header dependendcies and compile times at bay. Now that we have created a new "object-store.c" file though we can easily move the last remaining additional bit of "object-store.h", the `odb_path_map`, out of the header. Do so. As the "object-store.h" header is now equivalent to its low-level alternative we drop the latter and inline it into the former. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-15 08:24:37 -07:00
Junio C Hamano	0dfca98881	Merge branch 'ps/object-wo-the-repository' into ps/object-file-cleanup * ps/object-wo-the-repository: hash: stop depending on `the_repository` in `null_oid()` hash: fix "-Wsign-compare" warnings object-file: split out logic regarding hash algorithms delta-islands: stop depending on `the_repository` object-file-convert: stop depending on `the_repository` pack-bitmap-write: stop depending on `the_repository` pack-revindex: stop depending on `the_repository` pack-check: stop depending on `the_repository` environment: move access to "core.bigFileThreshold" into repo settings pack-write: stop depending on `the_repository` and `the_hash_algo` object: stop depending on `the_repository` csum-file: stop depending on `the_repository`	2025-04-08 14:28:17 -07:00
Patrick Steinhardt	e0011188ca	reftable/table: move printing logic into test helper The logic to print individual blocks in a table is hosted in the reftable library. This is only the case due to historical reasons though because users of the library had no interfaces to read blocks one by one. Otherwise, printing individual blocks has no place in the reftable library given that the format will not be generic in the first place. We have now grown a public interface to iterate through blocks contained in a table, and thus we can finally move the logic to print them into the test helper. Move over the logic and refactor it accordingly. Note that the iterator also trivially allows us to access index sections, which we previously didn't print at all. This omission wasn't intentional though, so start dumping those sections as well so that we can assert that indices are written as expected. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-07 14:53:13 -07:00
Patrick Steinhardt	b648bd6549	reftable/reader: rename data structure to "table" The `struct reftable_reader` subsystem encapsulates a table that has been read from the disk. As such, the current name of that structure is somewhat hard to understand as it only talks about the fact that we read something from disk, without really giving an indicator _what_ that is. Furthermore, this naming schema doesn't really fit well into how the other structures are named: `reftable_merged_table`, `reftable_stack`, `reftable_block` and `reftable_record` are all named after what they encapsulate. Rename the subsystem to `reftable_table`, which directly gives a hint that the data structure is about handling the individual tables part of the stack. While this change results in a lot of churn, it prepares for us exposing the APIs to third-party callers now that the reftable library is a standalone library that can be linked against by other projects. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-07 14:53:09 -07:00
Patrick Steinhardt	db8ff64a3a	t: refactor tests depending on Perl transliteration operator We have a bunch of tests that use Perl to perform character transliteration via the "y/" or "tr/" operator. These usecases can be trivially replaced with tr(1). Refactor the tests accordingly so that we can drop a couple of PERL_TEST_HELPERS prerequisites. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-07 14:47:38 -07:00
Patrick Steinhardt	21386ed6eb	t: adapt `test_readlink()` to not use Perl The `test_readlink()` helper function reads a symbolic link and returns the path it is pointing to. It is thus equivalent to the readlink(1) utility, which isn't available on all supported platforms. As such, it is implemented using Perl so that we can use it even on platforms where the shell utility isn't available. While using readlink(1) is not an option, what we can do is to implement the logic ourselves in our test-tool. Do so, which allows a bunch of tests to pass when Perl is not available. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-07 14:47:37 -07:00
Junio C Hamano	45e31f0bac	Merge branch 'js/mingw-admins-are-special' "Dubious ownership" checks on Windows has been tightened up. * js/mingw-admins-are-special: test-tool path-utils: support debugging "dubious ownership" issues mingw: special-case administrators even more	2025-04-07 14:23:20 -07:00
Junio C Hamano	8d6413a1be	Merge branch 'ps/refname-avail-check-optim' The code paths to check whether a refname X is available (by seeing if another ref X/Y exists, etc.) have been optimized. * ps/refname-avail-check-optim: refs: reuse iterators when determining refname availability refs/iterator: implement seeking for files iterators refs/iterator: implement seeking for packed-ref iterators refs/iterator: implement seeking for ref-cache iterators refs/iterator: implement seeking for reftable iterators refs/iterator: implement seeking for merged iterators refs/iterator: provide infrastructure to re-seek iterators refs/iterator: separate lifecycle from iteration refs: stop re-verifying common prefixes for availability refs/files: batch refname availability checks for initial transactions refs/files: batch refname availability checks for normal transactions refs/reftable: batch refname availability checks refs: introduce function to batch refname availability checks builtin/update-ref: skip ambiguity checks when parsing object IDs object-name: allow skipping ambiguity checks in `get_oid()` family object-name: introduce `repo_get_oid_with_flags()`	2025-03-29 16:39:07 +09:00
Johannes Schindelin	5bb88e89ef	test-tool path-utils: support debugging "dubious ownership" issues This adds a new sub-sub-command for `test-tool`, simply passing through the command-line arguments to the `is_path_owned_by_current_user()` function. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-25 04:45:56 -07:00
Patrick Steinhardt	cec2b6f55a	refs/iterator: separate lifecycle from iteration The ref and reflog iterators have their lifecycle attached to iteration: once the iterator reaches its end, it is automatically released and the caller doesn't have to care about that anymore. When the iterator should be released before it has been exhausted, callers must explicitly abort the iterator via `ref_iterator_abort()`. This lifecycle is somewhat unusual in the Git codebase and creates two problems: - Callsites need to be very careful about when exactly they call `ref_iterator_abort()`, as calling the function is only valid when the iterator itself still is. This leads to somewhat awkward calling patterns in some situations. - It is impossible to reuse iterators and re-seek them to a different prefix. This feature isn't supported by any iterator implementation except for the reftable iterators anyway, but if it was implemented it would allow us to optimize cases where we need to search for specific references repeatedly by reusing internal state. Detangle the lifecycle from iteration so that we don't deallocate the iterator anymore once it is exhausted. Instead, callers are now expected to always call a newly introduce `ref_iterator_free()` function that deallocates the iterator and its internal state. Note that the `dir_iterator` is somewhat special because it does not implement the `ref_iterator` interface, but is only used to implement other iterators. Consequently, we have to provide `dir_iterator_free()` instead of `dir_iterator_release()` as the allocated structure itself is managed by the `dir_iterator` interfaces, as well, and not freed by `ref_iterator_free()` like in all the other cases. While at it, drop the return value of `ref_iterator_abort()`, which wasn't really required by any of the iterator implementations anyway. Furthermore, stop calling `base_ref_iterator_free()` in any of the backends, but instead call it in `ref_iterator_free()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-12 11:31:18 -07:00
Patrick Steinhardt	7d70b29c4f	hash: stop depending on `the_repository` in `null_oid()` The `null_oid()` function returns the object ID that only consists of zeroes. Naturally, this ID also depends on the hash algorithm used, as the number of zeroes is different between SHA1 and SHA256. Consequently, the function returns the hash-algorithm-specific null object ID. This is currently done by depending on `the_hash_algo`, which implicitly makes us depend on `the_repository`. Refactor the function to instead pass in the hash algorithm for which we want to retrieve the null object ID. Adapt callsites accordingly by passing in `the_repository`, thus bubbling up the dependency on that global variable by one layer. There are a couple of trivial exceptions for subsystems that already got rid of `the_repository`. These subsystems instead use the repository that is available via the calling context: - "builtin/grep.c" - "grep.c" - "refs/debug.c" There are also two non-trivial exceptions: - "diff-no-index.c": Here we know that we may not have a repository initialized at all, so we cannot rely on `the_repository`. Instead, we adapt `diff_no_index()` to get a `struct git_hash_algo` as parameter. The only caller is located in "builtin/diff.c", where we know to call `repo_set_hash_algo()` in case we're running outside of a Git repository. Consequently, it is fine to continue passing `the_repository->hash_algo` even in this case. - "builtin/ls-files.c": There is an in-flight patch series that drops `USE_THE_REPOSITORY_VARIABLE` in this file, which causes a semantic conflict because we use `null_oid()` in `show_submodule()`. The value is passed to `repo_submodule_init()`, which may use the object ID to resolve a tree-ish in the superproject from which we want to read the submodule config. As such, the object ID should refer to an object in the superproject, and consequently we need to use its hash algorithm. This means that we could in theory just not bother about this edge case at all and just use `the_repository` in "diff-no-index.c". But doing so would feel misdesigned. Remove the `USE_THE_REPOSITORY_VARIABLE` preprocessor define in "hash.c". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-10 13:16:20 -07:00
Junio C Hamano	62c58891e1	Merge branch 'tz/doc-txt-to-adoc-fixes' Fallouts from recent renaming of documentation files from .txt suffix to the new .adoc suffix have been corrected. * tz/doc-txt-to-adoc-fixes: (38 commits) xdiff: .txt -> .adoc fixes unpack-trees.c: .txt -> .adoc fixes transport.h: .txt -> .adoc fixes trace2/tr2_sysenv.c: .txt -> .adoc fixes trace2.h: .txt -> .adoc fixes t6434: .txt -> .adoc fixes t6012: .txt -> .adoc fixes t/helper/test-rot13-filter.c: .txt -> .adoc fixes simple-ipc.h: .txt -> .adoc fixes setup.c: .txt -> .adoc fixes refs.h: .txt -> .adoc fixes pseudo-merge.h: .txt -> .adoc fixes parse-options.h: .txt -> .adoc fixes object-name.c: .txt -> .adoc fixes list-objects-filter-options.h: .txt -> .adoc fixes fsck.h: .txt -> .adoc fixes diffcore.h: .txt -> .adoc fixes diff.h: .txt -> .adoc fixes contrib/long-running-filter: .txt -> .adoc fixes config.c: .txt -> .adoc fixes ...	2025-03-06 14:06:31 -08:00
Junio C Hamano	feffb34257	Merge branch 'ps/path-sans-the-repository' The path.[ch] API takes an explicit repository parameter passed throughout the callchain, instead of relying on the_repository singleton instance. * ps/path-sans-the-repository: path: adjust last remaining users of `the_repository` environment: move access to "core.sharedRepository" into repo settings environment: move access to "core.hooksPath" into repo settings repo-settings: introduce function to clear struct path: drop `git_path()` in favor of `repo_git_path()` rerere: let `rerere_path()` write paths into a caller-provided buffer path: drop `git_common_path()` in favor of `repo_common_path()` worktree: return allocated string from `get_worktree_git_dir()` path: drop `git_path_buf()` in favor of `repo_git_path_replace()` path: drop `git_pathdup()` in favor of `repo_git_path()` path: drop unused `strbuf_git_path()` function path: refactor `repo_submodule_path()` family of functions submodule: refactor `submodule_to_gitdir()` to accept a repo path: refactor `repo_worktree_path()` family of functions path: refactor `repo_git_path()` family of functions path: refactor `repo_common_path()` family of functions	2025-03-05 10:37:43 -08:00
Todd Zullinger	e680c62542	t/helper/test-rot13-filter.c: .txt -> .adoc fixes Signed-off-by: Todd Zullinger <tmz@pobox.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-03 13:49:25 -08:00
Junio C Hamano	ab09eddf60	Merge branch 'ps/build-meson-fixes-0130' Assorted fixes and improvements to the build procedure based on meson. * ps/build-meson-fixes-0130: gitlab-ci: restrict maximum number of link jobs on Windows meson: consistently use custom program paths to resolve programs meson: fix overwritten `git` variable meson: prevent finding sed(1) in a loop meson: improve handling of `sane_tool_path` option meson: improve PATH handling meson: drop separate version library meson: stop linking libcurl into all executables meson: introduce `libgit_curl` dependency meson: simplify use of the common-main library meson: inline the static 'git' library meson: fix OpenSSL fallback when not explicitly required meson: fix exec path with enabled runtime prefix	2025-03-03 08:53:02 -08:00
Patrick Steinhardt	ebb35369f1	meson: simplify use of the common-main library The "common-main.c" file is used by multiple executables. In order to make it easy to set it up we have created a separate library that these executables can link against. All of these executables also want to link against `libgit.a` though, which makes it necessary to specify both of these as dependencies for every executable. Simplify this a bit by declaring the library as a source dependency: instead of creating a static library, we now instead compile the common set of files into each executable separately. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-26 09:09:35 -08:00
Junio C Hamano	e565f37553	Merge branch 'ds/backfill' Lazy-loading missing files in a blobless clone on demand is costly as it tends to be one-blob-at-a-time. "git backfill" is introduced to help bulk-download necessary files beforehand. * ds/backfill: backfill: assume --sparse when sparse-checkout is enabled backfill: add --sparse option backfill: add --min-batch-size=<n> option backfill: basic functionality and tests backfill: add builtin boilerplate	2025-02-18 15:30:31 -08:00
Junio C Hamano	aae91a86fb	Merge branch 'ds/name-hash-tweaks' "git pack-objects" and its wrapper "git repack" learned an option to use an alternative path-hash function to improve delta-base selection to produce a packfile with deeper history than window size. * ds/name-hash-tweaks: pack-objects: prevent name hash version change test-tool: add helper for name-hash values p5313: add size comparison test pack-objects: add GIT_TEST_NAME_HASH_VERSION repack: add --name-hash-version option pack-objects: add --name-hash-version option pack-objects: create new name-hash function version	2025-02-12 10:08:51 -08:00
Patrick Steinhardt	f5c714e2a7	path: refactor `repo_submodule_path()` family of functions As explained in an earlier commit, we're refactoring path-related functions to provide a consistent interface for computing paths into the commondir, gitdir and worktree. Refactor the "submodule" family of functions accordingly. Note that in contrast to the other `repo_*_path()` families, we have to pass in the repository as a non-constant pointer. This is because we end up calling `repo_read_gitmodules()` deep down in the callstack, which may end up modifying the repository. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-07 09:59:22 -08:00
Derrick Stolee	bff4555767	backfill: add --sparse option One way to significantly reduce the cost of a Git clone and later fetches is to use a blobless partial clone and combine that with a sparse-checkout that reduces the paths that need to be populated in the working directory. Not only does this reduce the cost of clones and fetches, the sparse-checkout reduces the number of objects needed to download from a promisor remote. However, history investigations can be expensive as computing blob diffs will trigger promisor remote requests for one object at a time. This can be avoided by downloading the blobs needed for the given sparse-checkout using 'git backfill' and its new '--sparse' mode, at a time that the user is willing to pay that extra cost. Note that this is distinctly different from the '--filter=sparse:<oid>' option, as this assumes that the partial clone has all reachable trees and we are using client-side logic to avoid downloading blobs outside of the sparse-checkout cone. This avoids the server-side cost of walking trees while also achieving a similar goal. It also downloads in batches based on similar path names, presenting a resumable download if things are interrupted. This augments the path-walk API to have a possibly-NULL 'pl' member that may point to a 'struct pattern_list'. This could be more general than the sparse-checkout definition at HEAD, but 'git backfill --sparse' is currently the only consumer. Be sure to test this in both cone mode and not cone mode. Cone mode has the benefit that the path-walk can skip certain paths once they would expand beyond the sparse-checkout. Non-cone mode can describe the included files using both positive and negative patterns, which changes the possible return values of path_matches_pattern_list(). Test both kinds of matches for increased coverage. To test this, we can create a blobless sparse clone, expand the sparse-checkout slightly, and then run 'git backfill --sparse' to see how much data is downloaded. The general steps are 1. git clone --filter=blob:none --sparse <url> 2. git sparse-checkout set <dir1> ... <dirN> 3. git backfill --sparse For the Git repository with the 'builtin' directory in the sparse-checkout, we get these results for various batch sizes: \| Batch Size \| Pack Count \| Pack Size \| Time \| \|-----------------\|------------\|-----------\|-------\| \| (Initial clone) \| 3 \| 110 MB \| \| \| 10K \| 12 \| 192 MB \| 17.2s \| \| 15K \| 9 \| 192 MB \| 15.5s \| \| 20K \| 8 \| 192 MB \| 15.5s \| \| 25K \| 7 \| 192 MB \| 14.7s \| This case matters less because a full clone of the Git repository from GitHub is currently at 277 MB. Using a copy of the Linux repository with the 'kernel/' directory in the sparse-checkout, we get these results: \| Batch Size \| Pack Count \| Pack Size \| Time \| \|-----------------\|------------\|-----------\|------\| \| (Initial clone) \| 2 \| 1,876 MB \| \| \| 10K \| 11 \| 2,187 MB \| 46s \| \| 25K \| 7 \| 2,188 MB \| 43s \| \| 50K \| 5 \| 2,194 MB \| 44s \| \| 100K \| 4 \| 2,194 MB \| 48s \| This case is more meaningful because a full clone of the Linux repository is currently over 6 GB, so this is a valuable way to download a fraction of the repository and no longer need network access for all reachable objects within the sparse-checkout. Choosing a batch size will depend on a lot of factors, including the user's network speed or reliability, the repository's file structure, and how many versions there are of the file within the sparse-checkout scope. There will not be a one-size-fits-all solution. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-03 16:12:42 -08:00
Junio C Hamano	caf17423d3	Merge branch 'tb/unsafe-hash-cleanup' The API around choosing to use unsafe variant of SHA-1 implementation has been updated in an attempt to make it harder to abuse. * tb/unsafe-hash-cleanup: hash.h: drop unsafe_ function variants csum-file: introduce hashfile_checkpoint_init() t/helper/test-hash.c: use unsafe_hash_algo() csum-file.c: use unsafe_hash_algo() hash.h: introduce `unsafe_hash_algo()` csum-file.c: extract algop from hashfile_checksum_valid() csum-file: store the hash algorithm as a struct field t/helper/test-tool: implement sha1-unsafe helper	2025-02-03 10:23:32 -08:00
Patrick Steinhardt	0578f1e66a	global: adapt callers to use generic hash context helpers Adapt callers to use generic hash context helpers instead of using the hash algorithm to update them. This makes the callsites easier to reason about and removes the possibility that the wrong hash algorithm is used to update the hash context's state. And as a nice side effect this also gets rid of a bunch of users of `the_hash_algo`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-31 10:06:11 -08:00
Patrick Steinhardt	7346e340f1	hash: stop typedeffing the hash context We generally avoid using `typedef` in the Git codebase. One exception though is the `git_hash_ctx`, likely because it used to be a union rather than a struct until the preceding commit refactored it. But now that it is a normal `struct` there isn't really a need for a typedef anymore. Drop the typedef and adapt all callers accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-31 10:06:10 -08:00
Junio C Hamano	0cbcba5455	Merge branch 'tb/unsafe-hash-cleanup' into ps/hash-cleanup * tb/unsafe-hash-cleanup: hash.h: drop unsafe_ function variants csum-file: introduce hashfile_checkpoint_init() t/helper/test-hash.c: use unsafe_hash_algo() csum-file.c: use unsafe_hash_algo() hash.h: introduce `unsafe_hash_algo()` csum-file.c: extract algop from hashfile_checksum_valid() csum-file: store the hash algorithm as a struct field t/helper/test-tool: implement sha1-unsafe helper	2025-01-31 10:05:46 -08:00
Junio C Hamano	f046ab2dd4	Merge branch 'ds/path-walk-1' Introduce a new API to visit objects in batches based on a common path, or by type. * ds/path-walk-1: path-walk: drop redundant parse_tree() call path-walk: reorder object visits path-walk: mark trees and blobs as UNINTERESTING path-walk: visit tags and cached objects path-walk: allow consumer to specify object types t6601: add helper for testing path-walk API test-lib-functions: add test_cmp_sorted path-walk: introduce an object walk by path	2025-01-29 14:05:09 -08:00
Junio C Hamano	f0a371a39d	Merge branch 'jc/show-usage-help' The help text from "git $cmd -h" appear on the standard output for some $cmd and the standard error for others. The built-in commands have been fixed to show them on the standard output consistently. * jc/show-usage-help: builtin: send usage() help text to standard output oddballs: send usage() help text to standard output builtins: send usage_with_options() help text to standard output usage: add show_usage_if_asked() parse-options: add show_usage_with_options_if_asked() t0012: optionally check that "-h" output goes to stdout	2025-01-28 13:02:22 -08:00
Derrick Stolee	7f9870794f	test-tool: add helper for name-hash values Add a new test-tool helper, name-hash, to output the value of the name-hash algorithms for the input list of strings, one per line. Since the name-hash values can be stored in the .bitmap files, it is important that these hash functions do not change across Git versions. Add a simple test to t5310-pack-bitmaps.sh to provide some testing of the current values. Due to how these functions are implemented, it would be difficult to change them without disturbing these values. The paths used for this test are carefully selected to demonstrate some of the behavior differences of the two current name hash versions, including which conditions will cause them to collide. Create a performance test that uses test_size to demonstrate how collisions occur for these hash algorithms. This test helps inform someone as to the behavior of the name-hash algorithms for their repo based on the paths at HEAD. My copy of the Git repository shows modest statistics around the collisions of the default name-hash algorithm: Test this tree -------------------------------------------------- 5314.1: paths at head 4.5K 5314.2: distinct hash value: v1 4.1K 5314.3: maximum multiplicity: v1 13 5314.4: distinct hash value: v2 4.2K 5314.5: maximum multiplicity: v2 9 Here, the maximum collision multiplicity is 13, but around 10% of paths have a collision with another path. In a more interesting example, the microsoft/fluentui [1] repo had these statistics at time of committing: Test this tree -------------------------------------------------- 5314.1: paths at head 19.5K 5314.2: distinct hash value: v1 8.2K 5314.3: maximum multiplicity: v1 279 5314.4: distinct hash value: v2 17.8K 5314.5: maximum multiplicity: v2 44 [1] https://github.com/microsoft/fluentui That demonstrates that of the nearly twenty thousand path names, they are assigned around eight thousand distinct values. 279 paths are assigned to a single value, leading the packing algorithm to sort objects from those paths together, by size. With the v2 name hash function, the maximum multiplicity lowers to 44, leaving some room for further improvement. In a more extreme example, an internal monorepo had a much worse collision rate: Test this tree -------------------------------------------------- 5314.1: paths at head 227.3K 5314.2: distinct hash value: v1 72.3K 5314.3: maximum multiplicity: v1 14.4K 5314.4: distinct hash value: v2 166.5K 5314.5: maximum multiplicity: v2 138 Here, we can see that the v2 name hash function provides somem improvements, but there are still a number of collisions that could lead to repacking problems at this scale. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-27 13:21:43 -08:00
Taylor Blau	3339180b28	t/helper/test-hash.c: use unsafe_hash_algo() Remove a series of conditionals within the shared cmd_hash_impl() helper that powers the 'sha1' and 'sha1-unsafe' helpers. Instead, replace them with a single conditional that transforms the specified hash algorithm into its unsafe variant. Then all subsequent calls can directly use whatever function it wants to call without having to decide between the safe and unsafe variants. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-23 10:28:17 -08:00
Taylor Blau	d9213e4716	t/helper/test-tool: implement sha1-unsafe helper With the new "unsafe" SHA-1 build knob, it is convenient to have a test-tool that can exercise Git's unsafe SHA-1 wrappers for testing, similar to 't/helper/test-tool sha1'. Implement that helper by altering the implementation of that test-tool (in cmd_hash_impl(), which is generic and parameterized over different hash functions) to conditionally run the unsafe variants of the chosen hash function, and expose the new behavior via a new 'sha1-unsafe' test helper. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-23 10:28:16 -08:00
Junio C Hamano	7b39a128c8	Merge branch 'ps/the-repository' More code paths have a repository passed through the callchain, instead of assuming the primary the_repository object. * ps/the-repository: match-trees: stop using `the_repository` graph: stop using `the_repository` add-interactive: stop using `the_repository` tmp-objdir: stop using `the_repository` resolve-undo: stop using `the_repository` credential: stop using `the_repository` mailinfo: stop using `the_repository` diagnose: stop using `the_repository` server-info: stop using `the_repository` send-pack: stop using `the_repository` serve: stop using `the_repository` trace: stop using `the_repository` pager: stop using `the_repository` progress: stop using `the_repository`	2025-01-21 08:44:54 -08:00
Junio C Hamano	cb441e1ec3	Merge branch 'ps/reftable-get-random-fix' The code to compute "unique" name used git_rand() which can fail or get stuck; the callsite does not require cryptographic security. Introduce the "insecure" mode and use it appropriately. * ps/reftable-get-random-fix: reftable/stack: accept insecure random bytes wrapper: allow generating insecure random bytes	2025-01-21 08:44:53 -08:00
Junio C Hamano	b821c999ca	builtins: send usage_with_options() help text to standard output Using the show_usage_with_options_if_asked() helper we introduced earlier, fix callers of usage_with_options() that want to show the help text when explicitly asked by the end-user. The help text now goes to the standard output stream for them. The test in t7600 for "git merge -h" may want to be retired, as the same is covered by t0012 already, but it is specifically testing that the "-h" option gets a response even with a corrupt index file, so for now let's leave it there. Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-17 13:30:03 -08:00
Patrick Steinhardt	1568d1562e	wrapper: allow generating insecure random bytes The `csprng_bytes()` function generates randomness and writes it into a caller-provided buffer. It abstracts over a couple of implementations, where the exact one that is used depends on the platform. These implementations have different guarantees: while some guarantee to never fail (arc4random(3)), others may fail. There are two significant failures to distinguish from one another: - Systemic failure, where e.g. opening "/dev/urandom" fails or when OpenSSL doesn't have a provider configured. - Entropy failure, where the entropy pool is exhausted, and thus the function cannot guarantee strong cryptographic randomness. While we cannot do anything about the former, the latter failure can be acceptable in some situations where we don't care whether or not the randomness can be predicted. Introduce a new `CSPRNG_BYTES_INSECURE` flag that allows callers to opt into weak cryptographic randomness. The exact behaviour of the flag depends on the underlying implementation: - `arc4random_buf()` never returns an error, so it doesn't change. - `getrandom()` pulls from "/dev/urandom" by default, which never blocks on modern systems even when the entropy pool is empty. - `getentropy()` seems to block when there is not enough randomness available, and there is no way of changing that behaviour. - `GtlGenRandom()` doesn't mention anything about its specific failure mode. - The fallback reads from "/dev/urandom", which also returns bytes in case the entropy pool is drained in modern Linux systems. That only leaves OpenSSL with `RAND_bytes()`, which returns an error in case the returned data wouldn't be cryptographically safe. This function is replaced with a call to `RAND_pseudo_bytes()`, which can indicate whether or not the returned data is cryptographically secure via its return value. If it is insecure, and if the `CSPRNG_BYTES_INSECURE` flag is set, then we ignore the insecurity and return the data regardless. It is somewhat questionable whether we really need the flag in the first place, or whether we wouldn't just ignore the potentially-insecure data. But the risk of doing that is that we might have or grow callsites that aren't aware of the potential insecureness of the data in places where it really matters. So using a flag to opt-in to that behaviour feels like the more secure choice. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-07 09:04:18 -08:00
Patrick Steinhardt	5e7fe8a7b8	commit-reach: use `size_t` to track indices when computing merge bases The functions `repo_get_merge_bases_many()` and friends accepts an array of commits as well as a parameter that indicates how large that array is. This parameter is using a signed integer, which leads to a couple of warnings with -Wsign-compare. Refactor the code to use `size_t` to track indices instead and adapt callers accordingly. While most callers are trivial, there are two callers that require a bit more scrutiny: - builtin/merge-base.c:show_merge_base() subtracts `1` from the `rev_nr` before calling `repo_get_merge_bases_many_dirty()`, so if the variable was `0` it would wrap. This code is fine though because its only caller will execute that code only when `argc >= 2`, and it follows that `rev_nr >= 2`, as well. - bisect.ccheck_merge_bases() similarly subtracts `1` from `rev_nr`. Again, there is only a single caller that populates `rev_nr` with `good_revs.nr`. And because a bisection always requires at least one good revision it follws that `rev_nr >= 1`. Mark the file as -Wsign-compare-clean. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-27 08:12:40 -08:00
Junio C Hamano	4156b6a741	Merge branch 'ps/build-sign-compare' Start working to make the codebase buildable with -Wsign-compare. * ps/build-sign-compare: t/helper: don't depend on implicit wraparound scalar: address -Wsign-compare warnings builtin/patch-id: fix type of `get_one_patchid()` builtin/blame: fix type of `length` variable when emitting object ID gpg-interface: address -Wsign-comparison warnings daemon: fix type of `max_connections` daemon: fix loops that have mismatching integer types global: trivial conversions to fix `-Wsign-compare` warnings pkt-line: fix -Wsign-compare warning on 32 bit platform csum-file: fix -Wsign-compare warning on 32-bit platform diff.h: fix index used to loop through unsigned integer config.mak.dev: drop `-Wno-sign-compare` global: mark code units that generate warnings with `-Wsign-compare` compat/win32: fix -Wsign-compare warning in "wWinMain()" compat/regex: explicitly ignore "-Wsign-compare" warnings git-compat-util: introduce macros to disable "-Wsign-compare" warnings	2024-12-23 09:32:11 -08:00
Derrick Stolee	6333e7ae0b	path-walk: mark trees and blobs as UNINTERESTING When the input rev_info has UNINTERESTING starting points, we want to be sure that the UNINTERESTING flag is passed appropriately through the objects. To match how this is done in places such as 'git pack-objects', we use the mark_edges_uninteresting() method. This method has an option for using the "sparse" walk, which is similar in spirit to the path-walk API's walk. To be sure to keep it independent, add a new 'prune_all_uninteresting' option to the path_walk_info struct. To check how the UNINTERSTING flag is spread through our objects, extend the 'test-tool path-walk' command to output whether or not an object has that flag. This changes our tests significantly, including the removal of some objects that were previously visited due to the incomplete implementation. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-20 08:37:05 -08:00

1 2 3 4 5 ...

1048 Commits (maint)