kernel/git - git - PowerEL Git System

Commit Graph

Author	SHA1	Message	Date
Junio C Hamano	b3d1c85d48	Merge branch 'gc/config-context' Reduce reliance on a global state in the config reading API. * gc/config-context: config: pass source to config_parser_event_fn_t config: add kvi.path, use it to evaluate includes config.c: remove config_reader from configsets config: pass kvi to die_bad_number() trace2: plumb config kvi config.c: pass ctx with CLI config config: pass ctx with config files config.c: pass ctx in configsets config: add ctx arg to config_fn_t urlmatch.h: use config_fn_t type config: inline git_color_default_config	2023-07-06 11:54:48 -07:00
Calvin Wan	91c080dff5	git-compat-util: move alloc macros to git-compat-util.h alloc_nr, ALLOC_GROW, and ALLOC_GROW_BY are commonly used macros for dynamic array allocation. Moving these macros to git-compat-util.h with the other alloc macros focuses alloc.[ch] to allocation for Git objects and additionally allows us to remove inclusions to alloc.h from files that solely used the above macros. Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-05 11:42:31 -07:00
Calvin Wan	da9502ff4d	treewide: remove unnecessary includes for wrapper.h Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-05 11:41:59 -07:00
Junio C Hamano	a1264a08a1	Merge branch 'en/header-split-cache-h-part-3' Header files cleanup. * en/header-split-cache-h-part-3: (28 commits) fsmonitor-ll.h: split this header out of fsmonitor.h hash-ll, hashmap: move oidhash() to hash-ll object-store-ll.h: split this header out of object-store.h khash: name the structs that khash declares merge-ll: rename from ll-merge git-compat-util.h: remove unneccessary include of wildmatch.h builtin.h: remove unneccessary includes list-objects-filter-options.h: remove unneccessary include diff.h: remove unnecessary include of oidset.h repository: remove unnecessary include of path.h log-tree: replace include of revision.h with simple forward declaration cache.h: remove this no-longer-used header read-cache*.h: move declarations for read-cache.c functions from cache.h repository.h: move declaration of the_index from cache.h merge.h: move declarations for merge.c from cache.h diff.h: move declaration for global in diff.c from cache.h preload-index.h: move declarations for preload-index.c from elsewhere sparse-index.h: move declarations for sparse-index.c from cache.h name-hash.h: move declarations for name-hash.c from cache.h run-command.h: move declarations for run-command.c from cache.h ...	2023-06-29 16:43:21 -07:00
Glen Choo	8868b1ebfb	config: pass kvi to die_bad_number() Plumb "struct key_value_info" through all code paths that end in die_bad_number(), which lets us remove the helper functions that read analogous values from "struct config_reader". As a result, nothing reads config_reader.config_kvi any more, so remove that too. In config.c, this requires changing the signature of git_configset_get_value() to 'return' "kvi" in an out parameter so that git_configset_get_<type>() can pass it to git_config_<type>(). Only numeric types will use "kvi", so for non-numeric types (e.g. git_configset_get_string()), pass NULL to indicate that the out parameter isn't needed. Outside of config.c, config callbacks now need to pass "ctx->kvi" to any of the git_config_<type>() functions that parse a config string into a number type. Included is a .cocci patch to make that refactor. The only exceptional case is builtin/config.c, where git_config_<type>() is called outside of a config callback (namely, on user-provided input), so config source information has never been available. In this case, die_bad_number() defaults to a generic, but perfectly descriptive message. Let's provide a safe, non-NULL for "kvi" anyway, but make sure not to change the message. Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-06-28 14:06:40 -07:00
Glen Choo	a4e7e317f8	config: add ctx arg to config_fn_t Add a new "const struct config_context *ctx" arg to config_fn_t to hold additional information about the config iteration operation. config_context has a "struct key_value_info kvi" member that holds metadata about the config source being read (e.g. what kind of config source it is, the filename, etc). In this series, we're only interested in .kvi, so we could have just used "struct key_value_info" as an arg, but config_context makes it possible to add/adjust members in the future without changing the config_fn_t signature. We could also consider other ways of organizing the args (e.g. moving the config name and value into config_context or key_value_info), but in my experiments, the incremental benefit doesn't justify the added complexity (e.g. a config_fn_t will sometimes invoke another config_fn_t but with a different config value). In subsequent commits, the .kvi member will replace the global "struct config_reader" in config.c, making config iteration a global-free operation. It requires much more work for the machinery to provide meaningful values of .kvi, so for now, merely change the signature and call sites, pass NULL as a placeholder value, and don't rely on the arg in any meaningful way. Most of the changes are performed by contrib/coccinelle/config_fn_ctx.pending.cocci, which, for every config_fn_t: - Modifies the signature to accept "const struct config_context *ctx" - Passes "ctx" to any inner config_fn_t, if needed - Adds UNUSED attributes to "ctx", if needed Most config_fn_t instances are easily identified by seeing if they are called by the various config functions. Most of the remaining ones are manually named in the .cocci patch. Manual cleanups are still needed, but the majority of it is trivial; it's either adjusting config_fn_t that the .cocci patch didn't catch, or adding forward declarations of "struct config_context ctx" to make the signatures make sense. The non-trivial changes are in cases where we are invoking a config_fn_t outside of config machinery, and we now need to decide what value of "ctx" to pass. These cases are: - trace2/tr2_cfg.c:tr2_cfg_set_fl() This is indirectly called by git_config_set() so that the trace2 machinery can notice the new config values and update its settings using the tr2 config parsing function, i.e. tr2_cfg_cb(). - builtin/checkout.c:checkout_main() This calls git_xmerge_config() as a shorthand for parsing a CLI arg. This might be worth refactoring away in the future, since git_xmerge_config() can call git_default_config(), which can do much more than just parsing. Handle them by creating a KVI_INIT macro that initializes "struct key_value_info" to a reasonable default, and use that to construct the "ctx" arg. Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-06-28 14:06:39 -07:00
Elijah Newren	a034e9106f	object-store-ll.h: split this header out of object-store.h The vast majority of files including object-store.h did not need dir.h nor khash.h. Split the header into two files, and let most just depend upon object-store-ll.h, while letting the two callers that need it depend on the full object-store.h. After this patch: $ git grep -h include..object-store | sort | uniq -c 2 #include "object-store.h" 129 #include "object-store-ll.h" Diff best viewed with `--color-moved`. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-06-21 13:39:54 -07:00
Derrick Stolee	d24eda4e03	repository: create disable_replace_refs() Several builtins depend on being able to disable the replace references so we actually operate on each object individually. These currently do so by directly mutating the 'read_replace_refs' global. A future change will move this global into a different place, so it will be necessary to change all of these lines. However, we can simplify that transition by abstracting the purpose of these global assignments with a method call. We will need to keep this read_replace_refs global forever, as we want to make sure that we never use replace refs throughout the life of the process if this method is called. Future changes may present a repository-scoped version of the variable to represent that repository's core.useReplaceRefs config value, but a zero-valued read_replace_refs will always override such a setting. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-06-12 13:34:55 -07:00
Junio C Hamano	849c8b3dbf	Merge branch 'tb/pack-revindex-on-disk' The on-disk reverse index that allows mapping from the pack offset to the object name for the object stored at the offset has been enabled by default. * tb/pack-revindex-on-disk: t: invert `GIT_TEST_WRITE_REV_INDEX` config: enable `pack.writeReverseIndex` by default pack-revindex: introduce `pack.readReverseIndex` pack-revindex: introduce GIT_TEST_REV_INDEX_DIE_ON_DISK pack-revindex: make `load_pack_revindex` take a repository t5325: mark as leak-free pack-write.c: plug a leak in stage_tmp_packfiles()	2023-04-27 16:00:59 -07:00
Taylor Blau	9f7f10a282	t: invert `GIT_TEST_WRITE_REV_INDEX` Back in `e8c58f894b` (t: support GIT_TEST_WRITE_REV_INDEX, 2021-01-25), we added a test knob to conditionally enable writing a ".rev" file when indexing a pack. At the time, this was used to ensure that the test suite worked even when ".rev" files were written, which served as a stress-test for the on-disk reverse index implementation. Now that reading from on-disk ".rev" files is enabled by default, the test knob `GIT_TEST_WRITE_REV_INDEX` no longer has any meaning. We could get rid of the option entirely, but there would be no convenient way to test Git when ".rev" files aren't in place. Instead of getting rid of the option, invert its meaning to instead disable writing ".rev" files, thereby running the test suite in a mode where the reverse index is generated from scratch. This ensures that, when GIT_TEST_NO_WRITE_REV_INDEX is set to some spelling of "true", we are still running and exercising Git's behavior when forced to generate reverse indexes from scratch. Do so by setting it in the linux-TEST-vars CI run to ensure that we are maintaining good coverage of this now-legacy code. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-13 07:55:46 -07:00
Taylor Blau	a8dd7e05b1	config: enable `pack.writeReverseIndex` by default Back in `e37d0b8730` (builtin/index-pack.c: write reverse indexes, 2021-01-25), Git learned how to read and write a pack's reverse index from a file instead of in-memory. A pack's reverse index is a mapping from pack position (that is, the order that objects appear together in a ".pack") to their position in lexical order (that is, the order that objects are listed in an ".idx" file). Reverse indexes are consulted often during pack-objects, as well as during auxiliary operations that require mapping between pack offsets, pack order, and index order. They are useful in GitHub's infrastructure, where we have seen a dramatic increase in performance when writing ".rev" files[1]. In particular: - an ~80% reduction in the time it takes to serve fetches on a popular repository, Homebrew/homebrew-core. - a ~60% reduction in the peak memory usage to serve fetches on that same repository. - a collective savings of ~35% in CPU time across all pack-objects invocations serving fetches across all repositories in a single datacenter. Reverse indexes are also beneficial to end-users as well as forges. For example, the time it takes to generate a pack containing the objects for the 10 most recent commits in linux.git (representing a typical push) is significantly faster when on-disk reverse indexes are available: $ { git rev-parse HEAD && printf '^' && git rev-parse HEAD~10; } >in $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} pack-objects --delta-base-offset --revs --stdout <in >/dev/null' Benchmark 1: git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null Time (mean ± σ): 543.0 ms ± 20.3 ms [User: 616.2 ms, System: 58.8 ms] Range (min … max): 521.0 ms … 577.9 ms 10 runs Benchmark 2: git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null Time (mean ± σ): 245.0 ms ± 11.4 ms [User: 335.6 ms, System: 31.3 ms] Range (min … max): 226.0 ms … 259.6 ms 13 runs Summary 'git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null' ran 2.22 ± 0.13 times faster than 'git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null' The same is true of writing a pack containing the objects for the 30 most-recent commits: $ { git rev-parse HEAD && printf '^' && git rev-parse HEAD~30; } >in $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} pack-objects --delta-base-offset --revs --stdout <in >/dev/null' Benchmark 1: git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null Time (mean ± σ): 866.5 ms ± 16.2 ms [User: 1414.5 ms, System: 97.0 ms] Range (min … max): 839.3 ms … 886.9 ms 10 runs Benchmark 2: git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null Time (mean ± σ): 581.6 ms ± 10.2 ms [User: 1181.7 ms, System: 62.6 ms] Range (min … max): 567.5 ms … 599.3 ms 10 runs Summary 'git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null' ran 1.49 ± 0.04 times faster than 'git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null' ...and savings on trivial operations like computing the on-disk size of a single (packed) object are even more dramatic: $ git rev-parse HEAD >in $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} cat-file --batch-check="%(objectsize:disk)" <in' Benchmark 1: git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" <in Time (mean ± σ): 305.8 ms ± 11.4 ms [User: 264.2 ms, System: 41.4 ms] Range (min … max): 290.3 ms … 331.1 ms 10 runs Benchmark 2: git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" <in Time (mean ± σ): 4.0 ms ± 0.3 ms [User: 1.7 ms, System: 2.3 ms] Range (min … max): 1.6 ms … 4.6 ms 1155 runs Summary 'git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" <in' ran 76.96 ± 6.25 times faster than 'git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" <in' In the more than two years since `e37d0b8730` was merged, Git's implementation of on-disk reverse indexes has been thoroughly tested, both from users enabling `pack.writeReverseIndex`, and from GitHub's deployment of the feature. The latter has been running without incident for more than two years. This patch changes Git's behavior to write on-disk reverse indexes by default when indexing a pack, which should make the above operations faster for everybody's Git installation after a repack. (The previous commit explains some potential drawbacks of using on-disk reverse indexes in certain limited circumstances, that essentially boil down to a trade-off between time to generate, and time to access. For those limited cases, the `pack.readReverseIndex` escape hatch can be used). [1]: https://github.blog/2021-04-29-scaling-monorepo-maintenance/#reverse-indexes Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-13 07:55:46 -07:00
Elijah Newren	87bed17907	object-file.h: move declarations for object-file.c functions from cache.h Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-11 08:52:10 -07:00
Elijah Newren	6f2d743043	treewide: be explicit about dependence on oid-array.h Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-11 08:52:09 -07:00
Elijah Newren	75f273d9b7	treewide: be explicit about dependence on pack-revindex.h Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-11 08:52:09 -07:00
Junio C Hamano	6047b28eb7	Merge branch 'en/header-split-cleanup' Split key function and data structure definitions out of cache.h to new header files and adjust the users. * en/header-split-cleanup: csum-file.h: remove unnecessary inclusion of cache.h write-or-die.h: move declarations for write-or-die.c functions from cache.h treewide: remove cache.h inclusion due to setup.h changes setup.h: move declarations for setup.c functions from cache.h treewide: remove cache.h inclusion due to environment.h changes environment.h: move declarations for environment.c functions from cache.h treewide: remove unnecessary includes of cache.h wrapper.h: move declarations for wrapper.c functions from cache.h path.h: move function declarations for path.c functions from cache.h cache.h: remove expand_user_path() abspath.h: move absolute path functions from cache.h environment: move comment_line_char from cache.h treewide: remove unnecessary cache.h inclusion from several sources treewide: remove unnecessary inclusion of gettext.h treewide: be explicit about dependence on gettext.h treewide: remove unnecessary cache.h inclusion from a few headers	2023-04-06 13:38:31 -07:00
Junio C Hamano	72871b198f	Merge branch 'ab/remove-implicit-use-of-the-repository' Code clean-up around the use of the_repository. * ab/remove-implicit-use-of-the-repository: libs: use "struct repository *" argument, not "the_repository" post-cocci: adjust comments for recent repo_ migration cocci: apply the "revision.h" part of "the_repository.pending" cocci: apply the "rerere.h" part of "the_repository.pending" cocci: apply the "refs.h" part of "the_repository.pending" cocci: apply the "promisor-remote.h" part of "the_repository.pending" cocci: apply the "packfile.h" part of "the_repository.pending" cocci: apply the "pretty.h" part of "the_repository.pending" cocci: apply the "object-store.h" part of "the_repository.pending" cocci: apply the "diff.h" part of "the_repository.pending" cocci: apply the "commit.h" part of "the_repository.pending" cocci: apply the "commit-reach.h" part of "the_repository.pending" cocci: apply the "cache.h" part of "the_repository.pending" cocci: add missing "the_repository" macros to "pending" cocci: sort "the_repository" rules by header cocci: fix incorrect & verbose "the_repository" rules cocci: remove dead rule from "the_repository.pending.cocci"	2023-04-06 13:38:30 -07:00
Junio C Hamano	e7dca80692	Merge branch 'ab/remove-implicit-use-of-the-repository' into en/header-split-cache-h * ab/remove-implicit-use-of-the-repository: libs: use "struct repository *" argument, not "the_repository" post-cocci: adjust comments for recent repo_ migration cocci: apply the "revision.h" part of "the_repository.pending" cocci: apply the "rerere.h" part of "the_repository.pending" cocci: apply the "refs.h" part of "the_repository.pending" cocci: apply the "promisor-remote.h" part of "the_repository.pending" cocci: apply the "packfile.h" part of "the_repository.pending" cocci: apply the "pretty.h" part of "the_repository.pending" cocci: apply the "object-store.h" part of "the_repository.pending" cocci: apply the "diff.h" part of "the_repository.pending" cocci: apply the "commit.h" part of "the_repository.pending" cocci: apply the "commit-reach.h" part of "the_repository.pending" cocci: apply the "cache.h" part of "the_repository.pending" cocci: add missing "the_repository" macros to "pending" cocci: sort "the_repository" rules by header cocci: fix incorrect & verbose "the_repository" rules cocci: remove dead rule from "the_repository.pending.cocci"	2023-04-04 08:25:52 -07:00
Ævar Arnfjörð Bjarmason	a5183d7696	cocci: apply the "promisor-remote.h" part of "the_repository.pending" Apply the part of "the_repository.pending.cocci" pertaining to "promisor-remote.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-28 07:36:46 -07:00
Ævar Arnfjörð Bjarmason	bc726bd075	cocci: apply the "object-store.h" part of "the_repository.pending" Apply the part of "the_repository.pending.cocci" pertaining to "object-store.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-28 07:36:45 -07:00
Elijah Newren	e38da487cc	setup.h: move declarations for setup.c functions from cache.h Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-21 10:56:54 -07:00
Elijah Newren	32a8f51061	environment.h: move declarations for environment.c functions from cache.h Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-21 10:56:53 -07:00
Elijah Newren	d5ebb50dcb	wrapper.h: move declarations for wrapper.c functions from cache.h Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-21 10:56:53 -07:00
Elijah Newren	f394e093df	treewide: be explicit about dependence on gettext.h Dozens of files made use of gettext functions, without explicitly including gettext.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include gettext.h if they are using it. However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an include of gettext.h, it was left out to avoid conflicting with an in-flight topic. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-21 10:56:51 -07:00
Elijah Newren	cbeab74713	replace-object.h: move read_replace_refs declaration from cache.h to here Adjust several files to be more explicit about their dependency on replace-objects to accommodate this change. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-02-23 17:25:30 -08:00
Elijah Newren	41771fa435	cache.h: remove dependence on hex.h; make other files include it explicitly Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-02-23 17:25:29 -08:00
Elijah Newren	36bf195890	alloc.h: move ALLOC_GROW() functions from cache.h This allows us to replace includes of cache.h with includes of the much smaller alloc.h in many places. It does mean that we also need to add includes of alloc.h in a number of C files. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-02-23 17:25:28 -08:00
Jiang Xin	b4eda05d58	i18n: fix mismatched camelCase config variables Some config variables are combinations of multiple words, and we typically write them in camelCase forms in manpage and translatable strings. It's not easy to find mismatches for these camelCase config variables during code reviews, but occasionally they are identified during localization translations. To check for mismatched config variables, I introduced a new feature in the helper program for localization[^1]. The following mismatched config variables have been identified by running the helper program, such as "git-po-helper check-pot". Lowercase in manpage should use camelCase: * Documentation/config/http.txt: http.pinnedpubkey Lowercase in translatable strings should use camelCase: * builtin/fast-import.c: pack.indexversion * builtin/gc.c: gc.logexpiry * builtin/index-pack.c: pack.indexversion * builtin/pack-objects.c: pack.indexversion * builtin/repack.c: pack.writebitmaps * commit.c: i18n.commitencoding * gpg-interface.c: user.signingkey * http.c: http.postbuffer * submodule-config.c: submodule.fetchjobs Mismatched camelCases, choose the former: * Documentation/config/transfer.txt: transfer.credentialsInUrl remote.c: transfer.credentialsInURL [^1]: https://github.com/git-l10n/git-po-helper Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-06-17 10:38:26 -07:00
Junio C Hamano	538dc459a0	Merge branch 'ep/maint-equals-null-cocci' Introduce and apply coccinelle rule to discourage an explicit comparison between a pointer and NULL, and applies the clean-up to the maintenance track. * ep/maint-equals-null-cocci: tree-wide: apply equals-null.cocci tree-wide: apply equals-null.cocci contrib/coccinnelle: add equals-null.cocci	2022-05-20 15:26:59 -07:00
Junio C Hamano	2b0a58d164	Merge branch 'ep/maint-equals-null-cocci' for maint-2.35 * ep/maint-equals-null-cocci: tree-wide: apply equals-null.cocci contrib/coccinnelle: add equals-null.cocci	2022-05-02 10:06:04 -07:00
Junio C Hamano	afe8a9070b	tree-wide: apply equals-null.cocci Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-02 09:50:37 -07:00
Junio C Hamano	eb804cd405	Merge branch 'ns/core-fsyncmethod' Replace core.fsyncObjectFiles with two new configuration variables, core.fsync and core.fsyncMethod. * ns/core-fsyncmethod: core.fsync: documentation and user-friendly aggregate options core.fsync: new option to harden the index core.fsync: add configuration parsing core.fsync: introduce granular fsync control infrastructure core.fsyncmethod: add writeout-only mode wrapper: make inclusion of Windows csprng header tightly scoped	2022-03-25 16:38:24 -07:00
Junio C Hamano	38bbb9e990	Merge branch 'ab/string-list-count-in-size-t' Count string_list items in size_t, not "unsigned int". * ab/string-list-count-in-size-t: string-list API: change "nr" and "alloc" to "size_t" gettext API users: don't explicitly cast ngettext()'s "n"	2022-03-16 17:53:09 -07:00
Junio C Hamano	430883a70c	Merge branch 'ab/object-file-api-updates' Object-file API shuffling. * ab/object-file-api-updates: object-file API: pass an enum to read_object_with_reference() object-file.c: add a literal version of write_object_file_prepare() object-file API: have hash_object_file() take "enum object_type" object API: rename hash_object_file_literally() to write_*() object-file API: split up and simplify check_object_signature() object API users + docs: check <0, not !0 with check_object_signature() object API docs: move check_object_signature() docs to cache.h object API: correct "buf" v.s. "map" mismatch in *.c and *.h object-file API: have write_object_file() take "enum object_type" object-file API: add a format_object_header() function object-file API: return "void", not "int" from hash_object_file() object-file.c: split up declaration of unrelated variables	2022-03-16 17:53:08 -07:00
Junio C Hamano	ccafbbfb4e	Merge branch 'ab/plug-random-leaks' Plug random memory leaks. * ab/plug-random-leaks: repository.c: free the "path cache" in repo_clear() range-diff: plug memory leak in read_patches() range-diff: plug memory leak in common invocation lockfile API users: simplify and don't leak "path" commit-graph: stop fill_oids_from_packs() progress on error and free() commit-graph: fix memory leak in misused string_list API submodule--helper: fix trivial leak in module_add() transport: stop needlessly copying bundle header references bundle: call strvec_clear() on allocated strvec remote-curl.c: free memory in cmd_main() urlmatch.c: add and use a _release() function diff.c: free "buf" in diff_words_flush() merge-base: free() allocated "struct commit *" list index-pack: fix memory leaks	2022-03-13 22:56:18 +00:00
Neeraj Singh	020406eaa5	core.fsync: introduce granular fsync control infrastructure This commit introduces the infrastructure for the core.fsync configuration knob. The repository components we want to sync are identified by flags so that we can turn on or off syncing for specific components. If core.fsyncObjectFiles is set and the core.fsync configuration also includes FSYNC_COMPONENT_LOOSE_OBJECT, we will fsync any loose objects. This picks the strictest data integrity behavior if core.fsync and core.fsyncObjectFiles are set to conflicting values. This change introduces the currently unused fsync_component helper, which will be used by a later patch that adds fsyncing to the refs backend. Actual configuration and documentation of the fsync components list are in other patches in the series to separate review of the underlying mechanism from the policy of how it's configured. Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-10 15:10:22 -08:00
Ævar Arnfjörð Bjarmason	6f69325258	gettext API users: don't explicitly cast ngettext()'s "n" Change a few stray users of the inline gettext.h Q_() function to stop casting its "n" argument; the vast majority of the users of that wrapper API use the implicit cast to "unsigned long". The ngettext() function (which Q_() resolves to) takes an "unsigned long int", and so does our Q_() wrapper for it, see `0c9ea33b90` (i18n: add stub Q_() wrapper for ngettext, 2011-03-09). The function isn't ours, but provided by e.g. GNU libintl. This amends code added in `7171a0b0cf` (index-pack: correct "len" type in unpack_data(), 2016-07-13). The cast it added for the printf format to die() was needed, but not the cast to Q_(). Likewise the casts in strbuf.c added in `8f354a1fae` (l10n: localizable upload progress messages, 2019-07-02) and for builtin/merge-recursive.c in `ccf7813139` (i18n: merge-recursive: mark error messages for translation, 2016-09-15) weren't needed. In the latter case the cast was copy/pasted from the argument to warning() itself, added in `b74d779bd9` (MinGW: Fix compiler warning in merge-recursive, 2009-05-23). The cast for warning() is needed, but not the one for ngettext()'s "n" argument. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-07 11:57:52 -08:00
Ævar Arnfjörð Bjarmason	f2bcc69e7e	index-pack: fix memory leaks Fix various memory leaks in "git index-pack", due to how tightly coupled this command is with the revision walking this doesn't make any new tests pass. But e.g. this now passes, and had several failures before, i.e. we still have failures in tests 3, 5 etc., which are being skipped here. ./t5300-pack-object.sh --run=1-2,4,6-27,30-42 It is a bit odd that we'll free "opts.anomaly", since the "opts" is a "struct pack_idx_option" declared in pack.h. In pack-write.c there's a reset_pack_idx_option(), but it only wipes the contents, but doesn't free() anything. Doing this here in cmd_index_pack() is correct because while the struct is declared in pack.h, this code in builtin/index-pack.c (in read_v2_anomalous_offsets()) is what allocates the "opts.anomaly", so we should also free it here. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-04 13:24:17 -08:00
Ævar Arnfjörð Bjarmason	44439c1c58	object-file API: have hash_object_file() take "enum object_type" Change the hash_object_file() function to take an "enum object_type". Since a preceding commit all of its callers are passing either "{commit,tree,blob,tag}_type", or the result of a call to type_name(), the parse_object() caller that would pass NULL is now using stream_object_signature(). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-02-25 17:16:32 -08:00
Ævar Arnfjörð Bjarmason	0f156dbb04	object-file API: split up and simplify check_object_signature() Split up the check_object_signature() function into that non-streaming version (it accepts an already filled "buf"), and a new stream_object_signature() which will retrieve the object from storage, and hash it on-the-fly. All of the callers of check_object_signature() were effectively calling two different functions, if we go by cyclomatic complexity. I.e. they'd either take the early "if (map)" branch and return early, or not. This has been the case since the "if (map)" condition was added in `090ea12671` (parse_object: avoid putting whole blob in core, 2012-03-07). We can then further simplify the resulting check_object_signature() function since only one caller wanted to pass a non-NULL "buf" and a non-NULL "real_oidp". That "read_loose_object()" codepath used by "git fsck" can instead use hash_object_file() followed by oideq(). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-02-25 17:16:31 -08:00
Ævar Arnfjörð Bjarmason	ee213de22d	object API users + docs: check <0, not !0 with check_object_signature() Change those users of the object API that misused check_object_signature() by assuming it returned any non-zero value when the OID didn't match the expected value to check <0 instead. In practice all of this code worked before, but it wasn't consistent with the rest of the users of the API. Let's also clarify what the <0 return value means in API docs. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-02-25 17:16:31 -08:00
Ævar Arnfjörð Bjarmason	b04cdea46c	object-file API: add a format_object_header() function Add a convenience function to wrap the xsnprintf() command that generates loose object headers. This code was copy/pasted in various parts of the codebase, let's define it in one place and re-use it from there. All except one caller of it had a valid "enum object_type" for us, it's only write_object_file_prepare() which might need to deal with "git hash-object --literally" and a potential garbage type. Let's have the primary API use an "enum object_type", and define a _literally() function that can take an arbitrary "const char *" for the type. See [1] for the discussion that prompted this patch, i.e. new code in object-file.c that wanted to copy/paste the xsnprintf() invocation. In the case of fast-import.c the callers unfortunately need to cast back & forth between "unsigned char *" and "char *", since format_object_header() and encode_in_pack_object_header() take different signedness. 1. https://lore.kernel.org/git/211213.86bl1l9bfz.gmgdl@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-02-25 17:16:31 -08:00
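The commit above centralizes the snprintf-style call that builds a loose object header. As a rough illustration only, the following C sketch formats a header of the form "<type> <size>" followed by a NUL byte. The name sketch_format_header(), its parameters, and its error convention are assumptions made for this example; this is not Git's format_object_header().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not Git's API): write "<type> <size>" plus the
 * terminating NUL into buf. Returns the header length including that
 * NUL, or -1 if the header does not fit into bufsz bytes.
 */
static int sketch_format_header(char *buf, size_t bufsz,
                                const char *type, unsigned long objsize)
{
    int n;

    assert(buf != NULL && type != NULL);
    n = snprintf(buf, bufsz, "%s %lu", type, objsize);
    if (n < 0 || (size_t)n >= bufsz)
        return -1;  /* output error or truncated header */
    return n + 1;   /* count the NUL byte as part of the header */
}
```

A caller would pass a small stack buffer and hash the returned number of bytes ahead of the object contents. Taking the type as a string mirrors the "literally" variant described above; an enum-based wrapper would map the enum to its name first.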
Matt Cooper	0cf5fbc2e4	index-pack: clarify the breached limit As a small courtesy to users, report what limit was breached. This is especially useful when a push exceeds a server-defined limit, since the user is unlikely to have configured the limit (their host did). Also demonstrate the human-readable message in a test. Helped-by: Taylor Blau <me@ttaylorr.com> Helped-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Matt Cooper <vtbassmatt@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-02-23 17:41:10 -08:00
Jean-Noël Avila	6fa00ee843	i18n: factorize "--foo requires --bar" and the like They are all replaced by "the option '%s' requires '%s'", which is a new string but replaces 17 previous unique strings. Signed-off-by: Jean-Noël Avila <jn.avila@free.fr> Reviewed-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-01-05 13:31:00 -08:00
Jean-Noël Avila	12909b6b8a	i18n: turn "options are incompatible" into "cannot be used together" Signed-off-by: Jean-Noël Avila <jn.avila@free.fr> Reviewed-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-01-05 13:29:23 -08:00
Jiang Xin	f733719316	i18n: fix typos found during l10n for git 2.34.0 Emir and Jean-Noël reported typos in some i18n messages when preparing l10n for git 2.34.0. * Fix unstable spelling of config variable "gpg.ssh.defaultKeyCommand" which was introduced in commit `fd9e226776` (ssh signing: retrieve a default key from ssh-agent, 2021-09-10). * Add missing space between "with" and "--python" which was introduced in commit `bd0708c7eb` (ref-filter: add %(raw) atom, 2021-07-26). * Fix unmatched single quote in 'builtin/index-pack.c' which was introduced in commit `8737dab346` (index-pack: refactor renaming in final(), 2021-09-09) [1] https://github.com/git-l10n/git-po/pull/567 Reported-by: Emir Sarı <bitigchi@me.com> Reported-by: Jean-Noël Avila <jn.avila@free.fr> Signed-off-by: Jiang Xin <worldhello.net@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-10-31 22:49:49 -07:00
Junio C Hamano	061a21d36d	Merge branch 'ab/fsck-unexpected-type' "git fsck" has been taught to report mismatch between expected and actual types of an object better. * ab/fsck-unexpected-type: fsck: report invalid object type-path combinations fsck: don't hard die on invalid object types object-file.c: stop dying in parse_loose_header() object-file.c: return ULHR_TOO_LONG on "header too long" object-file.c: use "enum" return type for unpack_loose_header() object-file.c: simplify unpack_loose_short_header() object-file.c: make parse_loose_header_extended() public object-file.c: return -1, not "status" from unpack_loose_header() object-file.c: don't set "typep" when returning non-zero cat-file tests: test for current --allow-unknown-type behavior cat-file tests: add corrupt loose object test cat-file tests: test for missing/bogus object with -t, -s and -p cat-file tests: move bogus_* variable declarations earlier fsck tests: test for garbage appended to a loose object fsck tests: test current hash/type mismatch behavior fsck tests: refactor one test to use a sub-repo fsck tests: add test for fsck-ing an unknown type	2021-10-25 16:06:56 -07:00
Ævar Arnfjörð Bjarmason	96e41f58fe	fsck: report invalid object type-path combinations Improve the error that's emitted in cases where we find a loose object we parse, but which isn't at the location we expect it to be. Before this change we'd prefix the error with a not-a-OID derived from the path at which the object was found, due to an emergent behavior in how we'd end up with an "OID" in these codepaths. Now we'll instead say what object we hashed, and what path it was found at. Before this patch series e.g.: $ git hash-object --stdin -w -t blob </dev/null `e69de29bb2` $ mv objects/e6/ objects/e7 Would emit ("[...]" used to abbreviate the OIDs): git fsck error: hash mismatch for ./objects/e7/9d[...] (expected e79d[...]) error: e79d[...]: object corrupt or missing: ./objects/e7/9d[...] Now we'll instead emit: error: e69d[...]: hash-path mismatch, found at: ./objects/e7/9d[...] Furthermore, we'll do the right thing when the object type and its location are bad. I.e. this case: $ git hash-object --stdin -w -t garbage --literally </dev/null 8315a83d2acc4c174aed59430f9a9c4ed926440f $ mv objects/83 objects/84 As noted in earlier commits we'd simply die early in those cases, until preceding commits fixed the hard die on invalid object type: $ git fsck fatal: invalid object type Now we'll instead emit sensible error messages: $ git fsck error: 8315[...]: hash-path mismatch, found at: ./objects/84/15[...] error: 8315[...]: object is of unknown type 'garbage': ./objects/84/15[...] In both fsck.c and object-file.c we're using null_oid as a sentinel value for checking whether we got far enough to be certain that the issue was indeed this OID mismatch. We need to add the "object corrupt or missing" special-case to deal with cases where read_loose_object() will return an error before completing check_object_signature(), e.g. 
if we have an error in unpack_loose_rest() because we find garbage after the valid gzip content: $ git hash-object --stdin -w -t blob </dev/null `e69de29bb2` $ chmod 755 objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 $ echo garbage >>objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 $ git fsck error: garbage at end of loose object 'e69d[...]' error: unable to unpack contents of ./objects/e6/9d[...] error: e69d[...]: object corrupt or missing: ./objects/e6/9d[...] There is currently some weird messaging in the edge case when the two are combined, i.e. because we're not explicitly passing along an error state about this specific scenario from check_stream_oid() via read_loose_object() we'll end up printing the null OID if an object is of an unknown type and it can't be unpacked by zlib, e.g.: $ git hash-object --stdin -w -t garbage --literally </dev/null 8315a83d2acc4c174aed59430f9a9c4ed926440f $ chmod 755 objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f $ echo garbage >>objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f $ /usr/bin/git fsck fatal: invalid object type $ ~/g/git/git fsck error: garbage at end of loose object '8315a83d2acc4c174aed59430f9a9c4ed926440f' error: unable to unpack contents of ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f error: 8315a83d2acc4c174aed59430f9a9c4ed926440f: object corrupt or missing: ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f error: 0000000000000000000000000000000000000000: object is of unknown type 'garbage': ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f [...] I think it's OK to leave that for future improvements, which would involve enum-ifying more error state as we've done with "enum unpack_loose_header_result" in preceding commits. In these increasingly more obscure cases the worst that can happen is that we'll get slightly nonsensical or inapplicable error messages. 
There's other such potential edge cases, all of which might produce some confusing messaging, but still be handled correctly as far as passing along errors goes. E.g. if check_object_signature() returns and oideq(real_oid, null_oid()) is true, which could happen if it returns -1 due to the read_istream() call having failed. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-10-01 15:06:01 -07:00
Junio C Hamano	b1b065ee35	Merge branch 'rs/use-xopen-in-index-pack' Code clean-up. * rs/use-xopen-in-index-pack: index-pack: use xopen in init_thread	2021-09-23 13:44:50 -07:00
Junio C Hamano	67fc02be54	Merge branch 'ab/unbundle-progress' Add progress display to "git bundle unbundle". * ab/unbundle-progress: bundle: show progress on "unbundle" index-pack: add --progress-title option bundle API: change "flags" to be "extra_index_pack_args" bundle API: start writing API documentation	2021-09-20 15:20:42 -07:00
Junio C Hamano	a1af533323	Merge branch 'tb/pack-finalize-ordering' The order in which various files that make up a single (conceptual) packfile has been reevaluated and straightened up. This matters in correctness, as an incomplete set of files must not be shown to a running Git. * tb/pack-finalize-ordering: pack-objects: rename .idx files into place after .bitmap files pack-write: split up finish_tmp_packfile() function builtin/index-pack.c: move `.idx` files into place last index-pack: refactor renaming in final() builtin/repack.c: move `.idx` files into place last pack-write.c: rename `.idx` files after `*.rev` pack-write: refactor renaming in finish_tmp_packfile() bulk-checkin.c: store checksum directly pack.h: line-wrap the definition of finish_tmp_packfile()	2021-09-20 15:20:42 -07:00
René Scharfe	6346f704a0	index-pack: use xopen in init_thread Support an arbitrary file descriptor expression in the semantic patch for replacing open+die_errno with xopen, not just an identifier, and apply it. This makes the error message at the single affected place more consistent and reduces code duplication. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-10 14:22:50 -07:00
Taylor Blau	522a5c2cf5	builtin/index-pack.c: move `.idx` files into place last In a similar spirit as preceding patches to `git repack` and `git pack-objects`, fix the identical problem in `git index-pack`. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-09 18:23:11 -07:00
Ævar Arnfjörð Bjarmason	8737dab346	index-pack: refactor renaming in final() Refactor the renaming in final() into a helper function, this is similar in spirit to a preceding refactoring of finish_tmp_packfile() in pack-write.c. Before `e37d0b8730` (builtin/index-pack.c: write reverse indexes, 2021-01-25) it probably wasn't worth it to have this sort of helper, due to the differing "else if" case for "pack" files v.s. "idx" files. But since we've got "rev" as well now, let's do the renaming via a helper, this is both a net decrease in lines, and improves the readability, since we can easily see at a glance that the logic for writing these three types of files is exactly the same, aside from the obviously differing cases of "*final_name" being NULL, and "make_read_only_if_same" being different. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-09 18:23:11 -07:00
Ævar Arnfjörð Bjarmason	f46c46e4f2	index-pack: add --progress-title option Add a --progress-title option to index-pack. When data is piped into index-pack its progress is a proxy for whatever's feeding it data. This option will allow us to set a more relevant progress bar title in "git bundle unbundle", and is also used in my "bundle-uri" RFC patches[1] by a new caller in fetch-pack.c. The code change in cmd_index_pack() won't handle "--progress-title=xyz", only "--progress-title xyz", and the "(i+1)" style (as opposed to "i + 1") is a bit odd. Not using the "--long-option=value" style is inconsistent with existing long options handled by cmd_index_pack(), but makes the code that needs to call it better (two strvec_push(), instead of needing a strvec_pushf()). Since the option is internal-only the inconsistency shouldn't matter. I'm copying the pattern to handle it as-is from the handling of the existing "-o" option in the same function, see `9cf6d3357a` (Add git-index-pack utility, 2005-10-12) for its addition. That's a short option, but the code to implement the two is the same in functionality and style. Eventually we'd like to migrate all of this to parse_options(), which would make these differences in behavior go away. 1. https://lore.kernel.org/git/RFC-cover-00.13-0000000000-20210805T150534Z-avarab@gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-07 10:59:23 -07:00
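The commit message above contrasts the "--progress-title xyz" form (value in the following argument) with the unsupported "--progress-title=xyz" form. The C sketch below shows one way such a hand-rolled loop could look. The function name sketch_parse_progress_title() and its return convention are hypothetical; the real cmd_index_pack() loop handles many more options and reports errors differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: scan argv for "--progress-title <value>".
 * Only the separate-argument form is recognized, so
 * "--progress-title=xyz" is ignored by this loop.
 * Returns 0 on success (*title stays NULL if the option is absent)
 * and -1 if the option is the last argument and has no value.
 */
static int sketch_parse_progress_title(int argc, const char **argv,
                                       const char **title)
{
    int i;

    assert(argv != NULL && title != NULL);
    *title = NULL;
    for (i = 1; i < argc; i++) {
        if (!strcmp(argv[i], "--progress-title")) {
            if (i + 1 >= argc)
                return -1;      /* option given without a value */
            *title = argv[++i]; /* consume the following argument */
        }
    }
    return 0;
}
```

On the calling side this form lets the caller append the option name and its value as two separate arguments, which is the convenience the commit message describes.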
René Scharfe	66e905b7dd	use xopen() to handle fatal open(2) failures Add and apply a semantic patch for using xopen() instead of calling open(2) and die() or die_errno() explicitly. This makes the error messages more consistent and shortens the code. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-08-25 14:39:08 -07:00
Ævar Arnfjörð Bjarmason	103e02c700	*.c static functions: don't forward-declare __attribute__ In `9cf6d3357a` (Add git-index-pack utility, 2005-10-12) and `466dbc42f5` (receive-pack: Send internal errors over side-band #2, 2010-02-10) we added these static functions and forward-declared their __attribute__((printf)). I think this may have been to work around some compiler limitation at the time, but in any case we have a lot of code that uses the briefer way of declaring these that I'm using here, so if we had any such issues with compilers we'd have seen them already. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-12 12:09:53 -07:00
brian m. carlson	5951bf467e	Use the final_oid_fn to finalize hashing of object IDs When we're hashing a value which is going to be an object ID, we want to zero-pad that value if necessary. To do so, use the final_oid_fn instead of the final_fn anytime we're going to create an object ID to ensure we perform this operation. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-04-27 16:31:38 +09:00
brian m. carlson	92e2cab96b	Always use oidread to read into struct object_id In the future, we'll want oidread to automatically set the hash algorithm member for an object ID we read into it, so ensure we use oidread instead of hashcpy everywhere we're copying a hash value into a struct object_id. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-04-27 16:31:38 +09:00
Junio C Hamano	5644419d04	Merge branch 'ab/fsck-api-cleanup' Fsck API clean-up. * ab/fsck-api-cleanup: fetch-pack: use new fsck API to printing dangling submodules fetch-pack: use file-scope static struct for fsck_options fetch-pack: don't needlessly copy fsck_options fsck.c: move gitmodules_{found,done} into fsck_options fsck.c: add an fsck_set_msg_type() API that takes enums fsck.c: pass along the fsck_msg_id in the fsck_error callback fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from .c to .h fsck.c: give "FOREACH_MSG_ID" a more specific name fsck.c: undefine temporary STR macro after use fsck.c: call parse_msg_type() early in fsck_set_msg_type() fsck.h: re-order and re-assign "enum fsck_msg_type" fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum fsck.c: refactor fsck_msg_type() to limit scope of "int msg_type" fsck.c: rename remaining fsck_msg_id "id" to "msg_id" fsck.c: remove (mostly) redundant append_msg_id() function fsck.c: rename variables in fsck_set_msg_type() for less confusion fsck.h: use "enum object_type" instead of "int" fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT} fsck.c: refactor and rename common config callback	2021-04-07 16:54:09 -07:00
Ævar Arnfjörð Bjarmason	3745e2693d	fetch-pack: use new fsck API to printing dangling submodules Refactor the check added in `5476e1efde` (fetch-pack: print and use dangling .gitmodules, 2021-02-22) to make use of us now passing the "msg_id" to the user defined "error_func". We can now compare against the FSCK_MSG_GITMODULES_MISSING instead of parsing the generated message. Let's also replace register_found_gitmodules() with directly manipulating the "gitmodules_found" member. A recent commit moved it into "fsck_options" so we could do this here. I'm sticking this callback in fsck.c. Perhaps in the future we'd like to accumulate such callbacks into another file (maybe fsck-cb.c, similar to parse-options-cb.c?), but while we've got just the one let's just put it into fsck.c. A better alternative in this case would be some more obvious library shared by fetch-pack.c and builtin/index-pack.c, but there isn't such a thing. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-28 19:03:10 -07:00
Ævar Arnfjörð Bjarmason	462f5cae0f	fetch-pack: don't needlessly copy fsck_options Change the behavior of the .gitmodules validation added in `5476e1efde` (fetch-pack: print and use dangling .gitmodules, 2021-02-22) so we're using one "fsck_options". I found that code confusing to read. One might think that not setting up the error_func earlier means that we're relying on the "error_func" not being set in some code in between the two hunks being modified here. But we're not, all we're doing in the rest of "cmd_index_pack()" is further setup by calling fsck_set_msg_types(), and assigning to do_fsck_object. So there was no reason in `5476e1efde` to make a shallow copy of the fsck_options struct before setting error_func. Let's just do this setup at the top of the function, along with the "walk" assignment. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-28 19:03:10 -07:00
Ævar Arnfjörð Bjarmason	394d5d31b0	fsck.c: pass along the fsck_msg_id in the fsck_error callback Change the fsck_error callback to also pass along the fsck_msg_id. Before this change the only way to get the message id was to parse it back out of the "message". Let's pass it down explicitly for the benefit of callers that might want to use it, as discussed in [1]. Passing the msg_type is now redundant, as you can always get it back from the msg_id, but I'm not changing that convention. It's really common to need the msg_type, and the report() function itself (which calls "fsck_error") needs to call fsck_msg_type() to discover it. Let's not needlessly re-do that work in the user callback. 1. https://lore.kernel.org/git/87blcja2ha.fsf@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-28 19:03:10 -07:00
Ævar Arnfjörð Bjarmason	1b32b59f9b	fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum Move the FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} defines into a new fsck_msg_type enum. These defines were originally introduced in: - `ba002f3b28` (builtin-fsck: move common object checking code to fsck.c, 2008-02-25) - `f50c440730` (fsck: disallow demoting grave fsck errors to warnings, 2015-06-22) - `efaba7cc77` (fsck: optionally ignore specific fsck issues completely, 2015-06-22) - `f27d05b170` (fsck: allow upgrading fsck warnings to errors, 2015-06-22) The reason these were defined in two different places is because we use FSCK_{IGNORE,INFO,FATAL} only in fsck.c, but FSCK_{ERROR,WARN} are used by external callbacks. Untangling that would take some more work, since we expose the new "enum fsck_msg_type" to both. Similar to "enum object_type" it's not worth structuring the API in such a way that only those who need FSCK_{ERROR,WARN} pass around a different type. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-28 19:03:10 -07:00
Ævar Arnfjörð Bjarmason	a1aad71601	fsck.h: use "enum object_type" instead of "int" Change the fsck_walk_func to use an "enum object_type" instead of an "int" type. The types are compatible, and ever since this was added in `355885d531` (add generic, type aware object chain walker, 2008-02-25) we've used entries from object_type (OBJ_BLOB etc.). So this doesn't really change anything as far as the generated code is concerned, it just gives the compiler more information and makes this easier to read. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-28 19:03:10 -07:00
René Scharfe	ca56dadb4b	use CALLOC_ARRAY Add and apply a semantic patch for converting code that open-codes CALLOC_ARRAY to use it instead. It shortens the code and infers the element size automatically. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-13 16:00:09 -08:00
Junio C Hamano	6ee353d42f	Merge branch 'jt/transfer-fsck-across-packs' The approach to "fsck" the incoming objects in "index-pack" is attractive for performance reasons (we have them already in core, inflated and ready to be inspected), but fundamentally cannot be applied fully when we receive more than one pack stream, as a tree object in one pack may refer to a blob object in another pack as ".gitmodules", when we want to inspect blobs that are used as ".gitmodules" file, for example. Teach "index-pack" to emit objects that must be inspected later and check them in the calling "fetch-pack" process. * jt/transfer-fsck-across-packs: fetch-pack: print and use dangling .gitmodules fetch-pack: with packfile URIs, use index-pack arg http-fetch: allow custom index-pack args http: allow custom index-pack args	2021-03-01 14:02:57 -08:00
Jonathan Tan	5476e1efde	fetch-pack: print and use dangling .gitmodules Teach index-pack to print dangling .gitmodules links after its "keep" or "pack" line instead of declaring an error, and teach fetch-pack to check such lines printed. This allows the tree side of the .gitmodules link to be in one packfile and the blob side to be in another without failing the fsck check, because it is now fetch-pack which checks such objects after all packfiles have been downloaded and indexed (and not index-pack on an individual packfile, as it is before this commit). Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-22 12:07:40 -08:00
Taylor Blau	e8c58f894b	t: support GIT_TEST_WRITE_REV_INDEX Add a new option that unconditionally enables the pack.writeReverseIndex setting in order to run the whole test suite in a mode that generates on-disk reverse indexes. Additionally, enable this mode in the second run of tests under linux-gcc in 'ci/run-build-and-tests.sh'. Once on-disk reverse indexes are proven out over several releases, we can change the default value of that configuration to 'true', and drop this patch. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-25 18:32:44 -08:00
Taylor Blau	e37d0b8730	builtin/index-pack.c: write reverse indexes Teach 'git index-pack' to optionally write and verify reverse index with '--[no-]rev-index', as well as respecting the 'pack.writeReverseIndex' configuration option. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-25 18:32:43 -08:00
Taylor Blau	84d544943c	builtin/index-pack.c: allow stripping arbitrary extensions To derive the filename for a .idx file, 'git index-pack' uses derive_filename() to strip the '.pack' suffix and add the new suffix. Prepare for stripping off suffixes other than '.pack' by making the suffix to strip a parameter of derive_filename(). In order to make this consistent with the "suffix" parameter which does not begin with a ".", add an additional check in derive_filename(). Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-25 18:32:43 -08:00
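The suffix handling described in the commit above can be sketched in C roughly as follows. Everything here is an assumption for illustration: the name sketch_derive_filename(), the fixed output buffer, and returning -1 instead of dying are not taken from builtin/index-pack.c. Both the extension to strip and the new suffix are passed without a leading ".", matching the convention the message mentions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch: replace a trailing ".<strip>" in name with
 * ".<suffix>" and store the result in out. Neither strip nor suffix
 * starts with a dot. Returns 0 on success, -1 if name does not end
 * in ".<strip>" or if out is too small.
 */
static int sketch_derive_filename(const char *name, const char *strip,
                                  const char *suffix,
                                  char *out, size_t outsz)
{
    size_t name_len, strip_len, stem_len;
    int n;

    assert(name && strip && suffix && out);
    name_len = strlen(name);
    strip_len = strlen(strip);

    /* The name must hold at least "." plus the extension to strip. */
    if (name_len < strip_len + 1)
        return -1;
    stem_len = name_len - strip_len - 1;

    /* This is the extra check: the dot must precede the extension. */
    if (name[stem_len] != '.' || strcmp(name + stem_len + 1, strip) != 0)
        return -1;

    n = snprintf(out, outsz, "%.*s.%s", (int)stem_len, name, suffix);
    if (n < 0 || (size_t)n >= outsz)
        return -1;
    return 0;
}
```

With this shape, the same helper can derive a ".idx" or a ".rev" name from a ".pack" name by changing only the last string argument.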
Martin Ågren	e5afd4449d	object-file.c: rename from sha1-file.c Drop the last remnant of "sha1" in this file and rename it to reflect that we're not just able to handle SHA-1 these days. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 13:01:55 -08:00
Jeff King	f86f769550	compute pack .idx byte offsets using size_t A pack and its matching .idx file are limited to 2^32 objects, because the pack format contains a 32-bit field to store the number of objects. Hence we use uint32_t in the code. But the byte count of even a .idx file can be much larger than that, because it stores at least a hash and an offset for each object. So using SHA-1, a v2 .idx file will cross the 4GB boundary at 153,391,650 objects. This confuses load_idx(), which computes the minimum size like this: unsigned long min_size = 8 + 4*256 + nr*(hashsz + 4 + 4) + hashsz + hashsz; Even though min_size will be big enough on most 64-bit platforms, the actual arithmetic is done as a uint32_t, resulting in a truncation. We actually exceed that min_size, but then we do: unsigned long max_size = min_size; if (nr) max_size += (nr - 1)*8; to account for the variable-sized table. That computation doesn't overflow quite so low, but with the truncation for min_size, we end up with a max_size that is much smaller than our actual size. So we complain that the idx is invalid, and can't find any of its objects. We can fix this case by casting "nr" to a size_t, which will do the multiplication in 64-bits (assuming you're on a 64-bit platform; this will never work on a 32-bit system since we couldn't map the whole .idx anyway). Likewise, we don't have to worry about further additions, because adding a smaller number to a size_t will convert the other side to a size_t. A few notes: - obviously we could just declare "nr" as a size_t in the first place (and likewise, packed_git.num_objects). But it's conceptually a uint32_t because of the on-disk format, and we correctly treat it that way in other contexts that don't need to compute byte offsets (e.g., iterating over the set of objects should and generally does use a uint32_t). Switching to size_t would make all of those other cases look wrong. 
- it could be argued that the proper type is off_t to represent the file offset. But in practice the .idx file must fit within memory, because we mmap the whole thing. And the rest of the code (including the idx_size variable we're comparing against) uses size_t. - we'll add the same cast to the max_size arithmetic line. Even though we're adding to a larger type, which will convert our result, the multiplication is still done as a 32-bit value and can itself overflow. I didn't check this with my test case, since it would need an even larger pack (~530M objects), but looking at compiler output shows that it works this way. The standard should agree, but I couldn't find anything explicit in 6.3.1.8 ("usual arithmetic conversions"). The case in load_idx() was the most immediate one that I was able to trigger. After fixing it, looking up actual objects (including the very last one in sha1 order) works in a test repo with 153,725,110 objects. That's because bsearch_hash() works with uint32_t entry indices, and the actual byte access: int cmp = hashcmp(table + mi * stride, sha1); is done with "stride" as a size_t, causing the uint32_t "mi" to be promoted to a size_t. This is the way most code will access the index data. However, I audited all of the other byte-wise accesses of packed_git.index_data, and many of the others are suspect (they are similar to the max_size one, where we are adding to a properly sized offset or directly to a pointer, but the multiplication in the sub-expression can overflow). I didn't trigger any of these in practice, but I believe they're potential problems, and certainly adding in the cast is not going to hurt anything here. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-16 13:41:35 -08:00
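The truncation described in the commit above comes from doing the multiplication in 32 bits before the result is widened. The following C sketch isolates just that sub-expression. The function names are made up for this example, and 28 is the per-object size assumed here for SHA-1 (a 20-byte hash plus two 4-byte fields, as in the quoted formula).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helpers isolating the per-object table size.
 * Both operands are 32-bit here, so this product is computed
 * modulo 2^32 before anything wider can see it.
 */
static uint32_t sketch_table_bytes_u32(uint32_t nr, uint32_t entry_sz)
{
    assert(entry_sz > 0);
    return nr * entry_sz;
}

/*
 * Casting one operand to size_t first makes the whole product a
 * size_t, which is 64 bits wide on typical 64-bit platforms.
 */
static size_t sketch_table_bytes_sz(uint32_t nr, uint32_t entry_sz)
{
    assert(entry_sz > 0);
    return (size_t)nr * entry_sz;
}
```

For the 153,725,110-object repository mentioned above, the true product 28 * 153,725,110 = 4,304,303,080 exceeds 2^32 = 4,294,967,296, so the 32-bit version wraps to 9,335,784 while the size_t version keeps the full value on a 64-bit platform.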
Junio C Hamano	c7ac8c0a7c	Merge branch 'jk/index-pack-hotfixes' Hotfix and clean-up for the jt/threaded-index-pack topic that has graduated to v2.29-rc0. * jk/index-pack-hotfixes: index-pack: make get_base_data() comment clearer index-pack: drop type_cas mutex index-pack: restore "resolving deltas" progress meter	2020-10-08 21:53:26 -07:00
Jonathan Tan	ec6a8f9705	index-pack: make get_base_data() comment clearer A comment mentions that we may free cached delta bases via find_unresolved_deltas(), but that function went away in `f08cbf60fe` (index-pack: make quantum of work smaller, 2020-09-08). Since we need to rewrite that comment anyway, make the entire comment clearer. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-10-07 13:32:27 -07:00
Jeff King	bebe171947	index-pack: drop type_cas mutex The type_cas lock lost all of its callers in `f08cbf60fe` (index-pack: make quantum of work smaller, 2020-09-08), so we can safely delete it. The compiler didn't alert us that the variable became unused, because we still call pthread_mutex_init() and pthread_mutex_destroy() on it. It's worth considering also whether that commit was in error to remove the use of the lock. Why don't we need it now, if we did before, as described in `ab791dd138` (index-pack: fix race condition with duplicate bases, 2014-08-29)? I think the answer is that we now look at and assign the child_obj->real_type field in the main thread while holding the work_lock(). So we don't have to worry about racing with the worker threads. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-10-07 11:51:26 -07:00
Jeff King	cea69151a4	index-pack: restore "resolving deltas" progress meter Commit `f08cbf60fe` (index-pack: make quantum of work smaller, 2020-09-08) refactored the main loop in threaded_second_pass(), but also deleted the call to display_progress() at the top of the loop. This means that users typically see no progress at all during the delta resolution phase (and for large repositories, Git appears to hang). This looks like an accident that was unrelated to the intended change of that commit, since we continue to update nr_resolved_deltas in resolve_delta(). Let's restore the call to get that progress back. We'll also add a test that confirms we generate the expected progress. This isn't perfect, as it wouldn't catch a bug where progress was delayed to the end. That was probably possible to trigger when receiving a thin pack, because we'd eventually call display_progress() from fix_unresolved_deltas(), but only once after doing all the work. However, since our test case generates a complete pack, it reliably demonstrates this particular bug and its fix. And we can't do better without making the test racy. Signed-off-by: Jeff King <peff@peff.net> Acked-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-10-07 11:50:09 -07:00
Junio C Hamano	b7e65b51e5	Merge branch 'jt/threaded-index-pack' "git index-pack" learned to resolve deltified objects with greater parallelism. * jt/threaded-index-pack: index-pack: make quantum of work smaller index-pack: make resolve_delta() assume base data index-pack: calculate {ref,ofs}_{first,last} early index-pack: remove redundant child field index-pack: unify threaded and unthreaded code index-pack: remove redundant parameter Documentation: deltaBaseCacheLimit is per-thread	2020-09-22 12:36:28 -07:00
Jonathan Tan	f08cbf60fe	index-pack: make quantum of work smaller Currently, when index-pack resolves deltas, it does not split up delta trees into threads: each delta base root (an object that is not a REF_DELTA or OFS_DELTA) can go into its own thread, but all deltas on that root (direct or indirect) are processed in the same thread. This is a problem when a repository contains a large text file (thus, delta-able) that is modified many times - delta resolution time during fetching is dominated by processing the deltas corresponding to that text file. This patch contains a solution to that. When cloning using git -c core.deltabasecachelimit=1g clone \ https://fuchsia.googlesource.com/third_party/vulkan-cts on my laptop, clone time improved from 3m2s to 2m5s (using 3 threads, which is the default). The solution is to have a global work stack. This stack contains delta bases (objects, whether appearing directly in the packfile or generated by delta resolution, that themselves have delta children) that need to be processed; whenever a thread needs work, it peeks at the top of the stack and processes its next unprocessed child. If a thread finds the stack empty, it will look for more delta base roots to push on the stack instead. The main weakness of having a global work stack is that more time is spent in the mutex, but profiling has shown that most time is spent in the resolution of the deltas themselves, so this shouldn't be an issue in practice. In any case, experimentation (as described in the clone command above) shows that this patch is a net improvement. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-08 15:52:17 -07:00
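The global work stack described in f08cbf60fe can be illustrated with a small model. This is a minimal sketch, not the index-pack code: `Base`, `resolve_all`, and the `children_of` mapping are hypothetical names, a Python lock stands in for the work mutex, and the delta application itself is reduced to a placeholder comment.

```python
import threading


class Base:
    """A delta base together with a cursor over its unprocessed children."""

    def __init__(self, name, children):
        self.name = name
        self.children = children
        self.next_child = 0


def resolve_all(roots, children_of, nr_threads=3):
    """Resolve every delta reachable from `roots`; return the resolved names."""
    lock = threading.Lock()              # stands in for the work mutex
    stack = []                           # global stack of delta bases
    pending_roots = list(reversed(roots))
    resolved = []

    def next_child():
        with lock:
            while True:
                # Discard bases whose children have all been handed out.
                while stack and stack[-1].next_child == len(stack[-1].children):
                    stack.pop()
                if stack:
                    top = stack[-1]
                    child = top.children[top.next_child]
                    top.next_child += 1
                    return child
                if not pending_roots:
                    return None          # nothing left for this thread
                # Stack is empty: look for another delta base root.
                root = pending_roots.pop()
                stack.append(Base(root, children_of.get(root, [])))

    def worker():
        while True:
            child = next_child()
            if child is None:
                return
            # The expensive delta application would run here, outside the lock.
            with lock:
                resolved.append(child)
                grandchildren = children_of.get(child, [])
                if grandchildren:
                    # A resolved object with delta children becomes a new base.
                    stack.append(Base(child, grandchildren))

    threads = [threading.Thread(target=worker) for _ in range(nr_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return resolved
```

Because each child is handed out under the lock, a long chain of deltas on a single root can be shared among threads instead of being pinned to one, which is the improvement the commit message describes.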
Jonathan Tan	ee6f058384	index-pack: make resolve_delta() assume base data A subsequent commit will make the quantum of work smaller, necessitating more locking. This commit allows resolve_delta() to be called outside the lock. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-08-24 14:14:52 -07:00
Jonathan Tan	b4718cae51	index-pack: calculate {ref,ofs}_{first,last} early This is refactoring 2 of 2 to simplify struct base_data. Whenever we make a struct base_data, immediately calculate its delta children. This eliminates confusion as to when the {ref,ofs}_{first,last} fields are initialized. Before this patch, the delta children were calculated at the last possible moment. This allowed the members of struct base_data to be populated in any order, superficially useful when we have the object contents before the struct object_entry. But this makes reasoning about the state of struct base_data more complicated, hence this patch. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-08-24 14:12:58 -07:00
Jonathan Tan	a7f7e84a49	index-pack: remove redundant child field This is refactoring 1 of 2 to simplify struct base_data. In index-pack, each thread maintains a doubly-linked list of the delta chain that it is currently processing (the "base" and "child" pointers in struct base_data). When a thread exceeds the delta base cache limit and needs to reclaim memory, it uses the "child" pointers to traverse the lineage, reclaiming the memory of the eldest delta bases first. A subsequent patch will perform memory reclaiming in a different way and will thus no longer need the "child" pointer. Because the "child" pointer is redundant even now, remove it so that the aforementioned subsequent patch will be clearer. In the meantime, reclaim memory in the reverse order of the "base" pointers. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-08-24 14:11:14 -07:00
Jonathan Tan	46e6fb1e44	index-pack: unify threaded and unthreaded code Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-08-24 14:02:31 -07:00
Jonathan Tan	fc968e26c2	index-pack: remove redundant parameter find_{ref,ofs}_delta_{,children} take an enum object_type parameter, but the object type is already present in the name of the function. Remove that parameter from these functions. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-08-24 13:55:57 -07:00
Jeff King	fbff95b67f	index-pack: adjust default threading cap Commit `b8a2486f15` (index-pack: support multithreaded delta resolving, 2012-05-06) describes an experiment that shows that setting the number of threads for index-pack higher than 3 does not help. I repeated that experiment using a more modern version of Git and a more modern CPU and got different results. Here are timings for p5302 against linux.git run on my laptop, a Core i9-9880H with 8 cores plus hyperthreading (so online-cpus returns 16): 5302.3: index-pack 0 threads 256.28(253.41+2.79) 5302.4: index-pack 1 threads 257.03(254.03+2.91) 5302.5: index-pack 2 threads 149.39(268.34+3.06) 5302.6: index-pack 4 threads 94.96(294.10+3.23) 5302.7: index-pack 8 threads 68.12(339.26+3.89) 5302.8: index-pack 16 threads 70.90(655.03+7.21) 5302.9: index-pack default number of threads 116.91(290.05+3.21) You can see that wall-clock times continue to improve dramatically up to the number of cores, but bumping beyond that (into hyperthreading territory) does not help (and in fact hurts a little). Here's the same experiment on a machine with dual Xeon 6230's, totaling 40 cores (80 with hyperthreading): 5302.3: index-pack 0 threads 310.04(302.73+6.90) 5302.4: index-pack 1 threads 310.55(302.68+7.40) 5302.5: index-pack 2 threads 178.17(304.89+8.20) 5302.6: index-pack 5 threads 99.53(315.54+9.56) 5302.7: index-pack 10 threads 72.80(327.37+12.79) 5302.8: index-pack 20 threads 60.68(357.74+21.66) 5302.9: index-pack 40 threads 58.07(454.44+67.96) 5302.10: index-pack 80 threads 59.81(720.45+334.52) 5302.11: index-pack default number of threads 134.18(309.32+7.98) The results are similar; things stop improving at 40 threads. Curiously, going from 20 to 40 really doesn't help much, either (and increases CPU time considerably). 
So that may represent an actual barrier to parallelism, where we lose out due to context-switching and loss of cache locality, but don't reap the wall-clock benefits due to contention of our coarse-grained locks. So what's a good default value? It's clear that the current cap of 3 is too low; our default values are 42% and 57% slower than the best times on each machine. The results on the 40-core machine imply that 20 threads is an actual barrier regardless of the number of cores, so we'll take that as a maximum. We get the best results on these machines at half of the online-cpus value. That's presumably a result of the hyperthreading. That's common on multi-core Intel processors, but not necessarily elsewhere. But if we take it as an assumption, we can perform optimally on hyperthreaded machines and still do much better than the status quo on other machines, as long as we never halve below the current value of 3. So that's what this patch does. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-08-21 12:02:36 -07:00
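The default chosen in fbff95b67f can be summarized as: use half of the online CPUs, never halve below the historical value of 3, and cap the result at 20. A minimal sketch of that rule follows; the function name is hypothetical, and the treatment of machines with fewer than four CPUs (left uncapped) is an assumption rather than something the message spells out.

```python
def default_index_pack_threads(online_cpus: int) -> int:
    """Pick a default thread count from the number of online CPUs."""
    if online_cpus < 4:
        # Assumption: too few CPUs for any capping to matter.
        return online_cpus
    # Half the CPUs, but never below the old cap of 3 nor above 20.
    return min(20, max(3, online_cpus // 2))
```

For the two machines in the message, this yields 8 threads for 16 online CPUs and 20 threads for 80 online CPUs.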
brian m. carlson	586740aa6e	builtin/index-pack: add option to specify hash algorithm git index-pack is usually run in a repository, but need not be. Since packs don't contain information on the algorithm in use, instead relying on context, add an option to index-pack to tell it which one we're using in case someone runs it outside of a repository. Since using --stdin necessarily implies a repository, don't allow specifying an object format if it's provided to prevent users from passing an option that won't work. Add documentation for this option. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-19 14:04:08 -07:00
brian m. carlson	629dffc461	packfile: compute and use the index CRC offset Both v2 pack index files and the v3 format specified as part of the NewHash work have similar data starting at the CRC table. Much of the existing code wants to read either this table or the offset entries following it, and in doing so computes the offset each time. In order to share as much code between v2 and v3, compute the offset of the CRC table and store it when the pack is opened. Use this value to compute offsets to not only the CRC table, but to the offset entries beyond it. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-05-27 10:07:07 -07:00
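The offsets that 629dffc461 precomputes follow from the v2 pack index layout: an 8-byte header, a 256-entry fan-out table of 4-byte counts, the sorted object names, the CRC32 table, and then the 4-byte offset entries. A minimal sketch of the arithmetic for v2 only; the function and constant names are hypothetical, and the v3 layout is not modeled here.

```python
HEADER_SIZE = 8           # 4-byte magic plus 4-byte version
FANOUT_SIZE = 256 * 4     # 256 fan-out entries of 4 bytes each
CRC_ENTRY_SIZE = 4        # one CRC32 per object


def crc_table_offset(num_objects: int, hash_size: int = 20) -> int:
    """Byte offset of the CRC table: header, fan-out, then object names."""
    return HEADER_SIZE + FANOUT_SIZE + num_objects * hash_size


def offset_table_offset(num_objects: int, hash_size: int = 20) -> int:
    """Byte offset of the 4-byte offset entries that follow the CRC table."""
    return crc_table_offset(num_objects, hash_size) + num_objects * CRC_ENTRY_SIZE
```

Computing the CRC table position once and deriving later tables from it is the sharing the commit message describes; only `hash_size` changes between SHA-1 (20 bytes) and SHA-256 (32 bytes).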
Jonathan Tan	db7ed7418b	promisor-remote: accept 0 as oid_nr in function There are 3 callers to promisor_remote_get_direct() that first check if the number of objects to be fetched is equal to 0. Fold that check into promisor_remote_get_direct(), and in doing so, be explicit as to what promisor_remote_get_direct() does if oid_nr is 0 (it returns 0, success, immediately). Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-02 12:42:32 -07:00
Junio C Hamano	7b029ebaef	Merge branch 'jk/index-pack-dupfix' The index-pack code now diagnoses a bad input packstream that records the same object twice when it is used as delta base; the code used to declare a software bug when encountering such an input, but it is an input error. * jk/index-pack-dupfix: index-pack: downgrade twice-resolved REF_DELTA to die()	2020-02-14 12:54:24 -08:00
Jeff King	a21781011f	index-pack: downgrade twice-resolved REF_DELTA to die() When we're resolving a REF_DELTA, we compare-and-swap its type from REF_DELTA to whatever real type the base object has, as discussed in `ab791dd138` (index-pack: fix race condition with duplicate bases, 2014-08-29). If the old type wasn't a REF_DELTA, we consider that a BUG(). But as discussed in that commit, we might see this case whenever we try to resolve an object twice, which may happen because we have multiple copies of the base object. So this isn't a bug at all, but rather a sign that the input pack is broken. And indeed, this case is triggered already in t5309.5 and t5309.6, which create packs with delta cycles and duplicate bases. But we never noticed because those tests are marked expect_failure. Those tests were added by `b2ef3d9ebb` (test index-pack on packs with recoverable delta cycles, 2013-08-23), which was leaving the door open for cases that we theoretically _could_ handle. And when we see an already-resolved object like this, in theory we could keep going after confirming that the previously resolved child->real_type matches base->obj->real_type. But: - enforcing the "only resolve once" rule here saves us from an infinite loop in other parts of the code. If we keep going, then the delta cycle in t5309.5 causes us to loop infinitely, as find_ref_delta_children() doesn't realize which objects have already been resolved. So there would be more changes needed to make this case work, and in the meantime we'd be worse off. - any pack that triggers this is broken anyway. It either has a duplicate base object, or it has a cycle which causes us to bring in a duplicate via --fix-thin. In either case, we'd end up rejecting the pack in write_idx_file(), which also detects duplicates. So the tests have little value in documenting what we _could_ be doing (and have been neglected for 6+ years). 
Let's switch them to confirming that we handle this case cleanly (and switch out the BUG() for a more informative die() so that we do so). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-04 13:19:11 -08:00
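The "only resolve once" rule from a21781011f amounts to a guarded compare-and-swap: an object's type may change from REF_DELTA to its real type exactly once, and a second attempt is reported as bad input rather than as a program bug. A minimal sketch under those assumptions; `BrokenPackError`, `set_real_type`, and the error text are hypothetical and are not Git's actual identifiers or messages.

```python
import threading

REF_DELTA = "ref_delta"


class BrokenPackError(Exception):
    """Raised for invalid input packs (an input error, not a software bug)."""


_type_lock = threading.Lock()


def set_real_type(obj: dict, real_type: str) -> None:
    """Swap obj's type from REF_DELTA to real_type, at most once."""
    with _type_lock:
        if obj["type"] != REF_DELTA:
            # Seen when a pack contains duplicate bases or a delta cycle.
            raise BrokenPackError("REF_DELTA already resolved (duplicate base?)")
        obj["type"] = real_type
```

Raising an input error on the second swap also stops the traversal from revisiting objects it has already resolved, which is how the rule avoids the infinite loop mentioned in the message.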
Matheus Tavares	b98d188581	sha1-file: allow check_object_signature() to handle any repo Some callers of check_object_signature() can work on arbitrary repositories, but the repo does not get passed to this function. Instead, the_repository is always used internally. To fix possible inconsistencies, allow the function to receive a struct repository and make those callers pass on the repo being handled. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-31 10:45:39 -08:00
Matheus Tavares	2dcde20e1c	sha1-file: pass git_hash_algo to hash_object_file() Allow hash_object_file() to work on arbitrary repos by introducing a git_hash_algo parameter. Change callers which have a struct repository pointer in their scope to pass on the git_hash_algo from the said repo. For all other callers, pass on the_hash_algo, which was already being used internally at hash_object_file(). This functionality will be used in the following patch to make check_object_signature() be able to work on arbitrary repos (which, in turn, will be used to fix an inconsistency at object.c:parse_object()). Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-31 10:45:39 -08:00
Matheus Tavares	c8123e72f6	streaming: allow open_istream() to handle any repo Some callers of open_istream() at archive-tar.c and archive-zip.c are capable of working on arbitrary repositories but the repo struct is not passed down to open_istream(), which uses the_repository internally. For now, that's not a problem since the said callers are only being called with the_repository. But to be consistent and avoid future problems, let's allow open_istream() to receive a struct repository and use that instead of the_repository. This parameter addition will also be used in a future patch to make sha1-file.c:check_object_signature() be able to work on arbitrary repos. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-31 10:45:39 -08:00
Junio C Hamano	676278f8ea	Merge branch 'bc/object-id-part17' Preparation for SHA-256 upgrade continues. * bc/object-id-part17: (26 commits) midx: switch to using the_hash_algo builtin/show-index: replace sha1_to_hex rerere: replace sha1_to_hex builtin/receive-pack: replace sha1_to_hex builtin/index-pack: replace sha1_to_hex packfile: replace sha1_to_hex wt-status: convert struct wt_status to object_id cache: remove null_sha1 builtin/worktree: switch null_sha1 to null_oid builtin/repack: write object IDs of the proper length pack-write: use hash_to_hex when writing checksums sequencer: convert to use the_hash_algo bisect: switch to using the_hash_algo sha1-lookup: switch hard-coded constants to the_hash_algo config: use the_hash_algo in abbrev comparison combine-diff: replace GIT_SHA1_HEXSZ with the_hash_algo bundle: switch to use the_hash_algo connected: switch GIT_SHA1_HEXSZ to the_hash_algo show-index: switch hard-coded constants to the_hash_algo blame: remove needless comparison with GIT_SHA1_HEXSZ ...	2019-10-11 14:24:46 +09:00
brian m. carlson	69fa337060	builtin/index-pack: replace sha1_to_hex Since sha1_to_hex is limited to SHA-1, replace it with hash_to_hex so this code works with other algorithms. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-08-19 15:04:59 -07:00
Christian Couder	b14ed5adaf	Use promisor_remote_get_direct() and has_promisor_remote() Instead of using the repository_format_partial_clone global and fetch_objects() directly, let's use has_promisor_remote() and promisor_remote_get_direct(). This way all the configured promisor remotes will be taken into account, not only the one specified by extensions.partialClone. Also when cloning or fetching using a partial clone filter, remote.origin.promisor will be set to "true" instead of setting extensions.partialClone to "origin". This makes it possible to use many promisor remotes just by fetching from them. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-25 14:05:37 -07:00
Jonathan Tan	8a30a1efd1	index-pack: prefetch missing REF_DELTA bases When fetching, the client sends "have" commit IDs indicating that the server does not need to send any object referenced by those commits, reducing network I/O. When the client is a partial clone, the client still sends "have"s in this way, even if it does not have every object referenced by a commit it sent as "have". If a server omits such an object, it is fine: the client could lazily fetch that object before this fetch, and it can still do so after. The issue is when the server sends a thin pack containing an object that is a REF_DELTA against such a missing object: index-pack fails to fix the thin pack. When support for lazily fetching missing objects was added in `8b4c0103a9` ("sha1_file: support lazily fetching missing objects", 2017-12-08), support in index-pack was turned off in the belief that it accesses the repo only to do hash collision checks. However, this is not true: it also needs to access the repo to resolve REF_DELTA bases. Support for lazy fetching should still generally be turned off in index-pack because it is used as part of the lazy fetching process itself (if not, infinite loops may occur), but we do need to fetch the REF_DELTA bases. (When fetching REF_DELTA bases, it is unlikely that those are REF_DELTA themselves, because we do not send "have" when making such fetches.) To resolve this, prefetch all missing REF_DELTA bases before attempting to resolve them. This both ensures that all bases are attempted to be fetched, and ensures that we make only one request per index-pack invocation, and not one request per missing object. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-05-15 11:01:40 +09:00
SZEDER Gábor	79e3aa6624	index-pack: show progress while checking objects When 'git index-pack' is run by 'git clone', its check_objects() function usually doesn't take long enough to be a concern, but I just ran into a situation where it took about a minute or so: I inadvertently put some memory pressure on my tiny laptop while cloning linux.git, and then there was quite a long silence between the "Resolving deltas" and "Checking connectivity" progress bars. Show a progress bar during the loop of check_objects() to let the user know that something is still going on. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-04-01 18:08:05 +09:00
Jeff King	98374a07c9	convert has_sha1_file() callers to has_object_file() The only remaining callers of has_sha1_file() actually have an object_id already. They can use the "object" variant, rather than dereferencing the hash themselves. The code changes here were completely generated by the included coccinelle patch. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-01-08 09:41:06 -08:00
Junio C Hamano	f5f0f68d61	Merge branch 'tb/print-size-t-with-uintmax-format' Code preparation to replace ulong vars with size_t vars where appropriate. * tb/print-size-t-with-uintmax-format: Upcast size_t variables to uintmax_t when printing	2018-11-19 16:24:41 +09:00
Torsten Bögershausen	ca473cef91	Upcast size_t variables to uintmax_t when printing When printing variables which contain a size, today "unsigned long" is used in many places. In order to be able to change the type from "unsigned long" into size_t some day in the future, we need to have a way to print 64-bit variables on a system that has "unsigned long" defined to be 32-bit, like Win64. Upcast all those variables into uintmax_t before they are printed. This is to prepare for a bigger change, when "unsigned long" will be converted into size_t for variables which may be > 4 GiB. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-12 16:43:52 +09:00

346 Commits (d39f04b638f7f862efebb5bf028bad50f6aa9e28)