Commit Graph

381 Commits (2bc5414c411aab33c155b1070b7764ef6a49a02d)

Author SHA1 Message Date
Junio C Hamano 2bc5414c41 Merge branch 'ps/parse-options-integers'
Update parse-options API to catch mistakes to pass address of an
integral variable of a wrong type/size.

* ps/parse-options-integers:
  parse-options: detect mismatches in integer signedness
  parse-options: introduce precision handling for `OPTION_UNSIGNED`
  parse-options: introduce precision handling for `OPTION_INTEGER`
  parse-options: rename `OPT_MAGNITUDE()` to `OPT_UNSIGNED()`
  parse-options: support unit factors in `OPT_INTEGER()`
  global: use designated initializers for options
  parse: fix off-by-one for minimum signed values
2025-04-24 17:25:34 -07:00
Patrick Steinhardt 791aeddfa2 parse-options: detect mismatches in integer signedness
It was reported that "t5620-backfill.sh" fails on s390x and sparc64 in a
test that exercises the "--min-batch-size" command line option. The
symptom was that the option didn't seem to have an effect: we didn't
fetch objects with a batch size of 20, but instead fetched all objects
at once.

As it turns out, the root cause is that `--min-batch-size` uses
`OPT_INTEGER()` to parse the command line option. While this macro
expects the caller to pass a pointer to an integer, we instead pass a
pointer to a `size_t`. This coincidentally works on most platforms, but
it breaks apart on the mentioned platforms because they are big endian.

This issue isn't specific to git-backfill(1): there are a couple of
other places where we have the same type confusion going on. This
indicates that the issue really is the interface that the parse-options
subsystem provides -- it is simply too easy to get this wrong as there
isn't any kind of compiler warning, and things just work on the most
common systems.

Address the systemic issue by introducing two new build asserts
`BARF_UNLESS_SIGNED()` and `BARF_UNLESS_UNSIGNED()`. As the names
already hint at, those macros will cause a compiler error when passed a
value that is not signed or unsigned, respectively.

Adapt `OPT_INTEGER()`, `OPT_UNSIGNED()` as well as `OPT_MAGNITUDE()` to
use those asserts. This uncovers a small set of sites where we indeed
have the same bug as in git-backfill(1). Adapt all of them to use the
correct option.

Reported-by: Todd Zullinger <tmz@pobox.com>
Reported-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Helped-by: SZEDER Gábor <szeder.dev@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-17 08:15:16 -07:00
Patrick Steinhardt d012ceb5f3 global: use designated initializers for options
While we expose macros for most of our different option types understood
by the "parse-options" subsystem, not every combination of fields that
has one as that would otherwise quickly lead to an explosion of macros.
Instead, we just initialize structures manually for those variants of
fields that don't have a macro.

Callsites that open-code these structure initialization don't use
designated initializers though and instead just provide values for each
of the fields that they want to initialize. This has three significant
downsides:

  - Callsites need to specify all values up to the last field that they
    care about. This often includes fields that should simply be left at
    their default zero-initialized state, which adds distraction.

  - Any reader not deeply familiar with the layout of the structure
    has a hard time figuring out what the respective initializers mean.

  - Reordering or introducing new fields in the middle of the structure
    is impossible without adapting all callsites.

Convert all sites to instead use designated initializers, which we have
started using in our codebase quite a while ago. This allows us to skip
any default-initialized fields, gives the reader context by specifying
the field names and allows us to reorder or introduce new fields where
we want to.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-17 08:15:15 -07:00
Patrick Steinhardt 68cd492a3e object-store: merge "object-store-ll.h" and "object-store.h"
The "object-store-ll.h" header has been introduced to keep transitive
header dependendcies and compile times at bay. Now that we have created
a new "object-store.c" file though we can easily move the last remaining
additional bit of "object-store.h", the `odb_path_map`, out of the
header.

Do so. As the "object-store.h" header is now equivalent to its low-level
alternative we drop the latter and inline it into the former.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-15 08:24:37 -07:00
Patrick Steinhardt 7d70b29c4f hash: stop depending on `the_repository` in `null_oid()`
The `null_oid()` function returns the object ID that only consists of
zeroes. Naturally, this ID also depends on the hash algorithm used, as
the number of zeroes is different between SHA1 and SHA256. Consequently,
the function returns the hash-algorithm-specific null object ID.

This is currently done by depending on `the_hash_algo`, which implicitly
makes us depend on `the_repository`. Refactor the function to instead
pass in the hash algorithm for which we want to retrieve the null object
ID. Adapt callsites accordingly by passing in `the_repository`, thus
bubbling up the dependency on that global variable by one layer.

There are a couple of trivial exceptions for subsystems that already got
rid of `the_repository`. These subsystems instead use the repository
that is available via the calling context:

  - "builtin/grep.c"
  - "grep.c"
  - "refs/debug.c"

There are also two non-trivial exceptions:

  - "diff-no-index.c": Here we know that we may not have a repository
    initialized at all, so we cannot rely on `the_repository`. Instead,
    we adapt `diff_no_index()` to get a `struct git_hash_algo` as
    parameter. The only caller is located in "builtin/diff.c", where we
    know to call `repo_set_hash_algo()` in case we're running outside of
    a Git repository. Consequently, it is fine to continue passing
    `the_repository->hash_algo` even in this case.

  - "builtin/ls-files.c": There is an in-flight patch series that drops
    `USE_THE_REPOSITORY_VARIABLE` in this file, which causes a semantic
    conflict because we use `null_oid()` in `show_submodule()`. The
    value is passed to `repo_submodule_init()`, which may use the object
    ID to resolve a tree-ish in the superproject from which we want to
    read the submodule config. As such, the object ID should refer to an
    object in the superproject, and consequently we need to use its hash
    algorithm.

    This means that we could in theory just not bother about this edge
    case at all and just use `the_repository` in "diff-no-index.c". But
    doing so would feel misdesigned.

Remove the `USE_THE_REPOSITORY_VARIABLE` preprocessor define in
"hash.c".

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-10 13:16:20 -07:00
Patrick Steinhardt 74d414c9f1 object: stop depending on `the_repository`
There are a couple of functions exposed by "object.c" that implicitly
depend on `the_repository`. Remove this dependency by injecting the
repository via a parameter. Adapt callers accordingly by simply using
`the_repository`, except in cases where the subsystem is already free of
the repository. In that case, we instead pass the repository provided by
the caller's context.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-10 13:16:18 -07:00
Patrick Steinhardt 59b6131a67 pager: stop using `the_repository`
Stop using `the_repository` in the "pager" subsystem by passing in a
repository when setting up the pager and when configuring it.

Adjust callers accordingly by using `the_repository`. While there may be
some callers that have a repository available in their context, this
trivial conversion allows for easier verification and bubbles up the use
of `the_repository` by one level.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-18 10:44:30 -08:00
Patrick Steinhardt 41f43b8243 global: mark code units that generate warnings with `-Wsign-compare`
Mark code units that generate warnings with `-Wsign-compare`. This
allows for a structured approach to get rid of all such warnings over
time in a way that can be easily measured.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06 20:20:02 +09:00
Patrick Steinhardt 43fedde3df builtin/grep: fix leak with `--max-count=0`
When executing with `--max-count=0` we'll return early from git-grep(1)
without performing any cleanup, which causes memory leaks. Plug these.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-04 22:37:52 -08:00
Junio C Hamano b8e318ea58 Merge branch 'jc/pass-repo-to-builtins'
The convention to calling into built-in command implementation has
been updated to pass the repository, if known, together with the
prefix value.

* jc/pass-repo-to-builtins:
  add: pass in repo variable instead of global the_repository
  builtin: remove USE_THE_REPOSITORY for those without the_repository
  builtin: remove USE_THE_REPOSITORY_VARIABLE from builtin.h
  builtin: add a repository parameter for builtin functions
2024-09-23 10:35:09 -07:00
John Cai 03eae9afb4 builtin: remove USE_THE_REPOSITORY_VARIABLE from builtin.h
Instead of including USE_THE_REPOSITORY_VARIABLE by default on every
builtin, remove it from builtin.h and add it to all the builtins that
include builtin.h (by definition, that means all builtins/*.c).

Also, remove the include statement for repository.h since it gets
brought in through builtin.h.

The next step will be to migrate each builtin
from having to use the_repository.

Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-13 14:32:24 -07:00
John Cai 9b1cb5070f builtin: add a repository parameter for builtin functions
In order to reduce the usage of the global the_repository, add a
parameter to builtin functions that will get passed a repository
variable.

This commit uses UNUSED on most of the builtin functions, as subsequent
commits will modify the actual builtins to pass the repository parameter
down.

Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-13 14:27:08 -07:00
Patrick Steinhardt ee087c29c8 builtin/grep: fix leaking object context
Even when `get_oid_with_context()` fails it may have allocated some data
in the object context. But we do not release it in git-grep(1) when the
call fails, leading to a memory leak. Plug it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-05 08:49:12 -07:00
Patrick Steinhardt f87c55c264 object-name: free leaking object contexts
While it is documented in `struct object_context::path` that this
variable needs to be released by the caller, this fact is rather easy to
miss given that we do not ever provide a function to release the object
context. And of course, while some callers dutifully release the path,
many others don't.

Introduce a new `object_context_release()` function that releases the
path. Convert callsites that used to free the path to use that new
function and add missing calls to callsites that were leaking memory.
Refactor those callsites as required to have a single return path, only.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11 13:15:05 -07:00
Junio C Hamano 1002f28a52 Merge branch 'eb/hash-transition'
Work to support a repository that work with both SHA-1 and SHA-256
hash algorithms has started.

* eb/hash-transition: (30 commits)
  t1016-compatObjectFormat: add tests to verify the conversion between objects
  t1006: test oid compatibility with cat-file
  t1006: rename sha1 to oid
  test-lib: compute the compatibility hash so tests may use it
  builtin/ls-tree: let the oid determine the output algorithm
  object-file: handle compat objects in check_object_signature
  tree-walk: init_tree_desc take an oid to get the hash algorithm
  builtin/cat-file: let the oid determine the output algorithm
  rev-parse: add an --output-object-format parameter
  repository: implement extensions.compatObjectFormat
  object-file: update object_info_extended to reencode objects
  object-file-convert: convert commits that embed signed tags
  object-file-convert: convert commit objects when writing
  object-file-convert: don't leak when converting tag objects
  object-file-convert: convert tag objects when writing
  object-file-convert: add a function to convert trees between algorithms
  object: factor out parse_mode out of fast-import and tree-walk into in object.h
  cache: add a function to read an OID of a specific algorithm
  tag: sign both hashes
  commit: export add_header_signature to support handling signatures on tags
  ...
2024-03-28 14:13:50 -07:00
Junio C Hamano c59ba68ea7 Merge branch 'js/check-null-from-read-object-file'
The code paths that call repo_read_object_file() have been
tightened to react to errors.

* js/check-null-from-read-object-file:
  Always check the return value of `repo_read_object_file()`
2024-02-14 15:36:06 -08:00
Johannes Schindelin 568459bf5e Always check the return value of `repo_read_object_file()`
There are a couple of places in Git's source code where the return value
is not checked. As a consequence, they are susceptible to segmentation
faults.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-06 10:42:28 -08:00
Elijah Newren d57c671a51 treewide: remove unnecessary includes in source files
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-26 12:04:33 -08:00
Elijah Newren eea0e59ffb treewide: remove unnecessary includes in source files
Each of these were checked with
   gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE}
to ensure that removing the direct inclusion of the header actually
resulted in that header no longer being included at all (i.e. that
no other header pulled it in transitively).

...except for a few cases where we verified that although the header
was brought in transitively, nothing from it was directly used in
that source file.  These cases were:
  * builtin/credential-cache.c
  * builtin/pull.c
  * builtin/send-pack.c

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-26 12:04:31 -08:00
Junio C Hamano 2d2cd0a1bc Merge branch 'jc/grep-f-relative-to-cwd'
"cd sub && git grep -f patterns" tried to read "patterns" file at
the top level of the working tree; it has been corrected to read
"sub/patterns" instead.

* jc/grep-f-relative-to-cwd:
  grep: -f <path> is relative to $cwd
2023-11-07 10:26:43 +09:00
Junio C Hamano 5b2424b658 grep: -f <path> is relative to $cwd
Just like OPT_FILENAME() does, "git grep -f <path>" should treat
the <path> relative to the original $cwd by paying attention to the
prefix the command is given.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-12 10:41:59 -07:00
Eric W. Biederman efed687edc tree-walk: init_tree_desc take an oid to get the hash algorithm
To make it possible for git ls-tree to display the tree encoded
in the hash algorithm of the oid specified to git ls-tree, update
init_tree_desc to take as a parameter the oid of the tree object.

Update all callers of init_tree_desc and init_tree_desc_gently
to pass the oid of the tree object.

Use the oid of the tree object to discover the hash algorithm
of the oid and store that hash algorithm in struct tree_desc.

Use the hash algorithm in decode_tree_entry and
update_tree_entry_internal to handle reading a tree object encoded in
a hash algorithm that differs from the repositories hash algorithm.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 14:57:40 -07:00
Junio C Hamano 921a713d66 Merge branch 'rs/grep-no-no-or'
"git grep -e A --no-or -e B" is accepted, even though the negation
of "or" did not mean anything, which has been tightened.

* rs/grep-no-no-or:
  grep: reject --no-or
2023-09-18 13:53:13 -07:00
René Scharfe aae8558b10 grep: reject --no-or
Since 3e230fa1b2 (grep: use parseopt, 2009-05-07) git grep has been
accepting the option --no-or.  It does the same as --or: nothing.
That's confusing and unintended.  Forbid negating --or.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-07 13:35:07 -07:00
René Scharfe 2a63c79dae grep: use OPT_INTEGER_F for --max-depth
a91f453f64 (grep: Add --max-depth option., 2009-07-22) added the option
--max-depth, defining it using a positional struct option initializer of
type OPTION_INTEGER.  It also sets defval to 1 for some reason, but that
value would only be used if the flag PARSE_OPT_OPTARG was given.

Use the macro OPT_INTEGER_F instead to standardize the definition and
specify only the necessary values.  This also normalizes argh to N_("n")
as a side-effect, which is OK.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-05 14:59:26 -07:00
Junio C Hamano fea92e4cac Merge branch 'jc/tree-walk-drop-base-offset'
Code simplification.

* jc/tree-walk-drop-base-offset:
  tree-walk: drop unused base_offset from do_match()
  tree-walk: lose base_offset that is never used in tree_entry_interesting
2023-08-02 09:37:23 -07:00
Junio C Hamano ce481ac8b3 Merge branch 'cw/compat-util-header-cleanup'
Further shuffling of declarations across header files to streamline
file dependencies.

* cw/compat-util-header-cleanup:
  git-compat-util: move alloc macros to git-compat-util.h
  treewide: remove unnecessary includes for wrapper.h
  kwset: move translation table from ctype
  sane-ctype.h: create header for sane-ctype macros
  git-compat-util: move wrapper.c funcs to its header
  git-compat-util: move strbuf.c funcs to its header
2023-07-17 11:30:42 -07:00
Junio C Hamano 0ad927e9e0 tree-walk: lose base_offset that is never used in tree_entry_interesting
The tree_entry_interesting() function takes base_offset, allowing
its callers to potentially pass a non-zero number to skip the early
part of the path string.

The feature is never exercised and we do not even know what bugs are
lurking there, as all callers pass 0 to the parameter.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-07 15:27:28 -07:00
Junio C Hamano b3d1c85d48 Merge branch 'gc/config-context'
Reduce reliance on a global state in the config reading API.

* gc/config-context:
  config: pass source to config_parser_event_fn_t
  config: add kvi.path, use it to evaluate includes
  config.c: remove config_reader from configsets
  config: pass kvi to die_bad_number()
  trace2: plumb config kvi
  config.c: pass ctx with CLI config
  config: pass ctx with config files
  config.c: pass ctx in configsets
  config: add ctx arg to config_fn_t
  urlmatch.h: use config_fn_t type
  config: inline git_color_default_config
2023-07-06 11:54:48 -07:00
Calvin Wan 91c080dff5 git-compat-util: move alloc macros to git-compat-util.h
alloc_nr, ALLOC_GROW, and ALLOC_GROW_BY are commonly used macros for
dynamic array allocation. Moving these macros to git-compat-util.h with
the other alloc macros focuses alloc.[ch] to allocation for Git objects
and additionally allows us to remove inclusions to alloc.h from files
that solely used the above macros.

Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-05 11:42:31 -07:00
Glen Choo 8868b1ebfb config: pass kvi to die_bad_number()
Plumb "struct key_value_info" through all code paths that end in
die_bad_number(), which lets us remove the helper functions that read
analogous values from "struct config_reader". As a result, nothing reads
config_reader.config_kvi any more, so remove that too.

In config.c, this requires changing the signature of
git_configset_get_value() to 'return' "kvi" in an out parameter so that
git_configset_get_<type>() can pass it to git_config_<type>(). Only
numeric types will use "kvi", so for non-numeric types (e.g.
git_configset_get_string()), pass NULL to indicate that the out
parameter isn't needed.

Outside of config.c, config callbacks now need to pass "ctx->kvi" to any
of the git_config_<type>() functions that parse a config string into a
number type. Included is a .cocci patch to make that refactor.

The only exceptional case is builtin/config.c, where git_config_<type>()
is called outside of a config callback (namely, on user-provided input),
so config source information has never been available. In this case,
die_bad_number() defaults to a generic, but perfectly descriptive
message. Let's provide a safe, non-NULL for "kvi" anyway, but make sure
not to change the message.

Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-28 14:06:40 -07:00
Glen Choo a4e7e317f8 config: add ctx arg to config_fn_t
Add a new "const struct config_context *ctx" arg to config_fn_t to hold
additional information about the config iteration operation.
config_context has a "struct key_value_info kvi" member that holds
metadata about the config source being read (e.g. what kind of config
source it is, the filename, etc). In this series, we're only interested
in .kvi, so we could have just used "struct key_value_info" as an arg,
but config_context makes it possible to add/adjust members in the future
without changing the config_fn_t signature. We could also consider other
ways of organizing the args (e.g. moving the config name and value into
config_context or key_value_info), but in my experiments, the
incremental benefit doesn't justify the added complexity (e.g. a
config_fn_t will sometimes invoke another config_fn_t but with a
different config value).

In subsequent commits, the .kvi member will replace the global "struct
config_reader" in config.c, making config iteration a global-free
operation. It requires much more work for the machinery to provide
meaningful values of .kvi, so for now, merely change the signature and
call sites, pass NULL as a placeholder value, and don't rely on the arg
in any meaningful way.

Most of the changes are performed by
contrib/coccinelle/config_fn_ctx.pending.cocci, which, for every
config_fn_t:

- Modifies the signature to accept "const struct config_context *ctx"
- Passes "ctx" to any inner config_fn_t, if needed
- Adds UNUSED attributes to "ctx", if needed

Most config_fn_t instances are easily identified by seeing if they are
called by the various config functions. Most of the remaining ones are
manually named in the .cocci patch. Manual cleanups are still needed,
but the majority of it is trivial; it's either adjusting config_fn_t
that the .cocci patch didn't catch, or adding forward declarations of
"struct config_context ctx" to make the signatures make sense.

The non-trivial changes are in cases where we are invoking a config_fn_t
outside of config machinery, and we now need to decide what value of
"ctx" to pass. These cases are:

- trace2/tr2_cfg.c:tr2_cfg_set_fl()

  This is indirectly called by git_config_set() so that the trace2
  machinery can notice the new config values and update its settings
  using the tr2 config parsing function, i.e. tr2_cfg_cb().

- builtin/checkout.c:checkout_main()

  This calls git_xmerge_config() as a shorthand for parsing a CLI arg.
  This might be worth refactoring away in the future, since
  git_xmerge_config() can call git_default_config(), which can do much
  more than just parsing.

Handle them by creating a KVI_INIT macro that initializes "struct
key_value_info" to a reasonable default, and use that to construct the
"ctx" arg.

Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-28 14:06:39 -07:00
Glen Choo 97eeeea2dc config: inline git_color_default_config
git_color_default_config() is a shorthand for calling two other config
callbacks. There are no other non-static functions that do this and it
will complicate our refactoring of config_fn_t so inline it instead.

Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-28 14:06:38 -07:00
Elijah Newren a034e9106f object-store-ll.h: split this header out of object-store.h
The vast majority of files including object-store.h did not need dir.h
nor khash.h.  Split the header into two files, and let most just depend
upon object-store-ll.h, while letting the two callers that need it
depend on the full object-store.h.

After this patch:
    $ git grep -h include..object-store | sort | uniq -c
          2 #include "object-store.h"
        129 #include "object-store-ll.h"

Diff best viewed with `--color-moved`.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21 13:39:54 -07:00
Elijah Newren c339932bd8 repository: remove unnecessary include of path.h
This also made it clear that several .c files that depended upon path.h
were missing a #include for it; add the missing includes while at it.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21 13:39:53 -07:00
Elijah Newren bc5c5ec044 cache.h: remove this no-longer-used header
Since this header showed up in some places besides just #include
statements, update/clean-up/remove those other places as well.

Note that compat/fsmonitor/fsm-path-utils-darwin.c previously got
away with violating the rule that all files must start with an include
of git-compat-util.h (or a short-list of alternate headers that happen
to include it first).  This change exposed the violation and caused it
to stop building correctly; fix it by having it include
git-compat-util.h first, as per policy.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21 13:39:53 -07:00
Elijah Newren 08c46a499a read-cache*.h: move declarations for read-cache.c functions from cache.h
For the functions defined in read-cache.c, move their declarations from
cache.h to a new header, read-cache-ll.h.  Also move some related inline
functions from cache.h to read-cache.h.  The purpose of the
read-cache-ll.h/read-cache.h split is that about 70% of the sites don't
need the inline functions and the extra headers they include.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21 13:39:53 -07:00
Elijah Newren ca4eed708d pager.h: move declarations for pager.c functions from cache.h
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11 08:52:10 -07:00
Elijah Newren 87bed17907 object-file.h: move declarations for object-file.c functions from cache.h
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11 08:52:10 -07:00
Elijah Newren dabab1d6e6 object-name.h: move declarations for object-name.c functions from cache.h
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11 08:52:09 -07:00
Junio C Hamano e7dca80692 Merge branch 'ab/remove-implicit-use-of-the-repository' into en/header-split-cache-h
* ab/remove-implicit-use-of-the-repository:
  libs: use "struct repository *" argument, not "the_repository"
  post-cocci: adjust comments for recent repo_* migration
  cocci: apply the "revision.h" part of "the_repository.pending"
  cocci: apply the "rerere.h" part of "the_repository.pending"
  cocci: apply the "refs.h" part of "the_repository.pending"
  cocci: apply the "promisor-remote.h" part of "the_repository.pending"
  cocci: apply the "packfile.h" part of "the_repository.pending"
  cocci: apply the "pretty.h" part of "the_repository.pending"
  cocci: apply the "object-store.h" part of "the_repository.pending"
  cocci: apply the "diff.h" part of "the_repository.pending"
  cocci: apply the "commit.h" part of "the_repository.pending"
  cocci: apply the "commit-reach.h" part of "the_repository.pending"
  cocci: apply the "cache.h" part of "the_repository.pending"
  cocci: add missing "the_repository" macros to "pending"
  cocci: sort "the_repository" rules by header
  cocci: fix incorrect & verbose "the_repository" rules
  cocci: remove dead rule from "the_repository.pending.cocci"
2023-04-04 08:25:52 -07:00
Ævar Arnfjörð Bjarmason bc726bd075 cocci: apply the "object-store.h" part of "the_repository.pending"
Apply the part of "the_repository.pending.cocci" pertaining to
"object-store.h".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28 07:36:45 -07:00
Elijah Newren d48be35ca6 write-or-die.h: move declarations for write-or-die.c functions from cache.h
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21 10:56:54 -07:00
Elijah Newren e38da487cc setup.h: move declarations for setup.c functions from cache.h
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21 10:56:54 -07:00
Elijah Newren f394e093df treewide: be explicit about dependence on gettext.h
Dozens of files made use of gettext functions, without explicitly
including gettext.h.  This made it more difficult to find which files
could remove a dependence on cache.h.  Make C files explicitly include
gettext.h if they are using it.

However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an
include of gettext.h, it was left out to avoid conflicting with an
in-flight topic.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21 10:56:51 -07:00
Elijah Newren 41771fa435 cache.h: remove dependence on hex.h; make other files include it explicitly
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23 17:25:29 -08:00
Elijah Newren 36bf195890 alloc.h: move ALLOC_GROW() functions from cache.h
This allows us to replace includes of cache.h with includes of the much
smaller alloc.h in many places.  It does mean that we also need to add
includes of alloc.h in a number of C files.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23 17:25:28 -08:00
Ævar Arnfjörð Bjarmason 8f56511945 builtin/{grep,log}.: don't define "USE_THE_INDEX_COMPATIBILITY_MACROS"
Adding "USE_THE_INDEX_COMPATIBILITY_MACROS" to these two appears to
have been unnecessary from the start, as going back and compiling
f8adbec9fe (cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switch,
2019-01-24) without that addition works.

Let's not have these ask for the compatibility macros from cache.h
that they don't need.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-11-21 12:06:14 +09:00
Shaoxuan Yuan 7cae7627c4 builtin/grep.c: integrate with sparse index
Turn on sparse index and remove ensure_full_index().

Before this patch, `git-grep` utilizes the ensure_full_index() method to
expand the index and search all the entries. Because this method
requires walking all the trees and constructing the index, it is the
slow part within the whole command.

To achieve better performance, this patch uses grep_tree() to search the
sparse directory entries and get rid of the ensure_full_index() method.

Why grep_tree() is a better choice over ensure_full_index()?

1) grep_tree() is as correct as ensure_full_index(). grep_tree() looks
   into every sparse-directory entry (represented by a tree) recursively
   when looping over the index, and the result of doing so matches the
   result of expanding the index.

2) grep_tree() utilizes pathspecs to limit the scope of searching.
   ensure_full_index() always expands the index, which means it will
   always walk all the trees and blobs in the repo without caring if
   the user only wants a subset of the content, i.e. using a pathspec.
   On the other hand, grep_tree() will only search the contents that
   match the pathspec, and thus possibly walking fewer trees.

3) grep_tree() does not construct and copy back a new index, while
   ensure_full_index() does. This also saves some time.

----------------
Performance test

- Summary:

p2000 tests demonstrate a ~71% execution time reduction for
`git grep --cached bogus -- "f2/f1/f1/*"` using tree-walking logic.
However, notice that this result varies depending on the pathspec
given. See below "Command used for testing" for more details.

Test                              HEAD~   HEAD
-------------------------------------------------------
2000.78: git grep ... (full-v3)   0.35    0.39 (≈)
2000.79: git grep ... (full-v4)   0.36    0.30 (≈)
2000.80: git grep ... (sparse-v3) 0.88    0.23 (-73.8%)
2000.81: git grep ... (sparse-v4) 0.83    0.26 (-68.6%)

- Command used for testing:

	git grep --cached bogus -- "f2/f1/f1/*"

The reason for specifying a pathspec is that, if we don't specify a
pathspec, then grep_tree() will walk all the trees and blobs to find the
pattern, and the time consumed doing so is not too different from using
the original ensure_full_index() method, which also spends most of the
time walking trees. However, when a pathspec is specified, this latest
logic will only walk the area of trees enclosed by the pathspec, and the
time consumed is reasonably a lot less.

Generally speaking, because the performance gain is acheived by walking
less trees, which are specified by the pathspec, the HEAD time v.s.
HEAD~ time in sparse-v[3|4], should be proportional to
"pathspec enclosed area" v.s. "all area", respectively. Namely, the
wider the <pathspec> is encompassing, the less the performance
difference between HEAD~ and HEAD, and vice versa.

That is, if we don't specify a pathspec, the performance difference [1]
is indistinguishable: both methods walk all the trees and take generally
same amount of time (even with the index construction time included for
ensure_full_index()).

[1] Performance test result without pathspec (hence walking all trees):

	Command used:

		git grep --cached bogus

	Test                                HEAD~  HEAD
	---------------------------------------------------
	2000.78: git grep ... (full-v3)     6.17   5.19 (≈)
	2000.79: git grep ... (full-v4)     6.19   5.46 (≈)
	2000.80: git grep ... (sparse-v3)   6.57   6.44 (≈)
	2000.81: git grep ... (sparse-v4)   6.65   6.28 (≈)

--------------------------
NEEDSWORK about submodules

There are a few NEEDSWORKs that belong to improvements beyond this
topic. See the NEEDSWORK in builtin/grep.c::grep_submodule() for
more context. The other two NEEDSWORKs in t1092 are also relative.

Suggested-by: Derrick Stolee <derrickstolee@github.com>
Helped-by: Derrick Stolee <derrickstolee@github.com>
Helped-by: Victoria Dye <vdye@github.com>
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-23 09:41:27 -07:00
Carlos López 68437ede53 grep: add --max-count command line option
This patch adds a command line option analogous to that of GNU
grep(1)'s -m / --max-count, which users might already be used to.
This makes it possible to limit the amount of matches shown in the
output while keeping the functionality of other options such as -C
(show code context) or -p (show containing function), which would be
difficult to do with a shell pipeline (e.g. head(1)).

Signed-off-by: Carlos López 00xc@protonmail.com
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-22 13:23:29 -07:00