Commit Graph

319 Commits (1c4411cce1bfc515cba4c1ca63b7d616e023b5e1)

Author SHA1 Message Date
Junio C Hamano 8b28e2e2e4 Merge branch 'ds/midx-normalize-pathname-before-comparison'
The path taken by "git multi-pack-index" command from the end user
was compared with path internally prepared by the tool withut first
normalizing, which lead to duplicated paths not being noticed,
which has been corrected.

* ds/midx-normalize-pathname-before-comparison:
  cache: use const char * for get_object_directory()
  multi-pack-index: use --object-dir real path
  midx: use real paths in lookup_multi_pack_index()
2022-05-04 09:51:29 -07:00
Derrick Stolee 11f9e8de3d cache: use const char * for get_object_directory()
The get_object_directory() method returns the exact string stored at
the_repository->objects->odb->path. The return type of "char *" implies
that the caller must keep track of the buffer and free() it when
complete. This causes significant problems later when the ODB is
accessed.

Use "const char *" as the return type to avoid this confusion. There are
no current callers that care about the non-const definition.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-25 11:31:13 -07:00
Junio C Hamano 439c1e6d5d Merge branch 'jh/builtin-fsmonitor-part2'
Built-in fsmonitor (part 2).

* jh/builtin-fsmonitor-part2: (30 commits)
  t7527: test status with untracked-cache and fsmonitor--daemon
  fsmonitor: force update index after large responses
  fsmonitor--daemon: use a cookie file to sync with file system
  fsmonitor--daemon: periodically truncate list of modified files
  t/perf/p7519: add fsmonitor--daemon test cases
  t/perf/p7519: speed up test on Windows
  t/perf/p7519: fix coding style
  t/helper/test-chmtime: skip directories on Windows
  t/perf: avoid copying builtin fsmonitor files into test repo
  t7527: create test for fsmonitor--daemon
  t/helper/fsmonitor-client: create IPC client to talk to FSMonitor Daemon
  help: include fsmonitor--daemon feature flag in version info
  fsmonitor--daemon: implement handle_client callback
  compat/fsmonitor/fsm-listen-darwin: implement FSEvent listener on MacOS
  compat/fsmonitor/fsm-listen-darwin: add MacOS header files for FSEvent
  compat/fsmonitor/fsm-listen-win32: implement FSMonitor backend on Windows
  fsmonitor--daemon: create token-based changed path cache
  fsmonitor--daemon: define token-ids
  fsmonitor--daemon: add pathname classification
  fsmonitor--daemon: implement 'start' command
  ...
2022-04-04 10:56:24 -07:00
Junio C Hamano eb804cd405 Merge branch 'ns/core-fsyncmethod'
Replace core.fsyncObjectFiles with two new configuration variables,
core.fsync and core.fsyncMethod.

* ns/core-fsyncmethod:
  core.fsync: documentation and user-friendly aggregate options
  core.fsync: new option to harden the index
  core.fsync: add configuration parsing
  core.fsync: introduce granular fsync control infrastructure
  core.fsyncmethod: add writeout-only mode
  wrapper: make inclusion of Windows csprng header tightly scoped
2022-03-25 16:38:24 -07:00
Jeff Hostetler 1e0ea5c431 fsmonitor: config settings are repository-specific
Move fsmonitor config settings to a new and opaque
`struct fsmonitor_settings` structure.  Add a lazily-loaded pointer
to this into `struct repo_settings`

Create an `enum fsmonitor_mode` type in `struct fsmonitor_settings` to
represent the state of fsmonitor.  This lets us represent which, if
any, fsmonitor provider (hook or IPC) is enabled.

Create `fsm_settings__get_*()` getters to lazily look up fsmonitor-
related config settings.

Get rid of the `core_fsmonitor` global variable.  Move the code to
lookup the existing `core.fsmonitor` config value into the fsmonitor
settings.

Create a hook pathname variable in `struct fsmonitor-settings` and
only set it when in hook mode.

Extend the definition of `core.fsmonitor` to be either a boolean
or a hook pathname.  When true, the builtin FSMonitor is used.
When false or unset, no FSMonitor (neither builtin nor hook) is
used.

The existing `core_fsmonitor` global variable was used to store the
pathname to the fsmonitor hook *and* it was used as a boolean to see
if fsmonitor was enabled.  This dual usage and global visibility leads
to confusion when we add the IPC-based provider.  So lets hide the
details in fsmonitor-settings.c and let it decide which provider to
use in the case of multiple settings.  This avoids cluttering up
repo-settings.c with these private details.

A future commit in builtin-fsmonitor series will add the ability to
disqualify worktrees for various reasons, such as being mounted from a
remote volume, where fsmonitor should not be started.  Having the
config settings hidden in fsmonitor-settings.c allows such worktree
restrictions to override the config values used.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-25 16:04:15 -07:00
Neeraj Singh 844a8ad4f8 core.fsync: add configuration parsing
This change introduces code to parse the core.fsync setting and
configure the fsync_components variable.

core.fsync is configured as a comma-separated list of component names to
sync. Each time a core.fsync variable is encountered in the
configuration heirarchy, we start off with a clean state with the
platform default value. Passing 'none' resets the value to indicate
nothing will be synced. We gather all negative and positive entries from
the comma separated list and then compute the new value by removing all
the negative entries and adding all of the positive entries.

We issue a warning for components that are not recognized so that the
configuration code is compatible with configs from future versions of
Git with more repo components.

Complete documentation for the new setting is included in a later patch
in the series so that it can be reviewed once in final form.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-10 15:10:22 -08:00
Neeraj Singh 020406eaa5 core.fsync: introduce granular fsync control infrastructure
This commit introduces the infrastructure for the core.fsync
configuration knob. The repository components we want to sync
are identified by flags so that we can turn on or off syncing
for specific components.

If core.fsyncObjectFiles is set and the core.fsync configuration
also includes FSYNC_COMPONENT_LOOSE_OBJECT, we will fsync any
loose objects. This picks the strictest data integrity behavior
if core.fsync and core.fsyncObjectFiles are set to conflicting values.

This change introduces the currently unused fsync_component
helper, which will be used by a later patch that adds fsyncing to
the refs backend.

Actual configuration and documentation of the fsync components
list are in other patches in the series to separate review of
the underlying mechanism from the policy of how it's configured.

Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-10 15:10:22 -08:00
Neeraj Singh abf38abec2 core.fsyncmethod: add writeout-only mode
This commit introduces the `core.fsyncMethod` configuration
knob, which can currently be set to `fsync` or `writeout-only`.

The new writeout-only mode attempts to tell the operating system to
flush its in-memory page cache to the storage hardware without issuing a
CACHE_FLUSH command to the storage controller.

Writeout-only fsync is significantly faster than a vanilla fsync on
common hardware, since data is written to a disk-side cache rather than
all the way to a durable medium. Later changes in this patch series will
take advantage of this primitive to implement batching of hardware
flushes.

When git_fsync is called with FSYNC_WRITEOUT_ONLY, it may fail and the
caller is expected to do an ordinary fsync as needed.

On Apple platforms, the fsync system call does not issue a CACHE_FLUSH
directive to the storage controller. This change updates fsync to do
fcntl(F_FULLFSYNC) to make fsync actually durable. We maintain parity
with existing behavior on Apple platforms by setting the default value
of the new core.fsyncMethod option.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-10 15:10:22 -08:00
Elijah Newren ecc7c8841d repo_read_index: add config to expect files outside sparse patterns
Typically with sparse checkouts, we expect files outside the sparsity
patterns to be marked as SKIP_WORKTREE and be missing from the working
tree.  Sometimes this expectation would be violated however; including
in cases such as:
  * users grabbing files from elsewhere and writing them to the worktree
    (perhaps by editing a cached copy in an editor, copying/renaming, or
     even untarring)
  * various git commands having incomplete or no support for the
    SKIP_WORKTREE bit[1,2]
  * users attempting to "abort" a sparse-checkout operation with a
    not-so-early Ctrl+C (updating $GIT_DIR/info/sparse-checkout and the
    working tree is not atomic)[3].
When the SKIP_WORKTREE bit in the index did not reflect the presence of
the file in the working tree, it traditionally caused confusion and was
difficult to detect and recover from.  So, in a sparse checkout, since
af6a51875a (repo_read_index: clear SKIP_WORKTREE bit from files present
in worktree, 2022-01-14), Git automatically clears the SKIP_WORKTREE
bit at index read time for entries corresponding to files that are
present in the working tree.

There is another workflow, however, where it is expected that paths
outside the sparsity patterns appear to exist in the working tree and
that they do not lose the SKIP_WORKTREE bit, at least until they get
modified.  A Git-aware virtual file system[4] takes advantage of its
position as a file system driver to expose all files in the working
tree, fetch them on demand using partial clone on access, and tell Git
to pay attention to them on demand by updating the sparse checkout
pattern on writes.  This means that commands like "git status" only have
to examine files that have potentially been modified, whereas commands
like "ls" are able to show the entire codebase without requiring manual
updates to the sparse checkout pattern.

Thus since af6a51875a, Git with such Git-aware virtual file systems
unsets the SKIP_WORKTREE bit for all files and commands like "git
status" have to fetch and examine them all.

Introduce a configuration setting sparse.expectFilesOutsideOfPatterns to
allow limiting the tracked set of files to a small set once again.  A
Git-aware virtual file system or other application that wants to
maintain files outside of the sparse checkout can set this in a
repository to instruct Git not to check for the presence of
SKIP_WORKTREE files.  The setting defaults to false, so most users of
sparse checkout will still get the benefit of an automatically updating
index to recover from the variety of difficult issues detailed in
af6a51875a for paths with SKIP_WORKTREE set despite the path being
present.

[1] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/
[2] The three long paragraphs in the middle of
    https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/
[3] https://lore.kernel.org/git/CABPp-BFnFpzwGC11TLoLs8YK5yiisA5D5-fFjXnJsbESVDwZsA@mail.gmail.com/
[4] such as the vfsd described in
https://lore.kernel.org/git/20220207190320.2960362-1-jonathantanmy@google.com/

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-01 23:37:48 -08:00
Junio C Hamano 0dc90d954d Merge branch 'ns/tmp-objdir'
New interface into the tmp-objdir API to help in-core use of the
quarantine feature.

* ns/tmp-objdir:
  tmp-objdir: disable ref updates when replacing the primary odb
  tmp-objdir: new API for creating temporary writable databases
2022-01-03 16:24:15 -08:00
Junio C Hamano 63a2e8b41e Merge branch 'ew/test-wo-fsync'
Allow running our tests while disabling fsync.

* ew/test-wo-fsync:
  tests: disable fsync everywhere
2021-12-15 09:39:52 -08:00
Neeraj Singh ecd81dfc79 tmp-objdir: disable ref updates when replacing the primary odb
When creating a subprocess with a temporary ODB, we set the
GIT_QUARANTINE_ENVIRONMENT env var to tell child Git processes not
to update refs, since the tmp-objdir may go away.

Introduce a similar mechanism for in-process temporary ODBs when
we call tmp_objdir_replace_primary_odb. Now both mechanisms set
the disable_ref_updates flag on the odb, which is queried by
the ref_transaction_prepare function.

Peff's test case [1] was invoking ref updates via the cachetextconv
setting. That particular code silently does nothing when a ref
update is forbidden. See the call to notes_cache_put in
fill_textconv where errors are ignored.

[1] https://lore.kernel.org/git/YVOn3hDsb5pnxR53@coredump.intra.peff.net/

Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-08 14:06:46 -08:00
Neeraj Singh b3cecf49ea tmp-objdir: new API for creating temporary writable databases
The tmp_objdir API provides the ability to create temporary object
directories, but was designed with the goal of having subprocesses
access these object stores, followed by the main process migrating
objects from it to the main object store or just deleting it.  The
subprocesses would view it as their primary datastore and write to it.

Here we add the tmp_objdir_replace_primary_odb function that replaces
the current process's writable "main" object directory with the
specified one. The previous main object directory is restored in either
tmp_objdir_migrate or tmp_objdir_destroy.

For the --remerge-diff usecase, add a new `will_destroy` flag in `struct
object_database` to mark ephemeral object databases that do not require
fsync durability.

Add 'git prune' support for removing temporary object databases, and
make sure that they have a name starting with tmp_ and containing an
operation-specific name.

Based-on-patch-by: Elijah Newren <newren@gmail.com>

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-08 14:06:36 -08:00
Eric Wong 412e4caee3 tests: disable fsync everywhere
The "GIT_TEST_FSYNC" environment variable now exists for
disabling fsync() even on packfiles and other "critical" data.

Running "make test -j8 NO_SVN_TESTS=1" on a noisy 8-core system
on an HDD, test runtime drops from ~4 minutes down to ~3 minutes.
Using "GIT_TEST_FSYNC=1" re-enables fsync() for comparison
purposes.

SVN interopability tests are minimally affected since SVN will
still use fsync in various places.

This will also be useful for 3rd-party tools which create
throwaway git repositories of temporary data, but remains
undocumented for end users.

Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-29 10:22:40 -07:00
Junio C Hamano f6c075ad71 Merge branch 'jk/ref-paranoia'
The ref iteration code used to optionally allow dangling refs to be
shown, which has been tightened up.

* jk/ref-paranoia:
  refs: drop "broken" flag from for_each_fullref_in()
  ref-filter: drop broken-ref code entirely
  ref-filter: stop setting FILTER_REFS_INCLUDE_BROKEN
  repack, prune: drop GIT_REF_PARANOIA settings
  refs: turn on GIT_REF_PARANOIA by default
  refs: omit dangling symrefs when using GIT_REF_PARANOIA
  refs: add DO_FOR_EACH_OMIT_DANGLING_SYMREFS flag
  refs-internal.h: reorganize DO_FOR_EACH_* flag documentation
  refs-internal.h: move DO_FOR_EACH_* flags next to each other
  t5312: be more assertive about command failure
  t5312: test non-destructive repack
  t5312: create bogus ref as necessary
  t5312: drop "verbose" helper
  t5600: provide detached HEAD for corruption failures
  t5516: don't use HEAD ref for invalid ref-deletion tests
  t7900: clean up some more broken refs
2021-10-11 10:21:47 -07:00
Junio C Hamano d8d33378ed Merge branch 'ab/repo-settings-cleanup'
Code cleanup.

* ab/repo-settings-cleanup:
  repository.h: don't use a mix of int and bitfields
  repo-settings.c: simplify the setup
  read-cache & fetch-negotiator: check "enum" values in switch()
  environment.c: remove test-specific "ignore_untracked..." variable
  wrapper.c: add x{un,}setenv(), and use xsetenv() in environment.c
2021-10-06 13:40:11 -07:00
Jeff King 5d1f5b8cd4 repack, prune: drop GIT_REF_PARANOIA settings
Now that GIT_REF_PARANOIA is the default, we don't need to selectively
enable it for destructive operations. In fact, it's harmful to do so,
because it overrides any GIT_REF_PARANOIA=0 setting that the user may
have provided (because they're trying to work around some corruption).

With these uses gone, we can further clean up the ref_paranoia global,
and make it a static variable inside the refs code.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 12:36:45 -07:00
Ævar Arnfjörð Bjarmason c6b4888b3f environment.c: remove test-specific "ignore_untracked..." variable
Instead of the global ignore_untracked_cache_config variable added in
dae6c322fa (test-dump-untracked-cache: don't modify the untracked
cache, 2016-01-27) we can make use of the new facility to set config
via environment variables added in d8d77153ea (config: allow
specifying config entries via envvar pairs, 2021-01-12).

It's arguably a bit hacky to use setenv() and getenv() to pass
messages between the same program, but since the test helpers are not
the main intended audience of repo-settings.c I think it's better than
hardcoding the test-only special-case in prepare_repo_settings().

This uses the xsetenv() wrapper added in the preceding commit, if we
don't set these in the environment we'll fail in
t7063-status-untracked-cache.sh, but let's fail earlier anyway if that
were to happen.

This breaks any parent process that's potentially using the
GIT_CONFIG_* and GIT_CONFIG_PARAMETERS mechanism to pass one-shot
config setting down to a git subprocess, but in this case we don't
care about the general case of such potential parents. This process
neither spawns other "git" processes, nor is it interested in other
configuration. We might want to pick up other test modes here, but
those will be passed via GIT_TEST_* environment variables.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-22 13:15:00 -07:00
Ævar Arnfjörð Bjarmason 3540c71ea5 wrapper.c: add x{un,}setenv(), and use xsetenv() in environment.c
Add fatal wrappers for setenv() and unsetenv(). In d7ac12b25d (Add
set_git_dir() function, 2007-08-01) we started checking its return
value, and since 48988c4d0c (set_git_dir: die when setenv() fails,
2018-03-30) we've had set_git_dir_1() die if we couldn't set it.

Let's provide a wrapper for both, this will be useful in many other
places, a subsequent patch will make another use of xsetenv().

The checking of the return value here is over-eager according to
setenv(3) and POSIX. It's documented as returning just -1 or 0, so
perhaps we should be checking -1 explicitly.

Let's just instead die on any non-zero, if our C library is so broken
as to return something else than -1 on error (and perhaps not set
errno?) the worst we'll do is die with a nonsensical errno value, but
we'll want to die in either case.

Let's make these return "void" instead of "int". As far as I can tell
there's no other x*() wrappers that needed to make the decision of
deviating from the signature in the C library, but since their return
value is only used to indicate errors (so we'd die here), we can catch
unreachable code such as

    if (xsetenv(...) < 0)
        [...];

I think it would be OK skip the NULL check of the "name" here for the
calls to die_errno(). Almost all of our setenv() callers are taking a
constant string hardcoded in the source as the first argument, and for
the rest we can probably assume they've done the NULL check
themselves. Even if they didn't, modern C libraries are forgiving
about it (e.g. glibc formatting it as "(null)"), on those that aren't,
well, we were about to die anyway. But let's include the check anyway
for good measure.

1. https://pubs.opengroup.org/onlinepubs/009604499/functions/setenv.html

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-22 13:15:00 -07:00
René Scharfe 8f0f110156 compression: drop write-only core_compression_* variables
Since 8de7eeb54b (compression: unify pack.compression configuration
parsing, 2016-11-15) the variables core_compression_level and
core_compression_seen are only set, but never read.  Remove them.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-12 16:23:28 -07:00
Andrzej Hunt 14c3dd817d environment: move strbuf into block to plug leak
realpath is only populated if we execute the git_work_tree_initialized
block. However that block also causes us to return early, meaning we
never actually release the strbuf in the case where we populated it.
Therefore we move all strbuf related code into the block to guarantee
that we can't leak it.

LSAN output from t0095:

Direct leak of 129 byte(s) in 1 object(s) allocated from:
    #0 0x49a9b9 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
    #1 0x78f585 in xrealloc wrapper.c:126:8
    #2 0x713ff4 in strbuf_grow strbuf.c:98:2
    #3 0x713ff4 in strbuf_getcwd strbuf.c:597:3
    #4 0x4f0c18 in strbuf_realpath_1 abspath.c:99:7
    #5 0x5ae4a4 in set_git_work_tree environment.c:259:3
    #6 0x6fdd8a in setup_discovered_git_dir setup.c:931:2
    #7 0x6fdd8a in setup_git_directory_gently setup.c:1235:12
    #8 0x4cb50d in get_bloom_filter_for_commit t/helper/test-bloom.c:41:2
    #9 0x4cb50d in cmd__bloom t/helper/test-bloom.c:95:3
    #10 0x4caa1f in cmd_main t/helper/test-tool.c:124:11
    #11 0x4caded in main common-main.c:52:11
    #12 0x7f0869f02349 in __libc_start_main (/lib64/libc.so.6+0x24349)

SUMMARY: AddressSanitizer: 129 byte(s) leaked in 1 allocation(s).

It looks like this leak has existed since realpath was first added to
set_git_work_tree() in:
  3d7747e318 (real_path: remove unsafe API, 2020-03-10)

Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-26 12:19:20 -07:00
Patrick Steinhardt d8d77153ea config: allow specifying config entries via envvar pairs
While we currently have the `GIT_CONFIG_PARAMETERS` environment variable
which can be used to pass runtime configuration data to git processes,
it's an internal implementation detail and not supposed to be used by
end users.

Next to being for internal use only, this way of passing config entries
has a major downside: the config keys need to be parsed as they contain
both key and value in a single variable. As such, it is left to the user
to escape any potentially harmful characters in the value, which is
quite hard to do if values are controlled by a third party.

This commit thus adds a new way of adding config entries via the
environment which gets rid of this shortcoming. If the user passes the
`GIT_CONFIG_COUNT=$n` environment variable, Git will parse environment
variable pairs `GIT_CONFIG_KEY_$i` and `GIT_CONFIG_VALUE_$i` for each
`i` in `[0,n)`.

While the same can be achieved with `git -c <name>=<value>`, one may
wish to not do so for potentially sensitive information. E.g. if one
wants to set `http.extraHeader` to contain an authentication token,
doing so via `-c` would trivially leak those credentials via e.g. ps(1),
which typically also shows command arguments.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-15 13:03:45 -08:00
Patrick Steinhardt b9d147fb15 environment: make `getenv_safe()` a public function
The `getenv_safe()` helper function helps to safely retrieve multiple
environment values without the need to depend on platform-specific
behaviour for the return value's lifetime. We'll make use of this
function in a following patch, so let's make it available by making it
non-static and adding a declaration.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-15 13:03:45 -08:00
Junio C Hamano 0d9a8e33f9 Merge branch 'jk/leakfix'
Code clean-up.

* jk/leakfix:
  submodule--helper: fix leak of core.worktree value
  config: fix leak in git_config_get_expiry_in_days()
  config: drop git_config_get_string_const()
  config: fix leaks from git_config_get_string_const()
  checkout: fix leak of non-existent branch names
  submodule--helper: use strbuf_release() to free strbufs
  clear_pattern_list(): clear embedded hashmaps
2020-08-27 14:04:49 -07:00
Jeff King 9a53219f69 config: drop git_config_get_string_const()
As evidenced by the leak fixes in the previous commit, the "const" in
git_config_get_string_const() clearly misleads people into thinking that
it does not allocate a copy of the string. We can fix this by renaming
it, but it's easier still to just drop it. Of the four remaining
callers:

  - The one in git_config_parse_expiry() still needs to allocate, since
    that's what its callers expect. We can just use the non-const
    version and cast our pointer. Slightly ugly, but the damage is
    contained in one spot.

  - The two in apply are writing to global "const char *" variables, and
    need to continue allocating. We often mark these as const because we
    assign default string literals to them. But in this case we don't do
    that, so we can just declare them as real "char *" pointers and use
    the non-const version.

  - The call in checkout doesn't actually need a copy; it can just use
    the non-allocating "tmp" version of the function.

The function is also mentioned in the MyFirstContribution document. We
can swap that call out for the non-allocating "tmp" variant, which fits
well in the example given.

We'll drop the "configset" and "repo" variants, as well (which are
unused).

Note that this frees up the "const" name, so we could rename the "tmp"
variant back to that. But let's give some time for topics in flight to
adapt to the new code before doing so (if we do it too soon, the
function semantics will change but the compiler won't alert us).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-17 15:35:47 -07:00
Jeff King d70a9eb611 strvec: rename struct fields
The "argc" and "argv" names made sense when the struct was argv_array,
but now they're just confusing. Let's rename them to "nr" (which we use
for counts elsewhere) and "v" (which is rather terse, but reads well
when combined with typical variable names like "args.v").

Note that we have to update all of the callers immediately. Playing
tricks with the preprocessor is hard here, because we wouldn't want to
rewrite unrelated tokens.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-30 19:18:06 -07:00
Jeff King ef8d7ac42a strvec: convert more callers away from argv_array name
We eventually want to drop the argv_array name and just use strvec
consistently. There's no particular reason we have to do it all at once,
or care about interactions between converted and unconverted bits.
Because of our preprocessor compat layer, the names are interchangeable
to the compiler (so even a definition and declaration using different
names is OK).

This patch converts remaining files from the first half of the alphabet,
to keep the diff to a manageable size.

The conversion was done purely mechanically with:

  git ls-files '*.c' '*.h' |
  xargs perl -i -pe '
    s/ARGV_ARRAY/STRVEC/g;
    s/argv_array/strvec/g;
  '

and then selectively staging files with "git add '[abcdefghjkl]*'".
We'll deal with any indentation/style fallouts separately.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-28 15:02:18 -07:00
Jeff King dbbcd44fb4 strvec: rename files from argv-array to strvec
This requires updating #include lines across the code-base, but that's
all fairly mechanical, and was done with:

  git ls-files '*.c' '*.h' |
  xargs perl -i -pe 's/argv-array.h/strvec.h/'

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-28 15:02:17 -07:00
Junio C Hamano 896833b268 Merge branch 'tb/shallow-cleanup'
Code cleanup.

* tb/shallow-cleanup:
  shallow: use struct 'shallow_lock' for additional safety
  shallow.h: document '{commit,rollback}_shallow_file'
  shallow: extract a header file for shallow-related functions
  commit: make 'commit_graft_pos' non-static
2020-05-13 12:19:18 -07:00
Taylor Blau 120ad2b0f1 shallow: extract a header file for shallow-related functions
There are many functions in commit.h that are more related to shallow
repositories than they are to any sort of generic commit machinery.
Likely this began when there were only a few shallow-related functions,
and commit.h seemed a reasonable enough place to put them.

But, now there are a good number of shallow-related functions, and
placing them all in 'commit.h' doesn't make sense.

This patch extracts a 'shallow.h', which takes all of the declarations
from 'commit.h' for functions which already exist in 'shallow.c'. We
will bring the remaining shallow-related functions defined in 'commit.c'
in a subsequent patch.

For now, move only the ones that already are implemented in 'shallow.c',
and update the necessary includes.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-30 14:19:13 -07:00
Alexandr Miloslavskiy 3d7747e318 real_path: remove unsafe API
Returning a shared buffer invites very subtle bugs due to reentrancy or
multi-threading, as demonstrated by the previous patch.

There was an unfinished effort to abolish this [1].

Let's finally rid of `real_path()`, using `strbuf_realpath()` instead.

This patch uses a local `strbuf` for most places where `real_path()` was
previously called.

However, two places return the value of `real_path()` to the caller. For
them, a `static` local `strbuf` was added, effectively pushing the
problem one level higher:
    read_gitfile_gently()
    get_superproject_working_tree()

[1] https://lore.kernel.org/git/1480964316-99305-1-git-send-email-bmwill@google.com/

Signed-off-by: Alexandr Miloslavskiy <alexandr.miloslavskiy@syntevo.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-10 11:41:40 -07:00
Alexandr Miloslavskiy 0915a5b4cd set_git_dir: fix crash when used with real_path()
`real_path()` returns result from a shared buffer, inviting subtle
reentrance bugs. One of these bugs occur when invoked this way:
    set_git_dir(real_path(git_dir))

In this case, `real_path()` has reentrance:
    real_path
    read_gitfile_gently
    repo_set_gitdir
    setup_git_env
    set_git_dir_1
    set_git_dir

Later, `set_git_dir()` uses its now-dead parameter:
    !is_absolute_path(path)

Fix this by using a dedicated `strbuf` to hold `strbuf_realpath()`.

Signed-off-by: Alexandr Miloslavskiy <alexandr.miloslavskiy@syntevo.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-06 14:45:51 -08:00
Junio C Hamano bd72a08d6c Merge branch 'ds/sparse-cone'
Management of sparsely checked-out working tree has gained a
dedicated "sparse-checkout" command.

* ds/sparse-cone: (21 commits)
  sparse-checkout: improve OS ls compatibility
  sparse-checkout: respect core.ignoreCase in cone mode
  sparse-checkout: check for dirty status
  sparse-checkout: update working directory in-process for 'init'
  sparse-checkout: cone mode should not interact with .gitignore
  sparse-checkout: write using lockfile
  sparse-checkout: use in-process update for disable subcommand
  sparse-checkout: update working directory in-process
  sparse-checkout: sanitize for nested folders
  unpack-trees: add progress to clear_ce_flags()
  unpack-trees: hash less in cone mode
  sparse-checkout: init and set in cone mode
  sparse-checkout: use hashmaps for cone patterns
  sparse-checkout: add 'cone' mode
  trace2: add region in clear_ce_flags
  sparse-checkout: create 'disable' subcommand
  sparse-checkout: add '--stdin' option to set subcommand
  sparse-checkout: 'set' subcommand
  clone: add --sparse mode
  sparse-checkout: create 'init' subcommand
  ...
2019-12-25 11:21:58 -08:00
Johannes Schindelin 67af91c47a Sync with 2.23.1
* maint-2.23: (44 commits)
  Git 2.23.1
  Git 2.22.2
  Git 2.21.1
  mingw: sh arguments need quoting in more circumstances
  mingw: fix quoting of empty arguments for `sh`
  mingw: use MSYS2 quoting even when spawning shell scripts
  mingw: detect when MSYS2's sh is to be spawned more robustly
  t7415: drop v2.20.x-specific work-around
  Git 2.20.2
  t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x
  Git 2.19.3
  Git 2.18.2
  Git 2.17.3
  Git 2.16.6
  test-drop-caches: use `has_dos_drive_prefix()`
  Git 2.15.4
  Git 2.14.6
  mingw: handle `subst`-ed "DOS drives"
  mingw: refuse to access paths with trailing spaces or periods
  mingw: refuse to access paths with illegal characters
  ...
2019-12-06 16:31:39 +01:00
Johannes Schindelin fc346cb292 Sync with 2.20.2
* maint-2.20: (36 commits)
  Git 2.20.2
  t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x
  Git 2.19.3
  Git 2.18.2
  Git 2.17.3
  Git 2.16.6
  test-drop-caches: use `has_dos_drive_prefix()`
  Git 2.15.4
  Git 2.14.6
  mingw: handle `subst`-ed "DOS drives"
  mingw: refuse to access paths with trailing spaces or periods
  mingw: refuse to access paths with illegal characters
  unpack-trees: let merged_entry() pass through do_add_entry()'s errors
  quote-stress-test: offer to test quoting arguments for MSYS2 sh
  t6130/t9350: prepare for stringent Win32 path validation
  quote-stress-test: allow skipping some trials
  quote-stress-test: accept arguments to test via the command-line
  tests: add a helper to stress test argument quoting
  mingw: fix quoting of arguments
  Disallow dubiously-nested submodule git directories
  ...
2019-12-06 16:31:12 +01:00
Johannes Schindelin d851d94151 Sync with 2.19.3
* maint-2.19: (34 commits)
  Git 2.19.3
  Git 2.18.2
  Git 2.17.3
  Git 2.16.6
  test-drop-caches: use `has_dos_drive_prefix()`
  Git 2.15.4
  Git 2.14.6
  mingw: handle `subst`-ed "DOS drives"
  mingw: refuse to access paths with trailing spaces or periods
  mingw: refuse to access paths with illegal characters
  unpack-trees: let merged_entry() pass through do_add_entry()'s errors
  quote-stress-test: offer to test quoting arguments for MSYS2 sh
  t6130/t9350: prepare for stringent Win32 path validation
  quote-stress-test: allow skipping some trials
  quote-stress-test: accept arguments to test via the command-line
  tests: add a helper to stress test argument quoting
  mingw: fix quoting of arguments
  Disallow dubiously-nested submodule git directories
  protect_ntfs: turn on NTFS protection by default
  path: also guard `.gitmodules` against NTFS Alternate Data Streams
  ...
2019-12-06 16:30:49 +01:00
Johannes Schindelin 7c9fbda6e2 Sync with 2.18.2
* maint-2.18: (33 commits)
  Git 2.18.2
  Git 2.17.3
  Git 2.16.6
  test-drop-caches: use `has_dos_drive_prefix()`
  Git 2.15.4
  Git 2.14.6
  mingw: handle `subst`-ed "DOS drives"
  mingw: refuse to access paths with trailing spaces or periods
  mingw: refuse to access paths with illegal characters
  unpack-trees: let merged_entry() pass through do_add_entry()'s errors
  quote-stress-test: offer to test quoting arguments for MSYS2 sh
  t6130/t9350: prepare for stringent Win32 path validation
  quote-stress-test: allow skipping some trials
  quote-stress-test: accept arguments to test via the command-line
  tests: add a helper to stress test argument quoting
  mingw: fix quoting of arguments
  Disallow dubiously-nested submodule git directories
  protect_ntfs: turn on NTFS protection by default
  path: also guard `.gitmodules` against NTFS Alternate Data Streams
  is_ntfs_dotgit(): speed it up
  ...
2019-12-06 16:30:38 +01:00
Johannes Schindelin 14af7ed5a9 Sync with 2.17.3
* maint-2.17: (32 commits)
  Git 2.17.3
  Git 2.16.6
  test-drop-caches: use `has_dos_drive_prefix()`
  Git 2.15.4
  Git 2.14.6
  mingw: handle `subst`-ed "DOS drives"
  mingw: refuse to access paths with trailing spaces or periods
  mingw: refuse to access paths with illegal characters
  unpack-trees: let merged_entry() pass through do_add_entry()'s errors
  quote-stress-test: offer to test quoting arguments for MSYS2 sh
  t6130/t9350: prepare for stringent Win32 path validation
  quote-stress-test: allow skipping some trials
  quote-stress-test: accept arguments to test via the command-line
  tests: add a helper to stress test argument quoting
  mingw: fix quoting of arguments
  Disallow dubiously-nested submodule git directories
  protect_ntfs: turn on NTFS protection by default
  path: also guard `.gitmodules` against NTFS Alternate Data Streams
  is_ntfs_dotgit(): speed it up
  mingw: disallow backslash characters in tree objects' file names
  ...
2019-12-06 16:29:15 +01:00
Johannes Schindelin bdfef0492c Sync with 2.16.6
* maint-2.16: (31 commits)
  Git 2.16.6
  test-drop-caches: use `has_dos_drive_prefix()`
  Git 2.15.4
  Git 2.14.6
  mingw: handle `subst`-ed "DOS drives"
  mingw: refuse to access paths with trailing spaces or periods
  mingw: refuse to access paths with illegal characters
  unpack-trees: let merged_entry() pass through do_add_entry()'s errors
  quote-stress-test: offer to test quoting arguments for MSYS2 sh
  t6130/t9350: prepare for stringent Win32 path validation
  quote-stress-test: allow skipping some trials
  quote-stress-test: accept arguments to test via the command-line
  tests: add a helper to stress test argument quoting
  mingw: fix quoting of arguments
  Disallow dubiously-nested submodule git directories
  protect_ntfs: turn on NTFS protection by default
  path: also guard `.gitmodules` against NTFS Alternate Data Streams
  is_ntfs_dotgit(): speed it up
  mingw: disallow backslash characters in tree objects' file names
  path: safeguard `.git` against NTFS Alternate Streams Accesses
  ...
2019-12-06 16:27:36 +01:00
Johannes Schindelin 9ac92fed5b Sync with 2.15.4
* maint-2.15: (29 commits)
  Git 2.15.4
  Git 2.14.6
  mingw: handle `subst`-ed "DOS drives"
  mingw: refuse to access paths with trailing spaces or periods
  mingw: refuse to access paths with illegal characters
  unpack-trees: let merged_entry() pass through do_add_entry()'s errors
  quote-stress-test: offer to test quoting arguments for MSYS2 sh
  t6130/t9350: prepare for stringent Win32 path validation
  quote-stress-test: allow skipping some trials
  quote-stress-test: accept arguments to test via the command-line
  tests: add a helper to stress test argument quoting
  mingw: fix quoting of arguments
  Disallow dubiously-nested submodule git directories
  protect_ntfs: turn on NTFS protection by default
  path: also guard `.gitmodules` against NTFS Alternate Data Streams
  is_ntfs_dotgit(): speed it up
  mingw: disallow backslash characters in tree objects' file names
  path: safeguard `.git` against NTFS Alternate Streams Accesses
  clone --recurse-submodules: prevent name squatting on Windows
  is_ntfs_dotgit(): only verify the leading segment
  ...
2019-12-06 16:27:18 +01:00
Johannes Schindelin d3ac8c3f27 Sync with 2.14.6
* maint-2.14: (28 commits)
  Git 2.14.6
  mingw: handle `subst`-ed "DOS drives"
  mingw: refuse to access paths with trailing spaces or periods
  mingw: refuse to access paths with illegal characters
  unpack-trees: let merged_entry() pass through do_add_entry()'s errors
  quote-stress-test: offer to test quoting arguments for MSYS2 sh
  t6130/t9350: prepare for stringent Win32 path validation
  quote-stress-test: allow skipping some trials
  quote-stress-test: accept arguments to test via the command-line
  tests: add a helper to stress test argument quoting
  mingw: fix quoting of arguments
  Disallow dubiously-nested submodule git directories
  protect_ntfs: turn on NTFS protection by default
  path: also guard `.gitmodules` against NTFS Alternate Data Streams
  is_ntfs_dotgit(): speed it up
  mingw: disallow backslash characters in tree objects' file names
  path: safeguard `.git` against NTFS Alternate Streams Accesses
  clone --recurse-submodules: prevent name squatting on Windows
  is_ntfs_dotgit(): only verify the leading segment
  test-path-utils: offer to run a protectNTFS/protectHFS benchmark
  ...
2019-12-06 16:26:55 +01:00
Johannes Schindelin 9102f958ee protect_ntfs: turn on NTFS protection by default
Back in the DOS days, in the FAT file system, file names always
consisted of a base name of length 8 plus a file extension of length 3.
Shorter file names were simply padded with spaces to the full 8.3
format.

Later, the FAT file system was taught to support _also_ longer names,
with an 8.3 "short name" as primary file name. While at it, the same
facility allowed formerly illegal file names, such as `.git` (empty base
names were not allowed), which would have the "short name" `git~1`
associated with it.

For backwards-compatibility, NTFS supports alternative 8.3 short
filenames, too, even if starting with Windows Vista, they are only
generated on the system drive by default.

We addressed the problem that the `.git/` directory can _also_ be
accessed via `git~1/` (when short names are enabled) in 2b4c6efc82
(read-cache: optionally disallow NTFS .git variants, 2014-12-16), i.e.
since Git v1.9.5, by introducing the config setting `core.protectNTFS`
and enabling it by default on Windows.

In the meantime, Windows 10 introduced the "Windows Subsystem for Linux"
(short: WSL), i.e. a way to run Linux applications/distributions in a
thinly-isolated subsystem on Windows (giving rise to many a "2016 is the
Year of Linux on the Desktop" jokes). WSL is getting increasingly
popular, also due to the painless way Linux application can operate
directly ("natively") on files on Windows' file system: the Windows
drives are mounted automatically (e.g. `C:` as `/mnt/c/`).

Taken together, this means that we now have to enable the safe-guards of
Git v1.9.5 also in WSL: it is possible to access a `.git` directory
inside `/mnt/c/` via the 8.3 name `git~1` (unless short name generation
was disabled manually). Since regular Linux distributions run in WSL,
this means we have to enable `core.protectNTFS` at least on Linux, too.

To enable Services for Macintosh in Windows NT to store so-called
resource forks, NTFS introduced "Alternate Data Streams". Essentially,
these constitute additional metadata that are connected to (and copied
with) their associated files, and they are accessed via pseudo file
names of the form `filename:<stream-name>:<stream-type>`.

In a recent patch, we extended `core.protectNTFS` to also protect
against accesses via NTFS Alternate Data Streams, e.g. to prevent
contents of the `.git/` directory to be "tracked" via yet another
alternative file name.

While it is not possible (at least by default) to access files via NTFS
Alternate Data Streams from within WSL, the defaults on macOS when
mounting network shares via SMB _do_ allow accessing files and
directories in that way. Therefore, we need to enable `core.protectNTFS`
on macOS by default, too, and really, on any Operating System that can
mount network shares via SMB/CIFS.

A couple of approaches were considered for fixing this:

1. We could perform a dynamic NTFS check similar to the `core.symlinks`
   check in `init`/`clone`: instead of trying to create a symbolic link
   in the `.git/` directory, we could create a test file and try to
   access `.git/config` via 8.3 name and/or Alternate Data Stream.

2. We could simply "flip the switch" on `core.protectNTFS`, to make it
   "on by default".

The obvious downside of 1. is that it won't protect worktrees that were
clone with a vulnerable Git version already. We considered patching code
paths that check out files to check whether we're running on an NTFS
system dynamically and persist the result in the repository-local config
setting `core.protectNTFS`, but in the end decided that this solution
would be too fragile, and too involved.

The obvious downside of 2. is that everybody will have to "suffer" the
performance penalty incurred from calling `is_ntfs_dotgit()` on every
path, even in setups where.

After the recent work to accelerate `is_ntfs_dotgit()` in most cases,
it looks as if the time spent on validating ten million random
file names increases only negligibly (less than 20ms, well within the
standard deviation of ~50ms). Therefore the benefits outweigh the cost.

Another downside of this is that paths that might have been acceptable
previously now will be forbidden. Realistically, though, this is an
improvement because public Git hosters already would reject any `git
push` that contains such file names.

Note: There might be a similar problem mounting HFS+ on Linux. However,
this scenario has been considered unlikely and in light of the cost (in
the aforementioned benchmark, `core.protectHFS = true` increased the
time from ~440ms to ~610ms), it was decided _not_ to touch the default
of `core.protectHFS`.

This change addresses CVE-2019-1353.

Reported-by: Nicolas Joly <Nicolas.Joly@microsoft.com>
Helped-by: Garima Singh <garima.singh@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-12-05 15:36:51 +01:00
Derrick Stolee 879321eb0b sparse-checkout: add 'cone' mode
The sparse-checkout feature can have quadratic performance as
the number of patterns and number of entries in the index grow.
If there are 1,000 patterns and 1,000,000 entries, this time can
be very significant.

Create a new Boolean config option, core.sparseCheckoutCone, to
indicate that we expect the sparse-checkout file to contain a
more limited set of patterns. This is a separate config setting
from core.sparseCheckout to avoid breaking older clients by
introducing a tri-state option.

The config option does nothing right now, but will be expanded
upon in a later commit.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-22 16:11:44 +09:00
Christian Couder 4ca9474efa Move core_partial_clone_filter_default to promisor-remote.c
Now that we can have a different default partial clone filter for
each promisor remote, let's hide core_partial_clone_filter_default
as a static in promisor-remote.c to avoid it being use for
anything other than managing backward compatibility.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-25 14:05:38 -07:00
Christian Couder 60b7a92d84 Move repository_format_partial_clone to promisor-remote.c
Now that we have has_promisor_remote() and can use many
promisor remotes, let's hide repository_format_partial_clone
as a static in promisor-remote.c to avoid it being use
for anything other than managing backward compatibility.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-25 14:05:38 -07:00
Junio C Hamano 773e408881 Merge branch 'jk/save-getenv-result'
There were many places the code relied on the string returned from
getenv() to be non-volatile, which is not true, that have been
corrected.

* jk/save-getenv-result:
  builtin_diff(): read $GIT_DIFF_OPTS closer to use
  merge-recursive: copy $GITHEAD strings
  init: make a copy of $GIT_DIR string
  config: make a copy of $GIT_CONFIG string
  commit: copy saved getenv() result
  get_super_prefix(): copy getenv() result
2019-01-29 12:47:54 -08:00
Jeff King 8aac69038f get_super_prefix(): copy getenv() result
The return value of getenv() is not guaranteed to remain valid across
multiple calls (nor across calls to setenv()). Since this function
caches the result for the length of the program, we must make a copy to
ensure that it is still valid when we need it.

Reported-by: Yngve N. Pettersen <yngve@vivaldi.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-11 18:48:58 -08:00
Junio C Hamano 3b2f8a02fa Merge branch 'jk/loose-object-cache'
Code clean-up with optimization for the codepath that checks
(non-)existence of loose objects.

* jk/loose-object-cache:
  odb_load_loose_cache: fix strbuf leak
  fetch-pack: drop custom loose object cache
  sha1-file: use loose object cache for quick existence check
  object-store: provide helpers for loose_objects_cache
  sha1-file: use an object_directory for the main object dir
  handle alternates paths the same as the main object dir
  sha1_file_name(): overwrite buffer instead of appending
  rename "alternate_object_database" to "object_directory"
  submodule--helper: prefer strip_suffix() to ends_with()
  fsck: do not reuse child_process structs
2019-01-04 13:33:32 -08:00
Junio C Hamano 6c268fdda9 Merge branch 'js/mingw-perl5lib'
Windows fix.

* js/mingw-perl5lib:
  mingw: unset PERL5LIB by default
  config: move Windows-specific config settings into compat/mingw.c
  config: allow for platform-specific core.* config settings
  config: rename `dummy` parameter to `cb` in git_default_config()
2018-11-13 22:37:20 +09:00
Jeff King f0eaf63819 sha1-file: use an object_directory for the main object dir
Our handling of alternate object directories is needlessly different
from the main object directory. As a result, many places in the code
basically look like this:

  do_something(r->objects->objdir);

  for (odb = r->objects->alt_odb_list; odb; odb = odb->next)
        do_something(odb->path);

That gets annoying when do_something() is non-trivial, and we've
resorted to gross hacks like creating fake alternates (see
find_short_object_filename()).

Instead, let's give each raw_object_store a unified list of
object_directory structs. The first will be the main store, and
everything after is an alternate. Very few callers even care about the
distinction, and can just loop over the whole list (and those who care
can just treat the first element differently).

A few observations:

  - we don't need r->objects->objectdir anymore, and can just
    mechanically convert that to r->objects->odb->path

  - object_directory's path field needs to become a real pointer rather
    than a FLEX_ARRAY, in order to fill it with expand_base_dir()

  - we'll call prepare_alt_odb() earlier in many functions (i.e.,
    outside of the loop). This may result in us calling it even when our
    function would be satisfied looking only at the main odb.

    But this doesn't matter in practice. It's not a very expensive
    operation in the first place, and in the majority of cases it will
    be a noop. We call it already (and cache its results) in
    prepare_packed_git(), and we'll generally check packs before loose
    objects. So essentially every program is going to call it
    immediately once per program.

    Arguably we should just prepare_alt_odb() immediately upon setting
    up the repository's object directory, which would save us sprinkling
    calls throughout the code base (and forgetting to do so has been a
    source of subtle bugs in the past). But I've stopped short of that
    here, since there are already a lot of other moving parts in this
    patch.

  - Most call sites just get shorter. The check_and_freshen() functions
    are an exception, because they have entry points to handle local and
    nonlocal directories separately.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-13 14:22:03 +09:00