The "git_istream" abstraction has been revamped to make it easier
to interface with pluggable object database design.
* ps/object-read-stream:
streaming: drop redundant type and size pointers
streaming: move into object database subsystem
streaming: refactor interface to be object-database-centric
streaming: move logic to read packed objects streams into backend
streaming: move logic to read loose objects streams into backend
streaming: make the `odb_read_stream` definition public
streaming: get rid of `the_repository`
streaming: rely on object sources to create object stream
packfile: introduce function to read object info from a store
streaming: move zlib stream into backends
streaming: create structure for filtered object streams
streaming: create structure for packed object streams
streaming: create structure for loose object streams
streaming: create structure for in-core object streams
streaming: allocate stream inside the backend-specific logic
streaming: explicitly pass packfile info when streaming a packed object
streaming: propagate final object type via the stream
streaming: drop the `open()` callback function
streaming: rename `git_istream` into `odb_read_stream`
When asked to perform object consistency checks via the `--fsck-objects`
flag we verify that each object part of the pack is valid. In general,
this check can even be performed outside of a Git repository: we don't
need an initialized object database as we simply read the object from
the packfile directly.
But there's one exception: a subset of the object checks may be deferred
to a later point in time. For now, this only concerns ".gitmodules" and
".gitattributes" files: whenever we see a tree referencing these files
we queue them for a deferred check. This is done because we need to do
some extra checks for those files to ensure that they are well-formed,
and these checks need to be done regardless of whether the corresponding
blobs are part of the packfile or not.
This works inside a repository, but unfortunately the logic leads to a
segfault when running outside of one. This is because we eventually call
`odb_read_object()`, which will crash because the object database has
not been initialized.
There's multiple options here:
- We could in theory create a purely in-memory database with only a
packfile store that contains the single packfile. We don't really
have the infrastructure for this yet though, and it would end up
being quite hacky.
- We could refuse to perform consistency checks outside of a
repository. But most of the checks work alright, so this would be a
regression.
- We can skip the finalizing consistency checks when running outside
of a repository. This is not as invasive as skipping all checks,
but it's not great to randomly skip a subset of tests, either.
None of these options really feel perfect. The first one would be the
obvious choice if easily possible.
There's another option though: instead of skipping the final object
checks, we can die if there are any queued object checks. With this
change we now die exactly if and only if we would have previously
segfaulted. Like this we ensure that objects that _may_ fail the
consistency checks won't be silently skipped, and at the same time we
give users a much better error message.
Refactor the code accordingly and add a test that would have triggered
the segfault. Note that we also move down the logic to add the packfile
to the store. There is no point doing this any earlier than right before
we execute `fsck_finish()`, and it ensures that the logic to set up and
perform the consistency check is self-contained.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the preceding commits we have turned `struct odb_read_stream` into a
publicly visible structure. Furthermore, this structure now contains the
type and size of the object that we are about to stream. Consequently,
the out-pointers that we used before to propagate the type and size of
the streamed object are now somewhat redundant with the data contained
in the structure itself.
Drop these out-pointers and adapt callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "streaming" terminology is somewhat generic, so it may not be
immediately obvious that "streaming.{c,h}" is specific to the object
database. Rectify this by moving it into the "odb/" directory so that it
can be immediately attributed to the object subsystem.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor the streaming interface to be centered around object databases
instead of centered around the repository. Rename the functions
accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the following patches we are about to make the `git_istream` more
generic so that it becomes fully controlled by the specific object
source that wants to create it. As part of these refactorings we'll
fully move the structure into the object database subsystem.
Prepare for this change by renaming the structure from `git_istream`
to `odb_read_stream`. This mirrors the `odb_write_stream` structure that
we already have.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a recurring pattern where we essentially perform an upsert of a
packfile in case it isn't yet known by the packfile store. The logic to
do so is non-trivial as we have to reconstruct the packfile's key, check
the map of packfiles, then create the new packfile and finally add it to
the store.
Introduce a new function that does this dance for us. Refactor callsites
to use it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `install_packed_git()` functions adds a packfile to a specific
object store. Refactor it to accept a packfile store instead of a
repository to clarify its scope.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reduce implicit assumption and dependence on the_repository in the
object-file subsystem.
* ps/object-file-wo-the-repository:
object-file: get rid of `the_repository` in index-related functions
object-file: get rid of `the_repository` in `force_object_loose()`
object-file: get rid of `the_repository` in `read_loose_object()`
object-file: get rid of `the_repository` in loose object iterators
object-file: remove declaration for `for_each_file_in_obj_subdir()`
object-file: inline `for_each_loose_file_in_objdir_buf()`
object-file: get rid of `the_repository` when writing objects
odb: introduce `odb_write_object()`
loose: write loose objects map via their source
object-file: get rid of `the_repository` in `finalize_object_file()`
object-file: get rid of `the_repository` in `loose_object_info()`
object-file: get rid of `the_repository` when freshening objects
object-file: inline `check_and_freshen()` functions
object-file: get rid of `the_repository` in `has_loose_object()`
object-file: stop using `the_hash_algo`
object-file: fix -Wsign-compare warnings
In 036876a106 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.
Follow through with that intent and remove `git_config()`. All callsites
are adjusted so that they use `repo_config(the_repository, ...)`
instead. While some callsites might already have a repository available,
this mechanical conversion is the exact same as the current situation
and thus cannot cause any regression. Those sites should eventually be
cleaned up in a later patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* bc/use-sha256-by-default-in-3.0:
Enable SHA-256 by default in breaking changes mode
help: add a build option for default hash
t5300: choose the built-in hash outside of a repo
t4042: choose the built-in hash outside of a repo
t1007: choose the built-in hash outside of a repo
t: default to compile-time default hash if not set
setup: use the default algorithm to initialize repo format
Use legacy hash for legacy formats
builtin: use default hash when outside a repository
hash: add a constant for the legacy hash algorithm
hash: add a constant for the default hash algorithm
We implicitly depend on `the_repository` when moving an object file into
place in `finalize_object_file()`. Get rid of this global dependency by
passing in a repository.
Note that one might be pressed to inject an object database instead of a
repository. But the function doesn't really care about the ODB at all.
All it does is to move a file into place while checking whether there is
any collision. As such, the functionality it provides is independent of
the object database and only needs the repository as parameter so that
it can adjust permissions of the file we are about to finalize.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have some commands that can operate inside or outside a repository.
If we're operating outside a repository, we clearly cannot use the
repository's hash algorithm as a default since it doesn't exist, so
instead, let's pick the default instead of specifically SHA-1. Right
now this results in no functional change since the default is SHA-1, but
that may change in the future.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename `has_object()` to `odb_has_object()` to match other functions
related to the object database and our modern coding guidelines.
Introduce a compatibility wrapper so that any in-flight topics will
continue to compile.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename `repo_read_object_file()` to `odb_read_object()` to match other
functions related to the object database and our modern coding
guidelines.
Introduce a compatibility wrapper so that any in-flight topics will
continue to compile.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename `oid_object_info()` to `odb_read_object_info()` as well as their
`_extended()` variant to match other functions related to the object
database and our modern coding guidelines.
Introduce compatibility wrappers so that any in-flight topics will
continue to compile.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Get rid of our dependency on `the_repository` in `odb_mkstemp()` by
passing in the object database as a parameter and adjusting all callers.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the preceding commits we have renamed the structures contained in
"object-store.h" to `struct object_database` and `struct odb_backend`.
As such, the code files "object-store.{c,h}" are confusingly named now.
Rename them to "odb.{c,h}" accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git index-pack --fix-thin" used to abort to prevent a cycle in
delta chains from forming in a corner case even when there is no
such cycle.
* ds/fix-thin-fix:
index-pack: allow revisiting REF_DELTA chains
t5309: create failing test for 'git index-pack'
test-tool: add pack-deltas helper
As the comment of `repo_has_object_file()` and its `_with_flags()`
variant tells us, these functions are considered to be deprecated in
favor of `has_object()`. There are a couple of slight benefits in favor
of the replacement:
- The new function has a short-and-sweet name.
- More explicit defaults: `has_object()` doesn't fetch missing objects
via promisor remotes, and neither does it reload packfiles if an
object wasn't found by default. This ensures that it becomes
immediately obvious when a simple object existence check may result
in expensive actions.
Most importantly though, it is confusing that we have two sets of
functions that ultimately do the same thing, but with different
defaults.
Start sunsetting `repo_has_object_file()` and its `_with_flags()`
sibling by replacing all callsites with `has_object()`:
- `repo_has_object_file(...)` is equivalent to
`has_object(..., HAS_OBJECT_RECHECK_PACKED | HAS_OBJECT_FETCH_PROMISOR)`.
- `repo_has_object_file_with_flags(..., OBJECT_INFO_QUICK | OBJECT_INFO_SKIP_FETCH_OBJECT)`
is equivalent to `has_object(..., 0)`.
- `repo_has_object_file_with_flags(..., OBJECT_INFO_SKIP_FETCH_OBJECT)`
is equivalent to `has_object(..., HAS_OBJECT_RECHECK_PACKED)`.
- `repo_has_object_file_with_flags(..., OBJECT_INFO_QUICK)`
is equivalent to `has_object(..., HAS_OBJECT_FETCH_PROMISOR)`.
The replacements should be functionally equivalent.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `odb_pack_keep()` creates a file at the passed-in path. If
this fails, then the function re-tries by first creating any potentially
missing leading directories and then trying to create the file once
again. As such, this function doesn't host any kind of logic that is
specific to the object store, but is rather a generic helper function.
Rename the function to `safe_create_file_with_leading_directories()` and
move it into "path.c". While at it, refactor it so that it loses its
dependency on `the_repository`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As detailed in the previous changes to t5309-pack-delta-cycles.sh, the
logic within 'git index-pack' to analyze an incoming thin packfile with
REF_DELTAs is suspect. The algorithm is overly cautious around delta
cycles, and that leads in fact to failing even when there is no cycle.
This change adjusts the algorithm to no longer fail in these cases. In
fact, these cycle cases will no longer fail but more importantly the
valid cases will no longer fail, either. The resulting packfile from the
--fix-thin operation will not have cycles either since REF_DELTAs are
forbidden from the on-disk format and OFS_DELTAs are impossible to write
as a cycle.
The crux of the matter is how the algorithm works when the REF_DELTAs
point to base objects that exist in the local repository. When reading
the thin packfile, the object IDs for the delta objects are unknown so
we do not have the delta chain structure automatically. Instead, we need
to start somewhere by selecting a delta whose base is inside our current
object database.
Consider the case where the packfile has two REF_DELTA objects, A and B,
and the delta chain looks like "A depends on B" and "B depends on C" for
some third object C, where C is already in the current repository. The
algorithm _should_ start with all objects that depend on C, finding B,
and then moving on to all objects depending on B, finding A.
However, if the repository also already has object B, then the delta
chain can be analyzed in a different order. The deltas with base B can
be analyzed first, finding A, and then the deltas with base C are
analyzed, finding B. The algorithm currently continues to look for
objects that depend on B, finding A again. This fails due to A's
'real_type' member already being overwritten from OBJ_REF_DELTA to the
correct object type.
This scenario is possible in a typical 'git fetch' where the client does
not advertise B as a 'have' but requests A as a 'want' (and C is noticed
as a common object based on other 'have's). The reason this isn't
typically seen is that most Git servers use OFS_DELTAs to represent
deltas within a packfile. However, if a server uses only REF_DELTAs,
then this kind of issue can occur. There is nothing in the explicit
packfile format that states this use of inter-pack REF_DELTA is
incorrect, only that REF_DELTAs should not be used in the on-disk
representation to avoid cycles.
This die() was introduced in ab791dd138 (index-pack: fix race condition
with duplicate bases, 2014-08-29). Several refactors have adjusted the
error message and the surrounding logic, but this issue has existed for
a longer time as that was only a conversion from an assert().
The tests in t5309 originated in 3b910d0c5e (add tests for indexing
packs with delta cycles, 2013-08-23) and b2ef3d9ebb (test index-pack on
packs with recoverable delta cycles, 2013-08-23). These changes make
note that the current behavior of handling "resolvable" cycles is mostly
a documentation-only test, not that this behavior is the best way for
Git to handle the situation.
The fix here is somewhat complicated due to the amount of state being
adjusted by the loop within threaded_second_pass(). Instead of trying to
resume the start of the loop while adjusting the necessary context, I
chose to scan the REF_DELTAs depending on the current 'parent' and skip
any that have already been processed. This necessarily leaves us in a
state where 'child' and 'child_obj' could be left as NULL and that must
be handled later. There is also some careful handling around skipping
REF_DELTAs when there are also OFS_DELTAs depending on that parent.
There may be value in extending 'test-tool pack-deltas' to allow writing
OFS_DELTAs in order to exercise this logic across the delta types.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The object layer has been updated to take an explicit repository
instance as a parameter in more code paths.
* ps/object-wo-the-repository:
hash: stop depending on `the_repository` in `null_oid()`
hash: fix "-Wsign-compare" warnings
object-file: split out logic regarding hash algorithms
delta-islands: stop depending on `the_repository`
object-file-convert: stop depending on `the_repository`
pack-bitmap-write: stop depending on `the_repository`
pack-revindex: stop depending on `the_repository`
pack-check: stop depending on `the_repository`
environment: move access to "core.bigFileThreshold" into repo settings
pack-write: stop depending on `the_repository` and `the_hash_algo`
object: stop depending on `the_repository`
csum-file: stop depending on `the_repository`
The "object-store-ll.h" header has been introduced to keep transitive
header dependendcies and compile times at bay. Now that we have created
a new "object-store.c" file though we can easily move the last remaining
additional bit of "object-store.h", the `odb_path_map`, out of the
header.
Do so. As the "object-store.h" header is now equivalent to its low-level
alternative we drop the latter and inline it into the former.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/object-wo-the-repository:
hash: stop depending on `the_repository` in `null_oid()`
hash: fix "-Wsign-compare" warnings
object-file: split out logic regarding hash algorithms
delta-islands: stop depending on `the_repository`
object-file-convert: stop depending on `the_repository`
pack-bitmap-write: stop depending on `the_repository`
pack-revindex: stop depending on `the_repository`
pack-check: stop depending on `the_repository`
environment: move access to "core.bigFileThreshold" into repo settings
pack-write: stop depending on `the_repository` and `the_hash_algo`
object: stop depending on `the_repository`
csum-file: stop depending on `the_repository`
An earlier code refactoring of the hash machinery missed a few
required calls to init_fn.
* jh/hash-init-fixes:
index-pack, unpack-objects: restore missing ->init_fn
Commit 0578f1e66a ("global: adapt callers to use generic hash context helpers")
accidentally removed `->init_fn`, which is required for OpenSSL 3+ SHA1.
This fixes the following error on fetch:
fatal: fetch-pack: invalid index-pack output
Signed-off-by: Jensen Huang <hmz007@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "core.bigFileThreshold" setting is stored in a global variable and
populated via `git_default_core_config()`. This may cause issues in
the case where one is handling multiple different repositories in a
single process with different values for that config key, as we may or
may not see the correct value in that case. Furthermore, global state
blocks our path towards libification.
Refactor the code so that we instead store the value in `struct
repo_settings`, where the value is computed as-needed and cached.
Note that this change requires us to adapt one test in t1050 that
verifies that we die when parsing an invalid "core.bigFileThreshold"
value. The exercised Git command doesn't use the value at all, and thus
it won't hit the new code path that parses the value. This is addressed
by using git-hash-object(1) instead, which does read the value.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a couple of functions in "pack-write.c" that implicitly depend
on `the_repository` or `the_hash_algo`. Remove this dependency by
injecting the repository via a parameter and adapt callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a couple of functions exposed by "object.c" that implicitly
depend on `the_repository`. Remove this dependency by injecting the
repository via a parameter. Adapt callers accordingly by simply using
`the_repository`, except in cases where the subsystem is already free of
the repository. In that case, we instead pass the repository provided by
the caller's context.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are multiple sites in "csum-file.c" where we use the global
`the_repository` variable, either explicitly or implicitly by using
`the_hash_algo`.
Refactor the code to stop using `the_repository` by adapting functions
to receive required data as parameters. Adapt callsites accordingly by
either using `the_repository->hash_algo`, or by using a context-provided
hash algorithm in case the subsystem already got rid of its dependency
on `the_repository`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Further code clean-up on the use of hash functions. Now the
context object knows what hash function it is working with.
* ps/hash-cleanup:
global: adapt callers to use generic hash context helpers
hash: provide generic wrappers to update hash contexts
hash: stop typedeffing the hash context
hash: convert hashing context to a structure
Adapt callers to use generic hash context helpers instead of using the
hash algorithm to update them. This makes the callsites easier to reason
about and removes the possibility that the wrong hash algorithm is used
to update the hash context's state. And as a nice side effect this also
gets rid of a bunch of users of `the_hash_algo`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We generally avoid using `typedef` in the Git codebase. One exception
though is the `git_hash_ctx`, likely because it used to be a union
rather than a struct until the preceding commit refactored it. But now
that it is a normal `struct` there isn't really a need for a typedef
anymore.
Drop the typedef and adapt all callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It was possible for "git unpack-objects" and "git index-pack" to
make an unaligned access, which has been corrected.
* jk/pack-header-parse-alignment-fix:
index-pack, unpack-objects: use skip_prefix to avoid magic number
index-pack, unpack-objects: use get_be32() for reading pack header
parse_pack_header_option(): avoid unaligned memory writes
packfile: factor out --pack_header argument parsing
bswap.h: squelch potential sparse -Wcast-truncate warnings
The help text from "git $cmd -h" appear on the standard output for
some $cmd and the standard error for others. The built-in commands
have been fixed to show them on the standard output consistently.
* jc/show-usage-help:
builtin: send usage() help text to standard output
oddballs: send usage() help text to standard output
builtins: send usage_with_options() help text to standard output
usage: add show_usage_if_asked()
parse-options: add show_usage_with_options_if_asked()
t0012: optionally check that "-h" output goes to stdout
The `write_rev_file()` function uses the global `the_hash_algo` variable
to access the repository's hash_algo. To avoid global variable usage,
pass a hash_algo from the layers above. Also modify children functions
`write_rev_file_order()` and `write_rev_header()` to accept
'the_hash_algo'.
Altough the layers above could have access to the hash_algo internally,
simply pass in `the_hash_algo`. This avoids any compatibility issues and
bubbles up global variable usage to upper layers which can be eventually
resolved.
However, in `midx-write.c`, since all usage of global variables is
removed, don't reintroduce them and instead use the `repo` available in
the context.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `write_idx_file()` function uses the global `the_hash_algo` variable
to access the repository's hash_algo. To avoid global variable usage,
pass a hash_algo from the layers above.
Since `stage_tmp_packfiles()` also resides in 'pack-write.c' and calls
`write_idx_file()`, update it to accept a `struct git_hash_algo` as a
parameter and pass it through to the callee.
Altough the layers above could have access to the hash_algo internally,
simply pass in `the_hash_algo`. This avoids any compatibility issues and
bubbles up global variable usage to upper layers which can be eventually
resolved.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `fixup_pack_header_footer()` function uses the global
`the_hash_algo` variable to access the repository's hash function. To
avoid global variable usage, pass a hash_algo from the layers above.
Altough the layers above could have access to the hash_algo internally,
simply pass in `the_hash_algo`. This avoids any compatibility issues and
bubbles up global variable usage to upper layers which can be eventually
resolved.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
More code paths have a repository passed through the callchain,
instead of assuming the primary the_repository object.
* ps/the-repository:
match-trees: stop using `the_repository`
graph: stop using `the_repository`
add-interactive: stop using `the_repository`
tmp-objdir: stop using `the_repository`
resolve-undo: stop using `the_repository`
credential: stop using `the_repository`
mailinfo: stop using `the_repository`
diagnose: stop using `the_repository`
server-info: stop using `the_repository`
send-pack: stop using `the_repository`
serve: stop using `the_repository`
trace: stop using `the_repository`
pager: stop using `the_repository`
progress: stop using `the_repository`
When parsing --pack_header=, we manually skip 14 bytes to the data.
Let's use skip_prefix() to do this automatically.
Note that we overwrite our pointer to the front of the string, so we
have to add more context to the error message. We could avoid this by
declaring an extra pointer to hold the value, but I think the modified
message is actually preferable; it should give translators a bit more
context.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both of these commands read the incoming pack into a static unsigned
char buffer in BSS, and then parse it by casting the start of the buffer
to a struct pack_header. This can result in SIGBUS on some platforms if
the compiler doesn't place the buffer in a position that is properly
aligned for 4-byte integers.
This reportedly happens with unpack-objects (but not index-pack) on
sparc64 when compiled with clang (but not gcc). But we are definitely in
the wrong in both spots; since the buffer's type is unsigned char, we
can't depend on larger alignment. When it works it is only because we
are lucky.
We'll fix this by switching to get_be32() to read the headers (just like
the last few commits similarly switched us to put_be32() for writing
into the same buffer).
It would be nice to factor this out into a common helper function, but
the interface ends up quite awkward. Either the caller needs to hardcode
how many bytes we'll need, or it needs to pass us its fill()/use()
functions as pointers. So I've just fixed both spots in the same way;
this is not code that is likely to be repeated a third time (most of the
pack reading code uses an mmap'd buffer, which should be properly
aligned).
I did make one tweak to the shared code: our pack_version_ok() macro
expects us to pass the big-endian value we'd get by casting. We can
introduce a "native" variant which uses the host integer ordering.
Reported-by: Koakuma <koachan@protonmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both index-pack and unpack-objects accept a --pack_header argument. This
is an undocumented internal argument used by receive-pack and fetch to
pass along information about the header of the pack, which they've
already read from the incoming stream.
In preparation for a bugfix, let's factor the duplicated code into a
common helper.
The callers are still responsible for identifying the option. While this
could likewise be factored out, it is more flexible this way (e.g., if
they ever started using parse-options and wanted to handle both the
stuck and unstuck forms).
Likewise, the callers are responsible for reporting errors, though they
both just call die(). I've tweaked unpack-objects to match index-pack in
marking the error for translation.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using the show_usage_and_exit_if_asked() helper we introduced
earlier, fix callers of usage() that want to show the help text when
explicitly asked by the end-user. The help text now goes to the
standard output stream for them.
These are the bog standard "if we got only '-h', then that is a
request for help" callers. Their
if (argc == 2 && !strcmp(argv[1], "-h"))
usage(message);
are simply replaced with
show_usage_and_exit_if_asked(argc, argv, message);
With this, the built-ins tested by t0012 all send their help text to
their standard output stream, so the check in t0012 that was half
tightened earlier is now fully tightened to insist on standard error
stream being empty.
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The extra "barrier" approach was too much code whose sole purpose
was to work around a race that is not even ours (i.e. in LSan's
teardown code).
In preparation for queuing a solution taking a much-less-invasive
approach, let's revert them.
We sometimes get false positives from our linux-leaks CI job because of
a race in LSan itself. The problem is that one thread is still
initializing its stack in LSan's code (and allocating memory to do so)
while anothe thread calls die(), taking down the whole process and
triggering a leak check.
The problem is described in more detail in 993d38a066 (index-pack: spawn
threads atomically, 2024-01-05), which tried to fix it by pausing worker
threads until all calls to pthread_create() had completed. But that's
not enough to fix the problem, because the LSan setup code runs in the
threads themselves. So even though pthread_create() has returned, we
have no idea if all threads actually finished their setup before letting
any of them do real work.
We can fix that by using a barrier inside the threads themselves,
waiting for all of them to hit the start of their main function before
any of them proceed.
You can test for the race by running:
make SANITIZE=leak THREAD_BARRIER_PTHREAD=YesOnLinux
cd t
./t5309-pack-delta-cycles.sh --stress
which fails quickly before this patch, and should run indefinitely
without it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reverts commit 993d38a066.
That commit was trying to solve a race between LSan setting up the
threads stack and another thread calling exit(), by making sure that all
pthread_create() calls have finished before doing any work that might
trigger the exit().
But that isn't sufficient. The setup code actually runs in the
individual threads themselves, not in the spawning thread's call to
pthread_create(). So while it may have improved the race a bit, you can
still trigger it pretty quickly with:
make SANITIZE=leak
cd t
./t5309-pack-delta-cycles.sh --stress
Let's back out that failed attempt so we can try again.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Start working to make the codebase buildable with -Wsign-compare.
* ps/build-sign-compare:
t/helper: don't depend on implicit wraparound
scalar: address -Wsign-compare warnings
builtin/patch-id: fix type of `get_one_patchid()`
builtin/blame: fix type of `length` variable when emitting object ID
gpg-interface: address -Wsign-comparison warnings
daemon: fix type of `max_connections`
daemon: fix loops that have mismatching integer types
global: trivial conversions to fix `-Wsign-compare` warnings
pkt-line: fix -Wsign-compare warning on 32 bit platform
csum-file: fix -Wsign-compare warning on 32-bit platform
diff.h: fix index used to loop through unsigned integer
config.mak.dev: drop `-Wno-sign-compare`
global: mark code units that generate warnings with `-Wsign-compare`
compat/win32: fix -Wsign-compare warning in "wWinMain()"
compat/regex: explicitly ignore "-Wsign-compare" warnings
git-compat-util: introduce macros to disable "-Wsign-compare" warnings
Stop using `the_repository` in the "progress" subsystem by passing in a
repository when initializing `struct progress`. Furthermore, store a
pointer to the repository in that struct so that we can pass it to the
trace2 API when logging information.
Adjust callers accordingly by using `the_repository`. While there may be
some callers that have a repository available in their context, this
trivial conversion allows for easier verification and bubbles up the use
of `the_repository` by one level.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>