Implement the `write_object_stream()` callback function for the in-memory
source.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement the `write_object()` callback function for the in-memory
source.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement the `read_object_stream()` callback function for the in-memory
source.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement the `read_object_info()` callback function for the in-memory
source.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `odb_pretend_object()` writes an object into the in-memory
object database source. The effect of this is that the object will now
become readable, but it won't ever be persisted to disk.
Before storing the object, we first verify whether the object already
exists. This is done by calling `odb_has_object()` to check all sources,
followed by `find_cached_object()` to check whether we have already
stored the object in our in-memory source.
This is unnecessary though, as `odb_has_object()` already checks the
in-memory source transitively via:
- `odb_has_object()`
- `odb_read_object_info_extended()`
- `do_oid_object_info_extended()`
- `find_cached_object()`
Drop the explicit call to `find_cached_object()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement the `free()` callback function for the "in-memory" source.
Note that this requires us to define `struct cached_object_entry` in
"odb/source-inmemory.h", as it is accessed in both "odb.c" and
"odb/source-inmemory.c" now. This will be fixed in subsequent commits
though.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Next to our typical object database sources, each object database also
has an implicit source of "cached" objects. These cached objects only
exist in memory and some use cases:
- They contain evergreen objects that we expect to always exist, like
for example the empty tree.
- They can be used to store temporary objects that we don't want to
persist to disk, which is used by git-blame(1) to create a fake
worktree commit.
Overall, their use is somewhat restricted though. For example, we don't
provide the ability to use it as a temporary object database source that
allows the user to write objects, but discard them after Git exists. So
while these cached objects behave almost like a source, they aren't used
as one.
This is about to change over the following commits, where we will turn
cached objects into a new "in-memory" source. This will allow us to use
it exactly the same as any other source by providing the same common
interface as the "files" source.
For now, the in-memory source only hosts the cached objects and doesn't
provide any logic yet. This will change with subsequent commits, where
we move respective functionality into the source.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
How an ODB transaction handles writing objects is expected to vary
between implementations. Introduce a new `write_object_stream()`
callback in `struct odb_transaction` to make this function pluggable.
Rename `index_blob_packfile_transaction()` to
`odb_transaction_files_write_object_stream()` and wire it up for use
with `struct odb_transaction_files` accordingly.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `index_blob_packfile_transaction()` function streams blob data
directly from an fd. This makes it difficult to reuse as part of a
generic transactional object writing interface.
Refactor the packfile write path to operate on a `struct
odb_write_stream`, allowing callers to supply data from arbitrary
sources.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In certain scenarios, Git handles writing blobs that exceed
"core.bigFileThreshold" differently by streaming the object directly
into a packfile. When there is an active ODB transaction, these blobs
are streamed to the same packfile instead of using a separate packfile
for each. If "pack.packSizeLimit" is configured and streaming another
object causes the packfile to exceed the configured limit, the packfile
is truncated back to the previous object and the object write is
restarted in a new packfile.
This works fine, but requires the fd being read from to save a
checkpoint so it becomes possible to rewind the input source via seeking
back to a known offset at the beginning. In a subsequent commit, blob
streaming is converted to use `struct odb_write_stream` as a more
generic input source instead of an fd which doesn't provide a mechanism
for rewinding.
For this use case though, rewinding the fd is not strictly necessary
because the inflated size of the object is known and can be used to
approximate whether writing the object would cause the packfile to
exceed the configured limit prior to writing anything. These blobs
written to the packfile are never deltified thus the size difference
between what is written versus the inflated size is due to zlib
compression. While this does prevent packfiles from being filled to the
potential maximum is some cases, it should be good enough and still
prevents the packfile from exceeding any configured limit.
Use the inflated blob size to determine whether writing an object to a
packfile will exceed the configured "pack.packSizeLimit".
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `index_blob_packfile_transaction()` function handles streaming a
blob from an fd to compute its object ID and conditionally writes the
object directly to a packfile if the INDEX_WRITE_OBJECT flag is set. A
subsequent commit will make these packfile object writes part of the
transaction interface. Consequently, having the object write be
conditional on this flag is a bit awkward.
In preparation for this change, introduce a dedicated
`hash_blob_stream()` helper that only computes the OID from a `struct
odb_write_stream`. This is invoked by `index_fd()` instead when the
INDEX_WRITE_OBJECT is not set. The object write performed via
`index_blob_packfile_transaction()` is made unconditional accordingly.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `read()` callback used by `struct odb_write_stream` currently
returns a pointer to an internal buffer along with the number of bytes
read. This makes buffer ownership unclear and provides no way to report
errors.
Update the interface to instead require the caller to provide a buffer,
and have the callback return the number of bytes written to it or a
negative value on error. While at it, also move the `struct
odb_write_stream` definition to "odb/streaming.h". Call sites are
updated accordingly.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Each ODB source is expected to provide an ODB transaction implementation
that should be used when starting a transaction. With d6fc6fe6f8
(odb/source: make `begin_transaction()` function pluggable, 2026-03-05),
the `struct odb_source` now provides a pluggable callback for beginning
transactions. Use the callback provided by the ODB source accordingly.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current ODB transaction interface is colocated with other ODB
interfaces in "odb.{c,h}". Subsequent commits will expand `struct
odb_transaction` to support write operations on the transaction
directly. To keep things organized and prevent "odb.{c,h}" from becoming
more unwieldy, split out `struct odb_transaction` into a separate
header.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In commit ec0becacc9 (run-command: add stdin callback for
parallelization, 2026-01-28), we taught run_processes_parallel() to
ignore SIGPIPE, since we wouldn't want a write() to a broken pipe of one
of the children to take down the whole process.
But there's a subtle ordering issue. After we ignore SIGPIPE, we call
pp_init(), which installs its own cleanup handler for multiple signals
using sigchain_push_common(), which includes SIGPIPE. So if we receive
SIGPIPE while writing to a child, we'll trigger that handler first, pop
it off the stack, and then re-raise (which is then ignored because of
the SIG_IGN we pushed first).
But what does that handler do? It tries to clean up all of the child
processes, under the assumption that when we re-raise the signal we'll
be exiting the process!
So a hook that exits without reading all of its input will cause us to
get SIGPIPE, which will put us in a signal handler that then tries to
kill() that same child.
This seems to be mostly harmless on Linux. The process has already
exited by this point, and though kill() does not complain (since the
process has not been reaped with a wait() call), it does not affect the
exit status of the process.
However, this seems not to be true on all platforms. This case is
triggered by t5401.13, "pre-receive hook that forgets to read its
input". This test fails on NonStop since that hook was converted to the
run_processes_parallel() API.
We can fix it by reordering the code a bit. We should run pp_init()
first, and then push our SIG_IGN onto the stack afterwards, so that it
is truly ignored while feeding the sub-processes.
Note that we also reorder the popping at the end of the function, too.
This is not technically necessary, as we are doing two pops either way,
but now the pops will correctly match their pushes.
This also fixes a related case that we can't test yet. If we did have
more than one process to run, then one child causing SIGPIPE would cause
us to kill() all of the children (which might still actually be
running). But the hook API is the only user of the new feed_pipe
feature, and it does not yet support parallel hook execution. So for now
we'll always execute the processes sequentially. Once parallel hook
execution exists, we'll be able to add a test which covers this.
Reported-by: Randall S. Becker <rsbecker@nexbridge.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
During Git 2.52 timeframe, we broke streaming computation of object
hash outside a repository, which has been corrected.
* jt/index-fd-wo-repo-regression-fix:
During Git 2.52 timeframe, we broke streaming computation of object
hash outside a repository, which has been corrected.
* jt/index-fd-wo-repo-regression-fix-maint:
object-file: avoid ODB transaction when not writing objects
The experimental `git replay` command learned the `--ref=<ref>` option
to allow specifying which ref to update, overriding the default behavior.
* tc/replay-ref:
replay: allow to specify a ref with option --ref
replay: use stuck form in documentation and help message
builtin/replay: mark options as not negatable
add_files_to_cache() used diff_files() to detect only the paths that
are different between the index and the working tree and add them,
which does not need rename detection, which interfered with unnecessary
conflicts.
* ng/add-files-to-cache-wo-rename:
read-cache: disable renames in add_files_to_cache
Update reftable library part with what is used in libgit2 to improve
portability to different target codebases and platforms.
* ps/reftable-portability:
reftable/system: add abstraction to mmap files
reftable/system: add abstraction to retrieve time in milliseconds
reftable/fsck: use REFTABLE_UNUSED instead of UNUSED
reftable/stack: provide fsync(3p) via system header
reftable: introduce "reftable-system.h" header
Various code clean-up around odb subsystem.
* ps/odb-cleanup:
odb: drop unneeded headers and forward decls
odb: rename `odb_has_object()` flags
odb: use enum for `odb_write_object` flags
odb: rename `odb_write_object()` flags
treewide: use enum for `odb_for_each_object()` flags
CodingGuidelines: document our style for flags
Add the missing &&'s so we properly propagate failures
between commands in the hook helper functions.
Also add a missing mkdir -p arg (found by adding the &&).
Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In ce1661f9da (odb: add transaction interface, 2025-09-16), existing
ODB transaction logic is adapted to create a transaction interface
at the ODB layer. The intent here is for the ODB transaction
interface to eventually provide an object source agnostic means to
manage transactions.
An unintended consequence of this change though is that
`object-file.c:index_fd()` may enter the ODB transaction path even
when no object write is requested. In non-repository contexts, this
can result in a NULL dereference and segfault. One such case occurs
when running git-diff(1) outside of a repository with
"core.bigFileThreshold" forcing the streaming path in `index_fd()`:
$ echo foo >foo
$ echo bar >bar
$ git -c core.bigFileThreshold=1 diff -- foo bar
In this scenario, the caller only needs to compute the object ID. Object
hashing does not require an ODB, so starting a transaction is both
unnecessary and invalid.
Fix the bug by avoiding the use of ODB transactions in `index_fd()` when
callers are only interested in computing the object hash.
Reported-by: Luca Stefani <luca.stefani.ge1@gmail.com>
Signed-off-by: Justin Tobler <jltobler@gmail.com>
[jc: adjusted to fd13909e (Merge branch 'jt/odb-transaction', 2025-10-02)]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git backfill" is capable of auto-detecting a sparsely checked out
working tree, which was broken.
* th/backfill-auto-detect-sparseness-fix:
backfill: auto-detect sparse-checkout from config
The check in "receive-pack" to prevent a checked out branch from
getting updated via updateInstead mechanism has been corrected.
* ps/receive-pack-updateinstead-in-worktree:
receive-pack: use worktree HEAD for updateInstead
t5516: clean up cloned and new-wt in denyCurrentBranch and worktrees test
t5516: test updateInstead with worktree and unborn bare HEAD
Handling of signed commits and tags in fast-import has been made more
configurable.
* jt/fast-import-signed-modes:
fast-import: add 'abort-if-invalid' mode to '--signed-tags=<mode>'
fast-import: add 'sign-if-invalid' mode to '--signed-tags=<mode>'
fast-import: add 'strip-if-invalid' mode to '--signed-tags=<mode>'
fast-import: add 'abort-if-invalid' mode to '--signed-commits=<mode>'
fast-export: check for unsupported signing modes earlier
The way the "git log -L<range>:<file>" feature is bolted onto the
log/diff machinery is being reworked a bit to make the feature
compatible with more diff options, like -S/G.
* mm/line-log-use-standard-diff-output:
doc: note that -L supports patch formatting and pickaxe options
t4211: add tests for -L with standard diff options
line-log: route -L output through the standard diff pipeline
line-log: fix crash when combined with pickaxe options
Reduce dependency on `the_repository` in add-patch.c file.
* sp/add-patch-with-fewer-the-repository:
add-patch: use repository instance from add_i_state instead of the_repository
Internals of "git fsck" have been refactored to not depend on the
global `the_repository` variable.
* ps/fsck-wo-the-repository:
builtin/fsck: stop using `the_repository` in error reporting
builtin/fsck: stop using `the_repository` when marking objects
builtin/fsck: stop using `the_repository` when checking packed objects
builtin/fsck: stop using `the_repository` with loose objects
builtin/fsck: stop using `the_repository` when checking reflogs
builtin/fsck: stop using `the_repository` when checking refs
builtin/fsck: stop using `the_repository` when snapshotting refs
builtin/fsck: fix trivial dependence on `the_repository`
fsck: drop USE_THE_REPOSITORY
fsck: store repository in fsck options
fsck: initialize fsck options via a function
fetch-pack: move fsck options into function scope
The value of a wrong pointer variable was referenced in an error
message that reported that it shouldn't be NULL.
* yc/path-walk-fix-error-reporting:
path-walk: fix NULL pointer dereference in error message
Fix a regression in writing the commit-graph where commits with dates
exceeding 34 bits (beyond year 2514) could cause an underflow and
crash Git during the generation data overflow chunk writing.
* ps/commit-graph-overflow-fix:
commit-graph: fix writing generations with dates exceeding 34 bits
A handful of inappropriate uses of the_repository have been
rewritten to use the right repository structure instance in the
read-cache.c codepath.
* jd/read-cache-trace-wo-the-repository:
read-cache: use istate->repo for trace2 logging
Adjust the codebase for C23 that changes functions like strchr()
that discarded constness when they return a pointer into a const
string to preserve constness.
* jk/c23-const-preserving-fixes:
config: store allocated string in non-const pointer
rev-parse: avoid writing to const string for parent marks
revision: avoid writing to const string for parent marks
rev-parse: simplify dotdot parsing
revision: make handle_dotdot() interface less confusing
A few code paths that spawned child processes for network
connection weren't wait(2)ing for their children and letting "init"
reap them instead; they have been tightened.
* aa/reap-transport-child-processes:
transport-helper, connect: use clean_on_exit to reap children on abnormal exit
pack-objects's --stdin-packs=follow mode learns to handle
excluded-but-open packs.
* tb/stdin-packs-excluded-but-open:
repack: mark non-MIDX packs above the split as excluded-open
pack-objects: support excluded-open packs with --stdin-packs
t7704: demonstrate failure with once-cruft objects above the geometric split
pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap`
pack-objects: plug leak in `read_stdin_packs()`