kernel/git - git - PowerEL Git System

Commit Graph

Author	SHA1	Message	Date
Patrick Steinhardt	68cd492a3e	object-store: merge "object-store-ll.h" and "object-store.h" The "object-store-ll.h" header has been introduced to keep transitive header dependendcies and compile times at bay. Now that we have created a new "object-store.c" file though we can easily move the last remaining additional bit of "object-store.h", the `odb_path_map`, out of the header. Do so. As the "object-store.h" header is now equivalent to its low-level alternative we drop the latter and inline it into the former. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-15 08:24:37 -07:00
Taylor Blau	08f612ba70	builtin/pack-objects.c: freshen objects from existing cruft packs Once an object is written into a cruft pack, we can only freshen it by writing a new loose or packed copy of that object with a more recent mtime. Prior to `61568efa95` (builtin/pack-objects.c: support `--max-pack-size` with `--cruft`, 2023-08-28), we typically had at most one cruft pack in a repository at any given time. So freshening unreachable objects was straightforward when already rewriting the cruft pack (and its .mtimes file). But `61568efa95` changes things: 'pack-objects' now supports writing multiple cruft packs when invoked with `--cruft` and the `--max-pack-size` flag. Cruft packs are rewritten until they reach some size threshold, at which point they are considered "frozen", and will only be modified in a pruning GC, or if the threshold itself is adjusted. Prior to this patch, however, this process breaks down when we attempt to freshen an object packed in an earlier cruft pack, and that cruft pack is larger than the threshold and thus will survive the repack. When this is the case, it is impossible to freshen objects in cruft pack(s) when those cruft packs are larger than the threshold. This is because we would avoid writing them in the new cruft pack entirely, for a couple of reasons. 1. When enumerating packed objects via 'add_objects_in_unpacked_packs()' we pass the SKIP_IN_CORE_KEPT_PACKS, which is used to avoid looping over the packs we're going to retain (which are marked as kept in-core by 'read_cruft_objects()'). This means that we will avoid enumerating additional packed copies of objects found in any cruft packs which are larger than the given size threshold. Thus there is no opportunity to call 'create_object_entry()' whatsoever. 2. We likewise will discard the loose copy (if one exists) of any unreachable object packed in a cruft pack that is larger than the threshold. Here our call path is 'add_unreachable_loose_objects()', which uses the 'add_loose_object()' callback. That function will eventually land us in 'want_object_in_pack()' (via 'add_cruft_object_entry()'), and we'll discard the object as it appears in one of the packs which we marked as kept in-core. This means in effect that it is impossible to freshen an unreachable object once it appears in a cruft pack larger than the given threshold. Instead, we should pack an additional copy of an unreachable object we want to freshen even if it appears in a cruft pack, provided that the cruft copy has an mtime which is before the mtime of the copy we are trying to pack/freshen. This is sub-optimal in the sense that it requires keeping an additional copy of unreachable objects upon freshening, but we don't have a better alternative without the ability to make in-place modifications to existing .mtimes files. In order to implement this, we have to adjust the behavior of 'want_found_object()'. When 'pack-objects' is told that we're not going to retain any cruft packs (i.e. the set of packs marked as kept in-core does not contain a cruft pack), the behavior is unchanged. But when there is at least one cruft pack that we're holding onto, it is no longer sufficient to reject a copy of an object found in that cruft pack for that reason alone. In this case, we only want to reject a candidate object when copies of that object either: - exists in a non-cruft pack that we are retaining, regardless of that pack's mtime, or - exists in a cruft pack with an mtime at least as recent as the copy we are debating whether or not to pack, in which case freshening would be redundant. To do this, keep track of whether or not we have any cruft packs in our in-core kept list with a new 'ignore_packed_keep_in_core_has_cruft' flag. When we end up in this new special case, we replace a call to 'has_object_kept_pack()' to 'want_cruft_object_mtime()', and only reject objects when we have a copy in an existing cruft pack with at least as recent an mtime as our candidate (in which case "freshening" would be redundant). Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-13 11:48:04 -07:00
Junio C Hamano	f8b9821f7d	Merge branch 'jk/pack-header-parse-alignment-fix' It was possible for "git unpack-objects" and "git index-pack" to make an unaligned access, which has been corrected. * jk/pack-header-parse-alignment-fix: index-pack, unpack-objects: use skip_prefix to avoid magic number index-pack, unpack-objects: use get_be32() for reading pack header parse_pack_header_option(): avoid unaligned memory writes packfile: factor out --pack_header argument parsing bswap.h: squelch potential sparse -Wcast-truncate warnings	2025-01-28 13:02:23 -08:00
Jeff King	4f02f4d68d	parse_pack_header_option(): avoid unaligned memory writes In order to recreate a pack header in our in-memory buffer, we cast the buffer to a "struct pack_header" and assign the individual fields. This is reported to cause SIGBUS on sparc64 due to alignment issues. We can work around this by using put_be32() which will write individual bytes into the buffer. Reported-by: Koakuma <koachan@protonmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-21 08:42:55 -08:00
Jeff King	798e0f4516	packfile: factor out --pack_header argument parsing Both index-pack and unpack-objects accept a --pack_header argument. This is an undocumented internal argument used by receive-pack and fetch to pass along information about the header of the pack, which they've already read from the incoming stream. In preparation for a bugfix, let's factor the duplicated code into a common helper. The callers are still responsible for identifying the option. While this could likewise be factored out, it is more flexible this way (e.g., if they ever started using parse-options and wanted to handle both the stuck and unstuck forms). Likewise, the callers are responsible for reporting errors, though they both just call die(). I've tweaked unpack-objects to match index-pack in marking the error for translation. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-21 08:42:55 -08:00
Junio C Hamano	4156b6a741	Merge branch 'ps/build-sign-compare' Start working to make the codebase buildable with -Wsign-compare. * ps/build-sign-compare: t/helper: don't depend on implicit wraparound scalar: address -Wsign-compare warnings builtin/patch-id: fix type of `get_one_patchid()` builtin/blame: fix type of `length` variable when emitting object ID gpg-interface: address -Wsign-comparison warnings daemon: fix type of `max_connections` daemon: fix loops that have mismatching integer types global: trivial conversions to fix `-Wsign-compare` warnings pkt-line: fix -Wsign-compare warning on 32 bit platform csum-file: fix -Wsign-compare warning on 32-bit platform diff.h: fix index used to loop through unsigned integer config.mak.dev: drop `-Wno-sign-compare` global: mark code units that generate warnings with `-Wsign-compare` compat/win32: fix -Wsign-compare warning in "wWinMain()" compat/regex: explicitly ignore "-Wsign-compare" warnings git-compat-util: introduce macros to disable "-Wsign-compare" warnings	2024-12-23 09:32:11 -08:00
Patrick Steinhardt	41f43b8243	global: mark code units that generate warnings with `-Wsign-compare` Mark code units that generate warnings with `-Wsign-compare`. This allows for a structured approach to get rid of all such warnings over time in a way that can be easily measured. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-06 20:20:02 +09:00
Taylor Blau	2811649951	packfile.c: remove unnecessary prepare_packed_git() call In `454ea2e4d7` (treewide: use get_all_packs, 2018-08-20) we converted existing calls to both: - get_packed_git(), as well as - the_repository->objects->packed_git , to instead use the new get_all_packs() function. In the instance that this commit addresses, there was a preceding call to prepare_packed_git(), which dates all the way back to `660c889e46` (sha1_file: add for_each iterators for loose and packed objects, 2014-10-15) when its caller (for_each_packed_object()) was first introduced. This call could have been removed in `454ea2e4d7`, since get_all_packs() itself calls prepare_packed_git(). But the translation in `454ea2e4d7` was (to the best of my knowledge) a find-and-replace rather than inspecting each individual caller. Having an extra prepare_packed_git() call here is harmless, since it will notice that we have already set the 'packed_git_initialized' field and the call will be a noop. So we're only talking about a few dozen CPU cycles to set up and tear down the stack frame. But having a lone prepare_packed_git() call immediately before a call to get_all_packs() confused me, so let's remove it as redundant to avoid more confusion in the future. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-04 08:21:56 +09:00
Karthik Nayak	d284713bae	config: make `packed_git_(limit\|window_size)` non-global variables The variables `packed_git_window_size` and `packed_git_limit` are global config variables used in the `packfile.c` file. Since it is only used in this file, let's change it from being a global config variable to a local variable for the subsystem. With this, we rid `packfile.c` from all global variable usage and this means we can also remove the `USE_THE_REPOSITORY_VARIABLE` guard from the file. Helped-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-04 08:21:55 +09:00
Karthik Nayak	d6b2d21fbf	config: make `delta_base_cache_limit` a non-global variable The `delta_base_cache_limit` variable is a global config variable used by multiple subsystems. Let's make this non-global, by adding this variable independently to the subsystems where it is used. First, add the setting to the `repo_settings` struct, this provides access to the config in places where the repository is available. Use this in `packfile.c`. In `index-pack.c` we add it to the `pack_idx_option` struct and its constructor. While the repository struct is available here, it may not be set because `git index-pack` can be used without a repository. In `gc.c` add it to the `gc_config` struct and also the constructor function. The gc functions currently do not have direct access to a repository struct. These changes are made to remove the usage of `delta_base_cache_limit` as a global variable in `packfile.c`. This brings us one step closer to removing the `USE_THE_REPOSITORY_VARIABLE` definition in `packfile.c` which we complete in the next patch. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-04 08:21:55 +09:00
Karthik Nayak	c87910b96b	packfile: pass down repository to `for_each_packed_object` The function `for_each_packed_object` currently relies on the global variable `the_repository`. To eliminate global variable usage in `packfile.c`, we should progressively shift the dependency on the_repository to higher layers. Let's remove its usage from this function and closely related function `is_promisor_object`. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-04 08:21:54 +09:00
Karthik Nayak	cc656f4eb2	packfile: pass down repository to `has_object[_kept]_pack` The functions `has_object[_kept]_pack` currently rely on the global variable `the_repository`. To eliminate global variable usage in `packfile.c`, we should progressively shift the dependency on the_repository to higher layers. Let's remove its usage from these functions and any related ones. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-04 08:21:54 +09:00
Karthik Nayak	873b00597b	packfile: pass down repository to `odb_pack_name` The function `odb_pack_name` currently relies on the global variable `the_repository`. To eliminate global variable usage in `packfile.c`, we should progressively shift the dependency on the_repository to higher layers. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-04 08:21:54 +09:00
Karthik Nayak	4f9e6bd492	packfile: pass `repository` to static function in the file Some of the static functions in the `packfile.c` access global variables, which can simply be avoided by passing the `repository` struct down to them. Let's do that. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-04 08:21:54 +09:00
Karthik Nayak	9c5ce06d74	packfile: use `repository` from `packed_git` directly In the previous commit, we introduced the `repository` structure inside `packed_git`. This provides an alternative route instead of using the global `the_repository` variable. Let's modify `packfile.c` now to use this field wherever possible instead of relying on the global state. There are still a few instances of `the_repository` usage in the file, where there is no struct `packed_git` locally available, which will be fixed in the following commits. Helped-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-04 08:21:53 +09:00
Karthik Nayak	2cf3fe63f6	packfile: add repository to struct `packed_git` The struct `packed_git` holds information regarding a packed object file. Let's add the repository variable to this object, to represent the repository that this packfile belongs to. This helps remove dependency on the global `the_repository` object in `packfile.c` by simply using repository information now readily available in the struct. We do need to consider that a packfile could be part of the alternates of a repository, but considering that we only have one repository struct and also that we currently anyways use 'the_repository', we should be OK with this change. We also modify `alloc_packed_git` to ensure that the repository is added to newly created `packed_git` structs. This requires modifying the function and all its callee to pass the repository object down the levels. Helped-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-04 08:21:53 +09:00
Jeff King	863f2459a2	packfile: use oidread() instead of hashcpy() to fill object_id When chasing a REF_DELTA, we need to pull the raw hash bytes out of the mmap'd packfile into an object_id struct. We do that with a raw hashcpy() of the appropriate length (that happens directly now, though before the previous commit it happened inside find_pack_entry_one(), also using a hashcpy). But I think this creates a potentially dangerous situation due to `d4d364b2c7` (hash: convert `oidcmp()` and `oideq()` to compare whole hash, 2024-06-14). When using sha1, we'll have uninitialized bytes in the latter part of the object_id.hash buffer, which could fool oideq(), etc. We should use oidread() instead, which correctly zero-pads the extra bytes, as of `c98d762ed9` (global: ensure that object IDs are always padded, 2024-06-14). As far as I can see, this has not been a problem in practice because the object_id we feed to find_pack_entry_one() is never used with oideq(), etc. It is being compared to the bytes mmap'd from a pack idx file, which of course do not have the extra padding bytes themselves. So there's no bug here, but this just puzzled me while looking at the code. We should do the more obviously safe thing, both for future-proofing and to avoid confusing readers. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2024-10-25 17:35:46 -04:00
Jeff King	479ab76c9f	packfile: use object_id in find_pack_entry_one() The main function we use to search a pack index for an object is find_pack_entry_one(). That function still takes a bare pointer to the hash, despite the fact that its underlying bsearch_pack() function needs an object_id struct. And so we end up making an extra copy of the hash into the struct just to do a lookup. As it turns out, all callers but one already have such an object_id. So we can just take a pointer to that struct and use it directly. This avoids the extra copy and provides a more type-safe interface. The one exception is get_delta_base() in packfile.c, when we are chasing a REF_DELTA from inside the pack (and thus we have a pointer directly to the mmap'd pack memory, not a struct). We can just bump the hashcpy() from inside find_pack_entry_one() to this one caller that needs it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2024-10-25 17:35:46 -04:00
Jeff King	4d99559147	packfile: convert find_sha1_pack() to use object_id The find_sha1_pack() function has a few problems: - it's badly named, since it works with any object hash - it takes the hash as a bare pointer rather than an object_id struct We can fix both of these easily, as all callers actually have a real object_id anyway. I also found the existence of this function somewhat confusing, as it is about looking in an arbitrary set of linked packed_git structs. It's good for things like dumb-http which are looking in downloaded remote packs, and not our local packs. But despite the name, it is not a good way to find the pack which contains a local object (it skips the use of the midx, the pack mru list, and so on). So let's also add an explanatory comment above the declaration that may point people in the right direction. I suspect the calls in fast-import.c, which use the packed_git list from the repository struct, could actually just be using find_pack_entry(). But since we'd need to keep it anyway for dumb-http, I didn't dig further there. If we eventually drop dumb-http support, then it might be worth examining them to see if we can get rid of the function entirely. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2024-10-25 17:35:46 -04:00
Jeff King	4390fea963	packfile: drop sha1_pack_index_name() Like sha1_pack_name() that we dropped in the previous commit, this function uses an error-prone static strbuf and the somewhat misleading name "sha1". The only caller left is in pack-redundant.c. While this command is marked for potential removal in our BreakingChanges document, we still have it for now. But it's simple enough to convert it to use its own strbuf with the underlying odb_pack_name() function, letting us drop the otherwise obsolete function. Note that odb_pack_name() does its own strbuf_reset(), so it's safe to use directly within a loop like this. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2024-10-25 17:35:46 -04:00
Jeff King	c2dc4c9fbb	packfile: drop sha1_pack_name() The sha1_pack_name() function has a few ugly bits: - it writes into a static strbuf (and not even a ring buffer of them), which can lead to subtle invalidation problems - it uses the term "sha1", but it's really using the_hash_algo, which could be sha256 There's only one caller of it left. And in fact that caller is better off using the underlying odb_pack_name() function itself, since it's just copying the result into its own strbuf anyway. Converting that caller lets us get rid of this now-obselete function. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2024-10-25 17:35:46 -04:00
Jeff King	03b8eed7f5	packfile: drop has_pack_index() The has_pack_index() function has several oddities that may make it surprising if you are trying to find out if we have a pack with some $hash: - it is not looking for a valid pack that we found while searching object directories. It just looks for any pack-$hash.idx file in the pack directory. - it only looks in the local directory, not any alternates - it takes a bare "unsigned char" hash, which we try to avoid these days The only caller it has is in the dumb http code; it wants to know if we already have the pack idx in question. This can happen if we downloaded the pack (and generated its index) during a previous fetch. Before the previous patch ("dumb-http: store downloaded pack idx as tempfile"), it could also happen if we downloaded the .idx from the remote but didn't get the matching .pack. But since that patch, we don't hold on to those .idx files. So there's no need to look for the .idx file in the filesystem; we can just scan through the packed_git list to see if we have it. That lets us simplify the dumb http code a bit, as we know that if we have the .idx we have the matching .pack already. And it lets us get rid of this odd function that is unlikely to be needed again. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2024-10-25 17:35:46 -04:00
Jeff King	63aca3f7f1	dumb-http: store downloaded pack idx as tempfile This patch fixes a regression in `b1b8dfde69` (finalize_object_file(): implement collision check, 2024-09-26) where fetching a v1 pack idx file over the dumb-http protocol would cause the fetch to fail. The core of the issue is that dumb-http stores the idx we fetch from the remote at the same path that will eventually hold the idx we generate from "index-pack --stdin". The sequence is something like this: 0. We realize we need some object X, which we don't have locally, and nor does the other side have it as a loose object. 1. We download the list of remote packs from objects/info/packs. 2. For each entry in that file, we download each pack index and store it locally in .git/objects/pack/pack-$hash.idx (the $hash is not something we can verify yet and is given to us by the remote). 3. We check each pack index we got to see if it has object X. When we find a match, we download the matching .pack file from the remote to a tempfile. We feed that to "index-pack --stdin", which reindexes the pack, rather than trusting that it has what the other side claims it does. In most cases, this will end up generating the exact same (byte-for-byte) pack index which we'll store at the same pack-$hash.idx path, because the index generation and $hash id are computed based on what's in the packfile. But: a. The other side might have used other options to generate the index. For instance we use index v2 by default, but long ago it was v1 (and you can still ask for v1 explicitly). b. The other side might even use a different mechanism to determine $hash. E.g., long ago it was based on the sorted list of objects in the packfile, but we switched to using the pack checksum in `1190a1acf8` (pack-objects: name pack files after trailer hash, 2013-12-05). The regression we saw in the real world was (3a). A recent client fetching from a server with a v1 index downloaded that index, then complained about trying to overwrite it with its own v2 index. This collision is otherwise harmless; we know we want to replace the remote version with our local one, but the collision check doesn't realize that. There are a few options to fix it: - we could teach index-pack a command-line option to ignore only pack idx collisions, and use it when the dumb-http code invokes index-pack. This would be an awkward thing to expose users to and would involve a lot of boilerplate to get the option down to the collision code. - we could delete the remote .idx file right before running index-pack. It should be redundant at that point (since we've just downloaded the matching pack). But it feels risky to delete something from our own .git/objects based on what the other side has said. I'm not entirely positive that a malicious server couldn't lie about which pack-$hash.idx it has and get us to delete something precious. - we can stop co-mingling the downloaded idx files in our local objects directory. This is a slightly bigger change but I think fixes the root of the problem more directly. This patch implements the third option. The big design questions are: where do we store the downloaded files, and how do we manage their lifetimes? There are some additional quirks to the dumb-http system we should consider. Remember that in step 2 we downloaded every pack index, but in step 3 we may only download some of the matching packs. What happens to those other idx files now? They sit in the .git/objects/pack directory, possibly waiting to be used at a later date. That may save bandwidth for a subsequent fetch, but it also creates a lot of weird corner cases: - our local object directory now has semi-untrusted .idx files sitting around, without their matching .pack - in case 3b, we noted that we might not generate the same hash as the other side. In that case even if we download the matching pack, our index-pack invocation will store it in a different pack-$hash.idx file. And the unmatched .idx will sit there forever. - if the server repacks, it may delete the old packs. Now we have these orphaned .idx files sitting around locally that will never be used (nor deleted). - if we repack locally we may delete our local version of the server's pack index and not realize we have it. So we'll download it again, even though we have all of the objects it mentions. I think the right solution here is probably some more complex cache management system: download the remote .idx files to their own storage directory, mark them as "seen" when we get their matching pack (to avoid re-downloading even if we repack), and then delete them when the server's objects/info/refs no longer mentions them. But since the dumb http protocol is so ancient and so inferior to the smart http protocol, I don't think it's worth spending a lot of time creating such a system. For this patch I'm just downloading the idx files to .git/objects/tmp_pack_*, and marking them as tempfiles to be deleted when we exit (and due to the name, any we miss due to a crash, etc, should eventually be removed by "git gc" runs based on timestamps). That is slightly worse for one case: if we download an idx but not the matching pack, we won't retain that idx for subsequent runs. But the flip side is that we're making other cases better (we never hold on to useless idx files forever). I suspect that worse case does not even come up often, since it implies that the packs are generated to match distinct parts of history (i.e., in practice even in a repo with many packs you're going to end up grabbing all of those packs to do a clone). If somebody really cares about that, I think the right path forward is a managed cache directory as above, and this patch is providing the first step in that direction anyway (by moving things out of the objects/pack/ directory). There are two test changes. One demonstrates the broken v1 index case (it double-checks the resulting clone with fsck to be careful, but prior to this patch it actually fails at the clone step). The other tweaks the expectation for a test that covers the "slightly worse" case to accommodate the extra index download. The code changes are fairly simple. We stop using finalize_object_file() to copy the remote's index file into place, and leave it as a tempfile. We give the tempfile a real ".idx" name, since the packfile code expects that, and thus we make sure it is out of the usual packs/ directory (so we'd never mistake it for a real local .idx). We also have to change parse_pack_index(), which creates a temporary packed_git to access our index (we need this because all of the pack idx code assumes we have that struct). It reads the index data from the tempfile, but prior to this patch would speculatively write the finalized name into the packed_git struct using the pack-$hash we expect to use. I was mildly surprised that this worked at all, since we call verify_pack_index() on the packed_git which mentions the final name before moving the file into place! But it works because parse_pack_index() leaves the mmap-ed data in the struct, so the lazy-open in verify_pack_index() never triggers, and we read from the tempfile, ignoring the filename in the struct completely. Hacky, but it works. After this patch, parse_pack_index() now uses the index filename we pass in to derive a matching .pack name. This is OK to change because there are only two callers, both in the dumb http code (and the other passes in an existing pack-$hash.idx name, so the derived name is going to be pack-$hash.pack, which is what we were using anyway). I'll follow up with some more cleanups in that area, but this patch is sufficient to fix the regression. Reported-by: fox <fox.gbr@townlong-yak.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2024-10-25 17:35:46 -04:00
Patrick Steinhardt	a3673f4898	environment: make `get_object_directory()` accept a repository The `get_object_directory()` function retrieves the path to the object directory for `the_repository`. Make it accept a `struct repository` such that it can work on arbitrary repositories and make it part of the repository subsystem. This reduces our reliance on `the_repository` and clarifies scope. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-09-12 10:15:39 -07:00
Taylor Blau	fcb2205b77	midx: implement support for writing incremental MIDX chains Now that the rest of the MIDX subsystem and relevant callers have been updated to learn about how to read and process incremental MIDX chains, let's finally update the implementation in `write_midx_internal()` to be able to write incremental MIDX chains. This new feature is available behind the `--incremental` option for the `multi-pack-index` builtin, like so: $ git multi-pack-index write --incremental The implementation for doing so is relatively straightforward, and boils down to a handful of different kinds of changes implemented in this patch: - The `compute_sorted_entries()` function is taught to reject objects which appear in any existing MIDX layer. - Functions like `write_midx_revindex()` are adjusted to write pack_order values which are offset by the number of objects in the base MIDX layer. - The end of `write_midx_internal()` is adjusted to move non-incremental MIDX files when necessary (i.e. when creating an incremental chain with an existing non-incremental MIDX in the repository). There are a handful of other changes that are introduced, like new functions to clear incremental MIDX files that are unrelated to the current chain (using the same "keep_hash" mechanism as in the non-incremental case). The tests explicitly exercising the new incremental MIDX feature are relatively limited for two reasons: 1. Most of the "interesting" behavior is already thoroughly covered in t5319-multi-pack-index.sh, which handles the core logic of reading objects through a MIDX. The new tests in t5334-incremental-multi-pack-index.sh are mostly focused on creating and destroying incremental MIDXs, as well as stitching their results together across layers. 2. A new GIT_TEST environment variable is added called "GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL", which modifies the entire test suite to write incremental MIDXs after repacking when combined with the "GIT_TEST_MULTI_PACK_INDEX" variable. This exercises the long tail of other interesting behavior that is defined implicitly throughout the rest of the CI suite. It is likewise added to the linux-TEST-vars job. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-06 12:01:39 -07:00
Taylor Blau	b80236d0e3	midx: support reading incremental MIDX chains Now that the MIDX machinery's internals have been taught to understand incremental MIDXs over the previous handful of commits, the MIDX machinery itself can begin reading incremental MIDXs. (Note that while the on-disk format for incremental MIDXs has been defined, the writing end has not been implemented. This will take place in the commit after next.) The core of this change involves following the order specified in the MIDX chain in reverse and opening up MIDXs in the chain one-by-one, adding them to the previous layer's `->base_midx` pointer at each step. In order to implement this, the `load_multi_pack_index()` function is taught to call a new `load_multi_pack_index_chain()` function if loading a non-incremental MIDX failed via `load_multi_pack_index_one()`. When loading a MIDX chain, `load_midx_chain_fd_st()` reads each line in the file one-by-one and dispatches calls to `load_multi_pack_index_one()` to read each layer of the MIDX chain. When a layer was successfully read, it is added to the MIDX chain by calling `add_midx_to_chain()` which validates the contents of the `BASE` chunk, performs some bounds checks on the number of combined packs and objects, and attaches the new MIDX by assigning its `base_midx` pointer to the existing part of the chain. As a supplement to this, introduce a new mode in the test-read-midx test-tool which allows us to read the information for a specific MIDX in the chain by specifying its trailing checksum via the command-line arguments like so: $ test-tool read-midx .git/objects [checksum] Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-06 12:01:38 -07:00
Patrick Steinhardt	e7da938570	global: introduce `USE_THE_REPOSITORY_VARIABLE` macro Use of the `the_repository` variable is deprecated nowadays, and we slowly but steadily convert the codebase to not use it anymore. Instead, callers should be passing down the repository to work on via parameters. It is hard though to prove that a given code unit does not use this variable anymore. The most trivial case, merely demonstrating that there is no direct use of `the_repository`, is already a bit of a pain during code reviews as the reviewer needs to manually verify claims made by the patch author. The bigger problem though is that we have many interfaces that implicitly rely on `the_repository`. Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code units to opt into usage of `the_repository`. The intent of this macro is to demonstrate that a certain code unit does not use this variable anymore, and to keep it from new dependencies on it in future changes, be it explicit or implicit For now, the macro only guards `the_repository` itself as well as `the_hash_algo`. There are many more known interfaces where we have an implicit dependency on `the_repository`, but those are not guarded at the current point in time. Over time though, we should start to add guards as required (or even better, just remove them). Define the macro as required in our code units. As expected, most of our code still relies on the global variable. Nearly all of our builtins rely on the variable as there is no way yet to pass `the_repository` to their entry point. For now, declare the macro in "biultin.h" to keep the required changes at least a little bit more contained. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:33 -07:00
Patrick Steinhardt	9da95bda74	hash: require hash algorithm in `oidread()` and `oidclr()` Both `oidread()` and `oidclr()` use `the_repository` to derive the hash function that shall be used. Require callers to pass in the hash algorithm to get rid of this implicit dependency. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:32 -07:00
Patrick Steinhardt	f4836570a7	hash: require hash algorithm in `hasheq()`, `hashcmp()` and `hashclr()` Many of our hash functions have two variants, one receiving a `struct git_hash_algo` and one that derives it via `the_repository`. Adapt all of those functions to always require the hash algorithm as input and drop the variants that do not accept one. As those functions are now independent of `the_repository`, we can move them from "hash.h" to "hash-ll.h". Note that both in this and subsequent commits in this series we always just pass `the_repository->hash_algo` as input even if it is obvious that there is a repository in the context that we should be using the hash from instead. This is done to be on the safe side and not introduce any regressions. All callsites should eventually be amended to use a repo passed via parameters, but this is outside the scope of this patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:32 -07:00
Junio C Hamano	1002f28a52	Merge branch 'eb/hash-transition' Work to support a repository that work with both SHA-1 and SHA-256 hash algorithms has started. * eb/hash-transition: (30 commits) t1016-compatObjectFormat: add tests to verify the conversion between objects t1006: test oid compatibility with cat-file t1006: rename sha1 to oid test-lib: compute the compatibility hash so tests may use it builtin/ls-tree: let the oid determine the output algorithm object-file: handle compat objects in check_object_signature tree-walk: init_tree_desc take an oid to get the hash algorithm builtin/cat-file: let the oid determine the output algorithm rev-parse: add an --output-object-format parameter repository: implement extensions.compatObjectFormat object-file: update object_info_extended to reencode objects object-file-convert: convert commits that embed signed tags object-file-convert: convert commit objects when writing object-file-convert: don't leak when converting tag objects object-file-convert: convert tag objects when writing object-file-convert: add a function to convert trees between algorithms object: factor out parse_mode out of fast-import and tree-walk into in object.h cache: add a function to read an OID of a specific algorithm tag: sign both hashes commit: export add_header_signature to support handling signatures on tags ...	2024-03-28 14:13:50 -07:00
Elijah Newren	eea0e59ffb	treewide: remove unnecessary includes in source files Each of these were checked with gcc -E -I. ${SOURCE_FILE} \| grep ${HEADER_FILE} to ensure that removing the direct inclusion of the header actually resulted in that header no longer being included at all (i.e. that no other header pulled it in transitively). ...except for a few cases where we verified that although the header was brought in transitively, nothing from it was directly used in that source file. These cases were: * builtin/credential-cache.c * builtin/pull.c * builtin/send-pack.c Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-26 12:04:31 -08:00
Eric W. Biederman	efed687edc	tree-walk: init_tree_desc take an oid to get the hash algorithm To make it possible for git ls-tree to display the tree encoded in the hash algorithm of the oid specified to git ls-tree, update init_tree_desc to take as a parameter the oid of the tree object. Update all callers of init_tree_desc and init_tree_desc_gently to pass the oid of the tree object. Use the oid of the tree object to discover the hash algorithm of the oid and store that hash algorithm in struct tree_desc. Use the hash algorithm in decode_tree_entry and update_tree_entry_internal to handle reading a tree object encoded in a hash algorithm that differs from the repositories hash algorithm. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-02 14:57:40 -07:00
Junio C Hamano	3365e2675e	Merge branch 'jc/retire-get-sha1-hex' The implementation of "get_sha1_hex()" that reads a hexadecimal string that spells a full object name has been extended to cope with any hash function used in the repository, but the "sha1" in its name survived. Rename it to get_hash_hex(), a name that is more consistent within its friends like get_hash_hex_algop(). * jc/retire-get-sha1-hex: hex: retire get_sha1_hex()	2023-08-04 10:52:30 -07:00
Junio C Hamano	4488bb3bed	Merge branch 'tb/object-access-overflow-protection' Various offset computation in the code that accesses the packfiles and other data in the object layer has been hardened against arithmetic overflow, especially on 32-bit systems. * tb/object-access-overflow-protection: commit-graph.c: prevent overflow in `verify_commit_graph()` commit-graph.c: prevent overflow in `write_commit_graph()` commit-graph.c: prevent overflow in `merge_commit_graph()` commit-graph.c: prevent overflow in `split_graph_merge_strategy()` commit-graph.c: prevent overflow in `load_tree_for_commit()` commit-graph.c: prevent overflow in `fill_commit_in_graph()` commit-graph.c: prevent overflow in `fill_commit_graph_info()` commit-graph.c: prevent overflow in `load_oid_from_graph()` commit-graph.c: prevent overflow in add_graph_to_chain() commit-graph.c: prevent overflow in `write_commit_graph_file()` pack-bitmap.c: ensure that eindex lookups don't overflow midx.c: prevent overflow in `fill_included_packs_batch()` midx.c: prevent overflow in `write_midx_internal()` midx.c: store `nr`, `alloc` variables as `size_t`'s midx.c: prevent overflow in `nth_midxed_offset()` midx.c: prevent overflow in `nth_midxed_object_oid()` midx.c: use `size_t`'s for fanout nr and alloc packfile.c: use checked arithmetic in `nth_packed_object_offset()` packfile.c: prevent overflow in `load_idx()` packfile.c: prevent overflow in `nth_packed_object_id()`	2023-07-25 12:05:23 -07:00
Junio C Hamano	08e5fb1296	hex: retire get_sha1_hex() The naming convention around get_sha1_hex() and its friends is awkward these days, after "struct object_id" was introduced. There are three public functions around this area: * get_sha1_hex() - use the implied the_hash_algo, fill uchar * * get_oid_hex() - use the implied the_hash_algo, fill oid * * get_oid_hex_algop() - use the passed algop, fill oid * Between the latter two, the "_algop" suffix signals whether the the_hash_algo is used as the implied algorithm or the caller should pass an algorithm explicitly. That is very much understandable and is a good convention. Between the former two, however, the "SHA1" vs "OID" in the names differentiate in what type of variable the result is stored. We could argue that it makes sense to use "SHA1" to mean "flat byte buffer" to honor the historical practice in the days before "struct object_id" was invented, but the natural fourth friend of the above group would take an algop and fill a flat byte buffer, and it would be strange to name it get_sha1_hex_algop(). Do we use the passed in algo, or are we limited to SHA-1 ;-)? In fact, such a function exists, albeit as a private helper function used by the implementation of these functions, and is named a lot more sensibly: get_hash_hex_algop(). Correct the misnomer of get_sha1_hex() and use "hash", instead of "sha1", as "flat byte buffer that stores binary (as opposed to hexadecimal) representation of the hash". The four (2x2) friends now become: * get_hash_hex() - use the implied the_hash_algo, fill uchar * * get_oid_hex() - use the implied the_hash_algo, fill oid * * get_hash_hex_algop() - use the passed algop, fill uchar * * get_oid_hex_algop() - use the passed algop, fill oid * As there are only two remaining calls to get_sha1_hex() in the codebase right now, the blast radious of this change is fairly small. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-24 16:11:23 -07:00
Taylor Blau	a519abca02	packfile.c: use checked arithmetic in `nth_packed_object_offset()` In a similar spirit as the previous commits, ensure that we use `st_add()` or `st_mult()` when computing values that may overflow the 32-bit unsigned limit. Note that in each of these instances, we prevent 32-bit overflow already since we have explicit casts to `size_t`. So this code is OK as-is, but let's clarify it by using the `st_xyz()` helpers to make it obvious that we are performing the relevant computations using 64 bits. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-14 09:32:03 -07:00
Taylor Blau	42be681b33	packfile.c: prevent overflow in `load_idx()` Prevent an overflow when locating a pack's CRC offset when the number of packed items is greater than 2^32-1/hashsz by guarding the computation with an `st_mult()`. Note that to avoid truncating the result, the `crc_offset` member must itself become a `size_t`. The only usage of this variable (besides the assignment in `load_idx()`) is in `read_v2_anomalous_offsets()` in the index-pack code. There we use the `crc_offset` as a pointer offset, so we are already equipped to handle the type change. Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-14 09:31:34 -07:00
Taylor Blau	de41d03e1c	packfile.c: prevent overflow in `nth_packed_object_id()` In `37fec86a83` (packfile: abstract away hash constant values, 2018-05-02), `nth_packed_object_id()` started using the variable `the_hash_algo->rawsz` instead of a fixed constant when trying to compute an offset into the ".idx" file for some object position. This can lead to surprising truncation when looking for an object towards the end of a large enough pack, like the following: (gdb) p hashsz $1 = 20 (gdb) p n $2 = 215043814 (gdb) p hashsz * n $3 = 5908984 , which is a debugger session broken on a known-bad call to the `nth_packed_object_id()` function. This behavior predates `37fec86a83`, and is original to the v2 index format, via: `74e34e1fca` (sha1_file.c: learn about index version 2, 2007-04-09). This is due to §6.4.4.1 of the C99 standard, which states that an untyped integer constant will take the first type in which the value can be accurately represented, among `int`, `long int`, and `long long int`. Since 20 can be represented as an `int`, and `n` is a 32-bit unsigned integer, the resulting computation is defined by §6.3.1.8, and the (signed) integer value representing `n` is converted to an unsigned type, meaning that `20 * n` (for `n` having type `uint32_t`) is equivalent to a multiplication between two unsigned 32-bit integers. When multiplying a sufficiently large `n`, the resulting value can exceed 2^32-1, wrapping around and producing an invalid result. Let's follow the example in `f86f769550` (compute pack .idx byte offsets using size_t, 2020-11-13) and replace this computation with `st_mult()`, which will ensure that the computation is done using 64-bits. While here, guard the corresponding computation for packs with v1 indexes, too. Though the likelihood of seeing a bug there is much smaller, since (a) v1 indexes are generated far less frequently than v2 indexes, and (b) they all correspond to packs no larger than 2 GiB, so having enough objects to trigger this overflow is unlikely if not impossible. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-12 21:44:59 -07:00
Calvin Wan	91c080dff5	git-compat-util: move alloc macros to git-compat-util.h alloc_nr, ALLOC_GROW, and ALLOC_GROW_BY are commonly used macros for dynamic array allocation. Moving these macros to git-compat-util.h with the other alloc macros focuses alloc.[ch] to allocation for Git objects and additionally allows us to remove inclusions to alloc.h from files that solely used the above macros. Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-05 11:42:31 -07:00
Calvin Wan	da9502ff4d	treewide: remove unnecessary includes for wrapper.h Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-05 11:41:59 -07:00
Junio C Hamano	a1264a08a1	Merge branch 'en/header-split-cache-h-part-3' Header files cleanup. * en/header-split-cache-h-part-3: (28 commits) fsmonitor-ll.h: split this header out of fsmonitor.h hash-ll, hashmap: move oidhash() to hash-ll object-store-ll.h: split this header out of object-store.h khash: name the structs that khash declares merge-ll: rename from ll-merge git-compat-util.h: remove unneccessary include of wildmatch.h builtin.h: remove unneccessary includes list-objects-filter-options.h: remove unneccessary include diff.h: remove unnecessary include of oidset.h repository: remove unnecessary include of path.h log-tree: replace include of revision.h with simple forward declaration cache.h: remove this no-longer-used header read-cache*.h: move declarations for read-cache.c functions from cache.h repository.h: move declaration of the_index from cache.h merge.h: move declarations for merge.c from cache.h diff.h: move declaration for global in diff.c from cache.h preload-index.h: move declarations for preload-index.c from elsewhere sparse-index.h: move declarations for sparse-index.c from cache.h name-hash.h: move declarations for name-hash.c from cache.h run-command.h: move declarations for run-command.c from cache.h ...	2023-06-29 16:43:21 -07:00
Junio C Hamano	b2166b0d49	Merge branch 'ds/remove-idx-before-pack' We create .pack and then .idx, we consider only packfiles that have .idx usable (those with only .pack are not ready yet), so we should remove .idx before removing .pack for consistency. * ds/remove-idx-before-pack: packfile: delete .idx files before .pack files	2023-06-29 16:43:20 -07:00
Elijah Newren	a034e9106f	object-store-ll.h: split this header out of object-store.h The vast majority of files including object-store.h did not need dir.h nor khash.h. Split the header into two files, and let most just depend upon object-store-ll.h, while letting the two callers that need it depend on the full object-store.h. After this patch: $ git grep -h include..object-store \| sort \| uniq -c 2 #include "object-store.h" 129 #include "object-store-ll.h" Diff best viewed with `--color-moved`. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-06-21 13:39:54 -07:00
Derrick Stolee	0dd1324a73	packfile: delete .idx files before .pack files When installing a packfile, we place the .pack file before the .idx file. The intention is that Git scans for .idx files in the pack directory and then loads the .pack files from that list. However, when we delete packfiles, we do not do this in the reverse order as we should. The unlink_pack_path() method deletes the .pack followed by the .idx. This creates a window where the process could be interrupted between the .pack deletion and the .idx deletion, leaving the repository in a state that looks strange, but isn't actually too problematic if we assume the pack was safe to delete. The .idx without a .pack will cause some overhead, but will not interrupt other Git processes. This ordering was introduced into the 'git repack' builtin by `a1bbc6c017` (repack: rewrite the shell script in C, 2013-09-15), though we must be careful to track history through the code move in `8434e85d5f` (repack: refactor pack deletion for future use, 2019-06-10) to see that. This became more important after `73320e49ad` (builtin/repack.c: only collect fully-formed packs, 2023-06-07) changed how 'git repack' scanned for packfiles for use in the cruft pack process. It previously looked for .pack files, but that was problematic due to the order that packs are installed: repacks between the creation of a .pack and the creation of its .idx would result in hard failures. There is an independent proposal about what to do in the case of a .idx without a .pack during this 'git repack' scenario, but this change is focused on deleting .pack files more safely. Modify the order to delete the .idx before the .pack. The rest of the modifiers on the .pack should still come after the .pack so we know all of the presumed properties of the packfile as long as it exists in the filesystem, in case we wish to reinstate it by re-indexing the .pack file. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-06-20 13:38:41 -07:00
Junio C Hamano	ccd12a3d6c	Merge branch 'en/header-split-cache-h-part-2' More header clean-up. * en/header-split-cache-h-part-2: (22 commits) reftable: ensure git-compat-util.h is the first (indirect) include diff.h: reduce unnecessary includes object-store.h: reduce unnecessary includes commit.h: reduce unnecessary includes fsmonitor: reduce includes of cache.h cache.h: remove unnecessary headers treewide: remove cache.h inclusion due to previous changes cache,tree: move basic name compare functions from read-cache to tree cache,tree: move cmp_cache_name_compare from tree.[ch] to read-cache.c hash-ll.h: split out of hash.h to remove dependency on repository.h tree-diff.c: move S_DIFFTREE_IFXMIN_NEQ define from cache.h dir.h: move DTYPE defines from cache.h versioncmp.h: move declarations for versioncmp.c functions from cache.h ws.h: move declarations for ws.c functions from cache.h match-trees.h: move declarations for match-trees.c functions from cache.h pkt-line.h: move declarations for pkt-line.c functions from cache.h base85.h: move declarations for base85.c functions from cache.h copy.h: move declarations for copy.c functions from cache.h server-info.h: move declarations for server-info.c functions from cache.h packfile.h: move pack_window and pack_entry from cache.h ...	2023-05-09 16:45:46 -07:00
Junio C Hamano	849c8b3dbf	Merge branch 'tb/pack-revindex-on-disk' The on-disk reverse index that allows mapping from the pack offset to the object name for the object stored at the offset has been enabled by default. * tb/pack-revindex-on-disk: t: invert `GIT_TEST_WRITE_REV_INDEX` config: enable `pack.writeReverseIndex` by default pack-revindex: introduce `pack.readReverseIndex` pack-revindex: introduce GIT_TEST_REV_INDEX_DIE_ON_DISK pack-revindex: make `load_pack_revindex` take a repository t5325: mark as leak-free pack-write.c: plug a leak in stage_tmp_packfiles()	2023-04-27 16:00:59 -07:00
Elijah Newren	5e3f94dfe3	treewide: remove cache.h inclusion due to previous changes Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-24 12:47:33 -07:00
Taylor Blau	65308ad8f7	pack-revindex: make `load_pack_revindex` take a repository In a future commit, we will introduce a `pack.readReverseIndex` configuration, which forces Git to generate the reverse index from scratch instead of loading it from disk. In order to avoid reading this configuration value more than once, we'll use the `repo_settings` struct to lazily load this value. In order to access the `struct repo_settings`, add a repository argument to `load_pack_revindex`, and update all callers to pass the correct instance (in all cases, `the_repository`). In certain instances, a new function-local variable is introduced to take the place of a `struct repository *` argument to the function itself to avoid propagating the new parameter even further throughout the tree. Co-authored-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-13 07:55:45 -07:00
Elijah Newren	87bed17907	object-file.h: move declarations for object-file.c functions from cache.h Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-11 08:52:10 -07:00
Elijah Newren	75f273d9b7	treewide: be explicit about dependence on pack-revindex.h Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-11 08:52:09 -07:00
Elijah Newren	74ea5c9574	treewide: be explicit about dependence on trace.h & trace2.h Dozens of files made use of trace and trace2 functions, without explicitly including trace.h or trace2.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include trace.h or trace2.h if they are using them. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-11 08:52:08 -07:00
Junio C Hamano	6047b28eb7	Merge branch 'en/header-split-cleanup' Split key function and data structure definitions out of cache.h to new header files and adjust the users. * en/header-split-cleanup: csum-file.h: remove unnecessary inclusion of cache.h write-or-die.h: move declarations for write-or-die.c functions from cache.h treewide: remove cache.h inclusion due to setup.h changes setup.h: move declarations for setup.c functions from cache.h treewide: remove cache.h inclusion due to environment.h changes environment.h: move declarations for environment.c functions from cache.h treewide: remove unnecessary includes of cache.h wrapper.h: move declarations for wrapper.c functions from cache.h path.h: move function declarations for path.c functions from cache.h cache.h: remove expand_user_path() abspath.h: move absolute path functions from cache.h environment: move comment_line_char from cache.h treewide: remove unnecessary cache.h inclusion from several sources treewide: remove unnecessary inclusion of gettext.h treewide: be explicit about dependence on gettext.h treewide: remove unnecessary cache.h inclusion from a few headers	2023-04-06 13:38:31 -07:00
Junio C Hamano	72871b198f	Merge branch 'ab/remove-implicit-use-of-the-repository' Code clean-up around the use of the_repository. * ab/remove-implicit-use-of-the-repository: libs: use "struct repository " argument, not "the_repository" post-cocci: adjust comments for recent repo_ migration cocci: apply the "revision.h" part of "the_repository.pending" cocci: apply the "rerere.h" part of "the_repository.pending" cocci: apply the "refs.h" part of "the_repository.pending" cocci: apply the "promisor-remote.h" part of "the_repository.pending" cocci: apply the "packfile.h" part of "the_repository.pending" cocci: apply the "pretty.h" part of "the_repository.pending" cocci: apply the "object-store.h" part of "the_repository.pending" cocci: apply the "diff.h" part of "the_repository.pending" cocci: apply the "commit.h" part of "the_repository.pending" cocci: apply the "commit-reach.h" part of "the_repository.pending" cocci: apply the "cache.h" part of "the_repository.pending" cocci: add missing "the_repository" macros to "pending" cocci: sort "the_repository" rules by header cocci: fix incorrect & verbose "the_repository" rules cocci: remove dead rule from "the_repository.pending.cocci"	2023-04-06 13:38:30 -07:00
Junio C Hamano	e7dca80692	Merge branch 'ab/remove-implicit-use-of-the-repository' into en/header-split-cache-h * ab/remove-implicit-use-of-the-repository: libs: use "struct repository " argument, not "the_repository" post-cocci: adjust comments for recent repo_ migration cocci: apply the "revision.h" part of "the_repository.pending" cocci: apply the "rerere.h" part of "the_repository.pending" cocci: apply the "refs.h" part of "the_repository.pending" cocci: apply the "promisor-remote.h" part of "the_repository.pending" cocci: apply the "packfile.h" part of "the_repository.pending" cocci: apply the "pretty.h" part of "the_repository.pending" cocci: apply the "object-store.h" part of "the_repository.pending" cocci: apply the "diff.h" part of "the_repository.pending" cocci: apply the "commit.h" part of "the_repository.pending" cocci: apply the "commit-reach.h" part of "the_repository.pending" cocci: apply the "cache.h" part of "the_repository.pending" cocci: add missing "the_repository" macros to "pending" cocci: sort "the_repository" rules by header cocci: fix incorrect & verbose "the_repository" rules cocci: remove dead rule from "the_repository.pending.cocci"	2023-04-04 08:25:52 -07:00
Ævar Arnfjörð Bjarmason	a5183d7696	cocci: apply the "promisor-remote.h" part of "the_repository.pending" Apply the part of "the_repository.pending.cocci" pertaining to "promisor-remote.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-28 07:36:46 -07:00
Elijah Newren	ec2f026961	csum-file.h: remove unnecessary inclusion of cache.h With the change in the last commit to move several functions to write-or-die.h, csum-file.h no longer needs to include cache.h. However, removing that include forces several other C files, which directly or indirectly dependend upon csum-file.h's inclusion of cache.h, to now be more explicit about their dependencies. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-21 10:56:55 -07:00
Elijah Newren	32a8f51061	environment.h: move declarations for environment.c functions from cache.h Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-21 10:56:53 -07:00
Elijah Newren	d5ebb50dcb	wrapper.h: move declarations for wrapper.c functions from cache.h Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-21 10:56:53 -07:00
Elijah Newren	f394e093df	treewide: be explicit about dependence on gettext.h Dozens of files made use of gettext functions, without explicitly including gettext.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include gettext.h if they are using it. However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an include of gettext.h, it was left out to avoid conflicting with an in-flight topic. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-21 10:56:51 -07:00
Junio C Hamano	0717a424a7	Merge branch 'ds/reprepare-alternates-when-repreparing-packfiles' Once we start running, we assumed that the list of alternate object databases would never change. Hook into the machinery used to update the list of packfiles during runtime to update this list as well. * ds/reprepare-alternates-when-repreparing-packfiles: object-file: reprepare alternates when necessary	2023-03-19 15:03:12 -07:00
Junio C Hamano	d0732a8120	Merge branch 'jk/unused-post-2.39-part2' More work towards -Wunused. * jk/unused-post-2.39-part2: (21 commits) help: mark unused parameter in git_unknown_cmd_config() run_processes_parallel: mark unused callback parameters userformat_want_item(): mark unused parameter for_each_commit_graft(): mark unused callback parameter rewrite_parents(): mark unused callback parameter fetch-pack: mark unused parameter in callback function notes: mark unused callback parameters prio-queue: mark unused parameters in comparison functions for_each_object: mark unused callback parameters list-objects: mark unused callback parameters mark unused parameters in signal handlers run-command: mark error routine parameters as unused mark "pointless" data pointers in callbacks ref-filter: mark unused callback parameters http-backend: mark unused parameters in virtual functions http-backend: mark argc/argv unused object-name: mark unused parameters in disambiguate callbacks serve: mark unused parameters in virtual functions serve: use repository pointer to get config ls-refs: drop config caching ...	2023-03-17 14:03:09 -07:00
Derrick Stolee	e2d003dbed	object-file: reprepare alternates when necessary When an object is not found in a repository's object store, we sometimes call reprepare_packed_git() to see if the object was temporarily moved into a new pack-file (and its old pack-file or loose object was deleted). This process does a scan of each pack directory within each odb, but does not reevaluate if the odb list needs updating. Extend reprepare_packed_git() to also reprepare the alternate odb list by setting loaded_alternates to zero and calling prepare_alt_odb(). This will add newly-discoverd odbs to the linked list, but will not duplicate existing ones nor will it remove existing ones that are no longer listed in the alternates file. Do this under the object read lock to avoid readers from interacting with a potentially incomplete odb being added to the odb list. If the alternates file was edited to _remove_ some alternates during the course of the Git process, Git will continue to see alternates that were ever valid for that repository. ODBs are not removed from the list, the same as the existing behavior before this change. Git already has protections against an alternate directory disappearing from the filesystem during the lifetime of a process, and those are still in effect. This change is specifically for concurrent changes to the repository, so it is difficult to create a test that guarantees this behavior is correct. I manually verified by introducing a reprepare_packed_git() call into get_revision() and stepped into that call in a debugger with a parent 'git log' process. Multiple runs of prepare_alt_odb() kept the_repository->objects->odb as a single-item chain until I added a .git/objects/info/alternates file in a different process. The next run added the new odb to the chain and subsequent runs did not add to the chain. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-09 11:44:57 -08:00
Jeff King	be252d3349	for_each_object: mark unused callback parameters The for_each_{loose,packed}_object interface uses callback functions, but not every callback needs all of the parameters. Mark the unused ones to satisfy -Wunused-parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-02-24 09:13:31 -08:00
Elijah Newren	41771fa435	cache.h: remove dependence on hex.h; make other files include it explicitly Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-02-23 17:25:29 -08:00
Elijah Newren	36bf195890	alloc.h: move ALLOC_GROW() functions from cache.h This allows us to replace includes of cache.h with includes of the much smaller alloc.h in many places. It does mean that we also need to add includes of alloc.h in a number of C files. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-02-23 17:25:28 -08:00
Jeff King	c2f32bef9c	packfile: inline custom read_object() When the pack code was split into its own file[1], it got a copy of the static read_object() function. But there's only one caller here, so we could just inline it. And it's worth doing so, as the name read_object() invites comparisons to the public read_object_file(), but the two don't behave quite the same. [1] The move happened over several commits, but the relevant one here is `f1d8130be0` (pack: move clear_delta_base_cache(), packed_object_info(), unpack_entry(), 2017-08-18). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-01-08 10:52:55 +09:00
Junio C Hamano	dd407f1c7c	Merge branch 'ab/unused-annotation' Undoes 'jk/unused-annotation' topic and redoes it to work around Coccinelle rules misfiring false positives in unrelated codepaths. * ab/unused-annotation: git-compat-util.h: use "deprecated" for UNUSED variables git-compat-util.h: use "UNUSED", not "UNUSED(var)"	2022-09-14 12:56:39 -07:00
Junio C Hamano	a6b42ec0c6	Merge branch 'jk/unused-annotation' Annotate function parameters that are not used (but cannot be removed for structural reasons), to prepare us to later compile with -Wunused warning turned on. * jk/unused-annotation: is_path_owned_by_current_uid(): mark "report" parameter as unused run-command: mark unused async callback parameters mark unused read_tree_recursive() callback parameters hashmap: mark unused callback parameters config: mark unused callback parameters streaming: mark unused virtual method parameters transport: mark bundle transport_options as unused refs: mark unused virtual method parameters refs: mark unused reflog callback parameters refs: mark unused each_ref_fn parameters git-compat-util: add UNUSED macro	2022-09-14 12:56:39 -07:00
Ævar Arnfjörð Bjarmason	5cf88fd8b0	git-compat-util.h: use "UNUSED", not "UNUSED(var)" As reported in [1] the "UNUSED(var)" macro introduced in 2174b8c75de (Merge branch 'jk/unused-annotation' into next, 2022-08-24) breaks coccinelle's parsing of our sources in files where it occurs. Let's instead partially go with the approach suggested in [2] of making this not take an argument. As noted in [1] "coccinelle" will ignore such tokens in argument lists that it doesn't know about, and it's less of a surprise to syntax highlighters. This undoes the "help us notice when a parameter marked as unused is actually use" part of `9b24034754` (git-compat-util: add UNUSED macro, 2022-08-19), a subsequent commit will further tweak the macro to implement a replacement for that functionality. 1. https://lore.kernel.org/git/220825.86ilmg4mil.gmgdl@evledraar.gmail.com/ 2. https://lore.kernel.org/git/220819.868rnk54ju.gmgdl@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-09-01 10:49:48 -07:00
Junio C Hamano	01a30a5a58	Merge branch 'jk/is-promisor-object-keep-tree-in-use' An earlier optimization discarded a tree-object buffer that is still in use, which has been corrected. * jk/is-promisor-object-keep-tree-in-use: is_promisor_object(): fix use-after-free of tree buffer	2022-08-25 14:42:31 -07:00
Jeff King	02c3c59e62	hashmap: mark unused callback parameters Hashmap comparison functions must conform to a particular callback interface, but many don't use all of their parameters. Especially the void cmp_data pointer, but some do not use keydata either (because they can easily form a full struct to pass when doing lookups). Let's mark these to make -Wunused-parameter happy. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-08-19 12:18:55 -07:00
Junio C Hamano	363a193c3a	Merge branch 'jk/fsck-tree-mode-bits-fix' "git fsck" reads mode from tree objects but canonicalizes the mode before passing it to the logic to check object sanity, which has hid broken tree objects from the checking logic. This has been corrected, but to help exiting projects with broken tree objects that they cannot fix retroactively, the severity of anomalies this code detects has been demoted to "info" for now. * jk/fsck-tree-mode-bits-fix: fsck: downgrade tree badFilemode to "info" fsck: actually detect bad file modes in trees tree-walk: add a mechanism for getting non-canonicalized modes	2022-08-18 13:07:04 -07:00
Jeff King	1490d7d82d	is_promisor_object(): fix use-after-free of tree buffer Since commit `fcc07e980b` (is_promisor_object(): free tree buffer after parsing, 2021-04-13), we'll always free the buffers attached to a "struct tree" after searching them for promisor links. But there's an important case where we don't want to do so: if somebody else is already using the tree! This can happen during a "rev-list --missing=allow-promisor" traversal in a partial clone that is missing one or more trees or blobs. The backtrace for the free looks like this: #1 free_tree_buffer tree.c:147 #2 add_promisor_object packfile.c:2250 #3 for_each_object_in_pack packfile.c:2190 #4 for_each_packed_object packfile.c:2215 #5 is_promisor_object packfile.c:2272 #6 finish_object__ma builtin/rev-list.c:245 #7 finish_object builtin/rev-list.c:261 #8 show_object builtin/rev-list.c:274 #9 process_blob list-objects.c:63 #10 process_tree_contents list-objects.c:145 #11 process_tree list-objects.c:201 #12 traverse_trees_and_blobs list-objects.c:344 [...] We're in the middle of walking through the entries of a tree object via process_tree_contents(). We see a blob (or it could even be another tree entry) that we don't have, so we call is_promisor_object() to check it. That function loops over all of the objects in the promisor packfile, including the tree we're currently walking. When we're done with it there, we free the tree buffer. But as we return to the walk in process_tree_contents(), it's still holding on to a pointer to that buffer, via its tree_desc iterator, and it accesses the freed memory. Even a trivial use of "--missing=allow-promisor" triggers this problem, as the included test demonstrates (it's just a vanilla --blob:none clone). We can detect this case by only freeing the tree buffer if it was allocated on our behalf. This is a little tricky since that happens inside parse_object(), and it doesn't tell us whether the object was already parsed, or whether it allocated the buffer itself. But by checking for an already-parsed tree beforehand, we can distinguish the two cases. That feels a little hacky, and does incur an extra lookup in the object-hash table. But that cost is fairly minimal compared to actually loading objects (and since we're iterating the whole pack here, we're likely to be loading most objects, rather than reusing cached results). It may also be a good direction for this function in general, as there are other possible optimizations that rely on doing some analysis before parsing: - we could detect blobs and avoid reading their contents; they can't link to other objects, but parse_object() doesn't know that we don't care about checking their hashes. - we could avoid allocating object structs entirely for most objects (since we really only need them in the oidset), which would save some memory. - promisor commits could use the commit-graph rather than loading the object from disk This commit doesn't do any of those optimizations, but I think it argues that this direction is reasonable, rather than relying on parse_object() and trying to teach it to give us more information about whether it parsed. The included test fails reliably under SANITIZE=address just when running "rev-list --missing=allow-promisor". Checking the output isn't strictly necessary to detect the bug, but it seems like a reasonable addition given the general lack of coverage for "allow-promisor" in the test suite. Reported-by: Andrew Olsen <andrew.olsen@koordinates.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-08-14 18:03:36 -07:00
Jeff King	ec18b10bf2	tree-walk: add a mechanism for getting non-canonicalized modes When using init_tree_desc() and tree_entry() to iterate over a tree, we always canonicalize the modes coming out of the tree. This is a good thing to prevent bugs or oddities in normal code paths, but it's counter-productive for tools like fsck that want to see the exact contents. We can address this by adding an option to avoid the extra canonicalization. A few notes on the implementation: - I've attached the new option to the tree_desc struct itself. The actual code change is in decode_tree_entry(), which is in turn called by the public update_tree_entry(), tree_entry(), and init_tree_desc() functions, plus their "gently" counterparts. By letting it ride along in the struct, we can avoid changing the signature of those functions, which are called many times. Plus it's conceptually simpler: you really want a particular iteration of a tree to be "raw" or not, rather than individual calls. - We still have to set the new option somewhere. The struct is initialized by init_tree_desc(). I added the new flags field only to the "gently" version. That avoids disturbing the much more numerous non-gentle callers, and it makes sense that anybody being careful about looking at raw modes would also be careful about bogus trees (i.e., the caller will be something like fsck in the first place). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-08-10 14:26:25 -07:00
Junio C Hamano	4e0d160bbc	Merge branch 'rs/mergesort' Make our mergesort implementation type-safe. * rs/mergesort: mergesort: remove llist_mergesort() packfile: use DEFINE_LIST_SORT fetch-pack: use DEFINE_LIST_SORT commit: use DEFINE_LIST_SORT blame: use DEFINE_LIST_SORT test-mergesort: use DEFINE_LIST_SORT test-mergesort: use DEFINE_LIST_SORT_DEBUG mergesort: add macros for typed sort of linked lists mergesort: tighten merge loop mergesort: unify ranks loops	2022-08-03 13:36:09 -07:00
René Scharfe	9b9f5f6217	packfile: use DEFINE_LIST_SORT Build a typed sort function for packed_git lists using DEFINE_LIST_SORT instead of calling llist_mergesort(). This gets rid of the next pointer accessor functions and their calling overhead at the cost of slightly increased object text size. Before: __TEXT __DATA __OBJC others dec hex 20218 320 0 110936 131474 20192 packfile.o With this patch: __TEXT __DATA __OBJC others dec hex 20430 320 0 112619 133369 208f9 packfile.o Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-17 15:20:39 -07:00
Junio C Hamano	2b970bc09f	Merge branch 'jk/optim-promisor-object-enumeration' Collection of what is referenced by objects in promisor packs have been optimized to inspect these objects in the in-pack order. * jk/optim-promisor-object-enumeration: is_promisor_object(): walk promisor packs in pack-order	2022-07-11 15:38:50 -07:00
Jeff King	18c08abc82	is_promisor_object(): walk promisor packs in pack-order When we generate the list of promisor objects, we walk every pack with a .promisor file and examine its objects for any links to other objects. By default, for_each_packed_object() will go in pack .idx order. This is the worst case with respect to our delta base cache. If we have a delta chain of A->B->C->D, then visiting A may require reconstructing both B and C, unless we also visited B recently, in which case we may have cached its value. Because .idx order is based on sha1, it's random with respect to the actual object contents and deltas, and thus we're unlikely to get many cache hits. If we instead traverse in pack order, then we get the optimal case: packs are written to keep delta families together, and to place bases before their children. Even on a modest repository like git.git, this has a noticeable speedup on p5600.4, which runs "fsck" on a partial clone with blob:none (so lots of trees which need to be walked, and which delta well): Test HEAD^ HEAD ------------------------------------------------------- 5600.4: 17.87(17.83+0.04) 15.42(15.35+0.06) -13.7% On a larger repository like linux.git, the speedup is even more pronounced: Test HEAD^ HEAD ----------------------------------------------------------- 5600.4: 322.47(322.01+0.42) 186.41(185.76+0.63) -42.2% Any other operations that call is_promisor_object(), like "rev-list --exclude-promisor-objects", would similarly benefit, but the invocations in p5600 don't actually trigger any such cases. Note that we may pay a small price to build a rev-index in-memory to do the pack-order traversal. But it's still a big net win, and even that small cost goes away if you are using pack.writeReverseIndex. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-06-16 10:03:40 -07:00
Junio C Hamano	a50036da1a	Merge branch 'tb/cruft-packs' A mechanism to pack unreachable objects into a "cruft pack", instead of ejecting them into loose form to be reclaimed later, has been introduced. * tb/cruft-packs: sha1-file.c: don't freshen cruft packs builtin/gc.c: conditionally avoid pruning objects via loose builtin/repack.c: add cruft packs to MIDX during geometric repack builtin/repack.c: use named flags for existing_packs builtin/repack.c: allow configuring cruft pack generation builtin/repack.c: support generating a cruft pack builtin/pack-objects.c: --cruft with expiration reachable: report precise timestamps from objects in cruft packs reachable: add options to add_unseen_recent_objects_to_traversal builtin/pack-objects.c: --cruft without expiration builtin/pack-objects.c: return from create_object_entry() t/helper: add 'pack-mtimes' test-tool pack-mtimes: support writing pack .mtimes files chunk-format.h: extract oid_version() pack-write: pass 'struct packing_data' to 'stage_tmp_packfiles' pack-mtimes: support reading .mtimes files Documentation/technical: add cruft-packs.txt	2022-06-03 14:30:37 -07:00
Taylor Blau	94cd775a6c	pack-mtimes: support reading .mtimes files To store the individual mtimes of objects in a cruft pack, introduce a new `.mtimes` format that can optionally accompany a single pack in the repository. The format is defined in Documentation/technical/pack-format.txt, and stores a 4-byte network order timestamp for each object in name (index) order. This patch prepares for cruft packs by defining the `.mtimes` format, and introducing a basic API that callers can use to read out individual mtimes. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-26 15:48:26 -07:00
Junio C Hamano	2b0a58d164	Merge branch 'ep/maint-equals-null-cocci' for maint-2.35 * ep/maint-equals-null-cocci: tree-wide: apply equals-null.cocci contrib/coccinnelle: add equals-null.cocci	2022-05-02 10:06:04 -07:00
Junio C Hamano	afe8a9070b	tree-wide: apply equals-null.cocci Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-02 09:50:37 -07:00
Junio C Hamano	c9c082850d	Merge branch 'jt/pack-header-lshift-overflow' * jt/pack-header-lshift-overflow: packfile: fix off-by-one error in decoding logic	2022-01-12 15:11:41 -08:00
Junio C Hamano	a5c97b0164	packfile: fix off-by-one error in decoding logic shift count being exactly at 7-bit smaller than the long is OK; on 32-bit architecture, shift count starts at 4 and goes through 11, 18 and 25, at which point the guard triggers one iteration too early. Reported-by: Marc Strapetz <marc.strapetz@syntevo.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-01-12 12:14:49 -08:00
Junio C Hamano	79aee56c1e	Merge branch 'tb/pack-revindex-on-disk-cleanup' Code clean-up. * tb/pack-revindex-on-disk-cleanup: packfile: make `close_pack_revindex()` static	2021-12-15 09:39:50 -08:00
Junio C Hamano	2d5b70de2d	Merge branch 'jt/pack-header-lshift-overflow' The code to decode the length of packed object size has been corrected. * jt/pack-header-lshift-overflow: packfile: avoid overflowing shift during decode	2021-12-10 14:35:08 -08:00
Taylor Blau	0bf0de6cc7	packfile: make `close_pack_revindex()` static Since its definition in `2f4ba2a867` (packfile: prepare for the existence of '*.rev' files, 2021-01-25), the only caller of `close_pack_revindex()` was within packfile.c. Thus there is no need for this to be exposed via packfile.h, and instead can remain static within packfile.c's compilation unit. While we're here, move the function's opening brace onto its own line. Noticed-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-12-04 23:01:38 -08:00
Junio C Hamano	f9ba6acaa9	Merge branch 'mc/clean-smudge-with-llp64' The clean/smudge conversion code path has been prepared to better work on platforms where ulong is narrower than size_t. * mc/clean-smudge-with-llp64: clean/smudge: allow clean filters to process extremely large files odb: guard against data loss checking out a huge file git-compat-util: introduce more size_t helpers odb: teach read_blob_entry to use size_t t1051: introduce a smudge filter test for extremely large files test-lib: add prerequisite for 64-bit platforms test-tool genzeros: generate large amounts of data more efficiently test-genzeros: allow more than 2G zeros in Windows	2021-11-29 15:41:51 -08:00
Jonathan Tan	34de5b8eac	packfile: avoid overflowing shift during decode unpack_object_header_buffer() attempts to protect against overflowing left shifts, but the limit of the shift amount should not be the size of the variable being shifted. It should be the size minus the size of its contents. Fix that accordingly. This was noticed at $DAYJOB by a fuzzer running internally. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-11-11 10:06:37 -08:00
Matt Cooper	d6a09e795d	odb: guard against data loss checking out a huge file This introduces an additional guard for platforms where `unsigned long` and `size_t` are not of the same size. If the size of an object in the database would overflow `unsigned long`, instead we now exit with an error. A complete fix will have to update _many_ other functions throughout the codebase to use `size_t` instead of `unsigned long`. It will have to be implemented at some stage. This commit puts in a stop-gap for the time being. Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Matt Cooper <vtbassmatt@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-11-03 11:22:27 -07:00
Junio C Hamano	068966d2e8	Merge branch 'rs/close-pack-leakfix' Leakfix. * rs/close-pack-leakfix: packfile: release bad_objects in close_pack()	2021-10-03 21:49:20 -07:00
René Scharfe	8c6b4332b4	packfile: release bad_objects in close_pack() Unusable entries of a damaged pack file are recorded in the oidset bad_objects. Release it when we're done with the pack. This doesn't affect intact packs because an empty oidset requires no allocation. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-24 09:22:46 -07:00
Junio C Hamano	28caad63d0	Merge branch 'rs/packfile-bad-object-list-in-oidset' Replace a handcrafted data structure used to keep track of bad objects in the packfile API by an oidset. * rs/packfile-bad-object-list-in-oidset: packfile: use oidset for bad objects packfile: convert has_packed_and_bad() to object_id packfile: convert mark_bad_packed_object() to object_id midx: inline nth_midxed_pack_entry() oidset: make oidset_size() an inline function	2021-09-23 13:44:46 -07:00
Junio C Hamano	0649303820	Merge branch 'tb/multi-pack-bitmaps' The reachability bitmap file used to be generated only for a single pack, but now we've learned to generate bitmaps for history that span across multiple packfiles. * tb/multi-pack-bitmaps: (29 commits) pack-bitmap: drop bitmap_index argument from try_partial_reuse() pack-bitmap: drop repository argument from prepare_midx_bitmap_git() p5326: perf tests for MIDX bitmaps p5310: extract full and partial bitmap tests midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' t7700: update to work with MIDX bitmap test knob t5319: don't write MIDX bitmaps in t5319 t5310: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP t0410: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP t5326: test multi-pack bitmap behavior t/helper/test-read-midx.c: add --checksum mode t5310: move some tests to lib-bitmap.sh pack-bitmap: write multi-pack bitmaps pack-bitmap: read multi-pack bitmaps pack-bitmap.c: avoid redundant calls to try_partial_reuse pack-bitmap.c: introduce 'bitmap_is_preferred_refname()' pack-bitmap.c: introduce 'nth_bitmap_object_oid()' pack-bitmap.c: introduce 'bitmap_num_objects()' midx: avoid opening multiple MIDXs when writing midx: close linked MIDXs, avoid leaking memory ...	2021-09-20 15:20:39 -07:00
René Scharfe	09ef66179b	packfile: use oidset for bad objects Store the object ID of broken pack entries in an oidset instead of keeping only their hashes in an unsorted array. The resulting code is shorter and easier to read. It also handles the (hopefully) very rare case of having a high number of bad objects better. Helped-by: Jeff King <peff@peff.net> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-12 16:14:32 -07:00
René Scharfe	7407d733a4	packfile: convert has_packed_and_bad() to object_id The single caller has a full object ID, so pass it on instead of just its hash member. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-12 16:14:32 -07:00
René Scharfe	751530de5d	packfile: convert mark_bad_packed_object() to object_id All callers have full object IDs, so pass them on instead of just their hash member. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-12 16:14:32 -07:00
Taylor Blau	0f533c7284	pack-bitmap: read multi-pack bitmaps This prepares the code in pack-bitmap to interpret the new multi-pack bitmaps described in Documentation/technical/bitmap-format.txt, which mostly involves converting bit positions to accommodate looking them up in a MIDX. Note that there are currently no writers who write multi-pack bitmaps, and that this will be implemented in the subsequent commit. Note also that get_midx_checksum() and get_midx_filename() are made non-static so they can be called from pack-bitmap.c. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-01 13:56:43 -07:00
Taylor Blau	a241878ac7	object-store.h: teach for_each_packed_object to ignore kept packs The next patch will reimplement a function that wants to iterate over packed objects while ignoring packs which are marked as kept (either in-core or on-disk). Teach for_each_packed_object() to ignore all objects from those packs by adding a new flag for each of the "kept" states that a pack can be in. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-08-29 23:23:39 -07:00
Eric Wong	dc05929411	xmmap: inform Linux users of tuning knobs on ENOMEM Linux users may benefit from additional information on how to avoid ENOMEM from mmap despite the system having enough RAM to accomodate them. We can't reliably unmap pack windows to work around the issue since malloc and other library routines may mmap without our knowledge. Signed-off-by: Eric Wong <e@80x24.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-06-29 23:14:25 -07:00

1 2 3 4 5 ...

356 Commits (48bc5094de4d2c549efd82780d4488071955d4ff)