kernel/git - git - PowerEL Git System

Commit Graph

Author	SHA1	Message	Date
Junio C Hamano	b8ae9bb7f5	Merge branch 'ps/object-read-stream' into jch The "git_istream" abstraction has been revamped to make it easier to interface with pluggable object database design. * ps/object-read-stream: streaming: drop redundant type and size pointers streaming: move into object database subsystem streaming: refactor interface to be object-database-centric streaming: move logic to read packed objects streams into backend streaming: move logic to read loose objects streams into backend streaming: make the `odb_read_stream` definition public streaming: get rid of `the_repository` streaming: rely on object sources to create object stream packfile: introduce function to read object info from a store streaming: move zlib stream into backends streaming: create structure for filtered object streams streaming: create structure for packed object streams streaming: create structure for loose object streams streaming: create structure for in-core object streams streaming: allocate stream inside the backend-specific logic streaming: explicitly pass packfile info when streaming a packed object streaming: propagate final object type via the stream streaming: drop the `open()` callback function streaming: rename `git_istream` into `odb_read_stream`	2025-11-30 18:32:10 -08:00
Junio C Hamano	eceb7a6893	Merge branch 'ps/object-source-management' into jch Code refactoring around object database sources. * ps/object-source-management: odb: handle recreation of quarantine directories odb: handle changing a repository's commondir chdir-notify: add function to unregister listeners odb: handle initialization of sources in `odb_new()` http-push: stop setting up `the_repository` for each reference t/helper: stop setting up `the_repository` repeatedly builtin/index-pack: fix deferred fsck outside repos oidset: introduce `oidset_equal()` odb: move logic to disable ref updates into repo odb: refactor `odb_clear()` to `odb_free()` odb: adopt logic to close object databases setup: convert `set_git_dir()` to have file scope path: move `enter_repo()` into "setup.c"	2025-11-30 18:32:08 -08:00
Junio C Hamano	a545103244	Merge branch 'ps/object-source-loose' A part of code paths that deals with loose objects has been cleaned up. * ps/object-source-loose: object-file: refactor writing objects via a stream object-file: rename `write_object_file()` object-file: refactor freshening of objects object-file: rename `has_loose_object()` object-file: read objects via the loose object source object-file: move loose object map into loose source object-file: hide internals when we need to reprepare loose sources object-file: move loose object cache into loose source object-file: introduce `struct odb_source_loose` object-file: move `fetch_if_missing` odb: adjust naming to free object sources odb: introduce `odb_source_new()` odb: fix subtle logic to check whether an alternate is usable	2025-11-24 15:46:41 -08:00
Patrick Steinhardt	1599b68d5e	streaming: move into object database subsystem The "streaming" terminology is somewhat generic, so it may not be immediately obvious that "streaming.{c,h}" is specific to the object database. Rectify this by moving it into the "odb/" directory so that it can be immediately attributed to the object subsystem. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-23 12:56:46 -08:00
Patrick Steinhardt	8c1b84bc97	streaming: move logic to read packed objects streams into backend Move the logic to read packed object streams into the respective subsystem. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-23 12:56:45 -08:00
Patrick Steinhardt	385e18810f	packfile: introduce function to read object info from a store Extract the logic to read object info for a packed object from `do_oid_object_into_extended()` into a standalone function that operates on the packfile store. This function will be used in a subsequent commit. Note that this change allows us to make `find_pack_entry()` an internal implementation detail. As a consequence though we have to move around `packfile_store_freshen_object()` so that it is defined after that function. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-23 12:56:45 -08:00
Patrick Steinhardt	9aaba57993	odb: adopt logic to close object databases The logic to close an object database is currently contained in the packfile subsystem. That choice is somewhat relatable, as most of the logic really is to close resources associated with the packfile store itself. But we also end up handling object sources and commit graphs, which certainly is not related to packfiles. Move the function into the object database subsystem and rename it to `odb_close()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-19 17:41:03 -08:00
Patrick Steinhardt	f2bd88a308	object-file: refactor freshening of objects When writing an object that already exists in our object database we skip the write and instead only update mtimes of the object, either in its packed or loose object format. This logic is wholly contained in "object-file.c", but that file is really only concerned with loose objects. So it does not really make sense that it also contains the logic to freshen a packed object. Introduce a new `odb_freshen_object()` function that sits on the object database level and two functions `packfile_store_freshen_object()` and `odb_source_loose_freshen_object()`. Like this, the format-specific functions can be part of their respective subsystems, while the backend agnostic function to freshen an object sits at the object database layer. Note that this change also moves the logic that iterates through object sources from the object source layer into the object database layer. This change is intentional: object sources should ideally only have to worry about themselves, and coordination of different sources should be handled on the object database level. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-03 12:18:47 -08:00
Patrick Steinhardt	c31bad4f7d	packfile: track packs via the MRU list exclusively We track packfiles via two different lists: - `struct packfile_store::packs` is a list that sorts local packs first. In addition, these packs are sorted so that younger packs are sorted towards the front. - `struct packfile_store::mru` is a list that sorts packs so that most-recently used packs are at the front. The reasoning behind the ordering in the `packs` list is that younger objects stored in the local object store tend to be accessed more frequently, and that is certainly true for some cases. But there are going to be lots of cases where that isn't true. Especially when traversing history it is likely that one needs to access many older objects, and due to our housekeeping it is very likely that almost all of those older objects will be contained in one large pack that is oldest. So whether or not the ordering makes sense really depends on the use case at hand. A flexible approach like our MRU list addresses that need, as it will sort packs towards the front that are accessed all the time. Intuitively, this approach is thus able to satisfy more use cases more efficiently. This reasoning casts some doubt on whether or not it really makes sense to track packs via two different lists. It causes confusion, and it is not clear whether there are use cases where the `packs` list really is such an obvious choice. Merge these two lists into one most-recently-used list. Note that there is one important edge case: `for_each_packed_object()` uses the MRU list to iterate through packs, and then it lists each object in those packs. This would have the effect that we now sort the current pack towards the front, thus modifying the list of packfiles we are iterating over, with the consequence that we'll see an infinite loop. This edge case is worked around by introducing a new field that allows us to skip updating the MRU. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-30 07:09:53 -07:00
Patrick Steinhardt	6aff1f25a0	packfile: always add packfiles to MRU when adding a pack When preparing the packfile store we know to also prepare the MRU list of packfiles with all packs that are currently loaded in the store via `packfile_store_prepare_mru()`. So we know that the list of packs in the MRU list should match the list of packs in the non-MRU list. But there are some direct or indirect callsites that add a packfile to the store via `packfile_store_add_pack()` without adding the pack to the MRU. And while functions that access the MRU (e.g. `find_pack_entry()`) know to call `packfile_store_prepare()`, which knows to prepare the MRU via `packfile_store_prepare_mru()`, that operation will be turned into a no-op because the packfile store is already prepared. So this will not cause us to add the packfile to the MRU, and consequently we won't be able to find the packfile in our MRU list. There are only a handful of callers outside of "packfile.c" that add a packfile to the store: - "builtin/fast-import.c" adds multiple packs of imported objects, but it knows to look up objects via `packfile_store_get_packs()`. This function does not use the MRU, so we're good. - "builtin/index-pack.c" adds the indexed pack to the store in case it needs to perform consistency checks on its objects. - "http.c" adds the fetched pack to the store so that we can access its objects. In all of these cases we actually want to access the contained objects. And luckily, reading these objects works as expected: 1. We eventually end up in `do_oid_object_info_extended()`. 2. Calling `find_pack_entry()` fails because the MRU list doesn't contain the newly added packfile. 3. The callers don't pass `OBJECT_INFO_QUICK`, so we end up repreparing the object database. This will also cause us to reprepare the MRU list. 4. We now retry reading the object via `find_pack_entry()`, and now we succeed because the MRU list got populated. This logic feels quite fragile: we intentionally add the packfile to the store, but we then ultimately rely on repreparing the entire store only to make the packfile accessible. While we do the correct thing in `do_oid_object_info_extended()`, other sites that access the MRU may not know to reprepare. But besides being fragile it's also a waste of resources: repreparing the object database requires us to re-read the alternates file and discard any caches. Refactor the code so that we unconditionally add packfiles to the MRU when adding them to a packfile store. This makes the logic less fragile and ensures that we don't have to reprepare the store to make the pack accessible. Note that this does not allow us to drop `packfile_store_prepare_mru()` just yet: while the MRU list is already populated with all packs now, the order in which we add these packs is indeterministic for most of the part. So by first calling `sort_pack()` on the other packfile list and then re-preparing the MRU list we inherit its sorting. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-30 07:09:53 -07:00
Patrick Steinhardt	589127caa7	packfile: move list of packs into the packfile store Move the list of packs into the packfile store. This follows the same logic as in a previous commit, where we moved the most-recently-used list of packs, as well. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-30 07:09:53 -07:00
Patrick Steinhardt	02a7f6ffab	packfile: fix approximation of object counts When approximating the number of objects in a repository we only take into account two data sources, the multi-pack index and the packfile indices, as both of these data structures allow us to easily figure out how many objects they contain. But the way we currently approximate the number of objects is broken in presence of a multi-pack index. This is due to two separate reasons: - We have recently introduced initial infrastructure for incremental multi-pack indices. Starting with that series, `num_objects` only counts the number of objects of a specific layer of the MIDX chain, so we do not take into account objects from parent layers. This issue is fixed by adding `num_objects_in_base`, which contains the sum of all objects in previous layers. - When using the multi-pack index we may count objects contained in packfiles twice: once via the multi-pack index, but then we again count them via the packfile itself. This issue is fixed by skipping any packfiles that have an MIDX. Overall, given that we _always_ count the packs, we can only end up overestimating the number of objects, and the overestimation is limited to a factor of two at most. The consequences of those issues are very limited though, as we only approximate object counts in a small number of cases: - When writing a commit-graph we use the approximate object count to display the upper limit of a progress display. - In `repo_find_unique_abbrev_r()` we use it to specify a lower limit of how many hex digits we want to abbreviate to. Given that we use power-of-two here to derive the lower limit we may end up with an abbreviated hash that is one digit longer than required. - In `estimate_repack_memory()` we may end up overestimating how much memory a repack needs to pack objects. Conseuqently, we may end up dropping some packfiles from a repack. None of these are really game-changing. But it's nice to fix those issues regardless. While at it, convert the code to use `repo_for_each_pack()`. Furthermore, use `odb_prepare_alternates()` instead of explicitly preparing the packfile store. We really only want to prepare the object database sources, and `get_multi_pack_index()` already knows to prepare the packfile store for us. Helped-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-30 07:09:52 -07:00
Patrick Steinhardt	89219bc0cd	http: refactor subsystem to use `packfile_list`s The dumb HTTP protocol directly fetches packfiles from the remote server and temporarily stores them in a list of packfiles. Those packfiles are not yet added to the repository's packfile store until we finalize the whole fetch. Refactor the code to instead use a `struct packfile_list` to store those packs. This prepares us for a subsequent change where the `->next` pointer of `struct packed_git` will go away. Note that this refactoring creates some temporary duplication of code, as we now have both `packfile_list_find_oid()` and `find_oid_pack()`. The latter function will be removed in a subsequent commit though. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-30 07:09:52 -07:00
Patrick Steinhardt	f905a855b1	packfile: move the MRU list into the packfile store Packfiles have two lists associated to them: - A list that keeps track of packfiles in the order that they were added to a packfile store. - A list that keeps track of packfiles in most-recently-used order so that packfiles that are more likely to contain a specific object are ordered towards the front. Both of these lists are hosted by `struct packed_git` itself, So to identify all packfiles in a repository you simply need to grab the first packfile and then iterate the `->next` pointers or the MRU list. This pattern has the problem that all packfiles are part of the same list, regardless of whether or not they belong to the same object source. With the upcoming pluggable object database effort this needs to change: packfiles should be contained by a single object source, and reading an object from any such packfile should use that source to look up the object. Consequently, we need to break up the global lists of packfiles into per-object-source lists. A first step towards this goal is to move those lists out of `struct packed_git` and into the packfile store. While the packfile store is currently sitting on the `struct object_database` level, the intent is to push it down one level into the `struct odb_source` in a subsequent patch series. Introduce a new `struct packfile_list` that is used to manage lists of packfiles and use it to store the list of most-recently-used packfiles in `struct packfile_store`. For now, the new list type is only used in a single spot, but we'll expand its usage in subsequent patches. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-30 07:09:52 -07:00
Patrick Steinhardt	e78ab37054	packfile: use a `strmap` to store packs by name To allow fast lookups of a packfile by name we use a hashmap that has the packfile name as key and the pack itself as value. But while this is the perfect use case for a `strmap`, we instead use `struct hashmap` and store the hashmap entry in the packfile itself. Simplify the code by using a `strmap` instead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-30 07:09:52 -07:00
Patrick Steinhardt	ecad863c12	packfile: rename `packfile_store_get_all_packs()` In a preceding commit we have removed `packfile_store_get_packs()`. With this function removed it's somewhat useless to still have the "all" infix in `packfile_store_get_all_packs()`. Rename the latter to drop that infix. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-16 14:42:40 -07:00
Patrick Steinhardt	86d8c62f48	packfile: introduce macro to iterate through packs We have a bunch of different sites that want to iterate through all packs of a given `struct packfile_store`. This pattern is somewhat verbose and repetitive, which makes it somewhat cumbersome. Introduce a new macro `repo_for_each_pack()` that removes some of the boilerplate. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-16 14:42:39 -07:00
Patrick Steinhardt	5b410c8276	packfile: drop `packfile_store_get_packs()` In the preceding commits we have removed all remaining callers of `packfile_store_get_packs()`, the function is thus unused now. Remove it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-16 14:42:39 -07:00
Patrick Steinhardt	dd52a29b78	packfile: refactor `get_packed_git_mru()` to work on packfile store The `get_packed_git_mru()` function prepares the packfile store and then returns its packfiles in most-recently-used order. Refactor it to accept a packfile store instead of a repository to clarify its scope. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-09-24 11:53:51 -07:00
Patrick Steinhardt	d2779beb36	packfile: refactor `get_all_packs()` to work on packfile store The `get_all_packs()` function prepares the packfile store and then returns its packfiles. Refactor it to accept a packfile store instead of a repository to clarify its scope. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-09-24 11:53:51 -07:00
Patrick Steinhardt	751808b2a1	packfile: refactor `get_packed_git()` to work on packfile store The `get_packed_git()` function prepares the packfile store and then returns its packfiles. Refactor it to accept a packfile store instead of a repository to clarify its scope. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-09-24 11:53:51 -07:00
Patrick Steinhardt	ab8aff4a6b	packfile: move `get_multi_pack_index()` into "midx.c" The `get_multi_pack_index()` function is declared and implemented in the packfile subsystem, even though it really belongs into the multi-pack index subsystem. The reason for this is likely that it needs to call `packfile_store_prepare()`, which is not exposed by the packfile system. In a subsequent commit we're about to add another caller outside of the packfile system though, so we'll have to expose the function anyway. Do so now already and move `get_multi_pack_index()` into the MIDX subsystem. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-09-24 11:53:50 -07:00
Patrick Steinhardt	d67530f6bb	packfile: introduce function to load and add packfiles We have a recurring pattern where we essentially perform an upsert of a packfile in case it isn't yet known by the packfile store. The logic to do so is non-trivial as we have to reconstruct the packfile's key, check the map of packfiles, then create the new packfile and finally add it to the store. Introduce a new function that does this dance for us. Refactor callsites to use it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-09-24 11:53:50 -07:00
Patrick Steinhardt	f6f236d926	packfile: refactor `install_packed_git()` to work on packfile store The `install_packed_git()` functions adds a packfile to a specific object store. Refactor it to accept a packfile store instead of a repository to clarify its scope. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-09-24 11:53:50 -07:00
Patrick Steinhardt	78237ea53d	packfile: split up responsibilities of `reprepare_packed_git()` In `reprepare_packed_git()` we perform a couple of operations: - We reload alternate object directories. - We clear the loose object cache. - We reprepare packfiles. While the logic is hosted in "packfile.c", it clearly reaches into other subsystems that aren't related to packfiles. Split up the responsibility and introduce `odb_reprepare()` which now becomes responsible for repreparing the whole object database. The existing `reprepare_packed_git()` function is refactored accordingly and only cares about reloading the packfile store now. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-09-24 11:53:50 -07:00
Patrick Steinhardt	c36ecc0685	packfile: refactor `prepare_packed_git()` to work on packfile store The `prepare_packed_git()` function and its friends are responsible for loading packfiles as well as the multi-pack index for a given object database. Refactor these functions to accept a packfile store instead of a repository to clarify their scope. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-09-24 11:53:50 -07:00
Patrick Steinhardt	995ee88027	packfile: reorder functions to avoid function declaration Reorder functions so that we can avoid a forward declaration of `prepare_packed_git()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-09-24 11:53:49 -07:00
Patrick Steinhardt	bd1a521de8	odb: move kept cache into `struct packfile_store` The object database tracks a cache of "kept" packfiles, which is used by git-pack-objects(1) to handle cruft objects. With the introduction of the `struct packfile_store` we have a better place to host this cache though. Move the cache accordingly. This moves the last bit of packfile-related state from the object database into the packfile store. Adapt the comment for the `packfiles` pointer in `struct object_database` to reflect this. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-09-24 11:53:49 -07:00
Patrick Steinhardt	fe835b0ca0	odb: move MRU list of packfiles into `struct packfile_store` The object database tracks the list of packfiles in most-recently-used order, which is mostly used to favor reading from packfiles that contain most of the objects that we're currently accessing. With the introduction of the `struct packfile_store` we have a better place to host this list though. Move the list accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-09-24 11:53:49 -07:00
Patrick Steinhardt	14aaf5c9d8	odb: move packfile map into `struct packfile_store` The object database tracks a map of packfiles by their respective paths, which is used to figure out whether a given packfile has already been loaded. With the introduction of the `struct packfile_store` we have a better place to host this list though. Move the map accordingly. `pack_map_entry_cmp()` isn't used anywhere but in "packfile.c" anymore after this change, so we convert it to a static function, as well. Note that we also drop the `inline` hint: the function is used as a callback function exclusively, and callbacks cannot be inlined. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-09-24 11:53:49 -07:00
Patrick Steinhardt	3421cb56a8	odb: move initialization bit into `struct packfile_store` The object database knows to skip re-initializing the list of packfiles in case it's already been initialized. Whether or not that is the case is tracked via a separate `initialized` bit that is stored in the object database. With the introduction of the `struct packfile_store` we have a better place to host this bit though. Move it accordingly. While at it, convert the field into a boolean now that we're allowed to use them in our code base. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-09-24 11:53:49 -07:00
Patrick Steinhardt	535b7a667a	odb: move list of packfiles into `struct packfile_store` The object database tracks the list of packfiles it currently knows about. With the introduction of the `struct packfile_store` we have a better place to host this list though. Move the list accordingly. Extract the logic from `odb_clear()` that knows to close all such packfiles and move it into the new subsystem, as well. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-09-24 11:53:48 -07:00
Patrick Steinhardt	b7983adb51	packfile: introduce a new `struct packfile_store` Information about an object database's packfiles is currently distributed across two different structures: - `struct packed_git` contains the `next` pointer as well as the `mru_head`, both of which serve to store the list of packfiles. - `struct object_database` contains several fields that relate to the packfiles. So we don't really have a central data structure that tracks our packfiles, and consequently responsibilities aren't always clear cut. A consequence for the upcoming pluggable object databases is that this makes it very hard to move management of packfiles from the object database level down into the object database source. Introduce a new `struct packfile_store` which is about to become the single source of truth for managing packfiles. Right now this data structure doesn't yet contain anything, but in subsequent patches we will move all data structures that relate to packfiles and that are currently contained in `struct object_database` into this new home. Note that this is only a first step: most importantly, we won't (yet) move the `struct packed_git::next` pointer around. This will happen in a subsequent patch series though so that `struct packed_git` will really only host information about the specific packfile it represents. Further note that the new structure still sits at the wrong level at the end of this patch series: as mentioned, it should eventually sit at the level of the object database source, not at the object database level. But introducing the packfile store now already makes it way easier to eventually push down the now-selfcontained data structure by one level. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-09-24 11:53:48 -07:00
Patrick Steinhardt	9ff2129615	midx: drop redundant `struct repository` parameter There are a couple of functions that take both a `struct repository` and a `struct multi_pack_index`. This provides redundant information though without much benefit given that the multi-pack index already has a pointer to its owning repository. Drop the `struct repository` parameter from such functions. While at it, reorder the list of parameters of `fill_midx_entry()` so that the MIDX comes first to better align with our coding guidelines. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-08-11 09:22:22 -07:00
Patrick Steinhardt	595bef7180	odb: store locality in object database sources Object database sources are classified either as: - Local, which means that the source is the repository's primary source. This is typically ".git/objects". - Non-local, which is everything else. Most importantly this includes alternates and quarantine directories. This locality is often computed ad-hoc by checking whether a given object source is the first one. This works, but it is quite roundabout. Refactor the code so that we store locality when creating the sources in the first place. This makes it both more accessible and robust. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-08-11 09:22:21 -07:00
Patrick Steinhardt	ec865d94d4	midx: remove now-unused linked list of multi-pack indices In the preceding commits we have migrated all users of the linked list of multi-pack indices to instead use those stored in the object database sources. Remove those now-unused pointers. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-07-15 12:07:30 -07:00
Patrick Steinhardt	c620586fcc	packfile: stop using linked MIDX list in `get_all_packs()` Refactor `get_all_packs()` so that we stop using the linked list of multi-pack indices. Note that there is no need to explicitly prepare alternates, and neither do we have to use `get_multi_pack_index()`, because `prepare_packed_git()` already takes care of populating all data structures for us. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-07-15 12:07:30 -07:00
Patrick Steinhardt	7fc1998392	packfile: stop using linked MIDX list in `find_pack_entry()` Refactor `find_pack_entry()` so that we stop using the linked list of multi-pack indices. Note that there is no need to explicitly prepare alternates, and neither do we have to use `get_multi_pack_index()`, because `prepare_packed_git()` already takes care of populating all data structures for us. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-07-15 12:07:29 -07:00
Patrick Steinhardt	736bb725eb	packfile: refactor `get_multi_pack_index()` to work on sources The function `get_multi_pack_index()` loads multi-pack indices via `prepare_packed_git()` and then returns the linked list of multi-pack indices that is stored in `struct object_database`. That list is in the process of being removed though in favor of storing the MIDX as part of the object database source it belongs to. Refactor `get_multi_pack_index()` so that it returns the multi-pack index for a single object source. Callers are now expected to call this function for each source they are interested in. This requires them to iterate through alternates, so we have to prepare alternate object sources before doing so. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-07-15 12:07:29 -07:00
Patrick Steinhardt	6567432ab4	midx: stop using linked list when closing MIDX When calling `close_midx()` we not only close the multi-pack index for one object source, but instead we iterate through the whole linked list of MIDXs to close all of them. This linked list is about to go away in favor of using the new per-source pointer to its respective MIDX. Refactor the function to iterate through sources instead. Note that after this patch, there's a couple of callsites left that continue to use `close_midx()` without iterating through all sources. These are all cases where we don't care about the MIDX from other sources though, so it's fine to keep them as-is. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-07-15 12:07:29 -07:00
Patrick Steinhardt	ec4380f446	packfile: refactor `prepare_packed_git_one()` to work on sources In the preceding commit we refactored how we load multi-pack indices to take a corresponding "source" as input. As part of this refactoring we started to store a pointer to the MIDX in `struct odb_source` itself. Refactor loading of packfiles in the same way: instead of passing in the object directory, we now pass in the source from which we want to load packfiles. This allows us to simplify the code because we don't have to search for a corresponding MIDX anymore, but we can instead directly use the MIDX that we have already prepared beforehand. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-07-15 12:07:28 -07:00
Patrick Steinhardt	4d8be89d97	midx: start tracking per object database source Multi-pack indices are tracked via `struct multi_pack_index`. This data structure is stored as a linked list inside `struct object_database`, which is the global database that spans across all of the object sources. This layout causes two problems: - Object databases consist of multiple object sources (e.g. one source per alternate object directory), where each multi-pack index is specific to one of those sources. Regardless of that though, the MIDX is not tracked per source, but tracked globally for the whole object database. This creates a mismatch between the on-disk layout and how things are organized in the object database subsystems and makes some parts, like figuring out whether a source has an MIDX, quite awkward. - Multi-pack indices are an implementation detail of how efficient access for packfiles work. As such, they are neither relevant in the context of loose objects, nor in a potential future where we have pluggable backends. Refactor `prepare_multi_pack_index_one()` so that it works on a specific source, which allows us to easily store a pointer to the multi-pack index inside of it. For now, this pointer exists next to the existing linked list we have in the object database. Users will be adjusted in subsequent patches to instead use the per-source pointers. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-07-15 12:07:28 -07:00
Patrick Steinhardt	e989dd96b8	odb: rename `oid_object_info()` Rename `oid_object_info()` to `odb_read_object_info()` as well as their `_extended()` variant to match other functions related to the object database and our modern coding guidelines. Introduce compatibility wrappers so that any in-flight topics will continue to compile. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-07-01 14:46:37 -07:00
Patrick Steinhardt	c44185f6c1	odb: get rid of `the_repository` when handling alternates The functions to manage alternates all depend on `the_repository`. Refactor them to accept an object database as a parameter and adjust all callers. The functions are renamed accordingly. Note that right now the situation is still somewhat weird because we end up using the object store path provided by the object store's repository anyway. Consequently, we could have instead passed in a pointer to the repository instead of passing in the pointer to the object store. This will be addressed in subsequent commits though, where we will start to use the path owned by the object store itself. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-07-01 14:46:36 -07:00
Patrick Steinhardt	8f49151763	object-store: rename files to "odb.{c,h}" In the preceding commits we have renamed the structures contained in "object-store.h" to `struct object_database` and `struct odb_backend`. As such, the code files "object-store.{c,h}" are confusingly named now. Rename them to "odb.{c,h}" accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-07-01 14:46:34 -07:00
Patrick Steinhardt	a1e2581a1e	object-store: rename `object_directory` to `odb_source` The `object_directory` structure is used as an access point for a single object directory like ".git/objects". While the structure isn't yet fully self-contained, the intent is for it to eventually contain all information required to access objects in one specific location. While the name "object directory" is a good fit for now, this will change over time as we continue with the agenda to make pluggable object databases a thing. Eventually, objects may not be accessed via any kind of directory at all anymore, but they could instead be backed by any kind of durable storage mechanism. While it seems quite far-fetched for now, it is thinkable that eventually this might even be some form of a database, for example. As such, the current name of this structure will become worse over time as we evolve into the direction of pluggable ODBs. Immediate next steps will start to carve out proper self-contained object directories, which requires us to pass in these object directories as parameters. Based on our modern naming schema this means that those functions should then be named after their subsystem, which means that we would start to bake the current name into the codebase more and more. Let's preempt this by renaming the structure. There have been a couple alternatives that were discussed: - `odb_backend` was discarded because it led to the association that one object database has a single backend, but the model is that one alternate has one backend. Furthermore, "backend" is more about the actual backing implementation and less about the high-level concept. - `odb_alternate` was discarded because it is a bit of a stretch to also call the main object directory an "alternate". Instead, pick `odb_source` as the new name. It makes it sufficiently clear that there can be multiple sources and does not cause confusion when mixed with the already-existing "alternate" terminology. In the future, this change allows us to easily introduce for example a `odb_files_source` and other format-specific implementations. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-07-01 14:46:34 -07:00
Patrick Steinhardt	1ace066449	object-store: rename `raw_object_store` to `object_database` The `raw_object_store` structure is the central entry point for reading and writing objects in a repository. The main purpose of this structure is to manage object directories and provide an interface to access and write objects in those object directories. Right now, many of the functions associated with the raw object store implicitly rely on `the_repository` to get access to its `objects` pointer, which is the `raw_object_store`. As we want to generally get rid of using `the_repository` across our codebase we will have to convert this implicit dependency on this global variable into an explicit parameter. This conversion can be done by simply passing in an explicit pointer to a repository and then using its `->objects` pointer. But there is a second effort underway, which is to make the object subsystem more selfcontained so that we can eventually have pluggable object backends. As such, passing in a repository wouldn't make a ton of sense, and the goal is to convert the object store interfaces such that we always pass in a reference to the `raw_object_store` instead. This will expose the `raw_object_store` type to a lot more callers though, which surfaces that this type is named somewhat awkwardly. The "raw_" prefix makes readers wonder whether there is a non-raw variant of the object store, but there isn't. Furthermore, we nowadays want to name functions in a way that they can be clearly attributed to a specific subsystem, but calling them e.g. `raw_object_store_has_object()` is just too unwieldy, even when dropping the "raw_" prefix. Instead, rename the structure to `object_database`. This term is already used a lot throughout our codebase, and it cannot easily be mistaken for "object directories", either. Furthermore, its acronym ODB is already well-known and works well as part of a function's name, like for example `odb_has_object()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-07-01 14:46:33 -07:00
Junio C Hamano	9a43523dc3	Merge branch 'ps/midx-negative-packfile-cache' When a stale .midx file refers to .pack files that no longer exist, we ended up checking for these non-existent files repeatedly, which has been optimized by memoizing the non-existence. * ps/midx-negative-packfile-cache: midx: stop repeatedly looking up nonexistent packfiles packfile: explain ordering of how we look up auxiliary pack files	2025-05-30 11:59:18 -07:00
Patrick Steinhardt	320572c43d	packfile: explain ordering of how we look up auxiliary pack files When adding a packfile to an object database we perform four syscalls: - Three calls to access(3p) are done to check for auxiliary data structures. - One call to stat(3p) is done to check for the ".pack" itself. One curious bit is that we perform the access(3p) calls before checking for the packfile itself, but if the packfile doesn't exist we discard all results. The access(3p) calls are thus essentially wasted, so one may be triggered to reorder those calls so that we can short-circuit the other syscalls in case the packfile does not exist. The order in which we look up files is quite important though to help avoid races: - When installing a packfile we move auxiliary data structures into place before we install the ".idx" file. - When deleting a packfile we first delete the ".idx" and ".pack" files before deleting auxiliary data structures. As such, to avoid any races with concurrently created or deleted packs we need to make sure that we _first_ read auxiliary data structures before we read the corresponding ".idx" or ".pack" file. Otherwise it may easily happen that we return a populated but misclassified pack. Add a comment to `add_packed_git()` to make future readers aware of this ordering requirement. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-28 07:56:29 -07:00
Jeff King	d2956385a9	oid_object_info(): drop type_name strbuf We provide a mechanism for callers to get the object type as a raw string, rather than an object_type enum. This was in theory useful for returning types that are not representable in the enum, but we consider any such type to be an error, and there are no callers that use the strbuf anymore. Let's drop support to simplify the code a bit. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 09:43:10 -07:00

1 2 3 4 5 ...

356 Commits (832492c3a85303b04e7197138ff3dcb535e09f00)