kernel/git - git - PowerEL Git System

Commit Graph

Author	SHA1	Message	Date
Jeff King	f86f769550	compute pack .idx byte offsets using size_t A pack and its matching .idx file are limited to 2^32 objects, because the pack format contains a 32-bit field to store the number of objects. Hence we use uint32_t in the code. But the byte count of even a .idx file can be much larger than that, because it stores at least a hash and an offset for each object. So using SHA-1, a v2 .idx file will cross the 4GB boundary at 153,391,650 objects. This confuses load_idx(), which computes the minimum size like this: unsigned long min_size = 8 + 4256 + nr(hashsz + 4 + 4) + hashsz + hashsz; Even though min_size will be big enough on most 64-bit platforms, the actual arithmetic is done as a uint32_t, resulting in a truncation. We actually exceed that min_size, but then we do: unsigned long max_size = min_size; if (nr) max_size += (nr - 1)8; to account for the variable-sized table. That computation doesn't overflow quite so low, but with the truncation for min_size, we end up with a max_size that is much smaller than our actual size. So we complain that the idx is invalid, and can't find any of its objects. We can fix this case by casting "nr" to a size_t, which will do the multiplication in 64-bits (assuming you're on a 64-bit platform; this will never work on a 32-bit system since we couldn't map the whole .idx anyway). Likewise, we don't have to worry about further additions, because adding a smaller number to a size_t will convert the other side to a size_t. A few notes: - obviously we could just declare "nr" as a size_t in the first place (and likewise, packed_git.num_objects). But it's conceptually a uint32_t because of the on-disk format, and we correctly treat it that way in other contexts that don't need to compute byte offsets (e.g., iterating over the set of objects should and generally does use a uint32_t). Switching to size_t would make all of those other cases look wrong. - it could be argued that the proper type is off_t to represent the file offset. But in practice the .idx file must fit within memory, because we mmap the whole thing. And the rest of the code (including the idx_size variable we're comparing against) uses size_t. - we'll add the same cast to the max_size arithmetic line. Even though we're adding to a larger type, which will convert our result, the multiplication is still done as a 32-bit value and can itself overflow. I didn't check this with my test case, since it would need an even larger pack (~530M objects), but looking at compiler output shows that it works this way. The standard should agree, but I couldn't find anything explicit in 6.3.1.8 ("usual arithmetic conversions"). The case in load_idx() was the most immediate one that I was able to trigger. After fixing it, looking up actual objects (including the very last one in sha1 order) works in a test repo with 153,725,110 objects. That's because bsearch_hash() works with uint32_t entry indices, and the actual byte access: int cmp = hashcmp(table + mi stride, sha1); is done with "stride" as a size_t, causing the uint32_t "mi" to be promoted to a size_t. This is the way most code will access the index data. However, I audited all of the other byte-wise accesses of packed_git.index_data, and many of the others are suspect (they are similar to the max_size one, where we are adding to a properly sized offset or directly to a pointer, but the multiplication in the sub-expression can overflow). I didn't trigger any of these in practice, but I believe they're potential problems, and certainly adding in the cast is not going to hurt anything here. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	4 years ago
Jonathan Tan	ec6a8f9705	index-pack: make get_base_data() comment clearer A comment mentions that we may free cached delta bases via find_unresolved_deltas(), but that function went away in `f08cbf60fe` (index-pack: make quantum of work smaller, 2020-09-08). Since we need to rewrite that comment anyway, make the entire comment clearer. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	4 years ago
Jeff King	bebe171947	index-pack: drop type_cas mutex The type_cas lock lost all of its callers in `f08cbf60fe` (index-pack: make quantum of work smaller, 2020-09-08), so we can safely delete it. The compiler didn't alert us that the variable became unused, because we still call pthread_mutex_init() and pthread_mutex_destroy() on it. It's worth considering also whether that commit was in error to remove the use of the lock. Why don't we need it now, if we did before, as described in `ab791dd138` (index-pack: fix race condition with duplicate bases, 2014-08-29)? I think the answer is that we now look at and assign the child_obj->real_type field in the main thread while holding the work_lock(). So we don't have to worry about racing with the worker threads. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	4 years ago
Jeff King	cea69151a4	index-pack: restore "resolving deltas" progress meter Commit `f08cbf60fe` (index-pack: make quantum of work smaller, 2020-09-08) refactored the main loop in threaded_second_pass(), but also deleted the call to display_progress() at the top of the loop. This means that users typically see no progress at all during the delta resolution phase (and for large repositories, Git appears to hang). This looks like an accident that was unrelated to the intended change of that commit, since we continue to update nr_resolved_deltas in resolve_delta(). Let's restore the call to get that progress back. We'll also add a test that confirms we generate the expected progress. This isn't perfect, as it wouldn't catch a bug where progress was delayed to the end. That was probably possible to trigger when receiving a thin pack, because we'd eventually call display_progress() from fix_unresolved_deltas(), but only once after doing all the work. However, since our test case generates a complete pack, it reliably demonstrates this particular bug and its fix. And we can't do better without making the test racy. Signed-off-by: Jeff King <peff@peff.net> Acked-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	4 years ago
Jonathan Tan	f08cbf60fe	index-pack: make quantum of work smaller Currently, when index-pack resolves deltas, it does not split up delta trees into threads: each delta base root (an object that is not a REF_DELTA or OFS_DELTA) can go into its own thread, but all deltas on that root (direct or indirect) are processed in the same thread. This is a problem when a repository contains a large text file (thus, delta-able) that is modified many times - delta resolution time during fetching is dominated by processing the deltas corresponding to that text file. This patch contains a solution to that. When cloning using git -c core.deltabasecachelimit=1g clone \ https://fuchsia.googlesource.com/third_party/vulkan-cts on my laptop, clone time improved from 3m2s to 2m5s (using 3 threads, which is the default). The solution is to have a global work stack. This stack contains delta bases (objects, whether appearing directly in the packfile or generated by delta resolution, that themselves have delta children) that need to be processed; whenever a thread needs work, it peeks at the top of the stack and processes its next unprocessed child. If a thread finds the stack empty, it will look for more delta base roots to push on the stack instead. The main weakness of having a global work stack is that more time is spent in the mutex, but profiling has shown that most time is spent in the resolution of the deltas themselves, so this shouldn't be an issue in practice. In any case, experimentation (as described in the clone command above) shows that this patch is a net improvement. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	4 years ago
Jonathan Tan	ee6f058384	index-pack: make resolve_delta() assume base data A subsequent commit will make the quantum of work smaller, necessitating more locking. This commit allows resolve_delta() to be called outside the lock. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	4 years ago
Jonathan Tan	b4718cae51	index-pack: calculate {ref,ofs}_{first,last} early This is refactoring 2 of 2 to simplify struct base_data. Whenever we make a struct base_data, immediately calculate its delta children. This eliminates confusion as to when the {ref,ofs}_{first,last} fields are initialized. Before this patch, the delta children were calculated at the last possible moment. This allowed the members of struct base_data to be populated in any order, superficially useful when we have the object contents before the struct object_entry. But this makes reasoning about the state of struct base_data more complicated, hence this patch. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	4 years ago
Jonathan Tan	a7f7e84a49	index-pack: remove redundant child field This is refactoring 1 of 2 to simplify struct base_data. In index-pack, each thread maintains a doubly-linked list of the delta chain that it is currently processing (the "base" and "child" pointers in struct base_data). When a thread exceeds the delta base cache limit and needs to reclaim memory, it uses the "child" pointers to traverse the lineage, reclaiming the memory of the eldest delta bases first. A subsequent patch will perform memory reclaiming in a different way and will thus no longer need the "child" pointer. Because the "child" pointer is redundant even now, remove it so that the aforementioned subsequent patch will be clearer. In the meantime, reclaim memory in the reverse order of the "base" pointers. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	4 years ago
Jonathan Tan	46e6fb1e44	index-pack: unify threaded and unthreaded code Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	4 years ago
Jonathan Tan	fc968e26c2	index-pack: remove redundant parameter find_{ref,ofs}_delta_{,children} take an enum object_type parameter, but the object type is already present in the name of the function. Remove that parameter from these functions. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	4 years ago
Jeff King	fbff95b67f	index-pack: adjust default threading cap Commit `b8a2486f15` (index-pack: support multithreaded delta resolving, 2012-05-06) describes an experiment that shows that setting the number of threads for index-pack higher than 3 does not help. I repeated that experiment using a more modern version of Git and a more modern CPU and got different results. Here are timings for p5302 against linux.git run on my laptop, a Core i9-9880H with 8 cores plus hyperthreading (so online-cpus returns 16): 5302.3: index-pack 0 threads 256.28(253.41+2.79) 5302.4: index-pack 1 threads 257.03(254.03+2.91) 5302.5: index-pack 2 threads 149.39(268.34+3.06) 5302.6: index-pack 4 threads 94.96(294.10+3.23) 5302.7: index-pack 8 threads 68.12(339.26+3.89) 5302.8: index-pack 16 threads 70.90(655.03+7.21) 5302.9: index-pack default number of threads 116.91(290.05+3.21) You can see that wall-clock times continue to improve dramatically up to the number of cores, but bumping beyond that (into hyperthreading territory) does not help (and in fact hurts a little). Here's the same experiment on a machine with dual Xeon 6230's, totaling 40 cores (80 with hyperthreading): 5302.3: index-pack 0 threads 310.04(302.73+6.90) 5302.4: index-pack 1 threads 310.55(302.68+7.40) 5302.5: index-pack 2 threads 178.17(304.89+8.20) 5302.6: index-pack 5 threads 99.53(315.54+9.56) 5302.7: index-pack 10 threads 72.80(327.37+12.79) 5302.8: index-pack 20 threads 60.68(357.74+21.66) 5302.9: index-pack 40 threads 58.07(454.44+67.96) 5302.10: index-pack 80 threads 59.81(720.45+334.52) 5302.11: index-pack default number of threads 134.18(309.32+7.98) The results are similar; things stop improving at 40 threads. Curiously, going from 20 to 40 really doesn't help much, either (and increases CPU time considerably). So that may represent an actual barrier to parallelism, where we lose out due to context-switching and loss of cache locality, but don't reap the wall-clock benefits due to contention of our coarse-grained locks. So what's a good default value? It's clear that the current cap of 3 is too low; our default values are 42% and 57% slower than the best times on each machine. The results on the 40-core machine imply that 20 threads is an actual barrier regardless of the number of cores, so we'll take that as a maximum. We get the best results on these machines at half of the online-cpus value. That's presumably a result of the hyperthreading. That's common on multi-core Intel processors, but not necessarily elsewhere. But if we take it as an assumption, we can perform optimally on hyperthreaded machines and still do much better than the status quo on other machines, as long as we never half below the current value of 3. So that's what this patch does. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	4 years ago
brian m. carlson	586740aa6e	builtin/index-pack: add option to specify hash algorithm git index-pack is usually run in a repository, but need not be. Since packs don't contains information on the algorithm in use, instead relying on context, add an option to index-pack to tell it which one we're using in case someone runs it outside of a repository. Since using --stdin necessarily implies a repository, don't allow specifying an object format if it's provided to prevent users from passing an option that won't work. Add documentation for this option. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	4 years ago
brian m. carlson	629dffc461	packfile: compute and use the index CRC offset Both v2 pack index files and the v3 format specified as part of the NewHash work have similar data starting at the CRC table. Much of the existing code wants to read either this table or the offset entries following it, and in doing so computes the offset each time. In order to share as much code between v2 and v3, compute the offset of the CRC table and store it when the pack is opened. Use this value to compute offsets to not only the CRC table, but to the offset entries beyond it. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	4 years ago
Jonathan Tan	db7ed7418b	promisor-remote: accept 0 as oid_nr in function There are 3 callers to promisor_remote_get_direct() that first check if the number of objects to be fetched is equal to 0. Fold that check into promisor_remote_get_direct(), and in doing so, be explicit as to what promisor_remote_get_direct() does if oid_nr is 0 (it returns 0, success, immediately). Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	5 years ago
Jeff King	a21781011f	index-pack: downgrade twice-resolved REF_DELTA to die() When we're resolving a REF_DELTA, we compare-and-swap its type from REF_DELTA to whatever real type the base object has, as discussed in `ab791dd138` (index-pack: fix race condition with duplicate bases, 2014-08-29). If the old type wasn't a REF_DELTA, we consider that a BUG(). But as discussed in that commit, we might see this case whenever we try to resolve an object twice, which may happen because we have multiple copies of the base object. So this isn't a bug at all, but rather a sign that the input pack is broken. And indeed, this case is triggered already in t5309.5 and t5309.6, which create packs with delta cycles and duplicate bases. But we never noticed because those tests are marked expect_failure. Those tests were added by `b2ef3d9ebb` (test index-pack on packs with recoverable delta cycles, 2013-08-23), which was leaving the door open for cases that we theoretically _could_ handle. And when we see an already-resolved object like this, in theory we could keep going after confirming that the previously resolved child->real_type matches base->obj->real_type. But: - enforcing the "only resolve once" rule here saves us from an infinite loop in other parts of the code. If we keep going, then the delta cycle in t5309.5 causes us to loop infinitely, as find_ref_delta_children() doesn't realize which objects have already been resolved. So there would be more changes needed to make this case work, and in the meantime we'd be worse off. - any pack that triggers this is broken anyway. It either has a duplicate base object, or it has a cycle which causes us to bring in a duplicate via --fix-thin. In either case, we'd end up rejecting the pack in write_idx_file(), which also detects duplicates. So the tests have little value in documenting what we _could_ be doing (and have been neglected for 6+ years). Let's switch them to confirming that we handle this case cleanly (and switch out the BUG() for a more informative die() so that we do so). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	5 years ago
Matheus Tavares	b98d188581	sha1-file: allow check_object_signature() to handle any repo Some callers of check_object_signature() can work on arbitrary repositories, but the repo does not get passed to this function. Instead, the_repository is always used internally. To fix possible inconsistencies, allow the function to receive a struct repository and make those callers pass on the repo being handled. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	5 years ago
Matheus Tavares	2dcde20e1c	sha1-file: pass git_hash_algo to hash_object_file() Allow hash_object_file() to work on arbitrary repos by introducing a git_hash_algo parameter. Change callers which have a struct repository pointer in their scope to pass on the git_hash_algo from the said repo. For all other callers, pass on the_hash_algo, which was already being used internally at hash_object_file(). This functionality will be used in the following patch to make check_object_signature() be able to work on arbitrary repos (which, in turn, will be used to fix an inconsistency at object.c:parse_object()). Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	5 years ago
Matheus Tavares	c8123e72f6	streaming: allow open_istream() to handle any repo Some callers of open_istream() at archive-tar.c and archive-zip.c are capable of working on arbitrary repositories but the repo struct is not passed down to open_istream(), which uses the_repository internally. For now, that's not a problem since the said callers are only being called with the_repository. But to be consistent and avoid future problems, let's allow open_istream() to receive a struct repository and use that instead of the_repository. This parameter addition will also be used in a future patch to make sha1-file.c:check_object_signature() be able to work on arbitrary repos. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	5 years ago
brian m. carlson	69fa337060	builtin/index-pack: replace sha1_to_hex Since sha1_to_hex is limited to SHA-1, replace it with hash_to_hex so this code works with other algorithms. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	5 years ago
Christian Couder	b14ed5adaf	Use promisor_remote_get_direct() and has_promisor_remote() Instead of using the repository_format_partial_clone global and fetch_objects() directly, let's use has_promisor_remote() and promisor_remote_get_direct(). This way all the configured promisor remotes will be taken into account, not only the one specified by extensions.partialClone. Also when cloning or fetching using a partial clone filter, remote.origin.promisor will be set to "true" instead of setting extensions.partialClone to "origin". This makes it possible to use many promisor remote just by fetching from them. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	5 years ago
Jonathan Tan	8a30a1efd1	index-pack: prefetch missing REF_DELTA bases When fetching, the client sends "have" commit IDs indicating that the server does not need to send any object referenced by those commits, reducing network I/O. When the client is a partial clone, the client still sends "have"s in this way, even if it does not have every object referenced by a commit it sent as "have". If a server omits such an object, it is fine: the client could lazily fetch that object before this fetch, and it can still do so after. The issue is when the server sends a thin pack containing an object that is a REF_DELTA against such a missing object: index-pack fails to fix the thin pack. When support for lazily fetching missing objects was added in `8b4c0103a9` ("sha1_file: support lazily fetching missing objects", 2017-12-08), support in index-pack was turned off in the belief that it accesses the repo only to do hash collision checks. However, this is not true: it also needs to access the repo to resolve REF_DELTA bases. Support for lazy fetching should still generally be turned off in index-pack because it is used as part of the lazy fetching process itself (if not, infinite loops may occur), but we do need to fetch the REF_DELTA bases. (When fetching REF_DELTA bases, it is unlikely that those are REF_DELTA themselves, because we do not send "have" when making such fetches.) To resolve this, prefetch all missing REF_DELTA bases before attempting to resolve them. This both ensures that all bases are attempted to be fetched, and ensures that we make only one request per index-pack invocation, and not one request per missing object. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	6 years ago
SZEDER Gábor	79e3aa6624	index-pack: show progress while checking objects When 'git index-pack' is run by 'git clone', its check_objects() function usually doesn't take long enough to be a concern, but I just run into a situation where it took about a minute or so: I inadvertently put some memory pressure on my tiny laptop while cloning linux.git, and then there was quite a long silence between the "Resolving deltas" and "Checking connectivity" progress bars. Show a progress bar during the loop of check_objects() to let the user know that something is still going on. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	6 years ago
Jeff King	98374a07c9	convert has_sha1_file() callers to has_object_file() The only remaining callers of has_sha1_file() actually have an object_id already. They can use the "object" variant, rather than dereferencing the hash themselves. The code changes here were completely generated by the included coccinelle patch. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	6 years ago
Torsten Bögershausen	ca473cef91	Upcast size_t variables to uintmax_t when printing When printing variables which contain a size, today "unsigned long" is used at many places. In order to be able to change the type from "unsigned long" into size_t some day in the future, we need to have a way to print 64 bit variables on a system that has "unsigned long" defined to be 32 bit, like Win64. Upcast all those variables into uintmax_t before they are printed. This is to prepare for a bigger change, when "unsigned long" will be converted into size_t for variables which may be > 4Gib. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	6 years ago
Nguyễn Thái Ngọc Duy	2094c5e582	index-pack: remove #ifdef NO_PTHREADS Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	6 years ago
Jeff King	67947c34ae	convert "hashcmp() != 0" to "!hasheq()" This rounds out the previous three patches, covering the inequality logic for the "hash" variant of the functions. As with the previous three, the accompanying code changes are the mechanical result of applying the coccinelle patch; see those patches for more discussion. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	6 years ago
Jeff King	4a7e27e957	convert "oidcmp() == 0" to oideq() Using the more restrictive oideq() should, in the long run, give the compiler more opportunities to optimize these callsites. For now, this conversion should be a complete noop with respect to the generated code. The result is also perhaps a little more readable, as it avoids the "zero is equal" idiom. Since it's so prevalent in C, I think seasoned programmers tend not to even notice it anymore, but it can sometimes make for awkward double negations (e.g., we can drop a few !!oidcmp() instances here). This patch was generated almost entirely by the included coccinelle patch. This mechanical conversion should be completely safe, because we check explicitly for cases where oidcmp() is compared to 0, which is what oideq() is doing under the hood. Note that we don't have to catch "!oidcmp()" separately; coccinelle's standard isomorphisms make sure the two are treated equivalently. I say "almost" because I did hand-edit the coccinelle output to fix up a few style violations (it mostly keeps the original formatting, but sometimes unwraps long lines). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	6 years ago
Jeff King	6ebd1cafe2	check_replace_refs: rename to read_replace_refs This was added as a NEEDSWORK in `c3c36d7de2` (replace-object: check_replace_refs is safe in multi repo environment, 2018-04-11), waiting for a calmer period. Since doing so now doesn't conflict with anything in 'pu', it seems as good a time as any. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	6 years ago
Stefan Beller	da14a7ff99	blob: add repository argument to lookup_blob Add a repository argument to allow the callers of lookup_blob to be more specific about which repository to act on. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	6 years ago
Stefan Beller	1ec5bfd24e	object: add repository argument to parse_object_buffer Add a repository argument to allow the callers of parse_object_buffer to be more specific about which repository to act on. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	6 years ago
Junio C Hamano	3737746120	index-pack: correct install_packed_git() args The function does not start taking the repository object as a parameter before v2.18 track. Make the topic mergeable to v2.17 maintenance track by dropping it. Signed-off-by: Junio C Hamano <gitster@pobox.com>	6 years ago
Jeff King	368b4e5906	index-pack: handle --strict checks of non-repo packs Commit `73c3f0f704` (index-pack: check .gitmodules files with --strict, 2018-05-04) added a call to add_packed_git(), with the intent that the newly-indexed objects would be available to the process when we run fsck_finish(). But that's not what add_packed_git() does. It only allocates the struct, and you must install_packed_git() on the result. So that call was effectively doing nothing (except leaking a struct). But wait, we passed all of the tests! Does that mean we don't need the call at all? For normal cases, no. When we run "index-pack --stdin" inside a repository, we write the new pack into the object directory. If fsck_finish() needs to access one of the new objects, then our initial lookup will fail to find it, but we'll follow up by running reprepare_packed_git() and looking again. That logic was meant to handle somebody else repacking simultaneously, but it ends up working for us here. But there is a case that does need this, that we were not testing. You can run "git index-pack foo.pack" on any file, even when it is not inside the object directory. Or you may not even be in a repository at all! This case fails without doing the proper install_packed_git() call. We can make this work by adding the install call. Note that we should be prepared to handle add_packed_git() failing. We can just silently ignore this case, though. If fsck_finish() later needs the objects and they're not available, it will complain itself. And if it doesn't (because we were able to resolve the whole fsck in the first pass), then it actually isn't an interesting error at all. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	7 years ago
Jeff King	73c3f0f704	index-pack: check .gitmodules files with --strict Now that the internal fsck code has all of the plumbing we need, we can start checking incoming .gitmodules files. Naively, it seems like we would just need to add a call to fsck_finish() after we've processed all of the objects. And that would be enough to cover the initial test included here. But there are two extra bits: 1. We currently don't bother calling fsck_object() at all for blobs, since it has traditionally been a noop. We'd actually catch these blobs in fsck_finish() at the end, but it's more efficient to check them when we already have the object loaded in memory. 2. The second pass done by fsck_finish() needs to access the objects, but we're actually indexing the pack in this process. In theory we could give the fsck code a special callback for accessing the in-pack data, but it's actually quite tricky: a. We don't have an internal efficient index mapping oids to packfile offsets. We only generate it on the fly as part of writing out the .idx file. b. We'd still have to reconstruct deltas, which means we'd basically have to replicate all of the reading logic in packfile.c. Instead, let's avoid running fsck_finish() until after we've written out the .idx file, and then just add it to our internal packed_git list. This does mean that the objects are "in the repository" before we finish our fsck checks. But unpack-objects already exhibits this same behavior, and it's an acceptable tradeoff here for the same reason: the quarantine mechanism means that pushes will be fully protected. In addition to a basic push test in t7415, we add a sneaky pack that reverses the usual object order in the pack, requiring that index-pack access the tree and blob during the "finish" step. This already works for unpack-objects (since it will have written out loose objects), but we'll check it with this sneaky pack for good measure. Signed-off-by: Jeff King <peff@peff.net>	7 years ago
Jeff King	db5a58c1bd	index-pack: make fsck error message more specific If fsck reports an error, we say only "Error in object". This isn't quite as bad as it might seem, since the fsck code would have dumped some errors to stderr already. But it might help to give a little more context. The earlier output would not have even mentioned "fsck", and that may be a clue that the "fsck." or ".fsckObjects" config may be relevant. Signed-off-by: Jeff King <peff@peff.net>	7 years ago
Johannes Schindelin	033abf97fc	Replace all die("BUG: ...") calls by BUG() ones In `d8193743e0` (usage.c: add BUG() function, 2017-05-12), a new macro was introduced to use for reporting bugs instead of die(). It was then subsequently used to convert one single caller in `588a538ae5` (setup_git_env: convert die("BUG") to BUG(), 2017-05-12). The cover letter of the patch series containing this patch (cf 20170513032414.mfrwabt4hovujde2@sigill.intra.peff.net) is not terribly clear why only one call site was converted, or what the plan is for other, similar calls to die() to report bugs. Let's just convert all remaining ones in one fell swoop. This trick was performed by this invocation: sed -i 's/die("BUG: /BUG("/g' $(git grep -l 'die("BUG' \*.c) Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	7 years ago
brian m. carlson	5d9e198245	index-pack: abstract away hash function constant The code for reading certain pack v2 offsets had a hard-coded 5 representing the number of uint32_t words that we needed to skip over. Specify this value in terms of a value from the_hash_algo. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	7 years ago
Stefan Beller	0df8e96566	cache.h: add repository argument to oid_object_info Add a repository argument to allow the callers of oid_object_info to be more specific about which repository to handle. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Stefan Beller <sbeller@google.com> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	7 years ago
Stefan Beller	fc1395f4a4	sha1_file.c: rename to use dash in file name This is more consistent with the project style. The majority of Git's source files use dashes in preference to underscores in their file names. Signed-off-by: Stefan Beller <sbeller@google.com>	7 years ago
Stefan Beller	d807c4a01d	exec_cmd: rename to use dash in file name This is more consistent with the project style. The majority of Git's source files use dashes in preference to underscores in their file names. Signed-off-by: Stefan Beller <sbeller@google.com>	7 years ago
Derrick Stolee	f2af9f5e02	csum-file: rename hashclose() to finalize_hashfile() The hashclose() method behaves very differently depending on the flags parameter. In particular, the file descriptor is not always closed. Perform a simple rename of "hashclose()" to "finalize_hashfile()" in preparation for functional changes. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	7 years ago
Stefan Beller	a80d72db2a	object-store: move packed_git and packed_git_mru to object store In a process with multiple repositories open, packfile accessors should be associated to a single repository and not shared globally. Move packed_git and packed_git_mru into the_repository and adjust callers to reflect this. [nd: while at there, wrap access to these two fields in get_packed_git() and get_packed_git_mru(). This allows us to lazily initialize these fields without caller doing that explicitly] Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	7 years ago
Jonathan Tan	ffb2c0fe5c	index-pack: support checking objects but not links The index-pack command currently supports the --check-self-contained-and-connected argument, for internal use only, that instructs it to only check for broken links and not broken objects. For partial clones, we need the inverse, so add a --fsck-objects argument that checks for broken objects and not broken links, also for internal use only. This will be used by fetch-pack in a subsequent patch. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	7 years ago
brian m. carlson	b4f5aca40e	sha1_file: convert read_sha1_file to struct object_id Convert read_sha1_file to take a pointer to struct object_id and rename it read_object_file. Do the same for read_sha1_file_extended. Convert one use in grep.c to use the new function without any other code change, since the pointer being passed is a void pointer that is already initialized with a pointer to struct object_id. Update the declaration and definitions of the modified functions, and apply the following semantic patch to convert the remaining callers: @@ expression E1, E2, E3; @@ - read_sha1_file(E1.hash, E2, E3) + read_object_file(&E1, E2, E3) @@ expression E1, E2, E3; @@ - read_sha1_file(E1->hash, E2, E3) + read_object_file(E1, E2, E3) @@ expression E1, E2, E3, E4; @@ - read_sha1_file_extended(E1.hash, E2, E3, E4) + read_object_file_extended(&E1, E2, E3, E4) @@ expression E1, E2, E3, E4; @@ - read_sha1_file_extended(E1->hash, E2, E3, E4) + read_object_file_extended(E1, E2, E3, E4) Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	7 years ago
brian m. carlson	abef9020e3	sha1_file: convert sha1_object_info* to object_id Convert sha1_object_info and sha1_object_info_extended to take pointers to struct object_id and rename them to use "oid" instead of "sha1" in their names. Update the declaration and definition and apply the following semantic patch, plus the standard object_id transforms: @@ expression E1, E2; @@ - sha1_object_info(E1.hash, E2) + oid_object_info(&E1, E2) @@ expression E1, E2; @@ - sha1_object_info(E1->hash, E2) + oid_object_info(E1, E2) @@ expression E1, E2, E3; @@ - sha1_object_info_extended(E1.hash, E2, E3) + oid_object_info_extended(&E1, E2, E3) @@ expression E1, E2, E3; @@ - sha1_object_info_extended(E1->hash, E2, E3) + oid_object_info_extended(E1, E2, E3) Signed-off-by: Junio C Hamano <gitster@pobox.com>	7 years ago
brian m. carlson	ef7b5195f1	streaming: convert open_istream to use struct object_id Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	7 years ago
brian m. carlson	17e65451e3	sha1_file: convert check_sha1_signature to struct object_id Convert this function to take a pointer to struct object_id and rename it check_object_signature. Introduce temporaries to convert the return values of lookup_replace_object and lookup_replace_object_extended into struct object_id. The temporaries are needed because in order to convert lookup_replace_object, open_istream needs to be converted, and open_istream needs check_sha1_signature to be converted, causing a loop of dependencies. The temporaries will be removed in a future patch. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	7 years ago
brian m. carlson	af8caf33d5	builtin/index-pack: convert struct ref_delta_entry to object_id Convert this struct to use a member of type object_id. Convert various static functions as well. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	7 years ago
Nguyễn Thái Ngọc Duy	95308d64ce	object.h: update flag allocation comment Since the "flags" is shared, it's a good idea to keep track of who uses what bit. When we need to use more flags in library code, we can be sure it won't be re-used for another purpose by some caller. While at there, fix the location of "5" (should be in a different column than "4" two lines down) Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	7 years ago
Brandon Williams	debca9d2fe	object: rename function 'typename' to 'type_name' Rename C++ keyword in order to bring the codebase closer to being able to be compiled with a C++ compiler. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	7 years ago
brian m. carlson	98a3beab6a	csum-file: rename sha1file to hashfile Rename struct sha1file to struct hashfile, along with all of its related functions. The transformation in this commit was made by global search-and-replace. Signed-off-by: Junio C Hamano <gitster@pobox.com>	7 years ago

1 2 3 4 5

225 Commits (71ca53e8125e36efbda17293c50027d31681a41f)