kernel/git - git - PowerEL Git System

Author	SHA1	Message	Date
Jeff King	a52ed76142	fast-import: disallow "feature import-marks" by default As with export-marks in the previous commit, import-marks can access the filesystem. This is significantly less dangerous than export-marks because it only involves reading from arbitrary paths, rather than writing them. However, it could still be surprising and have security implications (e.g., exfiltrating data from a service that accepts fast-import streams). Let's lump it (and its "if-exists" counterpart) in with export-marks, and enable the in-stream version only if --allow-unsafe-features is set. Signed-off-by: Jeff King <peff@peff.net>	5 years ago
Jeff King	68061e3470	fast-import: disallow "feature export-marks" by default The fast-import stream command "feature export-marks=<path>" lets the stream write marks to an arbitrary path. This may be surprising if you are running fast-import against an untrusted input (which otherwise cannot do anything except update Git objects and refs). Let's disallow the use of this feature by default, and provide a command-line option to re-enable it (you can always just use the command-line --export-marks as well, but the in-stream version provides an easy way for exporters to control the process). This is a backwards-incompatible change, since the default is flipping to the new, safer behavior. However, since the main users of the in-stream versions would be import/export-based remote helpers, and since we trust remote helpers already (which are already running arbitrary code), we'll pass the new option by default when reading a remote helper's stream. This should minimize the impact. Note that the implementation isn't totally simple, as we have to work around the fact that fast-import doesn't parse its command-line options until after it has read any "feature" lines from the stream. This is how it lets command-line options override in-stream. But in our case, it's important to parse the new --allow-unsafe-features first. There are three options for resolving this: 1. Do a separate "early" pass over the options. This is easy for us to do because there are no command-line options that allow the "unstuck" form (so there's no chance of us mistaking an argument for an option), though it does introduce a risk of incorrect parsing later (e.g,. if we convert to parse-options). 2. Move the option parsing phase back to the start of the program, but teach the stream-reading code never to override an existing value. This is tricky, because stream "feature" lines override each other (meaning we'd have to start tracking the source for every option). 3. Accept that we might parse a "feature export-marks" line that is forbidden, as long we don't _act_ on it until after we've parsed the command line options. This would, in fact, work with the current code, but only because the previous patch fixed the export-marks parser to avoid touching the filesystem. So while it works, it does carry risk of somebody getting it wrong in the future in a rather subtle and unsafe way. I've gone with option (1) here as simple, safe, and unlikely to cause regressions. This fixes CVE-2019-1348. Signed-off-by: Jeff King <peff@peff.net>	5 years ago
Jeff King	019683025f	fast-import: delay creating leading directories for export-marks When we parse the --export-marks option, we don't immediately open the file, but we do create any leading directories. This can be especially confusing when a command-line option overrides an in-stream one, in which case we'd create the leading directory for the in-stream file, even though we never actually write the file. Let's instead create the directories just before opening the file, which means we'll create only useful directories. Note that this could change the handling of relative paths if we chdir() in between, but we don't actually do so; the only permanent chdir is from setup_git_directory() which runs before either code path (potentially we should take the pre-setup dir into account to avoid surprising the user, but that's an orthogonal change). The test just adapts the existing "override" test to use paths with leading directories. This checks both that the correct directory is created (which worked before but was not tested), and that the overridden one is not (our new fix here). While we're here, let's also check the error result of safe_create_leading_directories(). We'd presumably notice any failure immediately after when we try to open the file itself, but we can give a more specific error message in this case. Signed-off-by: Jeff King <peff@peff.net>	5 years ago
Jeff King	e075dba372	fast-import: stop creating leading directories for import-marks When asked to import marks from "subdir/file.marks", we create the leading directory "subdir" if it doesn't exist. This makes no sense for importing marks, where we only ever open the path for reading. Most of the time this would be a noop, since if the marks file exists, then the leading directories exist, too. But if it doesn't (e.g., because --import-marks-if-exists was used), then we'd create the useless directory. This dates back to `580d5f83e7` (fast-import: always create marks_file directories, 2010-03-29). Even then it was useless, so it seems to have been added in error alongside the --export-marks case (which _is_ helpful). Signed-off-by: Jeff King <peff@peff.net>	5 years ago
Jeff King	11e934d56e	fast-import: tighten parsing of boolean command line options We parse options like "--max-pack-size=" using skip_prefix(), which makes sense to get at the bytes after the "=". However, we also parse "--quiet" and "--stats" with skip_prefix(), which allows things like "--quiet-nonsense" to behave like "--quiet". This was a mistaken conversion in `0f6927c229` (fast-import: put option parsing code in separate functions, 2009-12-04). Let's tighten this to an exact match, which was the original intent. Signed-off-by: Jeff King <peff@peff.net>	5 years ago
Elijah Newren	b8f50e5b60	fast-import: add support for new 'alias' command fast-export and fast-import have nice --import-marks flags which allow for incremental migrations. However, if there is a mark in fast-export's file of marks without a corresponding mark in the one for fast-import, then we run the risk that fast-export tries to send new objects relative to the mark it knows which fast-import does not, causing fast-import to fail. This arises in practice when there is a filter of some sort running between the fast-export and fast-import processes which prunes some commits programmatically. Provide such a filter with the ability to alias pruned commits to their most recent non-pruned ancestor. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	6 years ago
Elijah Newren	f73b2aba05	fast-import: allow tags to be identified by mark labels Mark identifiers are used in fast-export and fast-import to provide a label to refer to earlier content. Blobs are given labels because they need to be referenced in the commits where they first appear with a given filename, and commits are given labels because they can be the parents of other commits. Tags were never given labels, probably because they were viewed as unnecessary, but that presents two problems: 1. It leaves us without a way of referring to previous tags if we want to create a tag of a tag (or higher nestings). 2. It leaves us with no way of recording that a tag has already been imported when using --export-marks and --import-marks. Fix these problems by allowing an optional mark label for tags. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	6 years ago
Elijah Newren	3164e6bd24	fast-import: fix handling of deleted tags If our input stream includes a tag which is later deleted, we were not properly deleting it. We did have a step which would delete it, but we left a tag in the tag list noting that it needed to be updated, and the updating of annotated tags occurred AFTER ref deletion. So, when we record that a tag needs to be deleted, also remove it from the list of annotated tags to update. While this has likely been something that has not happened in practice, it will come up more in order to support nested tags. For nested tags, we either need to give temporary names to the intermediate tags and then delete them, or else we need to use the final name for the intermediate tags. If we use the final name for the intermediate tags, then in order to keep the sanity check that someone doesn't try to update the same tag twice, we need to delete the ref after creating the intermediate tag. So, either way nested tags imply the need to delete temporary inner tag references. Helped-by: René Scharfe <l.s.r@web.de> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	6 years ago
Jeff King	1ebec8dfc1	fast-import: duplicate into history rather than passing ownership Fast-import's read_next_command() has somewhat odd memory ownership semantics for the command_buf strbuf. After reading a command, we copy the strbuf's pointer (without duplicating the string) into our cmd_hist array of recent commands. And then when we're about to read a new command, we clear the strbuf by calling strbuf_detach(), dropping ownership from the strbuf (leaving the cmd_hist reference as the remaining owner). This has a few surprising implications: - if the strbuf hasn't been copied into cmd_hist (e.g., because we haven't ready any commands yet), then the strbuf_detach() will leak the resulting string - any modification to command_buf risks invalidating the pointer held by cmd_hist. There doesn't seem to be any way to trigger this currently (since we tend to modify it only by detaching and reading in a new value), but it's subtly dangerous. - any pointers into an input string will remain valid as long as cmd_hist points to them. So in general, you can point into command_buf.buf and call read_next_command() up to 100 times before your string is cycled out and freed, leaving you with a dangling pointer. This makes it easy to miss bugs during testing, as they might trigger only for a sufficiently large commit (e.g., the bug fixed in the previous commit). Instead, let's make a new string to copy the command into the history array, rather than having dual ownership with the old. Then we can drop the strbuf_detach() calls entirely, and just reuse the same buffer within command_buf over and over. We'd normally have to strbuf_reset() it before using it again, but in both cases here we're using strbuf_getline(), which does it automatically for us. This fixes the leak, and it means that even a single call to read_next_command() will invalidate any held pointers, making it easier to find bugs. In fact, we can drop the extra input lines added to the test case by the previous commit, as the unfixed bug would now trigger just from reading the commit message, even without any modified files in the commit. Reported-by: Mike Hommey <mh@glandium.org> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	6 years ago
Jeff King	9756082b3c	fast-import: duplicate parsed encoding string We read each line of the fast-import stream into the command_buf strbuf. When reading a commit, we parse a line like "encoding foo" by storing a pointer to "foo", but not making a copy. We may then read an unbounded number of other lines (e.g., one for each modified file in the commit), each of which writes into command_buf. This works out in practice for small cases, because we hand off ownership of the heap buffer from command_buf to the cmd_hist array, and read new commands into a fresh heap buffer. And thus the pointer to "foo" remains valid as long as there aren't so many intermediate lines that we end up dropping the original "encoding" line from the history. But as the test modification shows, if we go over our default of 100 lines, we end up with our encoding string pointing into freed heap memory. This seems to fail reliably by writing garbage into the output, but running under ASan definitely detects this as a use-after-free. We can fix it by duplicating the encoding value, just as we do for other parsed lines (e.g., an author line ends up in parse_ident, which copies it to a new string). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	6 years ago
Nguyễn Thái Ngọc Duy	d3b4705ab8	sha1-file.c: remove the_repo from read_object_with_reference() Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	6 years ago
René Scharfe	921d49be86	use COPY_ARRAY for copying arrays Convert calls of memcpy(3) to use COPY_ARRAY, which shortens and simplifies the code a bit. Patch generated by Coccinelle and contrib/coccinelle/array.cocci. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	6 years ago
Elijah Newren	3edfcc65fd	fast-import: support 'encoding' commit header Since git supports commit messages with an encoding other than UTF-8, allow fast-import to import such commits. This may be useful for folks who do not want to reencode commit messages from an external system, and may also be useful to achieve reversible history rewrites (e.g. sha1sum <-> sha256sum transitions or subtree work) with git repositories that have used specialized encodings in their commit history. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	6 years ago
Elijah Newren	cf7b857a77	fast-import: fix erroneous handling of get-mark with empty orphan commits When get-mark was introduced in commit `28c7b1f7b7` ("fast-import: add a get-mark command", 2015-07-01), it followed the precedent of the cat-blob command to be allowed on any line other than in the middle of a data directive; see commit `777f80d742` ("fast-import: Allow cat-blob requests at arbitrary points in stream", 2010-11-28). It was useful to allow cat-blob directives in the middle of a commit to get more data that would be used in writing the current commit object. get-mark is not similarly useful since fast-import can already use either object id or mark. Further, trying to allow this command anywhere caused parsing bugs. Fix the parsing problems by only allowing get-mark commands to appear when other commands have completed. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	6 years ago
Elijah Newren	7ffde293f2	fast-import: only allow cat-blob requests where it makes sense In commit `777f80d742` ("fast-import: Allow cat-blob requests at arbitrary points in stream", 2010-11-28), fast-import started allowing cat-blob commands to appear on the start of any line except in the middle of a "data" command. It could be in the middle of various directives that were part of a tag command, or in the middle of checkpoints or progresses (each of which allow an optional second empty newline), or even immediately after the mark command of a blob before the data directive appeared (raising the question of what if it used the mark for the blob that just barely appeared in the stream that we do not yet have the data for). None of these locations make any sense as places to put cat-blob requests. The purpose of this change as stated in that commit message was to [save] frontends from having to loop over everything they want to commit in the next commit and cat-ing the necessary objects in advance. However, that can be achieved by simply allowing cat-blob requests to appear whenever a filemodify directive is allowed. Further, it avoids setting a bad precedent for other commands to follow (e.g. get-mark); a precedent which caused parsing problems in corner cases. Technically, inline filemodify directives add a slight wrinkle in that frontends might want to have cat-blob directives appear after the start of the filemodify and before the data directive contained within it. I think it would have been better to disallow such a case (it would be trivial to use cat-blob before the filemodify instead), but since there is evidence this was used, for backwards compatibility let's support that case too. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	6 years ago
Elijah Newren	5056bb7646	fast-import: check most prominent commands first This is not a very important change, and one that I expect to have no performance impact whatsoever, but reading the code bothered me. The parsing of command types in cmd_main() mostly runs in order of most common to least common commands; sure, it's hard to say for sure what the most common are without some type of study, but it seems fairly clear to mark the original four ("blob", "commit", "tag", "reset") as the most prominent. Indeed, the parsing for most other commands were added to later in the list. However, when "ls" was added, it was stuck near the top of the list, with no rationale for that particular location. Move it down to later to appease my Tourette's-like internal twitching that its former location was causing. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	6 years ago
brian m. carlson	ef479a12bd	fast-import: replace sha1_to_hex Replace the uses of sha1_to_hex in this function with hash_to_hex to allow the use of SHA-256 as well. Rename a variable since it is no longer limited to SHA-1. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	6 years ago
brian m. carlson	28d055bde9	fast-import: make hash-size independent Replace several uses of GIT_SHA1_HEXSZ and 40-based constants with references to the_hash_algo. Update the note handling code here to compute path sizes based on GIT_MAX_RAWSZ as well. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	6 years ago
brian m. carlson	538b152324	object-store: rename and expand packed_git's sha1 member This member is used to represent the pack checksum of the pack in question. Expand this member to be GIT_MAX_RAWSZ bytes in length so it works with longer hashes and rename it to be "hash" instead of "sha1". This transformation was made with a change to the definition and the following semantic patch: @@ struct packed_git *E1; @@ - E1->sha1 + E1->hash @@ struct packed_git E1; @@ - E1.sha1 + E1.hash Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	6 years ago
Elijah Newren	a965bb3116	fast-export: add a --show-original-ids option to show original names Knowing the original names (hashes) of commits can sometimes enable post-filtering that would otherwise be difficult or impossible. In particular, the desire to rewrite commit messages which refer to other prior commits (on top of whatever other filtering is being done) is very difficult without knowing the original names of each commit. In addition, knowing the original names (hashes) of blobs can allow filtering by blob-id without requiring re-hashing the content of the blob, and is thus useful as a small optimization. Once we add original ids for both commits and blobs, we may as well add them for tags too for completeness. Perhaps someone will have a use for them. This commit teaches a new --show-original-ids option to fast-export which will make it add a 'original-oid <hash>' line to blob, commits, and tags. It also teaches fast-import to parse (and ignore) such lines. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	6 years ago
Elijah Newren	25dd3e4889	fast-import: remove unmaintained duplicate documentation fast-import.c has started with a comment for nine and a half years re-directing the reader to Documentation/git-fast-import.txt for maintained documentation. Instead of leaving the unmaintained documentation in place, just excise it. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	6 years ago
Torsten Bögershausen	ca473cef91	Upcast size_t variables to uintmax_t when printing When printing variables which contain a size, today "unsigned long" is used at many places. In order to be able to change the type from "unsigned long" into size_t some day in the future, we need to have a way to print 64 bit variables on a system that has "unsigned long" defined to be 32 bit, like Win64. Upcast all those variables into uintmax_t before they are printed. This is to prepare for a bigger change, when "unsigned long" will be converted into size_t for variables which may be > 4Gib. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	6 years ago
Jeff King	9001dc2a74	convert "oidcmp() != 0" to "!oideq()" This is the flip side of the previous two patches: checking for a non-zero oidcmp() can be more strictly expressed as inequality. Like those patches, we write "!= 0" in the coccinelle transformation, which covers by isomorphism the more common: if (oidcmp(E1, E2)) As with the previous two patches, this patch can be achieved almost entirely by running "make coccicheck"; the only differences are manual line-wrap fixes to match the original code. There is one thing to note for anybody replicating this, though: coccinelle 1.0.4 seems to miss the case in builtin/tag.c, even though it's basically the same as all the others. Running with 1.0.7 does catch this, so presumably it's just a coccinelle bug that was fixed in the interim. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	7 years ago
Jeff King	4a7e27e957	convert "oidcmp() == 0" to oideq() Using the more restrictive oideq() should, in the long run, give the compiler more opportunities to optimize these callsites. For now, this conversion should be a complete noop with respect to the generated code. The result is also perhaps a little more readable, as it avoids the "zero is equal" idiom. Since it's so prevalent in C, I think seasoned programmers tend not to even notice it anymore, but it can sometimes make for awkward double negations (e.g., we can drop a few !!oidcmp() instances here). This patch was generated almost entirely by the included coccinelle patch. This mechanical conversion should be completely safe, because we check explicitly for cases where oidcmp() is compared to 0, which is what oideq() is doing under the hood. Note that we don't have to catch "!oidcmp()" separately; coccinelle's standard isomorphisms make sure the two are treated equivalently. I say "almost" because I did hand-edit the coccinelle output to fix up a few style violations (it mostly keeps the original formatting, but sometimes unwraps long lines). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	7 years ago
Derrick Stolee	454ea2e4d7	treewide: use get_all_packs There are many places in the codebase that want to iterate over all packfiles known to Git. The purposes are wide-ranging, and those that can take advantage of the multi-pack-index already do. So, use get_all_packs() instead of get_packed_git() to be sure we are iterating over all packfiles. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	7 years ago
Derrick Stolee	6404355657	commit.h: remove method declarations These methods are now declared in commit-reach.h. Remove them from commit.h and add new include statements in all files that require these declarations. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	7 years ago
Mike Hommey	9d14ecf39d	fast-import: do not call diff_delta() with empty buffer We know diff_delta() returns NULL, saying "no good delta exists for it", when fed an empty data. Check the length of the data in the caller to avoid such a call. This incidentally reduces the number of attempted deltification we see in the final statistics. Signed-off-by: Mike Hommey <mh@glandium.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	7 years ago
Stefan Beller	21e1ee8f4f	commit: add repository argument to lookup_commit_reference_gently Add a repository argument to allow callers of lookup_commit_reference_gently to be more specific about which repository to handle. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	7 years ago
Martin Ågren	b227586831	lock_file: make function-local locks non-static Placing `struct lock_file`s on the stack used to be a bad idea, because the temp- and lockfile-machinery would keep a pointer into the struct. But after `076aa2cbd` (tempfile: auto-allocate tempfiles on heap, 2017-09-05), we can safely have lockfiles on the stack. (This applies even if a user returns early, leaving a locked lock behind.) These `struct lock_file`s are local to their respective functions and we can drop their staticness. For good measure, I have inspected these sites and come to believe that they always release the lock, with the possible exception of bailing out using `die()` or `exit()` or by returning from a `cmd_foo()`. As pointed out by Jeff King, it would be bad if someone held on to a `struct lock_file *` for some reason. After some grepping, I agree with his findings: no-one appears to be doing that. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	7 years ago
Stefan Beller	57a6a500be	packfile: add repository argument to unpack_entry Add a repository argument to allow the callers of unpack_entry to be more specific about which repository to act on. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Stefan Beller <sbeller@google.com> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	7 years ago
Stefan Beller	0df8e96566	cache.h: add repository argument to oid_object_info Add a repository argument to allow the callers of oid_object_info to be more specific about which repository to handle. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Stefan Beller <sbeller@google.com> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	7 years ago
Jameson Miller	065feab4eb	mem-pool: move reusable parts of memory pool into its own file This moves the reusable parts of the memory pool logic used by fast-import.c into its own file for use by other components. Signed-off-by: Jameson Miller <jamill@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	7 years ago
Jameson Miller	96c47d1466	fast-import: introduce mem_pool type Introduce the mem_pool type which encapsulates all the information necessary to manage a pool of memory. This change moves the existing variables in fast-import used to support the global memory pool to use this structure. It also renames variables that are no longer used by memory pools to reflect their more scoped usage. These changes allow for the multiple instances of a memory pool to exist and be reused outside of fast-import. In a future commit the mem_pool type will be moved to its own file. Signed-off-by: Jameson Miller <jamill@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	7 years ago
Jameson Miller	7dc0656e2f	fast-import: rename mem_pool type to mp_block This is part of a patch series to extract the memory pool logic in fast-import into a more generalized version. The existing mem_pool type maps more closely to a "block of memory" (mp_block) in the more generalized memory pool. This commit renames the mem_pool to mp_block to reduce churn in future patches. Signed-off-by: Jameson Miller <jamill@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	7 years ago
Derrick Stolee	f2af9f5e02	csum-file: rename hashclose() to finalize_hashfile() The hashclose() method behaves very differently depending on the flags parameter. In particular, the file descriptor is not always closed. Perform a simple rename of "hashclose()" to "finalize_hashfile()" in preparation for functional changes. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	7 years ago
Nguyễn Thái Ngọc Duy	464416a2ea	packfile: keep prepare_packed_git() private The reason callers have to call this is to make sure either packed_git or packed_git_mru pointers are initialized since we don't do that by default. Sometimes it's hard to see this connection between where the function is called and where packed_git pointer is used (sometimes in separate functions). Keep this dependency internal because now all access to packed_git and packed_git_mru must go through get_xxx() wrappers. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	7 years ago
Stefan Beller	6fdb4e9f5a	packfile: add repository argument to prepare_packed_git Add a repository argument to allow prepare_packed_git callers to be more specific about which repository to handle. See commit "sha1_file: add repository argument to link_alt_odb_entry" for an explanation of the #define trick. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	7 years ago
Stefan Beller	5babff16d9	packfile: allow install_packed_git to handle arbitrary repositories This conversion was done without the #define trick used in the earlier series refactoring to have better repository access, because this function is easy to review, as it only has one caller and all lines but the first two are converted. We must not convert 'pack_open_fds' to be a repository specific variable, as it is used to monitor resource usage of the machine that Git executes on. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	7 years ago
Stefan Beller	a80d72db2a	object-store: move packed_git and packed_git_mru to object store In a process with multiple repositories open, packfile accessors should be associated to a single repository and not shared globally. Move packed_git and packed_git_mru into the_repository and adjust callers to reflect this. [nd: while at there, wrap access to these two fields in get_packed_git() and get_packed_git_mru(). This allows us to lazily initialize these fields without caller doing that explicitly] Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	7 years ago
Ramsay Jones	156e1782a8	-Wuninitialized: remove some 'init-self' workarounds The 'self-initialised' variables construct (ie <type> var = var;) has been used to silence gcc '-W[maybe-]uninitialized' warnings. This has, unfortunately, caused MSVC to issue 'uninitialized variable' warnings. Also, using clang static analysis causes complaints about an 'Assigned value is garbage or undefined'. There are six such constructs in the current codebase. Only one of the six causes gcc to issue a '-Wmaybe-uninitialized' warning (which will be addressed elsewhere). The remaining five 'init-self' gcc workarounds are noted below, along with the commit which introduced them: 1. builtin/rev-list.c: 'reaches' and 'all', see commit `457f08a030` ("git-rev-list: add --bisect-vars option.", 2007-03-21). 2. merge-recursive.c:2064 'mrtree', see commit `f120ae2a8e` ("merge- recursive.c: mrtree in merge() is not used before set", 2007-10-29). 3. fast-import.c:3023 'oe', see commit `85c62395b1` ("fast-import: let importers retrieve blobs", 2010-11-28). 4. fast-import.c:3006 'oe', see commit `28c7b1f7b7` ("fast-import: add a get-mark command", 2015-07-01). Remove the 'self-initialised' variable constructs noted above. Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	7 years ago
brian m. carlson	b4f5aca40e	sha1_file: convert read_sha1_file to struct object_id Convert read_sha1_file to take a pointer to struct object_id and rename it read_object_file. Do the same for read_sha1_file_extended. Convert one use in grep.c to use the new function without any other code change, since the pointer being passed is a void pointer that is already initialized with a pointer to struct object_id. Update the declaration and definitions of the modified functions, and apply the following semantic patch to convert the remaining callers: @@ expression E1, E2, E3; @@ - read_sha1_file(E1.hash, E2, E3) + read_object_file(&E1, E2, E3) @@ expression E1, E2, E3; @@ - read_sha1_file(E1->hash, E2, E3) + read_object_file(E1, E2, E3) @@ expression E1, E2, E3, E4; @@ - read_sha1_file_extended(E1.hash, E2, E3, E4) + read_object_file_extended(&E1, E2, E3, E4) @@ expression E1, E2, E3, E4; @@ - read_sha1_file_extended(E1->hash, E2, E3, E4) + read_object_file_extended(E1, E2, E3, E4) Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	7 years ago
brian m. carlson	02f0547eaa	sha1_file: convert read_object_with_reference to object_id Convert read_object_with_reference to take pointers to struct object_id. Update the internals of the function accordingly. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	7 years ago
brian m. carlson	abef9020e3	sha1_file: convert sha1_object_info* to object_id Convert sha1_object_info and sha1_object_info_extended to take pointers to struct object_id and rename them to use "oid" instead of "sha1" in their names. Update the declaration and definition and apply the following semantic patch, plus the standard object_id transforms: @@ expression E1, E2; @@ - sha1_object_info(E1.hash, E2) + oid_object_info(&E1, E2) @@ expression E1, E2; @@ - sha1_object_info(E1->hash, E2) + oid_object_info(E1, E2) @@ expression E1, E2, E3; @@ - sha1_object_info_extended(E1.hash, E2, E3) + oid_object_info_extended(&E1, E2, E3) @@ expression E1, E2, E3; @@ - sha1_object_info_extended(E1->hash, E2, E3) + oid_object_info_extended(E1, E2, E3) Signed-off-by: Junio C Hamano <gitster@pobox.com>	7 years ago
Brandon Williams	debca9d2fe	object: rename function 'typename' to 'type_name' Rename C++ keyword in order to bring the codebase closer to being able to be compiled with a C++ compiler. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	7 years ago
brian m. carlson	98a3beab6a	csum-file: rename sha1file to hashfile Rename struct sha1file to struct hashfile, along with all of its related functions. The transformation in this commit was made by global search-and-replace. Signed-off-by: Junio C Hamano <gitster@pobox.com>	7 years ago
brian m. carlson	7f89428d37	fast-import: switch various uses of SHA-1 to the_hash_algo Switch various uses of explicit calls to SHA-1 to use the_hash_algo. Convert various uses of 20 and the GIT_SHA1 constants as well. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	7 years ago
brian m. carlson	34c290a6fc	refs: convert read_ref and read_ref_full to object_id All but two of the call sites already have parameters using the hash parameter of struct object_id, so convert them to take a pointer to the struct directly. Also convert refs_read_refs_full, the underlying implementation. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	8 years ago
brian m. carlson	89f3bbdd3b	refs: update ref transactions to use struct object_id Update the ref transaction code to use struct object_id. Remove one NULL pointer check which was previously inserted around a dereference; since we now pass a pointer to struct object_id directly through, the code we're calling handles this for us. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	8 years ago
Eric Rannaud	30e215a65c	fast-import: checkpoint: dump branches/tags/marks even if object_count==0 The checkpoint command cycles packfiles if object_count != 0, a sensible test or there would be no pack files to write. Since `820b931012`, the command also dumps branches, tags and marks, but still conditionally. However, it is possible for a command stream to modify refs or create marks without creating any new objects. For example, reset a branch (and keep fast-import running): $ git fast-import reset refs/heads/master from refs/heads/master^ checkpoint but refs/heads/master remains unchanged. Other example: a commit command that re-creates an object that already exists in the object database. The man page also states that checkpoint "updates the refs" and that "placing a progress command immediately after a checkpoint will inform the reader when the checkpoint has been completed and it can safely access the refs that fast-import updated". This wasn't always true without this patch. This fix unconditionally calls dump_{branches,tags,marks}() for all checkpoint commands. dump_branches() and dump_tags() are cheap to call in the case of a no-op. Add tests to t9300 that observe the (non-packfiles) effects of checkpoint. Signed-off-by: Eric Rannaud <e@nanocritical.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	8 years ago
Jeff King	06f46f237a	avoid "write_in_full(fd, buf, len) != len" pattern The return value of write_in_full() is either "-1", or the requested number of bytes[1]. If we make a partial write before seeing an error, we still return -1, not a partial value. This goes back to `f6aa66cb95` (write_in_full: really write in full or return error on disk full., 2007-01-11). So checking anything except "was the return value negative" is pointless. And there are a couple of reasons not to do so: 1. It can do a funny signed/unsigned comparison. If your "len" is signed (e.g., a size_t) then the compiler will promote the "-1" to its unsigned variant. This works out for "!= len" (unless you really were trying to write the maximum size_t bytes), but is a bug if you check "< len" (an example of which was fixed recently in config.c). We should avoid promoting the mental model that you need to check the length at all, so that new sites are not tempted to copy us. 2. Checking for a negative value is shorter to type, especially when the length is an expression. 3. Linus says so. In `d34cf19b89` (Clean up write_in_full() users, 2007-01-11), right after the write_in_full() semantics were changed, he wrote: I really wish every "write_in_full()" user would just check against "<0" now, but this fixes the nasty and stupid ones. Appeals to authority aside, this makes it clear that writing it this way does not have an intentional benefit. It's a historical curiosity that we never bothered to clean up (and which was undoubtedly cargo-culted into new sites). So let's convert these obviously-correct cases (this includes write_str_in_full(), which is just a wrapper for write_in_full()). [1] A careful reader may notice there is one way that write_in_full() can return a different value. If we ask write() to write N bytes and get a return value that is _larger_ than N, we could return a larger total. But besides the fact that this would imply a totally broken version of write(), it would already invoke undefined behavior. Our internal remaining counter is an unsigned size_t, which means that subtracting too many byte will wrap it around to a very large number. So we'll instantly begin reading off the end of the buffer, trying to write gigabytes (or petabytes) of data. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	8 years ago

1 2 3 4 5 ...

509 Commits (65d100c4ddbe83953870be2e08566086e4b1cd3c)