The timestamp we receive is in epoch time, so there's no need for a
timezone parameter to interpret it. The matching show_date() uses "tz"
to show dates in author local time, but relative dates show only the
absolute time difference. The author's location is irrelevant, barring
relativistic effects from using Git close to the speed of light.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By default, index compat macros are off from now on, because they
could hide the_index dependency.
Only those in builtin can use it.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The return values -1 and -2 from get_oid could mean two different
things, depending on whether they were from an enum returned by
get_tree_entry_follow_symlinks, or from a different code path. This
caused 'dangling' to be printed from a git cat-file in the case of an
ambiguous (-2) result.
Unify the results of get_oid* and get_tree_entry_follow_symlinks to be
one common type, with unambiguous values.
Signed-off-by: David Turner <novalis@novalis.org>
Reported-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This adds --date=human, which skips the timezone if it matches the
current time-zone, and doesn't print the whole date if that matches (ie
skip printing year for dates that are "this year", but also skip the
whole date itself if it's in the last few days and we can just say what
weekday it was).
For really recent dates (same day), use the relative date stamp, while
for old dates (year doesn't match), don't bother with time and timezone.
Also add 'auto' date mode, which defaults to human if we're using the
pager. So you can do
git config --add log.date auto
and your "git log" commands will show the human-legible format unless
you're scripting things.
Note that this time format still shows the timezone for recent enough
events (but not so recent that they show up as relative dates). You can
combine it with the "-local" suffix to never show timezones for an even
more simplified view.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Stephen P. Smith <ischis2@cox.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 9472935d81 (add: introduce "--renormalize", 2017-11-16) taught
git-add to pass HASH_RENORMALIZE to add_to_index(), which then passes
the flag along to index_path(). However, the flags taken by
add_to_index() and the ones taken by index_path() are distinct
namespaces. We cannot take HASH_* flags in add_to_index(), because they
overlap with the ADD_CACHE_* flags we already take (in this case,
HASH_RENORMALIZE conflicts with ADD_CACHE_IGNORE_ERRORS).
We can solve this by adding a new ADD_CACHE_RENORMALIZE flag, and using
it to set HASH_RENORMALIZE within add_to_index(). In order to make it
clear that these two flags come from distinct sets, let's also change
the name "newflags" in the function to "hash_flags".
Reported-by: Dmitriy Smirnov <dmitriy.smirnov@jetbrains.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are some situations in which we want to store an object ID into
struct object_id without the_hash_algo necessarily being set correctly.
One such case is when cloning a repository, where we must read refs from
the remote side without having a repository from which to read the
preferred algorithm.
In this cases, we may have the_hash_algo set to SHA-1, which is the
default, but read refs into struct object_id that are SHA-256. When
copying these values, we will want to copy them completely, not just the
first 20 bytes. Consequently, make sure that oidcpy copies the maximum
number of bytes at all times, regardless of the setting of
the_hash_algo.
Since oidcpy and hashcpy are no longer functionally identical, remove
the Cocinelle object_id transformations that convert from one into the
other.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This kills the_index dependency in get_oid_with_context() but for
get_oid() and friends, they still assume the_repository (which also
means the_index).
Unfortunately the widespread use of get_oid() will make it hard to
make the conversion now. We probably will add repo_get_oid() at some
point and limit the use of get_oid() in builtin/ instead of forcing
all get_oid() call sites to carry struct repository.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
read_index() shares the same problem as hold_locked_index(): it
assumes $GIT_DIR/index. Move all call sites to repo_read_index()
instead. read_index_preload() and read_index_unmerged() are also
killed as a consequence.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
hold_locked_index() assumes the index path at $GIT_DIR/index. This is
not good for places that take an arbitrary index_state instead of
the_index, which is basically everywhere except builtin/.
Replace it with repo_hold_locked_index(). hold_locked_index() remains
as a wrapper around repo_hold_locked_index() to reduce changes in builtin/
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As with the open/map/close functions for loose objects that were
recently converted, the functions for parsing the loose object stream
use the name "sha1" and a bare "unsigned char *". Let's fix that so that
unpack_sha1_header() becomes unpack_loose_header(), etc.
These conversions are less clear-cut than the file access functions.
You could argue that the they are parsing Git's canonical object format
(i.e., "type size\0contents", over which we compute the hash), which is
not strictly tied to loose storage. But in practice these functions are
used only for loose objects, and using the term "loose_header" (instead
of "object_header") distinguishes it from the object header found in
packfiles (which contains the same information in a different format).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit abef9020e3 (sha1_file: convert sha1_object_info* to object_id,
2018-03-12) renamed the function to oid_object_info(), but missed some
comments which mention it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When marking cache entries for removal, and later removing them all at
once using 'remove_marked_cache_entries()', cache entries currently
have to be invalidated manually in the cache tree and in the untracked
cache.
Add an invalidate flag to the function. With the flag set, the
function will take care of invalidating the path in the cache tree and
in the untracked cache.
Note that the current callsites already do the invalidation properly
in other places, so we're just passing 0 from there to keep the status
quo.
This will be useful in a subsequent commit.
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Factor out the 'unlink_entry()' function from unpack-trees.c to
entry.c. It will be used in other places as well in subsequent
steps.
As it's no longer a static function, also move the documentation to
the header file to make it more discoverable.
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
is_empty_file() can help to refactor a lot of code. This will be very
helpful in porting "git bisect" to C.
Suggested-by: Torsten Bögershausen <tboegi@web.de>
Mentored-by: Lars Schneider <larsxschneider@gmail.com>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
dade47c06c (commit-graph: add repo arg to graph readers, 2018-07-11)
brought more independence from the_repository to the commit graph, however
it was not completely independent of the_repository, as the previous
patches show.
To ensure we're not accessing the_repository by accident, we'd ideally
assign NULL to the_repository to trigger a segfault on access.
We currently have a temporary hack in cache.h, which relies on
the_hash_algo (which is a short form of the_repository->hash_algo) to
be set, so we cannot do that. The next best thing is to set all fields of
the_repository to 0, so any accidental access is more likely to be found.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
SHA-1 is weak and we need to transition to a new hash function. For
some time, we have referred to this new function as NewHash. Recently,
we decided to pick SHA-256 as NewHash. The reasons behind the choice of
SHA-256 are outlined in the thread starting at [1] and in the commit
history for the hash function transition document.
Add a basic implementation of SHA-256 based off libtomcrypt, which is in
the public domain. Optimize it and restructure it to meet our coding
standards. Pull in the update and final functions from the SHA-1 block
implementation, as we know these function correctly with all compilers.
This implementation is slower than SHA-1, but more performant
implementations will be introduced in future commits.
Wire up SHA-256 in the list of hash algorithms, and add a test that the
algorithm works correctly.
Note that with this patch, it is still not possible to switch to using
SHA-256 in Git. Additional patches are needed to prepare the code to
handle a larger hash algorithm and further test fixes are needed.
[1] https://public-inbox.org/git/20180609224913.GC38834@genre.crustytoothpaste.net/
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is one place we need the hash algorithm block size: the HMAC code
for push certs. Expose this constant in struct git_hash_algo and expose
values for SHA-1 and for the largest value of any hash.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 183a638b7d ("hashcmp: assert constant hash size", 2018-08-23), we
modified hashcmp to assert that the hash size was always 20 to help it
optimize and inline calls to memcmp. In a future series, we replaced
many calls to hashcmp and oidcmp with calls to hasheq and oideq to
improve inlining further.
However, we want to support hash algorithms other than SHA-1, namely
SHA-256. When doing so, we must handle the case where these values are
32 bytes long as well as 20. Adjust hashcmp to handle two cases:
20-byte matches, and maximum-size matches. Therefore, when we include
SHA-256, we'll automatically handle it properly, while at the same time
teaching the compiler that there are only two possible options to
consider. This will allow the compiler to write the most efficient
possible code.
Copy similar code into hasheq and perform an identical transformation.
At least with GCC 8.2.0, making hasheq defer to hashcmp when there are
two branches prevents the compiler from inlining the comparison, while
the code in this patch is inlined properly. Add a comment to avoid an
accidental performance regression from well-intentioned refactoring.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, we have functions that turn an arbitrary SHA-1 value or an
object ID into hex format, either using a static buffer or with a
user-provided buffer. Add variants of these functions that can handle
an arbitrary hash algorithm, specified by constant. Update the
documentation as well.
While we're at it, remove the "extern" declaration from this family of
functions, since it's not needed and our style now recommends against
it.
We use the variant taking the algorithm structure pointer as the
internal variant, since taking an algorithm pointer is the easiest way
to handle all of the variants in use.
Note that we maintain these functions because there are hashes which
must change based on the hash algorithm in use but are not object IDs
(such as pack checksums).
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
One of the problems with "git checkout" is that it does so many
different things and could confuse people specially when we fail to
handle ambiguation correctly.
One way to help with that is tell the user what sort of operation is
actually carried out. When switching branches, we always print
something unless --quiet, either
- "HEAD is now at ..."
- "Reset branch ..."
- "Already on ..."
- "Switched to and reset ..."
- "Switched to a new branch ..."
- "Switched to branch ..."
Checking out paths however is silent. Print something so that if we
got the user intention wrong, they won't waste too much time to find
that out. For the remaining cases of checkout we now print either
- "Checked out ... paths out of the index"
- "Checked out ... paths out of <abbrev hash>"
Since the purpose of printing this is to help disambiguate. Only do it
when "--" is missing.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Speed up refresh_index() by utilizing preload_index() to do most of the work
spread across multiple threads. This works because most cache entries will
get marked CE_UPTODATE so that refresh_cache_ent() can bail out early when
called from within refresh_index().
On a Windows repo with ~200K files, this drops refresh times from 6.64
seconds to 2.87 seconds for a savings of 57%.
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A new repo extension is added, worktreeConfig. When it is present:
- Repository config reading by default includes $GIT_DIR/config _and_
$GIT_DIR/config.worktree. "config" file remains shared in multiple
worktree setup.
- The special treatment for core.bare and core.worktree, to stay
effective only in main worktree, is gone. These config settings are
supposed to be in config.worktree.
This extension is most useful in multiple worktree setup because you
now have an option to store per-worktree config (which is either
.git/config.worktree for main worktree, or
.git/worktrees/xx/config.worktree for linked ones).
This extension can be used in single worktree mode, even though it's
pretty much useless (but this can happen after you remove all linked
worktrees and move back to single worktree).
"git config" reads from both "config" and "config.worktree" by default
(i.e. without either --user, --file...) when this extension is
present. Default writes still go to "config", not "config.worktree". A
new option --worktree is added for that (*).
Since a new repo extension is introduced, existing git binaries should
refuse to access to the repo (both from main and linked worktrees). So
they will not misread the config file (i.e. skip the config.worktree
part). They may still accidentally write to the config file anyway if
they use with "git config --file <path>".
This design places a bet on the assumption that the majority of config
variables are shared so it is the default mode. A safer move would be
default writes go to per-worktree file, so that accidental changes are
isolated.
(*) "git config --worktree" points back to "config" file when this
extension is not present and there is only one worktree so that it
works in any both single and multiple worktree setups.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ever since the split index feature was introduced [1], refreshing a
split index is prone to a variant of the classic racy git problem.
Consider the following sequence of commands updating the split index
when the shared index contains a racily clean cache entry, i.e. an
entry whose cached stat data matches with the corresponding file in
the worktree and the cached mtime matches that of the index:
echo "cached content" >file
git update-index --split-index --add file
echo "dirty worktree" >file # size stays the same!
# ... wait ...
git update-index --add other-file
Normally, when a non-split index is updated, then do_write_index()
(the function responsible for writing all kinds of indexes, "regular",
split, and shared) recognizes racily clean cache entries, and writes
them with smudged stat data, i.e. with file size set to 0. When
subsequent git commands read the index, they will notice that the
smudged stat data doesn't match with the file in the worktree, and
then go on to check the file's content and notice its dirtiness.
In the above example, however, in the second 'git update-index'
prepare_to_write_split_index() decides which cache entries stored only
in the shared index should be replaced in the new split index. Alas,
this function never looks out for racily clean cache entries, and
since the file's stat data in the worktree hasn't changed since the
shared index was written, it won't be replaced in the new split index.
Consequently, do_write_index() doesn't even get this racily clean
cache entry, and can't smudge its stat data. Subsequent git commands
will then see that the index has more recent mtime than the file and
that the (not smudged) cached stat data still matches with the file in
the worktree, and, ultimately, will erroneously consider the file
clean.
Modify prepare_to_write_split_index() to recognize racily clean cache
entries, and mark them to be added to the split index. Note that
there are two places where it should check raciness: first those cache
entries that are only stored in the shared index, and then those that
have been copied by unpack_trees() from the shared index while it
constructed a new index. This way do_write_index() will get these
racily clean cache entries as well, and will then write them with
smudged stat data to the new split index.
This change makes all tests in 't1701-racy-split-index.sh' pass, so
flip the two 'test_expect_failure' tests to success. Also add the '#'
(as in nr. of trial) to those tests' description that were omitted
when the tests expected failure.
Note that after this change if the index is split when it contains a
racily clean cache entry, then a smudged cache entry will be written
both to the new shared and to the new split indexes. This doesn't
affect regular git commands: as far as they are concerned this is just
an entry in the split index replacing an outdated entry in the shared
index. It did affect a few tests in 't1700-split-index.sh', though,
because they actually check which entries are stored in the split
index; a previous patch in this series has already made the necessary
adjustments in 't1700'. And racily clean cache entries and index
splitting are rare enough to not worry about the resulting duplicated
smudged cache entries, and the additional complexity required to
prevent them is not worth it.
Several tests failed occasionally when the test suite was run with
'GIT_TEST_SPLIT_INDEX=yes'. Here are those that I managed to trace
back to this racy split index problem, starting with those failing
more frequently, with a link to a failing Travis CI build job for
each. The highlighted line [2] shows when the racy file was written,
which is not always in the failing test but in a preceeding setup
test.
t3903-stash.sh:
https://travis-ci.org/git/git/jobs/385542084#L5858
t4024-diff-optimize-common.sh:
https://travis-ci.org/git/git/jobs/386531969#L3174
t4015-diff-whitespace.sh:
https://travis-ci.org/git/git/jobs/360797600#L8215
t2200-add-update.sh:
https://travis-ci.org/git/git/jobs/382543426#L3051
t0090-cache-tree.sh:
https://travis-ci.org/git/git/jobs/416583010#L3679
There might be others, e.g. perhaps 't1000-read-tree-m-3way.sh' and
others using 'lib-read-tree-m-3way.sh', but I couldn't confirm yet.
[1] In the branch leading to the merge commit v2.1.0-rc0~45 (Merge
branch 'nd/split-index', 2014-07-16).
[2] Note that those highlighted lines are in the 'after failure' fold,
and your browser might unhelpfully fold it up before you could
take a good look.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a helper function named is_writing_gitmodules_ok() to verify
that the .gitmodules file is safe to write.
The function name follows the scheme of is_staging_gitmodules_ok().
The two symbolic constants GITMODULES_INDEX and GITMODULES_HEAD are used
to get help from the C preprocessor in preventing typos, especially for
future users.
This is in preparation for a future change which teaches git how to read
.gitmodules from the index or from the current branch if the file is not
available in the working tree.
The rationale behind the check is that writing to .gitmodules requires
the file to be present in the working tree, unless a brand new
.gitmodules is being created (in which case the .gitmodules file would
not exist at all: neither in the working tree nor in the index or in the
current branch).
Expose the functionality also via a "submodule-helper config
--check-writeable" command, as git scripts may want to perform the check
before modifying submodules configuration.
Signed-off-by: Antonio Ospite <ao2@ao2.it>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function calls do_diff_cache() which eventually needs to set this
"istate" to unpack_options->src_index [1]. This is an unfortunate fact
that unpack_trees() _will_ update [2] src_index so we can't really pass a
const index_state there. Just remove 'const'.
[1] Right now diff_cache() in diff-lib.c assigns the_index to
src_index. But the plan is to get rid of the_index, so it should
be 'istate' from here that gets assigned to src_index.
[2] Some transient bits in the source index are touched. Optional
extensions can also be removed. But other than that the source
tree should still be valid.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refreshing the index is usually very fast, but it can still take a
long time sometimes. Cold cache is one. Or copying a repo to a new
place (*). It's good to show something to let the user know "git
status" is not hanging, it's just busy doing something.
(*) In this case, all stat info in the index becomes invalid and git
falls back to rehashing all file content to see if there's any
difference between updating stat info in the index. This is quite
expensive. Even with a repo as small as git.git, it takes 3
seconds.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is the partner patch to the previous one, but covering
the "hash" variants instead of "oid". Note that our
coccinelle rule is slightly more complex to avoid triggering
the call in hasheq().
I didn't bother to add a new rule to convert:
- hasheq(E1->hash, E2->hash)
+ oideq(E1, E2)
Since these are new functions, there won't be any such
existing callers. And since most of the code is already
using oideq, we're not likely to introduce new ones.
We might still see "!hashcmp(E1->hash, E2->hash)" from topics
in flight. But because our new rule comes after the existing
ones, that should first get converted to "!oidcmp(E1, E2)"
and then to "oideq(E1, E2)".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using the more restrictive oideq() should, in the long run,
give the compiler more opportunities to optimize these
callsites. For now, this conversion should be a complete
noop with respect to the generated code.
The result is also perhaps a little more readable, as it
avoids the "zero is equal" idiom. Since it's so prevalent in
C, I think seasoned programmers tend not to even notice it
anymore, but it can sometimes make for awkward double
negations (e.g., we can drop a few !!oidcmp() instances
here).
This patch was generated almost entirely by the included
coccinelle patch. This mechanical conversion should be
completely safe, because we check explicitly for cases where
oidcmp() is compared to 0, which is what oideq() is doing
under the hood. Note that we don't have to catch "!oidcmp()"
separately; coccinelle's standard isomorphisms make sure the
two are treated equivalently.
I say "almost" because I did hand-edit the coccinelle output
to fix up a few style violations (it mostly keeps the
original formatting, but sometimes unwraps long lines).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The main comparison functions we provide for comparing
object ids are hashcmp() and oidcmp(). These are more
flexible than a strict equality check, since they also
express ordering. That makes them useful for sorting and
binary searching. However, it also makes them potentially
slower than a strict equality check. Consider this C code,
which is traditionally what our hashcmp has looked like:
#include <string.h>
int hashcmp(const unsigned char *a, const unsigned char *b)
{
return memcmp(a, b, 20);
}
Compiling with "gcc -O2 -S -fverbose-asm", the generated
assembly shows that we actually call memcmp(). But if we
change this to a strict equality check:
return !memcmp(a, b, 20);
we get a faster inline version:
movq (%rdi), %rax # MEM[(void *)a_4(D)], MEM[(void *)a_4(D)]
movq 8(%rdi), %rdx # MEM[(void *)a_4(D)], tmp101
xorq (%rsi), %rax # MEM[(void *)b_5(D)], tmp94
xorq 8(%rsi), %rdx # MEM[(void *)b_5(D)], tmp93
orq %rax, %rdx # tmp94, tmp93
jne .L2 #,
movl 16(%rsi), %eax # MEM[(void *)b_5(D)], tmp104
cmpl %eax, 16(%rdi) # tmp104, MEM[(void *)a_4(D)]
je .L5 #,
Obviously our hashcmp() doesn't include the "!". But because
it's an inline function, optimizing compilers are able to
see "!hashcmp(a,b)" in calling code and take advantage of
this case. So there has been no value thus far in
introducing a more restricted interface for doing strict
equality checks.
But as Git learns about more values for the_hash_algo, our
hashcmp() will grow more complicated and may even delegate
at runtime to functions optimized specifically for that hash
size. That breaks the inline connection we have, and the
compiler will have to assume that the caller really cares
about the sign and magnitude of the memcmp() result, even
though the vast majority don't.
We can solve that by introducing a hasheq() function (and
matching oideq() wrapper), which callers can use to make it
clear that they only care about equality. For now, the
implementation will literally be "!hashcmp()", but it frees
us up later to introduce code optimized specifically for the
equality check.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Prior to 509f6f62a4 (cache: update object ID functions for
the_hash_algo, 2018-07-16), hashcmp() called memcmp() with a
constant size of 20 bytes. Some compilers were able to turn
that into a few quad-word comparisons, which is faster than
actually calling memcmp().
In 509f6f62a4, we started using the_hash_algo->rawsz
instead. Even though this will always be 20, the compiler
doesn't know that while inlining hashcmp() and ends up just
generating a call to memcmp().
Eventually we'll have to deal with multiple hash sizes, but
for the upcoming v2.19, we can restore some of the original
performance by asserting on the size. That gives the
compiler enough information to know that the memcmp will
always be called with a length of 20, and it performs the
same optimization.
Here are numbers for p0001.2 run against linux.git on a few
versions. This is using -O2 with gcc 8.2.0.
Test v2.18.0 v2.19.0-rc0 HEAD
------------------------------------------------------------------------------
0001.2: 34.24(33.81+0.43) 34.83(34.42+0.40) +1.7% 33.90(33.47+0.42) -1.0%
You can see that v2.19 is a little slower than v2.18. This
commit ended up slightly faster than v2.18, but there's a
fair bit of run-to-run noise (the generated code in the two
cases is basically the same). This patch does seem to be
consistently 1-2% faster than v2.19.
I tried changing hashcpy(), which was also touched by
509f6f62a4, in the same way, but couldn't measure any
speedup. Which makes sense, at least for this workload. A
traversal of the whole commit graph requires looking up
every entry of every tree via lookup_object(). That's many
multiples of the numbers of objects in the repository (most
of the lookups just return "yes, we already saw that
object").
[jn: verified using "make object.s" that the memcmp call goes away.]
Reported-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Jeff King <peff@peff.net
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Paths that only differ in case work fine in a case-sensitive
filesystems, but if those repos are cloned in a case-insensitive one,
you'll get problems. The first thing to notice is "git status" will
never be clean with no indication what exactly is "dirty".
This patch helps the situation a bit by pointing out the problem at
clone time. Even though this patch talks about case sensitivity, the
patch makes no assumption about folding rules by the filesystem. It
simply observes that if an entry has been already checked out at clone
time when we're about to write a new path, some folding rules are
behind this.
In the case that we can't rely on filesystem (via inode number) to do
this check, fall back to fspathcmp() which is not perfect but should
not give false positives.
This patch is tested with vim-colorschemes and Sublime-Gitignore
repositories on a JFS partition with case insensitive support on
Linux.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'branch_track' feels more closely related to branching, and it is
needed later in branch.h; rather than #include'ing cache.h in branch.h
for this small enum, just move the enum and the external declaration
for git_branch_track to branch.h.
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The for_each_loose_object() and for_each_packed_object()
functions are meant to be part of a unified interface: they
use the same set of for_each_object_flags, and it's not
inconceivable that we might one day add a single
for_each_object() wrapper around them.
Let's put them together in a single file, so we can avoid
awkwardness like saying "the flags for this function are
over in cache.h". Moving the loose functions to packfile.h
is silly. Moving the packed functions to cache.h works, but
makes the "cache.h is a kitchen sink" problem worse. The
best place is the recently-created object-store.h, since
these are quite obviously related to object storage.
The for_each_*_in_objdir() functions do not use the same
flags, but they are logically part of the same interface as
for_each_loose_object(), and share callback signatures. So
we'll move those, as well, as they also make sense in
object-store.h.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We currently iterate over objects within a pack in .idx
order, which uses the object hashes. That means that it
is effectively random with respect to the location of the
object within the pack. If you're going to access the actual
object data, there are two reasons to move linearly through
the pack itself:
1. It improves the locality of access in the packfile. In
the cold-cache case, this may mean fewer disk seeks, or
better usage of disk cache.
2. We store related deltas together in the packfile. Which
means that the delta base cache can operate much more
efficiently if we visit all of those related deltas in
sequence, as the earlier items are likely to still be
in the cache. Whereas if we visit the objects in
random order, our cache entries are much more likely to
have been evicted by unrelated deltas in the meantime.
So in general, if you're going to access the object contents
pack order is generally going to end up more efficient.
But if you're simply generating a list of object names, or
if you're going to end up sorting the result anyway, you're
better off just using the .idx order, as finding the pack
order means generating the in-memory pack-revindex.
According to the numbers in 8b8dfd5132 (pack-revindex:
radix-sort the revindex, 2013-07-11), that takes about 200ms
for linux.git, and 20ms for git.git (those numbers are a few
years old but are still a good ballpark).
That makes it a good optimization for some cases (we can
save tens of seconds in git.git by having good locality of
delta access, for a 20ms cost), but a bad one for others
(e.g., right now "cat-file --batch-all-objects
--batch-check="%(objectname)" is 170ms in git.git, so adding
20ms to that is noticeable).
Hence this patch makes it an optional flag. You can't
actually do any interesting timings yet, as it's not plumbed
through to any user-facing tools like cat-file. That will
come in a later patch.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We already mention the local/alternate behavior of these
functions, but we can help clarify a few other behaviors:
- there's no need to mention LOCAL_ONLY specifically, since
we already reference the flags by type (and as we add
more flags, we don't want to have to mention each)
- clarify that reachability doesn't matter here; this is
all accessible objects
- what ordering/uniqueness guarantees we give
- how pack-specific flags are handled for the loose case
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's not wrong to pass our flags in an "unsigned", as we
know it will be at least as large as the enum. However,
using the enum in the declaration makes it more obvious
where to find the list of flags.
While we're here, let's also drop the "extern" noise-words
from the declarations, per our modern coding style.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These flags were split between cache.h and packfile.h,
because some of the flags apply only to packs. However, they
share a single numeric namespace, since both are respected
for the packed variant. Let's make sure they're defined
together so that nobody accidentally adds a new flag in one
location that duplicates the other.
While we're here, let's also put them in an enum (which
helps debugger visibility) and use "(1<<n)" rather than
counting powers of 2 manually.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As part of the rewrite of interactive rebase, the sequencer will need to
open the sequence editor to allow the user to edit the todo list.
Instead of duplicating the existing launch_editor() function, this
refactors it to a new function, launch_specified_editor(), which takes
the editor as a parameter, in addition to the path, the buffer and the
environment variables. launch_sequence_editor() is then added to launch
the sequence editor.
Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While it is technically possible, it is confusing. Not only the user,
but also VS Code's intellisense.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This was added as a NEEDSWORK in c3c36d7de2 (replace-object:
check_replace_refs is safe in multi repo environment, 2018-04-11),
waiting for a calmer period. Since doing so now doesn't conflict
with anything in 'pu', it seems as good a time as any.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit afc711b8e1 (rename read_replace_refs to
check_replace_refs, 2014-02-18) added a comment mentioning
that check_replace_refs is set in two ways:
- from user intent via --no-replace-objects, etc
- after seeing there are no replace refs to respect
Since c3c36d7de2 (replace-object: check_replace_refs is safe
in multi repo environment, 2018-04-11) the second is no
longer true. Let's drop that part of the comment.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a struct repository argument to the functions in commit-graph.h that
read the commit graph. (This commit does not affect functions that write
commit graphs.)
Because the commit graph functions can now read the commit graph of any
repository, the global variable core_commit_graph has been removed.
Instead, the config option core.commitGraph is now read on the first
time in a repository that a commit is attempted to be parsed using its
commit graph.
This commit includes a test that exercises the functionality on an
arbitrary repository that is not the_repository.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most of our code has been converted to use struct object_id for object
IDs. However, there are some places that still have not, and there are
a variety of places that compare equivalently sized hashes that are not
object IDs. All of these hashes are artifacts of the internal hash
algorithm in use, and when we switch to NewHash for object storage, all
of these uses will also switch.
Update the hashcpy, hashclr, and hashcmp functions to use the_hash_algo,
since they are used in a variety of places to copy and manipulate
buffers that need to move data into or out of struct object_id. This
has the effect of making the corresponding oid* functions use
the_hash_algo as well.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`git merge-recursive` does a three-way merge between user-specified trees
base, head, and remote. Since the user is allowed to specify head, we can
not necesarily assume that head == HEAD.
Modify index_has_changes() to take an extra argument specifying the tree
to compare against. If NULL, it will compare to HEAD. We then use this
from merge-recursive to make sure we compare to the user-specified head.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>